Supplemental Materials
Decision from Models: Generalizing Probability Information to Novel Tasks
by H. Zhang et al., 2014, Decision
http://dx.doi.org/10.1037/dec0000022
Appendix A
Derivations for Eq. 5, Eq. 8, Eq. 10
Eq. 5 is derived from Eq. 1 and Eq. 4. Substituting Eq. 1 into Eq. 4:

U(V_1) e^{-λx_1} = U(V_2) e^{-λx_2}    (A1)

Taking logarithms on both sides:

ln U(V_1) - λx_1 = ln U(V_2) - λx_2    (A2)

Solving for x_2, we have Eq. 5.
Eq. 8 is derived from Eq. 4 and Eq. 6. Substituting Eq. 6 into Eq. 4:

U(V_1) e^{-(λx_1)^γ} = U(V_2) e^{-(λx_2)^γ}    (A3)

Taking logarithms on both sides:

ln U(V_1) - (λx_1)^γ = ln U(V_2) - (λx_2)^γ    (A4)

Dividing both sides by λ^γ:

λ^{-γ} ln U(V_1) - x_1^γ = λ^{-γ} ln U(V_2) - x_2^γ    (A5)

Switching x_2^γ to the left side and λ^{-γ} ln U(V_1) to the right side, we have Eq. 8.
Eq. 10 is derived from Eq. 4 and Eq. 9. Substituting Eq. 9 into Eq. 4:

U(V_1) / (1 + λx_1) = U(V_2) / (1 + λx_2)    (A6)

Multiplying both sides by (1 + λx_1)(1 + λx_2):

U(V_1) (1 + λx_2) = U(V_2) (1 + λx_1)    (A7)

Solving for x_2, we have Eq. 10.
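As a numerical consistency check, the three equivalent lengths can be verified against the indifference condition of Eq. 4. A minimal sketch in Python: all parameter values (α = 0.7, λ = 0.1, γ = 1.5) are arbitrary, and the survival-function forms are those assumed in the derivations above.

```python
import math

# Power utility (Eq. 2) and the three survival functions; the exact
# parameterizations are assumptions consistent with the derivations above.
ALPHA = 0.7   # utility exponent (illustrative)

def U(v):
    return v ** ALPHA

def S_exp(x, lam):                 # exponential survival (Eq. 1)
    return math.exp(-lam * x)

def S_wei(x, lam, gamma):          # Weibull survival (Eq. 6, assumed form)
    return math.exp(-((lam * x) ** gamma))

def S_hyp(x, lam):                 # hyperbolic survival (Eq. 9, assumed form)
    return 1.0 / (1.0 + lam * x)

V1, V2, x1 = 10.0, 20.0, 5.0      # rewards and reference path length
lam, gamma = 0.1, 1.5

# Equivalent lengths x2 implied by Eq. 5, Eq. 8, and Eq. 10:
x2_exp = x1 + math.log(U(V2) / U(V1)) / lam
x2_wei = (x1 ** gamma + math.log(U(V2) / U(V1)) / lam ** gamma) ** (1.0 / gamma)
x2_hyp = ((U(V2) / U(V1)) * (1.0 + lam * x1) - 1.0) / lam
# In each case U(V1) * S(x1) equals U(V2) * S(x2), i.e., Eq. 4 holds.
```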
Appendix B
Maximum Likelihood Estimation of the Survival Function in the Training Phase
In the training phase of the experiment, on each trial, participants received feedback specifying the outcome (reward or no reward; location of the mine if a mine was encountered) on the path they chose. Suppose the length of the path on the i-th trial is L_i. Participants observed either a success or a mine encountered at a specific distance d_i (d_i ≤ L_i) along the path.
If we assume the survival function is exponential (Eq. 1), then the likelihood of observing the i-th outcome is:

P_i = e^{-λL_i} if the outcome is a success; P_i = λ e^{-λd_i} if a mine is encountered at distance d_i    (A8)

where the likelihood of a mine at d_i is the density at d_i, that is, the hazard rate λ times the probability e^{-λd_i} of surviving to d_i.
If we assume the survival function is hyperbolic (Eq. 9), then the likelihood of observing the i-th outcome is:

P_i = 1 / (1 + λL_i) if the outcome is a success; P_i = λ / (1 + λd_i)^2 if a mine is encountered at distance d_i    (A9)
For each participant, we estimated the parameters θ of the survival function that maximize the sum of the log likelihoods of all the outcomes the participant observed in the training:

θ̂ = argmax_θ Σ_i ln P_i    (A10)
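The maximization above can be sketched as follows. For the exponential model the maximizer even has a simple closed form (number of mines divided by the total distance traveled); the training outcomes below are hypothetical.

```python
import math

# Hypothetical training outcomes: ('mine', d) = mine encountered at distance d;
# ('safe', L) = a path of length L traversed without encountering a mine.
outcomes = [('mine', 8.0), ('safe', 12.0), ('mine', 3.5), ('safe', 9.0), ('mine', 6.0)]

def log_lik_exponential(lam, outcomes):
    """Summed log likelihood under the exponential survival function:
    a survival contributes log S(L) = -lam * L; a mine at distance d
    contributes the log density log(lam) - lam * d."""
    ll = 0.0
    for kind, x in outcomes:
        ll += (math.log(lam) - lam * x) if kind == 'mine' else (-lam * x)
    return ll

# Closed-form maximizer for the exponential model:
n_mines = sum(1 for kind, _ in outcomes if kind == 'mine')
total_distance = sum(x for _, x in outcomes)
lam_hat = n_mines / total_distance

# Sanity check: lam_hat should beat every value on a grid.
best_grid = max((i / 1000 for i in range(1, 1000)),
                key=lambda lam: log_lik_exponential(lam, outcomes))
```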
Appendix C
Proof of the Invariance of γ under Different α
Suppose that the participant’s utility function is a power function with power α (Eq. 2). ln[U(V_2)/U(V_1)] can be written as

ln[U(V_2)/U(V_1)] = α ln(V_2/V_1)    (A11)
Substituting A11 into Eq. 5, the equivalent length predicted by the exponential model becomes:

x_2 = x_1 + (α/λ) ln(V_2/V_1)    (A12)
Similarly, for the Weibull model, substituting A11 into Eq. 8, the predicted equivalent length is:

x_2 = [x_1^γ + (α/λ^γ) ln(V_2/V_1)]^{1/γ}    (A13)
It is easy to see that in Eq. A12 and Eq. A13 scaling α is confounded with scaling λ: the two parameters enter the prediction only through the ratio α/λ^γ (with γ = 1 in Eq. A12). In particular, in Eq. A13 the estimated γ would not be influenced by the assumed value of α.
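The confound can also be checked numerically: doubling α while rescaling λ by 2^{1/γ} leaves α/λ^γ, and hence the predicted equivalent length, unchanged. All values below are arbitrary.

```python
import math

def x2_weibull(x1, V1, V2, alpha, lam, gamma):
    """Equivalent length predicted by the Weibull model (Eq. A13)."""
    return (x1 ** gamma + alpha * math.log(V2 / V1) / lam ** gamma) ** (1.0 / gamma)

x1, V1, V2, gamma = 5.0, 10.0, 20.0, 1.5

pred_a = x2_weibull(x1, V1, V2, alpha=0.7, lam=0.1, gamma=gamma)
# Double alpha and absorb it into lambda: alpha / lam**gamma is unchanged.
pred_b = x2_weibull(x1, V1, V2, alpha=1.4, lam=0.1 * 2 ** (1.0 / gamma), gamma=gamma)
# pred_a equals pred_b (up to floating-point error), so a gamma fitted from
# equivalent lengths cannot depend on the assumed alpha.
```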
Appendix D
Stochastic Models of Choice
Given the same path lengths and rewards on two trials, human participants do not always make the same choice. This stochastic choice behavior poses no problem for the models we refer to as resampling models in the main text: stochastic variation in resampling leads naturally to stochastic choice behavior. For all other models, we model stochastic choice behavior as follows.
Denote the pair of path lotteries on a specific trial as L_1 = (V_1, x_1) and L_2 = (V_2, x_2), with rewards V_1, V_2 and path lengths x_1, x_2. Let p_1 = S(x_1) and p_2 = S(x_2) denote the probability of survival on each path, which differs for different survival functions. The expected utilities of the paths are EU(L_1) = U(V_1)p_1 and EU(L_2) = U(V_2)p_2. If the participant always chose the path with the higher expected utility, his choice (given the same paths and rewards) would never vary. We instead assume a stochastic model of choice based on the expected utilities: the larger the difference in expected utilities, the greater the probability of choosing the path with the higher expected utility.
Except for the resampling choice models (as we describe separately below), the probability of choosing L_1 on the trial is modeled as:

P(L_1) = 1 / (1 + e^{-[EU(L_1) - EU(L_2)]/(τs)})    (A14)

where τ is a temperature parameter that affects the probability of choosing the path with the higher expected utility (as τ → 0, the choice becomes deterministic: P(L_1) → 1 whenever EU(L_1) > EU(L_2)); s is a normalization term, reflecting the absolute distance between the value distributions of the two lotteries. Different choice models differ in their assumptions about the survival function.
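The choice rule of Eq. A14 can be sketched as follows; τ (tau) and s are as in Eq. A14, and the numeric values are illustrative.

```python
import math

def p_choose_L1(eu1, eu2, tau, s=1.0):
    """Probability of choosing lottery L1 given expected utilities eu1, eu2
    (Eq. A14): a logistic function of the expected-utility difference,
    scaled by the temperature tau and the normalization term s."""
    return 1.0 / (1.0 + math.exp(-(eu1 - eu2) / (tau * s)))

# Equal expected utilities -> indifference; small tau -> near-deterministic.
p_equal = p_choose_L1(2.0, 2.0, tau=1.0)        # 0.5
p_greedy = p_choose_L1(3.0, 2.0, tau=0.01)      # close to 1
```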
Exponential Choice Model
The probability of success, S(x), is assumed to be an exponential function of the path length (Eq. 1). The model has two free parameters: λ in Eq. 1 and τ in Eq. A14.
Weibull Choice Model
The survival function is Weibull (Eq. 6). The model has three free parameters: λ and γ in Eq. 6, and τ in Eq. A14.
Hyperbolic Choice Model
The survival function is hyperbolic (Eq. 9). The model has two free parameters: λ in Eq. 9 and τ in Eq. A14.
Learning-Based Non-Parametric Choice Model
The survival function is the result of a delta-rule learning process, which starts with

S_0(x) = 1 for all x    (A15)

and is updated after each trial in the training, where the participant observes the outcome of the path he chooses. Denote the survival function after trial t as S_t(x). If the participant survives a path of length l_t, the survival function is updated for positive outcomes (survival) up to l_t:

S_t(x) = (1 - w_+) S_{t-1}(x) + w_+, for x ≤ l_t; S_t(x) = S_{t-1}(x), for x > l_t    (A16)

where w_+ is the weighting parameter for positive outcomes. That is, for path lengths that are greater than l_t, the probability of survival is unchanged; for path lengths that are no greater than l_t, the probability of survival is updated as a weighted average of the previous probability of survival and one.
If the participant runs into a mine at the distance d_t, the survival function is updated for positive outcomes before d_t and for negative outcomes (encountering the mine) after d_t:

S_t(x) = (1 - w_+) S_{t-1}(x) + w_+, for x ≤ d_t; S_t(x) = (1 - w_-) S_{t-1}(x), for x > d_t    (A17)

where w_+ is the same weighting parameter as in Eq. A16, and w_- is the weighting parameter for negative outcomes. That is, for path lengths that are greater than d_t, the probability of survival is updated as a weighted average of the previous probability of survival and zero; for path lengths that are no greater than d_t, it is updated as a weighted average of the previous probability of survival and one.
Here is an example to illustrate how the delta-rule learning defined in Eq. A16 and Eq. A17 works. Suppose w_+ = w_- = 0.1. The participant starts with S_0(x) = 1. Suppose on the first trial, the path length is 10 (arbitrary unit) and the participant runs into a mine at the distance 8. The survival function should be updated using Eq. A17 (note that the path length is irrelevant in Eq. A17). For x ≤ 8, S_1(x) = 0.9 × 1 + 0.1 = 1; for x > 8, S_1(x) = 0.9 × 1 = 0.9. Suppose on the second trial, the path length is 12 and it is a survival. Applying Eq. A16, we have: for x ≤ 8, S_2(x) = 0.9 × 1 + 0.1 = 1; for 8 < x ≤ 12, S_2(x) = 0.9 × 0.9 + 0.1 = 0.91; for x > 12, S_2(x) = S_1(x) = 0.9.
The resulting empirical survival function is a step function with at most n + 1 steps, where n is the number of trials in the training. In sum, the learning-based non-parametric choice model has three free parameters: w_+ and w_- in Eq. A16 and Eq. A17, and τ in Eq. A14.
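The updates of Eq. A16 and Eq. A17 can be sketched on a discrete grid of path lengths. The starting value S_0(x) = 1 and the weights w_+ = w_- = 0.1 are illustrative assumptions.

```python
# Delta-rule learning of the survival function on a grid of path lengths.
w_pos, w_neg = 0.1, 0.1                     # w_+ and w_- (illustrative)
xs = [0.5 * i for i in range(1, 41)]        # path lengths 0.5 .. 20.0
S = {x: 1.0 for x in xs}                    # S_0(x) = 1 for all x

def update_survival(S, outcome, point):
    """outcome='safe': survived a whole path of length `point` (Eq. A16);
    outcome='mine': ran into a mine at distance `point` (Eq. A17)."""
    for x in S:
        if x <= point:                       # survived at least up to x
            S[x] = (1.0 - w_pos) * S[x] + w_pos * 1.0
        elif outcome == 'mine':              # died before reaching x
            S[x] = (1.0 - w_neg) * S[x] + w_neg * 0.0
        # a survival leaves x > point unchanged
    return S

S = update_survival(S, 'mine', 8.0)          # trial 1: mine at distance 8
S = update_survival(S, 'safe', 12.0)         # trial 2: survived a length-12 path
# S is now a step function: 1 for x <= 8, 0.91 for 8 < x <= 12, 0.9 beyond.
```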
Learning-Based Exponential Choice Model
The survival function of the learning-based exponential choice model comes from the same learning process as described for the learning-based non-parametric choice model but is smoothed by an exponential function as follows. We assume that the survival function is an exponential approximation to the step function, in which the hazard rate λ (Eq. 1) is estimated to minimize the summed squared errors between the exponential function and the step function (i.e., λ itself is not a free parameter but depends on w_+ and w_-). The learning-based exponential choice model has three free parameters: w_+ and w_- in Eq. A16 and Eq. A17, and τ in Eq. A14.
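The smoothing step can be sketched as a one-dimensional search for the hazard rate λ that minimizes the summed squared error between e^{-λx} and the learned step function; the step function below is a toy stand-in for the delta-rule output.

```python
import math

# Toy learned step function evaluated on a grid of path lengths.
xs = [float(i) for i in range(1, 21)]
S_step = {x: (1.0 if x <= 8 else 0.91 if x <= 12 else 0.75) for x in xs}

def sse(lam):
    """Summed squared error between exp(-lam * x) and the step function,
    over the grid of path lengths."""
    return sum((math.exp(-lam * x) - S_step[x]) ** 2 for x in xs)

# Grid search: lam is not a free parameter of the model; it is determined
# by the learned step function (and hence by w_+ and w_-).
lam_fit = min((i / 1000 for i in range(1, 500)), key=sse)
```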
Resampling Choice Models
The utility function assumed in the resampling choice models is the same as that in the survival-based choice models (Eq. 2). For a specific pair of lotteries L_1 and L_2, we simulate a sampling process (as specified in the unbiased, optimistic, and pessimistic resampling models) with sample size n_s separately for each of the paths and compute the expected utility of each lottery. We assume that the participant always chooses the option with the higher expected utility; the stochasticity in choices comes from the sampling process itself. We repeat the simulation 10,000 times and calculate the probability of L_1 being chosen, P(L_1). The resampling choice models have one free parameter, the sample size n_s.
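A minimal sketch of the resampling simulation, assuming (as one concrete possibility for the unbiased variant) that each path's survival probability is estimated by the proportion of n_s simulated binary survival outcomes; the exact sampling schemes of the unbiased, optimistic, and pessimistic variants are specified in the main text.

```python
import random

random.seed(0)     # make the simulated choice probability reproducible
ALPHA = 0.7        # power-utility exponent (illustrative)

def eu_resampled(V, p_survive, n_s):
    """Expected utility of a lottery with reward V, using a survival
    probability estimated from n_s resampled binary outcomes."""
    p_hat = sum(random.random() < p_survive for _ in range(n_s)) / n_s
    return (V ** ALPHA) * p_hat

def p_choose_via_resampling(V1, p1, V2, p2, n_s, reps=10_000):
    """Probability of choosing L1: repeat the resampling simulation and count
    how often L1's resampled expected utility exceeds L2's."""
    wins = sum(eu_resampled(V1, p1, n_s) > eu_resampled(V2, p2, n_s)
               for _ in range(reps))
    return wins / reps

# Here L1 has the higher true expected utility, so it is chosen most of the
# time; the smaller n_s, the noisier (more stochastic) the choice.
p = p_choose_via_resampling(V1=10.0, p1=0.8, V2=20.0, p2=0.3, n_s=20)
```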