Behavioral Consequences of Probabilistic Precision:
Experimental Evidence from National Security Professionals
Supporting Material

September 2017

Overview

This supplement contains the following material:

§A. Respondent demographics

Supplementary analysis of Survey Experiment 1: How decision makers interpret probabilities

§B. Response measure wordings
§C. Variation in probability assessments across scenario versions
§D. T-tests for Elite Sample B
§E. Multivariate analysis of scenario data

Supplementary analysis of Survey Experiment 2: How analysts estimate probabilities

§F. Question list for Survey Experiment 2
§G. Distributions of assessments across question types
§H. Additional rescoring of probability assessments
§I. Full results for Figure 4
§J. Interactions between quantitative assessments and respondent attributes
§K. Replication of Survey Experiment 2 results using logarithmic scoring

For more information, please contact . This experiment was preregistered at Evidence in Governance and Politics and approved by Committees for the Protection of Human Subjects at Dartmouth College, Harvard University, and participating military institutions.

§A. Respondent demographics

The paper presents survey experiment results from three samples of respondents. National Security Officials took both Survey Experiment 1 and Survey Experiment 2 (in random order). Amazon Mechanical Turk (AMT) respondents were randomly assigned to take only one of the two surveys, so we report demographics for the two AMT populations separately. Elite Sample B responded to an abridged version of Survey Experiment 1 containing only the neutral hostage rescue scenario. To protect respondent anonymity, we asked fewer demographic questions of elite sample respondents than of AMT respondents.

The Berlin adaptive numeracy test (Cokely et al. 2012) divides respondents into four categories based on their answers to two or three word problems (which problems a respondent sees depends on his or her earlier answers). In previous research with nationally representative samples, the test generally yields four roughly equal-sized categories, with higher scores indicating higher numeracy. We report the percentage of respondents falling into each category below.

Table S1. Respondent demographics

Characteristic / National Security Officials (N=208) / AMT Respondents, Survey 1 (N=1,458) / AMT Respondents, Survey 2 (N=1,561) / Elite Sample B (N=199)
% Female / 15% / 52% / 52% / -
% White / 82% / 80% / 81% / -
% College degree / 100% / 61% / 61% / 100%
% U.S. citizen / 87% / 99% / 99% / 86%
% English as first language / 85% / 98% / 98% / 86%
% Active-duty military / 75% / 0.8% / 0.4% / 78%
Age, mean (SD) / - / 34.7 (11.1) / 34.6 (11.3) / -
% Numeracy-1 / 29% / 40% / 42% / -
% Numeracy-2 / 29% / 21% / 19% / -
% Numeracy-3 / 11% / 17% / 18% / -
% Numeracy-4 / 31% / 22% / 21% / -

§B. Response measure wordings

Following each vignette, respondents were presented with the following questions.

On the following scale of 1-7, please rate your level of support for [approving the hostage rescue mission immediately / approving the proposed drone strike immediately / backing Ghamay Jan]

On the following scale of 1-7, please rate your level of support for [delaying decision 1-2 weeks to gather additional information about the compound / waiting another 1-2 days to gather more information about who might be meeting in this location / waiting another 6 months to gather more information about whether backing Ghamay Jan is a better option than backing local officials]

On the following scale of 1-7, please rate your level of confidence in making these assessments

§C. Variation in probability assessments across scenario versions

As described in the main text, we presented each scenario in one of three versions. These versions – which we label “pessimistic,” “neutral,” and “optimistic,” below – involved different probability assessments intended to shape respondents’ views about supporting risky actions. As shown in the main text (Table 1), these variations systematically influenced respondents’ views. Table S2 shows how we varied probability assessments across scenario versions.

Table S2. Variations in Probability Assessments Across Scenario Versions

Assessment / Pessimistic / Neutral / Optimistic
Hostage Rescue Scenario
Hostages at compound / Even Chance (50 percent) / Likely (65 percent) / Very Likely (80 percent)
Special forces can retrieve hostages / Likely (65 percent) / Very Likely (80 percent) / Almost Certain (95 percent)
Soldiers wounded on mission / Likely (65 percent) / Even Chance (50 percent) / Unlikely (35 percent)
Collateral damage / Even Chance (50 percent) / Unlikely (35 percent) / Very Unlikely (20 percent)
Hostages killed if mission fails / Almost Certain (95 percent) / Almost Certain (95 percent) / Almost Certain (95 percent)
Drone Strike Scenario
House contains Qaeda leaders / Even Chance (50 percent) / Likely (65 percent) / Very Likely (80 percent)
Drone strike kills occupants / Very Likely (80 percent) / Very Likely (80 percent) / Very Likely (80 percent)
House contains women/children / Likely (65 percent) / Unlikely (35 percent) / Remote Chance (5 percent)
Strike compromises surveillance / Almost Certain (95 percent) / Almost Certain (95 percent) / Almost Certain (95 percent)
Local Security Forces Scenario
Jan can mobilize forces / Even Chance (50 percent) / Likely (65 percent) / Very Likely (80 percent)
Jan’s forces can secure border / Unlikely (35 percent) / Likely (65 percent) / Almost Certain (95 percent)
Jan previously assisted Taliban / Very Likely (80 percent) / Even Chance (50 percent) / Very Unlikely (20 percent)
Jan will secure illegal smuggling / Almost Certain (95 percent) / Very Likely (80 percent) / Likely (65 percent)
U.S. can retain local leaders’ support / Very Unlikely (20 percent) / Unlikely (35 percent) / Even Chance (50 percent)
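Every word-number pair in Table S2 follows the same seven-term lexicon, with values spaced 15 percentage points apart. The sketch below encodes that lexicon as it can be read off the table and checks two things: that a few rows copied from Table S2 pair each word with its lexicon value, and that each lexicon value falls inside the corresponding number-line segment used in Tables S6a and S6b. The dictionary and function names are illustrative, not taken from the study's materials.

```python
# Word -> percentage pairing as read off Table S2 (illustrative names).
VIGNETTE_LEXICON = {
    "Remote Chance": 5,
    "Very Unlikely": 20,
    "Unlikely": 35,
    "Even Chance": 50,
    "Likely": 65,
    "Very Likely": 80,
    "Almost Certain": 95,
}

# Inclusive segment bounds (percentage points) from Tables S6a/S6b;
# "Remote Chance" corresponds to the column labeled "Remote" there.
SEGMENTS = {
    "Remote Chance": (0, 14),
    "Very Unlikely": (15, 28),
    "Unlikely": (29, 42),
    "Even Chance": (43, 56),
    "Likely": (57, 71),
    "Very Likely": (72, 85),
    "Almost Certain": (86, 100),
}

# A few rows from Table S2: (pessimistic, neutral, optimistic) versions.
SAMPLE_ROWS = {
    "Hostages at compound": [("Even Chance", 50), ("Likely", 65), ("Very Likely", 80)],
    "House contains women/children": [("Likely", 65), ("Unlikely", 35), ("Remote Chance", 5)],
    "Jan previously assisted Taliban": [("Very Likely", 80), ("Even Chance", 50), ("Very Unlikely", 20)],
}


def pairs_match_lexicon(rows):
    """True if every (word, percent) pair agrees with VIGNETTE_LEXICON."""
    return all(
        VIGNETTE_LEXICON[word] == pct
        for versions in rows.values()
        for word, pct in versions
    )


def lexicon_inside_segments():
    """True if each lexicon value lies within its word's number-line segment."""
    return all(lo <= VIGNETTE_LEXICON[w] <= hi for w, (lo, hi) in SEGMENTS.items())


if __name__ == "__main__":
    print(pairs_match_lexicon(SAMPLE_ROWS), lexicon_inside_segments())
```

Only three rows are copied here; the same check extends to the full table by adding its remaining rows to `SAMPLE_ROWS`.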

§D. T-tests for Elite Sample B

Table S3 presents two-tailed t-tests comparing responses to the neutral hostage rescue scenario across treatment conditions for Elite Sample B. These results support the analyses presented in Section 3: quantifying probability assessments reduces support for risky action, increases support for gathering additional information, and has no significant impact on respondents’ confidence levels.

Table S3. Survey results from Elite Sample B

Condition / Support for hostage rescue (1-7 scale) / Support for delaying decision (1-7 scale) / Confidence in assessment (1-7 scale)
Qualitative assessments / 5.33 (1.56) / 3.14 (1.97) / 5.18 (1.19)
Quantitative assessments / 4.53 (1.86) / 4.11 (2.07) / 5.13 (1.26)
Difference in means / p=0.001 / p=0.001 / p=0.793
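The comparisons in Table S3 can be approximately recovered from the reported means and standard deviations alone. The sketch below runs a two-tailed t-test from summary statistics, computing the p-value by numerically integrating the Student t density. Two assumptions are loudly flagged: it uses Welch's unequal-variance test (the section does not say whether variances were pooled), and it assumes a hypothetical split of Elite Sample B's 199 respondents into 100 qualitative and 99 quantitative respondents, since the actual group sizes are not reported here.

```python
import math

# HYPOTHETICAL group sizes: the real split of the 199 respondents is not reported.
N_QUAL, N_QUANT = 100, 99

# (qualitative mean, SD, quantitative mean, SD) copied from Table S3.
TABLE_S3 = {
    "Support for hostage rescue": (5.33, 1.56, 4.53, 1.86),
    "Support for delaying decision": (3.14, 1.97, 4.11, 2.07),
    "Confidence in assessment": (5.18, 1.19, 5.13, 1.26),
}


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """1 minus the t density's mass on [-|t|, |t|], via Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def welch_test(m1, s1, n1, m2, s2, n2):
    """Welch's t-test from summary statistics; returns (t, Welch-Satterthwaite df, p)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, two_tailed_p(t, df)


RESULTS = {
    label: welch_test(m1, s1, N_QUAL, m2, s2, N_QUANT)
    for label, (m1, s1, m2, s2) in TABLE_S3.items()
}

if __name__ == "__main__":
    for label, (t, df, p) in RESULTS.items():
        print(f"{label}: t = {t:.2f}, df = {df:.0f}, p = {p:.3f}")
```

Under the assumed split, the first two comparisons fall well below p = 0.01 and the confidence comparison is far from significant, which is the pattern in Table S3. The exact p-values differ somewhat from the table because the true group sizes and variance assumption are unknown.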

§E. Multivariate analysis of scenario data

The paper’s main text analyzed responses to Survey Experiment 1 using bivariate regressions. Tables S4a and S4b replicate those results in multivariate form. All models are ordinary least squares regressions with fixed effects for respondent and standard errors clustered by respondent. Estimating these models with ordered probit or ordered logit returns similar results.


Table S4a. Responses to Scenarios – National Security Officials

Model 1:
Predicting support for risky action / Model 2:
Predicting support for risky action, with interaction terms / Model 3:
Predicting support for delaying action
Quantitative assessment / -0.142 (.14) / -0.346 (.25) / 0.316 (.15)*
Optimistic scenario / 0.783 (.17)*** / 0.677 (.23)** / -0.656 (.18)***
Pessimistic scenario / -0.982 (.16)*** / -1.146 (.22)*** / 0.433 (.16)**
Hostage scenario / 1.148 (.16)*** / 1.152 (.16)*** / -0.227 (.18)
Drone scenario / -0.617 (.14)*** / -0.619 (.14)*** / 1.120 (.16)***
Numeracy / -0.157 (.06)** / -0.155 (.06)** / 0.102 (.06)
Female / -0.257 (.21) / -0.261 (.21) / 0.129 (.22)
Military officer / 0.185 (.17) / 0.181 (.17) / -0.187 (.16)
U.S. citizen / 0.096 (.35) / 0.078 (.34) / -0.132 (.36)
English as native lang. / 0.143 (.34) / 0.145 (.34) / -0.102 (.37)
Optimistic scenario x Quantitative assessment / - / 0.232 (.33) / -
Pessimistic scenario x Quantitative assessment / - / 0.364 (.31) / -
Constant / 3.739 (.29)*** / 3.841 (.31)*** / 4.575 (.30)***
R² / 0.298 / 0.300 / 0.165

Ordinary least squares regressions predicting 7-point response measures with fixed effects for respondent and standard errors clustered by respondent. * p<0.05 ** p<0.01 *** p<0.001. All models have 624 observations.

Table S4b. Responses to Scenarios – AMT Respondents

Variable / Model 1: Predicting support for risky action / Model 2: Predicting support for risky action, with interaction terms / Model 3: Predicting support for delaying action
Quantitative assessment / -0.429 (.05)*** / -0.759 (.09)*** / 0.378 (.06)***
Optimistic scenario / 0.791 (.06)*** / 0.479 (.08)*** / -0.518 (.07)***
Pessimistic scenario / -0.813 (.06)*** / -0.972 (.09)*** / 0.222 (.07)***
Hostage scenario / 1.501 (.06)*** / 1.505 (.06)*** / -0.327 (.06)***
Drone scenario / 0.329 (.06)*** / 0.327 (.06)*** / 0.450 (.06)***
Numeracy / -0.117 (.02)*** / -0.116 (.02)*** / 0.056 (.03)
Female / 0.002 (.05) / -0.006 (.05) / 0.291 (.06)***
Military officer / 0.373 (.25) / 0.343 (.24) / -0.566 (.41)
U.S. citizen / 0.076 (.27) / 0.067 (.27) / -0.325 (.26)
English as native lang. / -0.250 (.24) / -0.242 (.24) / -0.358 (.25)
Education / 0.069 (.04) / 0.070 (.04) / 0.007 (.05)
Optimistic scenario x Quantitative assessment / - / 0.650 (.12)*** / -
Pessimistic scenario x Quantitative assessment / - / 0.325 (.12)** / -
Constant / 3.650 (.35)*** / 3.812 (.35)*** / 5.256 (.30)***
R² / 0.260 / 0.265 / 0.077

Ordinary least squares regressions predicting 7-point response measures with fixed effects for respondent and standard errors clustered by respondent. * p<0.05 ** p<0.01 *** p<0.001. All models have 4,386 observations.
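In Model 2 of Tables S4a and S4b, the optimistic and pessimistic versions enter as indicators, so the neutral version is the reference category. The coefficient on quantitative assessment is therefore the estimated effect of quantification in neutral scenarios, and adding an interaction coefficient gives the corresponding point estimate for optimistic or pessimistic scenarios. The sketch below does that arithmetic with coefficients copied from the tables. It yields point estimates only: standard errors for these sums would require the coefficient covariance matrix, which the tables do not report. The dictionary keys are illustrative names.

```python
# Model 2 coefficients from Tables S4a and S4b (neutral version = reference category).
MODEL_2 = {
    "National Security Officials": {"quant": -0.346, "x_optimistic": 0.232, "x_pessimistic": 0.364},
    "AMT Respondents": {"quant": -0.759, "x_optimistic": 0.650, "x_pessimistic": 0.325},
}


def quantification_effect(coefs, version):
    """Point estimate of the effect of quantitative assessment on support for risky action."""
    effect = coefs["quant"]
    if version == "optimistic":
        effect += coefs["x_optimistic"]
    elif version == "pessimistic":
        effect += coefs["x_pessimistic"]
    elif version != "neutral":
        raise ValueError(f"unknown scenario version: {version}")
    return round(effect, 3)


if __name__ == "__main__":
    for sample, coefs in MODEL_2.items():
        effects = {v: quantification_effect(coefs, v) for v in ("pessimistic", "neutral", "optimistic")}
        print(sample, effects)
```

For AMT respondents, for example, the implied point estimate is -0.759 in neutral scenarios, -0.434 in pessimistic scenarios, and -0.109 in optimistic scenarios.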

§F. Question list for Survey Experiment 2

Table S5 presents the question list for Survey Experiment 2. Thirty of these questions had known answers as of the dates the surveys were administered (August 5-7, 2015). Five questions, labeled “unknown,” had answers that were unknowable at the time. Five questions, labeled “forecast,” involved predictions that we evaluated on February 6, 2016.

As described in the paper, we varied question types in this way so that we could assess whether any biases we identified were confined to a particular kind of estimate. As shown in §G of this supplement, respondents who provided numeric estimates expressed noticeably greater certitude than respondents who used words of estimative probability across all three question types.

Table S5. Question List for Survey Experiment 2
All assessments recorded between August 5-7, 2015; outcomes are coded relative to that date.

Question / Question Text / Outcome (1 = true, 0 = false)
Q1 / In your opinion, what are the chances that Afghanistan's literacy rate is currently above 50 percent? / 0
Q2 / In your opinion, what are the chances that Saudi Arabia currently exports more oil than all other countries in the world combined? / 0
Q3 / In your opinion, what are the chances that the United States currently has a longer life expectancy than Jamaica? / 1
Q4 / In your opinion, what are the chances that the United States currently operates a military base in Ethiopia? / 0
Q5 / In your opinion, what are the chances that the United States has an active territorial claim in Antarctica? / 0
Q6 / In your opinion, what are the chances that France currently has more soldiers stationed in Afghanistan than any NATO member besides the United States? / 0
Q7 / In your opinion, what are the chances that more than 20 countries currently operate nuclear power plants? / 1
Q8 / In your opinion, what are the chances that Japan is currently a member of the International Whaling Commission? / 1
Q9 / In your opinion, what are the chances that Russia is a member of the Nuclear Nonproliferation Treaty? / 1
Q10 / In your opinion, what are the chances that the United States currently has free trade agreements in place with fewer than 30 countries? / 1
Q11 / In your opinion, what are the chances that fewer than 80 countries currently recognize Taiwan's independence from China? / 1
Q12 / In your opinion, what are the chances that ISIS draws more foreign fighters from Egypt than from any other country outside of Iraq and Syria? / 0
Q13 / In your opinion, what are the chances that Russia's economy grew in 2014? / 0
Q14 / In your opinion, what are the chances that Haiti has the lowest per capita income of any Latin American country? / 1
Q15 / In your opinion, what are the chances that there are currently more Muslims in the world than there are Roman Catholics? / 1
Q16 / In your opinion, what are the chances that Sweden is a member of NATO? / 0
Q17 / In your opinion, what are the chances that Tokyo's stock exchange is the second largest stock exchange in the world? / 1
Q18 / In your opinion, what are the chances that the U.S. State Department currently lists Iran as a state sponsor of terrorism? / 1
Q19 / In your opinion, what are the chances that the Arabic media organization al-Jazeera currently operates bureaus in more countries than does CNN? / 1
Q20 / In your opinion, what are the chances that the United States currently possesses more than 2,000 nuclear warheads? / 1
Q21 / In your opinion, what are the chances that the economy of North Korea is larger than the economy of New Hampshire? / 0
Q22 / In your opinion, what are the chances that German President Angela Merkel is currently the longest-serving head of government in Western Europe? / 1
Q23 / In your opinion, what are the chances that there are currently more refugees living in Lebanon than in any other country in the world? / 0
Q24 / In your opinion, what are the chances that the United States currently conducts more trade with Mexico than with the European Union? / 1
Q25 / In your opinion, what are the chances that the largest U.S. Embassy is currently located in Beijing? / 0
Q26 / In your opinion, what are the chances that the U.S. defense budget is more than five times as large as China's defense budget? / 0
Q27 / In your opinion, what are the chances that the United States currently operates more aircraft carriers than all other countries in the world combined? / 0
Q28 / In your opinion, what are the chances that Israel receives more foreign aid than any other country in the world? / 0
Q29 / In your opinion, what are the chances that more than 3 million people live within the borders of the Palestinian territories of the West Bank and Gaza? / 1
Q30 / In your opinion, what are the chances that more than 5,000 people died as a result of the Ebola outbreak in West Africa in 2014? / 1
Q31 / In your opinion, what are the chances that Ukrainian rebels knew that Malaysia Airlines Flight 17 was a civilian aircraft before they shot it down? / Unknown
Q32 / In your opinion, what are the chances that Iran's Supreme Leader, Ayatollah Khamenei, currently intends to develop a nuclear weapon? / Unknown
Q33 / In your opinion, what are the chances that the United States would have invaded Iraq if the September 11 terrorist attacks had not occurred? / Unknown
Q34 / In your opinion, what are the chances that without waterboarding captured terrorists, there would have been at least one more major terrorist attack (>1,000 casualties) on U.S. soil since 2001? / Unknown
Q35 / In your opinion, what are the chances that high-ranking members of Pakistan's intelligence services knew that Osama bin Laden was hiding in Abbottabad? / Unknown
Q36 / In your opinion, what are the chances that within the next six months, Syrian President Bashar al-Assad will be killed or no longer living in Syria? / Forecast: 0
Q37 / In your opinion, what are the chances that within the next six months, the Iraqi Security Forces will reclaim control of either Ramadi or Mosul (or both) from ISIS? / Forecast: 1
Q38 / In your opinion, what are the chances that there will be a new Pope within the next six months? / Forecast: 0
Q39 / In your opinion, what are the chances that more than 50,000 U.S. citizens will travel to Cuba within the next six months? / Forecast: 1
Q40 / In your opinion, what are the chances that more than 10 U.S. soldiers will be killed fighting ISIS within the next six months? / Forecast: 0

§G. Distributions of assessments across question types

Table 2 in the paper’s main text demonstrated that respondents made probability assessments with noticeably greater certitude when using quantitative as opposed to qualitative expressions. Those data included only questions with knowable answers (i.e., the questions for which we scored respondents’ performance).

Tables S6a and S6b show that the same pattern holds when we examine probability assessments that respondents made in response to forecasts or to questions with answers that were unknowable at the time. Statistical significance is estimated using two-tailed t-tests.

Table S6a. Percentage of estimates registered in each segment of the number line (National Security Officials)

Assessment type / Remote (0.00-0.14) / Very Unlikely (0.15-0.28) / Unlikely (0.29-0.42) / Even Chance (0.43-0.56) / Likely (0.57-0.71) / Very Likely (0.72-0.85) / Almost Certain (0.86-1.00)
Questions with knowable answers
Qualitative assessments / 4.51 / 7.10 / 16.67 / 12.87 / 25.28 / 20.34 / 13.24
Quantitative assessments / 15.73 / 8.73 / 8.60 / 15.60 / 8.87 / 16.33 / 26.13
p-value / p=0.00 / p=0.02 / p=0.00 / p=0.00 / p=0.00 / p=0.00 / p=0.00
Questions about unknown states of the world
Qualitative assessments / 7.59 / 11.67 / 19.26 / 14.44 / 19.63 / 17.96 / 9.44
Quantitative assessments / 20.20 / 11.40 / 8.00 / 13.80 / 9.00 / 14.20 / 23.40
p-value / p=0.00 / p=0.89 / p=0.00 / p=0.77 / p=0.10 / p=0.00 / p=0.00
Forecasts
Qualitative assessments / 12.04 / 17.41 / 26.67 / 15.56 / 16.30 / 8.89 / 3.15
Quantitative assessments / 32.40 / 16.00 / 11.60 / 15.60 / 6.20 / 9.40 / 8.80
p-value / p=0.00 / p=0.54 / p=0.00 / p=0.98 / p=0.78 / p=0.00 / p=0.00

Table S6b. Percentage of estimates registered in each segment of the number line (AMT Respondents)

Assessment type / Remote (0.00-0.14) / Very Unlikely (0.15-0.28) / Unlikely (0.29-0.42) / Even Chance (0.43-0.56) / Likely (0.57-0.71) / Very Likely (0.72-0.85) / Almost Certain (0.86-1.00)
Questions with knowable answers
Qualitative assessments / 1.65 / 4.94 / 15.66 / 19.75 / 29.13 / 19.30 / 9.57
Quantitative assessments / 10.90 / 9.98 / 10.64 / 18.12 / 13.66 / 16.12 / 20.58
p-value / p=0.00 / p=0.00 / p=0.00 / p=0.00 / p=0.00 / p=0.00 / p=0.00
Questions about unknown states of the world
Qualitative assessments / 4.61 / 8.82 / 15.34 / 18.73 / 24.89 / 18.70 / 8.92
Quantitative assessments / 18.49 / 10.96 / 9.49 / 14.78 / 11.35 / 14.21 / 20.71
p-value / p=0.00 / p=0.00 / p=0.00 / p=0.00 / p=0.00 / p=0.00 / p=0.00
Forecasts
Qualitative assessments / 6.73 / 13.81 / 24.84 / 21.48 / 16.97 / 10.19 / 5.99
Quantitative assessments / 29.24 / 14.78 / 11.45 / 16.35 / 8.30 / 8.25 / 11.63
p-value / p=0.00 / p=0.22 / p=0.00 / p=0.00 / p=0.00 / p=0.00 / p=0.00
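For quantitative assessments, the distributions in Tables S6a and S6b require binning each numeric estimate into one of the seven segments, whereas qualitative assessments are simply tallied by the word chosen. A minimal sketch of the binning step follows, assuming estimates were recorded as whole percentages from 0 to 100 (so the boundaries 0.14/0.15, 0.28/0.29, and so on leave no gaps). The function names are hypothetical, and the example estimates are invented, not study data.

```python
from bisect import bisect_left
from collections import Counter

LABELS = ["Remote", "Very Unlikely", "Unlikely", "Even Chance",
          "Likely", "Very Likely", "Almost Certain"]
# Inclusive upper bound of each segment in whole percentage points (0.14 -> 14, ...).
UPPER_BOUNDS = [14, 28, 42, 56, 71, 85, 100]


def segment(percent):
    """Map a numeric estimate (0-100) to the segment label used in Tables S6a/S6b."""
    if not 0 <= percent <= 100:
        raise ValueError(f"estimate out of range: {percent}")
    return LABELS[bisect_left(UPPER_BOUNDS, percent)]


def segment_shares(estimates):
    """Percentage of numeric estimates falling in each segment."""
    counts = Counter(segment(p) for p in estimates)
    return {label: 100 * counts[label] / len(estimates) for label in LABELS}


if __name__ == "__main__":
    # Hypothetical estimates, not study data.
    print(segment_shares([0, 10, 15, 50, 70, 72, 99, 100]))
```

`bisect_left` returns the first segment whose upper bound is at least the estimate, so each boundary value (14, 28, 42, ...) lands in the lower segment, matching the inclusive ranges in the table headers.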

§H. Additional rescoring of probability assessments

In the paper’s main text, we develop a method for scoring the accuracy of qualitative and quantitative probability assessments in equivalent terms. First, we assign each quantitative estimate to the segment of the number line corresponding to a “word of estimative probability.” Then, for every question in the data set, we calculate the mean of the numeric estimates falling within each segment. For the purposes of this analysis, we call these means interpolated probabilities. Each qualitative assessment receives the interpolated probability for its word on that question, and we replace every numeric estimate with the interpolated probability for its segment as well. Otherwise, numeric estimates would have greater potential variance, and we would not be scoring qualitative and quantitative assessments on a level playing field.
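The interpolation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: estimates are whole percentages, data are held as hypothetical (question, estimate) pairs, and the function names are invented. Qualitative words use the same labels as the Table S6 columns. How the study handled a word whose segment contained no numeric estimates for a given question is not stated in this section; the sketch simply returns `None` in that case.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean

LABELS = ["Remote", "Very Unlikely", "Unlikely", "Even Chance",
          "Likely", "Very Likely", "Almost Certain"]
UPPER_BOUNDS = [14, 28, 42, 56, 71, 85, 100]  # inclusive, in percentage points


def segment(percent):
    """Segment label for a whole-percentage estimate (0-100)."""
    return LABELS[bisect_left(UPPER_BOUNDS, percent)]


def interpolation_table(numeric):
    """Mean numeric estimate (as a 0-1 probability) for each (question, segment) pair."""
    groups = defaultdict(list)
    for question, percent in numeric:
        groups[(question, segment(percent))].append(percent / 100)
    return {key: mean(values) for key, values in groups.items()}


def rescore(numeric, verbal, table):
    """Replace numeric and verbal estimates with interpolated probabilities.

    A word with no numeric estimates in its segment for that question maps to None.
    """
    numeric_probs = [table[(q, segment(p))] for q, p in numeric]
    verbal_probs = [table.get((q, word)) for q, word in verbal]
    return numeric_probs, verbal_probs


if __name__ == "__main__":
    numeric = [("Q1", 60), ("Q1", 70), ("Q1", 90), ("Q2", 10)]  # hypothetical
    verbal = [("Q1", "Likely"), ("Q2", "Unlikely")]  # hypothetical
    print(rescore(numeric, verbal, interpolation_table(numeric)))
```

In this toy example, the numeric estimates 60 and 70 on Q1 both fall in the "Likely" segment, so both they and a qualitative "Likely" on Q1 are scored as 0.65.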

In the main text of the paper, we present results of scoring interpolated probabilities using Brier scores. But this is only one method of scoring probability estimates. In Table S7, we show that our results are robust to the logarithmic scoring rule and to different methods of interpolation. The columns of Table S7 represent the results of scoring estimates according to:

  • Interpolated probabilities, Brier scoring
  • Interpolated probabilities, Logarithmic scoring
  • Words/numbers rounded to the midpoint of each segment, Brier scoring
  • Qualitative assessments interpreted according to “words of estimative probability” definitions from Mosteller and Youtz (1990); quantitative estimates rounded to the nearest such definition
  • Probabilities interpolated according to the segments defined in the 2015 Director of National Intelligence guidelines for “words of estimative probability” (see Figure 1 in the main text), instead of the lexicon we based on National Intelligence Estimates[1]

For each approach, and for both the National Security Officials and AMT samples, we present mean respondent scores for quantitative assessments, mean respondent scores for qualitative assessments, the proportional difference between these means, and the statistical significance of this difference according to a two-tailed t-test. Rather than treating each assessment as an independent observation, we average each respondent’s scores, so each test uses one observation per respondent. Note that lower scores indicate better assessments under Brier scoring, whereas higher scores indicate better assessments under logarithmic scoring.
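The two scoring rules, and the averaging down to one observation per respondent, can be sketched as follows. This sketch assumes the single-event form of the Brier score, (p - o)^2, and natural logarithms for the logarithmic rule; the section does not state either convention, so both are assumptions, as is the clipping constant that guards against log(0). The record layout (respondent, probability, outcome) and the example data are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def brier(p, outcome):
    """Squared error between probability p (0-1) and a binary outcome (0 or 1); lower is better."""
    return (p - outcome) ** 2


def log_score(p, outcome, eps=1e-6):
    """Natural log of the probability assigned to what occurred; higher (closer to 0) is better.

    Clipping p to [eps, 1 - eps] is an assumption added here to avoid log(0).
    """
    p = min(max(p, eps), 1 - eps)
    return math.log(p if outcome == 1 else 1 - p)


def respondent_means(records, rule):
    """Average a scoring rule over each respondent's assessments: one value per respondent."""
    scores = defaultdict(list)
    for respondent, p, outcome in records:
        scores[respondent].append(rule(p, outcome))
    return {respondent: mean(values) for respondent, values in scores.items()}


if __name__ == "__main__":
    records = [("r1", 0.9, 1), ("r1", 0.2, 0), ("r2", 0.5, 1)]  # hypothetical
    print(respondent_means(records, brier))
    print(respondent_means(records, log_score))
```

The per-respondent means from the qualitative and quantitative groups would then be compared with a two-sample t-test, as in the §D comparison.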

This analysis shows that, regardless of interpolation method, scoring rule, or sample, quantitative assessments are less accurate than qualitative assessments.


Table S7. Evaluating qualitative/quantitative assessments using different operationalizations

Sample / Interp. Probabilities, Brier Scoring (main result) / Interp. Probabilities, Logarithmic Scoring / Probs. Rounded to WEP Midpoints, Brier Scoring / Probs. Rounded to Nearest WEP Definition via Mosteller/Youtz / Interp. Probs. Using 2015 DNI Guidelines, Brier Scoring
National Security Officials (208 respondents)
Qualitative assessments / 0.230 (.03) / -0.679 (.10) / 0.231 (.03) / 0.243 (.04) / 0.243 (.04)
Quantitative assessments / 0.265 (.04) / -0.820 (.15) / 0.261 (.04) / 0.259 (.04) / 0.264 (.04)
Proportional difference / 13% / 17% / 11% / 6% / 8%
p-value / p<0.001 / p<0.001 / p<0.001 / p=0.006 / p<0.001
AMT Respondents (1,561 respondents)
Qualitative assessments / 0.276 (.04) / -0.790 (.11) / 0.273 (.04) / 0.296 (.05) / 0.293 (.05)
Quantitative assessments / 0.310 (.06) / -0.934 (.19) / 0.305 (.05) / 0.307 (.05) / 0.309 (.06)
Proportional difference / 11% / 15% / 10% / 4% / 5%
p-value / p<0.001 / p<0.001 / p<0.001 / p<0.001 / p<0.001

Table S7 presents mean scores (with standard deviations in parentheses) for qualitative and quantitative assessments, evaluated according to different scoring rules and interpolation methods. Statistical significance reflects two-tailed t-tests for differences in means.
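Table S7 does not spell out how the “proportional difference” row is computed. One reading that reproduces all ten reported figures is the absolute gap between the two means divided by the absolute value of the quantitative mean. The sketch below checks that reading against the table; the denominator is an inference from the reported numbers, not a stated definition, and the names are illustrative.

```python
# Mean scores from Table S7, in column order: interpolated/Brier, interpolated/log,
# midpoint/Brier, Mosteller-Youtz, 2015 DNI/Brier.
TABLE_S7 = {
    "National Security Officials": {
        "qualitative": [0.230, -0.679, 0.231, 0.243, 0.243],
        "quantitative": [0.265, -0.820, 0.261, 0.259, 0.264],
        "reported": [13, 17, 11, 6, 8],
    },
    "AMT Respondents": {
        "qualitative": [0.276, -0.790, 0.273, 0.296, 0.293],
        "quantitative": [0.310, -0.934, 0.305, 0.307, 0.309],
        "reported": [11, 15, 10, 4, 5],
    },
}


def proportional_difference(qual, quant):
    """Gap between means as a whole-number percentage of the quantitative mean (inferred denominator)."""
    return round(100 * abs(quant - qual) / abs(quant))


def check(table):
    """For each sample, True if the inferred formula reproduces the reported row."""
    return {
        sample: [
            proportional_difference(q, n)
            for q, n in zip(row["qualitative"], row["quantitative"])
        ] == row["reported"]
        for sample, row in table.items()
    }


if __name__ == "__main__":
    print(check(TABLE_S7))
```

Using the qualitative mean as the denominator instead would give 15% rather than 13% for the main National Security Officials result, which is why the quantitative mean is the more plausible reading.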