Supplemental Materials
Questioning the End Effect: Endings Are Not Inherently Over-Weighted in Retrospective Evaluations of Experiences
by S. Tully & T. Meyvis, 2016, JEP: General
Web Appendix
Contents
Sample size information.
Pretest for positive sound stimuli (Study 3).
Supplemental Studies.
Supplemental Study 1: A Better Average versus a Better End
Supplemental Study 2: Single Versus Repeated Negative Experiences
Supplemental Study 3: Single versus Repeated Positive Experiences
Sample size information.
Study 1. An examination of past research using similar stimuli (annoying sounds that increased or decreased in intensity; i.e., Ariely and Zauberman, 2000; Schreiber and Kahneman, 2000) found that samples typically ranged from 20 to 54 participants. Where effect sizes could be calculated, changes in intensity produced effects with ηp² between .46 and .80. Although these are large effects, we conservatively selected a sample of 300 participants (150 per cell) to ensure 99% power to detect a medium effect size (not counting the additional power gained from the covariate). We posted the study online and received 303 responses.
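As a rough check on the stated power, the sketch below uses a normal approximation for a two-sided, two-group comparison at α = .05. It assumes that "medium effect size" means Cohen's d = 0.5 (the text does not name the metric) and ignores the covariate; the function names are illustrative. Because the normal approximation slightly overstates the power of an exact t test, the result should be read as approximate.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_sample_power(d: float, n_per_cell: int, z_crit: float = 1.959964) -> float:
    """Approximate power of a two-sided, two-group comparison at alpha = .05.

    Normal approximation: with equal cells the noncentrality is
    d * sqrt(n_per_cell / 2). The negligible chance of rejecting in the
    wrong direction is ignored, and no covariate is modeled.
    """
    ncp = d * math.sqrt(n_per_cell / 2.0)
    return normal_cdf(ncp - z_crit)


# Assumption: "medium effect size" taken as Cohen's d = 0.5.
print(round(two_sample_power(0.5, 150), 3))  # about 0.991
```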
Study 2. Participants were undergraduate students who participated for course credit. The study was run over the course of two full semesters, utilizing all participants available to the researchers during that time, resulting in 349 student participants.
Study 3. The pretest of the song clips (as well as the researchers’ prior experience in other projects) indicated substantially greater variability in the enjoyment of music than in irritation with noise. We therefore doubled the per-cell sample size used in Study 1, for a total of 900 participants (300 per cell). We posted the study online and received 912 responses.
Study 4. In this study, we aimed to test the interaction of sound number (within-subjects) and sound order (between-subjects). Assuming a moderate correlation (r = .4) between ratings of the two stimuli in the repeated-measures design, and adjusting for the additional power expected from the covariate (as seen in previous studies), we estimated that 200 participants would provide 90% power to detect a small interaction effect. We posted the study online and received 204 responses.
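The role of the assumed correlation can be illustrated with a minimal sketch. In a 2 (between) × 2 (within) design, the interaction test is equivalent to comparing the two groups' within-person difference scores, whose standard deviation shrinks to sqrt(2(1 − r)) rating SDs as r grows. The interaction size `delta` below is a hypothetical input in rating-SD units, not the "small effect" the authors specified, and the covariate adjustment is not modeled, so this sketch does not attempt to reproduce the reported 90% figure.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def interaction_power(delta: float, r: float, n_per_group: int,
                      z_crit: float = 1.959964) -> float:
    """Approximate power for a 2 (between) x 2 (within) interaction at alpha = .05.

    delta: hypothetical interaction contrast, (A2 - A1) - (B2 - B1), in units of
           the rating SD.
    r:     correlation between a participant's two ratings.
    The interaction equals a two-group comparison of difference scores, whose
    SD is sqrt(2 * (1 - r)) rating SDs. Normal approximation; no covariate.
    """
    sd_diff = math.sqrt(2.0 * (1.0 - r))
    d_diff = delta / sd_diff  # standardized effect on the difference scores
    ncp = d_diff * math.sqrt(n_per_group / 2.0)
    return normal_cdf(ncp - z_crit)


# 200 participants split into two order groups of 100 (hypothetical delta = 0.4).
print(round(interaction_power(0.4, 0.4, 100), 2))  # about 0.73 with r = .4
print(round(interaction_power(0.4, 0.0, 100), 2))  # about 0.52 with r = 0
```

Holding the interaction fixed, a higher correlation between the two ratings raises power, which is why the assumed r = .4 matters for the sample-size decision.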
Study 5. Participants were undergraduate students who participated for course credit. The study was run for a full semester, utilizing all participants available to the researchers, resulting in 303 student participants.
Study 6. This study was run in collaboration with the obstacle course racing company. The company emailed all race participants (approximately 7,000) with a request to complete the survey (without compensation). The response rate was slightly over 10%, yielding a total of 750 participants.
Study 7. Participants were undergraduate students who participated for course credit. The study was run for a full semester, utilizing all participants available to the researchers, resulting in 238 student participants.
Pretest for positive sound stimuli (Study 3).
Study 3 used three music clips: one consisting of four enjoyable pieces of instrumental music (used in the control condition) and two consisting of those same four pieces plus one additional, less enjoyable piece of instrumental music (inserted either in the middle or at the end). To select these music fragments, we first pretested a wide range of instrumental music fragments with a sample of 121 participants drawn from the same population as the main study (Mechanical Turk). Each participant listened to a selection of ten 30-second clips of instrumental music (out of a total set of 30 clips) and rated each clip on a 9-point scale. Based on this pretest, we selected four clips that were enjoyed by most participants, namely 30-second fragments from “Herd Reunion” (from the Ice Age: Continental Drift soundtrack, M = 6.84, SD = 1.91), “Heart Song” (performed by Gosha Mataradze, M = 6.29, SD = 2.09), Bach’s “Goldberg Variations” (M = 6.38, SD = 1.55), and Mozart’s “Rondo Alla Turca” (M = 6.05, SD = 2.03). We also selected one music clip that was significantly less enjoyable than each of the four other clips: “Reanimator” (performed by Amon Tobin, M = 4.71, SD = 2.12), all t’s(79) > 2.92, p’s < .002. To further ensure that this last clip was less enjoyable than the others, we increased its repetitiveness by extending it to 45 seconds and applied a minor change in pitch.
Supplemental Studies.
Supplemental Study 1: A Better Average versus a Better End
(Conceptual Replication of Study 2)
This study is a conceptual replication of Study 2 using a different set of aversive sound profiles. As in Study 2, the goal was to examine whether the positive effect of extending an aversive experience with a less aversive (but still negative) ending occurs because it improves the end of the experience or because it changes the range or average intensity of the experience. In this study, we exposed participants to one of three sound clips of an irritating noise: (1) a clip with a less intense (and thus better) middle section (Better Middle), (2) a clip with a less intense ending (Better End), or (3) a clip with a less intense middle section and an additional less intense ending (Added End). The Better Middle and Better End clips had approximately the same average intensity but differed in the timing of the softer section. The Added End clip consisted of the Better Middle clip with an additional, less intense extension of the noise.
Thus, the Better Middle and Better End clips differed in the aversiveness of the ending, but not in the average intensity of the experience, whereas the Added End clip differed from both other clips in the average intensity of the experience. If endings are over-weighted in evaluations, then the Better End and Added End experiences should both be perceived as less aversive than the Better Middle experience. However, if adding a less aversive ending improves evaluations because it reduces the average intensity (or range of intensity), then the Added End experience should be perceived as less aversive than both the Better Middle and Better End experiences, and there should be no difference in perceived aversiveness between the Better Middle and Better End conditions.
Method
Two hundred and sixty undergraduate students participated in the study for either partial course credit or monetary compensation.
Participants were seated at a desktop computer and asked to wear headphones; headphone volume was fixed and approximately equal across computers. All participants first listened to a short drill sound and rated their irritation with the sound on a 101-point sliding scale (0 = not at all irritating, 100 = very irritating). As in Study 1, this measure was included as a covariate to reduce error variance due to individual differences in aversion to annoying sounds. Participants then completed a short filler task before continuing with the main study.
Participants were then asked to listen to the sound of a vacuum cleaner. They listened to one of three sound profiles, depending on condition. All three sound profiles consisted of a vacuum noise that fluctuated in intensity. The first 50 seconds of all of the clips were identical and oscillated in intensity (relatively high to relatively low to relatively high). Then the intensity differed by condition. In the Better End condition, it remained at a relatively high intensity until it tapered off to a lower intensity where it remained for the last 30 seconds. In the Better Middle condition, the 30-second low-intensity segment followed the initial oscillation (and tapering period) before increasing to a higher intensity for the remainder of the clip. In the Added End condition, the Better Middle clip was extended by an additional 30-second low-intensity segment which followed a short tapering period. Thus, the sound clips in the Better Middle and Better End conditions differed in ending, but not in approximate average intensity[1], whereas the sound clip in the Added End condition differed in average intensity from the clips in both other conditions. See Supplemental Figure 1 for a visual depiction of the sound profiles.
Supplemental Figure 1. Visual depiction of sound profiles used in Supplemental Study 1. The height of the waveform represents the momentary intensity of the sound as a percentage of the highest intensity in the sound clip. Time is represented on the horizontal axis in seconds.
[Figure panels: Better End, Better Middle, Added End]
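To make the design logic concrete, the sketch below builds stylized versions of the three profiles described above, at one value per second, and compares two hypothetical evaluation rules: the plain average intensity and an end-weighted rule that gives half its weight to the last 10 seconds. The intensity levels (1.0 and 0.4), the lengths of the oscillation segments, the 40-second high-intensity stretch, and the rule weights are all illustrative assumptions; the tapering periods are omitted, so Better End and Better Middle have exactly equal averages here, unlike the real clips.

```python
from statistics import mean

# Hypothetical relative intensities (1 value per second); not the real waveforms.
HIGH, LOW = 1.0, 0.4
# Shared first 50 s: high -> low -> high oscillation (segment lengths assumed).
OSCILLATION = [HIGH] * 15 + [LOW] * 20 + [HIGH] * 15
HIGH_SEGMENT = [HIGH] * 40  # assumed length of the high-intensity stretch
LOW_SEGMENT = [LOW] * 30    # 30-s low-intensity segment, as in the text

profiles = {
    "Better End": OSCILLATION + HIGH_SEGMENT + LOW_SEGMENT,
    "Better Middle": OSCILLATION + LOW_SEGMENT + HIGH_SEGMENT,
}
# Added End = Better Middle extended by another low-intensity segment.
profiles["Added End"] = profiles["Better Middle"] + LOW_SEGMENT


def average_rule(profile):
    """Evaluation equal to the mean intensity over the whole clip."""
    return mean(profile)


def end_weighted_rule(profile, end_seconds=10, weight=0.5):
    """Hypothetical rule mixing the overall mean with the mean of the last seconds."""
    return (1 - weight) * mean(profile) + weight * mean(profile[-end_seconds:])


for name, profile in profiles.items():
    print(f"{name:13s} {len(profile):3d} s  "
          f"average={average_rule(profile):.3f}  "
          f"end-weighted={end_weighted_rule(profile):.3f}")
```

Under the average rule, Better End and Better Middle tie and Added End scores lowest; under the end-weighted rule, Better End and Added End both score below Better Middle. These are the two patterns contrasted in the predictions for this study.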
After listening to the clip, participants rated how annoying (9-point scale: 1 = mildly annoying, 9 = extremely annoying), unpleasant (9-point scale: 1 = mildly unpleasant, 9 = very unpleasant), and irritating (measured on the same scale as the covariate: a 101-point slider scale anchored at 0 = mildly irritating and 100 = extremely irritating) they found the experience of listening to the sound.
After the primary dependent measures were collected, participants listened again to the drill sound from the start of the study and then indicated whether this experience was more or less irritating than listening to the vacuum sound (9-point scale: 1 = much less irritating, 9 = much more irritating). Participants then rated the volume of the vacuum sound (9-point scale: 1 = very quiet, 9 = very loud). Next, participants indicated how much money, out of $10, they would give back to avoid repeating the experience, and how long (in seconds) they believed the experience had lasted. These four additional measures were included to test whether, if the end effect again failed to emerge on scale measures of the subjective experience, it might instead manifest on alternative measures: a relative preference measure (which avoids scaling effects), an evaluation of the objective experience (volume), a valuation measure, or a downstream effect (on time perception).
To verify that participants had noted the volume at the end of the clip, they were asked to indicate how the end of the experience compared to the rest of the experience (by selecting one of three options: the end was quieter, the end was about the same, the end was louder).
Finally, participants provided demographic information and completed an Instructional Manipulation Check (Oppenheimer, Meyvis, & Davidenko, 2009), which consisted of a paragraph of text explaining the importance of reading instructions and asking participants to choose “none of the above” from a marital status dropdown list.
Results
Thirty-five people failed the Instructional Manipulation Check, leaving a sample of 224 participants (MAge = 20.2, SD = 2.17; 44.2% male).
Manipulation check. Participants were more likely to indicate that the end was quieter than the rest of the sound clip in the Better End condition (P = 60.1%) than in the Better Middle condition (P = 31.5%), χ²(1) = 12.82, p < .001, Cohen’s d = .61, indicating that the manipulation of the ending was successful. Participants in the Added End condition were also more likely to indicate that the end was quieter (P = 45.3%) than were participants in the Better Middle condition, but this effect was only marginally significant, χ²(1) = 2.95, p = .086, Cohen’s d = .29. Because this clip was longer, the perceived end may have extended beyond the final low-intensity segment, reducing the perceived difference. It is also possible that, because the Added End clip had a lower average intensity, its ending did not stand out from the average as much as the ending in the Better End condition did.
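The text does not say how Cohen's d was derived from the chi-square tests. One common conversion, assumed here, goes through the phi coefficient of the 2 × 2 table. The total n for the two compared conditions is not reported, so n = 150 below is a hypothetical value (224 participants were split across three conditions); with it, the conversion yields a value close to the reported d = .61, which is a consistency check rather than a statement of the authors' procedure.

```python
import math


def cohens_d_from_chi2(chi2: float, n: int) -> float:
    """Convert a 1-df chi-square (2 x 2 table) to Cohen's d via phi.

    phi = sqrt(chi2 / n) and d = 2 * phi / sqrt(1 - phi**2), where n is the
    total number of participants in the two conditions being compared.
    """
    phi = math.sqrt(chi2 / n)
    return 2.0 * phi / math.sqrt(1.0 - phi ** 2)


# Assumption: about 150 participants across the two conditions (cell sizes
# are not reported here; 224 participants were split across three conditions).
print(round(cohens_d_from_chi2(12.82, 150), 2))  # 0.61
```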
Perceived aversiveness. The measures of annoyance, unpleasantness, and irritation were standardized and combined to form an aversiveness index (α = .93). As in the studies in the main paper, we analyzed this index while controlling for the aversiveness covariate (the rating of the drill sound at the start of the study) to increase the power of the tests and to control for any variation in volume across computers. The covariate was a significant predictor of aversiveness ratings, F(1, 220) = 75.38, p < .001, ηp² = .255. The overall effect of condition did not reach significance, F(2, 220) = 2.25, p = .108, ηp² = .020, but planned contrasts supported our hypotheses. First, we tested the end effect by comparing the Better Middle and Better End conditions, which differed in ending but not in average intensity. This planned contrast showed that the experience was not perceived as less aversive in the Better End condition (M = 0.06, SD = 0.82) than in the Better Middle condition (M = 0.10, SD = 0.81), F < 1, 95% CI [-0.31, 0.22], ηp² < .001.[2] Thus, we did not find evidence of an end effect. Next, we tested whether adding a better end (rather than moving the better part to the end) changes the perceived aversiveness of the experience by comparing the Added End condition to the other two conditions, both of which had a higher average intensity. A planned contrast confirmed that the experience was perceived as less aversive in the Added End condition (M = -0.16, SD = 0.81) than in the other two conditions, F(1, 220) = 4.43, p = .036, 95% CI [-0.94, -0.03], ηp² = .020. Thus, while we again did not replicate the end effect, we did replicate the prior finding that extending a negative experience with a less aversive ending reduces the overall aversiveness of the experience (in spite of adding negative utility).
Supplemental Figure 2. Perceived Aversiveness Ratings by Condition (Supplemental Study 1)
Note: Error bars denote standard errors.
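The construction of the aversiveness index can be sketched as follows, assuming z-scores are computed across all participants and averaged with equal weights, and that α is Cronbach's alpha computed on the standardized measures (the text does not say whether α was computed on raw or standardized scores). The ratings are made-up placeholders, not study data, and the function names are illustrative.

```python
from statistics import mean, pstdev, pvariance


def zscore(values):
    """Standardize one measure across participants."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def cronbach_alpha(items):
    """Cronbach's alpha for k measures, each a list with one value per participant."""
    k = len(items)
    totals = [sum(per_person) for per_person in zip(*items)]
    sum_item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))


def aversiveness_index(annoying, unpleasant, irritating):
    """Average of the standardized measures, plus alpha on the standardized items."""
    standardized = [zscore(m) for m in (annoying, unpleasant, irritating)]
    index = [mean(per_person) for per_person in zip(*standardized)]
    return index, cronbach_alpha(standardized)


# Placeholder ratings for five hypothetical participants (9-, 9-, and 101-point scales).
index, alpha = aversiveness_index([3, 5, 7, 8, 4], [2, 5, 6, 9, 4], [30, 55, 70, 85, 40])
print([round(x, 2) for x in index], round(alpha, 2))
```

Standardizing first keeps the 101-point irritation slider from dominating the two 9-point measures when they are averaged.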
Other measures. The relative irritation measure (asking participants to rate their irritation with the vacuum noise relative to the drill noise) and the perceived volume question were not affected by the manipulations: neither the overall effect of condition nor the planned contrasts were reliable (all F’s < 1.88, NS). However, the pattern of results for the question asking participants how much money they would give back to avoid repeating the experience replicated that for the aversiveness index. There was a marginally significant effect of condition, F(2, 220) = 2.63, p = .075, ηp² = .012. There was no difference in willingness to pay between the Better End condition (M = $1.56, SD = $2.52) and the Better Middle condition (M = $1.60, SD = $3.37), F < 1, NS, 95% CI [-0.85, 0.86], ηp² < .001. However, willingness to pay was significantly lower in the Added End condition (M = $0.78, SD = $1.92) than in the other two conditions, F(1, 220) = 5.25, p = .023, 95% CI [-3.20, -0.24], ηp² = .023.
Finally, there was an overall effect of condition on estimates of duration, F(2, 220) = 9.94, p < .001, ηp² = .083. Participants in the Added End condition provided higher estimates of clip duration (M = 155 secs, SD = 75.21) than those in the Better End condition (M = 117 secs, SD = 75.24) or the Better Middle condition (M = 101.09 secs, SD = 75.00), F(1, 220) = 18.56, p < .001, 95% CI [49.87, 133.98], ηp² = .078, consistent with the actually longer duration of the clip in that condition. Estimated duration did not differ between the latter two conditions, F(1, 220) = 1.63, NS, 95% CI [-8.58, 40.03], ηp² = .007.
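Each partial eta squared reported in this section can be recovered from its F ratio and degrees of freedom with the identity ηp² = F·df_effect / (F·df_effect + df_error). The sketch below applies it to the duration results as a consistency check; the helper name is illustrative and it is not part of the authors' analysis.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


# Duration-estimate tests reported above (error df = 220).
reported = [
    ("overall condition effect", 9.94, 2),        # reported .083
    ("Added End vs. other two", 18.56, 1),        # reported .078
    ("Better End vs. Better Middle", 1.63, 1),    # reported .007
]
for label, f, df_effect in reported:
    print(f"{label}: {partial_eta_squared(f, df_effect, 220):.3f}")
```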
Discussion
Moving the less aversive part of an irritating noise to the end rather than the middle did not affect the perceived aversiveness of the experience, casting further doubt on the notion that endings are inherently over-weighted in evaluations of experiences. However, extending the irritating noise with an additional, less aversive part did lead participants to perceive the overall experience as less aversive. Thus, this study replicates prior findings of the beneficial effects of “adding a better end,” but also indicates that this effect is not driven by a disproportionate impact of the end, but rather by another process, such as lowering the average intensity of the experience. This study also argues against a scaling-effect interpretation of the findings (e.g., that the end changes the meaning of the rating scale), since the findings from the primary dependent measure were replicated with a monetary measure.
A potential limitation of this study is that the average intensity was in fact slightly lower in the Better Middle condition than in the Better End condition. This slight difference was due to a gradual transition to the final volume in the Better Middle condition, which was used to avoid a jarring increase in sound. Although this difference in average intensity may have reduced the potential to find an end effect, it was minimal (compared to the difference in average intensity with the Added End condition), and the findings were conceptually replicated in Study 2 and (in the positive domain) in Study 3.
Supplemental Study 2: Single Versus Repeated Negative Experiences
(Conceptual Replication of Study 4)
This study is a conceptual replication of Study 4 using a different set of aversive sounds and conducted in the lab rather than online. Thus, this study also examined the impact of single versus repeated experiences. As in Study 4, each participant was sequentially presented with two clips of aversive sound: one that started well but ended poorly (Worse End) and one that started poorly but ended well (Better End). We expected that the position of the less aversive segment would not affect participants’ ratings of the first sound clip but would affect their ratings of the second sound clip. That is, after listening to a sound clip with a worse (better) end, participants would rate a clip with a better (worse) end as less (more) aversive.
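These predictions map onto comparisons between the two counterbalanced orders: the first-clip comparison is purely between-subjects and is expected to show no difference, whereas the second-clip comparison is expected to show a contrast effect. The sketch below illustrates this with invented ratings chosen to match the predicted pattern; the order labels, the composite rating, and the data are hypothetical, not results from the study.

```python
from statistics import mean

# Invented composite ratings (higher = more aversive); NOT data from the study.
# Each record: (order, rating of first clip, rating of second clip).
# "WE-BE": Worse End clip heard first; "BE-WE": Better End clip heard first.
ratings = [
    ("WE-BE", 6.0, 4.5), ("WE-BE", 7.0, 5.0), ("WE-BE", 6.5, 5.5),
    ("BE-WE", 6.5, 7.5), ("BE-WE", 6.0, 7.0), ("BE-WE", 7.0, 8.0),
]


def cell_mean(order, position):
    """Mean rating for one order group at one position ("first" or "second")."""
    column = 1 if position == "first" else 2
    return mean(r[column] for r in ratings if r[0] == order)


# First clip: Worse End vs. Better End, compared between subjects.
# Prediction: no difference.
first_diff = cell_mean("WE-BE", "first") - cell_mean("BE-WE", "first")
# Second clip: Better End (after Worse End) vs. Worse End (after Better End).
# Prediction: negative difference, i.e., the Better End clip seems less aversive.
second_diff = cell_mean("WE-BE", "second") - cell_mean("BE-WE", "second")
print(first_diff, second_diff)  # prints 0.0 -2.5
```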
Method
One hundred and sixty-four undergraduate students participated in the study in exchange for partial course credit.
The procedure was similar to that of Study 4. However, rather than having participants calibrate the volume after listening to a sample sound, we fixed the volume settings at approximately equal levels across computers. As in the other studies, participants listened to and rated their irritation with a drill sound clip (to be used as a covariate). Participants then listened to one of two versions of the main stimulus (100 seconds of vacuum cleaner noise). Each clip consisted of 70 seconds at a relatively higher intensity and 30 seconds at a relatively lower intensity. The two sound clips were identical but reversed such that the lower intensity segment was positioned at the end of one clip (Better End) and at the beginning of the other clip (Worse End). See Supplemental Figure 3 for a visual depiction of the sound stimuli. Note that these sound clips are identical to the Added End and Added Beginning sound clips used in Study 2 of the paper. Immediately after listening to one of the sound clips, participants rated how annoying, unpleasant, and irritating it was to listen to the clip (on 9-point scales: 1 = not at all, 9 = very). Then, as in Study 4, they listened to the other clip and rated this second clip on the same three dependent measures. Unlike the studies in the main paper, this study did not have any additional measures or any manipulation checks. Finally, participants provided demographic information and completed an Instructional Manipulation Check, which consisted of a paragraph of text explaining the importance of reading instructions and asking participants to ignore the question underneath (a question about their geographical location with a list of regions). Participants were asked to write “none” instead of selecting a region.