Appendix to Etiology of Obsessions and Compulsions: A Behavioral-Genetic Analysis

Steven Taylor & Kerry L. Jang, University of British Columbia

Gordon J. G. Asmundson, University of Regina

Correspondence: Steven Taylor, Ph.D., Department of Psychiatry, University of British Columbia, Vancouver, BC, V6T 2A1, Canada. E-mail: .

Author note: This research was supported by CIHR grant PTS-63186.

Introduction Notes

Notes on Clinical versus Nonclinical Obsessions and Compulsions

Most studies suggest that obsessive-compulsive (OC) symptoms in people without obsessive-compulsive disorder (OCD) have similar form and content to symptoms reported by people who meet full diagnostic criteria for OCD (Belloch, Morillo, Lucerno, Cabedo, & Carrio, 2004; Clark, 2004; Muris, Merckelbach, & Clavan, 1997). However, Rassin, Cougle, and Muris (2007) reported content differences between nonclinical obsessions (i.e., those reported by people without OCD) and clinical obsessions (those reported by people meeting diagnostic criteria for OCD). Rassin et al. classified obsessions into clinical and nonclinical varieties on the basis of previous research comparing people with OCD to non-clinical controls. The list of putative clinical and nonclinical obsessions was then administered to a student sample. Students were more likely to endorse a lifetime history of putative nonclinical than clinical obsessions. In the absence of an OCD control group it is difficult to interpret these findings. Julien, O’Connor, and Aardema (2009) administered a similar questionnaire to students and to OCD patients and found that the groups did not differ in the prevalence of nonclinical obsessions relative to clinical obsessions. Thus, Rassin et al.’s findings were not replicated in the methodologically more rigorous Julien et al. investigation. The latter study supports the view that clinical and nonclinical obsessions are similar in content.

Notes on the Dimensional versus Taxonic Structure of OC-related Symptoms

Taxometric research generally suggests that OC symptom severity is more likely to be continuous (dimensional) than taxonic (categorical) (Haslam, Williams, Kyrios, McKay, & Taylor, 2005; Olatunji, Williams, Haslam, Abramowitz, & Tolin, 2008). However, some commentators have interpreted the findings of Olatunji et al. as evidence of the taxonicity of hoarding, so this study merits some discussion. Olatunji et al.’s (2008) taxometric study suggested that 5 out of 6 subscales of the Obsessive-Compulsive Inventory-Revised (OCI-R) were dimensional in terms of their distribution of scores. The results were less clear for the 6th subscale, which measured hoarding. Here, for the MAXEIG/MAXCOV method, taxonicity was suggested by 2 out of 3 plots. For the MAMBAC method, taxonicity was suggested by only 2 out of 6 plots. In other words, slightly more than half of the plots (5/9) suggested a dimensional structure of hoarding. Thus, there was no convincing evidence that hoarding was taxonic; the results suggest that hoarding is more likely to be dimensional. The taxonic versus dimensional structure of hoarding requires further investigation, particularly using methods that are capable of investigating whether hoarding is composed of more than two categories. This would involve the combined use of taxometric methods (e.g., MAXEIG/MAXCOV and MAMBAC methods) with latent class analysis. Taxometric methods alone can distinguish only between dimensions and taxa composed of two categories.

Does the Rassin et al. (2007) study provide evidence that OCD or some aspects of the disorder are categorical, as proposed by one reviewer? This would be suggested if people with OCD tended to report one type of obsessions and people without this disorder tended to report a different type of obsessions. Rassin et al.’s (2007) findings offer no such evidence. As noted above, those authors created a questionnaire that they claimed assessed 2 types of obsessions: clinical and nonclinical obsessions. They administered their unvalidated questionnaire to a group of college students (and not to people with OCD) and found that the students more often reported experiencing nonclinical than clinical obsessions. This tells us nothing about whether OCD or its features are categorical. The probability of experiencing putative “clinical” obsessions could be dimensionally rather than dichotomously distributed. Moreover, Rassin et al. did not include an OCD control group, so those authors were unable to determine whether their college students scored any differently from people with OCD. As noted above, Julien et al. (2009) conducted a similar but more rigorous study, in which a similar questionnaire was administered to both students and people diagnosed with OCD. The two groups did not differ in the prevalence of nonclinical obsessions relative to clinical obsessions. Thus, the methodologically more rigorous research by Julien et al. offers no evidence that OCD, or aspects of OCD, are categorical.

Notes on Hoarding

Hoarding shares features of both OCD and OC personality disorder (Pertusa et al., 2010). However, psychometric studies indicate that hoarding is not strongly related to OC personality disorder (Hummelen, Wilberg, Pedersen, & Karterud, 2008; Livesley & Jackson, 2009). The question of whether hoarding phenomena should be classified as forms of OC symptoms has yet to be resolved. Mixed findings have been obtained in studies of the relationship of hoarding to prototypic OC symptoms such as checking, washing, and obsessing. Wu and Watson (2005), for example, found that checking, washing, and obsessing were more highly correlated with one another than they were with hoarding. This was not replicated in a larger study by Taylor et al. (2010). Those investigators found that hoarding and prototypic OC symptoms all loaded on a single higher-order OC factor and that hoarding had generally large correlations (>.50) with measures of other OC symptoms. Studies of the etiologic (behavioral-genetic) links among hoarding and other symptoms are needed to clarify whether hoarding is part of the heterogeneous constellation of symptoms currently classified as obsessions and compulsions. To our knowledge, there has yet to be a behavioral-genetic investigation of whether hoarding has genetic or environmental etiologic factors in common with OC symptoms such as checking, washing, and obsessing. That was an aim of the present investigation, the details of which are reported in the main article.

Methodological Details and Issues

Further Details on Sample and Recruitment

For each twin pair in our study, the twins were reared together. Prior to age 18, most twins (74% of MZ and 73% of DZ pairs) had never been separated from their co-twin for more than 1 month. Previous research from our twin registry indicates that the early environments (prior to age 18) of MZ twins are not significantly different from those of DZ twins (Taylor, Jang, Stewart, & Stein, 2008), thereby fulfilling the assumption of equal MZ and DZ environments, which underlies the methodology of twin studies. Twins from our registry have also been shown to be representative of the general population in terms of demographics, personality, and various clinical variables (e.g., Jang, Livesley, & Vernon, 2000).

Of the 660 individuals who completed the assessment battery in the present study, 46 people (7%) were singletons; that is, people whose co-twin did not complete the assessment measures. The sample included in the present study consisted of 614 people for whom both members of the twin pair completed the assessment (307 pairs). Completers and singletons did not differ significantly in their scores on any of the OCI-R subscales (ps > .10).

Examples of OCI-R Items

The OCI-R consists of six 3-item subscales assessing checking (e.g., “I check things more often than necessary”), neutralizing (cognitive compulsions; “I feel I have to repeat certain numbers”), washing (“I sometimes have to wash or clean myself simply because I feel contaminated”), obsessing (“I am upset by unpleasant thoughts that come into my mind against my will”), hoarding (“I have saved up so many things that they get in the way”), and ordering (“I need things to be arranged in a particular order”).

Reliability and Validity of the OCI-R

Reliability. Coefficient alpha is typically used as an index of reliability because it purportedly measures reliability as internal consistency. Most reliability analyses of psychometric instruments rely exclusively on coefficient alpha. Despite its widespread use, there are important problems with alpha. In the following we summarize the findings concerning alpha values for the OCI-R subscales, and then we discuss the criticisms of alpha, which have to do with the interpretation of obtained alpha values. We report alpha values for the OCI-R scales at the request of a reviewer, even though we believe that coefficient alpha is largely irrelevant to the present study, partly because of the problems with alpha and partly because the subscales in the present study were measured as latent variables, not as the unit-weighted sum of items for which alpha is used.

Reported levels of coefficient alpha in studies using the OCI-R. In an influential psychometric text, Nunnally (1978) offered guidelines for interpreting alpha values: Values ≥ .70 were defined as “acceptable” and values ≥ .80 as “good.” Despite the widespread use of these criteria in research articles, in the subsequent edition of Nunnally’s book all mention of criteria for classifying alpha was omitted (Nunnally & Bernstein, 1994), perhaps reflecting recognition of the problems in interpreting alpha.

Regardless of whether OCI-R studies were based on clinical or nonclinical samples, for most studies and for most OCI-R subscales, coefficient alpha was close to or higher than .70, and no subscale had a consistently low alpha across studies. The ranges of alpha values across the 6 subscales were as follows: Abramowitz and Deacon (2006) .80 to .92; Foa et al. (2002) .83 to .90; Fullana et al. (2005) .61 to .82; Gönner et al. (2008) .51 to .96; Hajcak et al. (2004) .47 to .88; Huppert et al. (2007) .57 to .93; Roberts and Wilson (2008) .68 to .83; Sica et al. (2009) .60 to .80; Smari et al. (2007) .62 to .84; Wu and Carter (2008) .76 to .89; Zermatten et al. (2006) .63 to .83.

In the present study, each phenotypic variable was scored, for the purpose of computing alpha, as the unit-weighted sum of its item scores. Coefficient alpha was computed separately for the twin 1 and twin 2 samples and the mean of the two values was then calculated. The means were as follows: Obsessing .85, neutralizing .69, checking .80, washing .84, hoarding .76, ordering .86, affective lability .91, trait anxiety .94. According to Nunnally’s (1978) criteria, all values were very near to, or better than, the .70 criterion for an “acceptable” alpha.
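The computation described above can be illustrated with a minimal sketch. It assumes the usual formula for coefficient alpha applied to unit-weighted item sums; the function name and the item scores are hypothetical and are not data from the present study.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """Coefficient alpha for a scale scored as the unit-weighted sum of its items.

    rows: one list of k item scores per respondent.
    """
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the unit-weighted total score.
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 3-item subscale ratings (0-4), NOT data from the study.
twin1 = [[0, 1, 0], [2, 2, 3], [1, 1, 1], [4, 3, 4], [3, 2, 2], [0, 0, 1]]
twin2 = [[1, 0, 1], [3, 3, 2], [0, 1, 0], [2, 2, 3], [4, 4, 3], [1, 2, 1]]

# Alpha is computed separately in each twin sample and then averaged.
mean_alpha = mean([cronbach_alpha(twin1), cronbach_alpha(twin2)])
print(round(mean_alpha, 2))
```

Computing alpha within the twin 1 and twin 2 samples separately keeps the observations within each calculation independent, because co-twins never appear in the same sample.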

Problems with alpha. Although we have reported alpha values at the request of a reviewer, there are 2 reasons why coefficient alpha is an inappropriate index in our study. First, it does not reflect the reliability of our measures. Alpha is used to estimate the reliability of scales consisting simply of the unit-weighted sum of their items. This is not how our scales were used, as described above. The second reason for not using coefficient alpha is that it has been widely criticized as an index of reliability. This is illustrated by the following quotations, which appeared in recent articles in the journal Psychometrika:

The general use of coefficient alpha to assess reliability should be discouraged on a number of grounds. The assumptions underlying coefficient alpha are unlikely to hold in practice, and violation of these assumptions can result in nontrivial negative or positive bias. … good alternative methods are available. (Green & Yang, 2009, p. 121)

When considering how well a test measures one concept, α is not appropriate… It has been known for a long time that α is a lower bound to the reliability, in many cases even a gross underestimate, and a poor estimate of internal consistency and in some cases a gross overestimate. (Revelle & Zinbarg, 2009, pp. 145 & 153)

Alpha is not a measure of internal consistency. Neither is it a measure of the degree of unidimensionality… Alpha has been shown to correlate with many other statistics and much as these results are interesting, they are also confusing in the sense that without additional information, both very low and very high alpha values can go either with unidimensionality or multidimensionality of the data. But given that one needs the additional information to know what alpha stands for, alpha itself cannot be interpreted as a measure of internal consistency. (Sijtsma, 2009, p. 119)

Coefficient alpha is inappropriate as a single summary of the internal consistency of a composite score. Better estimators of internal consistency are available. (Bentler, 2009, p. 137)

Given that all variables in our study were specified and measured as latent variables, with subscale items loading on these variables, a better indication of internal consistency is the extent to which each latent variable accounts for, or explains, variance in the scores on its indicators (items). The following is the proportion of variance that each phenotype (latent variable) accounted for in its respective indicators: Obsessing .66, neutralizing .46, checking .60, washing .64, hoarding .53, ordering .67, affective lability .40, trait anxiety .51.
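One common way to summarize the variance a latent variable explains in its indicators is the mean of the squared standardized loadings. The sketch below illustrates that summary only; whether the values reported above were obtained in exactly this way is an assumption, and the function name and loadings are hypothetical.

```python
def variance_explained(loadings):
    """Mean proportion of indicator variance explained by one latent variable.

    loadings: standardized factor loadings of the indicators on that variable.
    A squared standardized loading is the share of that indicator's variance
    accounted for by the latent variable.
    """
    return sum(l ** 2 for l in loadings) / len(loadings)

# Hypothetical standardized loadings for a 3-item subscale (illustrative only).
hypothetical_loadings = [0.85, 0.80, 0.79]
print(round(variance_explained(hypothetical_loadings), 2))
```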

Item redundancy. Does the OCI-R contain redundant items as one reviewer suggested? The reviewer’s claim was based on the high levels of alpha sometimes reported for the OCI-R subscales. There are no empirically established guidelines for defining item redundancy. Rules-of-thumb have been proposed, such as those for interpreting alpha values, but these were arbitrarily defined. Moreover, Cortina (1993, p. 101) demonstrated that a value of alpha of .80, which is widely regarded as a high value, need not indicate that items are strongly related to one another. In Cortina’s example, for a 3-item scale with alpha = .80, the mean inter-item correlation was .57. In other words, the items had a mean of 33% overlapping variance. So, a high value of alpha need not indicate item redundancy.
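The arithmetic behind this example can be checked with the standardized form of coefficient alpha, which depends only on the number of items and the mean inter-item correlation. The sketch below is an illustration under that assumption (equal item variances); it is not taken from Cortina (1993), and the function names are hypothetical.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

def mean_r_from_alpha(k, alpha):
    """Invert the formula: the mean inter-item correlation implied by alpha."""
    return alpha / (k - (k - 1) * alpha)

k = 3
alpha_from_r = standardized_alpha(k, 0.57)   # alpha implied by r = .57
implied_r = mean_r_from_alpha(k, 0.80)       # r implied by alpha = .80
shared_variance = implied_r ** 2             # squared r = overlapping variance

print(round(alpha_from_r, 2), round(implied_r, 3), round(shared_variance, 2))
```

With 3 items, an alpha of exactly .80 implies a mean inter-item correlation of about .571, and the square of that value is about .33, which corresponds to the 33% overlapping variance cited above.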

Analyses of our data also do not support the view that the OCI-R subscales are composed of redundant items. For each of the 6 OCI-R subscales, we computed the squared multiple correlation (SMC) of each item with the other 2 items in that subscale. SMC is a form of R², with values ranging from 0 (complete non-redundancy) to 1.00 (complete redundancy). The maximum values of SMC for each OCI-R subscale were as follows: Checking .54, hoarding .44, neutralizing .38, obsessing .57, ordering .58, and washing .56. These values are far below 1.00, showing that the OCI-R items are not redundant with one another. Further details of the SMC results appear in Table A1.
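For a 3-item subscale, the SMC of one item with the other two can be obtained directly from the three inter-item correlations, using the standard formula for R² with two predictors. The following is a minimal sketch of that calculation; the function name and the correlations are hypothetical and are not values from the present study.

```python
def smc_three_items(r_ab, r_ac, r_bc):
    """Squared multiple correlation of item a regressed on items b and c.

    r_ab, r_ac: correlations of item a with each of the other two items.
    r_bc: correlation between the other two items.
    """
    return (r_ab ** 2 + r_ac ** 2 - 2 * r_ab * r_ac * r_bc) / (1 - r_bc ** 2)

# Hypothetical inter-item correlations for one 3-item subscale (not study data).
r12, r13, r23 = 0.60, 0.55, 0.65

smcs = {
    "item1": smc_three_items(r12, r13, r23),  # item 1 on items 2 and 3
    "item2": smc_three_items(r12, r23, r13),  # item 2 on items 1 and 3
    "item3": smc_three_items(r13, r23, r12),  # item 3 on items 1 and 2
}
max_smc = max(smcs.values())  # the per-subscale maximum, as summarized above
print({name: round(value, 2) for name, value in smcs.items()}, round(max_smc, 2))
```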

Test-retest reliability. Several studies have reported significant test-retest correlations for the OCI-R subscales: Foa et al. (2002, 2-week retest interval) .57 to .91; Fullana et al. (2005, 4 weeks) .45 to .66; Hajcak et al. (2004, 4 weeks) .54 to .77; Sica et al. (2009, 4 weeks) .76 to .99; Smari et al. (2007, 2 weeks) .69 to .86. In a 2-year follow-up study of 132 college students, Fullana et al. (2007) found that there were no significant changes in OCI-R subscale scores from baseline to follow-up, except for the obsessing subscale. For all subscales, including the obsessing subscale, scores at baseline strongly predicted scores on the same subscale at 2-year follow-up. Taken together, these findings suggest that the OCI-R subscales have good test-retest reliability. These results are also consistent with what is known about the temporal stability of OCD; symptoms in people with this disorder tend to be chronic, although symptom severity may wax and wane over time, particularly in response to life stressors (American Psychiatric Association, 2000).

Factorial validity. All the published factor analytic studies of the original and revised OCI that we could locate supported the same 6-factor solution in clinical and non-clinical samples. In these studies 6 correlated factors were obtained, corresponding to each of the 6 subscales (Abramowitz & Deacon, 2006; Foa et al., 2002; Fullana et al., 2005; Gönner et al., 2008; Hajcak et al., 2004; Huppert et al., 2007; Roberts & Wilson, 2008; Sica et al., 2009; Smari et al., 2007; Taylor et al., 2010; Woo et al., 2010; Zermatten et al., 2006). The 6 OCI-R factors all load on a single higher-order factor (Roberts & Wilson, 2008; Taylor et al., 2010), which is what would be expected if the 6 factors each assess OC symptoms or related (OC spectrum) phenomena. Overall, the findings support the factorial validity of the OCI-R.

Convergent and discriminant validity. Several studies have shown that the OCI-R total scale and subscales are significantly correlated with other measures of OC symptoms and that these correlations are often (but not invariably) greater than correlations with measures of general distress, such as measures of depression or general anxiety (e.g., Abramowitz & Deacon, 2006; Foa et al., 2002; Fullana et al., 2005; Gönner et al., 2008; Hajcak et al., 2004; Sica et al., 2009; Smari et al., 2007; Wu & Carter, 2008). High correlations with measures of general distress have raised concerns about the discriminant validity of the OCI-R. Such concerns are not specific to the OCI-R; they have been raised for many if not most other OC measures (Taylor, 1995, 1998). These high correlations are difficult to interpret; if OC symptoms are a significant source of distress, then the OCI-R should be strongly correlated with measures of depression and anxiety. In summary, overall the results support the convergent validity of the OCI-R. The discriminant validity of the OCI-R remains to be properly evaluated. Instead of correlating this scale with measures of general distress, a better test of discriminant validity would be to correlate the OCI-R scores with scores on specific forms of psychopathology (e.g., agoraphobia) and determine whether the resulting correlations are less than the correlations between the OCI-R and other measures of OC symptoms.

Criterion-related (known-groups) validity. Abramowitz and Deacon (2006) found that all 6 OCI-R subscales significantly discriminated patients with OCD (n=167) from anxiety-disordered patients without OCD (n=155). OCD patients scored significantly higher on all subscales. Moreover, OCD patients classified as having primarily hoarding symptoms (as assessed by a structured clinical interview) had significantly higher scores on the OCI-R hoarding subscale than other OCD patients and non-OCD anxiety-disordered patients. Similar findings were reported in other studies (Gönner et al., 2008; Huppert et al., 2007; Sica et al., 2009). These findings support the criterion-related (known-groups) validity of the OCI-R. However, in other analyses the OCI-R hoarding subscale failed to discriminate people with OCD from other groups. For example, Foa et al. (2002) found that the hoarding scale did not distinguish people with OCD from people with generalized social phobia or non-anxious controls. Abramowitz, Wheaton, and Storch (2008) found that OCD patients and non-OCD anxiety patients did not differ significantly from each other and that both groups scored significantly lower than college students on the OCI-R hoarding scale. These mixed findings would raise doubts about the criterion-related validity of the hoarding scale, but only if hoarding is conceptualized as a cardinal feature of OCD. Recently, hoarding has been conceptualized as an OC spectrum condition (Pertusa et al., 2010); that is, a clinical condition related to, but also significantly different from, OCD. According to Phillips et al. (2010), the OC spectrum “refers to a group of disorders that are presumed to be distinct from, but related to, obsessive-compulsive disorder” (p. 529). If hoarding falls within this spectrum, then one would not expect that people with OCD would invariably have significantly higher scores than other groups on the OCI-R hoarding scale.