Experimental Approaches to the Study of Personality
Personality is an abstraction used to explain consistency and coherency in an individual’s pattern of affects, cognitions, desires and behaviors. What one feels, thinks, wants and does changes from moment to moment and from situation to situation but shows a patterning across situations and over time that may be used to recognize, describe and even to understand a person. The task of the personality researcher is to identify the consistencies and differences within and between individuals (what one feels, thinks, wants and does) and eventually to try to explain them in terms of a set of testable hypotheses (why one feels, thinks, wants and does).
Personality research is the last refuge of the generalist in psychology: it requires a familiarity with the mathematics of personality measurement, an understanding of genetic mechanisms and physiological systems as they interact with environmental influences to lead to development over the life span, an appreciation of how to measure and manipulate affect and cognitive states, and an ability to integrate all of this into a coherent description of normal and abnormal behavior across situations and across time.
Although the study of personality is normally associated with correlational techniques that relate responses or observations in one situation or at one time to responses in other situations and at other times, it is also possible to examine causal relations through the use of experimental methods. This chapter will outline some of the challenges facing personality researchers and suggest that an experimental approach can be combined with more traditional observational techniques to tease out the causal structure of personality.
Central to our analysis is the distinction between personality traits and personality states. States can be thought of as the current values of one’s affects, behaviors, cognitions and desires, while traits have been conceptualized as either average values of these states or, alternatively, the rates of change in these states. In more cognitive terms, traits are measures of chronic accessibility or activation, and states are levels of current activation. It is perhaps useful here to think analogically and to equate states with today’s weather and traits with the long-term characteristics of weather, that is to say, climate. On any particular day, the weather in a particular location can be hot or cold, rainy or dry. But to describe the climate for that location is more complicated, for it includes among other aspects a description of the seasonal variation in temperature and the long-term likelihood of drought or blizzards or hurricanes. Extending this analogy, climatologists explain differences in climate between locations in terms of variations in solar flux and proximity to large bodies of water, and changes in climate in terms of long-term trends in greenhouse gases in the atmosphere. The role of the personality researcher is analogous to that of the meteorologist and climatologist: trying to predict someone’s immediate states as well as to understand and explain long-term trends in feelings, thoughts and actions.
Integrating Two Alternative Research Approaches
Psychological research has traditionally been described in terms of two contrasting approaches: the correlational versus the experimental (viz., the influential papers by Cronbach, 1957, 1975; and Eysenck, 197x). Francis Galton and his associate Karl Pearson introduced the correlation as a means for studying how individual differences on one variable (e.g., the height of one’s parents or one’s occupation) could be related to individual differences in another variable (e.g., one’s own height or one’s reaction time). Correlational approaches have been used in personality research since Galton to predict school achievement or job performance, and when combined with known family structures (e.g., parents and their offspring, monozygotic or dizygotic twins with each other, adopted and biological siblings) have allowed for an examination of the genetic basis of personality. Applying structural techniques such as factor analysis to matrices of correlations of self and other descriptions has led to taxonomic solutions such as the Giant Three or Big Five trait dimensions. The emphasis in correlational research is on variability, correlation, and individual differences. Central tendencies are not important; variances and covariances are. The primary use of correlational research is in describing how people differ and how these differences relate to other differences. Unfortunately for theoretical inference, that two variables are correlated does not allow one to infer causality. For example, that foot size and verbal skill are highly correlated among preteens does not imply that large feet lead to better verbal skills, for a third variable, age, is causally related to both.
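The foot size example can be made concrete with a small simulation. The sketch below is illustrative only: the sample size, age range, noise levels and arbitrary measurement units are invented, and the helper names `pearson_r` and `partial_r` are hypothetical. It generates two variables that each depend on age but not on each other, and then compares their raw correlation with the correlation that remains once age is partialled out.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson_r(xs, ys), pearson_r(xs, zs), pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Invented data: age drives both outcomes; neither outcome affects the other.
age = [rng.uniform(8, 12) for _ in range(5000)]
foot_size = [a + rng.gauss(0, 0.8) for a in age]      # arbitrary units
verbal_skill = [a + rng.gauss(0, 0.8) for a in age]   # arbitrary units

raw = pearson_r(foot_size, verbal_skill)
controlled = partial_r(foot_size, verbal_skill, age)
print(f"raw r = {raw:.2f}, r controlling for age = {controlled:.2f}")
```

In this simulation the raw correlation is substantial while the partial correlation is close to zero, which is the pattern one would expect if age alone were responsible for the association.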
A seemingly very different approach to research, meant to tease out causality, is the use of experimental manipulation. The psychological experiment, introduced by Wundt and then used by his students and intellectual descendants, allows one to examine how an experimental manipulation (an Independent Variable, or IV) affects some psychological observation (the Dependent Variable, or DV) which is, in turn, thought to represent a psychological construct of interest. The emphasis is upon central tendencies, not variation, and indeed, variability not associated with an experimental manipulation is seen as a source of noise or error that needs to be controlled. Differences of means resulting from different experimental conditions are thought to reflect the direct causal effects of the IV upon the DV. Threats to the validity of an experiment may be due to confounding the experimental manipulation with multiple variables, to poor definition of the dependent variables, or to an incorrect association between observation and construct.
One reason that correlational and experimental approaches are seen as so different is that they have traditionally employed different methods of statistical analysis. The standard individual differences/correlational study reports either a regression weight or a correlation coefficient. Regression weights are measures of how much variable Y changes as a function of a unit change in variable X. Correlations are regressions based upon standard scores or, alternatively, the geometric mean of the two regression slopes (X upon Y and Y upon X). A correlation is an index of how many standardized units Y changes for a standardized unit change in X. (By converting the raw Y scores into standardized scores, $z_Y = (Y - \bar{Y})/s_Y$, where $\bar{Y}$ is the mean and $s_Y$ the standard deviation of Y, one removes the mean level as well as the units of measurement of Y.) Experimental results, on the other hand, are reported as the differences of the means of two or more groups, with respect to the amount of error within each group. Student’s t-test and Fisher’s F-test are the classic ways of reporting experimental results. Both t and F are also unit free, in that they are functions of the effect size (differences in means expressed in units of the within-cell standard deviation) and the number of participants.
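The two descriptions of the correlation given above can be checked numerically. The following is a minimal sketch with made-up numbers and hypothetical helper names (`slope`, `standardize`); it computes the correlation once as the geometric mean of the two regression slopes and once as the regression slope between standardized scores.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def slope(xs, ys):
    """Regression weight for predicting ys from xs."""
    return covariance(xs, ys) / covariance(xs, xs)

def standardize(xs):
    """Convert raw scores to z scores (mean 0, standard deviation 1)."""
    m, sd = mean(xs), math.sqrt(covariance(xs, xs))
    return [(x - m) / sd for x in xs]

# Made-up scores measured in very different units.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [20.0, 10.0, 40.0, 30.0, 60.0, 50.0]

b_y_on_x = slope(x, y)   # depends on the units of x and y
b_x_on_y = slope(y, x)

# The geometric mean gives the magnitude of r; the sign is that of the slopes.
r_from_slopes = math.copysign(math.sqrt(b_y_on_x * b_x_on_y), b_y_on_x)
r_from_z = slope(standardize(x), standardize(y))
print(round(r_from_slopes, 3), round(r_from_z, 3))
```

The two regression weights differ by orders of magnitude because they carry the units of measurement, whereas the two routes to the correlation agree.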
But it is easy to show that the t-test is a simple function of a correlation coefficient where one of the variables is dichotomous. Similarly, the F statistic of an analysis of variance is directly related to the correlation between the group means and a set of contrast coefficients. The recognition that correlations, regressions, t and F statistics are all special cases of the general linear model has allowed researchers to focus on the validity of the inferences drawn from the data, rather than on the seeming differences of experimental versus correlational statistics.
The use of meta-analysis to combine results from different studies has forced researchers to think about the size of their effects rather than the significance of the effects. Indeed, realizing that $r = \sqrt{F/(F+df)}$ or $\sqrt{t^2/(t^2+df)}$ did much to stop the complaint that personality coefficients of .3 were very small and accounted for less than 10% of the variance to be explained. For suddenly, highly significant F statistics were found to be showing that only a small fraction of the variance of the dependent variable was accounted for by the experimental manipulation.
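The equivalence of t, F and r can be illustrated with a two-group example. This is a sketch using invented scores, not data from any study; it computes a pooled-variance t, converts it to r with the formula above, and compares the result with the point-biserial correlation obtained by correlating the scores directly with a 0/1 condition code.

```python
import math

control = [4, 5, 6, 5, 4, 6]      # invented scores, condition coded 0
treatment = [6, 7, 8, 7, 6, 8]    # invented scores, condition coded 1

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq(xs):
    """Sum of squared deviations from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

# Student's t with a pooled within-group variance.
df = len(control) + len(treatment) - 2
pooled_var = (sum_sq(control) + sum_sq(treatment)) / df
std_err = math.sqrt(pooled_var * (1 / len(control) + 1 / len(treatment)))
t = (mean(treatment) - mean(control)) / std_err
F = t ** 2  # with two groups, the one-way ANOVA F equals t squared

# Convert the test statistic to a correlation (magnitude only).
r_from_t = math.sqrt(t ** 2 / (t ** 2 + df))

# Point-biserial r: Pearson correlation of the scores with the 0/1 codes.
codes = [0] * len(control) + [1] * len(treatment)
scores = control + treatment
mc, ms = mean(codes), mean(scores)
sxy = sum((c - mc) * (s - ms) for c, s in zip(codes, scores))
r_direct = sxy / math.sqrt(sum_sq(codes) * sum_sq(scores))

print(f"t = {t:.3f}, F = {F:.3f}, r from t = {r_from_t:.3f}, r direct = {r_direct:.3f}")
```

Both routes give the same value of r, so the choice between reporting a t-test and reporting a correlation is a matter of convention rather than of substance.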
The realization that the statistics, although seemingly different, are actually just transformations of each other forces experimentalists and correlationalists to focus on the inferences they can make from their data rather than on the way in which the data are analyzed. The problem is what kinds of inferences one can draw from a particular design, not whether correlations or experiments are the better way of studying the problem.
Latent constructs, observed variables and the problems of inference
Fundamental to the problem of inference is the distinction between the variables we measure and observe and the constructs that we think about. This distinction between latent (unobserved) constructs and measured (observed) variables has been with us at least since Plato’s Republic. Consider prisoners shackled in a cave and only able to see shadows (observed scores) on the cave wall of others (latent scores) walking past a fire. The prisoners attempt to make inferences about reality based upon what they can observe from the length and shape of the shadows. Individual differences in shadow length will correctly order individual differences in height, although real height cannot be determined. To make this more complicated, as people approach the fire, their shadow lengths (the observed scores) will increase, even though their size (the latent score) has not changed. So it is for personality research. We are constrained to make inferences about latent variables based upon our measurements of observed variables. The problem may be shown diagrammatically (Figure 1), where boxes represent observed variables, circles latent constructs, and triangles experimental manipulations. From the observed pattern of correlations or t-tests we attempt to make inferences about the relationships between the latent variables as well as between the latent and observed variables.
Insert Figure 1 about here
There are at least three challenges that we face when making inferences about the strength of the relationships between latent variables: the shape of the functional relationship between observed and latent variables, the strength of the functional relationship between observed and latent variables, and the proper identification of the latent variables associated with observed variables and manipulations.
Consider the following two hypothetical experiments. Both are field studies of the effect of education upon student outcomes. In study 1, students from a very selective university, a less selective university, and a junior college are given a pretest of their writing ability and then a posttest at the end of the first year. The same number of students is studied in each group, and all students complete both the pretest and the posttest. Although there are differences on the pretest between the three student samples, the posttest differences are even larger (Figure 2a). Examining Figure 2a, many who see these results conclude that students at the highly selective university learn more than students at the less selective university, who in turn change more than the students at the junior college. Some (particularly faculty members) like to conclude that the high tuition and faculty salaries at the prestigious and selective university lead to this greater gain. Others believe that the teaching methods at the more selective university are responsible for the gains and, if used at the other institutions, would also lead to better outcomes. Yet others (particularly students) point out that the students in the prestigious university were probably smarter and thus more able to learn than the students in the junior college.
Hypothetical study 2 was similar to study 1, in that it was done at the same three institutions during the first year, but this time the improvement on mathematics achievement was examined (Figure 2b). Here we see that students at the most selective school, although starting with very high scores, did not improve nearly as much as the students at the less selective university, who in turn improved less than the students at the junior college. Most faculty and students who see these results immediately point out that the changes for the selective university students were limited by a “ceiling effect” and that one should conclude neither that the selective university faculty used less effective techniques nor that the students there were less able to learn.
The results and interpretations from these two hypothetical studies are interesting, for in fact one is the inverse of the other. Scores in study 2 are merely the scores in study 1 subtracted from 100. The results from both studies can be seen as representing equal changes on an underlying latent score, but using tests that differ in their difficulty. Study 1 used a difficult test in which improvements of the students at the less selective institutions were masked; study 2 used an easy test in which improvements of students at the more selective institutions were masked (Figure 2c). That differences in outcome are explained by ability in study 1 but by scaling effects (in this case, a ceiling effect) in study 2 exemplifies the need to examine one’s inferences carefully and to avoid a confirmation bias of accepting effects that confirm one’s beliefs and searching for methodological artifacts when facing results that are disconfirming.
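The way test difficulty can disguise equal latent change may be sketched numerically. The example below is not the data behind Figure 2: it assumes a logistic relationship between latent and observed scores, and the latent pretest levels, the size of the gain and the two difficulty values are all invented for illustration. Every group gains the same amount on the latent scale; only the difficulty of the test differs.

```python
import math

def observed_score(latent, difficulty):
    """Map a latent score to a 0-100 test score with a logistic curve (assumed)."""
    return 100 / (1 + math.exp(-(latent - difficulty)))

# Invented latent pretest levels; every group gains the same latent amount.
pretest_latent = {"junior college": -1.0, "less selective": 0.0, "very selective": 1.0}
LATENT_GAIN = 1.0

def observed_gains(difficulty):
    """Observed posttest minus pretest score for each group on one test."""
    return {
        group: observed_score(latent + LATENT_GAIN, difficulty)
        - observed_score(latent, difficulty)
        for group, latent in pretest_latent.items()
    }

gains_hard = observed_gains(difficulty=2.0)   # a difficult test, as in study 1
gains_easy = observed_gains(difficulty=-2.0)  # an easy test, as in study 2

for group in pretest_latent:
    print(f"{group:15s} hard: {gains_hard[group]:5.1f}  easy: {gains_easy[group]:5.1f}")
```

On the difficult test the observed gains grow with selectivity, and on the easy test they shrink with selectivity, even though the latent gain is identical for all three groups.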
We will revisit this problem of scaling effects upon inferring differential effects of personality and situational manipulations when we consider the appropriate interpretation of interactions of personality and situations.
A second problem in inferring differences in latent scores based upon changes in observed scores is the strength of the relationship between latent and observed. This is the problem of reliability of measurement. Although addressed more completely in chapter XX [note to editor- this is an assumption], the basic notion of reliability is that any particular observed score reflects some unknown fraction of the latent score as well as a (typically much larger) fraction of random error. By aggregating observations across similar items or situations, the proportion of the observed score due to the latent score will increase asymptotically towards 1 as a function of the number of items being used and the similarity of the items. Assuming that items are made up of a single latent score and random error, it is easy to show that the proportion of latent score variance in a test with k items is $kr/(1+(k-1)r)$, where r is the average correlation between any two items and is equal to the ratio of latent score variance in an item to total item variance. More generally, the reliability of a measure of individual differences is a function of what we are trying to generalize across (e.g., items, people, raters, situations, etc.).
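The formula above, commonly known as the Spearman-Brown formula, is easy to explore numerically. The sketch below uses a hypothetical function name and an arbitrarily chosen average inter-item correlation of .2 to show how the proportion of latent score variance rises as items are added.

```python
def spearman_brown(k, r):
    """Proportion of latent score variance in a k-item composite.

    k: number of items; r: average correlation between any two items.
    """
    return k * r / (1 + (k - 1) * r)

# Reliability rises toward 1 as items are added (r = 0.2 is an arbitrary choice).
for k in (1, 5, 10, 20, 40):
    print(k, round(spearman_brown(k, 0.2), 3))
```

With a single item the result is simply r itself; adding items increases it quickly at first and then more slowly, which is the asymptotic approach to 1 described in the text.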
Confirmatory versus disconfirmatory designs
Although it is very tempting (and unfortunately extremely common) to test a hypothesis by looking for evidence that is consistent with it (e.g., “testing” the hypothesis “all swans are white” by looking for white swans), in fact disconfirming evidence is the only test of a hypothesis: even after seeing 1,000 white swans, seeing 1 black swan disconfirms the hypothesis. The use of strong inference (Platt, 1964) to ask what hypothesis a finding can disconfirm should be the goal of all studies, for science is the process of refining theories by excluding alternative hypotheses.
“I will mention one severe but useful private test - a touchstone of strong inference - that removes the necessity for third-person criticism, because it is a test that anyone can learn to carry with him for use as needed. It is our old friend the Baconian ‘exclusion,’ but I call it ‘The Question.’ Obviously it should be applied as much to one’s own thinking as to others’. It consists of asking in your own mind, on hearing any scientific explanation or theory put forward, ‘But sir, what experiment could disprove your hypothesis?’; or, on hearing a scientific experiment described, ‘But sir, what hypothesis does your experiment disprove?’” (Platt, Science, 1964)
Consider the following sequence of numbers that have been generated according to a certain rule: 2, 4, 8, X, Y, … What is that rule? How do you know that is the rule? One can test the hypothesized rule by generating an X and then a Y and seeing if they fit the rule. Many people, when seeing this sequence, will propose X = 16 and then Y = 32. In both cases they would be told that these numbers fit the rule generating the sequence. Most people would then say that the rule is successive powers of 2. A few people will propose that X = 10 and then Y = 12 and conclude that the rule is to produce increasing even numbers. Few will try X = 9 and Y = 10.92, with the hypothesis that the rule is merely an increasing series of numbers. Even fewer will propose that X = 7 or that Y = sqrt(43), terms that do not fit the rule and thus allow us to reject the hypothesis that any number will work. This simple example shows the need to consider many alternative hypotheses and to narrow the range of possible hypotheses by disconfirmation. For, as that great (but imaginary) scientist Sherlock Holmes reasoned, “when you have eliminated the impossible, whatever remains, however improbable, must be the truth” (Doyle, 18xx).
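The logic of this example can be sketched as a small elimination procedure. The code below is illustrative: it assumes that the hidden rule is “any increasing series of numbers,” represents the four candidate hypotheses discussed above as hypothetical predicate functions, and discards a hypothesis whenever its prediction about a proposed sequence disagrees with the feedback.

```python
import math

def increasing(seq):
    return all(a < b for a, b in zip(seq, seq[1:]))

# Candidate hypotheses: each predicts whether a sequence fits the rule.
hypotheses = {
    "successive powers of 2": lambda s: all(b == 2 * a for a, b in zip(s, s[1:])),
    "increasing even numbers": lambda s: increasing(s) and all(x % 2 == 0 for x in s),
    "increasing numbers": increasing,
    "any numbers": lambda s: True,
}

hidden_rule = increasing  # assumed for this sketch

def survivors(probes):
    """Keep hypotheses whose predictions match the feedback on every probe."""
    alive = dict(hypotheses)
    for probe in probes:
        feedback = hidden_rule(probe)
        alive = {name: h for name, h in alive.items() if h(probe) == feedback}
    return list(alive)

# A probe that every hypothesis expects to fit eliminates nothing.
confirming_only = survivors([(2, 4, 8, 16, 32)])

# Probes chosen so that some hypothesis must be wrong narrow the field.
with_disconfirming = survivors([
    (2, 4, 8, 16, 32),
    (2, 4, 8, 10, 12),
    (2, 4, 8, 9, 10.92),
    (2, 4, 8, 7, math.sqrt(43)),
])
print(confirming_only)
print(with_disconfirming)
```

Proposing 16 and 32 leaves all four hypotheses standing, whereas the sequences that some hypothesis predicts should fail are the ones that remove alternatives.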
Experimental manipulations as tests of theories of causality
In the mid 1500s, a revolutionary technique was added to the armamentarium of scientific reasoning. Rather than relying only on arguments based upon assumptions and logical reasoning, scientists introduced the process of empirical observation and, more importantly, experimental manipulation (see Shadish, Cook and Campbell, 2002, for a wonderful discussion of the development of experimentation and reasoning). By observing the results of experimental manipulations it became possible to tease apart alternative hypotheses and to address issues of causality. Although there is little statistically to differentiate experimental and correlational data, the importance of experimental techniques lies in the ability to make statements about causality and to exclude possible explanations by experimental control. When applied to personality theory, experiments allow us to test the range of generalization of the relationships between individual differences and outcome variables of interest.