Supplemental Materials
“The Big-Fish-Little-Pond Effect: Generalizability of Social Comparison Processes Over Two Age Cohorts From Western, Asian, and Middle Eastern Islamic Countries”
by H. W. Marsh et al., 2014, Journal of Educational Psychology
Appendix A
Big Fish Little Pond: Theoretical Background
Focusing on academic self-concept (ASC) in educational contexts, Marsh (1984; see also Marsh & Parker, 1984; Marsh, Seaton, et al., 2008) proposed the big-fish-little-pond effect (BFLPE) to encapsulate frame-of-reference effects. The BFLPE integrates theoretical models and empirical research from diverse disciplines: relative deprivation theory (Davis, 1966; Stouffer, Suchman, DeVinney, Star, & Williams, 1949); sociology (Alwin & Otto, 1977; Hyman, 1942); psychophysical judgment (e.g., Helson, 1964; Marsh, 1974; Parducci, 1995; Wedell & Parducci, 2000); social judgment (e.g., Morse & Gergen, 1970; Upshaw, 1969); and social comparison theory (Festinger, 1954). In this BFLPE model, Marsh hypothesized that students compare their own abilities with those of their classmates and use this social comparison impression as one basis for forming their own self-concept. A negative BFLPE occurs when equally able students have lower ASCs if they compare themselves with more able classmates, and higher ASCs if they compare themselves with less able classmates.
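As a sketch of how this hypothesis is commonly formalized (the notation here is ours, not Marsh's), ASC can be regressed on both individual and school-average (or class-average) achievement in a two-level contextual-effects model. Here ACH_ij is the achievement of student i in school j, the overlined term is the school-average achievement of school j, and u_j and e_ij are school- and student-level residuals. The second line compares the predicted ASCs of two equally able students, a in school j and b in school k:

```latex
\begin{aligned}
\mathrm{ASC}_{ij} &= \beta_0 + \beta_1\,\mathrm{ACH}_{ij} + \beta_2\,\overline{\mathrm{ACH}}_{j} + u_j + e_{ij} \\
\widehat{\mathrm{ASC}}_{aj} - \widehat{\mathrm{ASC}}_{bk} &= \beta_2\,\bigl(\overline{\mathrm{ACH}}_{j} - \overline{\mathrm{ACH}}_{k}\bigr)
  \quad \text{when } \mathrm{ACH}_{aj} = \mathrm{ACH}_{bk}
\end{aligned}
```

Under this formulation, a negative BFLPE corresponds to a positive individual effect (beta_1 > 0) together with a negative contextual effect (beta_2 < 0): if school j is the more able pond, the equally able student in school j has the lower predicted ASC.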
Cross-Cultural Support for the BFLPE
One goal of cross-cultural research is to test the replicability of existing theories in other cultures, to investigate new angles in diverse cultural contexts, and to propose universal, pan-human theories (Segall, Lonner, & Berry, 1998, p. 1102). In their critique of self-concept research from this cross-cultural perspective, Marsh and Yeung (1999) noted the need for more carefully constructed cross-national comparisons to evaluate more fully the generalizability of support for the BFLPE. Stronger cross-cultural studies need to compare results from at least two (and preferably many) countries, based on comparable samples, the same academic self-concept instrument, and the same measures of achievement. Because these criteria are difficult to meet, apparent cross-cultural differences are typically confounded with potential differences in the composition of the samples being compared and, perhaps, in the appropriateness of materials.
However, there is now very strong support for the cross-cultural generalizability of the BFLPE among high school students, based on successive waves of the Organisation for Economic Co-operation and Development (OECD) Program for International Student Assessment (PISA). Marsh and Hau (2003) used PISA 2000 data from 103,558 15-year-old students in 26 predominantly industrialized Western countries. Using multilevel modeling, they found support for the BFLPE (positive effects of individual student achievement on ASC, but negative effects of school-average achievement on ASC) for the total sample and in 24 of the 26 countries considered separately. Although there were significant differences between countries, the country-level variation in the negative effect of school-average achievement was small, thus supporting the cross-cultural generalizability of the BFLPE.
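The pattern of a positive individual effect and a negative school-average effect can be illustrated with a small simulation. This is a hedged sketch, not the multilevel analysis of Marsh and Hau (2003): the data are synthetic, the generating coefficients (0.5 for individual achievement, -0.3 for school-average achievement) are arbitrary, and the model is fitted by ordinary least squares with NumPy rather than by multilevel modeling, so only the point estimates (not their standard errors) are meaningful. The function names are hypothetical.

```python
import numpy as np

def simulate_bflpe(n_schools=200, n_per_school=30, b_ind=0.5, b_school=-0.3, seed=1):
    """Synthetic data from an assumed contextual-effects model (hypothetical values)."""
    rng = np.random.default_rng(seed)
    school = np.repeat(np.arange(n_schools), n_per_school)   # school ID for each student
    pond = rng.normal(0.0, 0.6, n_schools)                    # schools differ in mean ability
    ach = pond[school] + rng.normal(0.0, 0.8, school.size)    # individual achievement
    # School-average achievement, aggregated from the sampled students.
    school_avg = np.bincount(school, weights=ach) / np.bincount(school)
    u = rng.normal(0.0, 0.2, n_schools)                       # school-level residual
    asc = (b_ind * ach + b_school * school_avg[school]
           + u[school] + rng.normal(0.0, 0.7, school.size))   # plus student-level residual
    return ach, school_avg[school], asc

def fit_contextual_ols(ach, ach_avg, asc):
    """OLS coefficients: intercept, individual achievement, school-average achievement."""
    X = np.column_stack([np.ones_like(ach), ach, ach_avg])
    coef, *_ = np.linalg.lstsq(X, asc, rcond=None)
    return coef

ach, ach_avg, asc = simulate_bflpe()
_, b_individual, b_context = fit_contextual_ols(ach, ach_avg, asc)
print(f"individual effect: {b_individual:.2f}, school-average effect: {b_context:.2f}")
```

Because school-average achievement is entered alongside individual achievement, its coefficient estimates the contextual effect: the effect of the pond, holding the student's own achievement constant.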
Seaton, Marsh, and Craven (2009, 2010) used PISA 2003 (265,180 students, 10,221 schools, 41 countries), which included more collectivist and developing economies than PISA 2000. They also found strong support for the generalizability of the BFLPE, which was significant in 38 of the 41 countries. The BFLPE was not moderated by the cultural orientation or economic development level of the country. This led the authors to conclude that the BFLPE was a pan-human theory, as it “is not only a symptom of developed countries and individualist societies, but it is also evident in developing nations and collectivist countries of the world” (p. 414). Seaton et al. (2010) then evaluated 16 potential moderators of the BFLPE for PISA 2003, finding that BFLPEs were somewhat larger for students who were highly anxious, used memorization strategies, or preferred to work cooperatively. However, the BFLPE was not moderated by ability, SES, intrinsic and extrinsic motivation, self-efficacy, elaboration and control learning strategies, competitive orientation, sense of belonging to school, or relationship with teachers; this again attests to the broad generalizability of the BFLPE.
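Moderation of this kind is typically tested by adding a moderator M_ij (e.g., anxiety) and its interaction with school-average achievement to a contextual-effects model, where ACH_ij is the achievement of student i in school j and the overlined term is school-average achievement. The following is a hedged, generic sketch of that formulation, not necessarily the exact specification used by Seaton et al. (2010); the second line gives the BFLPE (the slope for school-average achievement, holding individual achievement constant) for a student with moderator value M_ij:

```latex
\begin{aligned}
\mathrm{ASC}_{ij} &= \beta_0 + \beta_1\,\mathrm{ACH}_{ij} + \beta_2\,\overline{\mathrm{ACH}}_{j}
  + \beta_3\,M_{ij} + \beta_4\,M_{ij}\,\overline{\mathrm{ACH}}_{j} + u_j + e_{ij} \\
\frac{\partial\,\mathrm{ASC}_{ij}}{\partial\,\overline{\mathrm{ACH}}_{j}} &= \beta_2 + \beta_4\,M_{ij}
\end{aligned}
```

In these terms, a larger (more negative) BFLPE for highly anxious students would appear as beta_4 < 0 when M is anxiety, whereas a nonsignificant beta_4 indicates that the variable does not moderate the BFLPE.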
Nagengast and Marsh (2012) used the PISA 2006 database in the largest cross-cultural study of the BFLPE undertaken to date, significantly extending the previous PISA studies. Using newly developed doubly latent contextual-effects models (Lüdtke et al., 2011; Marsh et al., 2009), they found that the BFLPE on science self-concept was significant in 50 of the 56 countries included in PISA 2006, a more culturally and economically diverse set of countries than previously sampled. They also extended the BFLPE to career aspirations in science, demonstrating that career aspirations were positively predicted by individual student achievement but negatively predicted by school-average achievement. Furthermore, both the positive effects of individual achievement and the negative effects of school-average achievement on aspirations were significantly mediated by ASC.
Summarizing the three BFLPE-PISA studies, Nagengast and Marsh (2012) reported that the effect of school-average achievement was negative in all but one of the 123 samples considered across the three studies, and significantly so in 114 samples. However, particularly in the earliest of these PISA studies, the countries included were predominantly Western, developed OECD countries, which restricted the generalizability of the findings.
Developmental Support for the Generalizability of the BFLPE
For many developmental, educational, and psychological researchers, self-concepts are a “cornerstone of both social and emotional development” (Kagan, Moore, & Bredekamp, 1995, p. 18; also see Davis-Kean & Sandler, 2001; Marsh, Ellis, & Craven, 2002); self-concepts develop early in childhood and, once established, are enduring (e.g., Eder & Mangelsdorf, 1997). The development of self-concept is therefore emphasized in many early childhood programs (e.g., Fantuzzo et al., 1996). In a meta-analysis of the reliability of young children’s self-concepts, Davis-Kean and Sandler (2001) argued that young children have both the language and the cognitive ability to discuss the self by the time they are in preschool (see also Bates, 1990; Bornholt, 1997; Damon & Hart, 1988; Lewis & Brooks-Gunn, 1979; Penn, Burnett, & Patton, 2001), but that early childhood programs need a reliable basis for evaluating interventions to enhance children’s self-concepts (Fantuzzo et al., 1996; Marsh, Debus, & Bornholt, 2005). However, there is surprisingly little systematic self-concept research with young children, particularly in relation to individual student, class-average, and school-average achievement.
Hattie (1992; Hattie & Marsh, 1996) reviewed theoretical and empirical support for stages of growth in the development of self-concept, arguing against the notion of fixed stages that all persons must pass through. Instead, he posited seven parallel developments that are relevant to self-concept formation: (1) children distinguish self from others, (2) children distinguish self from the environment, (3) changes in major reference groups lead to changes in expectations, (4) attributions are made to salient personal and social or external sources, (5) cognitive processing capacities develop, (6) children develop particular cultural values, and (7) children develop strategies for confirmation and disconfirmation of self-referent information. Thus, with age and development, young children increasingly integrate information from their immediate environment into their self-concept formation. This is particularly relevant to the present investigation, which emphasizes the integration of external frames of reference and social comparison into self-concept formation.
During the 1990s, developmental psychologists addressed the progressive differentiation of self-concepts (e.g., Dweck, 1999; Eccles et al., 1993; Eder & Mangelsdorf, 1997; Harter, 1998; Marsh, Craven, & Debus, 1998; Ruble & Dweck, 1995; Wigfield et al., 1997). Harter (1983, 1999, in press) proposed a developmental model in which self-concept becomes increasingly abstract and differentiated with age, moving from a global sense of being smart to more differentiated self-representations in specific school subjects. She suggests that during early childhood, the young child can construct concrete cognitive representations of observable features of the self but has difficulty differentiating actual from desired attributes and incorporating social comparison information for purposes of self-evaluation; this results in unrealistically positive self-evaluations. At the next stage of development, Harter (1998) indicates that young children form representational sets of related attributes, which Fischer (1980) labeled “representational mappings.” However, such self-descriptions are highly reflective of reductive, good-or-bad, all-or-none conceptions, resulting in unidimensional thinking. Harter suggested that it is not until middle childhood that children become capable of integrating information about specific features into higher-order generalizations reflecting trait labels, which Fischer referred to as “representational systems.” These yield more balanced representations of underlying competencies that are more closely related to external criteria. Consistent with Harter’s framework, there is growing evidence that children’s self-concepts become more accurate (in relation to external criteria) and more differentiated with age and increasing cognitive functioning (see also Bouffard et al., 1998; Eccles et al., 1983, 1993; Russell, Bornholt, & Ouvrier, 2002; Wigfield et al., 1997; Wigfield & Eccles, 1992).
On the basis of earlier research (e.g., Nicholls, 1979; Stipek & Mac Iver, 1989), Eccles et al. proposed that the decline in young children’s self-concepts reflects an initial optimistic bias that is tempered by experience, including feedback and social comparison, so that self-perceptions become more accurate with age. This trend is reinforced by changes in school environments, as educational achievements become more salient and schooling increasingly encourages competition, social comparison, and external frames of reference.
Indeed, many authors (Chapman & Tunmer, 1995; Eccles, Wigfield, Harold, & Blumenfeld, 1993; Harter, 1999; Marsh, 1989; Marsh & Craven, 1997; Skaalvik & Hagtvet, 1990; Wigfield & Eccles, 1992; Wigfield et al., 1997) have offered a developmental perspective on the relation between academic self-concept and academic achievement. For example, Marsh (1989, 1990) proposed that the self-concepts of very young children are very positive and are not highly correlated with external indicators (e.g., skills, accomplishments, achievement, self-concepts inferred by significant others), but that with increasing life experience, children learn their relative strengths and weaknesses, so that specific self-concept domains become more differentiated and more highly correlated with external indicators. It should be noted, however, that this positive halo effect is normal in young children. As Harter (1999, p. 38) has pointed out, “Self-descriptions typically represent an overestimation of personal abilities. It is important to appreciate, however, that these apparent distortions are normative in that they reflect cognitive limitations rather than conscious efforts to deceive the listener.” In line with this perspective, Marsh et al. (1998) showed that, among children 5–8 years of age, the reliability, stability, and factor structure of self-concept scales improved with age. In addition, consistent with the proposal that children’s self-perceptions become more realistic with age, self-ratings of older children were more highly correlated with their teachers’ ratings of the children’s self-concepts.
In a summary of this developmental research on relations between self-concept and achievement, Guay, Marsh, and Boivin (2003) suggested that this developmental trend could be explained by three factors: (a) older children have higher cognitive abilities, which improve the coordination of their self-representations and thus lead to better agreement between self-concept ratings and external indicators; (b) these higher cognitive skills lead older children to use social comparison processes, which foster a more balanced view of the self; and (c) older children have internalized the evaluative standards of others, which leads to less egocentric self-evaluations. Through increased attunement to environmental feedback, these three developmental processes make older children’s self-evaluations more accurate, thus making it possible for ASC to predict changes in academic achievement. Using a multicohort, multiwave design (children in Grades 2, 3, and 4 tested in each of three successive years), Guay et al. (2003) found that as children grew older, their ASC responses became more reliable, more stable, and more highly correlated with achievement. However, due in part to the modest sample sizes (Ns less than 150 for each age cohort), the age differences in stability and in relations with achievement in multigroup structural equation models were not statistically significant. In their meta-analysis of studies evaluating relations between math and verbal self-concepts and achievement, Möller et al. (2009) reported that relations between self-concept and achievement were higher when achievement was based on school grades rather than on achievement test scores. Although they found that math and verbal self-concepts became less highly correlated (i.e., more differentiated) with age, Möller et al. (2009) reported that relations between achievement and the matching ASC domain (.61 for math, .49 for verbal) were reasonably consistent across ages. However, because few available studies involved young children (only 3 of 69 samples reported results for children in Grade 4 or younger), the evidence for generalizing this finding to young children was weak.
An important limitation of BFLPE research is thus the lack of a developmental perspective and the paucity of research with younger children. Indeed, very few of the studies reviewed by Marsh, Seaton, et al. (2008) were based on responses from primary school students. In the first BFLPE study, Marsh and Parker (1984) coined the phrase “BFLPE” on the basis of a small-scale study of sixth-grade primary students. Marsh, Chessor, et al. (1995) used a matching design to evaluate the effects of attending academically selective schools on the ASCs of primary school students. Compared with pretest measures (collected prior to selection for selective schools) and with a control group matched on achievement prior to selection, attending selective schools had negative effects on ASC. In related German research, Jerusalem (1984) examined the self-concepts of West German students who moved from nonselective, heterogeneous primary schools to secondary schools that were streamed on the basis of academic achievement. Based on pretest scores collected prior to the transition and posttest scores at the end of the first year of secondary school, the effect of attending selective schools on ASC was negative. Tymms (2001) evaluated the BFLPE as part of a large-scale study of school effectiveness (21,000 second-grade students, 1,078 classes, 628 schools). In line with BFLPE predictions, he found that class-average academic achievement had negative effects on academic attitudes (which included some ASC-like items). Although these studies are heuristic and collectively suggest that the BFLPE can be identified in primary school students, it would be dubious to use them to generalize about the size of BFLPEs in primary schools, or to compare these with the large body of research based mostly on students attending secondary schools.
Appendix B
TIMSS Constructs Used in This Study
Math Self-Concept (MSC)
I usually do well in math (MSC1)
Math is harder for me than for many of my classmates (MSC2)
I am just not good at math (MSC3)
I learn things quickly in math (MSC4)
Individual Student Math Achievement
Composite based on Algebra; Data & Chance; Number; Geometry
Class-Average Math Achievement
Individual student achievement aggregated to the class level
Cluster (Class ID; School ID; complex design cluster by class)
Note. Responses to the math self-concept, positive affect, and coursework items were all made on the same 4-point Likert (agree–disagree) response scale.
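The two derived variables above (a math self-concept score and class-average achievement) can be sketched in Python as follows. This is a hypothetical illustration, not the TIMSS scoring procedure: the toy records are invented, responses are assumed to be coded 1 (disagree a lot) to 4 (agree a lot), which should be checked against the TIMSS codebook, and the negatively worded items MSC2 and MSC3 are reverse-scored so that higher scores indicate higher MSC. Class-average achievement is computed as a simple manifest mean of individual achievement within each class, a simplification relative to latent-variable approaches that correct for unreliability in aggregated scores.

```python
from collections import defaultdict
from statistics import mean

SCALE_MAX = 4                        # assumed coding: 1 = disagree a lot ... 4 = agree a lot
NEGATIVE_ITEMS = {"MSC2", "MSC3"}    # negatively worded items listed above

def msc_score(responses):
    """Mean of the MSC items after reverse-scoring the negatively worded ones."""
    scored = [(SCALE_MAX + 1 - value) if item in NEGATIVE_ITEMS else value
              for item, value in responses.items()]
    return mean(scored)

def class_averages(students):
    """Aggregate individual achievement to the class level (manifest class means)."""
    by_class = defaultdict(list)
    for s in students:
        by_class[s["class_id"]].append(s["achievement"])
    return {class_id: mean(values) for class_id, values in by_class.items()}

# Invented toy records for illustration only (not TIMSS data).
students = [
    {"class_id": "A", "achievement": 520.0, "msc": {"MSC1": 4, "MSC2": 1, "MSC3": 1, "MSC4": 3}},
    {"class_id": "A", "achievement": 480.0, "msc": {"MSC1": 3, "MSC2": 2, "MSC3": 2, "MSC4": 3}},
    {"class_id": "B", "achievement": 600.0, "msc": {"MSC1": 2, "MSC2": 3, "MSC3": 2, "MSC4": 2}},
]
avg = class_averages(students)
for s in students:
    s["msc_score"] = msc_score(s["msc"])
    s["class_avg_ach"] = avg[s["class_id"]]
```

Each student then carries both an individual score and the average of their class, which is the input structure a contextual-effects model needs.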
Appendix C
Reliability Estimates
In preliminary analyses, we estimated the average reliability of the MSC score for each of the 26 groups (2 age cohorts × 13 countries). Due in part to the brevity of the 4-item MSC scale, at least some of the coefficient alpha (α) estimates of reliability (Table 1) are modest for use in manifest models that do not correct for unreliability: reliabilities sometimes reached a desirable standard of .80, but in other cases fell below an acceptable value of .70 or even .60. Reliability estimates were systematically higher for the older age cohort (M α = .781) than for the younger cohort (M α = .681), and substantially lower in the Middle Eastern Islamic countries than in the Western or Asian countries. Although these country-level differences were evident in both age cohorts, the reliability estimates were particularly low for the younger cohort in the Middle Eastern Islamic countries (M α = .512) compared with Western (M α = .725) and Asian (M α = .743) countries. Even though reliability estimates for the older Middle Eastern students (M α = .687) were still lower than for Western (M α = .810) and Asian (M α = .811) students, these differences were smaller than for the younger cohort. Overall, reliability estimates were broadly similar for Western and Asian countries, but lower for Middle Eastern Islamic countries.
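Coefficient alpha, and the role of scale length noted above, can be sketched with the standard formulas alpha = k/(k - 1) × (1 - sum of item variances / total-score variance) and the Spearman–Brown prophecy formula. This is an illustrative sketch in plain Python (standard library only); the function names and toy responses are hypothetical, and the Spearman–Brown projections assume that any added items would be parallel to the existing ones, an assumption real items rarely satisfy exactly.

```python
from statistics import variance

def cronbach_alpha(items):
    """Coefficient alpha; `items` holds k equal-length lists of scores (one list per item)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(person) for person in zip(*items)]   # each respondent's total score
    sum_item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))

def spearman_brown(reliability, factor):
    """Projected reliability after lengthening a scale by `factor` with parallel items."""
    return factor * reliability / (1 + (factor - 1) * reliability)

def length_factor_needed(reliability, target):
    """Factor by which a scale must be lengthened to reach `target` reliability."""
    return target * (1 - reliability) / (reliability * (1 - target))

# Hypothetical responses: 4 items x 6 respondents, already reverse-scored where needed.
toy_items = [
    [4, 3, 2, 4, 1, 3],
    [4, 2, 2, 3, 1, 3],
    [3, 3, 1, 4, 2, 2],
    [4, 3, 2, 3, 1, 4],
]
print(round(cronbach_alpha(toy_items), 3))
print(round(spearman_brown(0.512, 2), 3))            # doubling a 4-item scale
print(round(length_factor_needed(0.512, 0.80), 2))   # lengthening needed to reach .80
```

For example, under these assumptions, doubling the scale in a group with alpha of about .512 would project to roughly .68, and reaching .80 would require a scale about 3.8 times as long (roughly 15 items rather than 4).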
Particularly when reliability estimates are as low as in some younger cohorts from Middle Eastern Islamic countries, it is of dubious merit to make country-to-country comparisons based on manifest scale or composite scores, which are the basis of most TIMSS studies and are given implicit support in the test manual. In this sense, these preliminary results support the need to consider latent-variable models that control for unreliability, and are also consistent with the logic of country-specific control for measurement error. Similarly, the systematic differences in reliability between the two age cohorts are problematic for studies that do not control for these differences in measurement error. In summary, appropriately constructed latent-variable models largely overcome the limitations due to poor reliability, which could otherwise undermine the comparability of TIMSS-based comparisons across countries or age cohorts; this is a critical limitation of TIMSS studies based on manifest models of these self-belief constructs. We also note that reliability estimates based on the trichotomized scale scores provided in the TIMSS database, and used in many studies, would be substantially lower, and the resulting estimates of relations among constructs more biased, seriously undermining developmental comparisons of the two age cohorts.
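The consequences of unreliability and of coarse scoring described above can be illustrated with Spearman's classical attenuation formula, r_observed = r_true × sqrt(r_xx × r_yy), and a small simulation. This is a hedged sketch, not the latent-variable models used in the study: the true correlation of .50 and the achievement reliability of .90 are hypothetical values chosen for illustration, the alpha values are the cohort means reported above, and the trichotomization cuts a simulated normal variable at its tertiles rather than at the cut points used for the TIMSS indices. Function names are hypothetical.

```python
from math import sqrt

import numpy as np

def attenuated_r(true_r, rel_x, rel_y):
    """Observed correlation implied by a true-score correlation and two reliabilities."""
    return true_r * sqrt(rel_x * rel_y)

def disattenuated_r(observed_r, rel_x, rel_y):
    """Spearman's correction for attenuation (the inverse of attenuated_r)."""
    return observed_r / sqrt(rel_x * rel_y)

TRUE_R = 0.50    # hypothetical true MSC-achievement correlation
REL_ACH = 0.90   # hypothetical reliability of the achievement measure
for group, alpha in [("younger, Middle Eastern", 0.512), ("older, Western", 0.810)]:
    print(group, round(attenuated_r(TRUE_R, alpha, REL_ACH), 3))

def trichotomized_r(r=0.5, n=20_000, seed=7):
    """Correlation before and after cutting one variable into three ordered categories."""
    rng = np.random.default_rng(seed)
    x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=n).T
    cuts = np.quantile(x, [1 / 3, 2 / 3])   # tertile cut points (an arbitrary choice)
    x3 = np.digitize(x, cuts)               # codes 0, 1, 2 = low, medium, high
    return np.corrcoef(x, y)[0, 1], np.corrcoef(x3, y)[0, 1]

print(trichotomized_r())
```

With these inputs, the same true correlation would appear as about .34 for the younger Middle Eastern cohort but about .43 for the older Western cohort, so manifest comparisons across cohorts or countries can mistake differences in measurement error for substantive differences; coarsening the scores into three categories lowers the observed correlation further.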
Table S1
Variance Components of the TIMSS Math and Science Motivation Constructs Used in This Study