**Supplemental Materials**

**Cross-Cultural Generalizability of Year in School Effects: Negative Effects of Acceleration and Positive Effects of Retention on Academic Self-Concept**

**by H. W. Marsh, 2015, Journal of Educational Psychology**

1. Detailed Description of Constructs Considered in the Present Investigation

2. Theoretical Model Underpinning The Big-Fish-Little-Pond Effect (BFLPE)

3. Juxtaposition of the contrast and assimilation effects

4. Minimum methodological requirements for Multilevel BFLPE studies

5. Socioeconomic status (SES): Moderating Effects

Table S1: Moderating Effects of Global and Specific Components of SES: Models of BFLPE and RYiS Effects With Inclusion of Covariates and Their Interactions With BFLPE and RYiS

6. Generalizability of Negative Effects of School-Average Achievement and RYiSEs to Other Constructs

Table S2: Generalizability of RYiS and BFLPEs to Other Math-Specific Psychosocial Variables (Outcomes) Other Than Math Self-Concept

7. Generalizability of the BFLPE to Self-Efficacy

Table S3: Generalizability of RYiS and BFLPEs to Math Self-efficacy

8. Mediation: Tests of Mediation to Test Processes Underlying the Negative Effects of School-Average Achievement (the BFLPE) and Relative Year in School (RYiSEs)

9. Additional References not Cited in the printed article

**1. Detailed Description of Constructs Considered in the Present Investigation**

What follows is a more detailed summary of the variables considered in the present investigation (Note: Materials in this section are largely based on the PISA 2003 Data Analysis Manual, where this information is presented in greater detail). Scale indices were constructed through the IRT scaling of either dichotomous (Yes/No) or Likert-type items, using a weighted maximum likelihood estimate and a one-parameter item response model. Indices were then standardized so that the mean of the index value for the OECD student population was zero and the standard deviation was one (countries being given equal weight in the standardization process). In some cases, simple indices were constructed through the arithmetical transformation or recoding of one or more items to calculate meaningful variables. Variables indicated by * were constructed for the purposes of the present investigation.
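
The standardization step described above can be illustrated with a minimal sketch. It covers only the final rescaling, not the IRT estimation itself, and it assumes that "equal weight" means each country's students share a total weight of one; student sampling weights are ignored. The function name and the example data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def standardize_equal_country_weight(scores, countries):
    """Rescale scores to mean 0 and SD 1, giving every country equal total weight."""
    # Count students per country so that each country's weights sum to 1.
    counts = defaultdict(int)
    for c in countries:
        counts[c] += 1
    weights = [1.0 / counts[c] for c in countries]
    total = sum(weights)  # equals the number of countries

    # Weighted mean and (population) standard deviation.
    mean = sum(w * x for w, x in zip(weights, scores)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, scores)) / total
    sd = sqrt(var)
    return [(x - mean) / sd for x in scores]


# Hypothetical raw index values for two countries with unequal sample sizes.
raw = [0.0, 2.0, 4.0, 4.0, 4.0, 4.0]
country = ["A", "A", "B", "B", "B", "B"]
z = standardize_equal_country_weight(raw, country)
```

Under this weighting, the larger sample from country B does not pull the reference mean toward B's values more than country A does.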

*Age is calculated as the difference between year and month of the testing and the year and month of a student’s birth as reported by students.

*Starting Age = Age first started primary school (Year 1). For present purposes these were truncated at 4 and 9 years of age.

*Relative starting age = Difference between starting age and mean starting age of the country.
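
A minimal sketch of the starting-age variables follows. It assumes that the country mean is an unweighted mean computed on the truncated values; the text does not specify either point. The function name and the data are hypothetical.

```python
from collections import defaultdict


def relative_starting_age(start_ages, countries, low=4, high=9):
    """Truncate starting ages to [low, high] and centre them on the country mean."""
    truncated = [min(max(a, low), high) for a in start_ages]

    # Unweighted country means of the truncated ages (an assumption).
    sums = defaultdict(float)
    counts = defaultdict(int)
    for a, c in zip(truncated, countries):
        sums[c] += a
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}

    return [a - means[c] for a, c in zip(truncated, countries)]


# Hypothetical reported starting ages in two countries.
ages = [3, 6, 6, 7, 7, 10]
country = ["A", "A", "A", "B", "B", "B"]
rel = relative_starting_age(ages, country)
```

Negative values indicate students who started school earlier than is typical in their country, and positive values indicate later starters.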

*Repeating: This score is based on the number of times students indicated that they had repeated grades (years in school) at each of three different levels of education (items ST22Q01, ST22Q02, and ST22Q03 in the student questionnaire). For the present purposes these were scored as 0 (none), 1 (once) or 2 (twice or more) for each item, so that the sum of these items could vary between 0 and 6.
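
The scoring rule for the repeating variable can be sketched as follows. The response labels used as dictionary keys are hypothetical stand-ins for the questionnaire's response categories, and missing responses are not handled.

```python
# Hypothetical response labels mapped to the 0/1/2 scoring described above.
SCORE = {"never": 0, "once": 1, "twice or more": 2}


def repeating_score(st22q01, st22q02, st22q03):
    """Sum the three grade-repetition items; the result ranges from 0 to 6."""
    return sum(SCORE[response] for response in (st22q01, st22q02, st22q03))


# A student who repeated once at the first level and never afterwards.
example = repeating_score("once", "never", "never")
```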

*Highest occupational status of parents (HISEI) corresponds to the higher ISEI score of either parent, or to the only available parent’s ISEI score. Higher values on these indices indicate higher level of occupational status. The responses were coded in accordance with the four-digit International Standard Classification of Occupation (ISCO 1988; ILO, 1990) and then mapped to the international socio-economic index of occupational status (ISEI).

*Educational level of parents is derived from students’ responses to the items on their mother’s and father’s educational level, coded in accordance with the International Standard Classification of Education.

*Index of immigrant background (IMMIG) has three categories: (1) “native” students (those students born in the country of assessment or who had at least one parent born in the country); (2) “first generation” students (those born in the country of assessment but whose parent(s) were born in another country); and (3) “non-native” students (those students born outside the country of assessment and whose parents were also born in another country).

*Language background (LANG): The PISA 2003 index of foreign language spoken at home (LANG) was derived by asking students if the language spoken at home most of the time was the language of assessment, another official national language, another national dialect or language, or some other language. In order to derive this index, responses were grouped into two categories: (1) language spoken at home most of the time is different from the language of assessment, from other official national languages and from other national dialects or languages; and (0) the language spoken at home most of the time is the language of assessment, another official national language, or other national dialect or language.

*Relative Year in School (RYiS): In order to adjust for between-country variation, the index of relative year in school indicates whether students are at the average year in school for 15-year olds in their country (assigned a value of 0), below the average year in school (negative values) or above the average year in school (positive values).
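
A minimal sketch of the RYiS computation is given below. It assumes that the "average year in school" in each country is taken to be the modal year in school, so that students at the typical year receive exactly 0 and all values are whole years; the text does not state how the country reference year was computed. The function name and the data are hypothetical.

```python
from collections import Counter, defaultdict


def relative_year_in_school(grades, countries):
    """Return each student's year in school minus the country's modal year."""
    by_country = defaultdict(list)
    for g, c in zip(grades, countries):
        by_country[c].append(g)

    # Modal year in school per country (assumed reference point).
    modal = {c: Counter(gs).most_common(1)[0][0] for c, gs in by_country.items()}

    return [g - modal[c] for g, c in zip(grades, countries)]


# Hypothetical years in school for 15-year-olds in two countries.
grades = [9, 10, 10, 11, 8, 9, 9]
country = ["A", "A", "A", "A", "B", "B", "B"]
ryis = relative_year_in_school(grades, country)
```

In this example a student in Year 9 is one year below the reference year in country A but exactly at the reference year in country B, which is the between-country adjustment the index is intended to provide.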

Home educational resources (HEDRES) are based on five items asking students whether they had: a desk for study; a quiet place to study; their own calculator; books to help with school work; a dictionary. These binary variables were used to construct a scale with IRT scaling, such that positive values on this index indicate higher levels of home educational resources.

Home possessions (HOMEPOS) is derived from students’ responses to 14 items (a desk for study; a room of your own; a quiet place to study; a computer you can use for school work; educational software; a link to the Internet; your own calculator; classic literature; books of poetry; works of art (e.g. paintings); books to help with your school work; a dictionary; a dishwasher; more than 100 books). These binary variables were used to construct a scale with IRT scaling, such that positive values on this index indicate higher levels of home educational resources.

Economic, social and cultural status (ESCS) is derived from three variables: parents’ highest education level, highest occupational classification, index of home possessions.

Interest in and enjoyment of mathematics (INTMAT) is derived from students’ responses to four items: I enjoy reading about mathematics; I look forward to my mathematics lessons; I do mathematics because I enjoy it; I am interested in the things I learn in mathematics. A four-point scale with the response categories recoded as “strongly agree” (= 0); “agree” (= 1); “disagree” (= 2); and “strongly disagree” (= 3) is used. All items are inverted for IRT scaling, so that positive values on this index indicate higher levels of interest and enjoyment in mathematics.
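
The item inversion mentioned above can be illustrated with a short sketch. It assumes the 0 to 3 recoding given in the text; the function name is hypothetical.

```python
def invert_item(code, max_code=3):
    """Reverse a 0..max_code response so that higher values mean stronger agreement."""
    if not 0 <= code <= max_code:
        raise ValueError(f"response code {code} is outside 0..{max_code}")
    return max_code - code


# "strongly agree" (0) becomes 3 and "strongly disagree" (3) becomes 0.
inverted = [invert_item(code) for code in range(4)]
```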

Instrumental motivation in mathematics (INSTMOT) is based on four items: Making an effort in mathematics is worth it because it will help me in the work that I want to do later on; Learning mathematics is worthwhile for me because it will improve my career <prospects, chances>; Mathematics is an important subject for me because I need it for what I want to study later on; I will learn many things in mathematics that will help me get a job. Based on IRT scaling, positive values on this index indicate higher levels of instrumental motivation to learn mathematics.

Mathematics self-efficacy (MATHEFF) is based on eight items measuring the students’ confidence with mathematical tasks: Using a <train timetable> to work out how long it would take to get from Seville to Zed town; Calculating how much cheaper a TV would be after a 30 percent discount; Calculating how many square metres of tiles you need to cover a floor; Understanding graphs presented in newspapers; Finding the actual distance between two places on a map with a 1:10,000 scale; Solving an equation like 2(x + 3) = (x + 3)(x - 3); Calculating the petrol consumption rate of a car. Based on IRT scaling, positive values indicate higher levels of self-efficacy in mathematics.

Mathematics anxiety (ANXMAT) is based on five items: I often worry that it will be difficult for me in mathematics classes; I get very tense when I have to do mathematics homework; I get very nervous doing mathematics problems; I feel helpless when doing a mathematics problem; I worry that I will get poor <marks> in mathematics. Based on IRT scaling, positive values indicate higher levels of mathematics anxiety.

Mathematics self-concept (SCMAT) is based on five items: I am just not good at mathematics; I get good <marks> in mathematics; I learn mathematics quickly; I have always believed that mathematics is one of my best subjects; In my mathematics class, I understand even the most difficult work. Based on IRT scaling, positive values indicate higher levels of self-concept in mathematics.

Memorisation/rehearsal learning strategies (MEMOR) is based on four items measuring preference for memorisation/rehearsal as a learning strategy for mathematics: I go over some problems in mathematics so often that I feel as if I could solve them in my sleep; When I study for mathematics, I try to learn the answers to problems off by heart; In order to remember the method for solving a mathematics problem, I go through examples again and again; To learn mathematics, I try to remember every step in a procedure. Based on IRT scaling, positive values indicate higher preferences for this learning strategy.

Elaboration learning strategies (ELAB) is based on five items: When I am solving mathematics problems, I often think of new ways to get the answer. I think how the mathematics I have learnt can be used in everyday life. I try to understand new concepts in mathematics by relating them to things I already know. When I am solving a mathematics problem, I often think about how the solution might be applied to other interesting questions. When learning mathematics, I try to relate the work to things I have learnt in other subjects. Based on IRT scaling, positive values indicate higher preferences for this learning strategy.

Control learning strategies (CSTRAT) is based on five items: When I study for a mathematics test, I try to work out what are the most important parts to learn; When I study mathematics, I make myself check to see if I remember the work I have already done; When I study mathematics, I try to figure out which concepts I still have not understood properly; When I cannot understand something in mathematics, I always search for more information to clarify the problem; When I study mathematics, I start by working out exactly what I need to learn. Based on IRT scaling, positive values indicate higher preferences for this learning strategy.

Preference for competitive learning situations (COMPLRN) is based on five items: I would like to be the best in my class in mathematics; I try very hard in mathematics because I want to do better in the exams than the others; I make a real effort in mathematics because I want to be one of the best; In mathematics I always try to do better than the other students in my class; I do my best work in mathematics when I try to do better than others. Based on IRT scaling, positive values indicate higher preferences for competitive learning situations.

Preference for co-operative learning situations (COOPLRN) is based on five items: In mathematics I enjoy working with other students in groups; When we work on a project in mathematics, I think that it is a good idea to combine the ideas of all the students in a group; I do my best work in mathematics when I work with other students; In mathematics, I enjoy helping others to work well in a group; In mathematics I learn most when I work with other students in my class. Based on IRT scaling, positive values indicate higher preferences for cooperative learning situations.

* These variables are considered in the analyses presented in the printed article. Other variables are considered in supplemental analyses presented in Supplemental Materials.

**2. Theoretical Model Underpinning The Big-Fish-Little-Pond Effect (BFLPE)**

Psychologists from the time of William James (1890/1963) have recognized that objective accomplishments are evaluated in relation to frames of reference, noting that “we have the paradox of a man shamed to death because he is only the second pugilist or the second oarsman in the world” (p. 310). The same objective accomplishment can lead to quite different self-concepts, depending on the frames of reference or standards of comparison against which individuals evaluate themselves. These self-concepts have important consequences for future choices, behaviour and performance. Historically, the theoretical underpinnings of the frame-of-reference research that contributes to the BFLPE derive from research on adaptation level (e.g., Helson 1964), psychophysical judgment (Marsh 1974; Parducci 1995; Parducci et al. 1969; Rogers 1941; Wedell and Parducci 2000), social psychology (Morse and Gergen 1970; Sherif 1935; Sherif and Sherif 1969; Upshaw 1969; Volkman 1951), sociology (Alwin and Otto 1977; Hyman 1942; Meyer 1970), social comparison theory (Festinger 1954; Diener and Fujita 1997; Suls 1977; Suls and Wheeler 2000), and the theory of relative deprivation (Davis 1966; Stouffer et al. 1949).

On the basis of this broad theoretical perspective (particularly that based on frame of reference effects; e.g., Marsh 1974), Marsh (1984; 1990; Marsh and Parker 1984) formulated a theoretical model of the BFLPE as applied to ASC in an educational psychology setting (see overview by Marsh, 2007; Marsh, Seaton, et al., 2008). Assume that three students (X, Y, and Z) vary in terms of their objective academic achievement relative to the entire population of students across all schools: X (slightly below-average achievement), Y (average achievement), and Z (slightly above-average achievement). Although student Y has an average academic achievement relative to the population of all students, if Y attends a high-achievement school (i.e., a school where the school-average achievement is above the average across all schools), Y would have an academic achievement below the average achievement level of other students in the school. This is predicted to result in Y having a below-average ASC. However, if Y attends a low-achievement school (i.e., a school where the school-average achievement is below the average across all schools), then Y would be above the average achievement level in this school, leading to an above average ASC. In a similar vein, the ASCs of students X and Z will depend (positively) upon their objective academic abilities, but will also vary (negatively) with the school-average achievement. According to this model, a given academic achievement level leads to a distribution of psychological impressions, indicating that other constructs (and random error) also affect this mapping. Although there was support for such a model based on psychophysical research dating back to the early 1900s that was the primary basis of this early research (see Marsh 1974), Marsh (1984; see also Schwarzer et al. 1982) specifically developed the BFLPE paradigm to understand the formation of ASC in school settings.
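
The example of students X, Y, and Z can be made concrete with a small numerical sketch. The two coefficients below are purely hypothetical values chosen for illustration (a positive effect of individual achievement and a smaller negative effect of school-average achievement, both on standardized scales); they are not estimates from the present investigation, and the error term is omitted.

```python
# Hypothetical standardized effects, chosen only to illustrate the signs.
B_INDIVIDUAL = 0.5   # positive effect of a student's own achievement
B_SCHOOL = -0.3      # negative effect of school-average achievement


def predicted_asc(achievement, school_average):
    """Expected academic self-concept for a given student and school context."""
    return B_INDIVIDUAL * achievement + B_SCHOOL * school_average


students = {"X": -0.5, "Y": 0.0, "Z": 0.5}
schools = {"low-achievement": -1.0, "high-achievement": 1.0}

for name, achievement in students.items():
    for label, school_average in schools.items():
        asc = predicted_asc(achievement, school_average)
        print(f"Student {name} in a {label} school: predicted ASC = {asc:+.2f}")
```

With these illustrative values, student Y has a below-average predicted ASC in the high-achievement school and an above-average predicted ASC in the low-achievement school, and each of the three students has a lower predicted ASC in the high-achievement school than in the low-achievement school.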

Following from this tradition, Marsh (1984; see also Marsh & Parker, 1984; also see Marsh, 2007; Marsh, Seaton, et al., 2008; Seaton, Marsh & Parker, 2013) proposed the BFLPE to encapsulate frame-of-reference effects in educational settings, based on an integration of theoretical models and empirical research from diverse disciplines. In this theoretical model (see Figure 1A in the published article), a negative BFLPE occurs when equally able students have lower ASCs when they compare themselves with more able classmates, and higher ASCs when they compare themselves with less able classmates. ASC is positively affected by individual achievement (i.e., more able students have higher ASCs): The path from individual achievement to individual ASC is substantial and positive (++ in Figure 1A). However, ASC is negatively affected by school- or class-average achievement (i.e., the same student will have a lower ASC when school- or class-average achievement is high): The path from school- or class-average achievement is negative. Hence, ASC depends not only on a student’s academic accomplishments but also on those of the student’s classmates.

Although this is not a main focus of the present investigation, the growing support for the multidimensionality of self-concept and theoretical models positing self-concept as a multidimensional, hierarchical construct (see Marsh 1990, 2007; Marsh and Craven 2006; Marsh, Kuyper et al., 2014) is important in tests of the BFLPE. Historically, self-concept researchers emphasized a global, relatively undifferentiated measure of self-concept, also referred to as self-esteem. However, particularly in educational psychological research, many important academic outcomes are systematically related to ASC, but relatively unrelated to self-esteem. General academic self-concept refers to students’ self-perceptions of their academic accomplishments, their academic competence, their expectations of academic success and failure, and academic self-beliefs. Importantly, this general ASC can also be broken into components related to broad academic disciplines (e.g., math and verbal self-concepts) as well as even more specific components of academic self-concept related to specific school subjects (e.g., history, English, foreign language, mathematics, computer studies, science, etc.; see Marsh 2007). Early BFLPE research (Marsh and Parker 1984; Marsh 1987) demonstrated that support for the BFLPE was highly domain specific; whilst ASC was strongly influenced by individual student achievement (positively) and school-average achievement (negatively) in the matching academic domain, neither individual nor school-average achievement had much effect on either global self-esteem or non-academic components of self-concept. This support for the domain specificity of the BFLPE provided strong support for the importance of a multidimensional perspective of self-concept in educational psychology research, but also supported the construct validity of interpretations of the BFLPE.

A series of theoretical predictions—some of which appeared to be paradoxical at the time they were first proposed (Marsh 1984)—can be generated from this model. In particular the model predicts that:

- ASC will be positively related to academic achievement;
- school-average ASC will be similar in high-achievement and low-achievement schools, even though the corresponding achievement levels of individual students are substantially higher in high-achievement schools and substantially lower in low-achievement schools (i.e., the frame of reference is largely established by the student’s own school);
- school-average achievement will be negatively related to ASC after controlling for individual student achievement;
- ASC will be more highly correlated with individual achievement after controlling for school-average achievement;
- ASC can be more accurately predicted from individual and school-level achievement than from either of these predictors considered separately;
- the negative effect of school-average academic achievement is specific to ASC and is unlikely to generalize to non-academic components of self-concept (e.g., physical self-concept); and
- because the frame-of-reference is established by school-average achievement, all students in a high-achievement school are predicted to have lower ASCs than would the same students if they attended a low-achievement school; interactions between school-average and individual achievement on ASC are expected to be small or non-significant.
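
Several of these predictions can be summarised in a single two-level regression equation. The notation below is an illustrative sketch rather than the exact specification used in the printed article: student i is nested in school j, individual achievement is assumed to be centred on the overall mean (not on the school mean), and the symbols for the coefficients and residuals are introduced here for illustration only.

```latex
\mathrm{ASC}_{ij} = \beta_0
  + \beta_1 \, \mathrm{ACH}_{ij}
  + \beta_2 \, \overline{\mathrm{ACH}}_{j}
  + \beta_3 \left( \mathrm{ACH}_{ij} \times \overline{\mathrm{ACH}}_{j} \right)
  + u_j + e_{ij},
\qquad \beta_1 > 0, \quad \beta_2 < 0, \quad \beta_3 \approx 0
```

In this notation, the positive effect of individual achievement corresponds to \beta_1, the BFLPE corresponds to the negative \beta_2, and the expectation that the BFLPE applies to students at all achievement levels corresponds to a small or non-significant \beta_3.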

Expanding on this theoretical model, Marsh (1987, 1990, 1991, 2007; Marsh, Seaton et al., 2008) posited that the BFLPE represents the net effect of a stronger negative contrast effect and a weaker positive (assimilation or reflected glory) effect. Although reflected-glory assimilation effects have a clear theoretical basis, these effects have been largely implicit and elusive in BFLPE studies. Marsh, Kong and Hau (2000; also see Trautwein et al. 2009) addressed this issue in a large representative sample of Hong Kong high school students by specifically asking students to evaluate the pride that they felt in attending their high school. As previously found in BFLPE studies, higher school-average achievement led to lower ASC in their longitudinal study. However, they also found that higher perceived school status had a counter-balancing positive effect on self-concept (an assimilation effect) that Marsh et al. (2000) likened to reflected glory and feelings of pride in belonging to a high-achieving school. The net effect of these counterbalancing influences was clearly negative, indicating that the contrast effect was stronger than the assimilation effect. Attending a school where school-average achievement is high simultaneously results in a more demanding basis of comparison against which students within the school evaluate their own accomplishments (the negative contrast effect) and a source of pride for students within the school (the positive reflected-glory, assimilation effect). Although theoretically important, the assimilation effect found in this study has been elusive in other research and not nearly as robust as the typical contrast effects found in other BFLPE research.
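
One way to formalise this juxtaposition is as a standard decomposition of a total effect into a direct path and an indirect path through perceived school status. This is an illustrative sketch, not necessarily the model estimated by Marsh et al. (2000), and all symbols are introduced here as assumptions: \beta_{\text{contrast}} is the effect of school-average achievement on ASC with perceived school status held constant, a is the effect of school-average achievement on perceived school status, and b is the effect of perceived school status on ASC.

```latex
\beta_{\text{net}} = \beta_{\text{contrast}} + a \, b,
\qquad \beta_{\text{contrast}} < 0, \quad a > 0, \quad b > 0,
\quad \lvert \beta_{\text{contrast}} \rvert > a \, b
```

Under these assumptions the net BFLPE remains negative, but it is smaller in absolute size than the contrast effect alone, because the assimilation path a b partly offsets it.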