Patterns of Motivation Beliefs
Supplementary Materials (to be made available online)
Missing Data
Cases were not excluded from further analysis because of missing data on one or more variables. Traditional methods like listwise or pairwise deletion of cases with missing values or single-value imputation (e.g., with the sample mean) can bias sample statistics and are not recommended (Peugh & Enders, 2004). The IMPUTE module in the Sleipner package identifies cases with missing data and searches for twin patterns in the data that meet a specified threshold of similarity. All of the analyses in this study used average squared Euclidean distance as the similarity measure, a .50 threshold for finding a twin, and a conservative criterion for the number of variables for which missing data may be imputed (25%, or 2 values missing for a set of 8 clustering variables). A description of cases for which data were imputed can be found in Table S-2.
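The twin-based imputation rule described above can be sketched as follows. This is a minimal illustration, not the Sleipner IMPUTE module itself: the function names, the use of `None` for missing values, computing the distance only over variables observed in both cases, and treating the threshold as inclusive are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]


def ased(a: Row, b: Row) -> float:
    """Average squared Euclidean distance over variables observed in both rows."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no variables observed in both rows")
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


def impute_from_twin(
    case: Row,
    complete_cases: Sequence[Row],
    threshold: float = 0.50,   # similarity threshold reported in the text
    max_missing: int = 2,      # 25% of 8 clustering variables
) -> Tuple[str, List[Optional[float]]]:
    """Return (status, row); the status labels are hypothetical names."""
    n_missing = sum(v is None for v in case)
    if n_missing == 0:
        return "complete", list(case)
    if n_missing > max_missing:
        return "too_many_missing", list(case)
    if not complete_cases:
        return "no_twin", list(case)
    # The twin is the complete case closest to the incomplete one.
    twin = min(complete_cases, key=lambda c: ased(case, c))
    if ased(case, twin) > threshold:
        return "no_twin", list(case)
    # Copy the twin's values only into the missing positions.
    return "imputed", [t if v is None else v for v, t in zip(case, twin)]
```

The three non-complete statuses correspond to the three outcomes reported for cases with missing data: imputed, no twin found, and missing values in too many variables.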
There were 1,844 cases with complete data. Of the 26 cases with missing data (1.4% of the sample), 16 had a twin that met the standard similarity criterion of .50, 1 had no twin that met the specified threshold, and 9 had missing values in too many variables.
Checking the Reliability and Validity of the Cluster Solutions
Sensitivity of the results to sampling variation was checked by performing identical analyses on two independent samples randomly selected from the same population and comparing the two classifications. The utility of the classification was verified by examining the amount of error explained by the chosen solution, and sensitivity of the results to the choice of clustering method was checked by comparing the classifications from different clustering algorithms. In a final validity check, between-cluster differences on theoretically-related constructs provided evidence of the validity of the cluster solutions. Strong support was obtained for the reliability and validity of the cluster solutions.
Random halves. An identical set of analyses was conducted on the A and B samples. The explained error sum of squares for the final 20 iterations and the increase in error sum of squares resulting from each fusion suggested a similar range of cluster solutions across both sample halves (6 to 12 clusters for the A sample and 7 to 10 clusters for the B sample). Theoretical and statistical considerations suggested 7- and 8-cluster solutions as the best candidates for both A and B samples. The 8-cluster solution yielded a cluster pair that did not match: the average squared Euclidean distance (ASED) between centroids was greater than .50 for the 8th cluster pair. Comparing the 7-cluster k-means solutions yielded good matches across the A and B samples for all cluster pairs. The ASEDs for the pairwise matches and the centroid graphs are provided in the online supplement (Table S-1 and Figure S-1). With a maximum ASED of .208, the correspondence between the cluster means for the 7-cluster solution across two different samples provided strong evidence of the reliability of the cluster solutions.
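The centroid comparison across sample halves can be illustrated with a short sketch. The text does not state how cluster pairs were formed, so the exhaustive search for the pairing with the smallest total ASED is an assumption, as are the function names; the .50 cutoff is the one reported above.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def ased(a: Sequence[float], b: Sequence[float]) -> float:
    """Average squared Euclidean distance between two centroids."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def match_centroids(
    cents_a: Sequence[Sequence[float]],
    cents_b: Sequence[Sequence[float]],
    threshold: float = 0.50,
) -> List[Tuple[int, int, float, bool]]:
    """Pair each A centroid with one B centroid, minimizing total ASED.

    Returns (index_a, index_b, ased, matched) tuples, where matched is
    True when the pair's ASED does not exceed the threshold.
    Exhaustive search is practical only for small numbers of clusters.
    """
    k = len(cents_a)
    if len(cents_b) != k:
        raise ValueError("both solutions must have the same number of clusters")
    best = min(
        permutations(range(k)),
        key=lambda p: sum(ased(cents_a[i], cents_b[j]) for i, j in enumerate(p)),
    )
    result = []
    for i, j in enumerate(best):
        d = ased(cents_a[i], cents_b[j])
        result.append((i, j, d, d <= threshold))
    return result
```

Under this sketch, a solution would be judged replicable when every pair is flagged as matched, as was the case for the 7-cluster solution and not for the 8-cluster solution.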
Comparing results across alternate methods. Ward’s method on the A sample indicated an optimal number of clusters, and k-means on the B sample used the number of clusters suggested by the Ward’s analysis. Comparing cluster centroids indicated a high degree of overlap across different samples, different programs, and different clustering techniques (maximum ASED = 0.182). Centroid graphs showing the correspondence between solutions across methods are provided in supplemental Figure S-2.
Amount of ESS explained. Another facet of the generalizability of a cluster solution concerns the match between the data and the classification solution, and the degree to which a solution accounts for the variability in the data. A 7-cluster solution accounted for about half of the variability in motivation beliefs for each sample half (48.70% for the A sample and 48.84% for the B sample using Ward’s method, and 54.53% for the A sample and 53.69% for the B sample using k-means). This represents good correspondence between the classification system and the data and provides evidence of the generalizability of the results.
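As an illustration of the quantity reported here, the sketch below computes the percentage of the error sum of squares (ESS) explained by a classification, assuming the usual definition: 100 × (total ESS − within-cluster ESS) / total ESS, where total ESS is the sum of squared deviations from the grand centroid. The function name and the input layout (a list of rows plus a list of cluster labels) are assumptions made for the example.

```python
from typing import Dict, Hashable, List, Sequence


def explained_ess(data: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """Percentage of total error sum of squares explained by the clustering."""
    n_vars = len(data[0])

    def centroid(rows: Sequence[Sequence[float]]) -> List[float]:
        return [sum(r[v] for r in rows) / len(rows) for v in range(n_vars)]

    def sum_sq(rows: Sequence[Sequence[float]], c: Sequence[float]) -> float:
        return sum((r[v] - c[v]) ** 2 for r in rows for v in range(n_vars))

    # Total ESS: every case treated as a member of one single cluster.
    total = sum_sq(data, centroid(data))
    if total == 0:
        raise ValueError("data have no variability")

    # Within-cluster ESS: squared deviations from each cluster's own centroid.
    clusters: Dict[Hashable, List[Sequence[float]]] = {}
    for row, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(row)
    within = sum(sum_sq(rows, centroid(rows)) for rows in clusters.values())

    return 100.0 * (total - within) / total
```

A value near 50, as reported for the 7-cluster solutions, means that cluster membership accounts for about half of the squared deviation of cases from the grand centroid.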
Using dropped items to establish validity of clusters. The validity of the cluster solution was checked by testing for differences between clusters on theoretically related constructs. Patterns were compared using items that were dropped when forming the scales. Two subjective task value items cross-loaded on interest, utility, and attainment and were dropped from the scales. Patterns characterized by higher values on these scales should have, and did have, the highest means on these dropped items. A full factorial multivariate analysis of variance (MANOVA) was conducted with cluster membership as a between-subjects factor. As expected, results indicated that three patterns with the same high levels of interest, utility, and attainment value did not differ from one another on these dropped items, F(2, 757) = 1.49, p = .226. A second MANOVA, with post hoc tests performed with a Games-Howell correction, indicated that the three high patterns differed significantly in the expected direction from patterns characterized by lower subjective task value, F(12, 3670) = 51.20, p < .001 (F ratio is Wilks’ approximation).
Table S-1
Distance between paired cluster k-means centroids from the A and B samples at Time 1
Cluster Pair / Average Squared Euclidean Distance
1 / 0.023
2 / 0.030
3 / 0.065
4 / 0.075
5 / 0.091
6 / 0.100
7 / 0.208
Mean: 0.084 / Maximum: 0.208 / Minimum: 0.023
Table S-2
Missing data and imputation analysis for “A” (n=935) and “B” (n=935) samples
Cases / A / B
Number of cases with complete data / 916 / 928
Number of imputed cases / 13 / 3
Number of cases where no twin was found / 1 / 0
Number of cases with missing values in too many variables / 5 / 4
Total Cases / 935 / 935
Note. Values were imputed for up to 2 of the 8 clustering variables when a twin was found that met the standard similarity criterion of .50, based on standardized values.
Figure S-1. Centroid matches for 7-cluster solutions for A and B samples at Time 1.
Note. Int = interest value; util = utility value; attn = attainment value; cost = cost value; map = mastery approach goals; pap = performance approach goals; pav = performance avoid goals; eff = competence beliefs (efficacy).
Figure S-2. Centroid matches for 7-cluster Time 1 solutions across methods and samples.
Note. Int = interest value; util = utility value; attn = attainment value; cost = cost value; map = mastery approach goals; pap = performance approach goals; pav = performance avoid goals; eff = competence beliefs (efficacy).