Model Answers - 2004 Stats exams

Section A: BSc; MSc Occ Psy; MSc RMIP

Q1

(i) The dummy coding variable tdumcod1 compares the teacher coded 1 (teacher 1) with the teacher coded 0 in both dummy coded variables (teacher 3). Similarly, tdumcod2 compares teacher 2 with teacher 3. These contrasts are used when one level of the factor is a natural/logical choice as the benchmark against which the other levels are to be compared. This would be relevant here if teacher 3 were the best-qualified of the 3 in the centre. Effect coding works differently. The effect coding variable teffcod1 compares the teacher coded 1 (teacher 1) with the mean of all teachers. Similarly, teffcod2 compares teacher 2 with the mean of all teachers. Nothing can be concluded directly about teacher 3 from this scheme. Getting the two coding schemes the wrong way round gets 0 marks.
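The two coding schemes can be sketched in a few lines (an illustrative sketch only, not the original SPSS setup; the function names are invented):

```python
def dummy_code(teacher):
    # tdumcod1 = 1 only for teacher 1; tdumcod2 = 1 only for teacher 2;
    # teacher 3, the reference level, is coded 0 on both variables.
    return (1 if teacher == 1 else 0, 1 if teacher == 2 else 0)

def effect_code(teacher):
    # Same as dummy coding except the reference level (teacher 3) is
    # coded -1 on both variables, so each contrast compares a teacher
    # with the (unweighted) mean of all three teachers.
    if teacher == 3:
        return (-1, -1)
    return dummy_code(teacher)

for t in (1, 2, 3):
    print(t, dummy_code(t), effect_code(t))
```

Running this prints the familiar coding table: teachers 1 and 2 get identical codes under both schemes; only the reference level differs (0,0 under dummy coding, -1,-1 under effect coding).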

(ii) It follows directly from the above that the dummy-coded printout reveals teacher 1 to be rated significantly differently from teacher 3 in terms of satisfaction, while teachers 2 and 3 do not differ. (Teacher 1 has the higher mean.) The direction of the effect can be worked out from the coding scheme, but it is safer to look at the means for each teacher (not given in the printout). Similarly, the effect coding output shows that teacher 1 has a satisfaction rating that is significantly different from (higher than) the mean of all teachers, whereas teacher 2 does not.

(iii) The chi-squared distribution, at the 0.001 level, with df = the number of predictor variables (= 5).

(iv)

  • what the researcher can conclude: The researcher would conclude that the predictors he used (teacher of the class; class difficulty; student experience; and size of the class), when considered as a group, were able to predict satisfaction ratings to a statistically significant degree. The model R2 and the overall model ANOVA statistics show this. Over a quarter of the variance in satisfaction ratings could be predicted from these variables. In addition, the predictors which made an independent contribution to the prediction of satisfaction, over and above the contribution of the other predictors in the model, were having teacher 1 relative to teacher 3 (t=3.6, p<0.001) and the difficulty of the class (t=-4.2, p<0.001). We have already commented on the direction of the teacher effect. The difficulty effect is such that more difficult classes were associated with lower satisfaction after taking account of the other variables in the model. The predictors which did not make an independent contribution to the model were: teacher 2 vs. teacher 3; the size of the class; and student experience.
  • statistical concerns: There are few obvious problems: the sample size is sufficient for the number of predictors, both for examining the whole model and the individual predictors. Distributions of the predictors (based on means, ranges and s.d.s) do not show any obviously very skewed variables or any illegal values. In fact, on the information presented there is really only one concern: a (marginal) indication of collinearity between the rating of class difficulty and the experience of the student (variable = howlong). The bivariate correlation between these two predictors was -0.84, which is close to one conventional cut-off (-0.90 or 0.90). This is further confirmed by the collinearity statistics (tolerance = 0.292 for these two variables, compared with the cut-off of <0.25 which is often recommended). However, these are only rough guides, and the values are close enough to cause some alarm bells to sound. A general answer about all the things that might be wrong with the analysis should not get much credit here.
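The tolerance value in the printout can be roughly reproduced from the bivariate correlation alone (a sketch under the simplifying assumption that each of these two predictors is related essentially only to the other, so R2 ≈ r2):

```python
# Tolerance for a predictor = 1 - R^2 when that predictor is regressed
# on all the other predictors. If difficulty and experience are each
# related mainly to the other, r = -0.84 dominates that R^2.
r = -0.84                 # bivariate correlation from the printout
tolerance = 1 - r ** 2    # ≈ 0.294, close to the printed 0.292
vif = 1 / tolerance       # variance inflation factor, the reciprocal
print(round(tolerance, 3), round(vif, 2))
```

The near-match between 1 - 0.84^2 = 0.294 and the printed tolerance of 0.292 supports the reading that the collinearity is driven by this one pair of predictors.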
  • what to do about possible collinearity: The best approach is to start by rerunning the regression as two separate models, one including difficulty as a predictor but not experience, and the other including experience but not difficulty. These 2 regressions together would confirm the pattern of independent prediction for these two predictors (difficulty predictive, experience not predictive) in models without the presence of a potentially distorting collinear variable. It is often suggested that one can combine collinear variables in regression into a single variable (by standardising and then summing the two variables). Here one could do that, but it would be less informative than the approach above, and it requires care as one has to reverse the direction of one of the two variables (which are negatively related) before summing. (This would be done by multiplying one of the standardised variables by -1.) This combining approach might be used if, when the two regressions are run as suggested above, both difficulty and experience make independent contributions in their respective models. This is possible given that both are significantly correlated with satisfaction in the simple correlations table. Then one wouldn't know whether it was difficulty or experience or a combination of the two that was important (and so combining them makes sense). So a bland statement that one should combine the variables (or delete one of them) doesn't get a lot of marks for this section.
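The standardise-reverse-sum step can be sketched as follows (the data are invented for illustration; the function name is hypothetical):

```python
from statistics import mean, stdev

def zscores(xs):
    # standardise: subtract the mean, divide by the (sample) s.d.
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

def combine(difficulty, experience):
    z_diff = zscores(difficulty)
    # experience correlates negatively with difficulty, so reverse its
    # direction (multiply the standardised scores by -1) before summing
    z_exp_rev = [-z for z in zscores(experience)]
    return [d + e for d, e in zip(z_diff, z_exp_rev)]

# toy data only: more difficult classes, less experienced students
difficulty = [1, 2, 3, 4, 5]
experience = [9, 8, 6, 4, 2]
print(combine(difficulty, experience))
```

The resulting composite is centred on zero and rises with difficulty (and with reversed experience), which is what entering a single combined predictor into the regression requires.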

(v) The data are clustered in the sense that respondents are drawn from particular classes (30 classes, with between 3 and 12 respondents in a class). The ratings of participants in the same class are less likely to be independent of one another than those of participants from separate classes, as participants in the same class may be influenced by class-specific factors. These factors would be anything that makes the particular class satisfying or not satisfying, and especially factors that are unrelated to the predictors included in the model.

Q2

(i)

  • Logistic regression analysis is a hierarchical model-fitting process whose goal is to find the best-fitting model with the fewest parameters.
  • The DV is called cansee (3 levels) and there are 3 predictors (age and gender as 2-level factors, plus a 4-level covariate based on extraversion quartile score).
  • The first model could be a complete model, which would require: the 3 main effects of extraversion (2 parameters), gender (2 parameters) and age (2 parameters); plus the 2-way interactions (extraversion x gender, 2 parameters; extraversion x age, 2 parameters; age x gender, 2 parameters); plus the 3-way interaction (extraversion x age x gender, 2 parameters). This model has a total of 14 parameters, which is rather a lot, especially for only 120 subjects in total. A more likely alternative first model would be to start with a full-factorial model which drops all 3 of the interaction terms that involve the covariate (and so has only 8 parameters).
  • The first (full factorial) model is compared, using a likelihood ratio test, with a model that has no effects -- a so-called intercept-only model. This tests whether removing all the effects in the model results in a significant deterioration in fit. This test is likely to have been significant, showing that some of the effects contained within the first model are significant predictors. Other likelihood ratio tests also assess the deterioration in fit between the saturated model and the first model. A saturated model has as many parameters as there are freely varying data points to be fitted. In this case this equates to treating the extraversion quartile score as a factor with 4 levels and generating a model with all the possible 2- and 3-way interactions in it (which means that every effect involving extraversion requires 3 times as many parameters as before). This gives the saturated model 30 parameters in total.
  • More importantly, likelihood ratio tests explore the effects of removing specific effects from within the first model. These tests compare the model including the effect with the model after the effect has been removed. A nonsignificant test outcome means that the effect under test can be safely deleted from the model without a significant deterioration in fit. If we assume that the first model fitted was a full-factorial model, likelihood ratio tests of this kind could be applied to test the age*gender and extraversion effects (the age and gender main effects cannot be removed because they are nested under the interaction term). The extraversion effect is retained in the final model in the printout, and so the likelihood ratio test on the first model must have indicated that extraversion could not be removed from the model without a significant deterioration in fit (a significant likelihood ratio test). The age*gender interaction term does not appear in the final model and so must have been safely removable (a nonsignificant likelihood ratio test).
  • The next step would be to fit a main effects model (i.e. removing the age*gender interaction from the first model). The likelihood ratios here must have indicated that only the age group factor can be safely removed from the model. The third and final step would then be to fit the model as in the printout with only the extraversion and gender main effects retained. The model has four parameters (2 each for the two main effects) and the saturated model against which its goodness of fit test is conducted treats extraversion as a factor and includes the gender*extraversion effect (and so has 14 parameters in total). The goodness of fit test statistics thus have 14-4=10 df.
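The parameter counts quoted above (14 for the complete model, 30 for the saturated model, df = 14 - 4 = 10 for the final goodness-of-fit test) can be verified with a short sketch. This is an illustrative helper, not part of the analysis: each effect in a 3-level multinomial logit contributes (product of the factor df in the effect) x 2 parameters, and the linear covariate is represented as a "2-level" term so that it gets 1 df per logit.

```python
from itertools import combinations

def n_params(effect, levels, dv_levels=3):
    """Parameters contributed by one effect: the product of
    (levels - 1) over the factors in the effect, multiplied by
    (dv_levels - 1) logit equations."""
    p = dv_levels - 1
    for f in effect:
        p *= levels[f] - 1
    return p

# extraversion entered as a linear covariate: code it as "2 levels"
# (a trick giving it 1 df per logit, i.e. a single slope)
cov = {"ext": 2, "gender": 2, "age": 2}
complete = sum(n_params(e, cov)
               for r in (1, 2, 3) for e in combinations(cov, r))
print(complete)       # all main effects + all interactions

# saturated model: extraversion treated as a 4-level factor
sat = {"ext": 4, "gender": 2, "age": 2}
saturated = sum(n_params(e, sat)
                for r in (1, 2, 3) for e in combinations(sat, r))
print(saturated)

# saturated comparison model for the final 4-parameter model:
# extraversion (as factor) + gender + their interaction
final_sat = (n_params(("ext",), sat) + n_params(("gender",), sat)
             + n_params(("ext", "gender"), sat))
print(final_sat, final_sat - 4)  # goodness-of-fit df
```

This reproduces the counts in the answer: 14, 30, and a 14-parameter comparison model giving 10 df for the goodness-of-fit test.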

(ii) In line with part (i) above, the final model printout shows that the extraversion and gender main effects combined provide a significant model (model-fitting info stats). The goodness of fit stats also tell us that the saturated model, with 10 further parameters, does not provide a statistically better fit than our final 4-parameter model with extraversion and gender main effects only. The likelihood ratio tests confirm that there is a significant deterioration in fit if either of the 2 main effects is removed from our model. Finally, the parameter estimates table tells us that the odds of always seeing the “magic eye” images, relative to never seeing them, increase about 1.5 times for every quartile increase in extraversion score (an effect which is only a trend, p=0.09). This odds ratio has 95% confidence intervals which embrace chance (1) at the lower end (0.938 to 2.476). The odds of sometimes seeing the images relative to never seeing them decrease by a factor of 0.776 for every quartile increase in extraversion score, but this effect is not significant (p>0.25, confidence limits 0.5 to 1.2). For gender, the odds of always seeing the images relative to never seeing them are only about 1.1 times higher for males relative to females (not different from chance=1); however, the odds of sometimes seeing the images, relative to never seeing them, are only about one fifth for males relative to females, which is a significant odds ratio (p=0.002; CL=0.08 to 0.56). These observations fit with the frequency counts in the cross-tabulation printouts.
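The link between a logit coefficient and the printed odds ratio with its confidence limits is just exponentiation. In the sketch below, b ≈ 0.42 and se ≈ 0.25 are illustrative values back-calculated from the printed extraversion odds ratio (1.524, CL 0.938 to 2.476), so they are approximate rather than taken from the printout:

```python
import math

def odds_ratio_ci(b, se, z=1.96):
    """Convert a logit coefficient and its standard error into an
    odds ratio with an approximate 95% confidence interval."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# illustrative values only (back-calculated, not from the printout)
or_, lo, hi = odds_ratio_ci(0.42, 0.25)
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

Note that the lower limit falls just below 1, which is exactly why the ~1.5 odds ratio is only a trend (p=0.09) rather than significant.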

(iii) As is clear from (ii) above, the odds ratios tested in the parameter estimates table (using cansee as the DV) were for “always sees” relative to “never sees” and for “sometimes sees” relative to “never sees”. This doesn’t allow the researcher to look at the comparison of “always sees” relative to “sometimes sees”. By recoding the cansee variable, “sometimes sees” becomes the reference level (as it has the highest numerical value). The bottom part of the parameter estimates table is essentially unchanged (except that the ratios are computed the other way up, i.e. “never sees” relative to “sometimes sees” rather than “sometimes sees” relative to “never sees”). The top half of the table gives the researcher what he wants: namely the odds for “always sees” relative to “sometimes sees”. For every quartile increase in extraversion score this ratio increases by a factor of 2 (a significant effect, p=0.003, CL 1.27 to 3.05). The ratio for males is about 5.4 times higher than it is for females (another significant effect: p=0.001, CL 2.02 to 14.53). So this tells us that the extraversion effect on seeing magic eye images is located mainly in the comparison between those who always see the images and those who sometimes see them (comparing these two response outcomes, extraverts display “always sees” to a greater extent than introverts). Independent of this effect there is an effect of gender on the “always sees”:“sometimes sees” ratio (of these 2 response outcomes, males display “always sees” responses to a greater extent than females). This is in addition to an effect of gender on the “sometimes sees”:“never sees” ratio (of these two response outcomes, females display “sometimes sees” to a greater extent than males).
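The recoded odds ratios follow from the original ones by simple arithmetic, which gives a quick sanity check on the two printouts (using the approximate extraversion odds ratios reported above):

```python
# Odds ratios reported with "never sees" as the reference level
or_always_never = 1.524     # "always sees" vs "never sees", per quartile
or_sometimes_never = 0.776  # "sometimes sees" vs "never sees", per quartile

# Reversing the reference level simply inverts a ratio
or_never_sometimes = 1 / or_sometimes_never            # ~1.29

# Odds ratios are multiplicative, so the new contrast is a quotient:
# "always sees" vs "sometimes sees" per quartile increase
or_always_sometimes = or_always_never / or_sometimes_never
print(round(or_never_sometimes, 2), round(or_always_sometimes, 2))
```

The quotient 1.524/0.776 ≈ 1.96 matches the "increases by a factor of 2" reported for the recoded analysis, confirming that the two tables describe the same fitted model from different reference levels.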

Q3

(i)[30 marks] In all cases a good answer should consider the specifics of the particular study and data provided in relation to the issues below:

  • He would want to carry out a thorough screening of the data to ensure that they are broadly normally distributed (frequencies, graphs etc.), with no univariate, bivariate or multivariate outliers (e.g. frequencies, scatter plots etc.). (Normality is not strictly required but helps to get clearer solutions.)
  • He would also check for illegal values (e.g. <50 or >150).
  • He should consider collinearity and singularity by reviewing the correlation matrix (a bivariate correlation of .90 is a problem; multicollinearity is indicated by high SMC values for each variable predicted from all the others).
  • Factorisability of the correlation matrix -- a rule of thumb is that one needs some bivariate correlations above 0.3; one should also look for low pairwise partial correlations (partialling out all other variables) and can use the KMO measure of sampling adequacy (Bartlett’s sphericity test is not much use for testing factorisability)
  • The answer must mention the requirements of sample size, noting that there is no single opinion on this matter (e.g. Comrey & Lee: a minimum of 300 cases for a good factor analysis; or a ratio of cases to variables - Nunnally 10:1, Guilford 2:1; Barrett & Kline find 2:1 replicates structure while 3:1 is better).
  • Should mention the ratio of variables to factors (as above, e.g. Tabachnick & Fidell 5 or 6:1; Kline 3:1; Thurstone 3:1; Kim & Mueller 2:1).
  • Should mention whether to use listwise deletion, pairwise deletion (to be avoided) or imputation for missing data. Always use listwise deletion if numbers allow. A good answer may mention different forms of imputation (regression, mean).
  • He should hypothesise about the relationships between the items (i.e. the factors) a priori on the basis of the items used.
  • Decide on FA or principal components analysis – give the differences and a reason for the choice (should note that PCA doesn’t have an underlying factor theory)
  • If FA, then should comment on the EXTRACTION methods available – the bottom line is that it doesn’t really matter, though – may suggest trying all and going for the most interpretable solution. (Could include Maximum Likelihood, Unweighted Least Squares, Generalised or Weighted Least Squares, Alpha Factoring, Image Factoring.) ULS was used in this example.
  • Have to decide how many factors to retain – should mention at least two approaches from Kaiser (eigenvalue approach), scree plots, hypothesis testing, interpretability (find solution that makes most sense) or significance testing (if using ML or LS)
  • Should note the trade-off between the number of factors and the variance explained.
  • Should explain the use of the communalities to identify variance explained in each variable and what to do if it is too low (e.g. remove the variable or increase factors)
  • Should explain the need for rotation and the basic choice of orthogonal or oblique rotation. Should refer to the factors and whether we would expect them to be related. Should justify the decision (e.g. orthogonal is easier to interpret; oblique is more appropriate/realistic/useable), but a choice to try both is acceptable, as is starting with an oblique method: if the best rotation has an angle between the factors that is close to orthogonal, then this suggests an orthogonal solution will be OK
  • May comment briefly on different orthogonal and oblique methods, explaining the differences and choosing between them.
  • Should comment that there is a choice of factor score computation methods although needn’t give details -- often just adding up the standardised scores on the high loading items works well.
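The simple factor score method mentioned in the last point can be sketched as follows (the item names, loadings and data are invented purely for illustration; "quiet_rev" stands for an item already reverse-scored to load in the same direction):

```python
from statistics import mean, stdev

def simple_factor_scores(data, high_loading_items):
    """data: dict of item name -> list of raw scores (one per person).
    Returns one factor score per person: the sum of that person's
    standardised scores on the items loading highly on the factor."""
    z = {}
    for item in high_loading_items:
        xs = data[item]
        m, s = mean(xs), stdev(xs)
        z[item] = [(x - m) / s for x in xs]   # standardise each item
    n = len(next(iter(z.values())))
    return [sum(z[item][i] for item in high_loading_items)
            for i in range(n)]

# invented ratings on three items assumed to load on one factor
data = {"talkative": [4, 2, 5],
        "outgoing":  [5, 1, 4],
        "quiet_rev": [4, 2, 5]}
scores = simple_factor_scores(data,
                              ["talkative", "outgoing", "quiet_rev"])
print(scores)
```

Because each item is standardised before summing, the resulting scores are centred on zero and weight every high-loading item equally, which is why this crude method often works almost as well as regression-based factor score coefficients.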

(ii)[20 marks] In relation to the conclusions that a researcher can draw from factor analyses, the main answer should focus on the understanding of the structure of the domain under interrogation. Credit should be given for anything said in answer to part (i) which should have been included here. Here, a mention of the PCA vs. FA differences (in relation to the underlying causal model) is worth making or repeating. The answer should reflect on the SPSS printout for the current data and conclude that there are 3 factors -- and these might be labelled, sensibly one would hope. Talk about the amount of variance (of what kind -- depending on PCA/FA) the factors explain. Talk about how one might use the factor score coefficients to produce factor scores for each participant for use in future analyses. Good responses will also take into account the fact that FA can only produce what is put in, and so might ask whether the researcher is likely to have used a wide enough set of items (e.g. other trait descriptors might reveal other factors -- in fact real research evidence suggests at least 2 more factors of this kind -- the Big 5 model).