Session 7: Week 8.

Normal Distribution

Today’s lab continues the subjective analysis of normality that you were introduced to in last week’s lab. This subjective assessment of graphically represented data will then be extended into a more objective, quantitative analysis. You will also conduct the two forms of χ2 analysis which we discussed in your lecture in week 6.

TASK 1: Assessment of Normality

At the end of the 2005-2006 term, a lecturer is keen to establish whether the ‘average’ mark for his research skills module is similar to that achieved by students in the previous year. He therefore enters the data from the current and previous (i.e. 2004-2005) year into an SPSS data-file (available here).

However, in order to establish which measure of central tendency to use, he must first assess whether the data is normally distributed. To begin with, the lecturer presents all of the data together (i.e. irrespective of year group) on a frequency chart to visually assess its distribution.

-You can do this the same way as last week: click ‘ANALYZE-DESCRIPTIVE STATISTICS-FREQUENCIES’ and then transfer the variable of interest (i.e. percent) into the box on the right hand side. Use the ‘Charts…’ button to create a histogram and check the box to add a normal curve. You can also use the ‘Statistics…’ button to include the mean, median and mode in your output for more quantitative analysis. Uncheck the ‘Display frequency tables’ box, then click ‘Continue’ and ‘OK’.
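For revision purposes, the three measures of central tendency requested above can also be checked by hand outside SPSS. The following is only a minimal sketch in Python using the standard library; the list of marks is hypothetical and is not the lecturer's data.

```python
import statistics

# Hypothetical percentage marks (NOT the data from the SPSS file)
marks = [52, 58, 58, 61, 64, 67, 70]

mean_mark = statistics.mean(marks)      # arithmetic average
median_mark = statistics.median(marks)  # middle value of the sorted marks
mode_mark = statistics.mode(marks)      # most frequently occurring mark

print("Mean:  ", round(mean_mark, 2))
print("Median:", median_mark)
print("Mode:  ", mode_mark)
```

When the three values are close together, this is consistent with a roughly symmetrical distribution; a large gap between the mean and the median suggests skew.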

Next, the lecturer decides to split the data into two groups according to academic year and also to gain some information about skewness and kurtosis (recall the lecture in week 4). To do this, use the ‘ANALYZE-DESCRIPTIVE STATISTICS-EXPLORE’ function, then transfer percent into the dependent list and put year in as a factor. Then use the ‘Plots…’ button to select histogram rather than stem and leaf. You can now use the descriptives table to assess the mean and the median for each specific group. In addition, the scores for skewness and kurtosis can be interpreted as follows:

Skewness

A score of zero indicates a perfectly symmetrical distribution (as in a normal distribution)

Negative scores indicate a negative skew

Positive scores indicate a positive skew

Kurtosis

A score of zero indicates a mesokurtic curve

Negative scores indicate a platykurtic curve (too flat)

Positive scores indicate a leptokurtic curve (too pointed)

The more each score deviates from zero, the more the curve deviates from a normal distribution.
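To see where skewness and kurtosis scores of this kind come from, the sketch below computes them in Python. It uses the adjusted sample formulas that statistics packages commonly report; that these match the SPSS output exactly is an assumption, and the function names and example data are hypothetical.

```python
import math

def sample_skewness(values):
    """Adjusted sample skewness; 0 for perfectly symmetrical data."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    # Sample standard deviation (n - 1 in the denominator)
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

def sample_excess_kurtosis(values):
    """Adjusted sample excess kurtosis; 0 corresponds to a mesokurtic curve."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    fourth = sum(((x - mean) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction

# Hypothetical data: one symmetrical set and one with a long right tail
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]

print("Skewness (symmetric):   ", sample_skewness(symmetric))
print("Kurtosis (symmetric):   ", sample_excess_kurtosis(symmetric))
print("Skewness (right-tailed):", sample_skewness(right_tailed))
```

The symmetrical set gives a skewness of (effectively) zero and a negative kurtosis, i.e. a flat, platykurtic shape, while the right-tailed set gives a positive skew.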

Using the available data in your descriptives table along with the figures below, decide whether you think these two data-sets are normally distributed.

The lecturer still feels that he needs a more objective analysis of whether the data is normally distributed. Therefore he decides to conduct a statistical test of whether the distribution of the data differs significantly from a normal distribution (remember that the mean and the median will be the same for perfectly normally distributed data). To do this, simply repeat your previous analysis but make sure to check the ‘normality plots with tests’ box in the ‘EXPLORE-PLOTS’ window.

Your output will now also have an additional table labelled ‘Tests of Normality’. The value you are interested in is the Shapiro-Wilk significance level. This is the P-value we discussed in week 6, i.e. we require P ≤ 0.05 to conclude a significant difference.

Therefore,

P ≤ 0.05:

the data differ significantly from a normal distribution (i.e. non-normal dist.)

P > 0.05:

the data do not differ significantly from a normal distribution (i.e. normal dist.)

Decide which measure of central tendency the lecturer should use to compare the two year groups.

SAVE YOUR WORK

Save your SPSS outputs to your ‘H:\studyskills’ folder so you can refer back to them for revision purposes.


TASK 2: Tests of Difference for Categorical Data

Recall from your lecture in week 6 that categorical variables (nominal data) can be coded (e.g. athlete = 1, non-athlete = 2) to allow quantitative statistical analysis using either the ‘goodness of fit’ or ‘contingency’ χ2 test.

Goodness of Fit χ2 test

The example given in your lecture was that of a researcher who was recording how many males and how many females use a leisure centre. Click here to access the data from your lecture.

Now use the following link to see how to conduct a goodness of fit χ2 test.

Interpret the P-value to determine whether there is a gender bias in the number of people using the leisure centre.

Finally, can we be sure that this study meets all the necessary assumptions for χ2 analysis (covered in lecture)?
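To make the arithmetic behind the goodness of fit χ2 test concrete, the sketch below works it through in Python. The counts are hypothetical (they are not the lecture data), the function names are illustrative, and equal expected frequencies are assumed when none are supplied. The P-value shortcut shown is valid only for one degree of freedom (two categories).

```python
import math

def chi_square_goodness_of_fit(observed, expected=None):
    """Return (chi-square statistic, degrees of freedom)."""
    total = sum(observed)
    k = len(observed)
    if expected is None:
        # Assumption: every category is expected to be equally frequent
        expected = [total / k] * k
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, k - 1

def p_value_one_df(statistic):
    """Upper-tail P-value of a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(statistic / 2))

# Hypothetical counts of leisure centre users: [males, females]
observed = [60, 40]
chi2, df = chi_square_goodness_of_fit(observed)
p = p_value_one_df(chi2)

print("Chi-square:", chi2, "df:", df)
print("P-value:", round(p, 4))
print("Significant at 0.05:", p <= 0.05)
```

With these made-up counts the expected frequency is 50 per category, so each category contributes (10 × 10) / 50 = 2 to the statistic.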

Contingency χ2 test

An example of a contingency χ2 test was provided in your earlier lecture in relation to a researcher who was trying to establish whether supplement use was more prevalent among athletes than among non-athletes. The data used to make this example is provided here.

You can now use the following link for instructions on how to conduct a contingency χ2 test.

Remember that it is the Pearson Chi-Square significance value which is relevant to this test.

Are supplements used equally by the athletic and non-athletic populations?
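The Pearson Chi-Square value for a contingency table can also be reproduced by hand. The sketch below does this in Python for a 2 x 2 table without any continuity correction; the counts are hypothetical (not the lecture data) and the function names are illustrative. Expected frequencies are taken as row total × column total / grand total, and the P-value shortcut applies only when there is one degree of freedom.

```python
import math

def pearson_chi_square(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if group and supplement use were unrelated
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

def p_value_one_df(statistic):
    """Upper-tail P-value of a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(statistic / 2))

# Hypothetical counts: rows = athletes, non-athletes; columns = uses, does not use
table = [
    [30, 20],
    [15, 35],
]
chi2, df = pearson_chi_square(table)
p = p_value_one_df(chi2)

print("Pearson chi-square:", round(chi2, 3), "df:", df)
print("P-value:", round(p, 4))
print("Significant at 0.05:", p <= 0.05)
```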

SAVE YOUR WORK

You might like to save your SPSS output to your ‘H:\studyskills’ folder so you can use it for later reference.