Computer Use for Statistics

Computer Use Classes Accompanying Statistics Lectures 2000-2009:

Coordinator: Prof. Alan Pickering

Tutor: Dr. Ian Tharp

Weeks 16-19 Autumn Term: Some Miscellaneous Techniques

Time and venue: Monday 4-5.15 pm in WB304 (Psychology Department Computer Classroom) for all students.

For students on the following courses:

Computer Use (PS71021A) MSc Research Methods in Psychology

Computer Use (PS81021A) Research Training for 1st Year PhD students

Statistics (PS71042A) MSc Cognitive and Clinical Neuroscience

WEEK 16: POWER AND SAMPLE SIZE DETERMINATION

Learning Outcomes:

After the session students should be able to:

Use SPSS to compute simple indices of effect size.
Use SPSS to carry out approximate power calculations in a variety of ways.
Use SPSS to determine required sample sizes for desired levels of power.
Use SPSS to print out observed power.

Dataset To Be Used:

If you have access to the Goldsmiths computer network then the data and syntax files can be downloaded from the J drive:

J:\psycholo\APstats\comp_classes\power.sav

J:\psycholo\APstats\comp_classes\power_syntax.sps

or these files can be downloaded from the web by clicking and following the relevant links at the following URLs:

The dataset contains data from a hypothetical experiment with 40 subjects in two groups (N=20 per group). Performance is recorded for a single dependent variable (SPSS variable = dv). The group variable is called group.

Specific Tasks and Questions:

(i) Let's start by doing the worked sample size calculation example from Campbell et al (1995) for a between-subjects 2-group design. The minimum mean between-groups blood pressure difference of interest was 5 mm Hg. Enter this value in the meandiff variable. The standard deviation of blood pressure in each group was expected to be around 17 mm Hg. Enter this as the sigma variable.

What (approximately) is the effect size, Cohen's d?

(ii) The approximate formula for m, the number of subjects required per group, for a two-tailed test with equal numbers per group, is given by:

m = 2*(z[1-α/2] + z[power])^2 / d^2 where d is the measure of effect size and z is the inverse cumulative normal distribution function (or probit function).

Suppose we want power=0.8 and α=0.05. Enter these values in the appropriate variable names (power and alpha). Next, we need to use the first 3 COMPUTE statements in the syntax file to calculate the values of z[power] and z[1-α/2]. The formulae put the results of the calculations into the variable names zpower and zalpha. The formula for m (the required sample size in each group) puts the result into the variable m.

Does it agree (roughly) with the value calculated by Campbell et al?

Why is there a small difference?

How many subjects do we need in total for 80% power?

What change would you need to make to the formula for a 1-tailed test and why?
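The calculation in (i) and (ii) can also be sketched outside SPSS as a cross-check. The following is a minimal illustration in Python, not the contents of the class syntax file: the variable names mirror those in the handout, and the standard library's NormalDist().inv_cdf is assumed to play the role of the probit function z.

```python
from math import ceil
from statistics import NormalDist

# Values entered in steps (i) and (ii) of the handout.
meandiff = 5.0   # minimum between-groups difference of interest (mm Hg)
sigma = 17.0     # expected standard deviation in each group (mm Hg)
power = 0.80     # desired power
alpha = 0.05     # two-tailed significance level

# Effect size: mean difference in standard deviation units.
d = meandiff / sigma

# Probit (inverse cumulative normal) values used by the formula.
zpower = NormalDist().inv_cdf(power)
zalpha = NormalDist().inv_cdf(1 - alpha / 2)

# Approximate number of subjects required per group (equal group sizes).
m = 2 * (zalpha + zpower) ** 2 / d ** 2

print("d =", round(d, 3))
print("m per group =", round(m, 1), "-> round up to", ceil(m))
```

Rounding m up rather than to the nearest integer is a choice made in this sketch so that the achieved power does not fall below the target.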

(iii) It is actually expected that there will be an allocation ratio (r) of 2; i.e. there will be 2 subjects recruited in one group for every 1 subject in the other group. Enter the expected value of r into the variable r. The 4th COMPUTE formula in the syntax executes formula (1) from the Appendix of Campbell et al, to calculate the value of m'. The result is stored in a variable called mdash.

How many subjects would we need in total for this allocation ratio?

Is it more or less than the number of subjects needed for equal size groups (r=1)?

(iv) Next we are going to calculate power in several different ways. Suppose we have a chance to test only 75 participants in each group of the blood pressure experiment from Campbell et al. We can calculate the power of a two-tailed t-test for this sample size in various ways. First, enter the sample size to be tested, 75, in the variable mtest. We are going to calculate what power this sample size would give us to detect an effect size of 5/17 (the expected mean difference divided by the expected standard deviation in each group).

(v) Method 1: Calculating Power Using an Approximate Formula. An approximate formula for calculating power (based on the Campbell et al article, although the formula is not actually given in the article), assuming equal numbers of subjects (given by mtest) in each of the two groups and a two-tailed test, is as follows:

power = z^-1(d*sqrt(mtest/2) - z[1-α/2]) where abbreviations are as before but z^-1 is the cumulative normal distribution function (the inverse of the probit function z).

This formula is stored as the 5th COMPUTE command in the syntax file. When you run it, the calculated power value is stored in a variable called power1.
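As a cross-check on Method 1, the approximate power formula can be sketched in Python. This is an illustration only and not the 5th COMPUTE command itself: variable names follow the handout, and NormalDist().cdf is assumed to stand in for the cumulative normal distribution function.

```python
from math import sqrt
from statistics import NormalDist

# Design values from step (iv) of the handout.
meandiff = 5.0   # expected mean difference (mm Hg)
sigma = 17.0     # expected standard deviation in each group (mm Hg)
alpha = 0.05     # two-tailed significance level
mtest = 75       # participants per group

d = meandiff / sigma                          # effect size
zalpha = NormalDist().inv_cdf(1 - alpha / 2)  # z[1 - alpha/2]

# Approximate power: cumulative normal of (d * sqrt(mtest / 2) - z[1 - alpha/2]).
power1 = NormalDist().cdf(d * sqrt(mtest / 2) - zalpha)

print("power1 =", round(power1, 3))
```

The term d*sqrt(mtest/2) is the same quantity the handout later calls delta, so the sketch can be reused for other sample sizes by changing mtest.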

(vi) Method 2: Calculating power using the noncentrality parameter (delta as in Howell). For this design, delta = d*sqrt(mtest/2). Use the 6th COMPUTE statement in the syntax file to calculate the variable delta. Using Howell's power table, look up the power associated with the observed value of delta.

What value do you get from the table?

Is this a similar value to the value you calculated previously as power1?

(vii) If Howell's table isn't to hand, we can use SPSS to calculate the power for a given delta. First, you need to calculate the critical value of the t-statistic (at alpha=0.05, two-tailed) for your design using SPSS's IDF.T function. The IDF.T function requires a cumulative probability (derived from alpha) and the degrees of freedom. The degrees of freedom for a between-groups t-test (with mtest=75 participants per group) are (2*mtest - 2) = 148. Enter this value as the variable tdf. Running the 7th COMPUTE statement in the syntax file calculates the critical t value and stores the result in a variable called tcrit.

Why does the syntax formula for tcrit use (1-alpha/2)?

(viii) You then need to use SPSS's non-central t-distribution function NCDF.T(q, df, nc), where q is the critical value of the t-statistic at the 0.05 significance level for a two-tailed test, df is the degrees of freedom you entered, and nc is the noncentrality parameter (the value delta that you have just calculated). The NCDF.T function returns a probability. Running the 8th COMPUTE statement in the syntax file calculates the power and stores it as a variable power2.

Does power2 give the same answer as using the table or power1?

Why does the formula for power2 use (1-NCDF.T)?
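For readers without SPSS to hand, steps (vii) and (viii) can be approximated in plain Python. The sketch below is an assumption-laden illustration, not the SPSS implementation: upper_tail_nct and t_crit are hypothetical helper names, the noncentral t probability is obtained by simple numerical integration over the chi-square component, and the integration range is only adequate for moderately large degrees of freedom such as the 148 used here.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # cumulative normal distribution function


def upper_tail_nct(q, df, nc, steps=4000):
    """Approximate P(T > q) for a noncentral t with df degrees of freedom and
    noncentrality nc, by trapezoid integration over the chi-square variable V
    in T = (Z + nc) / sqrt(V / df)."""
    half = df / 2.0
    log_norm = half * log(2.0) + lgamma(half)   # log of chi-square normalising constant
    spread = 12.0 * sqrt(2.0 * df)              # 12 standard deviations of V
    lo, hi = max(1e-9, df - spread), df + spread
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        v = lo + i * h
        pdf = exp((half - 1.0) * log(v) - v / 2.0 - log_norm)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * pdf * PHI(nc - q * sqrt(v / df))
    return total * h


def t_crit(alpha, df):
    """Two-tailed critical t, found by bisection on the central t (nc = 0)."""
    lo, hi = 0.0, 10.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if upper_tail_nct(mid, df, 0.0) > alpha / 2.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


mtest = 75
delta = (5.0 / 17.0) * sqrt(mtest / 2)   # noncentrality parameter from step (vi)
tdf = 2 * mtest - 2                      # degrees of freedom
tcrit = t_crit(0.05, tdf)                # analogue of the IDF.T step
power2 = upper_tail_nct(tcrit, tdf, delta)  # analogue of 1 - NCDF.T(tcrit, tdf, delta)

print("tcrit =", round(tcrit, 3), "power2 =", round(power2, 3))
```

A statistics library with a built-in noncentral t distribution would be the more usual choice; the hand-rolled integration is used here only to keep the sketch free of third-party dependencies.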

(ix) Next we consider "Observed Power". What does the observed power tell us?

(x) Using the dv and group data from the dataset, work out the observed mean difference between the two groups and enter it into the diff_obs variable.

(xi) What (approximately) is the standard deviation in each of the two groups? Enter the value in the sigma_ob variable.

(xii) Enter the observed number of subjects in each group in the variable m_obs. Calculate the approximate observed power for the dv data using the 9th COMPUTE command in the syntax file, which uses the Campbell et al approximate formula once again and stores the result in a variable named obspower.

(xiii) Check: We can check the observed power for this result using the SPSS GLM:Univariate procedure with "Observed Power" selected under the Options tab.

Why can we use this ANOVA procedure to check the result when our earlier calculation was based on a t-test?

Is the result similar to that calculated as obspower?

Homework

Check these results using special-purpose power calculation software such as GPower.