Scientific Practice Exam Question Bank, Sem 2, 2015-16

Section A (explanation, 48%)

1) Explain the basis and significance of 3 of the following 5 terms in relation to the practice of science in general and healthcare science as appropriate; include definitions as appropriate. Each question is worth 16%.

  1. - One-way ANOVA
  2. - post hoc testing
  3. - correlation
  4. - simple linear regression
  5. - correlation vs regression
  6. - r vs r2
  7. - r2 and significance in linear regression
  8. - 2-way ANOVA
  9. - ANOVA and interaction
  10. - blocking factors in 2-way ANOVA
  11. - non-linear regression
  12. - stepwise multiple regression
  13. - chi-square
  14. - two-way chi-square
  15. - Mann-Whitney U
  16. - Wilcoxon Signed Rank
  17. - Spearman’s Rank Correlation coefficient
  18. - parametric vs non-parametric tests
  19. - qualitative research methods
  20. - the audit cycle
  21. - types of healthcare audit

Section B (interpretation, 26%)

Answer any 2 from 3. Each question is worth 13%.

1) - One-way ANOVA

A comparison was made of four semi-automated machines (A, B, C and D) used to determine the amount of chlorpheniramine in tablets. A sample of ten tablets was tested on each machine with a view to establishing whether there were any significant differences in the amounts as determined by the four machines. The data given below have been analysed as a one-way analysis of variance using Minitab.

Amount of chlorpheniramine (mg)

Machine A / Machine B / Machine C / Machine D
4.13 / 4.00 / 3.98 / 4.00
4.10 / 4.02 / 3.86 / 4.02
4.04 / 4.01 / 3.82 / 4.03
4.07 / 4.01 / 3.93 / 4.04
4.05 / 4.04 / 4.00 / 4.02
4.04 / 3.99 / 3.82 / 3.81
3.99 / 4.03 / 3.98 / 3.91
4.06 / 3.97 / 3.99 / 3.96
4.10 / 3.90 / 4.00 / 4.05
4.04 / 3.98 / 3.93 / 4.02

One-way Analysis of Variance

Analysis of Variance

Source DF SS MS F P

Factor 3 0.08657 0.02886 8.24 0.000

Error 36 0.12614 0.00350

Total 39 0.21271

Individual 95% CIs For Mean

Based on Pooled StDev

Level N Mean StDev --+------+------+------+----

1 10 4.0620 0.0399 (-----*-----)

2 10 3.9950 0.0398 (-----*-----)

3 10 3.9310 0.0726 (-----*-----)

4 10 3.9860 0.0746 (-----*------)

--+------+------+------+----

Pooled StDev = 0.0592 3.900 3.960 4.020 4.080

a) Explain why this is a ‘one-way’ analysis of variance.

b) Explain why the analysis suggests that there are differences between the machines.

c) Why is the p-value reported as 0.000?

d) What is the test statistic and how is it calculated?
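As an illustration of how the figures in the one-way ANOVA table arise, the following minimal Python sketch recomputes the sums of squares and the F ratio from the tablet data above. The function name one_way_anova and the variable names are illustrative assumptions, not part of the Minitab output; the result should agree with the printed table to rounding.

```python
# Tablet assay results (mg), copied column by column from the table above.
machines = {
    "A": [4.13, 4.10, 4.04, 4.07, 4.05, 4.04, 3.99, 4.06, 4.10, 4.04],
    "B": [4.00, 4.02, 4.01, 4.01, 4.04, 3.99, 4.03, 3.97, 3.90, 3.98],
    "C": [3.98, 3.86, 3.82, 3.93, 4.00, 3.82, 3.98, 3.99, 4.00, 3.93],
    "D": [4.00, 4.02, 4.03, 4.04, 4.02, 3.81, 3.91, 3.96, 4.05, 4.02],
}


def one_way_anova(groups):
    """Return (df_factor, df_error, ss_factor, ss_error, f_ratio)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group (factor) sum of squares: group size times the squared
    # distance of each group mean from the grand mean.
    ss_factor = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group (error) sum of squares: spread of values about their own group mean.
    ss_error = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_factor = len(groups) - 1
    df_error = len(all_values) - len(groups)
    # F is the ratio of the two mean squares (SS divided by its DF).
    f_ratio = (ss_factor / df_factor) / (ss_error / df_error)
    return df_factor, df_error, ss_factor, ss_error, f_ratio


df_factor, df_error, ss_factor, ss_error, f_ratio = one_way_anova(list(machines.values()))
print(df_factor, df_error, round(ss_factor, 5), round(ss_error, 5), round(f_ratio, 2))
```

The sketch only reproduces the arithmetic; the p-value still has to come from the F distribution with the stated degrees of freedom.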

2) - two-way ANOVA

TV watching is a sedentary pastime which may affect health, so a student carried out a survey of our TV-watching habits for his final year project. His data on the duration of TV watching per week (h) in relation to age group and gender are presented in the table below.

Duration of TV watching per week (h) by age group and gender; F: female, M: male; age group intervals are given in years

Rows: Gender Columns: Age Group

20-25 26-55 56+

F 25 32 44

21 26 45

27 33 50

26 33 43

31 28 51

M 20 23 33

27 21 34

20 24 38

22 28 33

28 26 37

a) What are the two factors involved? How many levels does each factor have? What is the number of treatment combinations? How many replicates did the student use for each treatment combination? What is the total number of observations he collected?

The data were analysed using analysis of variance. The computer output of this analysis is in the table immediately below. One p-value has been replaced by an asterisk.

Analysis of variance (ANOVA) for TV watching per week (h)

Source DF SS MS F P

Age Group 2 1486.87 743.43 69.48 0.000

Gender 1 340.03 340.03 31.78 0.000

Age Group*Gender 2 103.27 51.63 4.83 *

Error 24 256.80 10.70

Total 29 2186.97

Treatment Means for duration of TV watching per week (h)

Rows: Gender Columns: Age Group

20-25 26-55 56+ All

F 26.00 30.40 46.60 34.33

M 23.40 24.40 35.00 27.60

All 24.70 27.40 40.80 30.97

b) Using the output in the ANOVA table above, draw conclusions from hypothesis tests for any effect of Age Group, Gender, and the interaction between the two on the duration of TV watching per week (h). Use the p-value if it is available and critical values for F otherwise (one p-value has been replaced with an *).
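The two-way ANOVA table can be rebuilt from the raw data, which shows where each sum of squares and degree of freedom comes from. The sketch below is a minimal illustration for a balanced design; the dictionary layout and all variable names are assumptions made for this example.

```python
# Hours of TV per week, copied from the data table: tv[gender][age_group].
tv = {
    "F": {"20-25": [25, 21, 27, 26, 31], "26-55": [32, 26, 33, 33, 28], "56+": [44, 45, 50, 43, 51]},
    "M": {"20-25": [20, 27, 20, 22, 28], "26-55": [23, 21, 24, 28, 26], "56+": [33, 34, 38, 33, 37]},
}


def mean(values):
    return sum(values) / len(values)


genders = list(tv)
ages = list(tv["F"])
reps = len(tv["F"]["20-25"])  # replicates per cell (the design is balanced)
everything = [x for g in genders for a in ages for x in tv[g][a]]
grand = mean(everything)

# Sums of squares for a balanced two-way layout with replication.
ss_total = sum((x - grand) ** 2 for x in everything)
ss_gender = sum(len(ages) * reps * (mean([x for a in ages for x in tv[g][a]]) - grand) ** 2
                for g in genders)
ss_age = sum(len(genders) * reps * (mean([x for g in genders for x in tv[g][a]]) - grand) ** 2
             for a in ages)
ss_cells = sum(reps * (mean(tv[g][a]) - grand) ** 2 for g in genders for a in ages)
ss_inter = ss_cells - ss_gender - ss_age  # what the main effects leave unexplained
ss_error = ss_total - ss_cells            # variation within treatment combinations

df_age = len(ages) - 1
df_gender = len(genders) - 1
df_inter = df_age * df_gender
df_error = len(everything) - len(ages) * len(genders)

ms_error = ss_error / df_error
f_age = (ss_age / df_age) / ms_error
f_gender = (ss_gender / df_gender) / ms_error
f_inter = (ss_inter / df_inter) / ms_error
print(round(f_age, 2), round(f_gender, 2), round(f_inter, 2))
```

The interaction F ratio computed this way is the value to compare with a tabulated critical F on (df_inter, df_error) degrees of freedom.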

3) - post hoc testing

Four cages of rats were kept on a particular diet. After two weeks, their weights were recorded (see table below). To test whether there were differences in weights (in grams) between cages, a one-way ANOVA was carried out. The p-value was 0.0005.

a) What is the Null Hypothesis for the experiment?

b) Why is post-hoc testing necessary within ANOVA?

c) Under what circumstances is it not carried out?

d) In terms of post hoc testing, which is more appropriate for the above data, Tukey or Dunnett, and why?

4) - correlation

Explain how the interpretation of correlation analysis can be affected by reverse causation and third factors. Illustrate your explanation with examples.

5) - blocking factors in 2-way ANOVA

The effects of 3 drugs on blood pressure in 10 subjects were investigated. The data and ANOVA analysis are shown below…

ANOVA: b.p. versus subject, drug

Factor Type Levels Values

subject fixed 10 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

drug fixed 3 1, 2, 3

Analysis of Variance for b.p.

Source DF SS MS F P

subject 9 5530.80 614.53 21.71 0.000

drug 2 370.40 185.20 6.54 0.007

Error 18 509.60 28.31

Total 29 6410.80

a) Use this experiment to explain what is meant by a ‘randomised block design’.

b) Describe another approach to investigating these drugs, including the number of subjects required.

c) Why is there no ‘interaction’ effect in the analysis?

6) Linear Regression

A study of lung function in boys aged 10 to 15 looked at the relation between Forced Expiratory Volume (FEV) (litres) and the height (cm) for a group of 25 boys whose heights ranged from 134 cm to 178 cm.

The regression analysis from Minitab gave the following:

FEV = -5.264 + 0.051 height

Predictor Coef Stdev t-ratio p

Constant -5.2641 2.638 1.995 0.090

height 0.0505 0.0193 2.613 0.013

s = 1.870 R-sq = 24.4%

Analysis of variance

SOURCE DF SS MS F p

Regression 1 26.024 26.024 7.44 0.013

Error 23 80.453 3.498

Total 24 106.477

(i) Explain what the gradient of the regression line means in the context of this study.

(ii) Explain the meaning of ‘R-sq’ in the context of the question.

It was then suggested that a better idea would be to fit a multiple regression of FEV on height and age.

The results were…

FEV = -4.892 + 0.042 height + 0.480 age

Predictor Coef Stdev t-ratio p

Constant -4.8921 2.142 2.284 0.046

Height 0.0423 0.0212 1.995 0.063

Age 0.4801 0.2514 1.910 0.059

S = 1.881 R-sq = 26.9%

Source DF SS MS F p

Regression 2 28.643 14.321 4.048 0.043

Error 22 77.834 3.538

Total 24 106.477

(iii) Explain whether the multiple regression equation is, or is not, an improvement on the simple linear equation.
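One way to compare the two fits is to look at how R-sq relates to the sums of squares, and at an adjusted R-sq that penalises the extra predictor. The sketch below uses only the figures printed in the two analysis of variance tables; adjusted R-sq is not shown in the Minitab output for this question, so it is an added quantity here, and the function names are assumptions.

```python
def r_squared(ss_regression, ss_total):
    """Proportion of the total variation explained by the regression."""
    return ss_regression / ss_total


def adjusted_r_squared(ss_error, df_error, ss_total, df_total):
    """R-sq adjusted for the number of predictors (compares mean squares)."""
    return 1 - (ss_error / df_error) / (ss_total / df_total)


# Figures copied from the two analysis of variance tables above.
simple = {"ss_reg": 26.024, "ss_err": 80.453, "df_err": 23}    # FEV on height
multiple = {"ss_reg": 28.643, "ss_err": 77.834, "df_err": 22}  # FEV on height and age
ss_total, df_total = 106.477, 24

for name, model in (("height only", simple), ("height + age", multiple)):
    r2 = r_squared(model["ss_reg"], ss_total)
    adj = adjusted_r_squared(model["ss_err"], model["df_err"], ss_total, df_total)
    print(f"{name}: R-sq = {r2:.1%}, R-sq(adj) = {adj:.1%}")
```

R-sq can never fall when a predictor is added, so the adjusted figure and the individual p-values are the more informative comparison.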

7) Multiple Regression

A study of a coast whose estuaries were becoming silted up because of the growth of the grass Spartina maritima analysed the water at various points and recorded a measure of the aerial mass of Spartina (g m^-2) (‘biomass’). The important independent variables were found to be the pH and the potassium content (K, ppm) of the water.

A multiple regression of biomass on pH and potassium content gave the following…

The regression equation is

biomass = - 488 + 432 pH - 0.687 K

Predictor Coef StDev T P

Constant -487.9 287.0 -1.70 0.102

pH 432.44 66.39 6.51 0.000

K -0.6869 0.2076 -3.31 0.003

S = 363.9 R-Sq = 65.1% R-Sq(adj) = 62.2%

Analysis of Variance

Source DF SS MS F P

Regression 2 5924821 2962410 22.37 0.000

Residual Error 24 3178646 132444

Total 26 9103467

Source DF Seq SS

pH 1 4474720

K 1 1450101

(i) According to the equation, what is the effect of increasing the pH by one unit while not changing the potassium level?

(ii) Give an interpretation of R-Sq.

(iii) Is it apparently useful to include both the variables, pH and potassium, in the regression model? What would you expect to happen to R-squared if we regressed biomass on just the single independent variable pH?
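The sequential sums of squares (Seq SS) in the output show how much of the total variation each predictor accounts for as it enters the model. As a minimal sketch, assuming pH is entered first as listed, the arithmetic linking Seq SS to R-Sq is:

```python
# Figures copied from the Minitab output above.
ss_total = 9103467
seq_ss = {"pH": 4474720, "K": 1450101}  # sequential SS, in the order fitted

# The sequential SS add up to the regression SS of the full model.
ss_regression = sum(seq_ss.values())

r2_full = ss_regression / ss_total      # both predictors in the model
r2_ph_only = seq_ss["pH"] / ss_total    # pH alone, since it was entered first

print(f"R-Sq with pH and K: {r2_full:.1%}")
print(f"R-Sq with pH only:  {r2_ph_only:.1%}")
```

This shortcut works for the first predictor only; the Seq SS for K is its contribution after pH is already in the model, not what K would explain on its own.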

8) - non-linear regression

For her final year project, a student investigated growth in a high-temperature extremophile bacterium. The growth rate graph (see below) led the student to analyse the data with three different regression models (Minitab output reproduced below).

1) Regression Analysis: Growth versus Temp, TempSq

The regression equation is

Growth = 1.25 + 0.084 Temp + 0.0106 TempSq

Predictor Coef SE Coef T P

Constant 1.255 3.955 0.32 0.760

Temp 0.0841 0.1652 0.51 0.626

TempSq 0.010616 0.001463 7.26 0.000

S = 3.36239 R-Sq = 99.4% R-Sq(adj) = 99.3%

Analysis of Variance

Source DF SS MS F P

Regression 2 13524.5 6762.2 598.13 0.000

Residual Error 7 79.1 11.3

Total 9 13603.6

Source DF Seq SS

Temp 1 12929.4

TempSq 1 595.1

2) Regression Analysis: Growth versus Temp

The regression equation is

Growth = - 22.1 + 1.25 Temp

Predictor Coef SE Coef T P

Constant -22.101 6.271 -3.52 0.008

Temp 1.2519 0.1011 12.39 0.000

S = 9.18031 R-Sq = 95.0% R-Sq(adj) = 94.4%

Analysis of Variance

Source DF SS MS F P

Regression 1 12929 12929 153.41 0.000

Residual Error 8 674 84

Total 9 13604

3) Regression Analysis: Growth versus TempSq

The regression equation is

Growth = 3.08 + 0.0113 TempSq

Predictor Coef SE Coef T P

Constant 3.084 1.572 1.96 0.085

TempSq 0.0113423 0.0003124 36.30 0.000

S = 3.20293 R-Sq = 99.4% R-Sq(adj) = 99.3%

Analysis of Variance

Source DF SS MS F P

Regression 1 13522 13522 1318.05 0.000

Residual Error 8 82 10

Total 9 13604

a) Explain why the student used these three different regression models.

b) Which regression model is best, and why?

c) What is the final regression equation that should be reported in the project write-up, and why?

d) Suggest a better way to approach the regression analysis.

9) - two-way chi-square

A clinical audit of knee joint replacement operations last year looked at the relative performance of two hospitals (A and B). The outcome of the operation was classified as…

  1. no improvement
  2. partial function restoration
  3. complete function restoration

The following results were found…

Hospital / A / B
No improvement / 18 / 21
Partial restoration / 28 / 56
Complete restoration / 45 / 38

Data were analysed in Minitab and the (edited) output appeared as follows...

A B Total

1 18 21 39

17.23 21.77

0.035 0.027

2 28 56 84

37.11 46.89

2.235 1.769

3 45 38 83

36.67 46.33

1.895 1.499

Total 91 115 206

(test statistic) = 7.460, DF = 2, P-Value = 0.024

a) What type of test is being applied here?

b) Specifically, what does the value 17.23 represent?

c) Specifically, what does the value 0.035 represent?

d) Using the appropriate table, what is the critical value for the test statistic?

e) Interpret the p-value in terms of the question being posed in the audit.
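The three rows printed for each outcome in the Minitab output (count, a second figure, a third figure) can be reproduced from the observed table alone. The following minimal sketch shows the arithmetic; the function name contingency_breakdown is an assumption, and the closed-form p-value used at the end holds only when there are exactly 2 degrees of freedom.

```python
import math


def contingency_breakdown(observed):
    """Return (expected, contributions, statistic, df) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Count expected in each cell if row and column classifications were unrelated.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    # Each cell's share of the test statistic: (observed - expected)^2 / expected.
    contributions = [[(o - e) ** 2 / e for o, e in zip(orow, erow)]
                     for orow, erow in zip(observed, expected)]
    statistic = sum(sum(row) for row in contributions)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, contributions, statistic, df


# Rows: no improvement, partial restoration, complete restoration; columns: hospital A, B.
knee = [[18, 21], [28, 56], [45, 38]]
expected, contributions, statistic, df = contingency_breakdown(knee)

# With 2 degrees of freedom the upper tail probability is exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(round(statistic, 3), df, round(p_value, 3))
```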

10) - two-way chi-square

A recent report on the effect of smoking among teenagers gave the following results for 54 smokers who had been smoking for at least two years compared with 54 non-smokers.

Cough / No cough
Smoker / 27 / 27
Non-smoker / 13 / 41

Data were analysed in Minitab and the (edited) output appeared as follows...

Cough No cough Total

1 27 27 54

20.00 34.00

2.450 1.441

2 13 41 54

20.00 34.00

2.450 1.441

Total 40 68 108

(test statistic) = 7.782, DF = 1, P-Value = 0.005

a) What type of test is being applied here?

b) Specifically, what does the value 34.00 represent?

c) Specifically, what does the value 1.441 represent?

d) Using the appropriate table, what is the critical value for the test statistic?

e) Interpret the p-value in terms of the question being posed in the report.
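For a 2 x 2 table the p-value can be obtained without tables, because with 1 degree of freedom the test statistic behaves like the square of a standard normal variable. The sketch below recomputes the statistic from the smoking data and then the p-value; the variable names are assumptions, and no continuity correction is applied, which appears to match the output shown.

```python
import math

observed = [[27, 27],   # smokers: cough, no cough
            [13, 41]]   # non-smokers: cough, no cough

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

statistic = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        statistic += (count - expected) ** 2 / expected

# With 1 degree of freedom, P(statistic > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))
print(round(statistic, 3), round(p_value, 3))
```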

Section C (calculation, 26%)

Answer any 2 from 3. Show your calculations where appropriate. Each question is worth 13%.

1) - simple linear regression

A stability monitoring programme for a pharmaceutical product involved measuring “related substances” in the product at the time periods (in months) shown in the table below. The product was stored at 25°C and 60% relative humidity. The measurements were made by HPLC with a UV spectrophotometer detector.

Time (months) / 0 / 3 / 6 / 9 / 12
Related Substances (%) / 0.149 / 0.185 / 0.245 / 0.297 / 0.320

The following scatterplot was produced, and the MINITAB regression analysis is shown afterwards:

a) What does R2 = 98.3% tell you?

b) State in words what the Time Coefficient’s p-value and actual value imply.

c) Find the 95% Confidence Interval for the gradient of the line (Time Coefficient).

d) It is possible to calculate a 95% confidence interval for the intercept, but explain why this might not be a sensible thing to do in the context of the experiment.
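The least-squares quantities behind this question can be computed directly from the five data points in the table. The sketch below is a minimal illustration, assuming the usual simple linear regression formulas; the critical t value of 3.182 (two-sided 95%, 3 degrees of freedom) is taken from standard t-tables, and all variable names are assumptions.

```python
import math

months = [0, 3, 6, 9, 12]
related = [0.149, 0.185, 0.245, 0.297, 0.320]  # related substances (%)

n = len(months)
x_bar = sum(months) / n
y_bar = sum(related) / n

# Corrected sums of squares and products.
sxx = sum((x - x_bar) ** 2 for x in months)
syy = sum((y - y_bar) ** 2 for y in related)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, related))

slope = sxy / sxx
intercept = y_bar - slope * x_bar
r_squared = sxy ** 2 / (sxx * syy)

# Standard error of the slope from the residual mean square on n - 2 df.
ss_error = syy - slope * sxy
se_slope = math.sqrt(ss_error / (n - 2) / sxx)

T_CRIT = 3.182  # assumption: two-sided 95% point of t with 3 df, from t-tables
ci_slope = (slope - T_CRIT * se_slope, slope + T_CRIT * se_slope)

print(f"slope = {slope:.5f}, intercept = {intercept:.4f}, R-sq = {r_squared:.1%}")
print(f"95% CI for slope: {ci_slope[0]:.5f} to {ci_slope[1]:.5f}")
```

In an exam answer the coefficient and its standard error would normally be read from the Minitab output rather than recomputed.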

2) - ANOVA and interaction

An experiment was carried out to investigate the retention of ascorbic acid in frozen orange juice under storage conditions. Three storage temperatures (–10°C, –20°C and –30°C) and four storage times (2, 4, 6 and 8 weeks) were compared. A three-by-four factorial design was used, with each treatment replicated 3 times.

The following ANOVA table was obtained from Minitab. Some figures in the ANOVA have been replaced by asterisks.

ANOVA

Factor Type Levels Values

Temp fixed 3 1 2 3

Storage fixed 4 1 2 3 4

ANOVA for ascorbic acid

Source df SS MS F p

Temp * 334.389 167.19 236.04 0.000

Storage * 40.258 13.51 19.08 0.000

Interaction * 34.056 *** ***

Error * 17.000 0.708

Total * 425.927

a) What are the missing df values? (*)

b) What are the missing MS and F values?

c) Determine if the interaction effect is significant.
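The bookkeeping for degrees of freedom in a replicated factorial design follows from the numbers of levels and replicates given in the question. A minimal sketch of that arithmetic, using the sums of squares printed in the table and illustrative variable names, is:

```python
temp_levels, storage_levels, replicates = 3, 4, 3
n_obs = temp_levels * storage_levels * replicates

df_temp = temp_levels - 1
df_storage = storage_levels - 1
df_inter = df_temp * df_storage
df_total = n_obs - 1
df_error = df_total - df_temp - df_storage - df_inter

# Sums of squares copied from the ANOVA table above.
ss_inter, ss_error = 34.056, 17.000

ms_inter = ss_inter / df_inter
ms_error = ss_error / df_error
f_inter = ms_inter / ms_error

print(df_temp, df_storage, df_inter, df_error, df_total)
print(round(ms_inter, 3), round(ms_error, 3), round(f_inter, 2))
```

The F ratio is then compared with a tabulated critical value of F on (df_inter, df_error) degrees of freedom.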

3) - stepwise multiple regression

A medical student has gathered data from 10 randomly-sampled patients to test the idea that high blood pressure (mmHg) could be explained by plasma glucose concentration (mmol/L) and the Body Mass Index (BMI, kg/m2). The data are shown below.

Blood pressure (mmHg) / Plasma glucose (mmol/L) / BMI (kg/m2)
68 / 5.7 / 46.2
72 / 7.4 / 23.8
56 / 3.1 / 24.2
84 / 10.2 / 35.5
90 / 8.4 / 29.7
88 / 7.0 / 38.5
80 / 6.6 / 29.0
70 / 5.9 / 39.4
78 / 6.7 / 26.5
72 / 3.6 / 32.0

First, the student fitted a linear regression model to the relationship between blood pressure and the two predictors, plasma glucose and BMI. The MINITAB output is shown below.

Linear regression of blood pressure (mmHg) on plasma glucose (mmol/L) and BMI (kg/m2)

The regression equation is

Blood pressure (mmHg) = 49.4 + 3.74 Plasma glucose (mmol/L) + 0.066 BMI (kg/m^2)

Predictor Coef SE Coef T P

Constant 49.44 13.17 3.75 0.007

Plasma glucose (mmol/L) 3.744 1.201 3.12 0.017

BMI (kg/m^2) 0.0655 0.3438 0.19 0.854

S = 7.50470 R-Sq = 58.7% R-Sq(adj) = 47.0%

Analysis of Variance

Source DF SS MS F P

Regression 2 561.36 280.68 4.98 0.045

Residual Error 7 394.24 56.32

Total 9 955.60

Second, the student fitted a linear regression model to the relationship between blood pressure and plasma glucose. The MINITAB output is shown below.

Linear regression of blood pressure (mmHg) on plasma glucose (mmol/L)

The regression equation is

Blood pressure (mmHg) = 51.4 + 3.77 Plasma glucose (mmol/L)

Predictor Coef SE Coef T P

Constant 51.423 7.588 6.78 0.000

Plasma glucose (mmol/L) 3.766 1.121 3.36 0.010

S = 7.03818 R-Sq = 58.5% R-Sq(adj) = 53.3%

a) Explain the interpretation of the first, two-predictor regression that caused the student to repeat the regression with a single predictor.

b) Does the second Minitab output suggest that plasma glucose explains any of the variation in blood pressure? Explain your reasoning.

c) What blood pressure would you predict for a person with plasma glucose of 3.0 mmol/L? Would that prediction be reliable and why?
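A prediction from the single-predictor equation is a matter of substitution, and its reliability depends partly on whether the chosen value lies inside the range of the observed data. The sketch below illustrates both checks using the coefficients from the second Minitab output; the function names are assumptions.

```python
# Plasma glucose values (mmol/L) of the 10 sampled patients, from the data table.
glucose = [5.7, 7.4, 3.1, 10.2, 8.4, 7.0, 6.6, 5.9, 6.7, 3.6]

# Coefficients from the single-predictor Minitab output.
INTERCEPT, SLOPE = 51.423, 3.766


def predict_bp(plasma_glucose):
    """Predicted blood pressure (mmHg) from the fitted line."""
    return INTERCEPT + SLOPE * plasma_glucose


def within_observed_range(plasma_glucose):
    """True if the value lies inside the range used to fit the line."""
    return min(glucose) <= plasma_glucose <= max(glucose)


print(round(predict_bp(3.0), 1), within_observed_range(3.0))
```

A value outside the observed range makes the prediction an extrapolation, which is one of several points relevant to its reliability (sample size and R-Sq are others).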

4) - stepwise multiple regression

An article in a walking magazine gives the following method for calculating VO2max as a measure of fitness.

“Walk one mile at your fastest pace. Record the time taken in minutes to the nearest hundredth of a minute. Record your heart rate (beats per minute) at the end of the mile walk. You also need your weight in kilos, your age and your gender. If you are male your gender is 1, if female, it is zero. Then use the following equation to calculate your VO2max.”

VO2max = 132.853 – 0.1693 × weight – 0.3877 × age + 6.315 × gender – 3.2649 × time – 0.1565 × heart rate

a) What statistical technique would have been used to produce this equation?

b) Calculate the VO2max for a female (Mary) aged 45 with a weight of 55 kilos who walked the mile in 18.50 minutes and finished with a heart rate of 115 beats per minute.

c) Mary tries to get fitter and after a couple of weeks repeats the test with her weight and heart rate unchanged, but the time is now 17.50 minutes. What is the corresponding change in her VO2max according to the equation?

d) Assuming that all other variables are kept constant, what is the change in VO2max for a one year increase in age?

e) The magazine gave no indication of the reliability of this method, but by tracing the source you find that “R-squared = 82%”. Explain what this means.
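The magazine's equation can be written as a small function, which makes the effect of changing one variable while holding the others fixed easy to see. This is a minimal sketch; the function name vo2max and its parameter names are assumptions, while the coefficients are those quoted in the question.

```python
def vo2max(weight_kg, age_years, gender, time_min, heart_rate_bpm):
    """VO2max from the magazine's equation; gender is 1 for male, 0 for female."""
    return (132.853
            - 0.1693 * weight_kg
            - 0.3877 * age_years
            + 6.315 * gender
            - 3.2649 * time_min
            - 0.1565 * heart_rate_bpm)


first_test = vo2max(55, 45, 0, 18.50, 115)
second_test = vo2max(55, 45, 0, 17.50, 115)

print(round(first_test, 2))
print(round(second_test - first_test, 4))  # change for a mile walked 1 minute faster
```

Because the equation is linear, the change for a one-unit change in any variable is simply that variable's coefficient.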

5) - two-way chi-square

It is suspected that bacteria carried by health workers may be more resistant to antibiotics than those carried by office workers.

Samples of Staph. epidermidis taken by nose swabs from 60 female nurses and a comparable group of female office workers were tested for minimum inhibitory level of ampicillin. The results for the bacteria were classified as sensitive, intermediate and resistant. The numbers in the various categories were:

Sensitive / Intermediate / Resistant / Total
Nurses / 14 / 18 / 28 / 60
Office workers / 36 / 16 / 8 / 60

a) Use an appropriate statistical test to determine whether you think the suspicions above are justified or not.

6) - two-way chi-square

A recent run of a module showed the following results, in which students were classified by whether or not they achieved 40% or over in the examination and also by gender.

Exam mark / Male / Female
Less than 40% / 12 / 6
40% or over / 32 / 55

a) On this evidence, is there a difference in examination performance between male and female students?

7) - two-way chi-square

A study of the attitudes of consultant obstetricians and heads of midwifery in British hospitals looked at whether the respondents thought it ethical for a woman to choose to have her baby by Caesarean section (surgery) rather than naturally. (BMJ 2005:331:490-491)

The question asked was ‘Do you believe that women should choose their method of delivery?’ The responses were…

Male consultants / Female consultants / Heads of midwifery / Total
Yes / 254 / 67 / 33 / 354
No / 207 / 117 / 85 / 409
Total / 461 / 184 / 118 / 763

a) Test whether or not this shows a difference of opinion between the groups.

8) - Mann-Whitney U

To compare the number of days spent indoors by elderly patients in two areas (North and South) of the country, local Hospital Trusts randomly radio-tagged 8 patients in each area. Their movements were monitored throughout the winter to determine the number of days when they did not leave their homes. The numbers of days spent entirely in the home were as follows:

North / South
100 / 95
107 / 90
275 / 82
96 / 100
102 / 98
110 / 90
292 / 48
106 / 105

a) Use a suitable non-parametric test to decide whether or not the number of days spent at home differs in the two areas for the patients.

b) Explain why a non-parametric test is used here in preference to a t-test.
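The ranking step of a rank-sum test is where most hand-calculation errors occur, particularly with tied values. The sketch below shows one way to compute the Mann-Whitney U statistic with average ranks for ties; the helper names average_ranks and mann_whitney_u are assumptions, and the significance decision still requires a table of critical values for the two sample sizes.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1          # 1-based position of the first occurrence
        count = ordered.count(v)
        rank_of[v] = first + (count - 1) / 2  # mean of positions first .. first + count - 1
    return [rank_of[v] for v in values]


def mann_whitney_u(sample_a, sample_b):
    """Return (U, U_a, U_b), where U is the smaller of the two U values."""
    ranks = average_ranks(sample_a + sample_b)  # rank both samples together
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b), u_a, u_b


north = [100, 107, 275, 96, 102, 110, 292, 106]
south = [95, 90, 82, 100, 98, 90, 48, 105]

u, u_north, u_south = mann_whitney_u(north, south)
print(u, u_north, u_south)
```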

9) - Mann-Whitney U

A hospital trust has two laboratories, A and B. Concern has been expressed about the relative levels of sick leave, as the mean days' absence per employee last year were 16.3 days and 9.9 days respectively. Lab A has 9 staff and Lab B has 7. Days of sick leave for each full-time staff member in the last financial year are shown below…

Laboratory A / 1 / 6 / 4 / 4 / 0 / 9 / 0 / 120 / 3
Laboratory B / 14 / 5 / 12 / 11 / 7 / 8 / 12

a) Carry out a suitable non-parametric test to decide whether there is evidence of a difference in days taken as sick leave.

b) Explain the result of your test.

c) Explain why a t-test would be the wrong one to use in this case.

10) - Wilcoxon Signed Rank

In a physiology study of the effect of sports training, 10 randomly selected students undertook a sports training programme. At the beginning of the programme each participant was asked to do as many press-ups as possible. This was repeated at the end of the training programme.

No. of press-ups

Before training / 15 / 18 / 18 / 20 / 21 / 25 / 28 / 31 / 32 / 68
After training / 19 / 19 / 22 / 23 / 27 / 29 / 32 / 35 / 37 / 71

a) Carry out a suitable non-parametric test to ascertain whether there is any evidence of difference in the number of press-ups achieved before and after the training programme.

b) Explain briefly why a non-parametric test is needed here.
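For paired before-and-after data, the signed-rank calculation ranks the absolute differences and then sums the ranks separately for positive and negative differences. A minimal sketch follows, assuming the common convention that zero differences are dropped and tied differences share an average rank; the function name is an assumption.

```python
before = [15, 18, 18, 20, 21, 25, 28, 31, 32, 68]
after = [19, 19, 22, 23, 27, 29, 32, 35, 37, 71]


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, n) for paired samples x and y, using y - x."""
    diffs = [b - a for a, b in zip(x, y) if b != a]  # zero differences are dropped
    abs_sorted = sorted(abs(d) for d in diffs)

    def avg_rank(value):
        first = abs_sorted.index(value) + 1                 # 1-based first position
        return first + (abs_sorted.count(value) - 1) / 2    # average rank for ties

    w_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
    w_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)
    return w_plus, w_minus, len(diffs)


w_plus, w_minus, n = wilcoxon_signed_rank(before, after)
print(w_plus, w_minus, n)
```

The smaller of the two rank sums is the test statistic to compare with the tabulated critical value for n pairs.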

11) - Spearman’s Rank Correlation coefficient

The following data show the age standardised mortality rates for cancer and for circulatory disease for 1995-1997 for 9 urban manufacturing centres.

Cancer / Circulatory disease
Birmingham / 141 / 161
Bradford / 145 / 172
Calderdale / 135 / 146
Coventry / 145 / 161
East Lancs / 152 / 180
Sandwell / 153 / 184
Walsall / 153 / 149
West Pennine / 154 / 180
Wolverhampton / 149 / 169

a) Calculate the rank correlation coefficient for these two variables and test whether it differs significantly from zero.
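Both variables here contain tied values, so the ranking needs average ranks. The sketch below computes the rank correlation as the ordinary (Pearson) correlation of the two sets of ranks, which handles ties; the familiar shortcut 1 - 6 x sum(d^2) / (n(n^2 - 1)) gives a slightly different figure when ties are present. Function names are assumptions, and the significance test still requires a table of critical values for n = 9.

```python
import math


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1
        rank_of[v] = first + (ordered.count(v) - 1) / 2
    return [rank_of[v] for v in values]


def spearman_rho(x, y):
    """Rank correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Age standardised mortality rates, in the order the centres are listed above.
cancer = [141, 145, 135, 145, 152, 153, 153, 154, 149]
circulatory = [161, 172, 146, 161, 180, 184, 149, 180, 169]

rho = spearman_rho(cancer, circulatory)
print(round(rho, 3))
```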