Scientific Practice Exam Question Bank, Sem 2, 2015-16
Section A (explanation, 48%)
1) Explain the basis and significance of 3 of the following 5 terms in relation to the practice of science in general and healthcare science as appropriate; include definitions as appropriate. Each question is worth 16%.
- One-way ANOVA
- post hoc testing
- correlation
- simple linear regression
- correlation vs regression
- r vs r²
- r² and significance in linear regression
- 2-way ANOVA
- ANOVA and interaction
- blocking factors in 2-way ANOVA
- non-linear regression
- stepwise multiple regression
- chi-square
- two-way chi-square
- Mann-Whitney U
- Wilcoxon Signed Rank
- Spearman’s Rank Correlation coefficient
- parametric vs non-parametric tests
- qualitative research methods
- the audit cycle
- types of healthcare audit
Section B (interpretation, 26%)
Answer any 2 from 3. Each question is worth 13%.
1) - One-way ANOVA
A comparison was made of four semi-automated machines (A, B, C and D) used to determine the amount of chlorpheniramine in tablets. A sample of ten tablets was tested on each machine with a view to establishing whether there were any significant differences in the amounts as determined by the four machines. The data given below have been analysed as a one-way analysis of variance using Minitab.
Amount of chlorpheniramine (mg)
Machine A / Machine B / Machine C / Machine D
4.13 / 4.00 / 3.98 / 4.00
4.10 / 4.02 / 3.86 / 4.02
4.04 / 4.01 / 3.82 / 4.03
4.07 / 4.01 / 3.93 / 4.04
4.05 / 4.04 / 4.00 / 4.02
4.04 / 3.99 / 3.82 / 3.81
3.99 / 4.03 / 3.98 / 3.91
4.06 / 3.97 / 3.99 / 3.96
4.10 / 3.90 / 4.00 / 4.05
4.04 / 3.98 / 3.93 / 4.02
One-way Analysis of Variance
Analysis of Variance
Source DF SS MS F P
Factor 3 0.08657 0.02886 8.24 0.000
Error 36 0.12614 0.00350
Total 39 0.21271
Individual 95% CIs For Mean
Based on Pooled StDev
Level N Mean StDev --+------+------+------+----
1 10 4.0620 0.0399 (-----*-----)
2 10 3.9950 0.0398 (-----*-----)
3 10 3.9310 0.0726 (-----*-----)
4 10 3.9860 0.0746 (-----*------)
--+------+------+------+----
Pooled StDev = 0.0592 3.900 3.960 4.020 4.080
a) Explain why this is a ‘one-way’ analysis of variance.
b) Explain why the analysis suggests that there are differences between the machines.
c) Why is the p-value reported as 0.000?
d) What is the test statistic and how is it calculated?
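As a hedged illustration of part d), the F statistic in the Minitab table above can be reproduced from the sums of squares and degrees of freedom printed there. This is a minimal sketch in plain Python; the variable names are illustrative and are not part of the Minitab output.

```python
# Values copied from the one-way ANOVA table above.
ss_factor, df_factor = 0.08657, 3    # variation between machines
ss_error, df_error = 0.12614, 36     # variation within machines

# A mean square is a sum of squares divided by its degrees of freedom.
ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error

# The test statistic is the ratio of the two mean squares.
f_stat = ms_factor / ms_error

print(f"MS factor = {ms_factor:.5f}")
print(f"MS error  = {ms_error:.5f}")
print(f"F         = {f_stat:.2f}")
```

The printed values can be compared with the MS and F columns of the table.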
2) - two-way ANOVA
TV watching is a sedentary pastime which may affect health, so a student carried out a survey of our TV-watching habits for his final year project. His data on the duration of TV watching per week (h) in relation to age group and gender are presented in the table below.
Duration of TV watching per week (h) by age group and gender; F: female, M: male; age group intervals are given in years
Rows: Gender Columns: Age Group
20-25 26-55 56+
F 25 32 44
21 26 45
27 33 50
26 33 43
31 28 51
M 20 23 33
27 21 34
20 24 38
22 28 33
28 26 37
a) What are the two factors involved? How many levels does each factor have? What is the number of treatment combinations? How many replicates did the student use for each treatment combination? What is the total number of observations he collected?
The data were analysed using analysis of variance. The computer output of this analysis is in the table immediately below. One p-value has been replaced by an asterisk.
Analysis of variance (ANOVA) for TV watching per week (h)
Source DF SS MS F P
Age Group 2 1486.87 743.43 69.48 0.000
Gender 1 340.03 340.03 31.78 0.000
Age Group*Gender 2 103.27 51.63 4.83 *
Error 24 256.80 10.70
Total 29 2186.97
Treatment Means for duration of TV watching per week (h)
Rows: Gender Columns: Age Group
20-25 26-55 56+ All
F 26.00 30.40 46.60 34.33
M 23.40 24.40 35.00 27.60
All 24.70 27.40 40.80 30.97
b) Using the output in the ANOVA table above, draw conclusions from hypothesis tests for any effect of Age Group, Gender, and the interaction between the two on the duration of TV watching per week (h). Use the p-value if it is available and critical values for F otherwise (one p-value has been replaced with an *).
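For the interaction term, whose p-value is hidden, the comparison with a critical F value can be sketched as follows. The sketch relies on a special case: when the numerator degrees of freedom equal 2, the upper tail of the F distribution has the closed form (1 + 2f/d)^(-d/2), where d is the error degrees of freedom. In an exam the critical value would normally be read from F tables instead; the function names here are hypothetical.

```python
def f_tail_num_df2(f_value, df_error):
    """Upper-tail probability of F(2, df_error); valid only for numerator df = 2."""
    return (1 + 2 * f_value / df_error) ** (-df_error / 2)


def f_critical_num_df2(alpha, df_error):
    """Critical value of F(2, df_error) at level alpha (inverse of the tail above)."""
    return (df_error / 2) * (alpha ** (-2 / df_error) - 1)


# Values copied from the two-way ANOVA table above.
ms_interaction, ms_error, df_error = 51.63, 10.70, 24

f_interaction = ms_interaction / ms_error
f_crit = f_critical_num_df2(0.05, df_error)
p_interaction = f_tail_num_df2(f_interaction, df_error)

print(f"F (interaction)       = {f_interaction:.2f}")
print(f"5% critical F(2, 24)  = {f_crit:.2f}")
print(f"p (interaction)       = {p_interaction:.3f}")
```

The interaction is judged significant at the 5% level if the observed F exceeds the critical value, which is the same as the tail probability falling below 0.05.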
3) - post hoc testing
Four cages of rats were kept on a particular diet. After two weeks, their weights were recorded (see table below). To test whether there were differences in weight (in grams) between cages, a one-way ANOVA was carried out. The p-value was 0.0005.
a) What is the Null Hypothesis for the experiment?
b) Why is post-hoc testing necessary within ANOVA?
c) Under what circumstances is it not carried out?
d) In terms of post hoc testing, which is more appropriate for the above data, Tukey or Dunnett, and why?
4) - correlation
Explain how the interpretation of correlation analysis can be affected by reverse causation and third factors. Illustrate your explanation with examples.
5) - blocking factors in 2-way ANOVA
The effects of 3 drugs on blood pressure in 10 subjects were investigated. The data and ANOVA analysis are shown below…
ANOVA: b.p. versus subject, drug
Factor Type Levels Values
subject fixed 10 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
drug fixed 3 1, 2, 3
Analysis of Variance for b.p.
Source DF SS MS F P
subject 9 5530.80 614.53 21.71 0.000
drug 2 370.40 185.20 6.54 0.007
Error 18 509.60 28.31
Total 29 6410.80
a) Use this experiment to explain what is meant by a ‘randomised block design’.
b) Describe another approach to investigating these drugs, including the number of subjects required.
c) Why is there no ‘interaction’ effect in the analysis?
6) Linear Regression
A study of lung function in boys aged 10 to 15 looked at the relation between Forced Expiratory Volume (FEV) (litres) and the height (cm) for a group of 25 boys whose heights ranged from 134 cm to 178 cm.
The regression analysis from Minitab gave the following:
FEV = -5.264 + 0.051 height
Predictor Coef Stdev t-ratio p
Constant -5.2641 2.638 1.995 0.090
height 0.0505 0.0193 2.613 0.013
s = 1.870 R-sq = 24.4%
Analysis of variance
SOURCE DF SS MS F p
Regression 1 26.024 26.024 7.44 0.013
Error 23 80.453 3.498
Total 24 106.477
(i) Explain what the gradient of the regression line means in the context of this study.
(ii) Explain the meaning of ‘R-sq’ in the context of the question.
It was then suggested that a better idea would be to fit a multiple regression of FEV on height and age.
The results were…
FEV = -4.892 + 0.042 height + 0.480 age
Predictor coef stdev t-ratio p
Constant -4.8921 2.142 2.284 0.046
Height 0.0423 0.0212 1.995 0.063
Age 0.4801 0.2514 1.910 0.059
S = 1.881 R-sq = 26.9%
Source DF SS MS F p
Regression 2 28.643 14.321 4.048 0.043
Error 22 77.834 3.538
Total 24 106.477
(iii) Explain whether the multiple regression equation is, or is not, an improvement on the simple linear equation.
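Part (iii) turns on whether the small rise in R-sq justifies the extra predictor. One common check, which is not printed in the Minitab output above, is the adjusted R-sq, which penalises each added predictor. The sketch below assumes only the SS and DF figures from the two ANOVA tables; the function names are illustrative.

```python
def r_squared(ss_error, ss_total):
    """Proportion of the total variation explained by the regression."""
    return 1 - ss_error / ss_total


def adjusted_r_squared(ss_error, df_error, ss_total, df_total):
    """R-squared penalised for the number of predictors in the model."""
    return 1 - (ss_error / df_error) / (ss_total / df_total)


ss_total, df_total = 106.477, 24

# (label, error SS, error DF) taken from the two ANOVA tables above.
models = [
    ("height only", 80.453, 23),
    ("height + age", 77.834, 22),
]

for label, ss_error, df_error in models:
    r2 = r_squared(ss_error, ss_total)
    adj = adjusted_r_squared(ss_error, df_error, ss_total, df_total)
    print(f"{label:13s} R-sq = {r2:.1%}  R-sq(adj) = {adj:.1%}")
```

If the adjusted figure does not rise when age is added, that supports the view that the second predictor contributes little beyond height.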
7) Multiple Regression
A study of a coast whose estuaries were becoming silted up because of the growth of a grass, Spartina maritima, looked at the analysis of the water at various points together with a measure of the aerial mass of Spartina (g m⁻²) (‘biomass’). The important independent variables were found to be the pH and the potassium content, K (ppm), of the water.
A multiple regression of biomass on pH and potassium content gave the following…
The regression equation is
biomass = - 488 + 432 pH - 0.687 K
Predictor Coef StDev T P
Constant -487.9 287.0 -1.70 0.102
pH 432.44 66.39 6.51 0.000
K -0.6869 0.2076 -3.31 0.003
S = 363.9 R-Sq = 65.1% R-Sq(adj) = 62.2%
Analysis of Variance
Source DF SS MS F P
Regression 2 5924821 2962410 22.37 0.000
Residual Error 24 3178646 132444
Total 26 9103467
Source DF Seq SS
pH 1 4474720
K 1 1450101
(i) According to the equation, what is the effect of increasing the pH by one unit while not changing the potassium level?
(ii) Give an interpretation of R-Sq.
(iii) Is it apparently useful to include both the variables, pH and potassium, in the regression model? What would you expect to happen to R-squared if we regressed biomass on just the single independent variable pH?
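The sequential sums of squares in the output above allow a rough check on part (iii). Because pH was entered first, its sequential SS is the variation that a regression on pH alone would explain. A minimal sketch under that reading, with illustrative variable names:

```python
# Figures copied from the Minitab output above.
ss_total = 9103467
seq_ss = {"pH": 4474720, "K": 1450101}   # sequential sums of squares

# R-sq for the full model: regression SS divided by total SS.
r2_full = (seq_ss["pH"] + seq_ss["K"]) / ss_total

# pH was entered first, so its sequential SS equals the regression SS
# of a model containing pH alone.
r2_ph_only = seq_ss["pH"] / ss_total

# Part (i): coefficient of pH, i.e. the predicted change in biomass
# for a one-unit rise in pH with K held constant.
coef_ph = 432.44

print(f"R-sq, pH and K : {r2_full:.1%}")
print(f"R-sq, pH only  : {r2_ph_only:.1%}")
print(f"Change in biomass per pH unit: {coef_ph:+.2f}")
```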
8) - non-linear regression
For her final year project, a student investigated growth in a high-temperature extremophile bacterium. The growth rate graph (see below) led the student to analyse the data with three different regression models (Minitab output reproduced below).
1) Regression Analysis: Growth versus Temp, TempSq
The regression equation is
Growth = 1.25 + 0.084 Temp + 0.0106 TempSq
Predictor Coef SE Coef T P
Constant 1.255 3.955 0.32 0.760
Temp 0.0841 0.1652 0.51 0.626
TempSq 0.010616 0.001463 7.26 0.000
S = 3.36239 R-Sq = 99.4% R-Sq(adj) = 99.3%
Analysis of Variance
Source DF SS MS F P
Regression 2 13524.5 6762.2 598.13 0.000
Residual Error 7 79.1 11.3
Total 9 13603.6
Source DF Seq SS
Temp 1 12929.4
TempSq 1 595.1
2) Regression Analysis: Growth versus Temp
The regression equation is
Growth = - 22.1 + 1.25 Temp
Predictor Coef SE Coef T P
Constant -22.101 6.271 -3.52 0.008
Temp 1.2519 0.1011 12.39 0.000
S = 9.18031 R-Sq = 95.0% R-Sq(adj) = 94.4%
Analysis of Variance
Source DF SS MS F P
Regression 1 12929 12929 153.41 0.000
Residual Error 8 674 84
Total 9 13604
3) Regression Analysis: Growth versus TempSq
The regression equation is
Growth = 3.08 + 0.0113 TempSq
Predictor Coef SE Coef T P
Constant 3.084 1.572 1.96 0.085
TempSq 0.0113423 0.0003124 36.30 0.000
S = 3.20293 R-Sq = 99.4% R-Sq(adj) = 99.3%
Analysis of Variance
Source DF SS MS F P
Regression 1 13522 13522 1318.05 0.000
Residual Error 8 82 10
Total 9 13604
a) Explain why the student used these three different regression models.
b) Which regression model is best, and why?
c) What is the final regression equation that should be reported in the project write-up, and why?
d) Suggest a better way to approach the regression analysis.
9) - two-way chi-square
A clinical audit of knee joint replacement operations last year looked at the relative performance of two hospitals (A and B). The outcome of the operation was classified as…
- no improvement
- partial function restoration
- complete function restoration
The following results were found…
Hospital / A / B
No improvement / 18 / 21
Partial restoration / 28 / 56
Complete restoration / 45 / 38
Data were analysed in Minitab and the (edited) output appeared as follows...
A B Total
1 18 21 39
17.23 21.77
0.035 0.027
2 28 56 84
37.11 46.89
2.235 1.769
3 45 38 83
36.67 46.33
1.895 1.499
Total 91 115 206
(test statistic) = 7.460, DF = 2, P-Value = 0.024
a) What type of test is being applied here?
b) Specifically, what does the value 17.23 represent?
c) Specifically, what does the value 0.035 represent?
d) Using the appropriate table, what is the critical value for the test statistic?
e) Interpret the p-value in terms of the question being posed in the audit.
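The three numbers in each cell of the output above (observed count, expected count, contribution to the test statistic) can be reproduced from the observed table alone. The sketch below is a plain-Python illustration, not Minitab's own routine. The closed-form p-value and critical value apply only because this table has 2 degrees of freedom.

```python
import math

# Observed counts: rows are outcomes, columns are hospitals A and B.
observed = [
    [18, 21],   # no improvement
    [28, 56],   # partial restoration
    [45, 38],   # complete restoration
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if outcome were independent of hospital.
        expected = row_totals[i] * col_totals[j] / grand_total
        contribution = (obs - expected) ** 2 / expected
        chi_square += contribution
        print(f"row {i + 1}, col {j + 1}: expected {expected:.2f}, "
              f"contribution {contribution:.3f}")

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 only, the chi-square upper tail is exp(-x / 2).
p_value = math.exp(-chi_square / 2)
critical_5pct = -2 * math.log(0.05)

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3f}")
print(f"5% critical value = {critical_5pct:.3f}")
```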
10) - two-way chi-square
A recent report on the effect of smoking among teenagers gave the following results for 54 smokers who had been smoking for at least two years compared with 54 non-smokers.
Cough / No cough
Smoker / 27 / 27
Non-smoker / 13 / 41
Data were analysed in Minitab and the (edited) output appeared as follows...
Cough No cough Total
1 27 27 54
20.00 34.00
2.450 1.441
2 13 41 54
20.00 34.00
2.450 1.441
Total 40 68 108
(test statistic) = 7.782, DF = 1, P-Value = 0.005
a) What type of test is being applied here?
b) Specifically, what does the value 34.00 represent?
c) Specifically, what does the value 1.441 represent?
d) Using the appropriate table, what is the critical value for the test statistic?
e) Interpret the p-value in terms of the question being posed in the report.
Section C (calculation, 26%)
Answer any 2 from 3. Show your calculations where appropriate. Each question is worth 13%.
1) - simple linear regression
A stability monitoring programme for a pharmaceutical product involved measuring “related substances” in the product at the time periods (in months) shown in the table below. The product was stored at 25 °C and 60% relative humidity. The measurements were made by HPLC with a UV spectrophotometer detector.
Time (months) / 0 / 3 / 6 / 9 / 12
Related Substances (%) / 0.149 / 0.185 / 0.245 / 0.297 / 0.320
The following scatterplot was produced, and the MINITAB regression analysis is shown afterwards:
a) What does R² = 98.3% tell you?
b) State in words what the Time Coefficient’s p-value and actual value imply.
c) Find the 95% Confidence Interval for the gradient of the line (Time Coefficient).
d) It is possible to calculate a 95% confidence interval for the intercept, but explain why this might not be a sensible thing to do in the context of the experiment.
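The calculation behind parts a) and c) can be sketched from the five data points in the table. This is a least-squares fit written out by hand in plain Python, not a reproduction of the MINITAB output. The critical value 3.182 (two-sided 5% point of t on 3 degrees of freedom) is an assumption read from standard t tables.

```python
import math

time = [0, 3, 6, 9, 12]                        # months
related = [0.149, 0.185, 0.245, 0.297, 0.320]  # related substances (%)

n = len(time)
mean_x = sum(time) / n
mean_y = sum(related) / n

sxx = sum((x - mean_x) ** 2 for x in time)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(time, related))
syy = sum((y - mean_y) ** 2 for y in related)

# Least-squares gradient and intercept.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual sum of squares and R-squared.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(time, related))
r_squared = 1 - sse / syy

# Standard error of the gradient, on n - 2 degrees of freedom.
se_slope = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)

# Assumption: tabulated two-sided 5% point of t with 3 df.
t_crit = 3.182
ci_low = slope - t_crit * se_slope
ci_high = slope + t_crit * se_slope

print(f"gradient  = {slope:.5f} % per month")
print(f"intercept = {intercept:.4f} %")
print(f"R-squared = {r_squared:.1%}")
print(f"95% CI for gradient: ({ci_low:.5f}, {ci_high:.5f})")
```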
2) - ANOVA and interaction
An experiment was carried out to investigate the retention of ascorbic acid in frozen orange juice under storage conditions. Three storage temperatures (–10 °C, –20 °C and –30 °C) and four storage times (2, 4, 6 and 8 weeks) were compared. A three by four factorial design was used with each treatment replicated 3 times.
The following ANOVA table was obtained from Minitab. Some figures in the ANOVA have been replaced by asterisks.
ANOVA
Factor Type Levels Values
Temp fixed 3 1 2 3
Storage fixed 4 1 2 3 4
ANOVA for ascorbic acid
Source df SS MS F p
Temp * 334.389 167.19 236.04 0.000
Storage * 40.258 13.51 19.08 0.000
Interaction * 34.056 *** ***
Error * 17.000 0.708
Total * 425.927
a) What are the missing df values? (*)
b) What are the missing MS and F values?
c) Determine if the interaction effect is significant.
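The degrees-of-freedom bookkeeping for a replicated factorial design can be sketched as below, using the design stated in the question (3 temperatures, 4 storage times, 3 replicates) and the SS and MS figures printed in the table. The variable names are illustrative. The critical F value for part c) would still come from tables.

```python
levels_temp, levels_storage, replicates = 3, 4, 3

n_total = levels_temp * levels_storage * replicates

# Each main effect has (levels - 1) df; the interaction has their product.
df_temp = levels_temp - 1
df_storage = levels_storage - 1
df_interaction = df_temp * df_storage
df_total = n_total - 1
df_error = df_total - df_temp - df_storage - df_interaction

# Figures copied from the ANOVA table above.
ss_interaction, ms_error = 34.056, 0.708

ms_interaction = ss_interaction / df_interaction
f_interaction = ms_interaction / ms_error

print(f"df: Temp {df_temp}, Storage {df_storage}, "
      f"Interaction {df_interaction}, Error {df_error}, Total {df_total}")
print(f"MS interaction = {ms_interaction:.3f}")
print(f"F interaction  = {f_interaction:.2f}")
```

As a consistency check, the error SS of 17.000 divided by the error df should give the error MS of 0.708 shown in the table.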
3) - stepwise multiple regression
A medical student has gathered data from 10 randomly-sampled patients to test the idea that high blood pressure (mmHg) could be explained by plasma glucose concentration (mmol/L) and the Body Mass Index (BMI, kg/m2). The data are shown below.
Blood pressure (mmHg) / Plasma glucose (mmol/L) / BMI (kg/m2)
68 / 5.7 / 46.2
72 / 7.4 / 23.8
56 / 3.1 / 24.2
84 / 10.2 / 35.5
90 / 8.4 / 29.7
88 / 7.0 / 38.5
80 / 6.6 / 29.0
70 / 5.9 / 39.4
78 / 6.7 / 26.5
72 / 3.6 / 32.0
First, the student fitted a linear regression model to the relationship between blood pressure and the two predictors, plasma glucose and BMI. The MINITAB output is shown below.
Linear regression of blood pressure (mmHg) on plasma glucose (mmol/L) and BMI (kg/m2)
The regression equation is
Blood pressure (mmHg) = 49.4 + 3.74 Plasma glucose (mmol/L) + 0.066 BMI (kg/m^2)
Predictor Coef SE Coef T P
Constant 49.44 13.17 3.75 0.007
Plasma glucose (mmol/L) 3.744 1.201 3.12 0.017
BMI (kg/m^2) 0.0655 0.3438 0.19 0.854
S = 7.50470 R-Sq = 58.7% R-Sq(adj) = 47.0%
Analysis of Variance
Source DF SS MS F P
Regression 2 561.36 280.68 4.98 0.045
Residual Error 7 394.24 56.32
Total 9 955.60
Second, the student fitted a linear regression model to the relationship between blood pressure and plasma glucose. The MINITAB output is shown below.
Linear regression of blood pressure (mmHg) on plasma glucose (mmol/L)
The regression equation is
Blood pressure (mmHg) = 51.4 + 3.77 Plasma glucose (mmol/L)
Predictor Coef SE Coef T P
Constant 51.423 7.588 6.78 0.000
Plasma glucose (mmol/L) 3.766 1.121 3.36 0.010
S = 7.03818 R-Sq = 58.5% R-Sq(adj) = 53.3%
a) Explain the interpretation of the first, two-predictor regression that caused the student to repeat the regression with a single predictor.
b) Does the second Minitab output suggest that plasma glucose explains any of the variation in blood pressure? Explain your reasoning.
c) What blood pressure would you predict for a person with plasma glucose of 3.0 mmol/L? Would that prediction be reliable and why?
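For part c), the prediction and a simple range check can be sketched as follows. The coefficients are those of the single-predictor output above; the function name and the range check are illustrative additions, and the check is only one of several considerations when judging reliability.

```python
# Plasma glucose values (mmol/L) from the data table above.
glucose = [5.7, 7.4, 3.1, 10.2, 8.4, 7.0, 6.6, 5.9, 6.7, 3.6]

# Coefficients from the single-predictor regression output.
intercept, slope = 51.423, 3.766


def predict_bp(plasma_glucose):
    """Predicted blood pressure (mmHg) from the fitted line."""
    return intercept + slope * plasma_glucose


new_glucose = 3.0
predicted = predict_bp(new_glucose)

# A value outside the observed range means the line is being extrapolated.
inside_range = min(glucose) <= new_glucose <= max(glucose)

print(f"Predicted blood pressure: {predicted:.1f} mmHg")
print(f"Observed glucose range: {min(glucose)} to {max(glucose)} mmol/L")
print(f"Within observed range: {inside_range}")
```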
4) - stepwise multiple regression
An article in a walking magazine gives the following method for calculating VO2max as a measure of fitness.
“Walk one mile at your fastest pace. Record the time taken in minutes to the nearest hundredth of a minute. Record your heart rate (beats per minute) at the end of the mile walk. You also need your weight in kilos, your age and your gender. If you are male your gender is 1, if female, it is zero. Then use the following equation to calculate your VO2max.”
VO2max = 132.853 – 0.1693 × weight – 0.3877 × age + 6.315 × gender - 3.2649 × time – 0.1565 × heart rate
a) What statistical technique would have been used to produce this equation?
b) Calculate the VO2max for a female (Mary) aged 45 with a weight of 55 kilos who walked the mile in 18.50 minutes and finished with a heart rate of 115 beats per minute.
c) Mary tries to get fitter and after a couple of weeks repeats the test with her weight and heart rate unchanged, but the time is now 17.50 minutes. What is the corresponding change in her VO2max according to the equation?
d) Assuming that all other variables are kept constant, what is the change in VO2max for a one year increase in age?
e) The magazine gave no indication of the reliability of this method, but by tracing the source you find that “R-squared = 82%”. Explain what this means.
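The arithmetic in parts b) to d) can be checked by coding the magazine's equation directly. In this sketch the function and parameter names are illustrative. Age is assumed unchanged between Mary's two tests, which the question implies but does not state.

```python
def vo2max(weight_kg, age_years, gender, time_min, heart_rate_bpm):
    """VO2max from the one-mile walk equation (gender: 1 = male, 0 = female)."""
    return (132.853
            - 0.1693 * weight_kg
            - 0.3877 * age_years
            + 6.315 * gender
            - 3.2649 * time_min
            - 0.1565 * heart_rate_bpm)


first_test = vo2max(55, 45, 0, 18.50, 115)
second_test = vo2max(55, 45, 0, 17.50, 115)
one_year_older = vo2max(55, 46, 0, 18.50, 115)

print(f"b) VO2max at 18.50 min: {first_test:.2f}")
print(f"c) Change after improving to 17.50 min: {second_test - first_test:+.4f}")
print(f"d) Change for one extra year of age: {one_year_older - first_test:+.4f}")
```

Because the equation is linear, the changes in c) and d) equal the time and age coefficients (with sign) whatever the other inputs are.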
5) - two-way chi-square
It is suspected that bacteria carried by health workers may be more resistant to antibiotics than those carried by office workers.
Samples of Staph. epidermidis taken by nose swabs from 60 female nurses and a comparable group of female office workers were tested for minimum inhibitory level of ampicillin. The results for the bacteria were classified as sensitive, intermediate and resistant. The numbers in the various categories were:
Sensitive / Intermediate / Resistant / Total
Nurses / 14 / 18 / 28 / 60
Office workers / 36 / 16 / 8 / 60
a) Use an appropriate statistical test to determine whether you think the suspicions above are justified or not.
6) - two-way chi-square
A recent run of a module showed the following results, in which students were classified by whether or not they achieved 40% or over in the examination and also by gender.
Exam mark / Male / Female
Less than 40% / 12 / 6
40% or over / 32 / 55
a) On this evidence, is there a difference in examination performance between male and female students?
7) - two-way chi-square
A study of the attitudes of consultant obstetricians and heads of midwifery in British hospitals looked at whether the respondents thought it ethical for a woman to choose to have her baby by Caesarean section (surgery) rather than naturally. (BMJ 2005:331:490-491)
The question asked was ‘Do you believe that women should choose their method of delivery?’ The responses were…
Male consultants / Female consultants / Heads of midwifery / Total
Yes / 254 / 67 / 33 / 354
No / 207 / 117 / 85 / 409
Total / 461 / 184 / 118 / 763
a) Test whether or not this shows a difference of opinion between the groups.
8) - Mann-Whitney U
To compare the number of days spent indoors by elderly patients in two areas (North and South) of the country, local Hospital Trusts randomly radio-tagged 8 patients in each area. Their movements were monitored throughout the winter to determine the number of days when they did not leave their homes. The numbers of days spent entirely in the home were as follows:
North / South
100 / 95
107 / 90
275 / 82
96 / 100
102 / 98
110 / 90
292 / 48
106 / 105
a) Use a suitable non-parametric test to decide whether or not the number of days spent at home differs in the two areas for the patients.
b) Explain why a non-parametric test is used here in preference to a t-test.
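The ranking step of the Mann-Whitney U test, including the treatment of tied values, can be sketched in plain Python as below. The helper `average_ranks` is a hypothetical name introduced for this illustration. The resulting U would then be compared with the tabulated critical value for samples of size 8 and 8.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1           # rank of the first occurrence
        count = ordered.count(v)               # number of tied values
        rank_of[v] = first + (count - 1) / 2   # mean of the tied ranks
    return [rank_of[v] for v in values]


north = [100, 107, 275, 96, 102, 110, 292, 106]
south = [95, 90, 82, 100, 98, 90, 48, 105]

# Rank all 16 observations together, then split the ranks by area.
ranks = average_ranks(north + south)
rank_sum_north = sum(ranks[:len(north)])
rank_sum_south = sum(ranks[len(north):])

n1, n2 = len(north), len(south)
u_north = rank_sum_north - n1 * (n1 + 1) / 2
u_south = rank_sum_south - n2 * (n2 + 1) / 2
u_stat = min(u_north, u_south)

print(f"Rank sums: North {rank_sum_north}, South {rank_sum_south}")
print(f"U values:  North {u_north}, South {u_south}")
print(f"Test statistic U = {u_stat}")
```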
9) - Mann-Whitney U
A hospital trust has two laboratories A and B. Concern has been expressed about the relative levels of sick leave as the mean days absence per employee last year were 16.3 days and 9.9 days respectively. Lab A has 9 staff and Lab B 7. Days sick leave for each full-time staff member in the last financial year are shown below…
Laboratory A / 1 / 6 / 4 / 4 / 0 / 9 / 0 / 120 / 3
Laboratory B / 14 / 5 / 12 / 11 / 7 / 8 / 12
a) Carry out a suitable non-parametric test to decide whether there is evidence of a difference in days taken as sick leave.
b) Explain the result of your test.
c) Explain why a t-test would be the wrong one to use in this case.
10) - Wilcoxon Signed Rank
In a physiology study of the effect of sports training, 10 randomly selected students undertook a sports training programme. At the beginning of the programme each participant was asked to do as many press-ups as possible. This was repeated at the end of the training programme.
No. of press-ups
Before Training 15 18 18 20 21 25 28 31 32 68
After Training 19 19 22 23 27 29 32 35 37 71
a) Carry out a suitable non-parametric test to ascertain whether there is any evidence of difference in the number of press-ups achieved before and after the training programme.
b) Explain briefly why a non-parametric test is needed here.
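The signed-rank calculation for these paired data can be sketched as follows. It is a hand-rolled illustration with hypothetical names rather than a library routine. The smaller of the two rank sums is the statistic that would be compared with the tabulated critical value for the number of non-zero differences.

```python
before = [15, 18, 18, 20, 21, 25, 28, 31, 32, 68]
after = [19, 19, 22, 23, 27, 29, 32, 35, 37, 71]

# Paired differences; any zero differences are dropped before ranking.
diffs = [a - b for a, b in zip(after, before) if a != b]

# Rank the absolute differences, sharing the mean rank among ties.
abs_sorted = sorted(abs(d) for d in diffs)


def mean_rank(value):
    """Mean rank of a value within the sorted absolute differences."""
    first = abs_sorted.index(value) + 1
    return first + (abs_sorted.count(value) - 1) / 2


w_plus = sum(mean_rank(abs(d)) for d in diffs if d > 0)
w_minus = sum(mean_rank(abs(d)) for d in diffs if d < 0)
t_stat = min(w_plus, w_minus)

print(f"Differences (after - before): {diffs}")
print(f"Sum of positive ranks = {w_plus}")
print(f"Sum of negative ranks = {w_minus}")
print(f"Test statistic T = {t_stat}")
```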
11) - Spearman’s Rank Correlation coefficient
The following data show the age standardised mortality rates for cancer and for circulatory disease for 1995-1997 for 9 urban manufacturing centres.
Cancer / Circulatory disease
Birmingham / 141 / 161
Bradford / 145 / 172
Calderdale / 135 / 146
Coventry / 145 / 161
East Lancs / 152 / 180
Sandwell / 153 / 184
Walsall / 153 / 149
West Pennine / 154 / 180
Wolverhampton / 149 / 169
a) Calculate the rank correlation coefficient for these two variables and test whether it differs significantly from zero.
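A sketch of the rank correlation calculation is given below, using the shortcut formula r_s = 1 - 6Σd²/(n(n² - 1)). The data contain tied values, for which this formula is only approximate; a Pearson correlation of the mean ranks would give a slightly different figure. The helper name `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1
        count = ordered.count(v)
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]


# Mortality rates in the order the centres are listed above.
cancer = [141, 145, 135, 145, 152, 153, 153, 154, 149]
circulatory = [161, 172, 146, 161, 180, 184, 149, 180, 169]

rank_cancer = average_ranks(cancer)
rank_circulatory = average_ranks(circulatory)

n = len(cancer)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_cancer, rank_circulatory))

# Shortcut formula; approximate here because of the tied ranks.
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(f"Sum of squared rank differences = {sum_d2}")
print(f"Spearman's r_s (shortcut formula) = {r_s:.3f}")
```

The coefficient would then be compared with the tabulated critical value of Spearman's coefficient for n = 9.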