Statistical Methods in Scientific Research
Solution Sheet 1: Analysis of data using SPSS
1. Background
Q0. Before analysing these data, what questions would you want to ask the people who actually did the experiment?
How were the measurements made? For example, were they made by the same person, with the same equipment under the same conditions?
Were the three replicates identical?
What other factors might influence the measurements?
2. Introduction to SPSS
Q1. Follow the handout instructions to familiarise yourself with SPSS.
Dataset 1
Sample 1: total root length when no glyphosate is added. (n=12)
114.2, 104.7, 102.0, 114.2, 109.5, 109.0, 73.0, 133.0, 127.0, 178.0, 182.0, 145.0
Sample 2: total root length when glyphosate concentration level is 0.053ppm. (n=6)
143.1, 88.6, 87.0, 61.0, 150.0, 106.0
Q2. Go to your working directory, find the dataset. Open the data file to check the data format and the sample size.
The form of the dataset is:
rootsize group
114.2 0
104.7 0
102 0
etc.
The total sample size = 18, n(Group 0) = 12, n(Group 1) = 6
Q3. Read in the data and switch between the Data Editor window and the Variable View window to find the names and types of the variables. Save it as an SPSS dataset.
The variables are: rootsize, continuous (scale); group, discrete (nominal).
Q4. Obtain the mean and standard deviation of the variable rootsize.
For rootsize, we obtain mean = 118.183, sd = 32.7583.
Q5. Obtain the mean and standard deviation of each sample.
Sample / Mean / Standard Deviation / Sample size
Sample 1 / 124.300 / 31.4711 / 12
Sample 2 / 105.950 / 34.6444 / 6
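The summaries in Q4 and Q5 can be checked outside SPSS. The following is a minimal sketch in Python using only the standard library; the names sample1, sample2 and summarise are illustrative and are not part of the handout or of SPSS.

```python
from statistics import mean, stdev

# Root lengths copied from Dataset 1 above.
sample1 = [114.2, 104.7, 102.0, 114.2, 109.5, 109.0,
           73.0, 133.0, 127.0, 178.0, 182.0, 145.0]  # no glyphosate
sample2 = [143.1, 88.6, 87.0, 61.0, 150.0, 106.0]    # 0.053 ppm

def summarise(values):
    """Return (mean, sample standard deviation, n) for a list of numbers."""
    return mean(values), stdev(values), len(values)

summaries = {
    "All data": summarise(sample1 + sample2),
    "Sample 1": summarise(sample1),
    "Sample 2": summarise(sample2),
}

for label, (m, s, n) in summaries.items():
    print(f"{label}: mean = {m:.3f}, sd = {s:.4f}, n = {n}")
```

Note that stdev uses the n - 1 divisor, which is also what SPSS reports as the standard deviation.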
Q6. Construct the 95% confidence intervals for each population mean.
We can do this with some additional calculations as follows:
Using quantiles of the t distribution: t(0.975, df=11) = 2.201, t(0.975, df=5) = 2.571,
Sample 1: 124.300 ± 2.201 × 31.4711/√12 = (104.304, 144.296)
Sample 2: 105.950 ± 2.571 × 34.6444/√6 = (69.587, 142.313)
These intervals are very wide, especially that for Population 2, which is based on only 6 observations.
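The interval calculation (mean ± t quantile × sd/√n) can be sketched in Python as follows. The means and standard deviations are taken from the Q5 table and the t quantiles are the tabulated values quoted above; the function name t_interval is illustrative only.

```python
from math import sqrt

def t_interval(sample_mean, sd, n, t_quantile):
    """Two-sided confidence interval: mean +/- t * sd / sqrt(n)."""
    half_width = t_quantile * sd / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Summary statistics from the Q5 table; quantiles are t(0.975, df = n - 1).
ci1 = t_interval(124.300, 31.4711, 12, 2.201)
ci2 = t_interval(105.950, 34.6444, 6, 2.571)

print(f"Sample 1: ({ci1[0]:.3f}, {ci1[1]:.3f})")
print(f"Sample 2: ({ci2[0]:.3f}, {ci2[1]:.3f})")
```

The half-width for Sample 2 is nearly twice that for Sample 1, because both the t quantile and sd/√n grow as n falls.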
3. Comparing two samples
Comparing two independent samples: t-test
Q7. Is there any evidence that the population means of root length are different at the two concentration levels?
1. State the Null Hypothesis.
H0: μ1 = μ2, where μ1 is the mean of Population 1 and μ2 is the mean of Population 2.
2. What is the difference between the sample means? 124.300 − 105.950 = 18.35.
3. What is the value of the t-test statistic, the number of degrees-of-freedom and the corresponding p-value?
t = 1.129, df = 16, p = 0.276.
4. Construct the 95% confidence interval for the population mean difference.
(-16.0943, 52.7943)
5. What is your conclusion? Although there was a slight decrease in root length in these samples when glyphosate was added to the water at a concentration of 0.053 ppm, the difference is not statistically significant (p = 0.276), and thus might not reflect a true and reproducible effect.
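The t statistic and confidence interval in Q7 can be reproduced from the summary statistics alone. The sketch below assumes the equal-variance (pooled) form of the test, which is consistent with df = 16. The value 2.120 for t(0.975, df=16) is a tabulated quantile supplied here as an assumption; it does not appear in the handout. The p-value needs the t distribution function and is not computed.

```python
from math import sqrt

# Summary statistics from the Q5 table.
n1, mean1, sd1 = 12, 124.300, 31.4711
n2, mean2, sd2 = 6, 105.950, 34.6444

df = n1 + n2 - 2
# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))

diff = mean1 - mean2
t_stat = diff / se_diff

T_975_DF16 = 2.120  # assumed tabulated quantile t(0.975, df=16)
ci = (diff - T_975_DF16 * se_diff, diff + T_975_DF16 * se_diff)

print(f"difference = {diff:.2f}, t = {t_stat:.3f}, df = {df}")
print(f"95% CI for the difference: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the interval contains zero, it leads to the same conclusion as the p-value.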
4. Exploring relationship between variables: Dataset 2
Q8. Read in the data file and make sure types of the variables rootsize and glyph are recorded as Scale in the Variable View window.
Obtain numerical summaries for the variable rootsize by watertype. Is there any difference in root growth by water type?
Water type / Mean / Standard Deviation / Sample size
Water type 1 / 81.156 / 31.5661 / 27
Water type 2 / 90.000 / 48.6012 / 27
There is no significant difference between water types, t = -0.972, df = 52, p = 0.335.
(The unequal-variance version gives t = -0.972, df = 44.6, p = 0.336, which is hardly any different!)
Q9.
1. Create the scatter plot between rootsize and glyph and describe the graph.
2. Do any of the linear regression fits appear to describe the underlying relationship properly?
As the concentration level increases, the total root length decreases. There is no clear difference in pattern between the two water types. Although the linear regression fits better than a mean-only (horizontal line) model, the relationship may not be linear.
Q10. Create the scatterplot between logroot and glyph.
Q11. Create log transformed variable of glyphosate concentration called logglyph:
logglyph = Ln(glyph)
Check in the Data Editor window. Why are some values undefined, and why do warnings appear in the Output window?
Some of the glyphosate concentrations are 0, and log(0) is undefined (it tends to minus infinity).
Note that the log function is only defined for positive values, so the zeros in glyphosate level cannot be transformed directly. To avoid the problem, we define the new variable by
logglyphonew = Ln(1 + glyph).
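The same behaviour can be seen outside SPSS. In the Python sketch below, math.log(0) raises a ValueError, which plays the role of the SPSS warning, while math.log1p(x) computes Ln(1 + x) and is defined at zero. The helper safe_log and the list glyph_levels are illustrative; the two concentrations used are those of Dataset 1, not the full set of levels in Dataset 2.

```python
import math

def safe_log(x):
    """Natural log, returning None where the log is undefined (x <= 0)."""
    try:
        return math.log(x)
    except ValueError:  # raised by math.log for zero or negative input
        return None

glyph_levels = [0.0, 0.053]  # illustrative concentrations (ppm)

logglyph = [safe_log(g) for g in glyph_levels]        # Ln(glyph)
logglyphonew = [math.log1p(g) for g in glyph_levels]  # Ln(1 + glyph)

print(logglyph)      # first entry is None: Ln(0) is undefined
print(logglyphonew)  # first entry is 0.0: Ln(1 + 0) = 0
```

A zero concentration therefore maps to zero on the new scale, so the intercept of a regression on logglyphonew describes the untreated plants.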
Q12. The above pictures illustrate the effect of applying various transformations to the plant growth data. Suggest which, if any, would be appropriate for analysis using the linear regression model. Express the implied relationship between root size and glyphosate concentration levels on the scale of the original data.
Taking log transformations of both variables gives the most reasonable-looking linear fit.
5. Linear regression
Obtain parameter estimates of the Linear Regression fit.
The fourth table produced by SPSS, Coefficients, contains the estimated model parameters. By default, an intercept is included in the model.
Coefficients(a)
Model / Term / B (Unstandardized) / Std. Error (Unstandardized) / Beta (Standardized) / t / Sig.
1 / (Constant) / 4.778 / .051 / (none) / 94.379 / .000
1 / logglyphonew / -.944 / .079 / -.857 / -11.971 / .000
a Dependent Variable: logroot
From this table, we can see that the fitted model is:
log(rootsize) = 4.78 – 0.94 log(1+glyph) + e.
The third table provided by SPSS, ANOVA, provides the residual sum of squares (RSS) and the degrees of freedom (df). These are useful for model comparison.
ANOVA(b)
Model / Source / Sum of Squares / df / Mean Square / F / Sig.
1 / Regression / 11.365 / 1 / 11.365 / 143.307 / .000(a)
1 / Residual / 4.124 / 52 / .079
1 / Total / 15.489 / 53
a Predictors: (Constant), logglyphonew
b Dependent Variable: logroot
The standard deviation of the residual terms e is estimated by the square root of the residual mean square from the ANOVA table:
√0.079 = 0.281
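The derived quantities in the ANOVA table follow directly from the sums of squares and degrees of freedom. A minimal Python sketch, using the rounded values printed by SPSS (so the results agree with the table only to about three significant figures):

```python
from math import sqrt

# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 11.365, 1
ss_residual, df_residual = 4.124, 52

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual  # residual mean square
f_stat = ms_regression / ms_residual     # F statistic for the regression
r_squared = ss_regression / (ss_regression + ss_residual)
residual_sd = sqrt(ms_residual)          # estimated sd of the residuals e

print(f"residual mean square = {ms_residual:.3f}")
print(f"F = {f_stat:.1f}, R squared = {r_squared:.3f}")
print(f"residual sd = {residual_sd:.3f}")
```

With a single predictor, the square root of R squared equals the absolute value of the standardized coefficient Beta, which gives a quick consistency check between the two tables.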
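Back-transforming to the original scale, the model can also be expressed as:
rootsize = exp(4.78) × exp(e) / (1 + glyph)^0.94 ≈ 119 × exp(e) / (1 + glyph)^0.94.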
I wonder whether the power to which (1 + glyph) is raised is significantly different from 1.
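One rough way to look at this question uses only the Coefficients table. A power of 1 in the denominator corresponds to a slope of -1 on the log scale, so the estimated slope can be compared with -1 in units of its standard error. The sketch below is an approximate check under the usual regression assumptions, not part of the SPSS output; the critical value 2.007 for t(0.975, df=52) is a tabulated quantile supplied here as an assumption.

```python
# Slope estimate and standard error from the Coefficients table.
slope, se_slope = -0.944, 0.079
hypothesised_slope = -1.0  # a power of exactly 1 on (1 + glyph)

# t ratio for H0: slope = -1, on the residual degrees of freedom (52).
t_ratio = (slope - hypothesised_slope) / se_slope

T_975_DF52 = 2.007  # assumed tabulated quantile t(0.975, df=52)
differs_at_5_percent = abs(t_ratio) > T_975_DF52

print(f"t = {t_ratio:.3f}, significant at 5%: {differs_at_5_percent}")
```

On these figures the t ratio is well inside the critical value, so the data are compatible with a power of 1.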