GY2170 Statistics Practical 3

STATISTICS PRACTICAL 3

The Normal Distribution, Student’s t Test & Confidence Levels

1. If the population of shell length-to-width ratios of a species of bivalve is normally distributed with a mean of 1.65 and a standard deviation of 0.05, what is the probability that any one shell picked at random has a length-to-width ratio: (i) less than 1.65; (ii) greater than 1.75; (iii) within two standard deviations of the mean; and (iv) between 1.58 and 1.69?

2. Assess whether a correlation coefficient of r = 0.9308 (N = 18) represents a statistically significant relationship between two variables at the 99% confidence level (i.e. α = 0.01) by testing it against the hypotheses Ho (null hypothesis): ρ = 0 and Ha (alternative hypothesis): ρ > 0. (N.B. your lecture handout will help you with this question.)

3. This question follows on from Q1 of the Statistics 1 practical.

Maximum pebble lengths were also measured at a second point-bar locality 5 km upstream of the first. A sample of 31 pebbles yielded the following data (lengths in cm):

9.7 10.6 10.9 11.0 11.5 11.8 11.9 12.2 12.2 12.3 12.4 12.6 12.6 12.8 12.9 12.9 13.0 13.1 13.1 13.4 13.8 14.3 14.8 14.9 15.1 15.6 16.2 17.4 17.9 18.2 19.6

Your task is to establish whether or not the pebble lengths at this locality are significantly larger than those at the first locality, i.e. whether or not the two samples are likely to have been taken from different populations. If the means of the two samples are x̄1 and x̄2, the null hypothesis (Ho) may be stated thus: “x̄2 is not significantly larger than x̄1”. The alternative hypothesis (Ha) is: “x̄2 is significantly larger than x̄1”.

Student’s t test may be applied to this problem. When comparing two sample means, the test statistic, t, must be calculated using the expression:-

where N1 and N2 are the sizes of samples 1 and 2, respectively, x̄1 and x̄2 are their means, s1 and s2 are their standard deviations, and the number of degrees of freedom is given by ν = N1 + N2 - 2.

(a) Calculate x̄2 and s2 (x̄1 = 10.51 cm; s1 = 1.91 cm).

(b) Supposing you wish to be 99% confident in your final decision, fix α, the size of the critical region.

(c) Decide whether you need to use the one-tailed or two-tailed test.

(d) Refer to the table listing critical values of t as a function of ν and α, and define the critical value of t in this case.

(e) Calculate the t-statistic as given by the formula above.

(f) Do you accept Ho or Ha? State your conclusion (hint: ignore the sign of the t value that you calculate; a negative sign just means that x̄1 is smaller than x̄2).

4. Volcanologists have measured the hydrogen content (in % of total number of atoms) of samples of gases collected from the 1970 and 1971 Mount Etna volcanic eruptions. Values are given in the following table:

1970 hydrogen content (%), first two columns / 1971 hydrogen content (%), last two columns
35.8 / 38.5 / 42.0 / 45.0
45.5 / 36.0 / 57.0 / 44.6
35.5 / 40.5 / 42.0 / 48.5
32.0 / 35.5 / 54.5 / 63.0
50.0 / 45.5 / 35.0 / 55.0
39.0 / 37.0 / 52.5 / 40.0
37.0 / 36.0 / 43.5 / 37.5
47.0 / 53.0 / 48.0 / 53.7

(a) Calculate the mean hydrogen content for the 1970 eruption and use Student’s t-distribution to find the 95% confidence limits for the true value.

(b) Use the Student's t-test for comparing means to determine whether there is a difference in the hydrogen content of the gases between the two eruptions at the 99% confidence level.

Statistics Practical 3: Answers

1. This question is best answered using z-values and the standard normal curve:

(i) The probability is 0.5 (i.e. a 50% chance) that a shell will have an L:W ratio < 1.65, because 1.65 is the population mean (z = 0) and 50% of the normal curve lies below the mean.

(ii) z = (1.75 - 1.65)/0.05 = 2, i.e. 1.75 lies two standard deviations above the mean. 97.73% of the normal curve lies below z = 2, so the probability of picking a shell at random with L:W > 1.75 is 1 - 0.9773 = 0.0227.

(iii) The probability that L:W is within 2 standard deviations of the mean (i.e. between z = -2 and z = +2) is 0.9545.

(iv) Probability of the L:W ratio being between 1.58 and 1.69:

For 1.69, z = (1.69 - 1.65)/0.05 = +0.8; cumulative probability = 0.7881

For 1.58, z = (1.58 - 1.65)/0.05 = -1.4; cumulative probability = 0.0808

Pr = 0.7881 - 0.0808 = 0.7073
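The four probabilities in Question 1 can be cross-checked without tables. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the practical.

```python
from statistics import NormalDist

# Population of shell length-to-width ratios given in Question 1.
ratios = NormalDist(mu=1.65, sigma=0.05)

p_below_mean = ratios.cdf(1.65)                      # (i)   P(L:W < 1.65)
p_above_175 = 1 - ratios.cdf(1.75)                   # (ii)  P(L:W > 1.75)
p_within_2sd = ratios.cdf(1.75) - ratios.cdf(1.55)   # (iii) mean +/- 2 s.d.
p_between = ratios.cdf(1.69) - ratios.cdf(1.58)      # (iv)  1.58 < L:W < 1.69

for label, p in [("(i)", p_below_mean), ("(ii)", p_above_175),
                 ("(iii)", p_within_2sd), ("(iv)", p_between)]:
    print(f"{label} {p:.4f}")
```

Small differences in the fourth decimal place relative to the table values are rounding effects.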

2. The correlation coefficient is r = 0.9308, with N = 18.

The test statistic is Student's t, given as:

$$t = \frac{r\sqrt{N-2}}{\sqrt{1-r^{2}}} = \frac{0.9308\sqrt{16}}{\sqrt{1-0.9308^{2}}} = 10.19$$

with ν = N - 2 = 16 degrees of freedom.

The level of significance α = 0.01 and, from the table, the critical value of t with ν = 16 is 2.583. We are dealing with a one-tailed test, so we reject Ho if t > the critical value of t. Since this is true, we accept Ha and say that there is less than a 1 in 100 chance of such an extreme correlation coefficient coming from a population with coefficient ρ = 0.
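The arithmetic behind t = 10.19 can be checked in a few lines. This sketch assumes the usual test statistic for a correlation coefficient, t = r·sqrt(N - 2)/sqrt(1 - r²), which reproduces the quoted value; the critical value is simply copied from the table reading above.

```python
import math

r, n = 0.9308, 18
t_critical = 2.583            # one-tailed, alpha = 0.01, 16 d.f. (from the table)

df = n - 2                    # degrees of freedom
t = r * math.sqrt(df) / math.sqrt(1 - r**2)
reject_h0 = t > t_critical    # one-tailed decision rule

print(f"t = {t:.2f} with {df} d.f.; reject Ho: {reject_h0}")
```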

3. (a) For the additional pebble data, x̄2 = 13.57 cm and s2 = 2.32 cm (31 observations).
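The mean and standard deviation can be recomputed from the 31 lengths listed in the question. The sketch below uses the Python standard library and shows both conventions for the standard deviation: dividing by N reproduces the 2.32 cm quoted here, whereas dividing by N - 1 gives a slightly larger value. Which convention the lecture handout intends is an assumption to check against your notes.

```python
from statistics import fmean, pstdev, stdev

# Maximum pebble lengths (cm) at the second locality, from Question 3.
pebbles_cm = [9.7, 10.6, 10.9, 11.0, 11.5, 11.8, 11.9, 12.2, 12.2, 12.3, 12.4,
              12.6, 12.6, 12.8, 12.9, 12.9, 13.0, 13.1, 13.1, 13.4, 13.8, 14.3,
              14.8, 14.9, 15.1, 15.6, 16.2, 17.4, 17.9, 18.2, 19.6]

mean_2 = fmean(pebbles_cm)
sd_divide_by_n = pstdev(pebbles_cm)          # divides by N
sd_divide_by_n_minus_1 = stdev(pebbles_cm)   # divides by N - 1

print(f"N = {len(pebbles_cm)}, mean = {mean_2:.2f} cm")
print(f"s (divide by N)     = {sd_divide_by_n:.2f} cm")
print(f"s (divide by N - 1) = {sd_divide_by_n_minus_1:.2f} cm")
```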

(b) The critical region α = (100 - 99)/100 = 0.01 (i.e. 1%).

(c) We are testing to see if x̄2 is significantly larger than x̄1, so a one-tailed test is used. (If we were testing to see if x̄2 is significantly different from x̄1, then we would use a two-tailed test.)

(d) The number of degrees of freedom, ν = N1 + N2 - 2 = 51 + 31 - 2 = 80. Reading from the table of t values for 80 degrees of freedom and α = 0.01, the critical value of t is 2.37.

(e) Substituting the values above into the formula gives t = -6.52.

(f) The calculated value of t is negative, but all this means is that x̄1 is less than x̄2 (which we know already); when comparing t values we use the absolute value (i.e. 6.52). The calculated t value is greater than the critical value, so Ho is rejected and Ha is accepted. We can state at the 99% confidence level that x̄2 is greater than x̄1, i.e. that the pebbles from the second locality are significantly larger, on balance, than those at the first locality.
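The expression used for t in this practical is not reproduced in this text, so the sketch below tries two common textbook forms of the standard error of the difference between means: an unpooled form and a pooled-variance form. Both are assumptions, and neither is claimed to be the handout's formula. The point of the check is that |t| comes out a little above 6 either way, of the same order as the 6.52 quoted above and far beyond the critical value of 2.37, so the decision does not hinge on the choice.

```python
import math

# Summary statistics quoted in Question 3 and its answer.
n1, mean1, s1 = 51, 10.51, 1.91   # first locality
n2, mean2, s2 = 31, 13.57, 2.32   # second locality
t_critical = 2.37                 # one-tailed, alpha = 0.01, 80 d.f. (from the table)

diff = mean1 - mean2

# Assumed form 1: unpooled standard error of the difference.
se_unpooled = math.sqrt(s1**2 / n1 + s2**2 / n2)
t_unpooled = diff / se_unpooled

# Assumed form 2: pooled variance, treating s1 and s2 as N - 1 estimates.
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled

for name, t in [("unpooled", t_unpooled), ("pooled", t_pooled)]:
    print(f"{name}: t = {t:.2f}, reject Ho: {abs(t) > t_critical}")
```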

4. (a) For the 1970 sample of gas:

The interval containing the true hydrogen content is given by:

$$\bar{x} \pm t \times SE(\bar{x})$$

The significance level α = 5% for a two-tailed distribution and the number of degrees of freedom, ν = N - 1 = 15. From the table, the critical value of t = 2.131.

The standard error, SE(x̄), of the mean is:

The 95% confidence interval for the true hydrogen content is therefore:
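The confidence limits can be worked through numerically as follows. This is a sketch under two stated assumptions: that the first two columns of the table in Question 4 hold the sixteen 1970 values, and that the standard error is taken as s/sqrt(N) with s computed using N - 1 in the denominator. The critical value 2.131 is the table reading quoted above.

```python
import math
from statistics import fmean, stdev

# 1970 hydrogen contents (%), read from the first two table columns (assumption).
h_1970 = [35.8, 38.5, 45.5, 36.0, 35.5, 40.5, 32.0, 35.5,
          50.0, 45.5, 39.0, 37.0, 37.0, 36.0, 47.0, 53.0]
t_critical = 2.131   # two-tailed, alpha = 0.05, 15 d.f. (from the table)

mean = fmean(h_1970)
se = stdev(h_1970) / math.sqrt(len(h_1970))   # standard error of the mean
half_width = t_critical * se
lower, upper = mean - half_width, mean + half_width

print(f"mean = {mean:.2f}%, SE = {se:.2f}%")
print(f"95% confidence interval: {lower:.2f}% to {upper:.2f}%")
```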

(b) For the 1971 eruption

For comparing means the test statistic is

Substituting values gives

The calculated value of t is negative, but all this means is that x̄1 (the 1970 mean) is less than x̄2 (the 1971 mean), which we know already; when comparing t values we use the absolute value (i.e. 2.99).

The critical value of t for α = 1%, ν = N1 + N2 - 2 = 30 and a two-tailed distribution is 2.75. The calculated t value is greater than the critical value, so Ho is rejected and Ha is accepted. We can state at the 99% confidence level that the mean hydrogen contents differ: x̄2 is greater than x̄1, i.e. the hydrogen content of the gases from the 1971 eruption was significantly higher, on balance, than that of the gases released during 1970.
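The comparison of the two eruptions can be reproduced from the raw table. The sketch assumes that the first two table columns are the 1970 values and the last two the 1971 values, and it uses the standard error sqrt(s1²/N1 + s2²/N2) with N - 1 variances. Because both samples have 16 values, the pooled-variance form gives the same number, and the result matches the 2.99 quoted above.

```python
import math
from statistics import fmean, variance

# Hydrogen contents (%), column assignment as assumed in the lead-in.
h_1970 = [35.8, 38.5, 45.5, 36.0, 35.5, 40.5, 32.0, 35.5,
          50.0, 45.5, 39.0, 37.0, 37.0, 36.0, 47.0, 53.0]
h_1971 = [42.0, 45.0, 57.0, 44.6, 42.0, 48.5, 54.5, 63.0,
          35.0, 55.0, 52.5, 40.0, 43.5, 37.5, 48.0, 53.7]
t_critical = 2.75    # two-tailed, alpha = 0.01, 30 d.f. (from the table)

# Standard error of the difference between the two sample means.
se_diff = math.sqrt(variance(h_1970) / len(h_1970)
                    + variance(h_1971) / len(h_1971))
t = (fmean(h_1970) - fmean(h_1971)) / se_diff
reject_h0 = abs(t) > t_critical   # two-tailed decision rule

print(f"means: {fmean(h_1970):.2f}% (1970), {fmean(h_1971):.2f}% (1971)")
print(f"t = {t:.2f}; reject Ho: {reject_h0}")
```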