AMS572.01 Final Exam Fall, 2009

Name ______ ID ______ Signature ______ AMS Major? ______

Instruction: This is a closed-book exam. Anyone who cheats on the exam will receive a grade of F. Please enter “Yes” or “No” for “AMS Major”. Please provide complete solutions for full credit. The exam runs from 2:15 to 4:45 pm. Good luck!

1. (for all students) How to become an art sleuth? Like all creative artists, composers of music develop certain personal characteristics in their works. One such characteristic is the number of melody notes in each bar of music. Now suppose you buy an old unsigned manuscript of a waltz that you suspect is an unknown work by Johann Strauss and, if so, very valuable. You count the number of melody notes per bar in several genuine Strauss waltzes and compare that frequency distribution with a similar count for the unknown work. Would the following results support your high hopes? Use α = 0.05.

No. of melody notes per bar / 0 / 1 / 2 / 3 / 4 / 5 / ≥ 6 / Total
Strauss waltzes / 5 / 32 / 133 / 114 / 67 / 22 / 15 / 388
Unknown waltz / 6 / 60 / 62 / 96 / 33 / 7 / 18 / 282

SOLUTION: This is inference on several population proportions following a multinomial distribution. If the unknown work were by Johann Strauss, we would expect the following frequency distribution of melody notes per bar:

No. of melody notes per bar / 0 / 1 / 2 / 3 / 4 / 5 / ≥ 6
Expected relative frequency ($p_{i0}$) / 5/388 / 32/388 / 133/388 / 114/388 / 67/388 / 22/388 / 15/388
Expected frequency ($e_i = 282\,p_{i0}$) / 3.63 / 23.26 / 96.66 / 82.86 / 48.70 / 15.99 / 10.90
Observed frequency ($o_i$) / 6 / 60 / 62 / 96 / 33 / 7 / 18

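The large-sample chi-square goodness-of-fit test can be applied to test $H_0: p_i = p_{i0}$, $i = 1, \dots, 7$, where $p_{i0}$ is the Strauss relative frequency for category $i$, versus $H_1$: $H_0$ is not true.

The chi-square test statistic is:

$\chi^2_0 = \sum_{i=1}^{7} \frac{(o_i - e_i)^2}{e_i} \approx 88.84$, where $o_i$ and $e_i$ are the observed and expected counts in category $i$.

Since $\chi^2_0 \approx 88.84 > \chi^2_{6,\,0.05} = 12.592$ (df $= 7 - 1 = 6$), we reject the null hypothesis at the significance level of α = 0.05 and conclude that it is unlikely that the unknown waltz was written by Strauss.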
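As a numerical check, the following minimal Python sketch (standard library only) recomputes the expected counts and the statistic. Like the solution, it treats the Strauss relative frequencies as known null proportions. The helper `chi2_sf_even_df` is a hypothetical name for the closed-form chi-square upper-tail probability, which is valid only for even degrees of freedom (here df = 6); 12.592 is the tabled value of $\chi^2_{6,0.05}$.

```python
import math

# Melody notes per bar: 0, 1, 2, 3, 4, 5, >=6
strauss = [5, 32, 133, 114, 67, 22, 15]   # genuine Strauss waltzes (reference)
unknown = [6, 60, 62, 96, 33, 7, 18]      # unknown waltz (observed)

n_ref = sum(strauss)   # 388 bars
n_obs = sum(unknown)   # 282 bars

# Expected counts under H0: the unknown waltz follows the Strauss proportions
expected = [n_obs * c / n_ref for c in strauss]

# Pearson chi-square goodness-of-fit statistic and its degrees of freedom
chi2 = sum((o - e) ** 2 / e for o, e in zip(unknown, expected))
df = len(unknown) - 1

def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square(df); closed form that holds only for even df."""
    assert df % 2 == 0
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total

p_value = chi2_sf_even_df(chi2, df)
critical = 12.592   # chi-square upper 5% point with 6 df, from tables

print([round(e, 2) for e in expected])
print(round(chi2, 2), df, chi2 > critical, p_value)
```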

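2A. (for AMS majors) Suppose we have two independent random samples from two normal populations, i.e., $X_1, \dots, X_{n_1} \overset{iid}{\sim} N(\mu_1, \sigma_1^2)$ and $Y_1, \dots, Y_{n_2} \overset{iid}{\sim} N(\mu_2, \sigma_2^2)$.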

(a). At the significance level α, please construct a test of the hypothesis Ho: vs. H1: . Here are known constants.

(b). Suppose we have confirmed that . At the significance level α, please construct a test to test whether or not using the pivotal quantity method. Here are known constants. Please include the derivation of the pivotal quantity, the proof of its distribution, and the derivation of the rejection region for full credit.

SOLUTION:

This is inference on two normal populations with independent samples: part (a) concerns the variances and part (b) the means.

(a) This is the usual F-test on two normal population variances: versus

The test statistic is:

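At the significance level α, we will reject H0 if F0 is smaller than $F_{1-\alpha/2,\,n_1-1,\,n_2-1}$ or greater than $F_{\alpha/2,\,n_1-1,\,n_2-1}$, where $n_1$ and $n_2$ are the two sample sizes.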
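A minimal Python sketch of this F-test, assuming the null hypothesis fixes the variance ratio $\sigma_1^2/\sigma_2^2$ at a known constant `ratio0` (ratio0 = 1 gives the ordinary equal-variance test). The function names are hypothetical. Because the Python standard library has no F distribution, the critical values are passed in from tables. The example reuses the baby weight-gain data of Problem 2B with $F_{0.025,8,8} \approx 4.433$.

```python
from statistics import variance

def variance_ratio_f(x, y, ratio0=1.0):
    """F statistic for H0: sigma_x^2 / sigma_y^2 = ratio0 (ratio0 assumed known).

    Under H0 and normality, F0 = (S_x^2 / S_y^2) / ratio0 ~ F(n_x - 1, n_y - 1).
    """
    f0 = variance(x) / variance(y) / ratio0
    return f0, len(x) - 1, len(y) - 1

def reject_two_sided(f0, f_lower, f_upper):
    """Reject H0 if F0 < F_{1-alpha/2} or F0 > F_{alpha/2} (quantiles from tables)."""
    return f0 < f_lower or f0 > f_upper

# Illustration with the weight-gain data of Problem 2B (alpha = 0.05, df = 8 and 8)
x = [5, 7, 8, 9, 6, 7, 10, 8, 6]
y = [9, 10, 8, 6, 8, 7, 11, 10, 9]
f_upper = 4.433            # F_{0.025, 8, 8} from tables
f_lower = 1 / f_upper      # F_{0.975, 8, 8} = 1 / F_{0.025, 8, 8} (equal df)
f0, df1, df2 = variance_ratio_f(x, y)
print(f0, df1, df2, reject_two_sided(f0, f_lower, f_upper))
```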

(b) Given that , we set and thus . Here is a simple outline of the derivation of the test: versus , which are equivalent to: versus

[1] We start with the point estimator for the parameter of interest: . Its distribution is using the mgf for which is , and the independence properties of the random samples. From this we have . Unfortunately, Z cannot serve as the pivotal quantity because σ is unknown.

[2] We next look for a way to get rid of the unknown σ, following an approach similar to the construction of the pooled-variance t-statistic. We found that using the mgf for which is , and the independence properties of the random samples.

[3] Then we found, from the theorem on sampling from a normal population (the sample mean and the sample variance are independent) and the independence properties of the random samples, that Z and W are independent. Therefore, by the definition of the t-distribution, we have obtained our pivotal quantity: .

[4] The rejection region is derived from , where . Thus . Therefore at the significance level of α, we reject in favor of iff
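The pivotal quantity can be sketched numerically. This is a minimal Python version under stated assumptions. The constants in the exam's hypotheses are not shown above, so the known variance ratio $k$ (with $\sigma_1^2 = k\sigma^2$, $\sigma_2^2 = \sigma^2$) and a hypothesis of the form $a\mu_1 + b\mu_2 = c$ are illustrative. Under these assumptions:

- $Z = (a\bar{X} + b\bar{Y} - c)/(\sigma\sqrt{a^2k/n_1 + b^2/n_2})$.
- $W = [(n_1-1)S_1^2/k + (n_2-1)S_2^2]/\sigma^2 \sim \chi^2_{n_1+n_2-2}$.
- $T = Z/\sqrt{W/(n_1+n_2-2)}$ no longer involves $\sigma$.

Reject $H_0$ at level α when $|T| \ge t_{\alpha/2,\,n_1+n_2-2}$. The function name `known_ratio_t` is hypothetical.

```python
import math
from statistics import mean, variance

def known_ratio_t(x, y, k=1.0, a=1.0, b=-1.0, c=0.0):
    """T statistic for H0: a*mu1 + b*mu2 = c when sigma1^2 = k * sigma2^2, k known.

    k, a, b, c are illustrative assumptions; k = 1, a = 1, b = -1, c = 0
    reduces to the ordinary pooled-variance t-test of mu1 = mu2.
    """
    n1, n2 = len(x), len(y)
    # sigma^2 * W / (n1 + n2 - 2): pooled estimate of sigma^2 = sigma2^2
    sp_sq = ((n1 - 1) * variance(x) / k + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    # Var(a*Xbar + b*Ybar) = sigma^2 * (a^2 k / n1 + b^2 / n2)
    se = math.sqrt(sp_sq * (a * a * k / n1 + b * b / n2))
    t0 = (a * mean(x) + b * mean(y) - c) / se
    return t0, n1 + n2 - 2   # compare |t0| with t_{alpha/2, n1+n2-2}

# Illustration with the data of Problem 2B
x = [5, 7, 8, 9, 6, 7, 10, 8, 6]
y = [9, 10, 8, 6, 8, 7, 11, 10, 9]
t0, df = known_ratio_t(x, y)             # k = 1: ordinary pooled t
t0_k2, _ = known_ratio_t(x, y, k=2.0)    # hypothetical known ratio k = 2
print(round(t0, 4), df, round(t0_k2, 4))
```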

2B. (for non-AMS majors) A group of babies, all of whom weighed approximately the same at birth, was randomly divided into two groups. The babies in sample 1 were fed formula A; those in sample 2 were fed formula B. The weight gain from birth to age six months was recorded for each baby. The results were as follows:

Sample 1: / 5 / 7 / 8 / 9 / 6 / 7 / 10 / 8 / 6
Sample 2: / 9 / 10 / 8 / 6 / 8 / 7 / 11 / 10 / 9

(a). Please construct a 95% confidence interval for the difference in mean weight gain between the two formulas.

(b). Use suitable tests to investigate the differences between the weight gains of the two groups (Use α =.05. Please state the assumption(s) of the tests.)

(c). Please write the entire SAS program needed to answer the questions raised in (b). Please include the data step as well as tests of the various assumptions.

SOLUTION: Inference on two population means with two small, independent samples.

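Formula A (sample 1): $n_1 = 9$, $\bar{x}_1 = 7.33$, $s_1^2 = 2.5$ ($s_1 = 1.58$)

Formula B (sample 2): $n_2 = 9$, $\bar{x}_2 = 8.67$, $s_2^2 = 2.5$ ($s_2 = 1.58$)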

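Under the normality assumption, we first test whether the two population variances are equal, that is, $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_1: \sigma_1^2 \neq \sigma_2^2$. The test statistic is

$F_0 = s_1^2/s_2^2 = 2.5/2.5 = 1$, with critical values $F_{0.975,8,8} = 1/F_{0.025,8,8} \approx 0.23$ and $F_{0.025,8,8} \approx 4.43$.

Since $F_0$ lies between 0.23 and 4.43, we cannot reject H0. Therefore it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$.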
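The summary statistics and the variance-ratio test can be reproduced with Python's `statistics` module. This is a minimal sketch. The critical value 4.433 for $F_{0.025,8,8}$ is taken from F tables, because the standard library has no F quantile function.

```python
from statistics import mean, variance

formula_a = [5, 7, 8, 9, 6, 7, 10, 8, 6]    # sample 1 weight gains
formula_b = [9, 10, 8, 6, 8, 7, 11, 10, 9]  # sample 2 weight gains

n1, n2 = len(formula_a), len(formula_b)
xbar1, xbar2 = mean(formula_a), mean(formula_b)
s1_sq, s2_sq = variance(formula_a), variance(formula_b)   # divisor n - 1

f0 = s1_sq / s2_sq       # F statistic for H0: sigma1^2 = sigma2^2
f_upper = 4.433          # F_{0.025, 8, 8} from tables
f_lower = 1 / f_upper    # F_{0.975, 8, 8}

print(n1, round(xbar1, 2), s1_sq)
print(n2, round(xbar2, 2), s2_sq)
print(f0, round(f_lower, 3), f_upper, f_lower < f0 < f_upper)
```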

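(a) The 95% C.I. for the mean difference $\mu_1 - \mu_2$ is

$(\bar{x}_1 - \bar{x}_2) \pm t_{0.025,\,16}\, s_p \sqrt{1/n_1 + 1/n_2} = (7.33 - 8.67) \pm 2.120 \times 1.581 \times 0.471 = -1.33 \pm 1.58$,

where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{8(2.5) + 8(2.5)}{16} = 2.5$.

Therefore the 95% C.I. is [-2.91, 0.25].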

(b)

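Next we perform the pooled-variance t-test of $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$. The test statistic is $t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}} = \frac{-1.33}{0.745} \approx -1.79$.

Since $t_0 \approx -1.79$ is greater than $-t_{0.025,16} = -2.120$ (equivalently, $|t_0| < 2.120$), we cannot reject H0. We have insufficient evidence to reject the hypothesis that there is no difference in mean weight gain between the two formulas.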
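A minimal Python sketch of the pooled-variance interval in (a) and the t-test in (b). The tabled value $t_{0.025,16} = 2.120$ is an input, since the standard library has no t quantile function.

```python
import math
from statistics import mean, variance

formula_a = [5, 7, 8, 9, 6, 7, 10, 8, 6]
formula_b = [9, 10, 8, 6, 8, 7, 11, 10, 9]
n1, n2 = len(formula_a), len(formula_b)

# Pooled variance and standard error of xbar1 - xbar2
sp_sq = ((n1 - 1) * variance(formula_a) + (n2 - 1) * variance(formula_b)) / (n1 + n2 - 2)
se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
t_crit = 2.120   # t_{0.025, 16} from tables

diff = mean(formula_a) - mean(formula_b)
ci = (diff - t_crit * se, diff + t_crit * se)   # 95% C.I. for mu1 - mu2
t0 = diff / se                                  # pooled t statistic, 16 df

print([round(v, 2) for v in ci], round(t0, 2), abs(t0) > t_crit)
```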

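Assumptions for the tests in (b):

(1) Both populations are normally distributed.

(2) The two population variances are equal, $\sigma_1^2 = \sigma_2^2$, as required by the pooled-variance t-test. The F-test above supports this assumption.

The samples are independent because the babies were randomly divided into two groups.

(c) SAS program: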

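data babies;
   input formula wt_gain @@;   /* formula: 1 = A, 2 = B; several observations per line */
   datalines;
1 5  1 7  1 8  1 9  1 6  1 7  1 10  1 8  1 6
2 9  2 10  2 8  2 6  2 8  2 7  2 11  2 10  2 9
;
run;

/* NORMAL option requests tests of normality (e.g., Shapiro-Wilk) for each formula */
proc univariate data=babies normal;
   class formula;
   var wt_gain;
   title 'Check for normality';
run;

/* Output includes pooled and Satterthwaite t-tests and the folded F test for equal variances */
proc ttest data=babies;
   class formula;
   var wt_gain;
   title 'Independent samples t-test';
run;

proc npar1way data=babies wilcoxon;
   class formula;
   var wt_gain;
   title 'Nonparametric (Wilcoxon rank-sum) two-sample test';
run;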

3. (for all students) Coffee cans are to be filled with 16 oz. of coffee. The mean content of cans filled on a production line is monitored. It is known from past experience that the standard deviation of the content is 0.2 oz and the distribution is normal. A sample of cans is taken every hour with their mean content examined for quality control purposes.

(a). How many cans should be sampled to estimate the true mean content to within 0.1 oz with 95% confidence?

(b). Now suppose you wish to test $H_0: \mu = 16$ oz vs. $H_1: \mu \neq 16$ oz. If the true mean content during a particular period is 16.1 oz, how many cans should be sampled to ensure a test power of 90% at the significance level of 0.05?

SOLUTION: Inference on one population mean; normal population, variance known.

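(a) $n = \left(\frac{z_{0.025}\,\sigma}{E}\right)^2 = \left(\frac{1.96 \times 0.2}{0.1}\right)^2 \approx 15.37$, which must be rounded up to a whole number.

Therefore 16 cans should be sampled.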
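The sample-size formula in (a) in a few lines of Python. `NormalDist().inv_cdf` supplies $z_{0.025}$. The result is rounded up because a fractional number of cans is not possible.

```python
import math
from statistics import NormalDist

sigma = 0.2     # known population standard deviation (oz)
margin = 0.1    # desired margin of error E (oz)
conf = 0.95

z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{0.025}, about 1.96
n_exact = (z * sigma / margin) ** 2
n = math.ceil(n_exact)                          # round up to whole cans
print(round(z, 4), round(n_exact, 2), n)
```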

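(b)

To assure 90% power for detecting a difference of $\delta = 16.1 - 16 = 0.1$ oz, use $\beta = 1 - \text{Power} = 0.1$. The significance level is α = 0.05. Thus

$n = \frac{(z_{0.025} + z_{0.10})^2 \sigma^2}{\delta^2} = \frac{(1.96 + 1.28)^2 (0.2)^2}{(0.1)^2} \approx 41.99$.

Therefore, 42 cans should be sampled.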
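A short Python check of (b). The formula is the usual approximation for a two-sided z-test; it ignores the negligible probability in the far tail. The answer is sensitive to how $z_{0.10}$ is rounded:

- With the rounded table values 1.96 and 1.28, the formula gives about 41.99, hence 42 cans as above.
- With unrounded normal quantiles (about 1.95996 and 1.28155), it gives about 42.03, which rounds up to 43.

```python
import math
from statistics import NormalDist

sigma, delta = 0.2, 0.1      # known SD and the shift to detect (16.1 - 16.0)
alpha, power = 0.05, 0.90
beta = 1 - power

def n_two_sided(z_half_alpha, z_beta):
    # Approximate sample size for a two-sided one-sample z-test
    return (z_half_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2

nd = NormalDist()
n_table = n_two_sided(1.96, 1.28)                                    # rounded table values
n_exact = n_two_sided(nd.inv_cdf(1 - alpha / 2), nd.inv_cdf(1 - beta))  # unrounded quantiles

print(round(n_table, 2), math.ceil(n_table))
print(round(n_exact, 2), math.ceil(n_exact))
```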

4. (for all students) A college has 500 women students and 1,000 men students. The introductory zoology course has 90 students, 50 of whom are women. It is suspected that more women tend to take zoology than men. Please test this suspicion at α =.05.

SOLUTION: Inference on one population proportion, large sample.

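The hypotheses are $H_0: p = 1/3$ versus $H_1: p > 1/3$. Here $p$ is the proportion of women among zoology students, and $p_0 = 500/1500 = 1/3$ is the proportion of women in the college.

The test statistic is $z_0 = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{50/90 - 1/3}{\sqrt{(1/3)(2/3)/90}} \approx 4.47$.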

Since $z_0 \approx 4.47 > z_{0.05} = 1.645$, we reject H0 at the significance level of 0.05 and conclude that women are more likely than men to take zoology.
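A minimal Python check of the one-sample proportion test. $p_0 = 500/1500$ is the share of women in the college under $H_0$.

```python
import math
from statistics import NormalDist

women, men = 500, 1000
n, x = 90, 50                      # zoology class size and number of women
p0 = women / (women + men)         # 1/3 under H0
p_hat = x / n

z0 = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
z_crit = NormalDist().inv_cdf(0.95)     # one-sided critical value, alpha = 0.05
p_value = 1 - NormalDist().cdf(z0)
print(round(z0, 2), round(z_crit, 3), z0 > z_crit, p_value)
```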

5. (for all students) The student government of a large college polled a random sample of 325 male students and found that 221 were in favor of a new grading system. At the same time, 120 out of a random sample of 200 female students were in favor of the new system.

(a). At the significance level of α =.05, do the results indicate a significant gender bias on the students’ opinion of this new grading system?

(b). Please write up the entire SAS program necessary to answer question raised in (a). Please include the data step.

SOLUTION: This is inference on two population proportions following independent binomial distributions.

(a) The hypotheses are:

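$H_0: p_1 = p_2$ vs. $H_1: p_1 > p_2$, where $p_1$ and $p_2$ are the proportions of male and female students, respectively, who favor the new system. (*Note: a two-sided alternative is also acceptable.)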

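The test statistic is $z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}}$, where:

- $\hat{p}_1 = 221/325 = 0.68$ and $\hat{p}_2 = 120/200 = 0.60$.
- The pooled proportion is $\hat{p} = (221 + 120)/(325 + 200) \approx 0.6495$.

This gives $z_0 \approx 1.87$.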

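Since $z_0 \approx 1.87 > z_{0.05} = 1.645$, we reject H0 at the significance level of 0.05 and conclude that a larger proportion of male students than female students favor the new system. That is, the results indicate a significant gender bias in the students' opinion of this new grading system. With a two-sided alternative, however, the critical value is $z_{0.025} = 1.96$; since 1.87 < 1.96, H0 would not be rejected.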
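A minimal Python sketch of the pooled two-proportion z-test. It reports both the one-sided p-value for $H_1: p_1 > p_2$ and the two-sided p-value, which shows that the conclusion depends on the choice of alternative.

```python
import math
from statistics import NormalDist

x1, n1 = 221, 325   # male students in favor / sampled
x2, n2 = 120, 200   # female students in favor / sampled

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)    # pooled proportion under H0: p1 = p2
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z0 = (p1 - p2) / se

nd = NormalDist()
p_one_sided = 1 - nd.cdf(z0)               # H1: p1 > p2
p_two_sided = 2 * (1 - nd.cdf(abs(z0)))    # H1: p1 != p2
print(round(z0, 2), round(p_one_sided, 3), round(p_two_sided, 3))
```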

(b) SAS code:

Data Vote;
Input gender $ outcome $ count;
Datalines;
Male yes 221
Male no 104
Female yes 120
Female no 80
;
Run;

Proc freq data=vote;
Tables gender*outcome/chisq;
Weight count;
Run;
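PROC FREQ's CHISQ option reports the Pearson chi-square test for the 2 × 2 table, along with a continuity-adjusted version. For a 2 × 2 table the Pearson statistic equals the square of the pooled z statistic from (a), and its p-value is two-sided. A minimal Python sketch of that identity:

```python
import math
from statistics import NormalDist

# Rows: Male, Female; columns: yes, no (same counts as the DATALINES above)
table = [[221, 104],
         [120, 80]]

row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]
grand = sum(row_tot)

# Pearson chi-square without continuity correction
chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row_tot[i] * col_tot[j] / grand
        chi2 += (table[i][j] - expected) ** 2 / expected

# Pooled two-proportion z statistic for comparison
p1, p2 = 221 / 325, 120 / 200
p_pool = (221 + 120) / (325 + 200)
z0 = (p1 - p2) / math.sqrt(p_pool * (1 - p_pool) * (1 / 325 + 1 / 200))

# Two-sided p-value of a chi-square with 1 df: P(|Z| > sqrt(chi2))
p_chi2 = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
print(round(chi2, 3), round(z0 ** 2, 3), round(p_chi2, 3))
```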