Practice Final Examination
Statistics 515
Spring Semester 2002
E. A. Pena's Class
Part I: (20 points) Basic Concepts: Explain briefly what each of the following terms/phrases means, or what its importance is.
- Which statistical hypothesis typically corresponds to the "research hypothesis"?
- In statistical hypothesis testing, which type of error is considered to be more serious?
- Science Magazine reported that the mean listening time of 7-month-old infants exposed to a three-syllable sentence (e.g., "ga ti ti") is 9 seconds. Set up the null and alternative hypotheses for testing the claim.
- What is the level of significance of a test?
- How is the p-value used in making decisions in hypothesis testing?
- How are the probabilities of a Type I and a Type II error related for a fixed sample size?
- Why is it that we could not "accept" the null hypothesis, but instead simply conclude that we "fail to reject the null hypothesis"?
- What does the regression coefficient in the simple linear regression model represent?
- In simple linear regression, what is the idea behind the least-squares principle for obtaining the coefficients in the regression model?
- In simple linear regression, as well as in a one-way analysis of variance, which quantity serves as an estimator of the common variance?
Part II: Problem Solving and Interpretations.
1. (20 points) Environmental Science and Technology reported on a study of contaminated soil in The Netherlands. Seventy-two 400-gram soil specimens were sampled, dried, and analyzed for the contaminant cyanide. The cyanide concentration [in milligrams per kilogram (mg/kg) of soil] of each soil specimen was determined using an infrared microscopic method. The sample resulted in a mean cyanide level of 84 mg/kg and a standard deviation of S = 80 mg/kg. Perform a test of the null hypothesis that the true mean cyanide level in The Netherlands exceeds 100 mg/kg. Use a level of significance of 0.05.
a) State the hypotheses.
H0 (Null):
H1 (Alternative):
b) State your decision rule.
c) Compute your test statistic.
d) State your decision.
e) State your conclusion with regard to the practical problem considered.
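For checking an answer to parts (b) through (d) after attempting them, the test statistic can be computed directly from the summary statistics. The sketch below is one possible check, not an official solution. It assumes a lower-tailed large-sample z test (H0 as stated in the problem, against the alternative that the mean is below 100 mg/kg) and that n = 72 is large enough for the normal approximation. The variable names are illustrative.

```python
import math
from statistics import NormalDist

# Summary statistics from the problem statement
n = 72
xbar = 84.0    # sample mean (mg/kg)
s = 80.0       # sample standard deviation (mg/kg)
mu0 = 100.0    # hypothesized mean (mg/kg)
alpha = 0.05

# Large-sample test statistic: z = (xbar - mu0) / (s / sqrt(n))
z = (xbar - mu0) / (s / math.sqrt(n))

# Assumption: lower-tailed test, so H0 is rejected when z < z_crit
z_crit = NormalDist().inv_cdf(alpha)
reject = z < z_crit

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```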
2. (20 points) The Cleveland Casting Plant is a large, highly automated producer of gray and nodular iron automotive castings for Ford Motor Company. One process variable of interest to Cleveland Casting is the pouring temperature of the molten iron. The pouring temperatures (in degrees Fahrenheit) for a random sample of ten crankshafts produced at Cleveland Casting are listed below. The target setting for the pouring temperature is 2,550 degrees. Assuming the process is stable, conduct a test to determine whether the true mean pouring temperature differs from the target setting.
2543 / 2541 / 2544 / 2620 / 2560 / 2559 / 2562 / 2553 / 2552 / 2553
For this data set, the sample mean equals 2558.7 and the sample standard deviation is 22.7452.
a) State the hypotheses.
H0 (Null):
H1 (Alternative):
b) State your decision rule.
c) Compute your test statistic.
d) State your decision.
e) State your conclusion with regard to the practical problem considered.
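The summary statistics quoted for this data set, and the resulting one-sample t statistic, can be verified with a few lines of code. This is a minimal sketch for self-checking, not an official solution. The problem does not state a significance level, so the 5% level and the corresponding two-sided critical value (2.262 for 9 degrees of freedom, taken from a t table) are assumptions.

```python
import math
from statistics import mean, stdev

# Pouring temperatures (degrees Fahrenheit) from the problem statement
temps = [2543, 2541, 2544, 2620, 2560, 2559, 2562, 2553, 2552, 2553]
mu0 = 2550.0   # target setting

n = len(temps)
xbar = mean(temps)
s = stdev(temps)   # sample standard deviation (divisor n - 1)

# One-sample t statistic with n - 1 = 9 degrees of freedom
t = (xbar - mu0) / (s / math.sqrt(n))

# Assumption: alpha = 0.05, two-sided, so the table value is t(0.025, 9) = 2.262
t_crit = 2.262
reject = abs(t) > t_crit

print(f"mean = {xbar:.1f}, sd = {s:.4f}, t = {t:.3f}, reject H0: {reject}")
```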
3. (20 points) Marine biochemists at the University of Tokyo studied the properties of crustacean striated muscles (The Journal of Experimental Zoology). The main purpose of the experiment was to compare the biochemical properties of fast and slow muscles of crayfish. Using crayfish obtained from a local supplier, the researchers excised twelve fast-muscle fiber bundles and tested each fiber bundle for uptake of calcium. Twelve slow-muscle fiber bundles were excised from a second sample of crayfish, and calcium uptake was measured.
A summary of the sample statistics associated with the calcium uptake (in moles per milligram) for these two groups is provided below.
Descriptive Statistics
Group / n / Sample Mean / Sample Standard Deviation
Fast Muscle / 12 / .57 / .104
Slow Muscle / 12 / .37 / .035
Based on this information, compare the population means of the calcium uptake for the fast- and slow-muscle groups. In particular, test the null hypothesis that the two means are identical.
In performing your test, you may assume that the calcium uptakes for each group are normally distributed, and that the two populations have equal variances.
Also, use a 5% level of significance. Again, you may answer this question by following the steps below.
a) State the hypotheses.
b) State your decision rule.
c) Compute your test statistic.
d) State your decision.
e) State your conclusion with regard to the practical problem considered.
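Under the equal-variance assumption stated in the problem, the pooled two-sample t statistic can be computed from the summary table. The following is a hedged sketch for checking one's own work, not an official solution. The critical value 2.074 is the two-sided 5% table value for 22 degrees of freedom, and the variable names are illustrative.

```python
import math

# Summary statistics from the table (calcium uptake)
n1, mean1, sd1 = 12, 0.57, 0.104   # fast muscle
n2, mean2, sd2 = 12, 0.37, 0.035   # slow muscle

# Pooled variance: weighted average of the two sample variances
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df

# Standard error of the difference in sample means
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Test statistic for H0: the two population means are equal
t = (mean1 - mean2) / se_diff

# Two-sided test at the 5% level: table value t(0.025, 22) = 2.074
t_crit = 2.074
reject = abs(t) > t_crit

print(f"pooled variance = {pooled_var:.6f}, t = {t:.2f}, df = {df}, reject H0: {reject}")
```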
4. (30 points) The quality of the orange juice produced by a manufacturer (e.g., Tropicana) is constantly monitored. There are numerous sensory and chemical components that combine to make the best-tasting orange juice. There is a measure of "sweetness" of an orange juice: the higher the value of this "sweetness" measure, the better the orange juice. In order to study the relationship between the "sweetness" and a chemical measure such as the amount of water-soluble pectin (in parts per million), the sweetness and the pectin level were measured in 24 production runs.
A scatterplot of these 24 pairs of values is provided above.
A simple linear regression analysis with Sweetness as response or dependent variable and PectinLevel as predictor or independent variable was fitted using Minitab. The output of this analysis is given below.
Regression
The regression equation is
y = 6.25 - 0.00231 x
Predictor        Coef       StDev        T        P
Constant       6.2521      0.2366    26.42    0.000
x          -0.0023106   0.0009049    -2.55    0.018
S = 0.2150 R-Sq = 22.9% R-Sq(adj) = 19.4%
Analysis of Variance
Source            DF        SS        MS       F       P
Regression         1   0.30140   0.30140    6.52   0.018
Residual Error    22   1.01693   0.04622
Total             23   1.31833
a) By examining the scatterplot, describe the type of relationship between PectinLevel and Sweetness. For instance, is there a negative type of relationship?
b) Based on the simple linear regression analysis, what are the least-squares estimates of β₀ and β₁?
c) Provide an interpretation for the value of b, the estimate of β₁.
d) For testing the hypothesis that β₁ = 0 (that is, there is no linear relationship between PectinLevel and Sweetness), what will be your conclusion at the 5% level of significance? Indicate the information you are using to make your conclusion.
e) What will be the estimate of the common standard deviation σ?
Using the "fitted line" option in Minitab, the 95% confidence band and prediction interval were also generated. These are shown in the plot that follows.
f) Based on these plots, if a new production line produced a Pectin Level equal to 300, what will be a 95% confidence interval for the mean Sweetness of the orange juice?
g) What will be a 95% prediction interval for the exact value of the Sweetness of this orange juice with Pectin Level of 300?
h) The coefficient of determination of the fitted simple linear regression was 22.9%. Based on this value, how would you assess the ability of Pectin Level to explain the variation in the Sweetness measure? Is it high or is it low?
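Several numbers in the Minitab regression output are linked by simple identities, and recomputing them is a useful way to see where each one comes from. The sketch below uses only values printed in the output; it is a self-check rather than an official solution. It cannot reproduce the confidence or prediction intervals, which require the plot or the raw data, so at a Pectin Level of 300 it gives only the fitted value at the center of both intervals.

```python
import math

# Values copied from the Minitab output
b0, b1 = 6.2521, -0.0023106    # intercept and slope estimates
se_b1 = 0.0009049              # standard error of the slope
ss_reg, ss_err, ss_tot = 0.30140, 1.01693, 1.31833
df_reg, df_err = 1, 22

# t statistic for the slope: estimate divided by its standard error
t_slope = b1 / se_b1

# Mean squares and F statistic; in simple linear regression F equals t squared
ms_reg = ss_reg / df_reg
ms_err = ss_err / df_err
f_stat = ms_reg / ms_err

# S is the square root of the residual mean square
s = math.sqrt(ms_err)

# R-Sq is the share of total variation explained by the regression
r_sq = ss_reg / ss_tot

# Fitted mean Sweetness at Pectin Level 300 (point estimate only)
fitted_300 = b0 + b1 * 300

print(f"t = {t_slope:.2f}, F = {f_stat:.2f}, S = {s:.4f}, R-Sq = {100 * r_sq:.1f}%")
print(f"fitted Sweetness at 300 = {fitted_300:.3f}")
```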
5. (20 points) The Journal of Hazardous Materials published the results of a study of the chemical properties of three different types of hazardous organic solvents used to clean metal parts: aromatics, chloroalkanes, and esters. One variable studied was sorption rate, measured as mole percentage. Independent samples of solvents from each type were tested and their sorption rates were recorded. Summary statistics for the three groups are provided below.
Descriptive Statistics
Variable     N     Mean   Median   TrMean    StDev   SE Mean
Aromatic     9   0.9422   0.9500   0.9422   0.1683    0.0561
Chloroal     8   1.006    1.015    1.006    0.401     0.142
Esters      15   0.3300   0.3400   0.3292   0.2076    0.0536
Variable   Minimum   Maximum       Q1       Q3
Aromatic    0.6500    1.1500   0.8050   1.0900
Chloroal    0.430     1.580    0.635    1.377
Esters      0.0600    0.6100   0.1000   0.5300
Overlaid boxplots for the three groups are also given below.
To determine whether the population mean sorption rates for the three groups are identical, a one-way analysis of variance was performed using Minitab. The output of this analysis is provided below.
One-way Analysis of Variance
Analysis of Variance
Source   DF       SS       MS       F       P
Factor    2   3.3054   1.6527   24.51   0.000
Error    29   1.9553   0.0674
Total    31   5.2607
Individual 95% CIs For Mean
Based on Pooled StDev
Level N Mean StDev ----+------+------+------+--
Aromatic 9 0.9422 0.1683 (----*-----)
Chloroal 8 1.0063 0.4010 (------*-----)
Esters 15 0.3300 0.2076 (----*----)
----+------+------+------+--
Pooled StDev = 0.2597 0.30 0.60 0.90 1.20
Based on the description of the problem and the Minitab output, answer the following questions.
a) What will be your null hypothesis and your alternative hypothesis?
b) How many levels do you have in your factor? What are they?
c) What will be your estimate of the common variance of the three populations?
d) What will be your conclusion with regard to your hypothesis, and what is the basis of your conclusion?
e) Which population mean would you conclude is different from the other two?
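The entries of the one-way ANOVA table can be recomputed from the group sizes and standard deviations, which shows how the error mean square, the pooled standard deviation, and the F statistic relate to each other. The sketch below is a self-check under the usual equal-variance assumption, not an official solution. It uses only numbers printed in the Minitab output, and small differences from the printed values come from rounding.

```python
import math

# Group sizes and sample standard deviations from the Minitab output
groups = {
    "Aromatic": (9, 0.1683),
    "Chloroal": (8, 0.4010),
    "Esters": (15, 0.2076),
}

# Error sum of squares: sum of (n_i - 1) * s_i^2 over the groups
ss_error = sum((n - 1) * sd**2 for n, sd in groups.values())
df_error = sum(n for n, _ in groups.values()) - len(groups)

# Error mean square (pooled variance) and pooled standard deviation
ms_error = ss_error / df_error
pooled_sd = math.sqrt(ms_error)

# F statistic: factor mean square divided by error mean square
ss_factor, df_factor = 3.3054, 2
f_stat = (ss_factor / df_factor) / ms_error

print(f"SS error = {ss_error:.4f}, df = {df_error}")
print(f"MS error = {ms_error:.4f}, pooled SD = {pooled_sd:.4f}, F = {f_stat:.2f}")
```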