Homework 12

  1. Amos wants to test if increasing the volume on his speakers will decrease a computer’s bit rate. He has pieces of the ANOVA table from his regression. (Assume the assumptions for regression have been met.) Fill in the six missing values in the table.

ANOVA / df / SS / MS / F / P-value
Model / 1 / 5.8644 / 5.8644 / 3.240 / 0.075
Error / 96 / 173.76 / 1.81
Total / 97 / 179.6244
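
The relationships that tie the table together can be checked with a short sketch. It assumes the model and error rows shown above are the known pieces (the original does not say which six cells were blank), and the variable names are made up for illustration.

```python
# Known pieces of the regression ANOVA table (values taken from the table above).
df_model, df_error = 1, 96
ss_model, ss_error = 5.8644, 173.76

# The total row is the sum of the model and error rows.
df_total = df_model + df_error
ss_total = ss_model + ss_error

# Each mean square is its sum of squares divided by its degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# The F statistic is the ratio of the two mean squares.
f_stat = ms_model / ms_error

print(df_total, round(ss_total, 4), round(ms_error, 2), round(f_stat, 2))
```
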
  1. Ross thinks the four people who sit next to him in class spend the same amount of money on their haircut. To find out he keeps track of how much their haircuts cost for the next couple of months. Assuming the cost of a haircut is random and that the sample sizes are large enough to assume normality, and that the variances are the same for all four people, test if the average cost is equal for all four people (the tip is included in the price).

Use all 7 steps of the hypothesis procedure and use α=0.05.

Name / Number of Haircuts / Average Cost / Standard deviation for one haircut
Tomson / 33 / 18 / 11
Lawrence / 35 / 11 / 10
Julien / 41 / 13 / 8
Ty / 32 / 14 / 12
ANOVA / df / SS / MS / F / P-value
Group / Missing
Residual
Total / 15178.6

For your information, the pooled standard deviation is 10.215. The table below shows critical values of the F distribution, in other words, the F value that has 0.05 in the right tail. These can be used instead of a p-value in step 5 to decide whether to reject or not. In each subscript, the first number is the degrees of freedom for groups, the second is the degrees of freedom for error, and the third is alpha (0.05).

F1,1,.05= 161.4 / F2,1,.05= 199.5 / F3,1,.05= 215.7 / F4,1,.05= 224.6
F1,31,.05= 4.16 / F2,31,.05= 3.305 / F3,31,.05= 2.911 / F4,31,.05= 2.679
F1,137,.05= 3.91 / F2,137,.05= 3.062 / F3,137,.05= 2.671 / F4,137,.05= 2.438

H0: µtomson= µlawrence= µjulien= µty

HA: Not all the means are the same

(note: µtomson ≠ µlawrence ≠ µjulien ≠ µty would not be an acceptable alternative hypothesis, since, for example, it’s possible that only one of the means is different)

α=0.05

G = 4

N = 141

To get the SSE:

Method 1:

The MSE is the pooled variance: 10.215^2 = 104.35

The MSE is found by taking the SSE over the df=N-G=141-4=137

So the SSE=104.35*137 = 14296 (there may be a little rounding error)

Method 2:

SSE = sum[ (n-1) s^2 ] = (33-1)*11^2 + (35-1)*10^2 + (41-1)*8^2 + (32-1)*12^2 = 14296

To get the SSG:

Method 1:

After finding the SSE is 14296, subtract it from the SST of 15178.6 to get 882.6

Method 2:

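First find the overall average: (33*18 + 35*11 + 41*13 + 32*14)/141 = 1960/141 = 13.90

SSG = sum[ n (group average – overall average)^2 ] = 33*(18-13.90)^2 + 35*(11-13.90)^2 + 41*(13-13.90)^2 + 32*(14-13.90)^2 = 882.6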

ANOVA / df / SS / MS / F / P-value
Group / 3 / 882.6 / 294.2 / 2.82 / Missing
Residual / 137 / 14296 / 104.35
Total / 140 / 15178.6
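
As a cross-check on the table above, here is a minimal sketch that rebuilds SSE, SSG, and F from the summary statistics alone. The dictionary layout and variable names are illustrative choices, not part of the assignment.

```python
# Summary statistics from the haircut table: (n, average cost, standard deviation).
groups = {
    "Tomson": (33, 18, 11),
    "Lawrence": (35, 11, 10),
    "Julien": (41, 13, 8),
    "Ty": (32, 14, 12),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total

# SSE (within groups): sum of (n - 1) * s^2 over the groups.
sse = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

# SSG (between groups): sum of n * (group mean - grand mean)^2.
ssg = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())

df_group = len(groups) - 1
df_error = n_total - len(groups)
f_stat = (ssg / df_group) / (sse / df_error)

print(n_total, sse, round(ssg, 1), round(f_stat, 2))
```

Adding the two sums of squares should land on the given SST of 15178.6, up to rounding.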

Now we don’t have a p-value, but we do have a critical F value. With degrees of freedom (3, 137), the critical F we are looking for is 2.671.

Since 2.82 is more than 2.671, we are further out in the tail. That means we are in the rejection region, so we reject the null hypothesis.

Conclude that Ross’ friends do not all spend the same amount on their haircuts.

  1. Ablative spray paint is used on government buildings because it helps make the walls stronger to resist terrorist attacks. It has been suggested in recent literature that it may be a fire hazard. To test these claims, civil engineers are going to select 1000 walls and randomly choose some to be covered with ablative spray paint. They will throw grenades at each wall and record whether the wall catches fire.

Assume you are asked to select an alpha other than 0.05. Choose your alpha and explain why.

H0: The proportion that catch fire with the paint = proportion that catch fire without the paint

HA: The proportion that catch fire with the paint is greater (the paint is a fire hazard)

Type 1 error: The paint is not a fire hazard but we claim it is

Type 2 error: The paint is a fire hazard and we say it is safe

Students who have low alpha should mention the need to protect government buildings

Students who have a high alpha should mention the danger of creating a fire hazard

  1. Kyle surveys 500 people and records their gender and if they are a vegetarian. His data is shown below. Test whether gender is related to being a vegetarian.

Gender / Vegetarian / Not a vegetarian / Total
Male / 22 / 218 / 240
Female / 28 / 232 / 260
Total / 50 / 450 / 500

H0: Gender is not related to whether you are a vegetarian

HA: Gender is related to whether you are a vegetarian

α = 0.05

Method 1:

p-value = 0.2743 * 2 = 0.5486
Method 2:

p-value = 0.2743 * 2 = 0.5486
Method 3:
E / Vegetarian / Not a vegetarian / Total
Male / 24 / 216 / 240
Female / 26 / 234 / 260
Total / 50 / 450 / 500
Chi2 / Vegetarian / Not a vegetarian
Male / 0.1667 / 0.0185
Female / 0.1538 / 0.0171
Total / 0.3561
Χ^2 with 1 degree of freedom = 0.3561
p-value > 0.25 (a computer gives a p-value of about 0.55, in line with Methods 1 and 2)

p-value>α

Fail to Reject

There is not enough evidence to conclude that the decision to be a vegetarian is related to a person’s gender
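
The Method 3 arithmetic can be reproduced with a few lines of code. This is only a sketch: the list layout is an arbitrary choice, and the p-value line relies on the identity P(chi-square > x) = erfc(sqrt(x/2)), which holds for 1 degree of freedom only.

```python
import math

# Observed counts. Rows: male, female. Columns: vegetarian, not a vegetarian.
observed = [[22, 218], [28, 232]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under independence: row total * column total / grand total.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (obs - expected) ** 2 / expected

# Right-tail area for a chi-square with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi_sq / 2))

print(round(chi_sq, 4), round(p_value, 2))
```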

  1. TAMU admissions board believes the score you get on the SAT in high school can help predict your college GPA. Below is a regression model using the SAT scores and GPA for 100 college graduates. Calculate a 98% Confidence Interval for the slope of the regression line.

(Note: The SAT scores have been divided by 100 just to make the numbers nicer.)

Simple linear regression results:
Dependent Variable: GPA
Independent Variable: SAT/100
GPA = 1.3186783 + 0.11072124 [SAT/100]
Sample size: 100
R (correlation coefficient) = 0.7751
R-sq = 0.6007698
Estimate of error standard deviation: 0.1911169


Parameter estimates:

Parameter / Estimate / Std. Err. / DF / T-Stat / P-Value
Intercept / 1.3186783 / 0.16711824 / 98 / 7.8906903 / <0.0001
Slope / 0.11072124 / .091174900 / 98 / 1.2143823 / 0.1244

Using df = 80 from the t table (a conservative stand-in for the actual df of 98): t80 = 2.374

0.1107 ± 2.374 (0.091174) = (-0.10574, 0.32714)
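
The interval is just estimate ± (critical t) × (standard error). A minimal sketch, assuming the table value 2.374 quoted in the solution rather than computing an exact critical value for 98 degrees of freedom:

```python
slope = 0.11072124    # slope estimate from the regression output
std_err = 0.0911749   # standard error of the slope from the output
t_crit = 2.374        # table value quoted in the solution (1% in each tail)

margin = t_crit * std_err
lower, upper = slope - margin, slope + margin

print(round(lower, 4), round(upper, 4))
```

The interval contains zero, which agrees with the large p-value reported for the slope.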

  1. Santa wants to use regression to learn how sleigh weight affects reindeer speed. He uses 20 different weights, and measures the reindeer speed at each weight. The output from his regression is below. When he did SSE, SSR, and SST, Santa calculated those with n=20 by hand.

SUMMARY OUTPUT
Regression Statistics
Multiple R / 0.759706825
R Square / 0.577154460
Adjusted R Square / 0.553663041
Standard Error / 6.350124954
Average Speed (in kilometers/second) / 34.75
Observations / 20
Coefficients / Std Error / t Stat / P-value
Intercept / 48.76533 / 3.164064 / 15.41225 / 8.18E-12
Weight (in kilotons) / -0.94029 / 0.189702 / -4.95669 / 0.000102

Then Santa’s statistical elf pointed out that Santa forgot to include the last data point when he did the SSR, SSE, and SST by hand (although he did remember to use n=20). What would the correct values be for SSR, SSE, and SST, if the data point that Santa forgot was: at 19 kilotons speed was 41 kilometers/second.

First the notation:

yi=41

ybar = 34.75

yhat = 48.76533 - .94029*19 = 30.8998

SSR = sum[ (yhat – ybar)^2 ] = 975.89 + (30.8998-34.75)^2 = 990.71

SSE = sum[ (yi – yhat)^2 ] = 623.82 + (41-30.8998)^2 = 725.83

SST = sum[ (yi – ybar)^2 ] = 1677.48 + (41 – 34.75)^2 = 1716.55

An alternative way:

SST = sum[ (yi – ybar)^2 ] = 1677.48 + (41 – 34.75)^2 = 1716.55

R^2 = SSR/SST so .57715=SSR/1716.55 so then SSR = 990.71

SSR+SSE=SST, so SSE=1716.55-990.71 = 725.83
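
The first method amounts to adding one more term to each of Santa's partial sums. The sketch below takes the partial sums quoted in the solution (975.89, 623.82, 1677.48) as given; the variable names are illustrative.

```python
intercept, slope = 48.76533, -0.94029   # fitted coefficients from the output
y_bar = 34.75                           # average speed from the output
x_new, y_new = 19, 41                   # the forgotten data point

# Partial sums Santa computed without the last point (quoted in the solution).
ssr_partial, sse_partial, sst_partial = 975.89, 623.82, 1677.48

# Predicted speed at the forgotten weight.
y_hat = intercept + slope * x_new

# Each sum of squares gains one extra squared term.
ssr = ssr_partial + (y_hat - y_bar) ** 2
sse = sse_partial + (y_new - y_hat) ** 2
sst = sst_partial + (y_new - y_bar) ** 2

print(round(y_hat, 4), round(ssr, 2), round(sse, 2), round(sst, 2))
```

After the update, SSR + SSE should agree with SST to within rounding, which is a quick sanity check on the corrected values.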

  1. Data was collected for the average number of traffic citations per month given by Officer Smith and Officer Jones of the Highway Patrol. The last five months were looked at for both officers. Officer Smith had an average of 94 tickets with a standard deviation of 8.7 tickets. Officer Jones had an average of 98.6 tickets with a standard deviation of 6.4 tickets. The standard deviation of the differences in each month was 1.52 tickets. Using the .05 significance level, test the claim that there is a mean difference in the number of citations given by the two officers.

Cannot do

  1. Monica believes that the number of musicals a person sees depends on their gender. Houston says it depends on their age. To find out what really matters they randomly get 80 old women, 90 young women, 75 old men, and 36 young men. Assume the value of the standard deviation for each group should be 2.10, and the overall average number of musicals is 7.82. Calculate the test statistic if you know that for each data point

This is categorical by numerical, so this is ANOVA, the test statistic is F

The standard deviation is 2.1, so the MSE = 2.1^2 = 4.41

The sum of the squared distance from each data point to the overall average is SST

ANOVA / DF / SS / MS / F
Group / 3 / 11025.43 / 3675.14 / 833.3
Error / 277 / 1221.57 / 4.41
Total / 280 / 12247
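
The table can be rebuilt by working backwards from the MSE and the SST. This sketch assumes the SST of 12247 shown in the table; the names are illustrative.

```python
group_sizes = [80, 90, 75, 36]   # old women, young women, old men, young men
sd = 2.10                        # common standard deviation from the problem
sst = 12247                      # total sum of squares used in the table above

n_total = sum(group_sizes)
df_group = len(group_sizes) - 1
df_error = n_total - len(group_sizes)

mse = sd ** 2            # the MSE is the pooled variance
sse = mse * df_error     # undo the division by the error df
ssg = sst - sse          # whatever is left belongs to the groups
f_stat = (ssg / df_group) / mse

print(df_error, round(sse, 2), round(ssg, 2), round(f_stat, 1))
```
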
  1. Santa wants to know what type of food will make his reindeer calves gain weight. He tries four different types of food and gets an overall average of 24.93 with a pooled standard deviation of 5.17. Note the p-value was nearly zero. Santa decides to keep feeding his reindeer hay.

Food / Number of Calves / Standard deviation (for each calf) / Mean growth weight (in pounds)
Hay / 4 / 4.6 / 41.228
Dog Food / 4 / 5.2 / 34.281
Cat Food / 6 / 4.8 / 41.228
Fish Food / 6 / 5.8 / -8.471

Based on the data show what Santa’s hypothesis test should have looked like.

H0: Growth weight does not depend on food type (all four foods have the same mean growth)

Ha: Growth weight depends on food type (not all the means are the same)

Alpha=0.05

SSG = 4*(41.228-24.93)^2+4*(34.281-24.93)^2+6*(41.228-24.93)^2+6*(-8.471-24.93)^2 = 9699.77

SSE = sum[ (n-1) s^2 ] = (4-1)*4.6^2 + (4-1)*5.2^2 + (6-1)*4.8^2 + (6-1)*5.8^2 = 428

Or 5.17^2 = SSE/(20 – 4) so SSE = 5.17^2 * (20-4) = 428

ANOVA / df / SS / MS / F / P-value
Group / 3 / 9699.77 / 3233.26 / 120.86 / 0
Residual / 16 / 428 / 26.75
Total / 19 / 10127.77

Reject

Conclude the type of food really does matter (duh – fish food? Really Santa)?

  1. You plan to fly from New York to Chicago and have a choice of two flights. You are able to find out how many minutes late each flight was for a random sample of 25 days over the past few years. (You have data for BOTH flights on the same 25 days.) For Scairline the average delay is 31 minutes with a standard deviation of 12 minutes. For PilotAirOr the average delay is 48 minutes with a standard deviation of 20 minutes. The matched pairs standard deviation is 10.1 minutes. Test whether either flight has a higher average delay assuming normality.

H0: µd=0

Ha: µd≠0

α=0.05

t24=(31-48)/(10.1/sqrt(25))=-8.41

p-value off the chart ≈ 0

Reject

Our data shows the two flights do not have the same average delay; PilotAirOr’s is significantly higher.
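
The test statistic uses only the difference in averages and the matched-pairs standard deviation; the two individual standard deviations are not needed. A minimal sketch with illustrative names:

```python
import math

n = 25
mean_diff = 31 - 48   # Scairline minus PilotAirOr average delay, in minutes
sd_diff = 10.1        # matched-pairs standard deviation, in minutes

std_err = sd_diff / math.sqrt(n)
t_stat = mean_diff / std_err
df = n - 1

print(df, round(t_stat, 2))
```

The result is about -8.4, matching the solution up to rounding.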

  1. Michael is testing 7 different insecticides on fire ants. For each insecticide he sprays a group of 49 fire ants and notes the time it takes for the ants to die. Assume that ant deaths are normally distributed and each group of ants should have a different standard deviation. Michael then did an ANOVA test with a 5% significance level and got an F value of 2.02. The computer gave a right tailed area of 0.062. What conclusion do you think Michael should get from his results?

No conclusion: the assumption of equal variances is not met, so the ANOVA result cannot be trusted.

  1. Thomas knows horses live longer than pigs, so he doesn’t need to test it, but he does want a 99% confidence interval for how much longer they live. Assume the variances for both groups are not the same. A random sample of each type of animal is shown below. Calculate the 99% confidence interval.

Horses:

Sample size: 81 horses

Sample average: 15 years

Sample standard deviation: 3.5 years

Pigs:

Sample size: 81 pigs

Sample average: 12 years

Sample standard deviation: 1.2 years

Pooled variance: 6.93

Matched Pairs variance: 2.82

Weighted variance: 5.52

  1. Brittany has developed a cure for dogs poisoned with antifreeze, but it doesn’t always work. One experiment had 19 dogs out of 1000 survive antifreeze poisoning, but now only 16 out of 100 will die. Find a 94% confidence interval (Not Hypothesis test) for the difference in the percent of dogs that die from antifreeze poisoning with Brittany’s new medicine.

Note: It doesn’t matter that one sample has 10 times as many as the other sample, and the percentage of dogs who die without the cure is (1000-19)/1000 = 981/1000

n1π1=981 n1(1-π1)=19 n2π2=16 n2(1-π2)=84 So the assumptions for normality are met

(0.981 – 0.16) ± 1.88 * sqrt( 0.981*0.019/1000 + 0.16*0.84/100 ) = (0.7516, 0.8904)
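
For reference, the interval can be reproduced with the usual two-proportion formula. This is a sketch under the problem's numbers (981 of 1000 die without the cure, 16 of 100 die with it); the critical value comes from the standard library's `statistics.NormalDist` instead of a table.

```python
import math
from statistics import NormalDist

p1, n1 = 981 / 1000, 1000   # proportion that die without the cure
p2, n2 = 16 / 100, 100      # proportion that die with the cure

confidence = 0.94
# Critical z with 3% in each tail (about 1.88).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Standard error of the difference, with the proportions left unpooled.
std_err = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

diff = p1 - p2
lower, upper = diff - z * std_err, diff + z * std_err

print(round(z, 2), round(lower, 4), round(upper, 4))
```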

  1. Jazz wonders if the height of an engineer determines how many complaints they get. She surveys the height of engineers. Assume complaints are normally distributed, and that the variances should be pooled. The engineers are categorized according to whether they get few, some, or many complaints. The data is shown below, along with some math that may be helpful.

Test Jazz’s hypothesis that the average height is different according to the number of complaints using all 7 steps of a hypothesis.

Complaints / Few / Some / Many
Data / 6.2 / 5.9 / 5.1
Data / 5.8 / 6.1 / 5.4
Data / - / 5.9 / 6.1
Data / - / 5.5 / -
Average / 6.00 / 5.85 / 5.53
Std dev / 0.283 / 0.251 / 0.513

Overall average: 5.77

P-value = 0.3845

There are three groups, the data is normal, and the variances should be pooled. This is an ANOVA test. The first equation describes the SST, the second equation is SSE. The SSG is either the difference (1.096 – 0.796) = 0.300 or you can use the equation SSG = sum[ n (group average – overall average)^2 ] = 0.304 (the difference is from rounding).

ANOVA / DF / SS / MS / F
Groups / 2 / 0.300 / 0.150 / 1.13
Error / 6 / 0.796 / 0.133
Total / 8 / 1.096

H0: μ1= μ2= μ3

HA: At least one mean is different

α=0.05

F=1.13

p-value=0.3845 (this was given in the problem)

Fail to Reject

There is not sufficient evidence to suggest that the average height of an engineer differs according to the number of complaints they get.

  1. An experiment was run to compare four types of metal plating on the resistance of a sword to rust. The sample sizes were 25, 15, 181 and 22, the means were 50, 55, 48, 53, and the corresponding estimated standard deviations were 17, 24, 8 and 19. The overall average was 49, and the total sample size was 243. Test if metal plating changes the resistance if the p-value is 0. The following equations may or may not be helpful:

H0: All four metal platings have the same average resistance

Ha: Not all the metal platings have the same average resistance

Alpha:0.05

ANOVA / DF / SS / MS / F
Groups / 3 / 1098 / 366 / 2.565
Error / 239 / 34101 / 142.68
Total / 242 / 35199

p-value=0

Reject

The different metal platings do not all have the same average

  1. There are three companies (A, B, and C) trying to bid for construction of a preschool. The preschool is worried about getting a bad company. They randomly select months (3 from company A, 4 from company B, and 5 from company C) and find the number of complaints the company had for those months. The data is shown below. Test with 5% significance whether it matters which company they choose (Assume normality and equal variances. Use all 7 steps of a hypothesis. As a hint, the p-value is 0.10)

Company A: 0, 19, 38

Company B: 0, 16, 24, 48

Company C: 16, 39, 64, 68, 68

H0: The means are the same

Ha: The means are not the same

Alpha:0.05

ANOVA / df / SS / MS / F / P-value
Group / 2 / 2690.667 / 1345.333 / 2.998514 / 0.100477
Residual / 9 / 4038 / 448.6667
Total / 11 / 6728.667

Fail to Reject

We cannot say that any company has more complaints on average than the others.
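
Because this problem gives the raw data, the sums of squares can be computed directly. The sketch below does this by hand with illustrative names; it stops at F, since the p-value was given as a hint.

```python
# Monthly complaint counts for each company, from the problem statement.
data = {
    "A": [0, 19, 38],
    "B": [0, 16, 24, 48],
    "C": [16, 39, 64, 68, 68],
}

all_values = [x for values in data.values() for x in values]
grand_mean = sum(all_values) / len(all_values)

ssg = 0.0   # between-group sum of squares
sse = 0.0   # within-group sum of squares
for values in data.values():
    mean = sum(values) / len(values)
    ssg += len(values) * (mean - grand_mean) ** 2
    sse += sum((x - mean) ** 2 for x in values)

df_group = len(data) - 1
df_error = len(all_values) - len(data)
f_stat = (ssg / df_group) / (sse / df_error)

print(df_group, df_error, round(ssg, 3), round(sse, 3), round(f_stat, 4))
```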