Stat 4220 homework
1)Santa noticed that a lot of people put candy canes on their trees. He wants to evaluate whether the color of a candy cane is related to the type of Christmas tree. After randomly selecting 3054 houses Santa collected the following data. Answer all three questions at the bottom of the paper.
Observations / Color of Candy Cane
Red/White / Blue/White / Solid Red / Brown / Black with Green dots
Type of Christmas Tree / Classic / 298 / 201 / 132 / 97 / 19
Flocked / 301 / 210 / 162 / 101 / 15
Artificial / 300 / 203 / 148 / 112 / 13
Aluminum / 293 / 212 / 132 / 103 / 2
Expected Counts / Color of Candy Cane
Red/White / Blue/White / Solid Red / Brown / Black with Green dots
Type of Christmas Tree / Classic / 292 / 202 / 140 / 101 / 12
Flocked / 308 / 213 / 148 / 107 / 13
Artificial / 303 / A / 146 / 105 / 12
Aluminum / 290 / 201 / 139 / 100 / 12
Partial Chi-Squared / Color of Candy Cane
Red/White / Blue/White / Solid Red / Brown / Black with Green dots
Type of Christmas Tree / Classic / 0.142 / 0.005 / 0.502 / 0.160 / B
Flocked / 0.157 / 0.054 / 1.267 / 0.304 / 0.433
Artificial / 0.027 / Censored / 0.032 / 0.475 / 0.024
Aluminum / 0.040 / 0.638 / 0.399 / 0.070 / 8.241
Chi-Squared Value (meaning the sum of all the values in the completed table above): 17.30
What are the missing values for A and B
A=209.88 to 210 depending on how you round
B=4.106
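As a check on A and B, here is a minimal sketch that recomputes one expected count and one partial chi-squared value from the observed table. The function names `expected` and `partial_chi_sq` are illustrative only, not part of the assignment.

```python
# Observed counts (rows: Classic, Flocked, Artificial, Aluminum;
# columns: Red/White, Blue/White, Solid Red, Brown, Black with Green dots).
observed = [
    [298, 201, 132, 97, 19],
    [301, 210, 162, 101, 15],
    [300, 203, 148, 112, 13],
    [293, 212, 132, 103, 2],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)  # total number of houses


def expected(i, j):
    """Expected count for cell (i, j) if tree type and color are independent."""
    return row_totals[i] * col_totals[j] / n


def partial_chi_sq(i, j):
    """Contribution of cell (i, j) to the chi-squared statistic."""
    e = expected(i, j)
    return (observed[i][j] - e) ** 2 / e


A = expected(2, 1)        # Artificial, Blue/White
B = partial_chi_sq(0, 4)  # Classic, Black with Green dots
print(round(A, 2), round(B, 3))
```

B uses the unrounded expected count (about 11.985), which is why it differs slightly from (19-12)^2/12.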
2)A journalist wants to estimate the proportion of students who are in debt. The current best guess is that the proportion should be about 75%. She wants to get a 60% confidence interval for the true proportion with a margin of error that is less than 0.013. How many students should she survey?
.013=.84*sqrt(.75*(1-.75)/n)
n=782.8, which rounds up to n=783
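The same sample-size calculation, solved for n, can be sketched as follows. It assumes the table value z = 0.84 used above; a more precise z would give a slightly larger n.

```python
import math

z_star = 0.84    # table value for 60% confidence, as used in the answer
p_guess = 0.75   # current best guess for the proportion
margin = 0.013   # required margin of error

# Solve margin = z * sqrt(p(1-p)/n) for n, then round up.
n_exact = z_star ** 2 * p_guess * (1 - p_guess) / margin ** 2
n_needed = math.ceil(n_exact)
print(n_needed)
```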
3)The Daily Stat Fact reports that over 10% of engineers get a job that is not engineering related. I think the report is way off. I randomly sample 625 engineers and find that 49 of them got a job that was not engineering related. Test whether the percentage reported really is too high.
H0:p≥.1
Ha:p<.1
α=0.05
z=(49/625-.1)/sqrt(.1*.9/625)=-1.8
p-value=.036
Reject
Our data does show the proportion of 10% is too high
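A short sketch of the one-proportion z-test above, using only the standard library. The left-tailed p-value matches the alternative p < 0.1.

```python
import math
from statistics import NormalDist

p0 = 0.10        # proportion claimed under H0
x, n = 49, 625   # engineers with a non-engineering job, sample size
alpha = 0.05

p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = NormalDist().cdf(z)  # left tail, because Ha is p < p0

print(round(z, 2), round(p_value, 3))
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```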
4)Harry Potter believes that he can tell if a person is a bad guy by listening to the background music when they come near. To find out if this is the case, Harry records what type of music he hears around 114 random people. Then Harry performs the Crucius curse to determine if the person is a good guy or bad guy. Based on the following data, determine if the type of background music is related to the person’s allegiance.
Allegiance
Good Guys / Bad Guys
Background
Music / Ominous Music / 45 / 13
Happy Music / 38 / 18
Show all the steps of the hypothesis test, specifically using a χ² test of independence!
H0: Allegiance is independent of music
Ha: Allegiance depends on music
α=0.05
Allegiance
Good Guys / Bad Guys
Background
Music / Ominous Music / 42.2 / 15.8
Happy Music / 40.8 / 15.2
Allegiance
Good Guys / Bad Guys
Background
Music / Ominous Music / .18196 / .48717
Happy Music / .18845 / .50457
Df=1
Chisquared = 1.362 (the sum of the four partial values above)
.20<p-value<.25
Fail to Reject
Our data does not show you can tell who is the bad guy based on the music
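The expected counts, partial values, and statistic for this 2x2 table can be recomputed with the sketch below. For df = 1 the upper-tail probability of the chi-squared distribution equals erfc(sqrt(x/2)), which is used here in place of a table lookup.

```python
import math

observed = [
    [45, 13],  # Ominous Music: good guys, bad guys
    [38, 18],  # Happy Music:   good guys, bad guys
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Partial chi-squared value for each cell, then the total.
partials = [
    [(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
    for o_row, e_row in zip(observed, expected)
]
chi_sq = sum(sum(row) for row in partials)

df = 1
p_value = math.erfc(math.sqrt(chi_sq / 2))  # exact upper tail for df = 1

print(round(chi_sq, 3), round(p_value, 3))
```

The square root of the statistic, about 1.167, is the z-score that a two-proportion test on the same data would give.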
5)Katelyn has discovered that salt-licks from the Great Salt Lake are normally distributed, but they contain trace amounts of arsenic. She asks four of her friends to buy a salt-lick and measure the amount of arsenic. Here are their results:
Raul: 28 cc
Blaine: 44 cc
Madison: 32 cc
Leanne: 20 cc
Using their data find a 98% CI for the amount of arsenic in a salt-lick
Xbar = (28+44+32+20)/4=31
S=sqrt(((28-31)^2+(44-31)^2+(32-31)^2+(20-31)^2)/(4-1))=10
31 ± 4.541*10/sqrt(4) = (8.295, 53.705)
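The interval above can be reproduced with the standard library. This sketch takes the critical value 4.541 (df = 3, 98% confidence) from the t table rather than computing it.

```python
import math
import statistics

arsenic = [28, 44, 32, 20]   # measurements from the four friends, in cc
n = len(arsenic)

x_bar = statistics.mean(arsenic)
s = statistics.stdev(arsenic)   # sample standard deviation (divides by n-1)
t_star = 4.541                  # t table value for df = 3, 98% confidence

half_width = t_star * s / math.sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)
print(x_bar, s, ci)
```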
6)Suppose you are testing whether green runts cause cancer. You have a large group of people who regularly eat runts and a large group that never eats runts, and you will record which ones develop cancer before they die. The Willy Wonka Candy Company is worried that a link to cancer, if found, would be devastating. They ask you to be extra cautious not to hurt the company’s image unless you’re absolutely certain about the results.
Choose an α level besides 0.05 and explain why.
H0: p1=p2 (runts do not cause cancer)
Ha: p1 ≠ p2 (runts do cause cancer)
Type 1: We say runts do cause cancer, but they do not
Type 2: We say runts don’t cause cancer when in fact they do
We were asked to avoid type 1 errors, so we should lower alpha
(A small alpha is expected, though students could argue that guarding against cancer matters more than upsetting the Willy Wonka company.)
7)Some buildings in Laramie have been having problems with insects nesting inside the walls. A supervisor has suggested that it could be based on whether the building has iron supports or steel supports. Based on the data below, use any method you like to test whether that could be true.
Iron / Steel
Insect problems / 120 / 140 / 260
No insect problems / 250 / 230 / 480
370 / 370
H0: the metal type is independent of the insect problem
HA: the metal type is dependent on the insect problem
Alpha=0.05
If you do a proportions test the z-score should be ±1.54
If you do an independence test the chi-squared should be 2.37
P-value=.1236 or
.1<p-value<.15
Fail to Reject
Our data does not show that the metal type is related to the insect problem.
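Both methods mentioned above give the same p-value, because for a 2x2 table the chi-squared statistic is the square of the pooled two-proportion z-score. A minimal sketch of that relationship:

```python
import math
from statistics import NormalDist

x_iron, n_iron = 120, 370     # iron buildings with insect problems, total iron
x_steel, n_steel = 140, 370   # steel buildings with insect problems, total steel

p_iron = x_iron / n_iron
p_steel = x_steel / n_steel
p_pool = (x_iron + x_steel) / (n_iron + n_steel)

# Pooled standard error for the difference of two proportions.
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_iron + 1 / n_steel))
z = (p_iron - p_steel) / se
p_value = 2 * NormalDist().cdf(-abs(z))   # two-sided

chi_sq = z ** 2   # equals the 2x2 chi-squared statistic of independence
print(round(z, 2), round(chi_sq, 2), round(p_value, 4))
```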
8)Donald Trump just finished studying 96 businesses, and has classified them according to the amount of risk the companies take (high, medium, or low), and what type of company (large, small, personal, or not-for-profit). His final conclusion is that the amount of risk a company takes does not depend on the type of company.
Bill Gates says that is so not true. He says different types of companies have different types of risk levels. To keep the two from arguing you decide to compute the χ² Test of Independence. When you hand the paper to Donald and Bill, they fight over it and tear the corner of the report (see the picture below).
Determine statistically who you would say the data supports.
As a hint, the partial χ2 values that you can see add up to 13.19, and the assumptions are met for the test.
H0: company risk level is independent of size
Ha: company risk level depends on size
Alpha = 0.05
High risk non-profit chi-squared value is (8-5)^2/5=1.8
High risk large corporation expected value is (10+6+1)*30/(32+34+30)=5.3125
High risk large corporation chi-squared value is (5.3125-1)^2/5.3125=3.5
Chi-squared = 13.19+1.8+3.5=18.49
Df=(3-1)(4-1)=6, so .005<p-value<.01
Reject
We can say the business risk level depends on the size (Bill Gates was right)
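The two torn-off cells can be rebuilt and added to the visible total as in the sketch below. The labels on the margins (17 as the large-corporation total, 30 as the high-risk total) are my reading of the arithmetic above, and df = 6 is an assumption based on the 3 risk levels and 4 company types described in the problem. The critical value 12.592 is the standard table value for df = 6 at α = 0.05.

```python
visible_sum = 13.19   # partial chi-squared values still readable on the report

# High risk, not-for-profit: observed 8, expected 5.
partial_nonprofit = (8 - 5) ** 2 / 5

# High risk, large corporation: expected count rebuilt from the margins.
large_total = 10 + 6 + 1        # assumed: total for large corporations
high_risk_total = 30            # assumed: total for high-risk companies
grand_total = 32 + 34 + 30      # all 96 businesses
expected_large = large_total * high_risk_total / grand_total
partial_large = (1 - expected_large) ** 2 / expected_large

chi_sq = visible_sum + partial_nonprofit + partial_large

df = (3 - 1) * (4 - 1)          # assumed 3 risk levels x 4 company types
critical_05 = 12.592            # chi-squared table, df = 6, alpha = 0.05
print(round(chi_sq, 2), chi_sq > critical_05)
```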
9)Doctor Ann randomly selects 40 people to crack their knuckles daily, and 40 people to never crack their knuckles. Doctor Bob selects 40 pairs of twins and one twin will crack their knuckles daily and the other not. After 10 years they measure the amount of arthritis. Who will have a more powerful test?
a)Dr. Ann’s test is more powerful because Doctor Bob’s 80 subjects are only 40 pairs of twins so his results will be similar to having a smaller sample size.
b)Dr. Bob’s test is more powerful because taking the difference between twins will take out variability due to the genetics of each subject
c)Dr. Bob’s test is more powerful because it is very unlikely that two different sets of twins will be related to each other which increases the chance that they were selected randomly
d)Dr. Ann’s test is more powerful because the people who do not crack their knuckles will act as a control group in the experiment where they are not twins
e)Dr. Ann’s test is more powerful because the subjects do know which treatment they are getting beforehand and it will reduce the risk of a placebo effect
10)Dr. Carl asks 1000 people to rate whether they “crack their knuckles frequently”, “crack their knuckles sometimes”, and “almost never crack their knuckles”. Then he evaluates if they have arthritis in their hands. What kind of test should Dr. Carl run to analyze this data assuming the conditions are met?
A) 2 proportions z test B) One mean t-test C) Regression D) Matched Pairs E) Chi-squared
11)A genetics test is attempting to see if there is a relationship between nose type (Long, Medium, and Flat) and diet (Poor, Somewhat Healthy, and Healthy). Below is the data and output from a computerized χ² program.
OBS / Long / Med / Flat
Poor / 10 / 15 / 8
Some / 12 / 16 / 2
Healthy / 15 / 9 / 4
EXP / Long / Med / Flat
Poor / 13.4 / 14.5 / 5.1
Some / 12.2 / 13.2 / 4.6
Healthy / 11.4 / 12.3 / 4.3
χ² / Long / Med / Flat
Poor / 0.87 / 0.02 / 1.68
Some / 0.003 / 0.60 / 1.48
Healthy / 1.15 / 0.89 / 0.02
Test whether there is a relationship between nose type and diet.
Two cells (Flat/Some and Flat/Healthy) have expected counts below 5, so the conditions for the test are not met and it cannot be done.
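The expected-count condition can be checked mechanically before running any chi-squared test. This sketch applies the "every expected count at least 5" rule used in this key to the nose type and diet table and lists the cells that fail it.

```python
observed = {
    "Poor":    [10, 15, 8],
    "Some":    [12, 16, 2],
    "Healthy": [15, 9, 4],
}
columns = ["Long", "Med", "Flat"]

row_totals = {diet: sum(counts) for diet, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())

# Collect every cell whose expected count falls below 5.
small_cells = []
for diet, r_total in row_totals.items():
    for j, nose in enumerate(columns):
        e = r_total * col_totals[j] / n
        if e < 5:
            small_cells.append((diet, nose, round(e, 1)))

print(small_cells)
```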
12)A test to determine if major is related to social skills looks at 4 different majors and whether the student has social skills. The test has a p-value of 0.55. What is the conclusion?
A) Because the number of majors is less than 5, no conclusions can be drawn.
B) The p-value is less than α, so there is evidence to suggest a link between major and social skills.
C) The p-value is greater than α so there is not evidence to suggest a link between major and social skills.
D) The p-value is greater than α, so there is evidence to suggest a link between major and social skills.
E) The p-value cannot be greater than ½, so an error was made
13)The NYTimes did a study on the proportion of football players that have sustained a head injury.
Their 95% confidence interval based on 109 random NFL players was (0.571, 0.629).
Check which of the following (if any) are true.
There is a 95% probability that the proportion is between 0.571 and 0.629
95% of the time the true proportion will be between 0.571 and 0.629
This sample was not large enough to be able to use the normal distribution by the Central Limit Theorem
X / 95% of all confidence intervals from 109 NFL players will correctly contain the true proportion
X / The true proportion is between 0.571 and 0.629 with 95% confidence
For a new CI there is a 95% probability of the sample proportion being between 0.571 and 0.629
14)A sociologist wants to show that the food you eat actually changes your perception of how other people are feeling. She gathered 1000 volunteers, and randomly selected what food they would eat. Then she asked them to look at a photograph (of a person showing no emotion) and asked them to mark what emotion they thought the person was experiencing. The data is shown below. Test at the 1% significance level (with all 7 steps of a hypothesis test) if the food they ate is related to the emotion chosen.
Happy / Angry / Sad / Surprised / Sleepy / Scared
Chocolate / 22 / 16 / 30 / 9 / 39 / 44 / 160
Oranges / 25 / 32 / 48 / 8 / 66 / 65 / 244
Breadstick / 33 / 29 / 32 / 11 / 51 / 25 / 181
Salad / 31 / 46 / 65 / 21 / 98 / 102 / 363
Steak / 11 / 8 / 7 / 6 / 8 / 12 / 52
122 / 131 / 182 / 55 / 262 / 248 / 1000
The expected value for the surprised steak group is 52*55/1000 = 2.86, which is below 5, so the conditions are not met and this problem cannot be done.
15)Google wants to know if the type of browser you use determines what you do on the internet. They installed spyware on 400 random computers and got the following data
Social Media / Games / Work
Firefox / 31 / 49 / 80 / 160
IE / 32 / 57 / 70 / 159
Chrome / 18 / 15 / 48 / 81
81 / 121 / 198 / 400
Test whether what you do on the computer is related to the type of browser you use.
H0: Browser is independent of internet use
HA: Browser use is dependent on internet use
α=0.05
Social Media / Games / Work
Firefox / 32.4 / 48.4 / 79.2 / 160
IE / 32.197 / 48.097 / 78.70 / 159
Chrome / 16.402 / 24.502 / 40.09 / 81
81 / 121 / 198 / 400
Social Media / Games / Work
Firefox / 0.060493827 / 0.007438 / 0.008081
IE / 0.001211468 / 1.647788 / 0.962798
Chrome / 0.15558642 / 3.685236 / 1.558524
Chisq=8.09
0.05 <p-value < 0.10
Fail to Reject
We cannot say internet use depends on the browser
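The whole 3x3 calculation can be sketched in a few lines. For df = 4 the chi-squared upper-tail probability has the closed form exp(-x/2)(1 + x/2), so no table or external library is needed. That shortcut is specific to df = 4 and is not a general formula.

```python
import math

observed = [
    [31, 49, 80],   # Firefox: social media, games, work
    [32, 57, 70],   # IE
    [18, 15, 48],   # Chrome
]
alpha = 0.05

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (observed - expected)^2 / expected over all nine cells.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed-form upper tail, valid only because df == 4 here.
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

print(round(chi_sq, 2), df, round(p_value, 3))
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```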
16)Nick knows the UW football team is better than CSU, but he wants to compare their average rushing yards. He is fairly certain that the rushing yards are normally distributed with the same variance for both teams.
He randomly selects 11 UW games, and the average rushing yards were 110.
He randomly selects 7 CSU games, and the average rushing yards were 93.
Can Nick say with 99% confidence that UW has more rushing yards than CSU?
Standard deviation of one game for UW: 16 yards
Standard deviation of one game for CSU: 13 yards
Pooled standard deviation for one game: 15 yards
Matched Pairs standard deviation for one game: 7.5 yards
Average standard deviation for both teams: 14.5 yards
H0: μUW≤ μCSU
HA: μUW > μCSU
α = .01
The rushing yards are normally distributed
There are (at least) 3 different ways of doing the next steps, but you must use the pooled standard deviation for all of them because it said the variance was the same for both teams
METHOD I (hypothesis test)
t = (110-93)/(15*sqrt(1/11+1/7)) = 2.344 with df = 11+7-2 = 16
.01< p-value <.02
Since p-value> α fail to reject the null
METHOD II (confidence interval)
99% CI for μUW - μCSU: (110-93) ± 2.921*sqrt(15^2/11 + 15^2/7)
= (-4.184, 38.184)
Since 0 is in the confidence interval, we fail to reject the null
METHOD III (pair of confidence intervals)
For UW: 110 ± 2.921*sqrt(15^2/11) = (96.789, 123.211)
For CSU: 93 ± 2.921*sqrt(15^2/7) = (76.439, 109.561)
Since the confidence intervals overlap, we fail to reject the null
Conclude that the claim is false, there is not enough evidence to suggest that UW has more rushing yards on average than CSU.
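The pooled calculations behind Methods I and II can be sketched as follows. The critical values 2.583 (one-sided α = .01) and 2.921 (two-sided 99%) are taken from the t table for df = 16 rather than computed.

```python
import math

n_uw, mean_uw = 11, 110     # UW games sampled, average rushing yards
n_csu, mean_csu = 7, 93     # CSU games sampled, average rushing yards
s_pooled = 15               # pooled standard deviation for one game

df = n_uw + n_csu - 2
diff = mean_uw - mean_csu
se = s_pooled * math.sqrt(1 / n_uw + 1 / n_csu)

# Method I: pooled t statistic against the one-sided critical value.
t_stat = diff / se
t_crit_one_sided = 2.583    # t table, df = 16, alpha = .01 (one tail)
reject = t_stat > t_crit_one_sided

# Method II: 99% confidence interval for the difference in means.
t_star = 2.921              # t table, df = 16, 99% two-sided
ci = (diff - t_star * se, diff + t_star * se)

print(round(t_stat, 3), reject, tuple(round(v, 3) for v in ci))
```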