Exam 2 R Take home
Spring, 2018
EACH OF THE 8 QUESTIONS IS WORTH 12.50 POINTS.
Upload into R the csv file NewStatsCity14999.csv (from the Moodle page or from your computer if you already have it) and give it the name data. Then use the code
data = read.csv(file.choose())
attach(data)
plot(density(RawHappiness), xlim = c(30,70), ylim = c(0, .10), col = "blue", xlab = "RawHappiness", ylab = "Proportion Frequency", lwd = 5, main = "Histogram, Density Curve, and Theoretical Curve")
hist(RawHappiness, add = T, breaks = 50, freq = F, col = "red", angle = 45, density = 10 )
curve(dnorm(x, mean(RawHappiness), sd(RawHappiness)), from = 30, to = 70, add = T, col = "green", lwd = 5)
Happiness.sample = RawHappiness[sample(1:14999, 200)]
plot(density(Happiness.sample), xlim = c(30,70), ylim = c(0, .10), col = "blue", xlab = "RawHappiness", ylab = "Proportion Frequency", lwd = 5, main = "Histogram, Density Curve, and Theoretical Curve")
hist(Happiness.sample, add = T, freq = F, col = "red", angle = 45, density = 10 )
curve(dnorm(x, mean(RawHappiness), sd(RawHappiness)), from = 30, to = 70, add = T, col = "green", lwd = 5)
1. First, plot the empirical density function for the RawHappiness scores using
plot(density(RawHappiness), xlim = c(30,70), ylim = c(0, .10), col = "blue", xlab = "RawHappiness", ylab = "Proportion Frequency", lwd = 5, main = "Histogram, Density Curve, and Theoretical Curve")
then add to the plot a histogram of the entire population of scores, using the code
hist(RawHappiness, add = T, breaks = 50, freq = F, col = "red", angle = 45, density = 10 )
Finally, using the population mean and standard deviation of the variable RawHappiness, plot the theoretical curve you would expect if RawHappiness were normally distributed, by using the code
curve(dnorm(x, mean(RawHappiness), sd(RawHappiness)), from = 30, to = 70, add = T, col = "green", lwd = 5)
Now, take a sample of 200 RawHappiness scores, using the code
Happiness.sample = RawHappiness[sample(1:14999, 200)]
and create the same graph for this sample as you did above for the population of RawHappiness scores, with the same three parts: the density plot, the histogram, and the theoretical curve. Be sure to remove breaks = 50 from the histogram code. For the theoretical curve, continue to use the population mean and sd. SAVE Happiness.sample for the next problem.
2. To find the empirical proportion of scores between 60 and 65, we count the number of scores in that interval and divide by the sample size, 200. We can do that using the code
sum(Happiness.sample>= 60 & Happiness.sample<= 65)/200
Find the empirical proportion of scores in the sample that are between 40 and 50, and compare it with the theoretical prediction based on the assumption that the population is normally distributed with mean equal to mean(RawHappiness) and standard deviation equal to sd(RawHappiness). SAVE Happiness.sample for the next problem.
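The theoretical prediction is the area under the normal curve over the interval, which pnorm gives as the difference of two cumulative probabilities. The sketch below illustrates this for the 60 to 65 interval from the example above. If RawHappiness is not already attached, it falls back to simulated stand-in scores; the mean of 50 and sd of 6 are assumptions chosen to fit the plot ranges, not the real population values.

```r
# Fallback stand-in data (ASSUMED mean 50, sd 6) if the csv is not attached
if (!exists("RawHappiness")) {
  set.seed(2018)
  RawHappiness <- rnorm(14999, mean = 50, sd = 6)
}
if (!exists("Happiness.sample")) {
  Happiness.sample <- RawHappiness[sample(1:14999, 200)]
}

mu    <- mean(RawHappiness)   # population mean
sigma <- sd(RawHappiness)     # population standard deviation

lower <- 60
upper <- 65

# Empirical proportion: count scores in [lower, upper], divide by sample size
empirical <- sum(Happiness.sample >= lower & Happiness.sample <= upper) /
  length(Happiness.sample)

# Theoretical proportion under normality: F(upper) - F(lower)
theoretical <- pnorm(upper, mu, sigma) - pnorm(lower, mu, sigma)

round(c(empirical = empirical, theoretical = theoretical), 4)
```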
3. To find the score in the sample that has 20 per cent of the scores less than or equal to it, we can use the code
sort(Happiness.sample)[40]
which first sorts the scores from smallest to largest and then picks out the 40th score in that sequence, because 40 is 20 per cent of the sample size, 200.
a) Find the score that has 60 per cent of the scores at or below it.
b) Compare it to the theoretically predicted score that would have 60 per cent of the distribution below it, using the RawHappiness population parameters and the assumption of normality. By "compare" I mean judge whether the two scores are similar enough to justify the assumption of normality.
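The theoretical counterpart of sort(Happiness.sample)[40] is the normal quantile, qnorm(p, mean, sd), which returns the score with proportion p of the distribution at or below it. A minimal sketch for the 20 per cent example, falling back to simulated stand-in scores (mean 50 and sd 6 are assumptions, not the real values) when RawHappiness is not attached:

```r
# Fallback stand-in data (ASSUMED mean 50, sd 6) if the csv is not attached
if (!exists("RawHappiness")) {
  set.seed(2018)
  RawHappiness <- rnorm(14999, mean = 50, sd = 6)
}
if (!exists("Happiness.sample")) {
  Happiness.sample <- RawHappiness[sample(1:14999, 200)]
}

p <- 0.20                       # proportion at or below the target score
n <- length(Happiness.sample)
k <- round(p * n)               # position in the sorted sample (40 when n = 200)

# Empirical score: k-th smallest value in the sample
empirical_score <- sort(Happiness.sample)[k]

# Theoretical score under normality: inverse normal CDF at p
theoretical_score <- qnorm(p, mean = mean(RawHappiness), sd = sd(RawHappiness))

round(c(empirical = empirical_score, theoretical = theoretical_score), 2)
```

Changing p adapts the same two lines to other percentages.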
4. Create a plot that shows the probability distribution for the number of Heads in 15 flips of a coin that is biased so that the probability of Heads on any flip is .7 rather than .5. Use the parameter type = "o" in the plot code.
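One way to build this plot, assuming the number of Heads follows a binomial distribution with size 15 and probability .7: dbinom gives the probability of each possible count from 0 to 15, and type = "o" draws points joined by lines. The axis labels and title are illustrative choices.

```r
n_flips <- 15    # number of coin flips
p_heads <- 0.7   # probability of Heads on each flip

heads <- 0:n_flips                                      # possible counts of Heads
probs <- dbinom(heads, size = n_flips, prob = p_heads)  # P(X = k) for each count k

plot(heads, probs, type = "o",
     xlab = "Number of Heads in 15 flips", ylab = "Probability",
     main = "Binomial Distribution, n = 15, p = .7")
```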
5. Find the probability that a score sampled from the RawHappiness distribution with mean equal to mean(RawHappiness) and standard deviation equal to sd(RawHappiness) is:
a) between 50 and 65
b) less than 45
c) between 30 and 55
d) greater than 57
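Under the normality assumption each probability is an area under the normal curve, so pnorm covers all four parts: a difference of two pnorm values for an interval, pnorm alone for a lower tail, and lower.tail = FALSE for an upper tail. A sketch, using simulated stand-in scores (mean 50 and sd 6 are assumptions) only when RawHappiness is not attached; p_between is a hypothetical helper name.

```r
# Fallback stand-in data (ASSUMED mean 50, sd 6) if the csv is not attached
if (!exists("RawHappiness")) {
  set.seed(2018)
  RawHappiness <- rnorm(14999, mean = 50, sd = 6)
}

mu    <- mean(RawHappiness)
sigma <- sd(RawHappiness)

# Hypothetical helper: P(a < X < b) for X ~ Normal(mu, sigma)
p_between <- function(a, b) pnorm(b, mu, sigma) - pnorm(a, mu, sigma)

probs_q5 <- c(
  a = p_between(50, 65),                         # between 50 and 65
  b = pnorm(45, mu, sigma),                      # less than 45
  c = p_between(30, 55),                         # between 30 and 55
  d = pnorm(57, mu, sigma, lower.tail = FALSE)   # greater than 57
)
round(probs_q5, 4)
```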
6. A barrel contains many colored balls. Half of them are red and half are blue. The experiment is to reach into the barrel and pull out 15 balls at random, one at a time, noting each ball's color and returning the ball to the barrel before selecting the next one.
a) What is the probability that fewer than 10 of the balls in your sample are red? Call this probability the probability of Event A.
b) What is the probability that the number of red balls in your sample is 8, 9, 10, 11, or 12? Call this probability the probability of Event B.
c) What is the probability that Event A or Event B occurs?
d) What is the probability that Event A and Event B both occur?
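Because each ball is returned before the next draw and half the balls are red, the number of red balls in 15 draws can be modeled as binomial with size 15 and probability .5. A sketch that computes the four probabilities, using the addition rule P(A or B) = P(A) + P(B) - P(A and B) for part c):

```r
n     <- 15    # balls drawn, with replacement
p_red <- 0.5   # half the balls are red

# a) Event A: fewer than 10 red, i.e. X <= 9
p_A <- pbinom(9, size = n, prob = p_red)

# b) Event B: X is 8, 9, 10, 11, or 12
p_B <- sum(dbinom(8:12, size = n, prob = p_red))

# d) A and B overlap only at X = 8 or X = 9
p_A_and_B <- sum(dbinom(8:9, size = n, prob = p_red))

# c) Addition rule for "A or B"
p_A_or_B <- p_A + p_B - p_A_and_B

c(A = p_A, B = p_B, A_or_B = p_A_or_B, A_and_B = p_A_and_B)
```

As a cross-check, "A or B" covers every count from 0 to 12, so p_A_or_B should equal pbinom(12, 15, 0.5).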
7. How high above the x-axis is the density curve of a normal distribution with a mean of 90 and a standard deviation of 10 at the score x = 100?
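The height of the density curve at a score is the value of the normal density function at that score, which dnorm computes directly. The sketch also evaluates the density formula by hand as a cross-check.

```r
mu    <- 90
sigma <- 10
x     <- 100

# Height of the normal density curve at x
height <- dnorm(x, mean = mu, sd = sigma)

# Same value from f(x) = exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
by_formula <- exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))

height       # approximately 0.0242
by_formula
```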
8. a) Take 10 different samples of 20 scores from the normal distribution with a mean of 125 and a standard deviation of 15. Find the mean of each sample and display them.
b) Now take 10 different samples of 100 scores from the normal distribution with a mean of 125 and a standard deviation of 15. Find the mean of each sample and display them.
c) Describe and explain the differences between the two sets of sample means.
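A sketch using replicate to draw the samples and keep only each sample's mean; set.seed is an illustrative addition that makes the run reproducible. The last line places the spread of each set of means next to the theoretical standard error, sigma / sqrt(n), as one possible starting point for part c).

```r
set.seed(125)  # illustrative seed, only for reproducibility

# 10 samples of size 20 and 10 samples of size 100 from Normal(125, 15);
# replicate() repeats the expression and collects each sample mean.
means_20  <- replicate(10, mean(rnorm(20,  mean = 125, sd = 15)))
means_100 <- replicate(10, mean(rnorm(100, mean = 125, sd = 15)))

round(means_20, 2)
round(means_100, 2)

# Observed spread of the sample means vs. the theoretical standard error
round(c(sd_means_20  = sd(means_20),  se_20  = 15 / sqrt(20),
        sd_means_100 = sd(means_100), se_100 = 15 / sqrt(100)), 2)
```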