YMS Chapter 9 Sampling Distributions
Q1. A parameter (which begins with p) is a number describing a _____; a statistic (which begins with s) is a number describing a ______.
Q2. What symbols are used in our book’s notation to represent a population mean, sample mean, sample proportion, and population proportion, respectively?
Q3. Suppose you were to take a large number of samples (all the same size) from a population, compute the mean of each, and plot a histogram of the sample means that you obtain. This histogram would approximate the shape of the ______ ________ of the mean.
Q4. The sampling distribution for a proportion or mean changes as the number in the sample increases: the mean of that sampling distribution (increases, stays the same, decreases) and the variance of the sampling distribution (increases, stays the same, decreases).
Q5. If the mean of a sampling distribution is the true value of the parameter being estimated, we refer to the statistic used to estimate the parameter as being _____.
Q6. True or false: if a statistic is unbiased, the value of the statistic computed from the sample equals the population parameter.
Q7. True or false: the variability (and thus the accuracy) of a statistic is very sensitive to the size of the population from which the samples are drawn.
Q8. An organization wants to sample with equal accuracy from each state of the USA. Would it make more sense to sample 2000 from each state, or 1% of each state?
Q9. To review from Chapter 7, on the binomial distribution: what are the mean and standard deviation of a binomially distributed variable X, where p is the (population) probability of success, q is the probability of failure, and n is the size of the sample?
Q10. What are the mean and sd of the sample proportion, which is X/n where X is binomially distributed (but X/n is not binomially distributed)?
Q11. If you want a standard deviation for a sample proportion that is half as big as some other one, you have to get a sample that is how many times bigger?
Q12. If the sample is a substantial fraction of the population, then the assumption of independence that leads to the binomial distribution is violated. How many times bigger should the population be than the sample, so that we don’t worry about this?
Q13. True or false: The standard deviation of the sampling distribution of a proportion is only approximately sqrt(pq/n); this approximation is most accurate when np>=10 and nq>=10.
Q14. If you know the population proportion, how do you use the normal approximation to figure out the probability that the proportion obtained from a random sample of size n will be between two given values?
Q15. How do the sampling distributions of means compare with the distributions of individual observations? They are less _____ and more _____.
Q16. Suppose you have a population with mean mu and sd sigma. What are the mean and sd of the sampling distribution for means with sample size n?
Q17. Under what conditions will the sampling distribution of the mean have an exact normal distribution, no matter what the sample size is?
Q18. What does the central limit theorem tell us?
Q19. True or false: suppose that income in a large country is not normally distributed, but is very skewed. The central limit theorem tells us that if we were to collect several very large samples and compute the mean income for each sample, those means would be approximately normally distributed, even though the incomes in the population are not normally distributed.
Q20. Why do you think the central limit theorem is so “central” to statistics?
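One way to check your answers to Q3, Q16, and Q19 is by simulation. The following is a minimal sketch, not something from the book: it assumes an exponential population with rate 1 (so mu = 1 and sigma = 1, strongly right-skewed), and the function name simulate_sample_means, the number of samples, and the seed are all illustrative choices.

```python
import math
import random
import statistics


def simulate_sample_means(n, num_samples=5000, seed=0):
    """Draw num_samples samples of size n from a skewed (exponential)
    population and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]


# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
MU, SIGMA = 1.0, 1.0

for n in (4, 25, 100):
    means = simulate_sample_means(n)
    print(
        f"n={n:3d}  mean of sample means={statistics.fmean(means):.3f}  "
        f"sd of sample means={statistics.stdev(means):.3f}  "
        f"sigma/sqrt(n)={SIGMA / math.sqrt(n):.3f}"
    )
```

Plotting a histogram of the returned means for each n should show the shape becoming more symmetric and bell-like as n grows, even though the population itself is skewed.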
YMS Chapter 10 Introduction to Inference
Section 1
Q1. Statistical inference consists in drawing conclusions about a ____ from data in a ____.
Q2. If the standard deviation of a population is sigma, what is the sd of the sampling distribution for the sample mean (this is often called the standard error of the mean) with sample size n?
Q3. Suppose we know that the sd of the sample mean (a.k.a. standard error of the mean) is 4.5. This implies that if we were to draw many samples from the population, about 95% of these sample means would fall within what interval?
Q4. True or false: we should imagine the sample mean as being at the center of a bell-shaped curve, with 2 standard deviations of the sample mean (a.k.a. standard errors) on either side of this point encompassing 95% of the other sample means. (Assume the sample is an SRS and the sample means are normally distributed.)
Q5. True or false: The reasoning we use in making confidence intervals around a sample mean is as follows: if the sample mean is normally distributed, then 95% of the time, x-bar will be within 2 standard deviations of the sample mean (standard errors) of the population mean, mu. Whenever x-bar is within 2 standard errors of mu, mu is within 2 standard errors of x-bar. So if we make an interval + or – 2 standard errors around x-bar, that interval will encompass mu for 95% of the sample means we obtain.
Q6. A confidence interval has two parts: 1) the interval itself (usually expressed as an estimate plus or minus a margin of error) and 2) ______________.
Q7. Someone says, “I read that the 95% confidence interval for a certain group’s score on a certain test was 115 to 128. That means that 95% of all the members of the group score in that range.” Is this an accurate interpretation? If not, please give a better one.
Q8. In order to construct a confidence interval for a mean, what two conditions need to be met?
Q9. A first person says, “I want a 90% confidence interval. So I’ll look in the normal table for the z-score with 95% of the area to the left of it.” A second person says, “You mean 90% of the area, don’t you?” What is the correct way to look in the table?
Q10. What are the “tail areas” you look for, for confidence intervals of .90, .95, and .99, respectively?
Q11. If C is the confidence level, what is the expression for the area to the right of the interval that contains fraction C of the distribution of sample means?
Q12. What does the symbol z* stand for?
Q13. True or false: The values mu - z* sigma/sqrt(n) and mu + z* sigma/sqrt(n) represent the lower and upper bounds for the confidence interval for the mean.
Q14. If my wife’s age falls in the interval of my age plus or minus 5 years, then my age must fall within the interval of my wife’s age plus or minus 5 years. Is this true, and is this sort of reasoning central to the reasoning about confidence intervals?
Q15. True or false: The way in which the statement in the previous question has its analogy in the reasoning about confidence intervals is: any time the sample mean falls within the interval of mu plus or minus the margin of error, then the population mean must fall within the interval of x-bar plus or minus the same margin of error.
Q15. True or false: the values x-bar – z* sigma/sqrt(n) and x-bar + z* sigma/sqrt(n) form the lower and upper bounds for the confidence interval for the mean (assuming the assumptions are met).
Q16. Example 10.5 on page 546 is worthy of careful study. What are the 4 steps that were exemplified in using confidence intervals?
Q17. Please tell whether the margin of error (which is half the width of the confidence interval), or the width of the confidence interval itself, gets bigger or smaller under each of the following circumstances: a. the population standard deviation gets smaller, b. the level of confidence C gets bigger (e.g. a move from a 90% confidence interval to a 99% confidence interval) c. the sample size gets bigger, and d. the population size gets bigger?
Q18. Is it preferable in research for a 95% confidence interval to have its upper and lower bounds closer together, or farther apart?
Q19. Suppose you are a researcher planning a study, and you are deciding how many subjects to enroll. You want a certain margin of error m. You know what level of confidence you want, and you know (or estimate) the sigma for the population. How do you figure out the sample size?
Q20. Some of the problems in the use of confidence intervals can be surmounted by getting a large enough sample size – with this, the distribution of sample means can be considered normal even if the population isn’t normal. Also, with a large enough sample size, the sample standard deviation is close to the population standard deviation. What’s the main problem that can’t be overcome with a large sample size?
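The arithmetic behind Q9 through Q19 of this section can be tried out numerically. This is only a sketch under the section's own assumptions (an SRS, a normally distributed sample mean, and a known sigma); the function names z_star, z_interval, and sample_size_for_margin and the example numbers are hypothetical and not taken from the book.

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Critical value that leaves area (1 - C)/2 in each tail of the
    standard normal curve."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when sigma is known:
    x-bar plus or minus z* sigma/sqrt(n)."""
    margin = z_star(confidence) * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def sample_size_for_margin(margin, sigma, confidence=0.95):
    """Smallest n whose margin of error is at most the requested margin,
    from solving z* sigma/sqrt(n) <= m for n and rounding up."""
    return math.ceil((z_star(confidence) * sigma / margin) ** 2)


# Hypothetical numbers chosen only for illustration.
for level in (0.90, 0.95, 0.99):
    low, high = z_interval(x_bar=120, sigma=15, n=36, confidence=level)
    print(f"C={level:.2f}  z*={z_star(level):.3f}  interval=({low:.2f}, {high:.2f})")

print("n needed for margin 3 at 95%:", sample_size_for_margin(3, 15, 0.95))
```

Changing sigma, the confidence level, or n in the loop is a quick way to check your answers to Q17.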
Chapter 10, Section 2
Q21. True or false: The basic reasoning for significance testing is: an outcome that would happen rarely if a claim were true is good evidence that the claim is not true.
Q22. In doing statistical tests, the first step is to identify what you want to make conclusions about. Do you always want to make conclusions about sample statistics, or about population parameters? Or is it sometimes one and sometimes the other?
Q23. What does a “null hypothesis” typically state?
Q24. A significance test works by assessing how likely the ______ _____ would be if the ____ _____ were true.
Q25. True or false: the p-value is the probability of getting exactly the results we observed, presuming the null hypothesis to be true.
Q26. We are more likely to reject the null hypothesis of “no difference” or “no effect,” and infer that there is a difference or an effect, when the P-value is large, or small?
Q27. The null hypothesis has to do with a population parameter; in analyzing your sample data you calculate a ______ that estimates that population parameter.
Q28. When a drug company researcher is hoping to find evidence that a drug is better than placebo, is the researcher wishing to reject, or fail to reject, the null hypothesis?
Q29. Suppose someone is testing a drug versus placebo. If the researcher is interested only in the alternative that the drug is better than placebo, then the alternative hypothesis is _____-sided, but if the researcher counts both harmful effects (drug worse than placebo) and beneficial effects (drug better than placebo) as rejections of the null hypothesis, then the alternative hypothesis is ____-sided.
Q30. What is the meaning of the significance level, or alpha?
Q31. Do we reject the null hypothesis when the p value is less than alpha, or greater than alpha?
Q32. Are we more likely to reject the null hypothesis with a larger alpha, or a smaller alpha, all other things equal?
Q33. If a test is statistically significant at the .05 level, what does that mean?
Q34. Someone finishes writing up a statistical test by saying, “In conclusion, p=.021.” What step of the “inference toolbox” are they leaving out, that should come after what they said?
Q35. When we are testing the hypothesis that a population mean is equal to a certain hypothesized value, in the unlikely situation where we know the population standard deviation, what is our test statistic?
Q36. What distribution does the one-sample z statistic, a.k.a. the standardized sample mean, have when the null hypothesis is true?
Q37. True or false: for a one-sided test (or a one-sided alternative hypothesis), results extreme in one direction are counted as evidence against the null hypothesis; for a two-sided test (or a two-sided alternative hypothesis), results extreme in either direction are counted as evidence against the null hypothesis.
Q38. Please explain why the two-sided p-value is double that of the one-sided p-value.
Q39. How do you compute the one sample z statistic?
Q40. True or false: What is meant by doing “tests with fixed significance level” for a one-sample z test is that you become aware of what the cutoff (or critical) values are for z for the alpha you’ve picked. If the z your data yield is more extreme than the z for the alpha you’ve picked, the test is significant at the specified level of alpha. This method is most useful for those who don’t have access to calculators or computers that will give a p-value directly.
Q41. True or false: If you obtained a 95% confidence interval for a mean that ranged from 10 to 30, then a null hypothesis that the mean was equal to any value outside that range would be rejected and a null hypothesis of a mean within that range would not be rejected, at the .05 level, using a two-sided test.
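The one-sample z test of Q35 through Q41 can be worked through in a few lines. This is a hedged sketch that assumes a known population sigma and a normally distributed sample mean; the function names one_sample_z and p_values and all of the numbers are made up for illustration.

```python
import math
from statistics import NormalDist


def one_sample_z(x_bar, mu0, sigma, n):
    """Standardized sample mean: (x-bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))


def p_values(z):
    """One-sided and two-sided P-values for a z statistic under H0."""
    std_normal = NormalDist()
    upper = 1 - std_normal.cdf(z)  # alternative: mu > mu0
    lower = std_normal.cdf(z)      # alternative: mu < mu0
    return {"upper": upper, "lower": lower, "two_sided": 2 * min(upper, lower)}


# Hypothetical data: x-bar = 20, sigma = 20, n = 16, so sigma/sqrt(n) = 5.
X_BAR, SIGMA, N = 20.0, 20.0, 16
margin = NormalDist().inv_cdf(0.975) * SIGMA / math.sqrt(N)
print(f"95% interval: ({X_BAR - margin:.2f}, {X_BAR + margin:.2f})")

for mu0 in (10.0, 12.0):
    z = one_sample_z(X_BAR, mu0, SIGMA, N)
    p = p_values(z)["two_sided"]
    print(f"H0: mu = {mu0}  z = {z:.2f}  two-sided P = {p:.4f}")
```

Comparing each hypothesized mu0 with the printed interval, and each two-sided P-value with .05, is one way to check your reasoning on Q38 and Q41.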
Chapter 10, Sections 3 and 4
Q42. Suppose you thought your research would overturn a conclusion that many people had held for a long time. Would you tend to choose a smaller alpha (thus necessitating a larger sample size to reject the null hypothesis) or a larger alpha?
Q43. Suppose you had limited subjects with whom to work, and you were looking for evidence of toxicity from a chemical. The consequences of declaring that the chemical is safe when it isn't are very bad. The consequences of declaring the chemical dangerous when it isn't are primarily that more studies would be done than you have the resources to do. Given these consequences, would you tend to set alpha higher, or lower?
Q44. True or false: If you report the p-value itself, rather than saying that p<.05, you in a sense let the readers of your journal set their own alpha, i.e. make their own decision as to whether they want to reject the null hypothesis given the p-value you report.
Q45. True or false: P-values slightly over .05 should not be considered statistically significant.
Q46. Suppose we test a drug with a very large number of subjects. We find that on a 60-point rating scale, the drug group has a mean depression score rating that is 2 points lower than the placebo group. The p-value is .03. Someone is likely to say that the difference is ____ significant but not ____ significant.
Q47. A researcher designs a study, gathers data, punches the data into the computer, runs a significance test, and interprets the result based on the significance test. What important step is being left out? Please give one reason why this step is important.
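The situation in Q46 can be explored numerically. The sketch below is a simplification: it treats the 2-point difference as a one-sample z test against a hypothesized difference of 0, and the population sd of 12 points is a hypothetical value that the question does not give. It shows how the P-value for the same fixed difference changes as the sample size grows.

```python
import math
from statistics import NormalDist


def two_sided_p(difference, sigma, n):
    """Two-sided P-value for an observed mean difference, using a
    one-sample z statistic with known sigma (a simplifying assumption)."""
    z = difference / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


DIFFERENCE = 2.0  # points on the 60-point rating scale, from Q46
SIGMA = 12.0      # hypothetical population sd, not given in the question

for n in (25, 100, 400, 1600):
    print(f"n={n:5d}  two-sided P = {two_sided_p(DIFFERENCE, SIGMA, n):.3g}")
```

The size of the difference never changes in this loop; only n does.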