There Are Two Important Forms of Statistical Inference

Chapter 8 – Estimation

There are two important forms of statistical inference:

•Estimation (Confidence Intervals)

•Hypothesis Testing

Statistical Inference – drawing conclusions about populations based on samples of the population

parameter – unknown : μ, σ

A parameter is a number that describes the population of interest. Since we usually cannot examine the entire population of interest, parameters are generally unknown.

statistic – known: x̄, s

A statistic is a number that is computed from sample data. We often use a statistic to estimate an unknown population parameter.

sample statistic and population parameter

Notation

• μ = population mean (unknown)

• x̄ = sample mean (computed from the data we have on hand from a sample of the population)

______

• σ = population standard deviation (unknown)

• s = sample standard deviation (computed from the data we have on hand from a sample of the population)

x̄ estimates μ

s estimates σ

Point Estimate

•A point estimate of a population parameter is an estimate of that parameter given by a single number.

•x̄ is a point estimate of μ.

•s is a point estimate of σ.

Sampling Variability
Example: What is the average weight of women who are 5’1” tall and between the ages of 21 and 45? The American Medical Association takes a sample of 1000 women who are 5’1” tall and between 21 and 45 years old.

They find that the mean weight is x̄ = 136.2 lbs.

Question: If our goal is to estimate the mean weight of the population, how should we deal with the fact that different samples yield different estimates of the mean weight?

The basic fact that the value of a sample statistic varies in (hypothetical) repeated random sampling is called sampling variability.

Example: If another sample of 1000 women were chosen from the same population of 5’1” women between 21 and 45 years old, the value of x̄ would almost certainly be different – something other than 136.2 lbs.

Answer: Allow a margin of error that takes sampling variability into account.
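
A small simulation can make sampling variability concrete. The sketch below is only an illustration: it invents a hypothetical population of weights (normal, with mean 136.2 lbs and an assumed standard deviation of 20 lbs, which is not a figure from the study) and draws several samples of 1000 from it.

```python
import random
import statistics

def sample_means(num_samples, n, mu, sigma, seed=0):
    """Draw num_samples random samples of size n and return each sample mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means

# Hypothetical population: mean 136.2 lbs, standard deviation 20 lbs (assumed).
means = sample_means(num_samples=5, n=1000, mu=136.2, sigma=20.0)
for i, m in enumerate(means, start=1):
    print(f"sample {i}: mean weight = {m:.2f} lbs")
```

Each sample gives a slightly different mean, even though every sample comes from the same population.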

Confidence Intervals

Confidence intervals are generally of the form

point estimate ± margin of error

For a population mean: x̄ ± margin of error

Question: Why should we estimate μ, the true population mean, with an interval of numbers? Why not just use the point estimate x̄ as our estimate of μ?

Answer:

(1) Using an interval estimate (i.e. confidence interval) takes sampling variability into consideration,

and

(2) we can attach a level of confidence to an interval estimate which we cannot do with a point estimate.

A confidence interval for μ has two parts:

1) A margin of error says how close x̄ lies to μ.

2) A level of confidence says what percent of all possible samples satisfy the margin of error.

A confidence level, c, is any value between 0 and 1 that corresponds to the area under the standard normal curve between –z_c and +z_c.

Margin of Error

•Even if we take a very large sample, x̄ may differ from µ.

Critical Values

An interval of numbers has a left endpoint and a right endpoint: (lower bound, upper bound).

For a confidence level c, the critical value z_c is the number such that the area under the standard normal curve between –z_c and +z_c equals c (your confidence level).
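
The critical value can also be computed rather than looked up. The sketch below uses Python's standard library; the function name z_critical is just an illustrative choice. Since an area of c in the middle leaves (1 – c)/2 in each tail, z_c is the (1 + c)/2 quantile of the standard normal distribution.

```python
from statistics import NormalDist

def z_critical(c):
    """Return z_c such that the area under N(0,1) between -z_c and +z_c is c."""
    if not 0 < c < 1:
        raise ValueError("confidence level c must be between 0 and 1")
    # Middle area c leaves (1 - c)/2 in each tail, so z_c is the
    # (1 + c)/2 quantile of the standard normal distribution.
    return NormalDist().inv_cdf((1 + c) / 2)

for c in (0.90, 0.95, 0.99):
    print(f"c = {c:.2f}  z_c = {z_critical(c):.3f}")
```

For c = 0.90, 0.95 and 0.99 this gives approximately 1.645, 1.960 and 2.576.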

Example -Which of the following correctly expresses the confidence interval shown below?

Common Confidence Levels

Notice that as the confidence level increases, the interval gets wider.

When constructing a confidence interval, you must decide on the risk you are willing to take of being wrong.

A confidence interval is “wrong” if it doesn’t contain the true value of the population parameter.

• 99% confidence ==> 1% chance of being wrong

• 95% confidence ==> 5% chance of being wrong

• 90% confidence ==> 10% chance of being wrong

How confidence intervals behave

•High confidence says that our method almost always gives correct answers.

•A small margin of error says that we have pinned down the parameter quite precisely.

The margin of error determines the width of the confidence interval.

1) The margin of error is larger for higher confidence levels. To obtain a smaller margin of error from the same data, you must be willing to accept lower confidence.

2) The margin of error is larger for smaller sample sizes.

3) The margin of error is larger for populations that have lots of variability.
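
These three behaviors can be checked numerically. The sketch below assumes the margin of error formula for a mean when σ is known, E = z_c · σ/√n, which these notes develop later; the numbers (σ = 10, n = 100, and so on) are made up purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(c, sigma, n):
    """E = z_c * sigma / sqrt(n), the margin of error when sigma is known."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    return z_c * sigma / sqrt(n)

# Hypothetical baseline: 95% confidence, sigma = 10, n = 100.
base = margin_of_error(c=0.95, sigma=10.0, n=100)
higher_conf = margin_of_error(c=0.99, sigma=10.0, n=100)  # 1) higher confidence
smaller_n = margin_of_error(c=0.95, sigma=10.0, n=25)     # 2) smaller sample
more_var = margin_of_error(c=0.95, sigma=20.0, n=100)     # 3) more variability

print(f"baseline:          E = {base:.2f}")
print(f"higher confidence: E = {higher_conf:.2f}")
print(f"smaller sample:    E = {smaller_n:.2f}")
print(f"more variability:  E = {more_var:.2f}")
```

In each of the three variations the margin of error comes out larger than the baseline.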

Interpreting confidence levels

Take 95% confidence, for example.

Practical Interpretation:

We are 95% confident that the mean gain in score is between 18.9 and 25.1 points.

Statistical Interpretation:

If we repeatedly take random samples of size n from the population and construct 95% confidence intervals for each sample, then in the long run 95% of these confidence intervals will capture the true value of μ. Our sample is either one of the 95% for which the calculated interval captures μ, or one of the unlucky 5% that do not.
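
The long-run interpretation can be illustrated by simulation. In the sketch below the population (normal, μ = 50, σ = 10), the sample size and the number of trials are all hypothetical choices, and each interval is built as x̄ ± z_c · σ/√n with σ treated as known.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage(mu, sigma, n, c, trials, seed=1):
    """Fraction of simulated confidence intervals that capture the true mean mu."""
    rng = random.Random(seed)
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    E = z_c * sigma / sqrt(n)  # margin of error, sigma known
    hits = 0
    for _ in range(trials):
        x_bar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - E < mu < x_bar + E:
            hits += 1
    return hits / trials

# Hypothetical population and settings, chosen only for illustration.
rate = coverage(mu=50.0, sigma=10.0, n=30, c=0.95, trials=2000)
print(f"proportion of intervals containing mu: {rate:.3f}")
```

The proportion should land close to 0.95, but any single interval either contains μ or does not.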

Sampling Distribution

The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

Facts about the sampling distribution of x̄

These facts describe how x̄ varies from one sample to the next:

1) In repeated sampling, x̄ will sometimes fall above the true value of μ and sometimes below it, but there is no systematic tendency for x̄ to overestimate or underestimate μ. The sampling distribution of x̄ is centered at μ, and so x̄ is called an unbiased estimator of μ.

2) The values of x̄ from larger samples are less variable than those from smaller samples. The standard deviation of the sampling distribution of x̄ is σ/√n.

Confidence interval for μ for 95% Confidence

•If σ is known, we use z_c: x̄ ± z_c · σ/√n

•If σ is unknown, we estimate it with s and use t_c: x̄ ± t_c · s/√n

Maximal Margin of Error

•Since µ is unknown, the margin of error |x̄ – µ| is unknown.

•Using confidence level c, we can say that x̄ differs from µ by at most: E = z_c · σ/√n

The Probability Statement

•In words, c is the probability that the sample mean, x̄, will differ from the population mean, µ, by at most E = z_c · σ/√n, the margin of error: P(–E < x̄ – µ < E) = c

Confidence Intervals

A ‘c’ confidence interval for µ is an interval computed from sample data in such a way that c is the probability of generating an interval containing the actual value of µ.

Example - For a population of domesticated geese, the standard deviation of the mass is 1.3 kg. A sample of 45 geese has a mean mass of 5.7 kg. Find the confidence interval for the population mean at the 95% confidence level.

Notice that we have σ (population standard deviation), so we can use z_c.

Calculator: STAT, TEST, Z-Interval, Choose STAT
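
As a check on the calculator, the same interval can be computed by hand or in a few lines of code. The sketch below applies x̄ ± z_c · σ/√n to the geese data; the function name z_interval is an illustrative choice, not a calculator or library command.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, c):
    """Confidence interval for the mean when sigma is known."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    E = z_c * sigma / sqrt(n)  # margin of error
    return x_bar - E, x_bar + E

# Geese example: sigma = 1.3 kg, n = 45, x-bar = 5.7 kg, 95% confidence.
low, high = z_interval(x_bar=5.7, sigma=1.3, n=45, c=0.95)
print(f"95% CI for the mean mass: ({low:.2f} kg, {high:.2f} kg)")
```

The margin of error is about 0.38 kg, so the interval runs from about 5.32 kg to 6.08 kg.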

Critical Thinking

•Since x̄ is a random variable, so are the endpoints x̄ ± E.

•After the confidence interval is numerically fixed for a specific sample, it either does or does not contain µ.

•If we repeated the confidence interval process by taking multiple random samples of equal size, some intervals would capture µ and some would not!

•The equation P(x̄ – E < µ < x̄ + E) = c states that the proportion of all intervals containing µ will be c.

Estimating µ When σ is Unknown

•In most cases, researchers will have to estimate σ with s (the standard deviation of the sample).

•When s replaces σ, the standardized sample mean no longer follows the standard normal distribution; it follows a different distribution called the Student’s t distribution.

The t Distribution

Assume that x has a normal distribution with mean μ. For samples of size n with sample mean x̄ and sample standard deviation s, the t variable

t = (x̄ – μ) / (s/√n)

has a Student’s t distribution with degrees of freedom d.f. = n – 1.

Properties of the t-distribution

•bell shaped and symmetric and centered at zero

•there is more area in the tails in the t-distribution than there is in the N(0,1) distribution

•the t-distribution is really a family of density curves, one for each value of the degrees of freedom

•as the degrees of freedom get larger and larger, the t-density curve looks more and more like the N(0,1) curve

For different levels of Confidence:

For 95% Confidence Interval

For 90% Confidence Interval

For 99% Confidence Interval

Example - Find the t-value for the following data:

a) –27.62

b) –0.11

c) –8.95

d) –4.37

To find the critical value t_c for a confidence level c, use Table 6 of Appendix II.

Degrees of freedom, df, are the row headings.

Confidence levels, c, are the column headings
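
When no table is at hand, t_c can be approximated numerically. The sketch below is one possible approach using only Python's standard library, not the method behind the printed table: it integrates the Student's t density with Simpson's rule and searches by bisection for the value t_c whose central area equals c. The function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def area_between(t, df, steps=1000):
    """Area under the t curve between -t and +t (Simpson's rule, steps even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    return 2 * total * h / 3  # doubled because the curve is symmetric about 0

def t_critical(c, df):
    """Find t_c by bisection so that the area between -t_c and +t_c is c."""
    lo, hi = 0.0, 100.0  # assumes t_c < 100, which holds for c from 0.90 to 0.99
    for _ in range(50):
        mid = (lo + hi) / 2
        if area_between(mid, df) < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for c in (0.90, 0.95, 0.99):
    print(f"d.f. = 29, c = {c:.2f}: t_c = {t_critical(c, 29):.3f}")
```

For d.f. = 29 and c = 0.95 this gives about 2.045, matching the usual table entry.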

Maximal Margin of Error

•If we are using the t distribution: E = t_c · s/√n

What Distribution Should We Use?

Notes on Calculator:

Case / Normal Distribution, σ is unknown / Normal Distribution, σ is known / Proportion
Test Statistic / t_obs / z_obs /
Calculator / Stat⟶Test⟶T-test / Stat⟶Test⟶Z-test / Stat⟶Test⟶1-PropZ-test
Confidence Interval / x̄ ± t_c · s/√n / x̄ ± z_c · σ/√n / p̂ ± z_c · √(p̂q̂/n)
Calculator / Stat⟶Test⟶T-Interval / Stat⟶Test⟶Z-Interval / Stat⟶Test⟶1-PropZ-Interval

Example - A study was done to determine the average number of homes that a homeowner owns in his or her lifetime. For the 60 homeowners surveyed, the sample average was 4.2 and the sample standard deviation was 2.1. Calculate the 95% confidence interval for the true average number of homes that a person owns in his or her lifetime.

Notice that we have s (sample standard deviation), so we use t_c.

Calculator: STAT, TEST, t-Interval, Choose STAT
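
The calculator result can be checked with x̄ ± t_c · s/√n. In the sketch below the critical value t_c ≈ 2.001 for d.f. = 59 at 95% confidence is supplied by hand as an assumption taken from software; a printed table may list only a nearby d.f. and give a slightly different value.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_c):
    """Confidence interval for the mean when sigma is unknown (uses s and t_c)."""
    E = t_c * s / sqrt(n)  # margin of error
    return x_bar - E, x_bar + E

# Homes example: x-bar = 4.2, s = 2.1, n = 60.
# t_c = 2.001 is the 95% critical value for d.f. = 59 (assumed, from software).
low, high = t_interval(x_bar=4.2, s=2.1, n=60, t_c=2.001)
print(f"95% CI for the mean number of homes: ({low:.2f}, {high:.2f})")
```

The interval is roughly 3.66 to 4.74 homes.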

Example - A study was done to determine the average number of homes that a homeowner owns in his or her lifetime. Suppose that this time σ is known to be 2.8. Assume that we collect a sample of 60 homeowners and compute the sample average to be 4.2. Calculate the 95% confidence interval for the true average number of homes that a person owns in his or her lifetime.

Notice that we have σ (population standard deviation), so we use z_c.

Calculator: STAT, TEST, Z-Interval, Choose STAT

Example: The numbers of advertisements seen or heard in one week for 30 randomly selected people in the United States are listed below.

Construct a 95% confidence interval for the true mean number of advertisements.

Notice that we have s (sample standard deviation), so we use t_c.

Calculator: STAT, TEST, t-Interval, Choose DATA

598 494 441 595 728 690

684 486 735 808 481 298

135 846 764 317 649 732

582 677 734 588 590 540

673 727 545 486 702 703
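
Working from the raw data means computing x̄ and s first, which is what the DATA option on the calculator does. The sketch below reads the listing as 30 three-digit values, consistent with the 30 people in the example, and assumes the table value t_c = 2.045 for d.f. = 29 at 95% confidence.

```python
from math import sqrt
from statistics import mean, stdev

# Number of advertisements seen or heard in one week by 30 people.
ads = [
    598, 494, 441, 595, 728, 690,
    684, 486, 735, 808, 481, 298,
    135, 846, 764, 317, 649, 732,
    582, 677, 734, 588, 590, 540,
    673, 727, 545, 486, 702, 703,
]

n = len(ads)
x_bar = mean(ads)   # sample mean
s = stdev(ads)      # sample standard deviation (divides by n - 1)
t_c = 2.045         # 95% critical value for d.f. = n - 1 = 29 (assumed, from a t table)
E = t_c * s / sqrt(n)
low, high = x_bar - E, x_bar + E

print(f"n = {n}, x-bar = {x_bar:.1f}, s = {s:.1f}")
print(f"95% CI for the mean: ({low:.1f}, {high:.1f})")
```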

Estimating p in the Binomial Distribution

•We will use large-sample methods in which the sample size, n, is fixed.

•We assume the normal curve is a good approximation to the binomial distribution if both np > 5 and nq = n(1 – p) > 5.

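Point Estimates in the Binomial Case

•p̂ = r/n is the point estimate for p, where r is the number of successes in n trials.

•q̂ = 1 – p̂ is the point estimate for q.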

Margin of Error

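•The magnitude of the difference between the actual value of p and its estimate p̂ is the margin of error.

The Distribution of p̂

•For large samples, the distribution of p̂ is well approximated by a normal distribution with mean p and standard deviation √(pq/n).

A Probability Statement

P(–z_c < (p̂ – p) / √(pq/n) < z_c) = c

With confidence level c, as before. The resulting confidence interval for p is p̂ – E < p < p̂ + E, where E = z_c · √(p̂q̂/n).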

Example - Suppose that 800 students were randomly selected from the student body of 20,000 and were given shots to prevent a certain type of flu. All 800 students were exposed to the flu, and 600 of them did not get the flu. Let p represent the probability that the shot will be successful for any single student selected at random from the entire population of 20,000.

a) What are the point estimates for p and q? What are the values of n and r?

b) Is the number of trials large enough to justify a normal approximation to the binomial?

c) Find a 99% confidence interval for p. Calculator: STAT, TESTS, A: 1-PropZInt. Enter x = r.
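
The three parts of this example can be worked in one short sketch. It assumes the large-sample interval p̂ ± z_c · √(p̂q̂/n) and checks the conditions np̂ > 5 and nq̂ > 5 before using it; the function name proportion_interval is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(r, n, c):
    """Large-sample confidence interval for a proportion p."""
    p_hat = r / n       # point estimate for p
    q_hat = 1 - p_hat   # point estimate for q
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("need n*p_hat > 5 and n*q_hat > 5 for the normal approximation")
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    E = z_c * sqrt(p_hat * q_hat / n)  # margin of error
    return p_hat, E, (p_hat - E, p_hat + E)

# Flu example: n = 800 students, r = 600 did not get the flu, 99% confidence.
p_hat, E, (low, high) = proportion_interval(r=600, n=800, c=0.99)
print(f"p-hat = {p_hat:.2f}, q-hat = {1 - p_hat:.2f}")
print(f"99% CI for p: ({low:.3f}, {high:.3f})")
```

Here p̂ = 0.75 and q̂ = 0.25, both np̂ = 600 and nq̂ = 200 exceed 5, and the interval is about 0.711 to 0.789.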

Example: A survey of 300 fatal accidents showed that 123 were alcohol related. Construct a 98% confidence interval for the proportion of fatal accidents that were alcohol related.

Choosing Sample Sizes

•When designing statistical studies, it is good practice to decide in advance:

–The confidence level

–The maximal margin of error

–Then, we can calculate the required minimum sample size to meet these goals.

Sample Size for Estimating μ

n = (z_c · σ / E)²

If σ is unknown, use σ from a previous study or conduct a pilot study to obtain s, and use s in place of σ.

Always round n up to the next integer!
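
The sample size calculation for a mean, n = (z_c · σ / E)² rounded up, is easy to script. In the sketch below σ = 2.8 is borrowed from the homeowner example, while the target margin of error E = 0.5 is a hypothetical choice made only to show the arithmetic.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, E, c):
    """Minimum n so that the margin of error for the mean is at most E."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    return ceil((z_c * sigma / E) ** 2)  # always round up

# sigma = 2.8 (homeowner example); E = 0.5 is a hypothetical target.
n = sample_size_for_mean(sigma=2.8, E=0.5, c=0.95)
print(f"required sample size: n = {n}")
```

The formula gives about 120.5, so the required sample size is 121.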

Sample Size for Estimating p

If we have a preliminary estimate for p, use the following:

n = p(1 – p) · (z_c / E)²

If we have no preliminary estimate for p, use the following modification:

n = (1/4) · (z_c / E)²

Example – A wildlife study is designed to find the mean weight of salmon caught by an Alaskan fishing company. A preliminary study of a random sample of 50 salmon showed s = ____ pounds. How large a sample should be taken to be 90% confident that the sample mean x̄ is within 0.20 pounds of the true mean weight μ?

Example: A researcher wishes to estimate the proportion of households with two cars. How large a sample is needed in order to be 98% confident that the sample proportion will not differ from the true proportion by more than 5%? A previous study indicates that the proportion of households with two cars is 19%.
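
A sketch of this calculation is given below, assuming the usual formulas n = p(1 – p)(z_c/E)² with a preliminary estimate and n = (1/4)(z_c/E)² without one. The function name is illustrative. The code uses an unrounded z_c ≈ 2.326; rounding z_c to 2.33 from a table pushes each answer up by one.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(E, c, p=None):
    """Minimum n so that the margin of error for a proportion is at most E.

    If no preliminary estimate p is given, use p(1 - p) = 1/4, its largest value.
    """
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    pq = 0.25 if p is None else p * (1 - p)
    return ceil(pq * (z_c / E) ** 2)  # always round up

# Households example: E = 0.05, 98% confidence, preliminary estimate p = 0.19.
n_with_estimate = sample_size_for_proportion(E=0.05, c=0.98, p=0.19)
# Same goal with no preliminary estimate for p.
n_no_estimate = sample_size_for_proportion(E=0.05, c=0.98)
print(f"with p = 0.19: n = {n_with_estimate}")
print(f"with no estimate: n = {n_no_estimate}")
```

With the preliminary estimate the required sample size is 334 (335 if z_c is rounded to 2.33); with no estimate it rises to 542 (543 with z_c = 2.33).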
