Sampling Distributions & Confidence Intervals for Proportions

Problem: It is claimed that 65% of AASU students have an SSN beginning with the digit 2. I might guess, then, that 65% of the students in here do. I am going to “randomly” ask 30 of you. What is the probability that exactly 2 of the 30 have 2 as the first digit of their SSN? How many do you expect to have this (mean)? And what is the variance of the number of 2ers?

ANS.

Could this be Binomial?

<If drawing n items from a population of N, make sure n/N < .10; that is, sampling without replacement will not create massive dependence if you select less than 10% of the population to inspect (each inspection = one trial).>

Any problem with independence? Check n/N.

Exactly 2 of 30?

Mean?

Variance? Standard Deviation?
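If we accept the Binomial model (n = 30 trials, p = .65 per trial), the three questions above reduce to the standard Binomial formulas: P(X = k) = C(n, k) p^k (1-p)^(n-k), mean = n*p, variance = n*p*(1-p). A minimal sketch of that arithmetic follows; the function name `binom_pmf` is just an illustrative helper, not something from the notes.

```python
from math import comb, sqrt

n = 30      # number of students asked (from the problem)
p = 0.65    # claimed proportion whose SSN starts with 2

def binom_pmf(k, n, p):
    """P(X = k) when X ~ Binomial(n, p). Illustrative helper."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

prob_exactly_2 = binom_pmf(2, n, p)   # chance that exactly 2 of the 30 are "2ers"
mean = n * p                          # expected number of 2ers
variance = n * p * (1 - p)            # variance of the count
std_dev = sqrt(variance)              # standard deviation of the count

print(f"P(X = 2) = {prob_exactly_2:.3e}")
print(f"mean     = {mean:.2f}")
print(f"variance = {variance:.3f}")
print(f"stdev    = {std_dev:.3f}")
```

Notice how tiny P(X = 2) is: seeing only 2 when we expect about 19.5 would be strong evidence against the 65% claim.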

Q. If I selected a different 30 students would I get the same answer? And then a different 30?

Use (# with 2)/30 as the sample’s proportion of those whose SSN begins with 2 (call it p-hat; note that it estimates the probability of finding such a person). This will vary depending on which 30 students I randomly select.

Note that p-hat is a mean = average, since it is the sum of the raw data (each 0 or 1) divided by the total number of data points. Note also that here we theoretically know everything about the population (I said 65% had a 2 at the start of their SSN).

So, the CENTRAL LIMIT THEOREM applies

Check these criteria:

1. Taking a large sample (n)? “The larger the better.” n = 30 usually works.

2. Make sure each item (= trial = person) in the sample is independent of the others.

Then my sample mean looks like the many other possible means from different samples of n items. In fact it comes from a (nearly) normal, bell-shaped distribution that centers about the entire population mean, with a spread (standard deviation = standard error) equal to the population stdev divided by the square root of the sample size n.

Notation:

x-bar is $N(\mu,\ \sigma/\sqrt{n})$

What about p-hat?

Normal centered at p (population percentage = proportion) with spread of $\sqrt{p(1-p)/n}$, so long as n*p and n*(1-p) > 10

Finally, let’s use the CLT on our ssn problem:
It is modeled by a Binomial (because we noted that the 30 we picked was less than 10% of the AASU population).

p = .65 (65%), n = 30; note 30*.65 = 19.5 and 30*.35 = 10.5 are both > 10

so our sample proportion p-hat = _________

Is it near .65? It is spread away by

$\sqrt{p(1-p)/n} = \sqrt{(.65)(.35)/30} = .087$ (stdev)
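To double-check the .087, here is a small sketch of the two steps just carried out: the n*p and n*(1-p) > 10 check, then the standard error sqrt(p(1-p)/n). The variable names are only illustrative.

```python
from math import sqrt

p = 0.65   # population proportion (assumed known in this example)
n = 30     # sample size

# Rule of thumb from the notes: both expected counts should exceed 10.
expected_successes = n * p          # about 19.5
expected_failures = n * (1 - p)     # about 10.5
large_enough = expected_successes > 10 and expected_failures > 10

# Spread (standard error) of the sample proportion p-hat.
std_error = sqrt(p * (1 - p) / n)

print(f"n*p = {expected_successes:.1f}, n*(1-p) = {expected_failures:.1f}")
print(f"normal approximation reasonable? {large_enough}")
print(f"standard error of p-hat = {std_error:.3f}")
```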

Think normal distribution

We call these sampling distributions = the distributions that would result from repeatedly calculating a mean from one sample, then from a different one, and another…
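One way to see a sampling distribution is to let the computer play the role of "a different 30, and then a different 30". The sketch below is only a simulation under stated assumptions: each student is treated as an independent 0/1 draw with probability .65 (so it ignores the finite AASU population), and the seed and the 10,000 repetitions are arbitrary choices.

```python
import random
from statistics import mean, stdev

random.seed(0)          # arbitrary seed so the run is repeatable
p, n = 0.65, 30         # population proportion and sample size
num_samples = 10_000    # how many different groups of 30 we "ask"

def sample_proportion(n, p):
    """Ask n simulated students; return the fraction who are 2ers."""
    return sum(random.random() < p for _ in range(n)) / n

# One p-hat per simulated sample of 30 students.
p_hats = [sample_proportion(n, p) for _ in range(num_samples)]

center = mean(p_hats)    # should sit near p = .65
spread = stdev(p_hats)   # should sit near sqrt(p(1-p)/n), about .087

print(f"center of the p-hats = {center:.3f}")
print(f"spread of the p-hats = {spread:.3f}")
```

A histogram of `p_hats` would look roughly bell-shaped, which is the CLT at work.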

Q. Practically speaking why would we want to discern something about a sample when we know all about the population? Seems we would like to sample and then “guess” about the population.

EX. I sampled n=30 here and found 24 who had ssn starting with 2. What’s that say about all AASUers?

THEOREM (FACT):

To estimate the population proportion (i.e. %), do the following: sample n items and form X/n, where X = # of successes. Call this p-hat and use

$\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}$

(DEFN. This half-width, $2\sqrt{\hat{p}(1-\hat{p})/n}$, is called the MOE = margin of error.)

We say, “I am 95% sure that the true = population p (proportion, i.e. percentage) is contained in this interval = range.”

Called Confidence Intervals
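As a worked instance of the recipe p-hat plus or minus 2 standard errors, here is a sketch using the earlier example of 24 out of 30. The function name `proportion_ci` is a hypothetical helper. Note also that with only 6 "failures" the n*(1-p) > 10 check is not met for this sample, so the interval should be treated as rough.

```python
from math import sqrt

def proportion_ci(successes, n, z=2.0):
    """Approximate confidence interval for a proportion: p-hat +/- z * SE.

    z = 2 matches the 95% rule of thumb used in these notes.
    Returns (p_hat, margin_of_error, (low, high)). Illustrative helper.
    """
    p_hat = successes / n
    moe = z * sqrt(p_hat * (1 - p_hat) / n)   # margin of error (half-width)
    return p_hat, moe, (p_hat - moe, p_hat + moe)

# Example from the notes: 24 of the 30 sampled had an SSN starting with 2.
p_hat, moe, (low, high) = proportion_ci(24, 30)

print(f"p-hat = {p_hat:.3f}")
print(f"MOE   = {moe:.3f}")
print(f"approx. 95% CI: ({low:.3f}, {high:.3f})")
```

Here the interval runs from about .65 to about .95, so the claimed 65% sits right at the lower edge.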

Why 2?

Mean ± stdev covers about 68%

Mean ± 2*stdev covers about 95%

Mean ± 3*stdev covers about 99.7%
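These percentages come straight from the area under the standard normal curve. A quick check, using `NormalDist` from Python's standard library `statistics` module:

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, stdev 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, area in coverage.items():
    print(f"within {k} stdev: {100 * area:.1f}%")
```

The multiplier 2 is a rounding of the more exact 1.96 that gives 95%.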

Interpretation:

If I repeated my experiment lots of times, each time on a different 30 students, then 95% of the time the CI I build would contain the true parameter = population proportion.
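This "lots of times" interpretation can be tried out by simulation. The sketch below assumes the true proportion really is .65, treats students as independent draws, and uses an arbitrary seed and 10,000 repetitions. The fraction of intervals that capture p should land in the neighborhood of the nominal 95%, though at n = 30 the normal approximation is rough, so it can come in somewhat lower.

```python
import random
from math import sqrt

random.seed(1)                    # arbitrary seed for repeatability
p_true, n, z = 0.65, 30, 2.0      # assumed true proportion, sample size, multiplier
num_experiments = 10_000

def interval_covers(p_true, n, z):
    """Draw one sample of n, build p-hat +/- z*SE, report whether it contains p_true."""
    successes = sum(random.random() < p_true for _ in range(n))
    p_hat = successes / n
    moe = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe <= p_true <= p_hat + moe

hits = sum(interval_covers(p_true, n, z) for _ in range(num_experiments))
coverage = hits / num_experiments

print(f"fraction of intervals containing p: {coverage:.3f}")
```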

Q. What if I wanted to be 84% sure?
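For a confidence level other than 95%, the 2 is replaced by the z value that leaves half of the leftover probability in each tail. A minimal sketch, with `z_multiplier` as a hypothetical helper name:

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """z such that the middle `confidence` of a standard normal lies within +/- z."""
    tail = (1 - confidence) / 2            # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)  # cut-off with that much area above it

for level in (0.84, 0.90, 0.95):
    print(f"{level:.0%} confidence -> z = {z_multiplier(level):.3f}")
```

So for 84% we would use about 1.4 standard errors instead of 2, which gives a narrower interval in exchange for less confidence.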

Let’s try this example: Build a 90% confidence interval for the proportion of AASU students who eat breakfast.

How many should I sample?

n = z*z*p*(1-p) / (MOE*MOE) (note: if p is unknown, use .5)
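The sample-size formula can be sketched as below. The notes do not say what margin of error is wanted for the breakfast question, so the MOE of .05 (5 percentage points) is purely an assumed value for illustration; p = .5 is the conservative choice from the note, and the result is rounded up since we cannot sample a fraction of a student.

```python
from math import ceil
from statistics import NormalDist

def sample_size(confidence, moe, p=0.5):
    """Smallest n with z*z*p*(1-p) / (moe*moe) <= n. Illustrative helper.

    p defaults to .5, the worst case, for when p is unknown.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z * z * p * (1 - p) / (moe * moe))

# 90% confidence with an ASSUMED margin of error of .05 and p unknown.
n_needed = sample_size(confidence=0.90, moe=0.05)
print(f"sample at least {n_needed} students")
```

Halving the margin of error roughly quadruples the required n, because MOE is squared in the denominator.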

Is it Binomial?

Can It be about normal?

How do I build a 90% confidence interval for a proportion = percentage from p-hat?

Class activity

Problem: A claim is that 65% of AASU students have ssn beginning with the number 2. I might guess then that 65% of the students in here do. I am going to “randomly” ask 30 of you.

1. What is the probability that exactly 2 of the 30 have 2 as 1st ssn? _________________

2. How many do you expect to have this (mean)? ___________________

3. And what is the variance associated with the number of 2ers? _________________
