Sampling Distributions &
Confidence Intervals for proportions
Problem: A claim is that 65% of AASU students have an SSN beginning with the number 2. I might guess, then, that 65% of the students in here do. I am going to “randomly” ask 30 of you. What is the probability that exactly 2 of the 30 have an SSN starting with 2? How many do you expect to have this (the mean)? And what is the variance of the number of “2ers”?
ANS.
Could this be Binomial?
(If drawing n items from a population of N without replacement, make sure n/N < .10. That is, sampling without replacement will not create massive dependence between trials if you select less than 10% of the population to inspect; each inspection=trial.)
Any problem with independence ? n/N
Exactly 2 of 30?
Mean?
Variance? Standard Deviation?
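The four questions above can be checked numerically. This is a minimal sketch, assuming the count of "2ers" is Binomial with n = 30 and p = .65 as in the problem statement; the helper name binom_pmf is just for illustration.

```python
from math import comb, sqrt

n = 30      # students asked
p = 0.65    # claimed proportion whose SSN starts with 2

def binom_pmf(k, n, p):
    """P(X = k) when X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

prob_exactly_2 = binom_pmf(2, n, p)   # exactly 2 of 30
mean = n * p                          # expected number of 2ers
variance = n * p * (1 - p)            # variance of the count
std_dev = sqrt(variance)              # standard deviation of the count

print(f"P(X = 2)  = {prob_exactly_2:.3e}")
print(f"mean      = {mean:.2f}")
print(f"variance  = {variance:.3f}")
print(f"std. dev. = {std_dev:.3f}")
```

The probability of exactly 2 comes out tiny, which makes sense: if 65% really start with 2, we expect about 19.5 of the 30, so seeing only 2 would be very surprising.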
Q. If I selected a different 30 students would I get the same answer? And then a different 30?
Use (# with 2)/30 as the sample’s proportion of those whose SSN begins with 2 (call it p-hat; note that it estimates the probability of finding such a person). This will vary depending on which 30 students I randomly select.
Note that p-hat is a mean=average, since it is the sum of the raw data (each 0 or 1) divided by the total number of data points. Note also that we theoretically know everything about the population (I said 65% had a 2 at the start of their SSN).
So, the CENTRAL LIMIT THEOREM applies
Check these criteria:
1. Taking a large sample (n)? “The larger the better.” n = 30 usually works.
2. Make sure each sample=item=trial=person is independent of the others.
Then my sample mean looks like the many other possible means from different samples of n items. It comes (nearly) from a normal (bell-shaped) distribution that centers on the entire population mean, with a spread (standard deviation=standard error) equal to the population stdev divided by the square root of the sample size n.
Notation:
$\bar{x}$ is $N(\mu,\ \sigma/\sqrt{n})$
What about p-hat ($\hat{p}$)?
Normal centered at p (population percentage=proportion) with spread of $\sqrt{p(1-p)/n}$
so long as n*p and n*(1-p) > 10
Finally, let’s use the CLT on our ssn problem:
It is modeled by a Binomial (because we noted
that the 30 we picked was less than 10% of
the AASU population).
p = .65 (65%), n = 30; note 30*.65 = 19.5 and 30*.35 = 10.5 are both > 10
so our sample proportion = _________
is it near .65? spread away by
$\sqrt{p(1-p)/n} = \sqrt{(.65)(.35)/30} = .087$ (stdev)
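The arithmetic for the SSN problem can be double-checked in a few lines. A small sketch, using only the p = .65 and n = 30 given in the problem; the variable names are illustrative.

```python
from math import sqrt

p, n = 0.65, 30

# Guideline from the notes: n*p and n*(1-p) should both exceed 10
# before treating p-hat as approximately normal.
conditions_ok = (n * p > 10) and (n * (1 - p) > 10)

# Spread (standard error) of p-hat: sqrt(p*(1-p)/n)
std_error = sqrt(p * (1 - p) / n)

print("normal approximation reasonable?", conditions_ok)
print("standard error of p-hat:", round(std_error, 3))
```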
Think normal distribution
We call these sampling distributions = distributions that would result from repeatedly calculating a mean from this sample, then a different one, and another…
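One way to see a sampling distribution is to let the computer do the "repeatedly" part. The sketch below is a simulation under assumptions of mine: 10,000 repeated samples, a fixed random seed so the run is repeatable, and each student treated as an independent trial with success probability .65.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)            # fixed seed (an arbitrary choice) for repeatability
p, n = 0.65, 30
num_samples = 10_000      # assumed number of repeated samples

def sample_proportion(n, p):
    """Ask n simulated students; return the fraction who are 2ers."""
    return sum(random.random() < p for _ in range(n)) / n

p_hats = [sample_proportion(n, p) for _ in range(num_samples)]

print("average of the p-hats:", round(mean(p_hats), 3))
print("stdev of the p-hats:  ", round(pstdev(p_hats), 3))
print("theory sqrt(p(1-p)/n):", round(sqrt(p * (1 - p) / n), 3))
```

The average of the simulated p-hats should sit near .65 and their standard deviation near .087, which is what the normal model for p-hat predicts.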
Q. Practically speaking, why would we want to discern something about a sample when we know all about the population? It seems we would rather sample and then “guess” about the population.
EX. I sampled n=30 here and found 24 who had an SSN starting with 2. What does that say about all AASUers?
THEOREM (FACT):
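To estimate the population proportion (i.e. %) do the following: sample n items and form X/n,
where X = # of successes. Call this p-hat ($\hat{p}$) and
use
$\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}$
(the half width, $2\sqrt{\hat{p}(1-\hat{p})/n}$, is called the margin of error).
DEFN. MOE=margin of error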
We say “I am 95% sure that the true=population p (proportion, i.e. percentage) is contained in this interval=range.”
These are called Confidence Intervals.
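Here is the interval for the example of 24 out of 30, worked as a short sketch. It uses the "2 standard errors" rule from the theorem; the variable names are mine.

```python
from math import sqrt

x, n = 24, 30                      # 24 of the 30 sampled had an SSN starting with 2
p_hat = x / n                      # sample proportion
std_error = sqrt(p_hat * (1 - p_hat) / n)
moe = 2 * std_error                # margin of error (half width)

lower, upper = p_hat - moe, p_hat + moe
print(f"p-hat = {p_hat:.3f}, MOE = {moe:.3f}")
print(f"approx. 95% CI: ({lower:.3f}, {upper:.3f})")
```

One caution: this sample has only 6 "failures", short of the more-than-10 guideline, so the normal approximation behind the interval is rough here.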
Why 2?
Mean ± 1*stdev covers about 68%
Mean ± 2*stdev covers about 95%
Mean ± 3*stdev covers about 99.7%
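These three percentages can be recovered from the standard normal curve itself. A minimal sketch using Python's standard-library NormalDist; the dictionary name coverages is just for illustration.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, stdev 1

# Area under the bell curve within k standard deviations of the mean.
coverages = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in coverages.items():
    print(f"within {k} stdev: {area:.4f}")
```

So "2" is a convenient rounding: the exact multiplier for 95% is a little under 2.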
Interpretation:
If I repeated my experiment lots of times, each time on 30 different students, then 95% of the time the CI I build will contain the true parameter=population proportion.
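This interpretation can be tried out by simulation. The sketch below assumes the true proportion really is .65, repeats the 30-student experiment 5,000 times (my choice), and counts how often the interval p-hat ± 2*sqrt(p-hat(1-p-hat)/n) captures .65.

```python
import random
from math import sqrt

random.seed(1)          # fixed seed (an arbitrary choice) for repeatability
p_true, n = 0.65, 30
repeats = 5_000         # assumed number of repeated experiments

def interval_covers(p_true, n):
    """Draw one sample of n, build the 2-standard-error CI, report whether it contains p_true."""
    x = sum(random.random() < p_true for _ in range(n))
    p_hat = x / n
    moe = 2 * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe <= p_true <= p_hat + moe

coverage = sum(interval_covers(p_true, n) for _ in range(repeats)) / repeats
print(f"fraction of intervals containing p: {coverage:.3f}")
```

Expect a fraction in the neighborhood of 95%; with n as small as 30 it can come out somewhat lower, since the normal model is only approximate.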
Q. What if I wanted to be 84% sure ?
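For a confidence level other than 95%, the "2" gets replaced by whatever multiplier z leaves the desired area in the middle of the normal curve. A small sketch, with z_for_confidence as a hypothetical helper name:

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Multiplier z so that mean ± z*stdev covers `level` of a normal curve."""
    # The middle `level` leaves (1 - level)/2 in each tail.
    return NormalDist().inv_cdf(0.5 + level / 2)

z_84 = z_for_confidence(0.84)
z_90 = z_for_confidence(0.90)
z_95 = z_for_confidence(0.95)
print(f"84%: z = {z_84:.3f}")
print(f"90%: z = {z_90:.3f}")
print(f"95%: z = {z_95:.3f}")
```

For 84% the multiplier is about 1.4, so the interval would be p-hat ± 1.4 standard errors: narrower than the 95% interval, but we are less sure it contains p.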
Let’s try this example: Build a 90% confidence interval for the proportion of AASU students who eat breakfast.
How many should I sample?
$n = \dfrac{z^2\, p(1-p)}{\mathrm{MOE}^2}$ (note: if p is unknown use .5)
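A sketch of the sample-size formula in use. The notes do not give a target margin of error for the breakfast example, so the MOE of .05 below is purely an assumed value for illustration, and required_sample_size is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(confidence, moe, p=0.5):
    """n = z^2 * p * (1 - p) / MOE^2, rounded up; p defaults to .5 when unknown."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(z * z * p * (1 - p) / moe**2)

# Assumed target: 90% confidence, margin of error .05, p unknown (use .5)
n_needed = required_sample_size(0.90, 0.05)
print("students to sample:", n_needed)
```

Always round up: a fractional student cannot be sampled, and rounding down would give a margin of error slightly larger than the target.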
Is it Binomial?
Can it be about normal?
How do I build a 90% confidence interval for a proportion=percentage from p-hat?
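A sketch of the recipe for any confidence level. No breakfast data appear in these notes, so the numbers below reuse the earlier SSN sample (24 of 30) purely as stand-in input; proportion_ci is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Normal-approximation CI for a proportion: p-hat ± z*sqrt(p-hat(1-p-hat)/n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.645 for 90%
    moe = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

# Stand-in data: 24 successes out of 30 (from the SSN example, not breakfast data)
lower, upper = proportion_ci(24, 30, 0.90)
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```

Before trusting the result, run the same checks as before: is the count Binomial (sample under 10% of the population, independent trials), and are there more than 10 successes and more than 10 failures?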
Class activity
Problem: A claim is that 65% of AASU students have an SSN beginning with the number 2. I might guess, then, that 65% of the students in here do. I am going to “randomly” ask 30 of you.
1. What is the probability that exactly 2 of the 30 have an SSN starting with 2? _________________
2. How many do you expect to have this (the mean)? ___________________
3. And what is the variance of the number of “2ers”? _________________