AP STATISTICS Ch. 12: Sample Surveys

Vocabulary:

POPULATION- All experimental units that you want to make a conclusion about

· does not necessarily have to be a large group

SAMPLING FRAME- list of individuals from whom the sample is drawn. Not always the population of interest.

Ex: phone book, registered voter list, list of tax returns, school roster, etc.

SAMPLE – small group of the population that you use in your experiment/study/survey.

· Hopefully representative of the population

PARAMETER- describes a population. Often unknown. Fixed value.

Ex: population mean

STATISTIC – describes a sample of the population. Changes from sample to sample. We use the statistics from repeated samples to estimate the value of the parameter.

Ex: sample mean

VALUE         PARAMETER    STATISTIC

Mean          μ            x̄ (x-bar)

Std. Dev.     σ            s

Proportion    p            p̂ (p-hat)

A sample is said to be representative (or unbiased) if the statistics accurately reflect the population parameter.
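A quick way to see the parameter/statistic difference is to build a made-up population and sample from it. This is only a sketch: the population of 10,000 test scores, the sample size of 50, and the seed are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(12)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 made-up test scores.
population = [rng.gauss(70, 10) for _ in range(10_000)]

# Parameters describe the whole population (fixed, usually unknown in practice).
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Statistics describe one sample and change from sample to sample.
sample = rng.sample(population, 50)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

print(f"parameter mu    = {mu:.2f}   statistic x-bar = {x_bar:.2f}")
print(f"parameter sigma = {sigma:.2f}   statistic s     = {s:.2f}")
```

Changing the seed changes x-bar and s (and, because the population is regenerated, the parameters too). To see the statistics move while the parameters stay fixed, draw a second sample from the same population.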

EXAMPLE 1:

A polling agency takes a sample of 1500 American citizens from a list of tax returns and asks them if they are lactose intolerant. 12% say yes. This is interesting, since it has been shown that 15% of the population is lactose intolerant.

12% = statistic          15% = parameter

Population? Sampling frame? Sample?

Population: All American citizens

Sampling Frame: List of tax returns

Sample: 1500 American citizens

Parameter of Interest: The percent of American citizens that are lactose intolerant

EXAMPLE 2:

A random sample of 1000 people who signed a card saying they intended to quit smoking were contacted a year after they signed the card. It turned out that 210 (21%) of the sampled individuals had not smoked over the past six months.

21% = statistic

Population = All smokers

Sampling frame= People who signed a card saying they intended to quit smoking

Sample = 1000 people who signed the card

Parameter of interest = Percent of people who had not smoked over the past six months (quit smoking)

EXAMPLE 3:

On Tuesday, the bottles of tomato ketchup filled in a plant were supposed to contain an average of 14 ounces of ketchup. Quality control inspectors sampled 50 bottles at random from the day’s production. These bottles contained an average of 13.8 ounces of ketchup.

14 = parameter          13.8 = statistic

Population? Sample? Sampling frame?

Population: All ketchup bottles

Sampling Frame: Ketchup bottles from one day’s production

Sample: 50 bottles

Parameter: The average number of ounces of ketchup in the bottles

EXAMPLE 4:

A researcher wants to find out which of two pain relievers works better. He takes 100 volunteers and randomly gives half of them medicine #1 and the other half medicine #2. 17% of people taking medicine 1 report improvement in their pain and 20% of people taking medicine #2 report improvement in their pain.

17% = statistic          20% = statistic

Population? Sampling frame? Sample?

Population: All people

Sampling Frame: All Volunteers

Sample: 100 volunteers, 50 taking medicine #1 and 50 taking medicine #2

Parameter: Percent of people who showed improvement in their pain

BIAS VS. VARIABILITY

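BIAS – the statistic consistently misses the population parameter in the same direction across repeated samples. Basically a lack of accuracy.

VARIABILITY - how spread out the statistic's values are from sample to sample. Basically a lack of reliability: low variability means consistent measurements (doesn’t matter if they are accurate or not).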

· To reduce bias… use random sampling

· To reduce variability … use larger samples

SAMPLING VARIABILITY –

· Different samples give different results (even if they are from the same population)

· Different size samples give us different results

· Bigger samples are better!
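The bullets above can be checked with a small simulation. This is a sketch under stated assumptions: the true proportion is set to 0.15 (the 15% from Example 1), the sample sizes 100 and 1000 are chosen for comparison, and the helper names are made up for illustration.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is repeatable
TRUE_P = 0.15  # assumed true parameter (the 15% from Example 1)

def sample_proportion(n):
    """Draw one simulated sample of size n and return its sample proportion."""
    yes_count = sum(1 for _ in range(n) if rng.random() < TRUE_P)
    return yes_count / n

def spread(n, repeats=500):
    """Standard deviation of the sample proportion over many samples of size n."""
    results = [sample_proportion(n) for _ in range(repeats)]
    return statistics.stdev(results)

spread_100 = spread(100)
spread_1000 = spread(1000)
print(f"n = 100:  spread of sample proportions = {spread_100:.4f}")
print(f"n = 1000: spread of sample proportions = {spread_1000:.4f}")
```

Both sample sizes give proportions that bounce around 0.15, but the size-1000 samples bounce much less. That is what "bigger samples are better" means here: less variability, not less bias.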

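SAMPLING DISTRIBUTION- The distribution of the values a statistic takes over all possible samples of the same size from the same population. To picture it, take lots of samples of the same size and make a histogram of the statistic from each sample.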

[Figure: histogram of a sampling distribution with the true parameter marked]

Larger samples yield smaller variability:

[Figure: two histograms, one from lots of samples of size 100 and one from lots of samples of size 1000, each with the true parameter marked]

Bias vs. Variability:

[Figure: four histograms, each with the true parameter marked: high bias, low variability; low bias, high variability; high bias, high variability; low bias, low variability]

BIAS VS. VARIABILITY EXAMPLES:

* bias → accuracy

* variability → reliability

[Figure: four panels labeled high bias, low variability; low bias, high variability; high bias, high variability; low bias, low variability]

UNBIASED ESTIMATOR- When the center of a sampling distribution (histogram) is equal to the true parameter
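To see what "the center equals the true parameter" looks like, compare an SRS with a deliberately biased method. This is a sketch only: the population of 5,000 made-up values and the "sample only from the top half" method are assumptions invented for the illustration.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of 5,000 values between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(5_000)]
true_mean = statistics.fmean(population)

# A deliberately biased method: only the 2,500 largest values can be chosen.
top_half = sorted(population)[2_500:]

def srs_mean(n):
    """Sample mean from a simple random sample of the whole population."""
    return statistics.fmean(rng.sample(population, n))

def biased_mean(n):
    """Sample mean when the sample can only come from the top half."""
    return statistics.fmean(rng.sample(top_half, n))

# Center of each simulated sampling distribution (1,000 samples of size 30).
srs_center = statistics.fmean([srs_mean(30) for _ in range(1_000)])
biased_center = statistics.fmean([biased_mean(30) for _ in range(1_000)])

print(f"true mean            = {true_mean:.1f}")
print(f"center, SRS          = {srs_center:.1f}")
print(f"center, biased method = {biased_center:.1f}")
```

The SRS center lands very close to the true mean, so the sample mean from an SRS behaves like an unbiased estimator here. The biased method is centered well above the true mean no matter how many samples are taken.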

SAMPLING DESIGNS

GOOD SAMPLING DESIGNS

1) Simple Random Sample (SRS)- Every experimental unit has the same chance of being picked for the sample and every possible sample has the same chance of being picked.

- give every subject in the population a number

- use a table of random digits (TRD) and read across to select your sample

- ignore repeats

Example: Take an SRS of 5 from the following list. Start at line 31 in the table.

01 Smith       07 Jones       13 Holloway

02 DeNizzo     08 David       14 Adams

03 Schaefer    09 Gray        15 Capito

04 Meyers      10 Gingrich    16 Card

05 Dietrich    11 Moreland    17 Hall

06 Walsh       12 Whitter     18 Jordan

Sample:

Example: Take an SRS of 4 from the following list of math teachers. Start at line 18.

01 McGlone     07 McCuen        13 Wilson

02 Szarko      08 Bellavance    14 Woodring

03 Stotler     09 Kelly         15 Wheeles

04 Timmins     10 Arden         16 McNelis

05 Gemgnani    11 O’Brien       17 Robinson

06 Lorenz      12 Lake          18 Bainbridge

Sample:
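The table-of-random-digits procedure for an SRS (number everyone, read across in two-digit chunks, skip labels that are out of range, ignore repeats) can be sketched in code. The digit row below is made up for illustration; it is NOT line 31 or line 18 of any real table, so the result is not the answer to either example. The function name is also hypothetical.

```python
def srs_from_digits(digit_row, population_size, sample_size):
    """Read two-digit labels left to right; skip out-of-range labels and repeats."""
    digits = digit_row.replace(" ", "")
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

# Names from the first example list, in label order 01 to 18.
names = ["Smith", "DeNizzo", "Schaefer", "Meyers", "Dietrich", "Walsh",
         "Jones", "David", "Gray", "Gingrich", "Moreland", "Whitter",
         "Holloway", "Adams", "Capito", "Card", "Hall", "Jordan"]

# Made-up row of random digits (an assumption, not taken from a real table).
row = "07511 86094 07032 71299 15"

labels = srs_from_digits(row, population_size=18, sample_size=5)
sample = [names[label - 1] for label in labels]
print(labels)  # [7, 18, 3, 12, 15]
print(sample)  # ['Jones', 'Jordan', 'Schaefer', 'Whitter', 'Capito']
```

Reading the row as 07, 51, 18, 60, 94, 07, 03, 27, 12, 99, 15: the labels 51, 60, 94, 27, and 99 are skipped because nobody has those numbers, and the second 07 is ignored as a repeat.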


2) STRATIFIED RANDOM SAMPLE (not SRS)-

· Divide population into groups with something in common (called STRATA)

o Example: gender, age, etc.

· Take a separate SRS in each stratum and combine these to make the full sample

o Can sometimes be a % of each stratum

Example: We want to take an accurate sample of CB South students. There are 540 sophomores, 585 juniors, and 530 seniors. Take a stratified sample.
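One possible way to carry out this example is to take the same percent of each grade. The notes do not give a sampling rate, so the 20% used here is an assumption, and the student IDs are made up.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

strata_sizes = {"sophomores": 540, "juniors": 585, "seniors": 530}
PERCENT = 20  # assumed sampling rate; not specified in the example

sample_by_grade = {}
for grade, size in strata_sizes.items():
    roster = [f"{grade}-{i}" for i in range(1, size + 1)]  # hypothetical IDs
    n = size * PERCENT // 100          # same percent from every stratum
    sample_by_grade[grade] = rng.sample(roster, n)  # separate SRS per stratum

# Combine the separate SRSs to make the full sample.
full_sample = [student for group in sample_by_grade.values() for student in group]

for grade, group in sample_by_grade.items():
    print(grade, len(group))           # 108, 117, 106
print("total", len(full_sample))       # 331
```

Taking 20% of each stratum gives 108 sophomores, 117 juniors, and 106 seniors, so each grade makes up the same share of the sample as it does of the school.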

3) SYSTEMATIC RANDOM SAMPLE-

The first experimental unit is selected at random, and each additional experimental unit is selected at a predetermined interval.

Examples: Surveying every 5th person that comes through the back door at CB South

Selecting a random person and then every 10th person on a list
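The second example (a random start, then every 10th person on a list) can be sketched as follows. The list of 100 people and the function name are assumptions for illustration.

```python
import random

def systematic_sample(items, interval, rng):
    """Pick a random start among the first `interval` items, then every interval-th item."""
    start = rng.randrange(interval)
    return items[start::interval]

rng = random.Random(5)  # fixed seed so the run is repeatable
roster = list(range(1, 101))  # hypothetical list of 100 people, numbered 1 to 100

picked = systematic_sample(roster, 10, rng)
print(picked)
```

Only the starting point is random. Once it is chosen, the rest of the sample is fixed by the interval.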

4) CLUSTER SAMPLE –

Population is broken down into groups. All members of one (or more) group are taken as the sample.

Example: The CB South population is broken down into sophomores, juniors, and seniors. We then sample ALL of the members in one grade level.
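A minimal sketch of the cluster idea, reusing the grade sizes from the stratified example (540, 585, 530) and made-up student IDs: one whole group is chosen at random and everyone in it is sampled.

```python
import random

rng = random.Random(11)  # fixed seed so the run is repeatable

grade_sizes = {"sophomores": 540, "juniors": 585, "seniors": 530}

# Each grade is one cluster of hypothetical student IDs.
clusters = {
    grade: [f"{grade}-{i}" for i in range(1, size + 1)]
    for grade, size in grade_sizes.items()
}

chosen_grade = rng.choice(sorted(clusters))  # randomly pick ONE cluster
sample = clusters[chosen_grade]              # take ALL members of that cluster

print(chosen_grade, len(sample))
```

Compare this with the stratified design: there, every group contributes some members. Here, one group contributes all of its members and the other groups contribute none.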

5) MULTI-STAGE SAMPLE-

· Used for large populations

· Example: sampling the population of the USA:

- Counties (pick 50)

- zip codes

- two streets

- 3 houses

CB South Example:

- Block

- 10 classes

- 5 students
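The CB South example can be sketched as three random stages: pick a block, pick 10 classes in that block, then pick 5 students from each class. The school layout below (4 blocks, 40 classes per block, 25 students per class) is an assumption made up so the code has something to sample from.

```python
import random

rng = random.Random(8)  # fixed seed so the run is repeatable

# Hypothetical schedule: 4 blocks, 40 classes per block, 25 students per class.
school = {
    f"block{b}": {
        f"b{b}-class{c}": [f"b{b}-c{c}-student{s}" for s in range(1, 26)]
        for c in range(1, 41)
    }
    for b in range(1, 5)
}

# Stage 1: choose one block at random.
block = rng.choice(sorted(school))

# Stage 2: choose 10 classes at random within that block.
classes = rng.sample(sorted(school[block]), 10)

# Stage 3: choose 5 students at random within each chosen class.
sample = [student
          for cls in classes
          for student in rng.sample(school[block][cls], 5)]

print(block, len(classes), len(sample))  # one block, 10 classes, 50 students
```

Each stage only needs a list for the units chosen at the stage before it, which is why this design is practical for large populations.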


BIASED SAMPLING METHODS:

1) VOLUNTARY RESPONSE SAMPLES-

The sample chooses itself by responding to a general appeal. Ex: call-in, write-in, etc.

2) CONVENIENCE SAMPLES-

Selecting individuals that are easiest to reach/contact
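A small simulation shows why a voluntary response sample can be badly off. Everything here is an assumption for illustration: a population where 20% oppose a policy, and made-up response rates where opponents call in far more often than everyone else.

```python
import random

rng = random.Random(21)  # fixed seed so the run is repeatable

# Hypothetical population: 20% oppose a policy, 80% are fine with it.
population = ["oppose"] * 2_000 + ["fine"] * 8_000
true_p = population.count("oppose") / len(population)

# Assumed call-in rates: people with a strong opinion respond much more often.
RESPONSE_RATE = {"oppose": 0.30, "fine": 0.02}
callers = [person for person in population
           if rng.random() < RESPONSE_RATE[person]]
voluntary_p = callers.count("oppose") / len(callers)

# For comparison: a simple random sample of 500 from the same population.
srs = rng.sample(population, 500)
srs_p = srs.count("oppose") / len(srs)

print(f"true proportion opposed       = {true_p:.2f}")
print(f"call-in poll ({len(callers)} callers)  = {voluntary_p:.2f}")
print(f"SRS of 500                    = {srs_p:.2f}")
```

Under these assumed rates the call-in poll reports that a large majority is opposed even though only 20% of the population is, while the SRS stays close to 20%.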

TYPES OF BIAS IN A SAMPLE:

· UNDERCOVERAGE-

Sampling in a way that leaves out a certain portion of the population that should be in your sample.

Ex: telephone polls, registered voter list, etc.

· NONRESPONSE-

Bias introduced when a large fraction of those sampled do not respond. (You were selected, and you chose not to respond)

Ex: Don’t answer the phone or hang up, don’t mail back a questionnaire, refuse to answer questions

· RESPONSE BIAS -

Anything in the survey design that influences responses

Ex: respondents lying

responses to try to please the interviewer

unwillingness to reveal personal facts or information

leading or confusing questions

· VOLUNTARY RESPONSE BIAS-

People with a strong opinion on the matter are the most eager to volunteer, so they are overrepresented in the sample
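Undercoverage from the list above is a good one to simulate, because it shows that a bigger sample does not fix bias. This is a sketch with made-up numbers: a town where 7,000 households are in the phone book, 3,000 are not, and the unlisted households tend to have lower values of whatever is being measured.

```python
import random
from statistics import fmean

rng = random.Random(4)  # fixed seed so the run is repeatable

# Hypothetical town: listed households average about 60, unlisted about 40.
listed = [rng.gauss(60, 10) for _ in range(7_000)]
unlisted = [rng.gauss(40, 10) for _ in range(3_000)]
population = listed + unlisted
true_mean = fmean(population)

def center_of_estimates(n, repeats=200):
    """Average sample mean when the sampling frame is the phone book only."""
    return fmean([fmean(rng.sample(listed, n)) for _ in range(repeats)])

small = center_of_estimates(50)
large = center_of_estimates(1_000)

print(f"true mean                  = {true_mean:.1f}")
print(f"center with samples of 50   = {small:.1f}")
print(f"center with samples of 1000 = {large:.1f}")
```

Both sample sizes are centered near 60 instead of the true mean of about 54. The larger samples are less variable, but they are just as biased, because the unlisted households can never be chosen.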