Ch 3. Producing Data

* Reading assignment: Sections 3.1-3.2 of the textbook.

3.3. Sampling Design

(ex1) A political scientist wants to know what percent of the voting-age population consider themselves conservatives.

(ex2) An automaker hires a market research firm to learn what percent of adults aged 18 to 35 recall seeing television advertisements for a new sport utility vehicle.

(ex3) Government economists inquire about average monthly living expenses.

Time, cost, and inconvenience usually make it impossible to contact every individual.

In such cases, we gather information about only part of the group (a sample) in order to draw conclusions about the whole (the population).

Population and Sample

Population: the entire group of individuals that we want information about

; Notice that “population” is defined in terms of our desire for knowledge.

Sample: a part (subset) of the population that we actually examine in order to gather information.

(ex4) A political scientist wants to know how college students in the U.S. feel about the Social Security system. She obtains a list of the 3456 undergraduates at her college and mails a questionnaire to 250 students selected at random. What are the population and the sample in this study?

Voluntary Response Sample

A voluntary response sample consists of people who choose themselves by responding to a general appeal.

Voluntary response samples are biased because people with strong opinions, especially negative opinions, are most likely to respond.
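The size of this bias can be illustrated with a small simulation. The sketch below is only an illustration: the population size, the 30% share of strongly negative opinions, and the two response rates are made-up assumptions, not figures from the text.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000: True = holds a strongly negative opinion (30%).
population = [i < 3000 for i in range(10_000)]

# Assumed response rates to a general appeal: 50% for people with a strongly
# negative opinion, 5% for everyone else.
volunteers = [neg for neg in population if rng.random() < (0.50 if neg else 0.05)]

# For comparison, a simple random sample of 500 individuals.
srs = rng.sample(population, 500)

true_share = sum(population) / len(population)
volunteer_share = sum(volunteers) / len(volunteers)
srs_share = sum(srs) / len(srs)
print(true_share, round(volunteer_share, 2), round(srs_share, 2))
```

Under these assumptions the volunteers greatly overstate the share of negative opinions, while the random sample stays close to the true 30%.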

(ex4, continued) In the above example, suppose that 10 students with strongly negative opinions about the Social Security system volunteered to take part in the research. Then these 10 students form a voluntary response sample.

Sampling frame: the list of individuals from which a sample is actually selected.

(ex4, continued) In the above example, what is the sampling frame?

The design of a sample refers to the method used to choose the sample from the population.

•Observational Study: observes individuals and measures variables of interest but does not attempt to influence the responses

(ex5) A sample survey, e.g. recording the ages of the professors on campus

•Experiment: deliberately imposes some treatment on individuals in order to observe their responses

(ex6) Give subjects a drug meant to slow the heart rate, then measure the change

Simple Random Sample

; A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.
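The phrase "every set of n individuals has an equal chance" can be checked on a tiny case. The sketch below assumes a hypothetical population of five labels and n = 2, lists all possible samples, and then counts how often Python's random.sample picks each one.

```python
import random
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D", "E"]  # hypothetical labels
n = 2

# Every possible sample of size n: C(5, 2) = 10 sets.
all_samples = list(combinations(population, n))
assert len(all_samples) == comb(len(population), n)

rng = random.Random(0)  # fixed seed so the run is repeatable
counts = {s: 0 for s in all_samples}
draws = 20_000
for _ in range(draws):
    s = tuple(sorted(rng.sample(population, n)))  # order within a sample is irrelevant
    counts[s] += 1

# Each of the 10 sets should turn up in roughly 1/10 of the draws.
shares = {s: c / draws for s, c in counts.items()}
for s, share in shares.items():
    print(s, round(share, 3))
```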

=> Sometimes we use a table of random digits (Table B) to choose an SRS.

(Example 3.16, p. 250)

An academic department wishes to choose a three-member advisory committee at random from the members of the department.

To choose an SRS of size 3 from the 28 faculty members listed below, first label the members of the population as follows.

00 Abbott … 04 Engle … 11 Luo 12 Martinez 13 Nguyen 14 Pillotte … 26 Wilson 27 Wong

Start anywhere in Table B. Suppose we begin at line 140, which is 12975 13258 13048 … The first 7 two-digit groups are 12 97 51 32 58 13 04

Ignore the labels 97, 51, 32, and 58 (Why?)

Then the committee consists of the faculty members labeled 12, 13, and 04 (Martinez, Nguyen, and Engle).
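The table-reading procedure can be written out as a short sketch. It assumes the 28 faculty members carry the labels 00 through 27 and uses only the digits of line 140 quoted above; the function name srs_from_digits is made up for this illustration.

```python
def srs_from_digits(digits, max_label, size, width=2):
    """Read fixed-width groups from a string of random digits and keep the
    first `size` distinct labels in the range 0..max_label."""
    stream = "".join(ch for ch in digits if ch.isdigit())  # drop the spaces
    chosen = []
    for i in range(0, len(stream) - width + 1, width):
        label = int(stream[i:i + width])
        # Skip groups that are nobody's label, and labels already chosen.
        if label <= max_label and label not in chosen:
            chosen.append(label)
        if len(chosen) == size:
            break
    return chosen

# Line 140 of Table B, limited to the digits quoted in the example.
line_140 = "12975 13258 13048"
committee = srs_from_digits(line_140, max_label=27, size=3)
print(committee)  # [12, 13, 4]
```

The groups 97, 51, 32, and 58 are skipped because no faculty member carries those labels.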

Stratified Random Sample

; To select a stratified random sample, first divide the population into groups of similar individuals, called strata. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample.

(ex3, continued) Treat each of the 50 U.S. states as a stratum and take an SRS from each state. This way, every state is represented in the sample.
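A minimal sketch of the idea, assuming three made-up strata in place of the 50 states and the same sample size in every stratum (the definition does not require equal sizes):

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a separate SRS of `per_stratum` individuals from each stratum
    and combine them into one sample of (stratum, individual) pairs."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        for person in rng.sample(members, per_stratum):
            sample.append((name, person))
    return sample

# Hypothetical strata standing in for states; names and sizes are made up.
strata = {
    "State A": [f"A{i}" for i in range(500)],
    "State B": [f"B{i}" for i in range(80)],
    "State C": [f"C{i}" for i in range(2000)],
}
sample = stratified_sample(strata, per_stratum=10, seed=0)
per_state = {name: sum(1 for s, _ in sample if s == name) for name in strata}
print(per_state)  # {'State A': 10, 'State B': 10, 'State C': 10}
```

Even the smallest stratum is guaranteed its 10 individuals, which a single SRS from the pooled population would not guarantee.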

Multistage Samples

; Select successively smaller groups within the population in stages, resulting in a sample consisting of clusters of individuals. Each stage may employ an SRS, a stratified sample, or another type of sample.
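A two-stage version can be sketched as follows, assuming an SRS at both stages. The counties and households are hypothetical, and two_stage_sample is a name invented for this illustration.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: take an SRS of clusters. Stage 2: take an SRS of individuals
    inside each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

# Hypothetical population: 20 counties with 50 households each.
clusters = {
    f"county{k}": [f"county{k}-household{i}" for i in range(50)]
    for k in range(20)
}
sample = two_stage_sample(clusters, n_clusters=4, n_per_cluster=5, seed=0)
for county, households in sample.items():
    print(county, households)
```

Only the 4 selected counties need to be visited, which is the practical appeal of sampling in stages.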

Caution about sample surveys

; Random selection and the above sampling methods (SRS, stratified sampling, and multistage sampling) are statistical efforts to eliminate (or reduce) bias in the choice of a sample from a list of the population.

Serious sources of bias in most sample surveys are undercoverage and nonresponse.

Undercoverage occurs when some groups in the population are left out of the process of choosing the sample.

Nonresponse occurs when an individual chosen for the sample can’t be contacted or does not cooperate.

3.4. Toward Statistical Inference

Parameter and Statistic

Parameter: a number that describes the population

  1. It is not a variable but a fixed number (a constant).
  2. We don’t know its value in practice.

Statistic: a number that describes a sample.

  1. The value of a statistic is known when we have taken a sample.
  2. But, it can change from sample to sample.

Sampling variability: The value of a statistic varies in repeated random sampling.

  1. We use a statistic to estimate an unknown parameter.

NOTE: Population: the entire group of individuals that we want information about

; Notice that “population” is defined in terms of our desire for knowledge.

Sample: a part (subset) of the population that we actually examine in order to gather information.

Sampling distribution

; The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
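For a very small population the definition can be carried out literally by listing all possible samples. The sketch below assumes a hypothetical population of six values and samples of size 2, with the sample mean as the statistic.

```python
from collections import Counter
from itertools import combinations

population = [2, 4, 6, 8, 10, 12]  # hypothetical values; population mean is 7
n = 2

# The statistic (here the sample mean) computed on every possible sample of size n.
sample_means = [sum(s) / n for s in combinations(population, n)]

# The sampling distribution: each possible value and how many samples give it.
distribution = Counter(sample_means)
for value in sorted(distribution):
    print(value, distribution[value])

population_mean = sum(population) / len(population)
mean_of_sample_means = sum(sample_means) / len(sample_means)
print(population_mean, mean_of_sample_means)  # 7.0 7.0
```

The 15 possible sample means range from 3 to 11 and are symmetric about 7, the population mean.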

NOTE: The three features most typically used to describe virtually any (population or sampling) distribution are:

  1. Shape: symmetric, or skewed to the right (left)
  2. Center
  3. Spread

NOTE: We already know which measure of center (and spread) should be used according to the shape of the distribution.

Bias and Variability

Bias: concerns the center of the sampling distribution.

Unbiased: A statistic used to estimate a parameter is called “unbiased” if the mean of its sampling distribution is equal to the true value of the parameter being estimated.

Variability (of a statistic): described by the spread of its sampling distribution.

=> Statistics from larger samples have smaller spreads (Why?)
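One way to see this is by simulation. The sketch below assumes a hypothetical population in which 60% of individuals answer "yes", large enough that each draw can be treated as independent. It compares the spread of the sample proportion for n = 100 and n = 2500. For a sample proportion the standard deviation is approximately sqrt(p(1 - p)/n), so the larger sample should have about one fifth of the spread.

```python
import random
from statistics import stdev

rng = random.Random(2)  # fixed seed so the run is repeatable
p = 0.6                 # assumed population proportion (made up)

def sample_proportion(n):
    """Proportion of 'yes' answers in one random sample of size n."""
    return sum(rng.random() < p for _ in range(n)) / n

def spread(n, reps=500):
    """Standard deviation of the sample proportion over `reps` repeated samples."""
    return stdev(sample_proportion(n) for _ in range(reps))

sd_small = spread(100)
sd_large = spread(2500)
print(round(sd_small, 4), round(sd_large, 4))
```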

(example) Figure 3.9 on p. 266

; Think of the true value of the population parameter as the bull’s-eye on a target and the sample statistic as an arrow fired at the bull’s-eye. Then, bias and variability describe what happens when an archer fires many arrows at the target.

Bias means that the aim is off, and the arrows land consistently off the bull’s-eye in the same direction. (The sample values do not center about the population parameter.)

Large variability means that repeated shots are widely scattered on the target. (Repeated samples do not give similar results but differ widely among themselves.)

Possible combinations of bias and variability

High bias and low variability,

Low bias and high variability,

High bias and high variability,

Low bias and low variability (ideal situation)
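The four combinations can be reproduced in a simulation. The sketch below is only an illustration under made-up assumptions: a population of 10,000 with 40% "yes" answers, and an incomplete sampling frame that leaves out half of the "yes" group (undercoverage). Bias is estimated as the average of the sample proportions minus the true proportion, and variability as their standard deviation.

```python
import random
from statistics import stdev

rng = random.Random(3)  # fixed seed so the run is repeatable

population = [1] * 4000 + [0] * 6000        # hypothetical: 40% "yes"
incomplete_frame = [1] * 2000 + [0] * 6000  # assumed undercoverage of "yes"
true_p = sum(population) / len(population)

def bias_and_spread(frame, n, reps=300):
    """Estimated bias and standard deviation of the sample proportion
    when an SRS of size n is drawn from `frame`."""
    p_hats = [sum(rng.sample(frame, n)) / n for _ in range(reps)]
    return sum(p_hats) / reps - true_p, stdev(p_hats)

results = {
    "high bias, low variability": bias_and_spread(incomplete_frame, 1000),
    "low bias, high variability": bias_and_spread(population, 25),
    "high bias, high variability": bias_and_spread(incomplete_frame, 25),
    "low bias, low variability": bias_and_spread(population, 1000),
}
for label, (bias, sd) in results.items():
    print(f"{label}: bias {bias:+.3f}, spread {sd:.3f}")
```

Note that the large sample from the incomplete frame is precise but still wrong: a larger sample reduces variability, not bias.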

A good sampling scheme must have both small bias and small variability. How do we achieve this?

: 1. To reduce bias, use random sampling.

An SRS generally produces unbiased estimates.

2. To reduce the variability of a statistic from an SRS, use a larger sample.

You can make the variability as small as you want by taking a large enough sample.