Section 2.7: Sampling

Honors Analysis

In many cases, it is impossible to collect data about every member of a population. Sampling allows statistics to be calculated describing the population as a whole by collecting data about a subset of that population.

Data Collection Methods

To derive conclusions from data, we need to know how the data were collected; that is, we need to know the method(s) of data collection.

Methods of Data Collection

There are four main methods of data collection.

Census. A census is a study that obtains data from every member of a population. In most studies, a census is not practical, because of the cost and/or time required.

Sample survey. A sample survey is a study that obtains data from a subset of a population, in order to estimate population attributes.

Experiment. An experiment is a controlled study in which the researcher attempts to understand cause-and-effect relationships. The study is "controlled" in the sense that the researcher controls (1) how subjects are assigned to groups and (2) which treatments each group receives.
In the analysis phase, the researcher compares group scores on some dependent variable. Based on the analysis, the researcher draws a conclusion about whether the treatment (independent variable) had a causal effect on the dependent variable.

Observational study. Like experiments, observational studies attempt to understand cause-and-effect relationships. However, unlike experiments, the researcher is not able to control (1) how subjects are assigned to groups and/or (2) which treatments each group receives.

Data Collection Methods: Pros and Cons

Each method of data collection has advantages and disadvantages.

Resources. When the population is large, a sample survey has a big resource advantage over a census. A well-designed sample survey can provide very precise estimates of population parameters - quicker, cheaper, and with less manpower than a census.

Generalizability. Generalizability refers to the appropriateness of applying findings from a study to a larger population. Generalizability requires random selection. If participants in a study are randomly selected from a larger population, it is appropriate to generalize study results to the larger population; if not, it is not appropriate to generalize.
Observational studies often do not feature random selection, so generalizing from the results of an observational study to a larger population can be a problem.

Causal inference. Cause-and-effect relationships can be teased out when subjects are randomly assigned to groups. Therefore, experiments, which allow the researcher to control assignment of subjects to treatment groups, are the best method for investigating causal relationships.

Survey Sampling Methods

Sampling method refers to the way that observations are selected from a population to be in the sample for a sample survey.

Population Parameter vs. Sample Statistic

The reason for conducting a sample survey is to estimate the value of some attribute of a population.

Population parameter. A population parameter is the true value of a population attribute.

Sample statistic. A sample statistic is an estimate, based on sample data, of a population parameter.

Consider this example. A public opinion pollster wants to know the percentage of voters that favor a flat-rate income tax. The actual percentage of all the voters is a population parameter. The estimate of that percentage, based on sample data, is a sample statistic.
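The pollster example can be sketched in a few lines of Python. This is only an illustration: the population of 10,000 voters, the 40% level of support, and the sample size of 500 are all made-up assumptions, not figures from a real poll.

```python
import random

# Hypothetical population of 10,000 voters: True means the voter favors the flat tax.
rng = random.Random(0)
population = [rng.random() < 0.40 for _ in range(10_000)]

# Population parameter: the true proportion. We can compute it here only
# because the simulation gives us every member of the population.
parameter = sum(population) / len(population)

# Sample statistic: the same proportion, computed from a random sample of 500 voters.
sample = rng.sample(population, 500)
statistic = sum(sample) / len(sample)

print(f"parameter = {parameter:.3f}, statistic = {statistic:.3f}")
```

In a real survey only the statistic is observed; the parameter stays unknown, which is why the sampling method matters.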

The quality of a sample statistic (i.e., accuracy, precision, representativeness) is strongly affected by the way that sample observations are chosen; that is, by the sampling method.

Probability vs. Non-Probability Samples

As a group, sampling methods fall into one of two categories.

Probability samples. With probability sampling methods, each population element has a known (non-zero) chance of being chosen for the sample.

Non-probability samples. With non-probability sampling methods, we do not know the probability that each population element will be chosen, and/or we cannot be sure that each population element has a non-zero chance of being chosen.

Non-probability sampling methods offer two potential advantages - convenience and cost. The main disadvantage is that non-probability sampling methods do not allow you to estimate the extent to which sample statistics are likely to differ from population parameters. Only probability sampling methods permit that kind of analysis.

Non-Probability Sampling Methods

Two of the main types of non-probability sampling methods are voluntary samples and convenience samples.

Voluntary sample. A voluntary sample is made up of people who self-select into the survey. Often, these folks have a strong interest in the main topic of the survey.
Suppose, for example, that a news show asks viewers to participate in an online poll. This would be a voluntary sample. The sample is chosen by the viewers, not by the survey administrator.

Convenience sample. A convenience sample is made up of people who are easy to reach.
Consider the following example. A pollster interviews shoppers at a local mall. If the mall was chosen because it was a convenient site from which to solicit survey participants and/or because it was close to the pollster's home or business, this would be a convenience sample.

Probability Sampling Methods

The main types of probability sampling methods are simple random sampling, stratified sampling, cluster sampling, multistage sampling, and systematic random sampling. The key benefit of probability sampling methods is that, because selection probabilities are known, you can estimate how far a sample statistic is likely to fall from the population parameter. They do not guarantee that any particular sample is representative, but they guard against the systematic selection problems that non-probability methods invite.

Simple random sampling. Simple random sampling refers to any sampling method that has the following properties.

The population consists of N objects.

The sample consists of n objects.

If all possible samples of n objects are equally likely to occur, the sampling method is called simple random sampling.

There are many ways to obtain a simple random sample. One way would be the lottery method. Each of the N population members is assigned a unique number. The numbers are placed in a bowl and thoroughly mixed. Then, a blind-folded researcher selects n numbers. Population members having the selected numbers are included in the sample.
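The lottery method can be mimicked in Python with the standard library's random.sample, which draws without replacement so that every subset of size n is equally likely. The function name simple_random_sample and the population of 20 numbered members are assumptions made for this sketch.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every possible n-member sample is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: N = 20 members, numbered 1 through 20.
population = list(range(1, 21))
sample = simple_random_sample(population, n=5, seed=1)
print(sample)
```

The seed argument is there only to make the draw repeatable for checking; a real survey would not fix it in advance.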

Stratified sampling. With stratified sampling, the population is divided into groups, based on some characteristic. Then, within each group, a probability sample (often a simple random sample) is selected. In stratified sampling, the groups are called strata.
As an example, suppose we conduct a national survey. We might divide the population into groups or strata, based on geography - north, east, south, and west. Then, within each stratum, we might randomly select survey respondents.
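A minimal sketch of the geographic example, assuming four hypothetical strata of 100 people each and an equal number of respondents drawn from every stratum (real surveys often allocate in proportion to stratum size instead):

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample of n_per_stratum members from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical strata: 100 labeled people in each region.
strata = {
    region: [f"{region}-{i}" for i in range(100)]
    for region in ("north", "east", "south", "west")
}
sample = stratified_sample(strata, n_per_stratum=3, seed=2)
for region, members in sample.items():
    print(region, members)
```

Every region appears in the result, which is the defining feature of a stratified sample.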

Cluster sampling. With cluster sampling, every member of the population is assigned to one, and only one, group. Each group is called a cluster. A sample of clusters is chosen, using a probability method (often simple random sampling). Only individuals within sampled clusters are surveyed.
Note the difference between cluster sampling and stratified sampling. With stratified sampling, the sample includes elements from each stratum. With cluster sampling, in contrast, the sample includes elements only from sampled clusters.
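The contrast with stratified sampling shows up clearly in code. In this sketch the clusters are ten hypothetical schools of 30 students each; three schools are chosen at random, and every student in a chosen school is surveyed.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: 10 schools with 30 students each.
clusters = {
    f"school_{i}": [f"student_{i}_{j}" for j in range(30)]
    for i in range(10)
}
sample = cluster_sample(clusters, n_clusters=3, seed=3)
print(sorted(sample))
```

Seven of the ten schools contribute nobody to the sample, whereas a stratified design would draw some students from all ten.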

Multistage sampling. With multistage sampling, we select a sample by using combinations of different sampling methods.
For example, in Stage 1, we might use cluster sampling to choose clusters from a population. Then, in Stage 2, we might use simple random sampling to select a subset of elements from each chosen cluster for the final sample.
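The two-stage example can be sketched as follows, again using hypothetical schools as clusters. The function name two_stage_sample and all sizes are assumptions chosen for illustration.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly choose clusters. Stage 2: randomly sample within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)            # Stage 1
    return {name: rng.sample(clusters[name], n_per_cluster)      # Stage 2
            for name in chosen}

# Hypothetical clusters: 10 schools with 30 students each.
clusters = {
    f"school_{i}": [f"student_{i}_{j}" for j in range(30)]
    for i in range(10)
}
sample = two_stage_sample(clusters, n_clusters=3, n_per_cluster=5, seed=4)
for school, students in sample.items():
    print(school, students)
```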

Systematic random sampling. With systematic random sampling, we create a list of every member of the population. From the list, we randomly select the first sample element from the first k elements on the population list. Thereafter, we select every kth element on the list.
This method is different from simple random sampling, since not every possible sample of n elements is equally likely.
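A short sketch of systematic random sampling, assuming a hypothetical list of 100 population members and k = 10:

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k elements, then take every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)      # index 0..k-1, i.e. one of the first k elements
    return population[start::k]

# Hypothetical population list: members numbered 1 through 100.
population = list(range(1, 101))
sample = systematic_sample(population, k=10, seed=4)
print(sample)
```

Only k = 10 distinct samples can ever come out of this procedure (one per starting point), while a simple random sample of 10 from 100 could be any of trillions of subsets. That is why the two methods are not equivalent.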

Bias in Survey Sampling

In survey sampling, bias refers to the tendency of a sample statistic to systematically over- or under-estimate a population parameter.

Bias Due to Unrepresentative Samples

A good sample is representative. This means that each sample point represents the attributes of a known number of population elements.

Bias often occurs when the survey sample does not accurately represent the population. The bias that results from an unrepresentative sample is called selection bias. Some common examples of selection bias are described below.

Undercoverage. Undercoverage occurs when some members of the population are inadequately represented in the sample. A classic example of undercoverage is the Literary Digest voter survey, which predicted that Alfred Landon would beat Franklin Roosevelt in the 1936 presidential election. The survey sample suffered from undercoverage of low-income voters, who tended to be Democrats.
How did this happen? The survey relied on a convenience sample, drawn from telephone directories and car registration lists. In 1936, people who owned cars and telephones tended to be more affluent. Undercoverage is often a problem with convenience samples.

Nonresponse bias. Sometimes, individuals chosen for the sample are unwilling or unable to participate in the survey. Nonresponse bias is the bias that results when respondents differ in meaningful ways from nonrespondents. The Literary Digest survey illustrates this problem. Respondents tended to be Landon supporters; and nonrespondents, Roosevelt supporters. Since only 25% of the sampled voters actually completed the mail-in survey, survey results overestimated voter support for Alfred Landon.
The Literary Digest experience illustrates a common problem with mail surveys. Response rate is often low, making mail surveys vulnerable to nonresponse bias.

Voluntary response bias. Voluntary response bias occurs when sample members are self-selected volunteers, as in voluntary samples. An example would be call-in radio shows that solicit audience participation in surveys on controversial topics (abortion, affirmative action, gun control, etc.). The resulting sample tends to overrepresent individuals who have strong opinions.

Random sampling is a procedure for sampling from a population in which (a) the selection of a sample unit is based on chance and (b) every element of the population has a known, non-zero probability of being selected. Random sampling helps produce representative samples by eliminating voluntary response bias and guarding against undercoverage bias. All probability sampling methods rely on random sampling.

Bias Due to Measurement Error

A poor measurement process can also lead to bias. In survey research, the measurement process includes the environment in which the survey is conducted, the way that questions are asked, and the state of the survey respondent.

Response bias refers to the bias that results from problems in the measurement process. Some examples of response bias are given below.

Leading questions. The wording of the question may be loaded in some way to unduly favor one response over another. For example, a satisfaction survey may ask the respondent to indicate whether she is satisfied, dissatisfied, or very dissatisfied. By giving the respondent one response option to express satisfaction and two response options to express dissatisfaction, this survey question is biased toward getting a dissatisfied response.

Social desirability. Most people like to present themselves in a favorable light, so they will be reluctant to admit to unsavory attitudes or illegal activities in a survey, particularly if survey results are not confidential. Instead, their responses may be biased toward what they believe is socially desirable.

Sampling Error and Survey Bias

A survey produces a sample statistic, which is used to estimate a population parameter. If you repeated a survey many times, using different samples each time, you might get a different sample statistic with each replication. And each of the different sample statistics would be an estimate for the same population parameter.

If the statistic is unbiased, the average of all the statistics from all possible samples will equal the true population parameter, even though any individual statistic may differ from the population parameter. The variability among statistics from different samples is called sampling error.
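For a tiny population, the claim about "all possible samples" can be checked directly by listing every sample. The five-value population below is made up for the illustration, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of N = 5 values; the parameter of interest is the mean.
population = [2, 4, 6, 8, 10]
n = 3
parameter = Fraction(sum(population), len(population))

# Compute the sample mean for every possible sample of size n.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]
average_of_means = sum(sample_means) / len(sample_means)

print("number of possible samples:", len(sample_means))
print("smallest and largest sample mean:", min(sample_means), max(sample_means))
print("average of all sample means:", average_of_means)
print("population mean:", parameter)
```

Individual sample means range from 4 to 8, yet their average over all 10 possible samples is exactly 6, the population mean. The spread from 4 to 8 is the sampling error; the agreement of the averages is what "unbiased" means.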

Increasing the sample size tends to reduce the sampling error; that is, it makes the sample statistic less variable. However, increasing sample size does not affect survey bias. A large sample size cannot correct for the methodological problems (undercoverage, nonresponse bias, etc.) that produce survey bias. The Literary Digest example discussed above illustrates this point. The sample size was very large - over 2 million surveys were completed; but the large sample size could not overcome problems with the sample - undercoverage and nonresponse bias.
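The difference between sampling error and bias can be seen in a small simulation. Everything here is a made-up assumption for illustration (it is not the Literary Digest data): a population in which 60% support candidate A, and a sampling frame that is missing half of A's supporters.

```python
import random
from statistics import mean, pstdev

rng = random.Random(5)

# Hypothetical population of 100,000 voters: 60% support candidate A (True).
population = [True] * 60_000 + [False] * 40_000

# Hypothetical frame with undercoverage: half of A's supporters are missing,
# so the share of supporters in the frame is 30,000 / 70,000, about 0.429.
frame = [True] * 30_000 + [False] * 40_000

def summarize(source, n, reps=200):
    """Repeat the survey reps times; return the mean and spread of the estimates."""
    estimates = [sum(rng.sample(source, n)) / n for _ in range(reps)]
    return mean(estimates), pstdev(estimates)

results = {}
for n in (100, 2_500):
    results["full population", n] = summarize(population, n)
    results["undercovered frame", n] = summarize(frame, n)

for (label, n), (center, spread) in results.items():
    print(f"{label:>18}, n={n:>5}: mean estimate {center:.3f}, spread {spread:.3f}")
```

Raising n from 100 to 2,500 shrinks the spread of the estimates in both cases, but the estimates from the undercovered frame stay centered near 0.43 rather than the true 0.60. A larger sample makes the biased survey more precisely wrong.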

Q1.) You could solve the equation below using exponents by raising each side to what power?

Q2.) The equation below can be solved using exponents by raising each side to what power?

Q3.) Explain why and are equivalent values.

Q4.) Explain how you could use the information you observed in Q3 to calculate by hand.

Q5.) Calculate by hand.

Q6.) Evaluate .

Q7.) Simplify:

Q8.) Evaluate

Q9.) Evaluate:

Q10.)

Problem Set

1.) What is selection bias? Give two examples of scenarios involving selection bias.

2.) Explain the difference between a census and a sample survey.

3.) Explain how leading questions may bias data collected in a survey.

4.) What is the difference between a population parameter and a sample statistic?

5.) How is a stratified sample different from a simple random sample?

6.) How can a convenience sample cause bias in a set of data?

7.) Determine whether each scenario would produce a random sample. If not, explain why not.

A) Selecting customers leaving an Italian restaurant to find out their favorite food

B) Putting the names of all seniors in a hat and drawing names to select a sample of seniors
C) Surveying students in an honors chemistry class to find out how much students in the school study each week

D) Researchers stop one out of every ten people to leave a bar and ask them about their views on drunk driving laws.
E) The government sending a tax survey to everyone whose social security number ends in a particular digit.

8.) Some national political polls have recently been criticized for randomly selecting people by telephone number. How might this produce biased results?

9.) ESPN.com prints an article arguing that LeBron James is the greatest NBA player of all time. On the same page, it offers a survey asking readers who they think the best player of all time is.

A) What kind of survey is this?

B) 62% of respondents believe LeBron is the greatest player of all time. How confident are you in these results? Why?

10.)

11.)

12.)

13.)