Objectives:
- What happens if we want to gather information about a group of people?
- How can we identify population, sampling frame, and sample from a scenario?
- What happens if a sample is not representative?
- What happens if samples vary?
- What is the difference between a statistic and a parameter?
- What are the different methods to select a sample?
- What are potential problems with sampling?
Suppose we want to gather information about a group of people.
- If the group is small (for example, all students in this class), we can study each group member directly.
- If, however, the group is very large (for example, all students in the school), studying each member of the group may not be feasible.
The entire group of individuals (not necessarily people) that we want information about is called the ______. The part of the population in the study is called the ______.
The list of individuals who actually had a chance to be included in the sample is called the ______.
The method we use to select the sample is called the sample ______. The design of the sample is very important. If the design is poor, the sample will not accurately represent the population.
Example:
A business magazine mailed a questionnaire to the human resource directors of all the Fortune 500 companies and received responses from 23% of them.⁹ Those responding reported that they did not find that such surveys intruded significantly on their workday.
Population:
Sampling Frame:
Sample:
If the sample is not representative of the population, we say it is ______. Biased samples cannot be used to draw reliable conclusions about a population, so we want to avoid bias as much as possible when sampling. One way to do so is to ensure that the sample is chosen ______. Random samples that are sufficiently large are likely to be representative of the population. Because we sample without replacement, however, we must ensure that we don’t sample more than ______ of the population.
Samples drawn at random generally differ from one another. Each draw of random numbers selects different people for our sample, and these differences lead to different values for the variables we measure. We call these sample-to-sample differences ______. If the samples show a lot of sampling variability, the underlying population probably ______. If different samples from a population vary little from each other, then the underlying population most likely harbors ______.
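Sampling variability can be illustrated with a short simulation. The sketch below uses Python's standard `random` and `statistics` modules with two made-up populations of 850 values; the values, the seeds, and the helper name `sample_means` are hypothetical and chosen only for illustration. Repeated random samples of 50 give a different mean each time, and the means vary much less when the population itself varies less.

```python
import random
import statistics

def sample_means(population, n, draws, seed=0):
    """Return the means of `draws` simple random samples of size n
    (each sample is drawn without replacement)."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(draws)]

# Two hypothetical populations of 850 values with the same center (7.0)
# but very different spreads (SD 1.2 vs SD 0.2). Illustrative data only.
data_rng = random.Random(1)
varied_pop = [data_rng.gauss(7.0, 1.2) for _ in range(850)]
similar_pop = [data_rng.gauss(7.0, 0.2) for _ in range(850)]

# Five samples of 50 from the varied population give five different means.
for i, m in enumerate(sample_means(varied_pop, 50, 5), start=1):
    print(f"sample {i}: mean = {m:.2f}")

# Sampling variability: how much the sample means differ from one another.
spread_varied = statistics.stdev(sample_means(varied_pop, 50, 200))
spread_similar = statistics.stdev(sample_means(similar_pop, 50, 200))
print(f"spread of sample means, varied population:  {spread_varied:.3f}")
print(f"spread of sample means, similar population: {spread_similar:.3f}")
```

The second spread comes out much smaller, which matches the idea above: little sample-to-sample variation suggests little variation in the population.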
If we have a representative sample, we can use the data from that sample to draw conclusions about the population. Values actually calculated from the data are called ______, while the corresponding theoretical values for the population (or for a model of it) are called ______. Examples of statistics and their corresponding parameters:
Quantity / Statistic (from the sample) / Parameter (for the population or model)
Mean / x̄ / μ (mu)
Standard Deviation / s / σ (sigma)
Correlation / r / ρ (rho)
Regression Coefficient / b / β (beta)
Proportion / p̂ / p
Common sample designs, with their advantages and disadvantages:
- Convenience sampling: ______
  - Advantage: Easy and less costly to collect
  - Disadvantage: Usually not representative of the population
  - Example: To get an idea of what students think of a new policy, the principal stands outside the library and asks a few students their opinions.
- Voluntary Response Sample: A sample obtained by ______
  - Advantage: Easy to collect
  - Disadvantage: Overrepresents people with strong opinions
  - Example: We post an advertisement in the newspaper asking PVHS students to respond.
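A small simulation can show why a voluntary response sample overrepresents strong opinions. This is a sketch under an assumption made up for illustration: students with a strong opinion (coded -2 or +2) respond to the advertisement far more often than everyone else. The opinion scale, the response rates, and the helper `share_strong` are hypothetical.

```python
import random

rng = random.Random(3)  # fixed seed so the run is reproducible

# Hypothetical opinions of 850 students on a new policy:
# -2 strongly oppose, -1 oppose, 0 no strong view, 1 support, 2 strongly support.
population = [rng.choice([-2, -1, 0, 0, 0, 1, 2]) for _ in range(850)]

# Assumption for illustration: stronger opinions -> more likely to volunteer.
RESPONSE_RATE = {-2: 0.60, -1: 0.15, 0: 0.05, 1: 0.15, 2: 0.60}

volunteers = [o for o in population if rng.random() < RESPONSE_RATE[o]]
# A simple random sample of the same size, for comparison.
random_sample = rng.sample(population, len(volunteers))

def share_strong(opinions):
    """Fraction of people holding a strong opinion (-2 or +2)."""
    return sum(abs(o) == 2 for o in opinions) / len(opinions)

print(f"population:           {share_strong(population):.2f}")
print(f"voluntary response:   {share_strong(volunteers):.2f}")
print(f"simple random sample: {share_strong(random_sample):.2f}")
```

Under these assumed response rates, the volunteers contain a much larger share of strong opinions than the population does, while the random sample of the same size stays close to the population value.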
Random selection, however, eliminates selection bias: chance, rather than the sampler's or the respondent's choice, determines who is in the sample.
- Simple Random Sample (SRS): consists of ______ individuals from the population chosen in such a way that every set of ______ individuals has an ______ chance of being the sample actually selected. This is often the best and most appropriate way to collect data.
  - Advantage: Easy to accomplish with a table of ______
  - Disadvantage: Requires a complete list of the population; by chance, small subgroups may be underrepresented
  - Example: To determine how happy students are at PVHS, the principal assigns each student a number from 1 – 850 and then uses a random number generator to choose 50 different numbers between 1 and 850. He surveys all the students with the chosen numbers.
- Systematic Random Sample: randomly select a starting point, and then select every ______ member of the population.
  - Advantage: Every member has an equal probability of being selected
  - Disadvantage: Not every sample of size n has an equal chance of being selected
  - Example: HP selects every 200th computer off the assembly line and inspects it for quality control.
- Stratified Sample: divide the population into ______, then select a random sample from each group.
  - Advantage: Can produce more precise information by taking advantage of the fact that individuals in the same stratum are similar to one another
  - Disadvantage: Not appropriate unless strata are easily defined
  - Example: Divide all students at PVHS into 4 homogeneous groups: freshmen, sophomores, juniors, and seniors. Then choose an SRS from each grade level.
- Cluster Sample: divide the population into ______ groups. Randomly select one or more groups and include ______ individuals from those groups in the sample.
  - Advantage: Does not need a list of the entire population
  - Disadvantage: More variability between samples, depending on how the clusters are formed
  - Example: Select several departments within the school (Math, English, Art). Within each of those departments, select several teachers. Choose ALL students in each of those teachers' classes.
- Multistage: combining ______ sampling methods.
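The designs above can be sketched in Python with the standard `random` module. The student numbers 1–850 come from the SRS example; everything else (the round-robin grade assignment, the 34 classes of 25 students, the teacher names, the step of 17, and all function names) is hypothetical and chosen only so that each design yields a sample of about 50. Each function is a minimal illustration of one design, not a production sampling tool.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS: every set of n members is equally likely to be chosen (no repeats)."""
    return rng.sample(frame, n)

def systematic_sample(frame, k, rng):
    """Random starting point among the first k members, then every k-th member."""
    start = rng.randrange(k)
    return frame[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """An SRS of n_per_stratum members from each stratum (dict: name -> members)."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and include every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def multistage_sample(departments, n_depts, n_teachers, rng):
    """Stage 1: random departments. Stage 2: random teachers within each chosen
    department. Stage 3: all students in those teachers' classes (clusters)."""
    sample = []
    for dept in rng.sample(sorted(departments), n_depts):
        teachers = departments[dept]
        for teacher in rng.sample(sorted(teachers), n_teachers):
            sample.extend(teachers[teacher])
    return sample

rng = random.Random(4)  # fixed seed so the run is reproducible
students = list(range(1, 851))  # students numbered 1-850, as in the SRS example

# Hypothetical grade levels, assigned round-robin purely for illustration.
grades = ["freshmen", "sophomores", "juniors", "seniors"]
strata = {g: [s for s in students if (s - 1) % 4 == i] for i, g in enumerate(grades)}

# Hypothetical rosters: 34 classes of 25 students, one teacher per class,
# spread round-robin over the three departments named in the cluster example.
classes = [students[c * 25:(c + 1) * 25] for c in range(34)]
clusters = {f"class {c + 1}": roster for c, roster in enumerate(classes)}
dept_names = ["Math", "English", "Art"]
departments = {d: {} for d in dept_names}
for c, roster in enumerate(classes):
    dept = dept_names[c % 3]
    departments[dept][f"{dept} teacher {c // 3 + 1}"] = roster

srs = simple_random_sample(students, 50, rng)
systematic = systematic_sample(students, 17, rng)        # 850 / 17 = 50 students
stratified = stratified_sample(strata, 12, rng)          # 12 per grade = 48 students
cluster = cluster_sample(clusters, 2, rng)               # 2 whole classes = 50 students
multistage = multistage_sample(departments, 2, 1, rng)   # 2 depts x 1 teacher x 25

print("SRS (first 10, sorted):", sorted(srs)[:10])
print("systematic (first 5):", systematic[:5])
print("stratified sizes:", {g: len(v) for g, v in stratified.items()})
print("cluster size:", len(cluster), " multistage size:", len(multistage))
```

Only the SRS gives every possible set of 50 students the same chance; the other functions trade that property for convenience (systematic, cluster) or precision within groups (stratified).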
Potential problems with sampling:
- Undercoverage: some groups are left out of the process of choosing the sample
  - Example: Students who are at Monroe, who have early release, who are on suspension, or who are absent may be left out of the sample.
- Nonresponse: an individual chosen for the sample ______ to cooperate
  - Example: A student chosen for the sample may refuse to divulge information or may be absent
- Response Bias: the behavior of the individual or interviewer may influence the accuracy of the response
  - Example: Students may lie about drug or alcohol use
- Wording of Questions: confusing or leading questions can influence responses, and poorly worded questions will not yield accurate responses.
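Random selection alone does not protect against undercoverage or nonresponse. The sketch below simulates the PVHS happiness survey under an assumption made up for illustration: unhappy students (scores 1–3) are more likely to be absent, and absent students cannot be surveyed. The scores, the absence rates, and the helper `survey_mean` are hypothetical. Even though each group of 50 is chosen at random, the survey results tend to come out higher than the true population mean.

```python
import random
import statistics

rng = random.Random(5)  # fixed seed so the run is reproducible

# Hypothetical population of 850 students: (happiness score 1-10, present today?).
# Assumption for illustration: students scoring 1-3 are absent 30% of the time,
# everyone else only 5% of the time.
population = []
for _ in range(850):
    happiness = rng.randint(1, 10)
    absent_rate = 0.30 if happiness <= 3 else 0.05
    population.append((happiness, rng.random() >= absent_rate))

true_mean = statistics.mean(h for h, _ in population)  # the parameter

def survey_mean(pop, n, rng):
    """Choose an SRS of n students, but only those present can answer."""
    chosen = rng.sample(pop, n)
    return statistics.mean(h for h, present in chosen if present)

# Repeat the survey many times to see the long-run average of the estimates.
estimates = [survey_mean(population, 50, rng) for _ in range(1000)]
print(f"true mean (parameter):   {true_mean:.2f}")
print(f"average survey estimate: {statistics.mean(estimates):.2f}")
```

Averaging over many repetitions removes ordinary sampling variability, so the gap that remains comes from who is missing, not from bad luck in any one sample.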