Lecture Notes 3 MAT 120
1.5 Wherein we consider the issue of bias in sampling.
» If the results of the sample are not representative of the population, then the sample has bias.
There are different types of bias, and different categories within these types. The text takes us on an exhaustive tour:
» Sampling bias means that the technique used to obtain the individuals to be in the sample tends to favor one part of the population over another.
The dread 'convenience poll' discussed in the previous section suffers from sampling bias. The results of an internet poll of potential presidential candidates may not accurately reflect each candidate's level of support among the general public. It may be that savvy, motivated 'political junkies' organize and drive up the percentage for a candidate they favor (again, Ron Paul often wins internet polls of Republican candidates).
Undercoverage is a type of sampling bias. It occurs when the proportion of one segment of the population is lower in the sample than it is in the population. In other words, the strata are not represented in their true proportions. An example, again, is presidential polling, where the samples have to reflect the percentages of Americans who identify themselves as Democrat, Republican, or Independent.
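One way to check a sample for undercoverage is to compare each segment's share of the sample with its share of the population. The sketch below does that in Python. The party shares and sample counts are made up for illustration (they are not real polling figures), and `coverage_gaps` is a hypothetical helper name.

```python
def coverage_gaps(population_share, sample_counts):
    """Return (sample share - population share) for each segment.

    A negative gap means the segment is undercovered in the sample.
    """
    n = sum(sample_counts.values())
    return {
        segment: sample_counts.get(segment, 0) / n - share
        for segment, share in population_share.items()
    }


# Hypothetical figures, for illustration only.
population_share = {"Democrat": 0.35, "Republican": 0.30, "Independent": 0.35}
sample_counts = {"Democrat": 400, "Republican": 400, "Independent": 200}

gaps = coverage_gaps(population_share, sample_counts)
undercovered = [segment for segment, gap in gaps.items() if gap < 0]

for segment, gap in gaps.items():
    print(f"{segment}: gap = {gap:+.2f}")
print("Undercovered segments:", undercovered)
```

In this made-up sample, Independents are 20% of the sample but 35% of the population, so they are the undercovered segment.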
» Nonresponse Bias exists when those who do not respond to the survey have different opinions from those who do.
The Census handles nonresponse (failure to fill out a Census form) by scheduling home visits by enumerators.
Nonresponse can be caused by many factors – age, income level, distrust, etc. And therein lies the problem: nonresponders are likely to share similar situations and views, so those views are the ones most likely to be underrepresented in the results. That is where the bias creeps in.
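A little arithmetic shows how nonresponse skews a result. The sketch below assumes two hypothetical groups that differ both in how often they respond and in how they answer a yes/no question; every number in it is invented for illustration.

```python
# Each hypothetical group: (share of population, response rate, fraction who would say "yes")
groups = {
    "trusting": (0.6, 0.8, 0.7),
    "distrustful": (0.4, 0.2, 0.3),
}


def true_and_observed_yes(groups):
    """Return the true 'yes' proportion and the proportion seen among responders."""
    true_yes = sum(share * yes for share, _rate, yes in groups.values())
    responders = sum(share * rate for share, rate, _yes in groups.values())
    yes_responders = sum(share * rate * yes for share, rate, yes in groups.values())
    return true_yes, yes_responders / responders


true_yes, observed_yes = true_and_observed_yes(groups)
print(f"True 'yes' proportion:     {true_yes:.3f}")
print(f"Observed among responders: {observed_yes:.3f}")
```

Because the distrustful group rarely responds, its views count for less in the survey than they do in the population, and the observed proportion overstates the true one.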
» Response bias exists when the answers on a survey do not reflect the true feelings of the respondent. The text lists a number of different types:
● Interviewer Error – If the subject does not trust the interviewer, he/she may be loath to answer questions like "Have you ever committed adultery?"
● Misrepresented Answers – "How much money do you make a year?" Answers are often exaggerated, for whatever reason.
● Wording of Questions (interesting; read examples, p. 40) – The order of the questions, and even the arrangement of the words within a question, can affect the answers given on the survey.
● Type of Question (open vs. closed) – These are for qualitative data ('categories'). An open question allows the subject to give his or her own response. A closed question asks the subject to choose from a group of predefined categories. A balance must be struck between limiting the number of categories and avoiding bias.
● Data-entry error
We'll practice #14-24 even.
1.6 Here we take a look at issues in 'Design of Experiments'. First, the definition:
» An experiment is a controlled study conducted to determine the effect that varying one or more explanatory variables, or factors, has on a response variable. Any combination of the values of the factors is called a treatment.
An experimental unit is a person, object, or some other well-defined item upon which a treatment is applied.
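Since a treatment is any combination of the values of the factors, the full set of treatments can be listed mechanically. The sketch below uses Python's `itertools.product`; the two factors and their levels are hypothetical, chosen only to illustrate the definition.

```python
from itertools import product


def list_treatments(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


# Hypothetical factors and levels, for illustration only.
factors = {
    "dose_mg": [0, 50, 100],
    "diet": ["standard", "low-fat"],
}

treatments = list_treatments(factors)
for number, treatment in enumerate(treatments, start=1):
    print(number, treatment)
```

With three levels of one factor and two of the other, there are 3 × 2 = 6 treatments.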
Recall that an experiment addresses a key deficiency of the observational study, which cannot establish a causal relationship between explanatory and response variables. The experiment attempts to account for possible confounding variables by using a control group. One popular way to establish a control group is to use a placebo. The placebo group is compared with the treatment group to test the effectiveness of (usually) a drug. The placebo also helps neutralize the psychological and physical effects that the testing itself has on individuals in a study: patients on the placebo are subject to the same stresses, so any differences between the groups cannot be attributed to those confounding variables.
» A single-blind experiment is one in which the experimental unit (or subject) does not know which treatment he or she is receiving. A double-blind experiment is one in which neither the experimental unit nor the researcher in contact with the experimental unit knows which treatment the experimental unit is receiving.
Read: Ex. 1, do #12
The text then launches into an explanation of the steps in experiment design. The high points:
(1) Identify the problem to be solved (the 'objective' is a better description).
(2) Determine the factors that affect the response variable.
(3) Determine the number of experimental units (sample size).
(4) Determine the level of each factor. You can either control a factor's level or use 'randomization', as the text explains (badly).
(5) Conduct the experiment: replicate it over the experimental units, and collect the data.
(6) Test the claim – out of scope at the moment, as it requires inferential stats.
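The 'randomization' mentioned in step (4) can be sketched in a few lines of Python: shuffle the experimental units, then deal them out to the treatment groups. This is a minimal sketch of one common approach, not the text's procedure. The subject labels, the group names, and the helper `randomize_units` are all hypothetical, and the seed is fixed only so the example can be repeated.

```python
import random


def randomize_units(units, treatments, seed=None):
    """Shuffle the units, then deal them to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    assignment = {treatment: [] for treatment in treatments}
    for i, unit in enumerate(shuffled):
        assignment[treatments[i % len(treatments)]].append(unit)
    return assignment


# Hypothetical experimental units and treatment groups.
units = [f"subject_{i}" for i in range(1, 21)]
assignment = randomize_units(units, ["drug", "placebo"], seed=120)

for treatment, members in assignment.items():
    print(treatment, len(members), members)
```

Chance, not the researcher, decides who lands in which group, so factors that cannot be controlled tend to balance out between the groups.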
Read: Ex. 2, do #14.
» A matched-pairs design is an experimental design in which the experimental units are paired up. The units in each pair are matched so that they are related in some way (for example, the same person before and after a treatment, or twins).
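In a matched-pairs design, randomization typically happens within each pair: a coin flip decides which member receives the treatment. The sketch below illustrates that idea. The pair labels and the helper `assign_within_pairs` are hypothetical, and the fixed seed is only there to make the example repeatable.

```python
import random


def assign_within_pairs(pairs, seed=None):
    """For each pair, randomly pick which member gets the treatment."""
    rng = random.Random(seed)
    plan = []
    for first, second in pairs:
        if rng.random() < 0.5:  # the 'coin flip'
            first, second = second, first
        plan.append({"treatment": first, "control": second})
    return plan


# Hypothetical matched pairs (e.g. twins).
pairs = [("twin_1a", "twin_1b"), ("twin_2a", "twin_2b"), ("twin_3a", "twin_3b")]
plan = assign_within_pairs(pairs, seed=3)

for row in plan:
    print(row)
```

Each pair contributes one treated unit and one control unit, so the comparison is always made between related units.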
Read: Ex. 3