Sample Design for Surveys
Joshua Rodd
GEOG 5161
Spring 2011
Introduction
A sample survey is conducted in order to describe a population by studying a smaller sample of that population (Creswell 2009, 145). Sample design is the process by which the researcher chooses the part of the population that will be included in the survey (Kalton 1983, 7). Sample surveys are a relatively new technology of investigation: at the beginning of the 20th century, statisticians and social scientists still debated the validity of surveys that did not attempt to enumerate and collect data from an entire group. It is now widely recognized that attempts at complete enumeration of a large population (i.e., a census) are often less accurate and less useful than a rigorously designed sample (Bernard 2002, 142-143).
Note the caveat: to be useful, sampling must be a rigorous and thoroughly considered part of the research design. However useful sampling is, a poorly designed sample usually produces inaccurate survey results. Therefore, as Creswell notes (2009, 147-149), anyone conducting a survey of any sort should pay close attention to sampling.
This presentation and summary of terms is oriented toward survey researchers conducting cross-sectional studies. However, regardless of whether the work is quantitative, qualitative, or mixed, every researcher must think carefully about how to select the subjects who will represent the group under consideration.
Sample Surveys in Human Geography
Although certainly prominent, survey research has not been a dominant methodology in human geography. A distinctive contribution of the discipline has been the integration of human survey data with GIScience, a technique that is gaining increasing attention in other disciplines. However, in a review of 613 articles on methods published in the Annals of the Association of American Geographers (Annals) over a 100-year period, Kwan (2010) found that a title search for the word “survey” returned only 12. Nor were all 12 of those articles oriented toward human research; Kwan’s article suggests that many concerned land surveys. This lack of attention may derive from the skepticism toward surveys and other quantitative methods of human research that many human geographers have held since at least the 1970s. (For an excellent summary of these criticisms, as well as a robust defense of quantitative methods in human geography, see Kwan and Schwanen (2009) and Schwanen and Kwan (2009).) Nevertheless, a review of the last three years of publications in Annals and The Professional Geographer shows that many human geographers continue to employ survey methods.
Key Concepts in Sample Design:
The target population and the survey population: The group of people (or anything else) that interests a researcher is the target population. The survey population includes only those who could potentially be included in the sample. It is important to distinguish the two, as they may not be the same. For example, in a survey of the United States population, it is very difficult to sample people who are serving in the military or are in prison. Therefore, although a researcher’s target population may be the inhabitants of the US, the survey population will not include prisoners or military personnel (Kalton 1983, 6).
Non-probability sampling: In contrast to probability sampling (see below), in a non-probability sample the researcher does not know the likelihood that any possible respondent will be selected. In such cases there is often no sampling frame (see below), nor any practical way to define one. A researcher may draw a convenience sample (also known as haphazard or accidental sampling), which includes those who are available for the study but who have not been randomly selected. Examples include volunteer respondents, interviews conducted on a street corner, respondents in geographic proximity to the researcher’s work, or any other situation in which the probability of participation is unknown. Alternatively, a researcher might draw a judgment sample (or expert choice sample), in which the researcher or another expert identifies a series of respondents (or clusters) judged to be representative. Finally, some researchers use quota sampling: each interviewer is given a quota of respondents in certain categories (such as ethnicity, gender, age, or other relevant characteristics) and fills each quota with whoever in that category is willing to talk (Kalton 1983, 90-93; Bernard 2002, 180-202).
Probability Sampling: Statistical analysis of survey data depends on knowing the probability with which each respondent was selected from the survey population. If every member of the survey population has a known, nonzero probability of selection, the sample is a probability sample. If those probabilities are not known, the sample is a non-probability sample (see above) (Kalton 1983, 7).
Sampling Frames: Once a researcher has identified a target population and decided to take a probability sample, he or she must identify a list of possible respondents from that population. This is the sampling frame, and it is of critical importance because it defines the survey population. Depending on the type of research one is doing, possible sampling frames might include telephone books, voter registration lists, motor vehicle records, company personnel records, association membership lists, or refugee camp feeding rosters. In places that are poor or have little bureaucratic record-keeping, sampling frames may be hard to come by. In such a case, the researcher may need to create the sampling frame himself or herself (Kalton 1983, 56; Lohr 2010, 3-8).
Sample Size: The size of the target or survey population is not the most important factor in determining sample size[1]. Instead, the critical issue is the degree of precision needed to answer the question the researcher is asking. If the researcher is trying to detect a difference between two subpopulations, the sample size will depend on the size of that difference and on the desired significance level and power. If the difference between the null and alternative hypotheses being tested is slight, the sample size must be large; if the difference is large, the sample size can be smaller. In addition, the stricter the significance level (that is, the smaller α) and the higher the statistical power sought, the larger the sample must be (Rosner 2000, 236-242). The sample must also be larger as the variance of the variable in the survey population increases. If the researcher is trying to estimate a proportion from a sample, the sample size depends on the confidence level at which the proportion is to be estimated and on the acceptable margin of error (Bernard 2002, 176-179). Beyond these considerations, the researcher must also take into account nonresponse among sampled respondents.
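The considerations above can be made concrete with the usual normal-approximation formulas. The sketch below is illustrative only: it assumes simple random sampling, and for the two-group case it assumes equal group sizes, a two-sided test, and a common standard deviation. The function names and example numbers are hypothetical. The formulas are n = z²p(1 − p)/e² for a proportion, n = 2(z₁₋α/₂ + z₁₋β)²σ²/Δ² per group for a difference of means, and, for the adjustment mentioned in the footnote, one common form of the finite population correction, n = n₀/(1 + (n₀ − 1)/N).

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+


def z_quantile(prob: float) -> float:
    """Standard normal quantile, e.g. z_quantile(0.975) is about 1.96."""
    return NormalDist().inv_cdf(prob)


def n_for_proportion(p: float, margin: float, confidence: float = 0.95) -> int:
    """Sample size to estimate a proportion p to within +/- margin under SRS.

    Uses n = z^2 * p * (1 - p) / margin^2; p = 0.5 is the conservative
    choice when the proportion is unknown in advance.
    """
    z = z_quantile(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def n_per_group_two_means(delta: float, sd: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group size to detect a difference `delta` between two group means.

    Assumes a two-sided test, equal group sizes, and a common SD `sd`:
    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2.
    """
    z_alpha = z_quantile(1 - alpha / 2)
    z_beta = z_quantile(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


def apply_fpc(n0: int, population_size: int) -> int:
    """Finite population correction: shrink n0 when the population is small."""
    return math.ceil(n0 / (1 + (n0 - 1) / population_size))


print(n_for_proportion(0.5, 0.05))             # 385 (95% confidence, +/- 5 points)
print(n_per_group_two_means(delta=5, sd=10))   # 63 (small difference, more cases)
print(n_per_group_two_means(delta=10, sd=10))  # 16 (larger difference, fewer cases)
print(apply_fpc(385, population_size=1000))    # 279
```

Tightening α, raising the desired power, or increasing `sd` all raise the required n, which mirrors the relationships described in the paragraph above.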
Nonresponse: It is rare for all solicited respondents to be available or to agree to participate. Therefore, the researcher must estimate the expected nonresponse rate and increase the sample size accordingly. In addition, if those who are unavailable or who refuse to participate are systematically similar to each other and different from those who do respond, the survey results may be biased. Researchers should take this possibility into account (Kalton 1983, 63-68).
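Adjusting the sample size for expected nonresponse is simple arithmetic: divide the number of completed responses needed by the expected response rate. The sketch below assumes the researcher already has an estimate of that rate (for example, from a pilot); the function name is hypothetical.

```python
import math


def inflate_for_nonresponse(n_required: int, expected_response_rate: float) -> int:
    """Number of units to draw so that roughly n_required will respond.

    expected_response_rate is the researcher's own estimate, between 0 and 1.
    """
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    return math.ceil(n_required / expected_response_rate)


print(inflate_for_nonresponse(385, 0.80))  # 482
```

Note that this only restores the number of completed interviews. It does nothing about the bias that arises when nonrespondents differ systematically from respondents.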
Census: The simplest form of probability sample is one that includes 100% of the target population; such a sample is called a census. Although a census can be useful in some cases, for large populations it is often inaccurate. It is likely that not every member of the target population can be reached, and those who cannot be reached are often systematically different from those who can. In the US, census workers have more difficulty finding and interviewing the homeless, the poor, and the very rich, so these groups are likely to be under-represented in a pure census.
Simple Random Sampling: Once a sample size has been determined (see above), the simplest way to select a sample is simple random sampling (SRS). SRS can be carried out by drawing names out of a hat, by assigning a number to each individual in the sampling frame and using a random number generator, or by other similar methods. Every potential respondent has the same probability of being selected as every other. Although SRS is conceptually simple, it is often laborious to implement in practice (Kalton 1983, 8-15).
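With an electronic sampling frame, the "random number generator" approach reduces to drawing n units without replacement. The minimal sketch below uses Python's standard `random` module; the frame of 500 households and the seed are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the frame without replacement.

    Every unit has the same inclusion probability, n / len(frame).
    """
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)  # seeded so the draw can be documented and repeated
    return rng.sample(frame, n)


frame = [f"household_{i:03d}" for i in range(1, 501)]  # hypothetical frame of 500 units
sample = simple_random_sample(frame, 50, seed=2011)
inclusion_probability = 50 / len(frame)  # 0.1 for every household
```

Recording the seed is a design choice: it lets the exact selection be reproduced and audited later.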
Systematic Sampling: In systematic sampling, the researcher lists all the members of the sampling frame, randomly selects a starting point, and then selects every member of the frame that falls a fixed interval (the period) after the previous selection. The period is the number of individuals in the sampling frame divided by the sample size. For example, if the period is 8 and the random starting point is the 36th member of the sampling frame, the researcher would then pick the 44th, 52nd, 60th, and so on until the sample was filled (Kalton 1983, 16-19).
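A minimal sketch of systematic selection follows. It draws the random start from the first interval (positions 1 to k), a common way to ensure the whole list is covered. For simplicity it assumes the frame size is an exact multiple of the sample size, so the period is a whole number. The function name and the 400-member frame are hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Systematic sample with a random start in the first interval.

    Simplifying assumption: len(frame) is an exact multiple of n, so the
    period k = len(frame) / n is a whole number.
    """
    N = len(frame)
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch requires len(frame) to be a multiple of n")
    k = N // n                                 # the period (sampling interval)
    start = random.Random(seed).randint(1, k)  # 1-based random start in 1..k
    # 1-based positions start, start + k, start + 2k, ..., up to N
    return [frame[pos - 1] for pos in range(start, N + 1, k)]


frame = list(range(1, 401))  # hypothetical frame: members numbered 1..400
sample = systematic_sample(frame, 50, seed=7)
```

With 400 members and a sample of 50, the period is 8, so a start of 4 would yield members 4, 12, 20, and so on up to 396.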
Stratification: If a researcher requires additional statistical power to study a sub-population of interest, he or she may draw more respondents from that sub-population, or stratum, than a simple random sample would yield. This requires that the researcher have an accurate idea of the proportion of the sub-population in the larger population, and that it be possible to draw a separate sample from each stratum. For example, if a researcher is surveying undergraduates at CU but is particularly interested in male First Years, he or she would stratify by class and sex and oversample male First Years (Kalton 1983, 19-20).
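The oversampling design described above can be sketched as an independent simple random sample within each stratum. Because oversampled strata are over-represented, estimates for the whole population need design weights. A standard choice is N_h / n_h, the stratum's frame size divided by its sample size. For brevity the example collapses the class-by-sex stratification into two hypothetical strata ("male First Year" and "everyone else") in an invented frame of 1,000 students.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, allocation, seed=None):
    """Draw an independent SRS within each stratum.

    frame: list of units; stratum_of: function mapping a unit to its stratum;
    allocation: dict mapping stratum -> number of units to draw there.
    Returns (sample, weights), where weights[stratum] = N_h / n_h.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    sample, weights = [], {}
    for label, n_h in allocation.items():
        members = strata[label]
        sample.extend(rng.sample(members, n_h))
        weights[label] = len(members) / n_h  # undoes over- or under-sampling
    return sample, weights


# Hypothetical frame: 150 male First Years among 1,000 undergraduates.
frame = [(f"student_{i:04d}", "male First Year" if i < 150 else "everyone else")
         for i in range(1000)]
sample, weights = stratified_sample(
    frame, lambda unit: unit[1],
    {"male First Year": 60, "everyone else": 140}, seed=42)
```

A proportionate sample of 200 would contain about 30 male First Years. This allocation doubles that, and each of them therefore carries a weight of 2.5 rather than roughly 6.07 for everyone else.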
Cluster Sampling: In cluster sampling, the researcher first selects groups (clusters) within the larger survey population and then samples individuals within each selected cluster. This technique is often employed when sampling frames of individuals are not available, as in many countries of the global South. In such cases the clusters are often communities, which can be listed by the researcher even if the inhabitants of these communities cannot be. The researcher randomly selects communities, enumerates the inhabitants of each selected community, and then randomly selects respondents from this new sub-frame. If all clusters are of equal size, the design is straightforward. If they are not, the researcher must adjust the probability of selecting each cluster, typically by selecting clusters with probability proportional to size (PPS), so that larger clusters are more likely to be selected than smaller ones (Kalton 1983, 28-37).
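The two-stage design above can be sketched with systematic PPS selection of communities followed by an SRS of a fixed number of people within each selected community. This is one common way of implementing PPS, not the only one. The sketch assumes that no community is larger than the selection interval (so none can be chosen twice). For simplicity it computes first-stage sizes from full rosters, whereas in the field they would be prior estimates and rosters would be listed only for the selected communities. All names and sizes are hypothetical.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, m, rng):
    """Choose m clusters with probability proportional to size (systematic PPS).

    sizes: dict mapping cluster name -> (estimated) population.
    Assumes no cluster is larger than the interval total / m.
    """
    names = list(sizes)
    cumulative = list(accumulate(sizes[name] for name in names))
    interval = cumulative[-1] / m
    start = rng.random() * interval  # random start in [0, interval)
    chosen, idx = [], 0
    for j in range(m):
        point = start + j * interval
        # advance to the cluster whose cumulative range contains this point
        while idx < len(names) - 1 and cumulative[idx] <= point:
            idx += 1
        chosen.append(names[idx])
    return chosen


def two_stage_cluster_sample(rosters, m, take, seed=None):
    """Stage 1: PPS selection of m clusters; stage 2: SRS of `take` people in each."""
    rng = random.Random(seed)
    sizes = {name: len(people) for name, people in rosters.items()}
    selected = pps_systematic(sizes, m, rng)
    return {name: rng.sample(rosters[name], take) for name in selected}


village_sizes = [120, 80, 200, 50, 150, 90, 60, 110, 70, 70]  # hypothetical, total 1000
rosters = {f"village_{v}": [f"village_{v}_person_{p}" for p in range(size)]
           for v, size in enumerate(village_sizes)}
sample = two_stage_cluster_sample(rosters, m=4, take=5, seed=3)
```

Selecting clusters with probability proportional to size and then taking a fixed number of people in each makes the design self-weighting. A person in village i is selected with probability (4 × M_i / 1000) × (5 / M_i) = 0.02, regardless of the village's size M_i.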
Works Cited
Bernard, H. Russell. 2002. Research Methods in Anthropology. 3rd ed. New York: Altamira Press.
Creswell, John W. 2009. Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. 3rd ed. Thousand Oaks, CA: SAGE Publications.
Kalton, Graham. 1983. Introduction to Survey Sampling. SAGE University Paper 35. Thousand Oaks, CA: SAGE Publications.
Kwan, Mei-Po, and Tim Schwanen. 2009. “Quantitative Revolution 2: The Critical (Re)Turn.” The Professional Geographer 61, no. 3 (August): 283-291.
Lohr, Sharon L. 2010. Sampling: Design and Analysis. 2nd ed. Boston, MA: Brooks/Cole.
Rosner, Bernard. 2000. Fundamentals of Biostatistics. 5th ed. Pacific Grove, CA: Duxbury.
Schwanen, Tim, and Mei-Po Kwan. 2009. “‘Doing’ Critical Geographies with Numbers.” The Professional Geographer 61, no. 4 (November): 459-464.
[1] However, if the population in question is small, or the sample is large relative to the population, the required sample size can be adjusted downward based on the population size (the finite population correction).