SAMPLING
Sampling
• The act of studying or examining only a segment of the population to represent the whole.
Advantages
• lower the cost of a research study
• shorter time
• better quality of information
• more comprehensive data
Definitions
Population – entire group of individuals or items of interest in a study
Target Population – group from which representative information is desired and to which the inferences will be made
Sampling Population – population from which a sample will actually be taken
Sampling Unit – units which are chosen in selecting the sample
Sampling Frame – a collection of all the sampling units
Elementary Unit or Element – object or person on which a measurement is actually taken or observation is made
Example 1
Research: Prevalence of Disability Among Children in Region I
Population: children in Region I
Target Population: 6-10 yr old children
Sampling Population: 6-10 yr old pupils
Sampling Unit: Division/District/School
Sampling Frame: list of pupils
Elementary Unit: one 6-10 yr old pupil
Example 2
Research: Malnutrition Related to Weaning in Municipality X
Population: all children in Municipality X
Target Population: all 6-24 mos children in Mun. X
Sampling Unit: Barangay
Sampling Frame: list of 6-24 mos children at RHU
Elementary Unit: one 6-24 mo. child
Sampling Error – the difference between the population value of the parameter being investigated and the estimate of this value based on the different samples
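Example: a population with the values 1, 2 and 3
Population average = (1+2+3)/3 = 2
Average of the sample with values 1 & 2 = 1.5
Average of the sample with values 2 & 3 = 2.5
Average of the sample with values 1 & 3 = 2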
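The small example above can be checked by listing every possible sample of size 2 and subtracting the population average from each sample average. This is only a sketch; the variable names are illustrative and not part of the original notes.

```python
from itertools import combinations
from statistics import mean

population = [1, 2, 3]
population_mean = mean(population)  # (1 + 2 + 3) / 3 = 2

# Sampling error = sample estimate minus the population value,
# computed for every possible sample of size 2.
sampling_errors = {}
for sample in combinations(population, 2):
    sample_mean = mean(sample)
    sampling_errors[sample] = sample_mean - population_mean
    print(sample, sample_mean, sampling_errors[sample])
```

The errors differ from sample to sample, which is why the definition speaks of the estimates from different samples.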
Criteria of a Good Sampling Design
• Representative of the population
• Adequate sample size
• Practical and Feasible
• Economy and Efficiency
BASIC SAMPLING DESIGNS
- Non-Probability Sampling Design
– probability of each member of the population being selected as part of the sample cannot be determined
Judgemental or Purposive
Accidental or Haphazard
Quota Sampling
Snowball Sampling
Convenience Sampling
- Probability Sampling Design
– each member of the population has a known non-zero chance of being selected as a sample
Simple Random Sampling (SRS)
Systematic Sampling
Stratified Random Sampling
Cluster Sampling
Multi-Stage Sampling Design
Non-Probability Sampling Designs
• Judgemental or Purposive
A “representative” sample of the population is selected based on an expert’s subjective judgement or some pre-specified criteria
• Accidental or Haphazard
The sample is made up of those who are at hand or who happen to be available (e.g. ambush interviews)
• Quota Sampling
Samples of a fixed size (quota) are obtained from pre-determined subdivisions of the population (e.g. religion research)
• Snowball Technique (Chain Sampling)
The sample is obtained by a process whereby an individual to be included is identified by a member who was previously included (e.g. drug use)
• Convenience Sampling
Study units that are easily accessible are selected as samples
For convenience sake, the study units that happen to be available at the time of data collection are selected in the sample
Advantages of Non-Probability Sampling Designs
• Easier to execute
• The only possible means
Disadvantages of Non-Probability Sampling Designs
• More likely to produce biased results
• No defined rules to compute for estimates
• Cannot compute reliability of estimates
Simple Random Sampling (SRS)
The simplest form of probability sampling. Every element in the population has an equal chance of being included in the sample.
How to do an SRS?
• Make a numbered list of all the units in the population from which you want to draw a sample
• Decide on the size of the sample
• Select the required number of sampling units using a “lottery” method, a table of random numbers (TRN), or the RAN function of a calculator
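The three steps above can also be carried out with a software random number generator in place of the lottery or the TRN. The sketch below assumes a hypothetical numbered list of 1,200 units and a sample size of 100; the function name and the seed are illustrative only.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has an equal chance."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)

# Step 1: numbered list of all units (hypothetical frame of 1,200 units)
frame = list(range(1, 1201))
# Step 2: decide on the sample size
n = 100
# Step 3: select the units at random
srs = simple_random_sample(frame, n, seed=42)
print(sorted(srs)[:10])
```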
Sample Table of Random Numbers
64249 / 63664 / 39652 / 40646 / 97306 / 31741 / 07294 / 84149
26538 / 44249 / 04050 / 48174 / 65570 / 44072 / 40192 / 51153
05845 / 00512 / 78630 / 55328 / 18116 / 69296 / 91705 / 86224
74897 / 67359 / 51014 / 33510 / 83048 / 17056 / 72506 / 82949
20872 / 54570 / 35017 / 88132 / 25730 / 22626 / 86723 / 91691
31432 / 96156 / 89177 / 75541 / 81355 / 24480 / 77243 / 76690
66890 / 61505 / 01240 / 00660 / 05873 / 13568 / 76082 / 79172
48194 / 57790 / 79970 / 33106 / 86904 / 48119 / 52503 / 24130
11303 / 87118 / 81471 / 52936 / 08555 / 20420 / 49416 / 44448
Advantages of SRS
• Simple Design
• Simple Analysis
Disadvantages of SRS
• Not cost efficient because elementary units may be too widespread
• Requires a sampling frame listing all elementary units of the population
Systematic Sampling
Samples are chosen at regular intervals (for example, every fifth unit) from the sampling frame. The researcher computes the sampling interval (k=N/n).
Example
• Population Size (N) = 1200 nurses: Sample Size (n) = 100
• Sampling interval (k=N/n)=1200/100=12
• Draw a number between 1-12 (inclusive). The number drawn will be the starting point of the sampling
• If no. 6 is picked, then every twelfth nurse will be included in the sample starting with nurse 6, until 100 nurses are selected: the numbers selected would be 6, 18, 30, 42, etc.
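The nurse example can be reproduced with a few lines of code. This is a minimal sketch that assumes N is an exact multiple of n, as it is here (1200/100 = 12); the function name is illustrative.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the positions selected by systematic sampling."""
    k = N // n  # sampling interval, k = N/n
    if start is None:
        # random start between 1 and k (inclusive)
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(n)]

selected = systematic_sample(1200, 100, start=6)
print(selected[:4])   # first four positions
print(selected[-1])   # last position
```

With a start of 6 the positions advance by the interval of 12, so the last selected nurse is still inside the list of 1,200.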
Advantages of Systematic Sampling
• Less time consuming and easier to perform
• Can be used even in the absence of the sampling frame
• Sometimes can result in a more representative sample
Disadvantages of Systematic Sampling
• Units are widely spread
• Systematic bias (e.g. in clinic attendance, systematic sampling with a sampling interval of 7 days would be inappropriate as all the study days would fall on the same day of the week)
Stratified Random Sampling
The population is first divided into non-overlapping groups called strata (singular: stratum), and then simple random sampling is done within each stratum (e.g. the different year levels of high school students).
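As a rough sketch of the idea, the code below runs a separate simple random sample inside each stratum. The year levels, student IDs and per-stratum sample sizes are made up for illustration; the notes do not prescribe how the sample is allocated among strata.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Run a simple random sample independently inside each stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(units, n_per_stratum[name])
        for name, units in strata.items()
    }

# Hypothetical frame: four year levels with made-up student IDs
strata = {
    "Year 1": [f"Y1-{i:03d}" for i in range(1, 121)],
    "Year 2": [f"Y2-{i:03d}" for i in range(1, 101)],
    "Year 3": [f"Y3-{i:03d}" for i in range(1, 91)],
    "Year 4": [f"Y4-{i:03d}" for i in range(1, 81)],
}
# Assumed allocation: about 10% of each stratum
n_per_stratum = {"Year 1": 12, "Year 2": 10, "Year 3": 9, "Year 4": 8}

stratified = stratified_sample(strata, n_per_stratum, seed=1)
for name, units in stratified.items():
    print(name, len(units))
```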
Advantages of Stratified Random Sampling
• Ensure subgroups are adequately represented
• Accurate estimates for each stratum can be obtained
• Produces more reliable results
Disadvantages of Stratified Random Sampling
• May require a very large sample if reliable estimates for each stratum are wanted
Cluster Sampling
It is the selection of groups of study units (clusters) instead of the selection of study units individually. Clusters are often geographic units (e.g. provinces, municipalities) or organizational units (e.g. clinics, training groups).
Illustration. A researcher would like to determine the performance of nurses in different hospitals in a region. The population consists of 1000 nurses in 25 hospitals in the region. His desired sample size is 320. How will the researcher select his sample?
Steps:
- Prepare a list of the 25 clusters (hospitals) comprising the population and assume the desired sample size to be 320.
- Estimate the average number of members per cluster by dividing the population size by the number of hospitals (1000/25=40).
- Divide the desired sample size by the average no. of members per cluster to get the no. of clusters to be selected (320/40=8).
- Select the needed no. of clusters/hospitals through random sampling.
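The arithmetic in the steps above, followed by the random selection of hospitals, might look like this. The hospital labels and the seed are placeholders, and rounding the number of clusters up is an assumption for cases where the division is not exact.

```python
import math
import random

population_size = 1000   # nurses in the region
number_of_clusters = 25  # hospitals
desired_sample_size = 320

# Average number of members per cluster: 1000 / 25 = 40
average_per_cluster = population_size / number_of_clusters

# Number of clusters to select: 320 / 40 = 8 (rounded up if not exact)
clusters_needed = math.ceil(desired_sample_size / average_per_cluster)

# Randomly select that many hospitals; all nurses in them are studied
hospitals = [f"Hospital {i}" for i in range(1, number_of_clusters + 1)]
chosen_hospitals = random.Random(7).sample(hospitals, clusters_needed)
print(clusters_needed, chosen_hospitals)
```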
Multi-Stage Sampling
A procedure carried out in phases and usually involves more than one sampling method.
The population is divided into sets of primary or first stage sampling units and then a random sample of secondary stage units is obtained from each of the selected clusters in the first stage.
A combination of sampling designs may be applied.
Often used in community-based studies.
Example
Nationwide survey of all the 15 regions (Stratified)
1 province per region – primary sampling unit (Simple Random)
1 urban and 1 rural brgy – secondary sampling unit (Stratified)
One cluster of 35 households – tertiary sampling unit (Cluster)
Choose the household – elementary unit (Systematic)
Sampling Design is: 3-stage stratified, systematic, cluster, simple random sampling design
Advantages of Cluster & Multi-Stage Sampling
• Cost efficient design
• Sampling frame for all elementary units not required
• Sample is easier to select
Disadvantages of Cluster & Multi-Stage Sampling
• More complicated design to implement
• More complicated analysis
• Need for bigger sample size to achieve the same precision
• Units are widely spread
Bias in Sampling
- Non-Response
- Studying volunteers only. The fact that volunteers are motivated to participate in the study may mean that they are also different from the study population on the factors being studied. Therefore it is better to avoid using non-random selection procedures that introduce such an element of choice.
- Sampling of registered patients only. Patients reporting to a clinic are likely to differ systematically from people seeking alternative treatments.
- Missing cases of short duration. In studies of the prevalence of disease, cases of short duration are more likely to be missed. This may mean missing fatal cases, cases with short illness episodes and mild cases.
- Seasonal bias. It may be that the problem under study, for example, malnutrition, exhibits different characteristics in different seasons of the year.
- Tarmac bias. Study areas are often selected because they are easily accessible by car. However, these areas are likely to be systematically different from more inaccessible areas.
Ethical Considerations
• Representativeness
• Truth in Publication
• Care in drawing conclusions and recommendations
Determination of Sample Size
Learning Objectives:
- Determine the different factors to be considered in calculating the sample size.
- Enumerate non-statistical considerations in sample size calculation.
- Calculate the sample size for specific objectives e.g. determination of proportion and comparison of means.
- Illustrate sample size calculation using different methods.
How large a sample do I need?
• The bigger the sample, the better the study becomes.
• A sample size of 30 is large enough.
• Ideal is 10% of the population size.
Factors to be Considered in Sample Size Estimation
- Objectives of the study
- Research Design (one sample, two samples, cohort, case-control, etc.)
- Sampling Design (SRS, Cluster, etc.)
- Magnitude of the parameter and Variability
- Level of Precision
Precision – amount of error one is willing to tolerate
Reliability – level of confidence one is willing to commit that the population value is within the maximum tolerable error
Standard Error – is the standard deviation of all estimates if one is going to collect the estimates of all possible samples in the target population
- α, β and Power
α – the probability of rejecting a true null hypothesis. It represents the significance level.
*The null hypothesis is the hypothesis of no significance, no association or no relation.
*The researcher has the option to choose the alpha level. Conventionally, alpha is set between .01 and .10.
Example: If the alpha level is set at 5% or .05 for the null hypothesis that there is no association between smoking and lung cancer, then the researcher accepts a 5% chance of erroneously rejecting that null hypothesis when it is true, or equivalently a 95% chance of correctly not rejecting it.
β – the probability of not rejecting a false null hypothesis.
Power (1 - β) – the probability of observing an effect in the sample if the specified effect size or greater exists in the population.
Example: If β is set at 0.10, then the investigator has decided that he is willing to accept a 10% chance of missing an association of a given effect size. This represents a power of 90% which is a 90% chance of finding an association of that size.
*Many studies set α at .05 and β at .20 (or a power of .80). These are arbitrary values and the investigator may choose values for:
α between .01 and .10 and
β between .05 and .20
Alpha (α) is kept at a low level when it is important not to make a mistake of rejecting a true hypothesis.
Beta (β) is kept at a low level if it is important not to accept a false hypothesis.
Z-value (normal distribution) corresponding to the reliability level and power of the test
α or β / Z-value (two-tailed) / Z-value (one-tailed)
.20 / 1.28 / 0.84
.10 / 1.64 / 1.28
.05 / 1.96 / 1.64
.025 / 2.24 / 1.96
.01 / 2.58 / 2.33
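Z-values for any α or β can be computed from the standard normal distribution instead of being read off a table. The sketch below uses Python's standard library; the helper name `z_value` is illustrative. A two-tailed value puts half of the probability in each tail, while a one-tailed value puts all of it in one tail.

```python
from statistics import NormalDist

def z_value(p, two_tailed):
    """Standard normal z-value that leaves probability p in the tail(s)."""
    tail = p / 2 if two_tailed else p
    return NormalDist().inv_cdf(1 - tail)

print("p      two-tailed  one-tailed")
for p in (0.20, 0.10, 0.05, 0.025, 0.01):
    print(f"{p:<6} {z_value(p, True):<11.2f} {z_value(p, False):.2f}")
```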
- Effect Size – the magnitude of association one would like to test in the sample.
- Is the value of clinical importance?
- Treatment of Parasitism: 2% effect vs 10%
- Type of Alternative Hypothesis
– Two-sided: P1 ≠ P2
– One-sided: P1 > P2 or P1 < P2
- Variability of the Parameter
- Type of Outcome
– Qualitative - proportion
– Quantitative; Continuous – mean, std. dev.
- Data Analysis Plan
– Multivariate vs Univariate
Other Non-Statistical Considerations
• Money
• People
• Fast
• Time
Determining the Sample Size
• Using the Formula
• Using Computer Software e.g. Epi-Info
• Using the Sample Size Table
Example 1
Slovin’s Formula
A researcher would like to determine the research capability of graduate students in SUCs in Region I. There are four SUCs in Region I offering graduate studies. Let us consider the hypothetical data on the number of graduate students in the four SUCs. Let us determine the sample size to be used in the study.
SUCs / Population (N)
MMSU / 400
DMMMSU / 800
UNP / 600
PSU / 700
Total / 2500
Steps:
Determine the population (N = 2500)
Using a margin of error (e) of 0.03, determine the sample size using the formula given below
\[ n = \frac{N}{1 + Ne^{2}} = \frac{2500}{1 + 2500(0.03)^{2}} \approx 769 \]
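The same computation can be written as a short function, shown here only as a sketch; the function name `slovin` is illustrative.

```python
def slovin(N, e):
    """Slovin's formula: n = N / (1 + N * e^2)."""
    return N / (1 + N * e ** 2)

N = 2500   # total graduate students in the four SUCs
e = 0.03   # margin of error
n_slovin = slovin(N, e)
print(round(n_slovin, 2), round(n_slovin))
```

The denominator is 1 + 2500(0.0009) = 3.25, so the unrounded result is a little above 769.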
Example 2
Estimation of Population Proportion
A public health parasitologist wishes to conduct a study to estimate the prevalence of parasitism among Filipino children 1-5 years old. How many subjects should be included in his study if the prevalence in a related study is 70% and the desired precision and reliability for his study are ± 3% and 95%, respectively?
Given the following specifications:
• Proportion, P = .70; Q = 1-P = .30
• Precision, d = .03
• Confidence Level = 95%; Z = 1.96
\[ n = \frac{Z^{2}PQ}{d^{2}} = \frac{(1.96)^{2}(.70)(.30)}{(.03)^{2}} = 896.37 \text{ subjects} \]
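The computation above can be sketched in code as follows. The function name is illustrative, and rounding up to the next whole subject is a common convention rather than something stated in the notes.

```python
import math

def sample_size_proportion(p, d, z=1.96):
    """n = Z^2 * P * Q / d^2, with Q = 1 - P."""
    q = 1 - p
    return z ** 2 * p * q / d ** 2

n_prop = sample_size_proportion(p=0.70, d=0.03, z=1.96)
print(round(n_prop, 2))   # unrounded sample size
print(math.ceil(n_prop))  # rounded up to whole subjects
```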
Example 3
Hypothesis Testing of Diff. Between Two Means