8 – Sampling
Part 1: Overview of sampling
What is sampling?
* Sampling is about selecting, without bias and with as much precision as resources allow, the items or elements from which or from whom we wish to collect data.
In market and social research projects these elements are usually people, households or organizations, although they may be places, events or experiences.
* Developing a sampling plan is one of the most important procedures in the research process. It involves:
- Defining the target population
- Choosing an appropriate sampling technique
- Deciding on the sample size
- Preparing sampling instructions
- Providing details of the sampling frame
Defining the population
- In a research context, population refers to the “universe of enquiry”: the people, organizations, events or items that are relevant to the research problem.
- How the population of interest is defined depends on the research problem and objectives.
- It is important to define the population of interest as precisely as possible.
Example: if we say “older people”, we need to be more precise and specify the age range and living circumstances.
Target and survey populations
* Target population: the population from which results are required
* Survey population: the population actually covered by the research
- Under ideal conditions the two are the same, but for practical reasons they may not be.
Examples:
People or organizations in remote areas are difficult to reach with a face-to-face survey, and so may not be included in the survey population.
In a survey of older people’s health, it may be difficult to get permission to interview those living in sheltered accommodation.
Thus, if there is a difference between the target and survey populations, it should be made clear to all involved with the research, to avoid misrepresentation of the research and its findings.
Census or sample?
– Census: collecting data from every member or element of the population
– Sample: collecting data from a representative subset of the population
A census is feasible when:
The population is small and accessible enough, e.g. the employees of an organization.
It is necessary to collect data from all elements of a population, e.g. when the research is about changes in working practices, it is important to survey all employees’ attitudes and opinions.
Drawbacks of a census
– It may not be feasible, and is time-consuming and costly, when the population of interest is large
– Risk of a high level of non-response (which leads to data that are less representative than might be achieved with a well-designed sample)
– Risk of a high level of non-sampling error
In the end, a census may deliver poorer-quality data than a well-designed sample.
Sampling techniques
Two categories:
1- Random or probability sampling: each element/member of the population has a known and non-zero chance of being selected. The person choosing the sample has no influence on the elements selected.
2- Purposive or non-probability sampling: we do not know what probability each element has of being selected, because the person choosing the sample may, consciously or unconsciously, favour or select particular elements.
Choosing a sampling technique
For qualitative studies, which involve small sample sizes, a non-probability technique is the most suitable.
For quantitative studies, the choice of technique is influenced by the nature and aims of the study and by practical concerns, including:
– Nature and accessibility of population
– Availability of suitable sampling frame
– Constraints of time and budget
Sample size
Sample size: the number of elements included in the sample. It is important in terms of the precision of the sample estimates, but on its own it does not guarantee that the results will be accurate or unbiased.
Choosing a sample size
The way in which the sample is chosen is affected by the sampling technique used and the sampling frame.
The choice of sample size depends on:
1- The nature and purpose of the research
2- The importance of the decision to be made on the basis of the results
3- The analysis required: the sample must be large enough to:
- Provide evidence with a degree of confidence in the findings
- Support the analysis needed (for example, comparison of sub-groups within the sample)
4- Time and budget constraints
* If the required level of precision of the sample estimates, or the required confidence level and confidence interval, is known, the sample size needed to achieve them can be calculated.
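As a rough illustration of that last point, the standard textbook formulas for a simple random sample from a large population can be sketched in Python. The function names, the default z = 1.96 (95% confidence) and the example figures are assumptions made for illustration; they do not come from these notes.

```python
import math

def sample_size_for_proportion(margin_of_error, p=0.5, z=1.96):
    """Sample size needed to estimate a proportion to within +/- margin_of_error.

    Assumes a simple random sample from a large population.
    p = 0.5 is the most cautious choice (it gives the largest sample size).
    """
    return math.ceil((z ** 2) * p * (1 - p) / margin_of_error ** 2)

def sample_size_for_mean(margin_of_error, std_dev, z=1.96):
    """Sample size needed to estimate a mean to within +/- margin_of_error.

    std_dev is a prior guess at the population standard deviation
    (for example from a pilot study).
    """
    return math.ceil((z * std_dev / margin_of_error) ** 2)

# Hypothetical requirements: +/- 5 percentage points on a proportion,
# and +/- 5 currency units on a mean with an assumed standard deviation of 40.
print(sample_size_for_proportion(0.05))
print(sample_size_for_mean(5, 40))
```

Halving the margin of error roughly quadruples the sample size required, which is why time and budget constraints weigh so heavily on this choice.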
Checking the sample achieved
During and after fieldwork the sample is monitored and checked to ensure that the units and elements selected meet the sample requirements.
If any inconsistency is found (such as high rates of non-response, or under- or over-representation of some elements), further sampling and fieldwork will be required.
Part 2: Ideas behind random sampling
Random sampling:
Random or probability sampling is where each element of the population is drawn at random and has a known chance of being selected. The person choosing the sample has no influence on the elements selected.
For random selection to produce a truly representative sample:
1- The sample size must be at least 100
2- The population must be well mixed
3- The sampling frame must be complete, accurate and up to date
4- Non-response must be zero, i.e. all those selected must take part in the research
Terminology
Population parameter: the value we want to know about in the population.
E.g. an average or a proportion, such as the proportion (%) of 18-24 year olds who drink brand A.
Sample statistic: the corresponding figure derived from the sample, which is an estimate of the population parameter.
E.g. the proportion who drink brand A in the sample, used as an estimate of the proportion in the population.
Sampling distribution of the mean:
E.g. you are interested in the weekly food spend of single-person households in the UK.
You select a random sample from the population of single-person households and, from the sample data, note the average (mean) weekly spend.
You select another sample and note its mean, and continue the process many times, then plot the mean weekly spend on food from each sample on a graph.
At the end, you have a graph with the bell-shaped curve of an (approximately) normal distribution, known as the “sampling distribution of the mean”, as in the following figure:
[Figure: Sampling distribution of the mean]
Sampling variability
The graph shows that each sample does not produce the same value; a range of samples produces a range of values for the same measure. This variation is known as “sampling variability”.
But how can we know how accurate a sample estimate is? We do this by using the “standard error of the mean”.
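The repeated-sampling idea can be tried out with a small simulation. This is only a sketch: the population of weekly food spends is invented (normally distributed around 60 with a standard deviation of 15), and the function name and sample counts are assumptions, not figures from these notes.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (without replacement) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population of 10,000 weekly food spends.
pop_rng = random.Random(42)
population = [pop_rng.gauss(60, 15) for _ in range(10_000)]

means = simulate_sample_means(population, sample_size=100, n_samples=500)

# Each sample gives a slightly different mean: this is sampling variability.
print("population mean:       ", round(statistics.mean(population), 2))
print("smallest sample mean:  ", round(min(means), 2))
print("largest sample mean:   ", round(max(means), 2))
print("spread of sample means:", round(statistics.stdev(means), 2))
```

Plotting `means` as a histogram would give the bell-shaped curve described above; the spread of that curve is what the standard error measures.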
Standard error of the mean
- It is a measure of the variability of the sample means (the spread of the sampling distribution).
- It is the standard deviation of the sampling distribution, and we use it to measure the accuracy of the estimate from a particular sample of size n. In practice it is estimated from the sample standard deviation (s) as SE = s / √n.
- The larger the sample size, the smaller the standard error and the more accurate the results (the bell-shaped normal curve becomes narrower).
- The less variability there is in the sample, the smaller the standard error.
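A minimal sketch of the usual estimate of the standard error of the mean, the sample standard deviation divided by the square root of the sample size. The function name and the data values are made up for illustration.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimate the standard error of the mean as s / sqrt(n).

    statistics.stdev gives the sample standard deviation (n - 1 denominator).
    """
    return statistics.stdev(sample) / math.sqrt(len(sample))

small_sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements
larger_sample = small_sample * 4           # same spread of values, four times the size

print(standard_error_of_mean(small_sample))
print(standard_error_of_mean(larger_sample))
```

The second figure is smaller than the first: with a similar spread of values, a larger sample gives a smaller standard error.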
Confidence intervals
- Confidence intervals are based on the properties of the normal distribution.
- The curve is symmetric: 50% of values lie below the mean and 50% above it.
- If we divide the curve into segments, then:
– 68% lie within 1 SD either side of the mean
– 95% lie within 2 SD (more precisely 1.96) either side of the mean
– 99% lie within 2.6 SD (more precisely 2.58) either side of the mean
– 99.7% lie within 3 SD either side of the mean
Example: in a survey of weekly food spend, we found a mean of €250.
- How confident can we be that the population value is €250?
- The figure comes from a probability sample, so there will be some sampling variation.
- The limits of the range within which the population value is likely to fall are called confidence limits (the sample mean plus or minus x standard errors).
- The range between the confidence limits is the “confidence interval”; it tells us the margin of error of the estimate.
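To make the €250 example concrete, here is a hedged sketch of a 95% confidence interval calculation. The notes give only the mean; the standard deviation of 50 and the sample size of 400 are assumed values chosen purely for illustration, as is the function name.

```python
import math

def confidence_interval(sample_mean, sample_sd, n, z=1.96):
    """Return (lower, upper) confidence limits for a mean.

    z = 1.96 corresponds to 95% confidence under the normal distribution.
    """
    standard_error = sample_sd / math.sqrt(n)
    margin_of_error = z * standard_error
    return sample_mean - margin_of_error, sample_mean + margin_of_error

# Mean weekly food spend from the example; sd and n are assumptions.
lower, upper = confidence_interval(sample_mean=250, sample_sd=50, n=400)
print(f"95% confidence interval: {lower:.1f} to {upper:.1f}")
```

With these assumed figures the standard error is 2.5, so the margin of error is 1.96 × 2.5 = 4.9 either side of the sample mean.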
Part 3: Sampling techniques
* There are three main approaches (techniques or methods) to sampling:
1- Probability or random sampling
2- Semi-random sampling
3- Non-probability sampling
1- Probability or random sampling
Random or probability sampling: each element/member of the population has a known and non-zero chance of being selected. The person choosing the sample has no influence on the elements selected.
* Random sampling approaches include:
1- Simple random sampling
2- Systematic random sampling
3- Stratified random sampling
4- Cluster and multi-stage sampling
Sampling or probability theory underpins random sampling.
1- Simple random sampling, or unrestricted random sampling
- Two main methods: the lottery method and the random number method
- Simple random sampling works like this:
Imagine we have a population of 1,000 (N = 1,000).
The population may consist of people, organizations or whatever else is relevant to the research investigation.
Before making any selections from the population, we know that each item in it has a 1 in 1,000 chance of being selected.
Once an item is selected as part of the sample, we do not return it to the population. This is known as “sampling without replacement”.
Sampling without replacement is used to make sure that no item (e.g. a person or organization) is chosen more than once.
Sampling without replacement makes this approach different from the sampling associated with probability theory, which assumes sampling with replacement.
In market and social research surveys we do not usually interview the same person twice.
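The random number method can be sketched with Python's standard library, whose `random.sample` draws without replacement. The population of 1,000 numbered items mirrors the example above; the sample size of 50, the seed and the function name are assumptions for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items at random, without replacement, from the population."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # no item can be chosen more than once

population = list(range(1, 1001))      # items numbered 1 to N, with N = 1,000
sample = simple_random_sample(population, n=50, seed=1)

print(len(sample), "items selected")
print("any item selected twice?", len(set(sample)) != len(sample))
```

Fixing the seed only makes the example repeatable; in real fieldwork the draw would not be seeded with a known value.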
2- Systematic random sampling
- The items in the population are numbered from 1 to N and arranged in random order.
- We decide on the sample size we need (n) and work out the sampling interval (k) by dividing the population size N by the sample size n.
- We select every kth item (k = N/n) from the randomized list of the population.
For example, suppose we have a population of 6,000 and we need to draw a sample of 200.
- We calculate the sampling interval to be 30 (6,000/200), start at a random point between 1 and 6,000 (N), and select every 30th item from the list (returning to the top of the list when we reach the end) until we have the required sample size of 200.
It is called systematic because a system, the sampling interval, is used to select the sample; the randomly chosen starting point on the list determines which items are selected.
So the selection of each item depends on the previous item selected, whereas in simple random sampling there is no such dependency.
If the list is randomized, the results will be very similar to those produced by a simple random sample. If the list is ordered, for example employees in order of staff grade, a systematic sample may produce a better sample, because it will ensure a spread of sample units across the list.
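The 6,000 / 200 example can be sketched as follows. This version starts at a random position anywhere on the list and wraps round to the top when it reaches the end, which is one reading of the procedure described above; a common alternative is to pick the random start within the first interval only. The function name and seed are assumptions.

```python
import random

def systematic_sample(items, n, seed=None):
    """Select n items by taking every k-th item from a random starting point.

    k is the sampling interval N // n. Selection wraps round to the start
    of the list so that the full sample size is always reached.
    """
    N = len(items)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the population size")
    k = N // n                                  # sampling interval
    start = random.Random(seed).randrange(N)    # random starting position
    return [items[(start + i * k) % N] for i in range(n)]

population = list(range(1, 6001))               # N = 6,000
sample = systematic_sample(population, n=200, seed=3)

print("sampling interval:", len(population) // 200)
print("first five selections:", sample[:5])
```

Only the starting point is random; once it is fixed, every other selection is determined by the interval.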
Limitations:
* The systematic approach may not deliver a good sample:
- If there is a pattern in the list, or it is sub-divided into categories, e.g. users and non-users of a service
- If items on the list are grouped in such a way that some groups may be missed out or under-represented
* For practical reasons, it may not be possible to use either simple or systematic random sampling: in many social and market research situations a list of the target population is not available, or the size of the population makes it difficult to number all of the items.
Computerized lists and databases make this less of a problem.
3- Stratified random sampling
One of the most widely used methods of sampling in research.
When sampling a population for a market or social research project, it is very likely that we know something about that population that can be used to improve the quality of the sample and the accuracy of the results derived from it.
Example: in a population of employees, we may know which staff grade each employee holds. We can use this information to make sure that employees from each staff grade are properly represented in the sample. To do this we divide the population into the relevant groups or “strata”, e.g. those who belong to grade 1, grade 2, and so on.
The choice of “stratification factor” depends on what we believe is most relevant to the research objectives.
From each stratum we select the required sample size, using simple random or systematic random sampling.
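The staff-grade example can be sketched like this. The sketch assumes proportionate stratification (the same sampling fraction in every stratum), which is only one possible design, and it uses simple random sampling within each stratum. The employee data, the 10% fraction and the function name are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Take the same fraction, at random, from each stratum of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)   # group items by stratum

    sample = []
    for key in sorted(strata):
        members = strata[key]
        n_in_stratum = round(len(members) * fraction)
        sample.extend(rng.sample(members, n_in_stratum))
    return sample

# Hypothetical workforce: (employee id, staff grade)
employees = (
    [(i, 1) for i in range(600)]
    + [(i, 2) for i in range(600, 900)]
    + [(i, 3) for i in range(900, 1000)]
)

sample = stratified_sample(employees, stratum_of=lambda e: e[1], fraction=0.1, seed=5)
print(Counter(grade for _, grade in sample))
```

Each grade appears in the sample in the same proportion as in the workforce, which a simple random sample of the same size would only achieve approximately.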
4- Cluster and multi-stage sampling
- Cluster sampling is where the population is divided into groups: a national population, for example, can easily be divided into administrative areas, states or regions.
For example: organizations have departments; we can use these clusters in a sampling strategy.
- Multi-stage sampling:
First stage: select a sample of groups, such as departments. These are known as primary sampling units (PSUs).
Second stage: select a sample from within each selected group.
If the units within each of the PSUs are clustered together, the sample is known as a cluster sample. However, a multi-stage sample does not have to be a cluster sample.
Advantages: it is a cost-effective approach, because fieldwork is concentrated in the selected groups rather than widely spread, as it may be with simple or systematic random sampling.
Disadvantages:
- The standard error is greater than for a simple or stratified random sample.
- Because of the larger standard error, sample estimates may be less precise than those from a single-stage probability sample.
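The two stages can be sketched as follows, using departments as the primary sampling units. The department and employee names, the numbers selected at each stage and the function name are hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters (PSUs) at random. Stage 2: pick members within each."""
    rng = random.Random(seed)
    chosen_clusters = rng.sample(sorted(clusters), n_clusters)

    sample = []
    for name in chosen_clusters:
        members = clusters[name]
        take = min(n_per_cluster, len(members))   # small clusters are taken whole
        sample.extend((name, member) for member in rng.sample(members, take))
    return sample

# Hypothetical organization: 10 departments with 20 employees each.
departments = {
    f"dept_{d}": [f"emp_{d}_{e}" for e in range(20)]
    for d in range(10)
}

sample = two_stage_sample(departments, n_clusters=3, n_per_cluster=5, seed=7)
print(len(sample), "employees from", len({dept for dept, _ in sample}), "departments")
```

Taking the same number from every selected cluster gives equal selection chances only when the clusters are the same size, as they are here; with unequal clusters a design would normally adjust for this.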
Sampling frame
* Sampling frames are used to draw random samples. A sampling frame can be a database, a list, a record or a map: something that identifies all the elements of the target population.
Example: the Electoral Register
- To be effective as a sampling frame, i.e. to allow you to draw a representative sample of the population, it must be accurate, complete and up to date.
- To be practical, a frame must be easily available, easy to use, and contain the information needed to find the elements listed on it.
- Problems with sampling frames arise as a result of:
1- Missing elements: elements that belong to the population but do not appear on the sampling frame. Missing elements are difficult to detect, but an incomplete frame means that the sample derived from it will not be representative of the population. To check for them we must compare the frame with another source of information.
2- Clusters of elements: a sampling frame may list elements in groups or clusters rather than individually. This makes it hard to identify similarities or differences between the elements within a cluster, and the elements may not have an equal chance of selection when we choose randomly from the clusters.
3- Blanks or foreign elements: elements that are included in the sampling frame but do not belong to the population, such as out-of-date entries that should no longer be included, e.g. a retired employee. When one is selected, it has to be replaced.
4- Duplication: elements that appear more than once. This is easy to deal with electronically.
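Three of these checks are easy to do electronically, as this small sketch suggests: removing duplicates, dropping foreign elements, and comparing the frame with a second source to spot missing elements. The names, the eligibility rule and the function names are invented for illustration; here foreign elements are simply removed from the frame before sampling.

```python
def clean_frame(frame, is_eligible):
    """Remove duplicates and foreign (ineligible) elements, keeping list order."""
    seen = set()
    cleaned = []
    for element in frame:
        if element in seen or not is_eligible(element):
            continue
        seen.add(element)
        cleaned.append(element)
    return cleaned

def missing_from_frame(frame, reference_list):
    """Elements found in another source of information but not on the frame."""
    return sorted(set(reference_list) - set(frame))

# Hypothetical staff list used as a sampling frame.
frame = ["ann", "bob", "ann", "carl (retired)", "dee"]
cleaned = clean_frame(frame, is_eligible=lambda name: "retired" not in name)

payroll = ["ann", "bob", "dee", "eve"]           # hypothetical second source
print(cleaned)
print(missing_from_frame(cleaned, payroll))
```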
2- Semi-random sampling
It is a way of reducing the time and cost involved without giving the interviewer a greater role in selecting locations, households or individuals (to avoid selection bias).
The semi-random sampling procedure is known as “random route sampling” or “random walk”.
Selection starts with a multi-stage stratified random sample, and each interviewer is then given one selected address at which to conduct the first interview.
Along with the starting address, the interviewer is given a set of instructions for selecting subsequent addresses at which to interview.
This reduces the cost and fieldwork time associated with the call-backs otherwise needed to achieve an interview.
3- Non-probability sampling
It is not always possible or feasible to use a probability sampling method, for example because of:
- Cost and time limitations
- An unavailable sampling frame
- The type of research, which may not require it
In these cases we use a non-probability sampling method, in which:
- The interviewer or observer has some control over the selection of the sample elements.
- We do not know the chance of any item being selected.
- We cannot use probability theory to make inferences about the population based on the sample.
- We cannot calculate the accuracy of the sample estimates.
* Quota sampling is the most commonly used non-probability sampling method and is employed widely in market research.
Information on key characteristics of the target population (derived from primary and secondary sources) is used to design a sampling framework (the quota) that reflects the make-up of the population on these key characteristics.
Quotas are allocated to interviewers, and the interviewer’s task is to select individuals/items that fit the characteristics set out in the quota.
The quality of a quota sample depends on:
- The degree of randomness with which the interviewer makes selections
- How accurate and up to date the information is on which the quota controls are based
* Quota samples compared with probability samples:
- A well-designed probability or random sample should be representative of the target population in all respects (because of randomness).
- A well-designed quota sample may only be representative of the population in terms of the characteristics specified in the quota; it may be unrepresentative in other ways.
- With probability samples we are able to estimate representativeness; with quota samples we are not, nor can we gauge the possible biases that exist.
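How an interviewer fills a quota can be sketched as below: people are accepted in the order they are encountered until each quota cell is full. The age bands, quota sizes, respondents and function name are all hypothetical.

```python
def fill_quotas(respondents, quota_cell_of, quotas):
    """Accept respondents in encounter order until every quota cell is full.

    Returns the accepted respondents and the number still needed per cell.
    """
    remaining = dict(quotas)
    accepted = []
    for person in respondents:
        cell = quota_cell_of(person)
        if remaining.get(cell, 0) > 0:       # this cell still needs people
            accepted.append(person)
            remaining[cell] -= 1
        if not any(remaining.values()):      # all quotas met, stop interviewing
            break
    return accepted, remaining

# Hypothetical people approached, in order: (name, age band)
approached = [
    ("a", "18-34"), ("b", "18-34"), ("c", "35+"),
    ("d", "18-34"), ("e", "35+"), ("f", "35+"),
]
accepted, remaining = fill_quotas(
    approached, quota_cell_of=lambda p: p[1], quotas={"18-34": 2, "35+": 2}
)
print(accepted)
print(remaining)
```

The result matches the quota on age band by construction, but who ends up in the sample depends on who the interviewer happened to approach first, which is where the unmeasurable bias comes from.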
Qualitative research and sampling
* Non-probability sampling techniques are used in qualitative research:
- Samples are typically small
- Probability theory does not apply
- But representativeness is still an important goal
- Selecting a sample for qualitative research should be a rigorous and systematic process
* In choosing a sample for qualitative research it is important to:
- Define the target population clearly
- Define clearly the relationship the sample has with the population
The sampling process
In qualitative market research, sampling is usually referred to as recruitment, and the specially trained interviewers who carry it out are known as recruiters.
A recruitment or screening questionnaire can be used, and recruiters are asked to find individuals who match the recruitment or sample criteria.
Recruitment techniques
– Screening
– Convenience (‘lurk and grab’)
– List sampling
– Network or snowball sampling
– Piggy-backing or multi-purposing
Sample size
A rolling or dynamic approach can be taken: the findings as they emerge tell you whether you have sampled enough. You continue until you reach “saturation”, the point at which no new insights emerge.
E.g. theoretical sampling
Although sample sizes in qualitative studies are usually small, they should be large enough to:
– Provide credible evidence on the research objectives
– Allow analysis and comparison of sub-groups
Summary
Sampling: a scientific, cost-effective way of getting at the population of interest