PART C - SAMPLING
Understanding Research Methods by Mildred Patten
SAMPLING (Biased or Unbiased)
Infer or generalize to population from samples
Define population
Draw sample or conduct census
Poor sampling=poor inferences (two main concerns)
Sample size
Sample selection method (biases)
Unbiased (every pop element has equal chance to be included – names in a hat)
Common biased methods (fail to identify all population members, convenience samples, volunteers)
Simple Random Sampling
Simple random sampling requires names in a hat OR use of a table of random numbers
Each population member must be numbered with the same number of digits; use a random starting place in the table
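For illustration, a minimal sketch of a simple random draw in Python; the population names and sample size here are invented for the example:

```python
import random

# Hypothetical population, numbered with the same number of digits
population = [f"member_{i:03d}" for i in range(1, 501)]

random.seed(42)                           # fixed seed so the draw is reproducible
sample = random.sample(population, k=50)  # every member has an equal chance
print(sample[:5])
```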
Random samples are subject to error
Called random sampling error (when the sample differs from the population)
Sample size affects random sampling error
Increase sample size to decrease sampling error
Bias is the result of NONRANDOM or SYSTEMATIC errors
Sample size does not affect biases; biases are the result of something the researcher did
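A small simulation (with a made-up population of incomes) illustrates both points: random sampling error shrinks as the sample grows, while a biased selection method stays off target no matter how large the sample:

```python
import random
import statistics

random.seed(0)
# Hypothetical population with a known mean for comparison
population = [random.gauss(50_000, 12_000) for _ in range(100_000)]
true_mean = statistics.mean(population)
ranked = sorted(population)               # used to mimic a biased draw below

for n in (30, 300, 3_000):
    random_err = statistics.mean(random.sample(population, n)) - true_mean
    biased_err = statistics.mean(ranked[-n:]) - true_mean  # e.g., volunteers from the top end only
    print(f"n={n:5d}  random error={random_err:9.1f}  bias error={biased_err:9.1f}")
```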
SYSTEMATIC SAMPLING (or Nth sampling)
Use a randomly ordered list if possible (alphabetical is sometimes used)
Select a random starting point
N is equal to the number in the population divided by the number in the desired sample (population size ÷ sample size)
Make sure to go completely through the list so all members have an equal chance of selection
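A sketch of Nth (systematic) sampling, using the made-up numbers from the review question below (30,000 students, desired sample of 300, so N = 100):

```python
import random

population = [f"student_{i}" for i in range(30_000)]   # hypothetical list
sample_size = 300
interval = len(population) // sample_size              # N = 30,000 / 300 = 100

random.seed(7)
start = random.randrange(interval)                     # random starting point
sample = population[start::interval]                   # go completely through the list
print(len(sample))                                     # 300
```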
Stratified Random Sampling
Random sampling is unbiased, but sampling error can occur
To be more precise and reduce sampling error, sometimes use stratified sampling
Population is divided into relevant strata (multiple strata are often used such as age, sex, location)
Usually proportional – each stratum’s share of the sample matches its share of the population
Stratification ensures small subgroups are correctly represented in the sample
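A proportional stratified draw might look like this sketch (the strata and their sizes are invented for the example):

```python
import random

random.seed(1)
# Hypothetical strata: population members tagged by location
strata = {
    "urban":    [f"u{i}" for i in range(6_000)],
    "suburban": [f"s{i}" for i in range(3_000)],
    "rural":    [f"r{i}" for i in range(1_000)],
}
population_size = sum(len(m) for m in strata.values())
sample_size = 100

sample = {}
for name, members in strata.items():
    # proportional allocation: each stratum's share of the sample
    # matches its share of the population
    k = round(sample_size * len(members) / population_size)
    sample[name] = random.sample(members, k)

print({name: len(s) for name, s in sample.items()})  # urban 60, suburban 30, rural 10
```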
Other Methods Of Sampling
Cluster – randomly select groups, not individuals
Beware: clusters tend to be internally homogeneous, so a large number of clusters and stratification may be needed
Purposive – select those with desired info
Not random and generalizations are dangerous
Snowball – to locate subjects hard to find (addicts)
Bias is presumed in these, but best available
Multistage – stratify by urban, suburban, rural counties (clusters), then randomly select houses and individuals within them
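A sketch of the cluster/multistage idea, with invented counties and households: select whole clusters at random first, then individuals within them:

```python
import random

random.seed(2)
# Hypothetical frame: counties (clusters), each containing households
counties = {f"county_{c}": [f"house_{c}_{h}" for h in range(200)] for c in range(50)}

chosen = random.sample(list(counties), 5)       # stage 1: random clusters, not individuals
sample = [house
          for county in chosen
          for house in random.sample(counties[county], 20)]  # stage 2: within clusters
print(len(sample))                              # 5 clusters x 20 households = 100
```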
Topics 20-23 REVIEW QUESTIONS
What is the first step in sampling?
What is the best way to draw an unbiased sample?
Identify the following sampling methods:
JC, Kingsport, and Bristol all have 30,000 people in them and I draw a sample of 66 customers from each city
I randomly select 15 businesses in each of the three cities and send them surveys to give to their employees
I get a list of all 30,000 students at UT and select every 100th student to get a sample of 300 students
I ask everyone I know in chemotherapy to provide the name of someone else who is also receiving chemotherapy so I can interview them
Sampling and Demographics
Demography – study of people and populations
Characteristics of participants (age, sex, income, marital status, etc.)
Describe participants so reader can judge usefulness of results
Can compare to population to see if representative
Can track response rates by demographics to see if bias in responding
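One way to make that comparison, with made-up demographic percentages (the 5-point flag threshold is arbitrary):

```python
# Hypothetical age breakdowns (proportions) for the population and the sample
population_pct = {"18-29": 0.22, "30-49": 0.35, "50-64": 0.25, "65+": 0.18}
sample_pct     = {"18-29": 0.30, "30-49": 0.34, "50-64": 0.22, "65+": 0.14}

for group, pop in population_pct.items():
    diff = sample_pct[group] - pop
    flag = "  <-- check for bias in responding" if abs(diff) > 0.05 else ""
    print(f"{group:6s} pop={pop:4.0%} sample={sample_pct[group]:4.0%} diff={diff:+4.0%}{flag}")
```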
Sample Size
Elimination of sample bias is most crucial element
Adequacy of sample size is second most crucial
Increasing sample size increases precision
Meaning results vary little from sample to sample
Larger is better, but there is a diminishing return
Larger sample does nothing to eliminate bias
SAMPLE SIZE: A closer look
How many needed depends on:
Money and time available (more subjects=more $/time)
Occurrence of the effect being measured (rare=more subjects)
Variability in population (much variation=more subjects)
Size of difference between groups (small diff=more subjects)
Pilot study (20-100 subjects) gives insight including expected return rates
It’s relative, but 1500 is the upper limit, with about 30 the lower limit
See Table 2, Recommended Sample Sizes
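The diminishing return is visible in the standard margin-of-error approximation for a sample proportion (a common survey rule of thumb, not from the text): error shrinks only with the square root of the sample size, which is why samples beyond roughly 1500 buy little extra precision:

```python
import math

# 95% margin of error for a sample proportion, worst case p = 0.5
for n in (30, 100, 400, 1_500, 6_000):
    moe = 1.96 * math.sqrt(0.25 / n)
    print(f"n={n:5d}  margin of error = +/-{moe:.1%}")
```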
Topics 24-26 REVIEW QUESTIONS
How can you increase your sampling accuracy?
I want to determine TC students’ attitudes about the on-line library and ask my students to complete a survey. What type of sample is this, and are there any reasons not to trust it?
How many subjects is the absolute minimum needed to have faith in results? What about the maximum?
PART D - INSTRUMENTATION
Introduction To Validity
Valid instrument measures what it is supposed to
performs the function it purports to
Tests/instruments are valid for a particular purpose
Must first state the purpose of the measure
It’s a matter of degree – how valid is it?
Reasons For Invalidity
Testing only a sample of the construct studied (some samples better than others)
Some constructs/traits are elusive
how to measure honesty/cheerfulness – hard to get the full essence
Constructs are hard to quantify, BUT replication is impossible if you do not
Judgmental Validity
Content validity – Judgments on appropriateness of content (used especially with achievement tests)
3 Parts to content validity
Broad sample content (cover entire content)
Important material emphasized (more test items on it)
Different skill levels covered (knowledge, comprehension, application, analysis, synthesis, evaluation)
Face validity – judge whether, on its face, the instrument measures what it’s intended to
Occasionally intentional low face validity (to hide the purpose of the research)
Empirical Validity
Criterion-related validity – comparisons between a measure and some criterion
Predictive validity – criterion occurs in future
Concurrent validity – criterion occurs now
Validity coefficient (0 to 1.0); closer to 1.0 is better (means high/high and low/low relationship)
- Also can go to -1.0 (a high/low and low/high relationship)
- Rarely are validity coefficients 1.0, too much variation in complex constructs
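A validity coefficient is just a correlation; here is a sketch with invented aptitude scores and later GPAs (a predictive-validity check), using statistics.correlation from Python 3.10+:

```python
import statistics

# Hypothetical predictive-validity check: aptitude scores vs. later GPA
aptitude = [520, 580, 610, 640, 700, 720, 760]
gpa      = [2.4, 2.9, 2.7, 3.1, 3.3, 3.6, 3.5]

r = statistics.correlation(aptitude, gpa)  # Pearson r = the validity coefficient
print(f"validity coefficient = {r:.2f}")   # closer to 1.0 = stronger high/high, low/low pattern
```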
JUDGMENT-EMPIRICAL (Construct Validity)
Construct validity uses both judgment and empirical methods
Measure hypothetical constructs (not concrete, constructed)
Cannot see, touch, feel, but can see indicators of them (honesty, love, fear, depression)
Infer construct existence by observing collection of indicators
To test, correlate the measure with some other measure of the construct, or check whether expected effects of the construct are present – e.g., correlate the depression scale you develop with a happiness scale; also correlate your scale with a count of smiles/hour (each is an indirect measure of validity, so use a series of tests to establish construct validity)
In establishing validity of scale or test, look at all measures
Judgment – content/face
Empirical – predictive/concurrent
Judgment/Empirical – construct
Topics 27-30 REVIEW QUESTIONS
Establishing validity requires what?
Name and explain the four types of validity:
RELIABILITY (and its relationship to validity)
Reliable = consistent (take two measures and compare them to see if consistent)
Highly reliable measures may be valid OR not
Validity is most important, but to be useful a measure must be both valid AND reliable
Can have reliability without validity, but not validity without reliability
Text example - shooting at targets
Measures of Reliability
Compare two measures and calculate a reliability coefficient
Correlation between two quantitative measures
Closer to 1.0 the better (.80 is high; .50 can be useful)
Test/retest (test same group with same instrument at two times – needs to be enduring trait)
Inter-rater/observer (consistency between 2 raters)
Parallel or equivalent forms (2 forms of instrument)
Internal consistency (split-half; intra-rater; Cronbach’s alpha)
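For example, a test/retest reliability coefficient is simply the correlation between the two administrations; the scores below are invented (statistics.correlation needs Python 3.10+):

```python
import statistics

# Hypothetical scores for the same group tested at two times
test   = [12, 18, 15, 22, 9, 17, 20, 14]
retest = [13, 17, 16, 21, 10, 18, 19, 15]

r = statistics.correlation(test, retest)     # reliability coefficient
print(f"test/retest reliability = {r:.2f}")  # .80 is high; .50 can be useful
```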
Topics 31-33 REVIEW QUESTIONS
What does reliability mean?
Name 4 ways to establish reliability:
Can you have reliability without validity?
Can you have validity without reliability?
Norm-Referenced v. Criterion-Referenced Tests
NRTs compare individual performance with group (TCAP)
Items are intentionally of medium difficulty
Shows how a local group differs from the norm
CRTs measure whether an individual has met performance standards
Item difficulty of little concern; instead want examinee to meet a set level of performance
Describes what examinee knows and doesn’t know
Measures of Optimum Performance
Achievement tests – measure knowledge and skills acquired (measure effectiveness of instruction to see if objectives met)
Objectively scored - multiple choice (provides snapshot of achievement)
Subjectively scored - essays, performance, products (for reliability, need a standard scoring system such as a checklist or a rating scale: Excellent/Good/Fair/Poor)
Aptitude tests – predict achievement (SAT/college)
Intelligence tests – predict general achievement
Measures ofTypical Performance
Examinees’ best performance is measured with achievement/aptitude/intelligence tests
Typical performance desired for personality traits (attitudes/interests/disposition)
Social desirability bias is a problem (and Hawthorne or guinea pig effect)
Anonymous surveying helps
Direct unobtrusive observation useful
Projective techniques can be used (Ink blots; Most employees…); Note that these require expertise
Likert is the most common attitude scale (Strongly Agree to Strongly Disagree)
One statement per topic; multiple statements cover all components of the construct; mix neg/positive statements
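A sketch of scoring a Likert scale (items and responses invented): negatively worded statements are reverse-coded before totaling:

```python
# Hypothetical 5-point responses (1 = Strongly Disagree ... 5 = Strongly Agree)
responses = {"q1": 4, "q2": 5, "q3": 2, "q4": 4, "q5": 1}
negatively_worded = {"q3", "q5"}          # mixed in with the positive statements

total = sum((6 - score) if item in negatively_worded else score  # reverse-code negatives
            for item, score in responses.items())
print(f"attitude score: {total} out of {5 * len(responses)}")    # 22 out of 25
```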
Topics 34-36 REVIEW QUESTIONS
Is the driving part of the test you take to get a driver’s license a norm- or criterion-referenced test?
If I want to find out how much people cheat on their taxes, what type of bias might occur in doing surveys?
When using a Likert scale, several questions are asked about one topic, then the answers are totaled. Why is this better than asking just one question?