Data Collection

Measurement

Sampling Methods

Survey Design

Designing Experiments

Copyright  1993-97 Thomas P. Sturm

Experimentation / Measurement

- A method of determining a specific value for a variable

- To have VALIDITY, must ensure that the variable used to represent the property is relevant or appropriate.

e.g. asking for height in gallons is not appropriate

- To accurately portray the characteristics of the population, use an INSTRUMENT that possesses the following characteristics:

UNBIASED

Bias is a systematic tendency to misrepresent (overstate or understate the true value) the data in some way

e.g. AGE on the survey - biased 1/2 year low

PRECISE

Lack of precision causes observed values obtained through the measurement process to be scattered some distance from their "true" value

e.g. EARNINGS LAST SUMMER - how many responses would stand up to an IRS audit for accuracy

RELIABLE

Unreliable results are those which would be quite different if the experiment/observation were made again under "identical" circumstances

e.g. PICK A NUMBER FROM 1 to 10 - how many would pick the same number again and again
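The distinction between bias and lack of precision can be sketched with a tiny simulation; the "true" value and error sizes below are invented purely for illustration:

```python
import random

# Tiny simulation contrasting BIAS with lack of PRECISION.
# The "true" value and error sizes are invented for illustration.
random.seed(0)                      # fixed seed so the sketch is repeatable
true_value = 20.0

# A biased instrument: precise (small scatter) but systematically 0.5 low,
# like reporting AGE as age at the last birthday.
biased = [true_value - 0.5 + random.gauss(0, 0.1) for _ in range(1000)]

# An imprecise instrument: unbiased on average but widely scattered,
# like a rough guess at EARNINGS LAST SUMMER.
imprecise = [true_value + random.gauss(0, 5.0) for _ in range(1000)]
```

The biased instrument's average lands near 19.5 rather than 20.0 no matter how many measurements are taken; the imprecise instrument averages out correctly but any single reading may be far off.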

Sampling

Sampling is the process by which we select the sample that we are going to measure.

The sampling itself must be done to provide an unbiased, reliable, and precise estimate of the values of the population it is intended to represent.

A CENSUS is actually an attempt to "sample" the entire population, and is generally expensive

- could be impossible (flash bulb testing??)

- could be inaccessible (homeless??)

IF you expend enough effort to measure everything, a census is the most accurate, reliable, and unbiased method

A CONVENIENCE SAMPLE is a sample of whatever is the easiest to measure

- students in a class

- what you happen to have on hand

generally the most prone to inaccuracy and unreliability, and very likely to be biased

A SELF-SELECTED SAMPLE is a sample of people who "choose themselves" to be in the survey

- phone-ins to 900 numbers

- mail-back surveys without follow-up

generally the most prone to bias, and very likely to be inaccurate and unreliable

Sampling Methods

The following sampling methods, when used with care within their limits of applicability, can produce unbiased, reliable, and precise results

SIMPLE RANDOM SAMPLE (SRS) - every member of the population has exactly the same probability of being selected

- can be hard to make the probabilities exactly equal

- can miss an accurate description of "rare" subsets

- could still be expensive
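An SRS is straightforward to sketch in code; the population of 1000 numbered tickets below is hypothetical:

```python
import random

# Simple random sample (SRS): every member of the population has the same
# probability of being selected. The 1000 numbered tickets are hypothetical.
population = list(range(1, 1001))

random.seed(42)                      # fixed seed so the sketch is repeatable
srs = random.sample(population, 50)  # 50 members, drawn without replacement
```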

STRATIFIED SAMPLE - divide the population into "strata" and then perform an SRS within each stratum

- e.g. healthy adults vs. those with a rare disease

- need to know relative sizes of the strata
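A stratified sample can be sketched as an SRS within each stratum; the stratum names and sizes below are hypothetical:

```python
import random

# Stratified sample: an SRS within each stratum separately, so the "rare"
# stratum is not missed. Stratum names and sizes are hypothetical.
strata = {
    "healthy":      list(range(0, 9900)),      # 99% of the population
    "rare_disease": list(range(9900, 10000)),  # 1% of the population
}

random.seed(1)
# 20 from each stratum: the rare group is deliberately overrepresented,
# so results must later be scaled back by the known stratum proportions.
sample = {name: random.sample(members, 20) for name, members in strata.items()}
```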

SYSTEMATIC SAMPLE - start at a random point, and then select every kth item

e.g. for a sample of 1/10th of the population at an event that issued numbered tickets, pick at random a digit from 0 to 9, and then include everyone whose ticket number ended with that digit

- could be just as expensive as an SRS
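The numbered-ticket example above can be sketched directly; the 500 tickets are hypothetical:

```python
import random

# Systematic 1-in-10 sample: pick a random last digit, then include every
# ticket whose number ends in that digit. The 500 tickets are hypothetical.
tickets = list(range(1, 501))

random.seed(7)
digit = random.randint(0, 9)                      # random starting point
sample = [t for t in tickets if t % 10 == digit]  # every 10th ticket
```

Whatever digit is drawn, exactly one ticket in ten is included.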

CLUSTER SAMPLE - pick, at random, areas or regions or groups of the population, then perform a census within each group

- least expensive of the above methods

- must have enough areas to avoid unreliability

- must carefully check results between groups for bias
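A cluster sample can be sketched as a random choice of whole groups followed by a census of each; the 25 blocks of 8 households below are hypothetical:

```python
import random

# Cluster sample: pick whole groups at random, then take a census within
# each chosen group. The 25 blocks of 8 households are hypothetical.
clusters = {i: [f"household_{i}_{j}" for j in range(8)] for i in range(25)}

random.seed(3)
chosen = random.sample(sorted(clusters), 5)           # 5 blocks at random
censused = [hh for block in chosen for hh in clusters[block]]
```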

Measurement Errors

- Reporting errors (in a 1950 survey, the average reported age of women over 40 was under 40 years old)

- Recording errors (random transcription errors)

- Unit of measurement errors (some in dollars, some in cents; some per unit, some per six-pack, some per case of 24)

Suggestion: pick a convenient unit of measurement, perhaps through the use of a consistently applied coding technique

- Processing errors (performing mathematical operations inappropriate for the scale of measurement of the data)

- Non-response errors (no response from selected groups)

- Errors in doing the sampling

- Errors in adjusting data from stratified samples

- Must accurately classify each response into the appropriate stratum

- Must know the proportion of the population actually in each stratum

- Must properly "scale back" the responses from the "overrepresented" strata to derive population statistics
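The "scaling back" step amounts to weighting each stratum's result by its true population proportion; all numbers below are hypothetical:

```python
# "Scaling back" stratified results: weight each stratum's sample mean by
# its known population proportion. All numbers here are hypothetical.
stratum_means = {"healthy": 120.0, "rare_disease": 180.0}  # from the sample
proportions   = {"healthy": 0.99,  "rare_disease": 0.01}   # known sizes

population_estimate = sum(stratum_means[s] * proportions[s]
                          for s in stratum_means)
# 120.0 * 0.99 + 180.0 * 0.01 = 120.6
```

Without this weighting, the deliberately overrepresented rare stratum would pull the population estimate far from the truth.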

Survey Design

When designing a survey, you must look ahead to the administration of the survey, the collection of results, the tabulation of results, the analysis of results, and the interpretation of results.

To do this successfully, you must ask some basic questions:

- Why am I doing the survey? What specific facts do I hope to learn more about? What variables might be used to measure those facts, and what variables might influence those facts?

- Who am I going to survey? What is my population? Am I surveying people, or doing experiments with physical objects? Can I realistically obtain the kind of sample I want from that population at reasonable cost?

- What questions will I ask? Some questions need to address the specific facts I hope to learn more about, while other questions need to "consider the source." These latter questions are called "demographic" questions. For example, if I want to learn more about pop consumption on campus, in addition to asking questions about how much pop is consumed, when it is consumed, where it is purchased, where it is consumed, diet or regular, etc., I might also want to ask demographic questions such as age, class year, sex, weight, day student or boarder, etc.

- The remaining questions concern the form of the survey, how many surveys to administer, and how to phrase the questions.

Form of the Survey

Direct measurement in the laboratory:

+ Most accurate and reliable

+ Least subject to unknown influences

- Can be expensive, many times impractical

Direct personal interviews:

+ Consistent measurement if interviewers are well trained

- Time consuming

- Hard to get a random sample

Telephone interviews:

+ Somewhat consistent measurement with skilled interviewers

+ Less expensive than lab setting or face-to-face interviews

- Lower response rate due to hang-ups and no answers even after many callbacks

- Incomplete surveys due to hang-ups

Mail surveys:

+ Quick

+ Least expensive

- Lowest response rate (10% to 20%)

- Great deal of opportunity to misinterpret questions

- No idea if person is informed about the subject you are studying

How Many Surveys to Administer

Determining the number of surveys to administer depends upon the following factors:

What form of survey is being done?

- you need to send out about 10 times as many mail surveys as you would need lab participants

For nominal demographic data, how many categories does the data divide into (maximum over all questions)?

- you need 5 times as many completed surveys for a demographic question with 10 possible nominal responses as for a survey whose nominal demographic questions each divide logically into two categories (e.g. male/female, boarder/day student, graduate/undergraduate, yes/no, etc.)

How accurate do the results need to be?

- the more data, the more accurately the measurement can be done

How much statistics do you know and how much professional statistical assistance can you afford?

- the SMALLER the sample, the MORE work it is to accurately draw conclusions from the data

- ideally, you want 30 completed surveys PER DEMOGRAPHIC GROUP for the demographic question measured on the nominal scale with the most possible response values.

Example Calculation of Number of Surveys to Administer

Example 1:

We are doing a laboratory experiment in which all of the demographic questions on a nominal scale are yes/no. We will need to ensure 30 yes responses and 30 no responses to each demographic question. However, since we are controlling the demographics, we could get by with as few as 60 well-crafted experiments.

Example 2:

We are doing a long mail survey in which one of the demographic questions is on a nominal scale and has 10 possible responses. We need 300 completed surveys at a minimum to get 30 for each value on the nominal scale. However, we have no control over the responses, so we could need up to twice that many (600) completed surveys. Mail survey response is in the 10% to 20% range, but our survey is long, so we should expect a rate at the low end of that range. Since we could also use additional responses if they come in, we assume a 10% response rate. This means we should send out about 6000 surveys.

In both of these examples, our ultimate results are likely to have comparable (low) accuracy, but the statistical analysis required (because of 30 in each group) will be manageable without advanced statistical methods.
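The arithmetic of Example 2, under its stated assumptions, can be laid out step by step:

```python
# Example 2's arithmetic, under the assumptions stated above.
levels = 10                 # nominal question with 10 possible responses
per_group = 30              # target completed surveys per response value

minimum_completed = levels * per_group    # 300 completed surveys at minimum
needed_completed = 2 * minimum_completed  # 600: no control over who responds
response_rate = 0.10                      # long mail survey: low end of 10-20%
surveys_to_mail = needed_completed / response_rate
```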

How to Ask (Phrase) Survey Questions

Many times the measurement process does more to determine the scale of measurement than the nature of the property being measured does.

Always strive for ratio data, or as close to it as possible.

The method of asking a question, establishing a "ruler," or selecting units of measurement can have a dramatic effect on the scale of measurement (and the measurement errors) of the resulting values.

Examples:

Temperature:

Fahrenheit - interval

Kelvin - ratio

Age:

Young/Old check boxes - nominal (unreliable??)

Traditional College Age/Older check boxes - ordinal

Birthdate - can be converted to ratio without bias

In general, try to control the responses to things that can easily be converted to numerical values on as high a scale of measurement as possible, and try to provide as much "calibration" as possible so that the values are being measured as consistently as possible between subjects.

Miscellaneous Survey Mechanics

Different respondents may interpret the question differently. This variation needs to be eliminated with totally unambiguous language. This is best tested by piloting a survey and then asking respondents how they interpreted the question.

All respondents must understand the question. The variation in reading ability needs to be factored out by targeting the reading level to below grade 8 level difficulty. This can be measured by computer software.

Respondents may “give up” in the middle of a survey. Make sure the important questions are asked first, and the less important questions (less important demographics, for example) are at the end.

Designing Experiments

We may be able to measure a phenomenon by subjecting different experimental units (usually called subjects in this context, especially when they are human) to different stimuli (usually called treatments in this context).

This frequently takes the form similar to finding relationships in categorical data, namely, we attempt to explain the value of a response variable by noting differences in an explanatory variable.

The difference in designing experiments is that as designers we have control over the values of the explanatory variables, which are called factors in this context.

The experiment is performed by combining specific values (usually called levels in this context) for each of the explanatory variables, and measuring the resulting response.

We need for each factor a “control group” that measures the “normal” outcome when “no treatment” is given.

We need to guard against confounding variables. This is generally done by placing subjects into experimental groups at random.

Where humans are involved (in any way) we need to eliminate subjective bias by not allowing subject or researcher to know what treatment is being received. This is known as a double-blind experiment.

Calculating the Number of Participants

The crucial factor in experimental design is to avoid combinatorial explosion. The most common method of designing an experiment is called block design. In a block design, we place a fixed number of people in every category combination. Ideally, we would like about 30 people in each category combination.

- If we have 1 factor with 2 levels, we need 2 x 30 = 60 participants.

- If we have 2 factors with 2 levels, we need 2 x 2 x 30 = 120 participants.

- If we have 2 factors, the first with 2 levels and the second with 5 levels, we need 2 x 5 x 30 = 300 participants.

- If we have 3 factors, each with 4 levels, we need 4 x 4 x 4 x 30 = 1920 participants.

- If we have 3 factors, the first with 5 levels, the second with 7 levels, and the third with 11 levels, we need 5 x 7 x 11 x 30 = 11550 participants.

- If we have 4 factors, each with 3 levels, we would need 3 x 3 x 3 x 3 x 30 = 2430 participants.

- If we have 7 factors, the first of which has 3 levels, and each of the other 6 factors has 20 levels, we would need 3 x 20 x 20 x 20 x 20 x 20 x 20 x 30 = 5,760,000,000 participants, or more people than there are on earth!

It is therefore not difficult to understand why most experimental designs involve only 1, 2, or at most 3 factors.
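The head counts above all follow one rule: multiply the number of levels of every factor, then multiply by roughly 30 subjects per category combination. A minimal sketch:

```python
from math import prod

# Block-design head count: the product of the levels of every factor,
# times ~30 subjects per cell (category combination).
def participants(levels_per_factor, per_cell=30):
    return prod(levels_per_factor) * per_cell
```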

Statistics Primer, Part IV: Data Collection