MARKETING RESEARCH

BASIC CONCEPTS OF MEASUREMENT

RELIABILITY

A measure is said to be reliable if it consistently obtains the same result. Hence, a scale which measured a weight as 90.1, 89.95, 90.06, and 89.98 pounds in four trials would be quite reliable (the spread/range is only 0.15 pounds) even if the true weight were 100 pounds. Conversely, a second scale which produced weights of 95, 103, 92, and 109 pounds would be less reliable than the first, even if it was better in the sense of being closer to the true value. Reliability is thus synonymous with repetitive consistency.

Approaches to test reliability:

Test-retest. (Applying the same measure to the same subjects at two different points in time, we can compare the two sets of results, see how closely they match, and thus obtain the test-retest reliability.) In marketing, one would expect a correlation of 0.5 to 0.7.

(!) Test-retest assumes that the measurement process has no effect on the subject and that the subject's behavior (characteristics) does not change over time.

(Since at least one of these assumptions is usually violated, this approach is imperfect.)
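
As a minimal sketch (with made-up scores), test-retest reliability can be computed as the correlation between the two waves of measurement:

```python
import numpy as np

# Hypothetical attitude scores for the same eight subjects, measured twice.
wave1 = np.array([4, 2, 5, 3, 4, 1, 5, 2], dtype=float)
wave2 = np.array([4, 3, 5, 2, 4, 2, 4, 2], dtype=float)

# Test-retest reliability is the correlation between the two waves.
r = np.corrcoef(wave1, wave2)[0, 1]
print(f"test-retest r = {r:.2f}")  # marketing measures typically land near 0.5-0.7
```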

Alternative forms. Equivalent measuring devices (forms) are applied to the same subjects. Consistency of the results is checked, and a measure of reliability can be obtained.

(!) The result depends as much on the degree of equivalency of the alternative forms as on the true reliability of the measure.

VALIDITY.

Roughly, valid is a synonym for good (and the term is loosely used as such in conversation by many people); in measurement, however, several more specific forms of validity are distinguished.

Construct validity. The ability of a measure both to represent the underlying construct (concept) and to relate to other constructs in an expected way. Or, basically: a measure has construct validity if it behaves according to existing theory.

E.g., a ruler is of no use for measuring mass. Length measurement with a ruler, however, has construct validity.

Also, a construct should possess what is known as discriminant validity, i.e., the construct should be sufficiently distinct from other constructs to justify its existence.

Content validity. Refers to logic. E.g., one might argue that observing how much of a vegetable a person eats off his/her plate is a measure of the person's liking of the vegetable. Such a measurement has logical appeal or face validity, and hence would appear to have content validity (although, actually, the amount eaten might depend on how hungry the person was or on some other variable).

Inclusiveness is an integral part of content validity; consequently, in order to achieve content validity, constructs often must be measured by more than one item.

Convergent validity. A measure has convergent validity if it follows the same pattern as other measures of the same construct. E.g. three different measures of attitude would be said to have convergent validity if they were highly correlated with each other.

A related concept is concurrent validity, which occurs when a measure is highly correlated with known values of the underlying construct.

(One method of examining simultaneous convergent and discriminant validity is through a multitrait-multimethod matrix).
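
A rough sketch of the underlying idea (with invented data): measures of the same construct should correlate highly with each other (convergent validity) and weakly with measures of a different construct (discriminant validity).

```python
import numpy as np

# Hypothetical data: three measures of attitude toward a brand (att1..att3)
# and one measure of a different construct (price sensitivity).
att1  = np.array([5, 4, 2, 5, 1, 3, 4, 2], dtype=float)
att2  = np.array([5, 5, 2, 4, 1, 3, 5, 1], dtype=float)
att3  = np.array([4, 4, 1, 5, 2, 3, 4, 2], dtype=float)
price = np.array([2, 5, 3, 1, 4, 2, 5, 3], dtype=float)

corr = np.corrcoef([att1, att2, att3, price])
print(np.round(corr, 2))
# Convergent validity: att1..att3 should correlate highly with each other.
# Discriminant validity: their correlations with price should be low.
```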

Predictive validity. The most pragmatic form of validity. In short, the predictive validity of a measure is its ability to relate to other measures in a known/predicted way. Practically: if a measure is useful in prediction, then use it, regardless of whether we can explain why it works.

E.g., as an extreme case, assume I predicted sales of a new product by multiplying the number of letters in the name by the weight of the package. Assume somehow this turned out to be predictive of sales. While the measure has no construct or content validity, it works, and it is predictively valid. It may look foolish, but imagine it works 10 times in a row (or 100 times…).

The point is that at some stage the predictive accuracy of a measure will outweigh the prior theories and can, in fact, lead to development of new theories.

Unbiased/biased measurement. A biased measurement is one where we expect the measured result to differ from the true value of the construct/variable.

Often, respondents overestimate their intentions/behavior/characteristics; but if the bias is consistent, it can be measured, and the obtained results can be adjusted. E.g., the earlier measurement of the 100-pound weight, where results were close to 90, is biased; but the measurement has high reliability, so the bias can easily be calculated and used as a parameter in further estimates.
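
A minimal sketch of that adjustment, assuming we can calibrate the instrument against a known true value:

```python
import numpy as np

true_weight = 100.0                               # known calibration weight
readings = np.array([90.1, 89.95, 90.06, 89.98])  # reliable but biased scale

bias = readings.mean() - true_weight              # roughly -9.98 pounds
print(f"estimated bias: {bias:+.2f}")

# Any later reading from the same scale can then be corrected:
new_reading = 72.3
print(f"adjusted: {new_reading - bias:.2f}")      # about 82.28 pounds
```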

Efficient measurement. One that gets the maximum possible information from a given sample size at minimal cost. E.g., a simple scale may be just as accurate a measuring device as a more elaborate setup; the scale would then be chosen because of its superior efficiency.

Consistent measurement. Consistency refers to the tendency of a statistic toward the true value of the construct/variable as more data are gathered. In measurement terms, a measure is consistent if averaging repeated measures produces a result which approaches the true value as the number of measures averaged increases.
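
A toy simulation (noise model invented for illustration) shows consistency in action: the mean of repeated unbiased measurements approaches the true value as the number of measurements grows.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 100.0

# Each measurement = true value + random, unbiased noise.
for n in (10, 100, 10_000):
    measures = true_value + rng.normal(0.0, 5.0, size=n)
    print(f"n = {n:>6}: mean = {measures.mean():.3f}")
# The mean tends toward 100.0 as n increases; note that averaging removes
# only random error, not a systematic bias.
```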

NOTE: suggestions for running proper measurement.

  1. Select only those variables to measure that make logical sense in the context of the problem being studied.
  2. Use measures that seem logically appropriate to the construct/variable to be measured.
  3. Use measures that are reliable.
  4. Use variables that produce similar results over related measurement periods. (If the response obtained depends heavily on the measurement method employed, chances are the information being collected is a response to the measurement method and not to the construct.)
  5. Use measures that are as easy as possible for the researcher and respondent to use.
  6. Use measures that prove to be useful in a pragmatic way (i.e., if a variable proves to be a good predictor of a key variable, use it; if a variable doesn't seem to be related to any other variables, save your effort and don't measure it).

Grouping.

Be very careful when grouping and averaging collected data. If 50% of people like hot tea and the other 50% like iced tea, averaging will show that people prefer lukewarm tea. A product strategy derived from that will be a disaster.
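
A two-line illustration of the pitfall, with hypothetical 1-to-5 temperature preferences:

```python
import numpy as np

# 50 people who want iced tea (1) and 50 who want hot tea (5).
prefs = np.array([1] * 50 + [5] * 50)
print(prefs.mean())                           # 3.0 -> "lukewarm", which nobody wants
print(np.unique(prefs, return_counts=True))   # the two real segments
```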

Comparative analysis of answers to non-objective questions, such as attitudes, brings even more complex issues (see the normalizing [removing the mean (average) from a series of responses] and standardizing [making the mean 0 and the standard deviation 1 for a series of responses] procedures).
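
A minimal sketch of both procedures on a hypothetical series of responses:

```python
import numpy as np

responses = np.array([5, 4, 4, 3, 5, 2], dtype=float)

normalized = responses - responses.mean()      # normalizing: mean removed
standardized = normalized / responses.std()    # standardizing: mean 0, std dev 1

print(np.round(normalized, 2))
print(np.round(standardized, 2))               # now comparable across respondents
```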

Indirect measurement procedures.

The obtrusive, direct method works in almost all typical cases. However, when respondents either wish to hide their feelings or cannot express them accurately, indirect measurement is to be used (e.g., instead of asking "What do you think?", ask "What do your neighbors think?", etc.). Plenty of techniques exist. (Consult other sources, e.g., Kassarjian [1974].)

Share measurement (a company’s market share).

A very difficult task; most probably, reported figures such as 37.2 percent are quite inaccurate.

Why?

  1. Share of what? E.g., for "Sugar Pops", am I to compute a share of presweetened cereals, of ready-to-eat cereals, or of all cereals?
  2. Share on what basis? Hryvnas, unit sales, net weight, or number of servings? (See the sketch after this list.)
  3. Even having settled 1 and 2, any research is still vulnerable to biased measurement and other errors, which are discussed below.
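
A small sketch of point 2, with invented numbers, showing that the same sales data yield different shares depending on the basis chosen:

```python
# Hypothetical cereal market: (brand, units sold, price per unit, kg per unit).
market = [
    ("Sugar Pops", 120, 3.0, 0.35),
    ("Brand B",    200, 2.0, 0.50),
    ("Brand C",     80, 4.0, 0.40),
]

total_units   = sum(units for _, units, _, _ in market)
total_revenue = sum(units * price for _, units, price, _ in market)
total_weight  = sum(units * kg for _, units, _, kg in market)

_, units, price, kg = market[0]  # Sugar Pops
print(f"unit share:    {100 * units / total_units:.1f}%")            # 30.0%
print(f"revenue share: {100 * units * price / total_revenue:.1f}%")  # 33.3%
print(f"weight share:  {100 * units * kg / total_weight:.1f}%")      # 24.1%
```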

ERROR TYPOLOGY

(Five general types of errors are discussed; they go far beyond measurement issues.)

1. Researcher (i.e., directly traceable to the researcher).

Myopia. Wrong questions are asked. This is usually a manifestation of poor problem definition and research objective specification.

Inappropriate Analysis.

  1. Omission: failing to perform what would be a meaningful analysis.
  2. Commission: performing an analysis for which the data is not suited.

Misinterpretation.

  1. Poor preparation: inability to understand the result of the analysis performed because of a technical deficiency, or just having a bad day (headache, poor eyesight, etc.).
  2. Researcher expectation. Interpretation is influenced by prior opinions of the researcher.

Communication. Inability to translate the results into a report understandable by intelligent, decision-oriented managers.

2. Sample (!).

Since most data collection is partial in nature, the selection of whom to analyze can greatly influence the results.

Frame. Who are the relevant subjects/respondents? Studying video games, one could target male heads of households, omitting children, who also influence the decision-making process. On the other hand, one can define the frame too broadly, so that it includes irrelevant groups of people, resulting in a poor-quality report.

Process. Once the frame is specified, a proper process of selecting respondents must be chosen. E.g., if I am interested in retired people, an official list of the retired in a certain district may include people who are still working or, vice versa, omit people who are already retired.

Response. (Even with a good frame and selection process, the sample may still be unrepresentative.) Response rates are rarely higher than 70 percent and are sometimes below 10 percent. Different subgroups of a sample may behave differently (e.g., young and elderly populations tend to hold different-than-average opinions). Managing non-response is a very hard problem. With low response rates it is very hard to argue that the research results are projectable to the community at large.

3. Measurement process.

Conditioning. Especially with panels. The instrument and the content of the research may draw a subject's special attention to the problem, leading to different-than-usual behavior and attitudes.

Process bias. The research process, being an intrusion into the normal pace of the subject's life, again leads to changes in opinions. E.g., an interviewer may irritate a respondent.

Recording. Typos, mistaken interpretation, or, generally, the interviewer misrecording a verbal response (or gremlins infesting a recording device :). Another case is when data are recorded without any measurement at all: fake data, coherent with the other answers, are entered so that a third-party interviewer saves time.

4. Instrument.

Individual question/scale.

Scales lead to truncation of possible answers; an "other" section "invites" respondents to minimize their analysis of alternatives; "good" subjects usually hide their extreme attitudes (e.g., I hate olives, but since olives are a food associated with high society, I would state "neutral"); rounding errors are built into many scales (in a yes-no question on whether I will go somewhere or not, being 60% sure will round my response up to "yes", introducing a 40% error). Ambiguity can also be relevant, since answering either yes or no to a certain question may be understood as positive or be associated with absolutely different concerns. (E.g., "Should pollution control standards on automobiles be relaxed?" An affirmative answer would presumably indicate relatively low environmental concern; but (!) it could also indicate great concern for jobs, or for increasing the use of coal to conserve petroleum reserves and limit nuclear reactor development.) Hence the match between the construct and the question is imperfect.
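
A quick sketch (with hypothetical purchase probabilities) of the rounding problem mentioned above: forcing a yes/no answer distorts the aggregate estimate.

```python
import numpy as np

# True probabilities that each respondent will actually go.
p_true = np.array([0.6, 0.55, 0.7, 0.4, 0.65, 0.3])

yes_no = (p_true >= 0.5).astype(float)  # the forced binary answers
print(p_true.mean())   # ~0.53: expected fraction actually going
print(yes_no.mean())   # ~0.67: the yes/no scale overstates it
```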

Test instrument.

The discussed items bring to mind an evoked set of thoughts and standards of comparison, which the respondent uses to determine his response. E.g., comparing chess, bridge, cribbage, and football on degree of risk, football will most probably be marked as the most dangerous; but comparing it to rugby, sword fighting, or motorcycling will make it the least dangerous. The macro level of, say, a questionnaire must be adequate to the problem being researched. The micro level is also important: the first option in a multiple-choice question always attracts more attention than the rest.

5. Respondent.

Response style.

  1. Consistency/inconsistency. Sometimes respondents report what they perceive the answer should be, rather than their true feelings.
  2. Boasting/humility (overstating and understating scores).
  3. Agreement (some happy souls tend to be “yeah-sayers” [e.g. Do you like to swim? Yes. Do you like to sit on a beach? Yes. Do you like to stay home? Yes…]).
  4. Lying.

Also, people prefer not to answer (or answer incorrectly) sensitive questions. A remedy of a kind could be a guarantee of anonymity.

A randomization procedure (in a one-to-one interview) can also help.

E.g., question A is sensitive; B is not, but has the same answer scale (e.g., yes/no). An interviewer can ask the respondent to secretly flip a coin and then answer A if the coin is heads and B if the coin is tails:

                               Yes    No
A. Do you beat your spouse?    [ ]    [ ]
B. Are you left-handed?        [ ]    [ ]

The respondent then reports the answer without indicating which question was answered (the questionnaire remains in the interviewer's hands the whole time).

If the non-sensitive question B has a known distribution (percent of left-handed people in population), then the overall fraction of Yes and No responses can be used to derive the fraction of Yes responses to the sensitive question.

If, e.g., the fraction of yes responses is 40 percent and we know that 25 percent of people are left-handed, then

25 × (1/2) + X × (1/2) = 40, so X = 55.

It may turn out that 55 percent of people beat their spouses.
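
A minimal sketch of that back-calculation, generalized to any coin probability and any known base rate for the non-sensitive question:

```python
def randomized_response_estimate(p_yes_overall, p_coin_sensitive, p_yes_nonsensitive):
    """Estimate the fraction answering 'yes' to the sensitive question.

    p_yes_overall:      observed overall fraction of 'yes' answers
    p_coin_sensitive:   probability the coin directs to the sensitive question
    p_yes_nonsensitive: known 'yes' rate for the non-sensitive question
    """
    p_other = 1.0 - p_coin_sensitive
    return (p_yes_overall - p_other * p_yes_nonsensitive) / p_coin_sensitive

# The example from the text: 40% yes overall, a fair coin, 25% left-handed.
print(randomized_response_estimate(0.40, 0.5, 0.25))  # 0.55
```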

5. Extremism/caution. Stating 1s and 5s while true preferences are 2s and 4s. (The latter are compensated by cynics and highly educated people, who do not believe in extremes.)

6. Socially desirable responding. Many respondents feel uncomfortable admitting to unusual behavior or attitudes. E.g., in a jury selection process, prospective jurors were asked if they liked to read. Not surprisingly, 90 percent said they did, although one suspects a smaller percentage were actually avid readers.

Response. The respondent is also the source of the following three other errors.

  1. Inarticulation. In responding to a question (especially an open one), a respondent may be unable or unwilling to articulate a response accurately.
  2. Mistakes. Careless errors are possible even from committed respondents.
  3. Uncertainty. Individuals may not be sure what their true feelings are or what their actual behavior has been.

Controlling errors:

Measured value = True value + Measurement process errors + Instrument errors + Respondent errors.
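
A toy simulation of this decomposition (all error magnitudes invented):

```python
import numpy as np

rng = np.random.default_rng(1)
true_value = 50.0
n = 1_000

process_err    = rng.normal(0.0, 2.0, n)  # measurement process errors
instrument_err = rng.normal(1.5, 1.0, n)  # instrument errors (here: biased)
respondent_err = rng.normal(0.0, 3.0, n)  # respondent errors

measured = true_value + process_err + instrument_err + respondent_err
print(f"mean measured: {measured.mean():.2f} (true value: {true_value})")
# Averaging shrinks the random components, but the instrument's systematic
# +1.5 bias remains; it must be anticipated and handled operationally.
```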

Before going into the statistical aspects of measuring errors, try first to predict them and to minimize them operationally.