ICU. MARKETING RESEARCH

Vladimir V. Bulatov

LECTURE 3B. SCALES

Note: the notion of measurement assumes that there is something worth measuring. The “thing” to be measured (e.g., an attitude toward a supplier, favorite color, or sales) is referred to here as a construct.

Many constructs are fairly complex (e.g., one’s attitude toward Japanese restaurants selling liquor on Sundays includes feelings toward Japanese, restaurants, liquor, etc.). Nonetheless, in order to arrive at a bottom-line statement about such constructs, there is a strong tendency to convert/simplify these constructs into a single scale or series of scales, usually quantitative ones.

SCALE TYPES

Nominal (Categorical).

Refers to arbitrarily assigning a number to each response category. The scale number has no meaning in and of itself. Obvious examples of nominal scales include tax ID codes and the numbers on football players’ jerseys. There is no relationship b/w the quantity of the construct being measured and the numerical value assigned to it.

→ Non-metric scale. Can be used to compute frequencies; other calculations (e.g., average values) are meaningless.

Ordinal.

The higher the number, the more (or less) of the construct exists. The absolute size of the number, however, has no meaning, nor do the differences b/w two scale values. Ranking is the most common form of ordinal scale. If the ranking is based on intelligence, we know that the subject ranked first is more intelligent (at least according to our ranking method) than the person ranked second, but we have no idea how much smarter he/she is.

→ Non-metric scale. Frequencies, percentiles, medians, plus a variety of other order statistics can be utilized.

Interval.

An interval scale is a scale where differences (intervals) b/w scale values have meaning, but the absolute scale values are meaningless. E.g., the Celsius and Fahrenheit temperature scales. The difference b/w 40 and 41 is the same as b/w 1 and 2, but the zero point is arbitrary. All we can say is that 0 is one degree warmer than –1 and one degree colder than +1; hence, 100 degrees is not 2 times warmer than 50.

→ Metric scale. Allows computation of means and standard deviations, use of parametric statistical tests, and computation of product-moment correlations b/w 2 interval-scaled variables; this in turn allows the use of such “fancy” techniques as regression, discriminant, and factor analysis (a very desirable scale type for the data being collected).
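The temperature argument can be checked numerically. The sketch below is only an illustration (the function and variable names are ours, not part of the lecture): it converts the same two readings from Celsius to Fahrenheit and shows that the ratio of scale values changes with the arbitrary zero point, while the ratio of two differences does not.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit."""
    return c * 9 / 5 + 32


c_low, c_high = 50.0, 100.0
f_low, f_high = celsius_to_fahrenheit(c_low), celsius_to_fahrenheit(c_high)

# Ratios of scale values depend on where the arbitrary zero sits.
ratio_celsius = c_high / c_low        # 2.0
ratio_fahrenheit = f_high / f_low     # 212 / 122, roughly 1.74

# Ratios of differences are the same on both scales.
diff_ratio_celsius = (c_high - c_low) / (41.0 - 40.0)
diff_ratio_fahrenheit = (f_high - f_low) / (
    celsius_to_fahrenheit(41.0) - celsius_to_fahrenheit(40.0)
)

print(ratio_celsius, round(ratio_fahrenheit, 2))
print(round(diff_ratio_celsius, 6), round(diff_ratio_fahrenheit, 6))
```

Because "twice as warm" holds on one scale and not on the other, the statement carries no information about the construct itself; only statements about differences survive the change of scale.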

Ratio.

The highest-order scale. Here the ratio b/w 2 scale values is meaningful. 0 represents the absence of the construct (e.g., money), and 50 units of the construct are half as much as 100.

→ Metric scale. Allows everything interval scales allow, plus the geometric mean and the coefficient of variation; ratio-scaled values are also meaningful when multiplied together.
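As a rough illustration of the two extra statistics, the sketch below computes a geometric mean and a coefficient of variation with Python's standard `statistics` module (Python 3.8 or later is assumed). The data values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import statistics

# Hypothetical ratio-scaled data, e.g. money spent by three respondents.
amounts = [10.0, 20.0, 40.0]

# Geometric mean: the cube root of 10 * 20 * 40 = 8000, i.e. about 20.
geo_mean = statistics.geometric_mean(amounts)

# Coefficient of variation: standard deviation relative to the mean.
cv = statistics.stdev(amounts) / statistics.mean(amounts)

# Changing the unit (e.g. dollars to cents) multiplies every value by 100.
rescaled = [100 * x for x in amounts]
cv_rescaled = statistics.stdev(rescaled) / statistics.mean(rescaled)

print(round(geo_mean, 6), round(cv, 4), round(cv_rescaled, 4))
```

The coefficient of variation is unchanged by the change of unit, which is sensible only because a ratio scale has a true zero; shifting an interval scale's zero point would change it.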

Higher-order scales are desired by analysts but are harder for respondents to cope with. There is a trade-off over who carries the burden: the respondent (who may refuse to fill in the questionnaire) or the analyst (who will later have to convert scales or deal with a less desirable type of data). The interval scale is generally the one most preferred by MRers.

EXAMPLES OF IMPLEMENTING DIFFERENT SCALES INTO QUESTIONS.

NOMINAL SCALES.

Multiple choice. (Checking a single answer from a set of alternatives).

E.g.

Which of the following terms best describes inizlots?

__A. Riboflavit.

__B. Ordils and humspiels.

__C. Octiviniginianus.

__D. All of the above.

__E. B on alternate Tuesdays.

__F. None of the above.

(For quantification purposes, we would typically assign a 1 to the first answer (riboflavit), a 2 to the second, etc. The numbers would represent only which category was chosen, not how much of the construct was present.)

E.g.

Marital status: What is your marital status?

______

Single (1) Married (2) Divorced (3) Widowed (4)

Occupation: What is your occupation?

______………

Lawyer (1) Teacher (2)

Brand choice/used: Which brand of soft drink did you last buy?

______

Coke (1) 7UP (2) Pepsi (3) Other (4)

The categories used may be either supplied in advance to the respondent (aided) or coded after the respondent gives a verbal/written answer (unaided). In general, the aided/structured approach is easier for both the respondent and the analyst.

Yes/No (Binary) (Measures with only 2 possible values are typically nominal scales).

Ownership: Do you own a color TV?

______

Yes (1) No (2)

Trait association (adjective checklist). Please, indicate which of the following descriptions apply to these products. Check as many descriptions as you feel apply to each product.

Descriptions:

Product          Necessary   Fun      Useless   Good investment
Color TV         ______      ______   ______    ______
Snowmobile       ______      ______   ______    ______
Life insurance   ______      ______   ______    ______


ORDINAL SCALES.

Forced ranking. The most obvious ordinal scale is a forced ranking:

E.g. Please, rank the following five brands in terms of your preference by marking 1 next to your most preferred brand, 2 next to your second most preferred brand, and so forth:

Coke ______

Pepsi ______

7UP ______

Dr. Pepper ______

Fresca ______

Paired comparison. A means of generating an ordinal scale without asking the respondent to consider all the alternatives simultaneously. Respondents only choose the more preferred (or heavier, or prettier, or whatever characteristic you wish to measure) of two alternatives at a time. Converting the previous ranking task into a paired comparison framework gives 10 pairs:

Coke, Pepsi

Coke, 7UP

Coke, Dr. Pepper

Coke, Fresca

Pepsi, 7UP

Pepsi, Dr. Pepper

Pepsi, Fresca

7UP, Dr. Pepper

7UP, Fresca

Dr. Pepper, Fresca

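Formula: the number of pairs is the number of distinct ways to draw a sample of size b out of a universe of size a:

C(a, b) = a! / (b! (a-b)!)

where a! = (a)(a-1)(a-2)…(2)(1) is called the factorial of a. With a = 5 brands and b = 2, this gives 5! / (2! 3!) = 10 pairs.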

Paired comparison allows intransitivity (A preferred to B, B to C, but C to A again!); this can uncover the special nature of preferences, but it also makes data quality questionable. Another difficulty with PC is that with a large number of alternatives the number of distinct pairs explodes, which makes it hard to obtain an ordinal scale. Because of their cumbersome nature, complete paired comparisons are rarely used except in pilot studies or laboratory situations.
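The pair count and the intransitivity problem can both be sketched in a few lines. This is a minimal illustration using the brand list from the ranking example; the helper `find_intransitive_triples` and the sample answers are hypothetical, and `math.comb` assumes Python 3.8 or later.

```python
from itertools import combinations
from math import comb

brands = ["Coke", "Pepsi", "7UP", "Dr. Pepper", "Fresca"]

# Every unordered pair of brands, in the same order as the list above.
pairs = list(combinations(brands, 2))
n_pairs = comb(len(brands), 2)

# The number of pairs grows quickly with the number of alternatives.
growth = {a: comb(a, 2) for a in (5, 10, 20)}


def find_intransitive_triples(items, prefers):
    """Return the triples whose pairwise choices form a cycle.

    `prefers` is a set of (winner, loser) tuples, one per pair asked.
    """
    cycles = []
    for x, y, z in combinations(items, 3):
        clockwise = {(x, y), (y, z), (z, x)}
        counter_clockwise = {(y, x), (z, y), (x, z)}
        if clockwise <= prefers or counter_clockwise <= prefers:
            cycles.append((x, y, z))
    return cycles


# Hypothetical answers: Coke over Pepsi, Pepsi over 7UP, yet 7UP over Coke.
answers = {("Coke", "Pepsi"), ("Pepsi", "7UP"), ("7UP", "Coke")}
cycles = find_intransitive_triples(["Coke", "Pepsi", "7UP"], answers)

print(n_pairs, growth, cycles)
```

A respondent with no cycles can be given a consistent rank order; any cycle found this way means the paired answers cannot be collapsed into a single ordinal scale without some further rule.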

Semantic scale. A semantic scale (SS) obtains responses to a stimulus in terms of semantic categories.

E.g. Do you like yogurt?

______

Dislike extremely (1)   Dislike (2)   Neutral (3)   Like (4)   Like extremely (5)

Respondents are instructed to check the category which best describes their feelings. Since they choose the category on the basis of the words (semantics) attached to it, this is called a semantic scale. (It is ordinal, but still not interval).

Picture scale. E.g. for children, smiling faces ranging from sad to happy; or another set of pictures for areas where literacy is low.

Summated (Likert) scale. It is an extension of the semantic scale in two ways. First, rather than measuring a construct by a single item, a series of items is used and a sum score is calculated. Second, the scales are calibrated so that the neutral score is coded “0”.

E.g.

Do you like the taste of yogurt?

______x______

Dislike strongly (-2)   Dislike (-1)   Neutral (0)   Like (1)   Like strongly (2)

Is yogurt a healthful food?

______x______

Extremely not healthful (-2)   Not healthful (-1)   Neutral (0)   Healthful (1)   Extremely healthful (2)

Do you feel your friends like yogurt?

___x______

Dislike strongly (-2)   Dislike (-1)   Neutral (0)   Like (1)   Like strongly (2)

Summing the checked responses across the three items gives a negative overall score for yogurt.

Other ordinal scales can be incorporated into MR, but their usage is usually limited.
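The summing step can be sketched as follows. The exact check-mark positions in the three yogurt items are not recoverable from the example, so the coded responses below are assumed values, picked only so that the total comes out negative as the example states; the item names and the `summated_score` helper are likewise hypothetical.

```python
SCALE_MIN, SCALE_MAX = -2, 2

# Assumed coded responses on the -2..+2 scale (not taken from the example).
responses = {
    "likes_taste": -1,
    "healthful": 0,
    "friends_like": -2,
}


def summated_score(item_scores):
    """Sum the item scores after checking each lies on the scale."""
    for item, score in item_scores.items():
        if not SCALE_MIN <= score <= SCALE_MAX:
            raise ValueError(f"{item}: score {score} is off the scale")
    return sum(item_scores.values())


total = summated_score(responses)
print(total)
```

Coding the neutral category as 0 is what makes the sign of the total readable at a glance: a negative sum means the unfavorable answers outweigh the favorable ones.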

INTERVAL SCALES

Equal Appearing Interval. (Thurstone’s technique; not practical and rarely used).

Bipolar Adjective.

Rather than attaching a description to each of the response categories, only the two extreme categories are labeled; e.g.:

Dislike extremely   1   2   3   4   5   6   Like extremely

______

Since the response categories are equally spaced both physically and numerically, it can be assumed that the responses are intervally scaled.

Note: many consider this scale to lie somewhere b/w ordinal and interval; also, test results (if the research is valid) on bipolar adjective and semantic scales rarely differ from each other.

Agree-Disagree scale. (A variant of bipolar adjective).

E.g. Indicate your agreement with the statement “I like yogurt”:

Disagree strongly   1   2   3   4   5   6   Agree strongly

______

(Minor logical problem: I may strongly disagree either because I am strongly neutral or because I dislike yogurt; however, respondents usually interpret the scale correctly).

Continuous scales. The same as above, but the respondent marks a position on a continuous line instead of choosing a numbered category:

E.g.:

Very bad ______Very good

Optical devices are then used to measure the response.

(However, since results from continuous scales are usually identical to those from bipolar adjective scales, continuous scales are almost never used).

Equal width interval. (Assessing which category respondents fall into).

E.g. (the categories have unequal widths, so this is still only an ordinal scale):

______

None 1-2 3-15 16-99 100 or more

The scale below is interval:

______…

0-4 5-9 10-14 15-19 … …

The second approach has some advantages (and no additional cost associated with it); but if the data are poorly distributed across equal intervals (e.g., almost all responses are 2 or 3), then a scale with disproportionate intervals is to be used.
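Coding answers into the equal-width categories above (0-4, 5-9, 10-14, ...) can be sketched as below. The function names are hypothetical and whole-number answers are assumed; the midpoint is shown because it is a common stand-in value when an interval-scaled number is needed for each category.

```python
def equal_width_category(value, width=5):
    """Return (index, low, high) of the equal-width interval holding value.

    Assumes whole-number answers, so with width 5 the intervals are
    0-4, 5-9, 10-14, and so on.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    index = value // width
    low = index * width
    return index, low, low + width - 1


def midpoint(low, high):
    """Midpoint of an interval, usable as its representative value."""
    return (low + high) / 2


index, low, high = equal_width_category(7)
print(index, low, high, midpoint(low, high))
```

Because every category is the same width, consecutive midpoints are always the same distance apart, which is what justifies treating the coded answers as interval data; the None / 1-2 / 3-15 / 16-99 scale has no such property.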

Dollar Metric (Graded Paired Comparison). Resembles the paired comparison test, but each choice b/w a pair must be accompanied by a “how much” evaluation. E.g. Which brand do you prefer? (pairs follow). How much extra would you be willing to pay to get your more preferred brand? (An amount of money must be stated).

Pilot tests are run to evaluate how well respondents can differentiate the differences in construct. Afterwards, appropriate scales are made.

For individual-level analysis, 6 or more points are usually sufficient to account for respondents’ discriminatory abilities; for aggregate analysis, even fewer are needed. Therefore, most scales should use b/w four scale points (for phone surveys, intercept interviews, low commitment situations) and eight (for committed and knowledgeable respondents).

Either an odd or an even number of scale points may be used. In well-done research the difference will have almost no effect on the result.

The Law of Comparative Judgement.

Paired comparison judgements can be converted into intervally scaled data by means of Thurstone’s law of comparative judgement.

We will discuss this approach in detail later.

(Other approaches are also applicable and can be utilized).


RATIO SCALES

Direct quantification (the simplest way). Ask directly for quantification of a construct, which is ratio scaled.

E.g.

How many dress shirts do you own? ____

How old are you? ____

The problem with this approach is that the respondent may not know the answer or may refuse to give it.

Consequently, this approach is to be used only in pilot or small-scale surveys.

Constant Sum Scale. A very popular device in marketing research. Respondents are given a number of points (if the process is conducted in person, chips or other physical objects are often used) and told to divide them among alternatives according to some criterion (e.g., preference, importance, aesthetic appeal). Since respondents are told to allocate chips in a ratio manner (if you like brand A twice as much as brand B, assign it twice as many chips, etc.), the results are presumably ratio scaled.

E.g., I might ask for 10 points to be allocated among three brands:

______

A 2

B 3

C 5

______

10

Two problems with this approach. First, respondents may allocate points that do not add up to the required total, necessitating recalculations. Second, determining the appropriate number of points/chips to use requires trading off b/w rounding error if too few are used and fatigue/frustration/refusal problems if too many are used. Still, the approach is quite useful.
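One common way to handle the recalculation is to rescale each respondent's allocation proportionally so that it adds up to the intended total. The sketch below assumes that approach (the lecture does not specify one); `rescale_to_constant_sum` and the second, mis-totalled allocation are hypothetical.

```python
def rescale_to_constant_sum(allocation, total=10):
    """Rescale points proportionally so that they add up to `total`."""
    raw_total = sum(allocation.values())
    if raw_total <= 0:
        raise ValueError("allocation must contain some positive points")
    return {alt: pts * total / raw_total for alt, pts in allocation.items()}


# The allocation from the example: already sums to 10, so it is unchanged.
stated = {"A": 2, "B": 3, "C": 5}
rescaled = rescale_to_constant_sum(stated)

# A hypothetical respondent who handed out 20 points instead of 10.
messy = {"A": 4, "B": 6, "C": 10}
fixed = rescale_to_constant_sum(messy)

# Ratios b/w alternatives are what the scale is meant to capture.
ratio_c_to_a = fixed["C"] / fixed["A"]

print(rescaled, fixed, ratio_c_to_a)
```

Proportional rescaling leaves every ratio b/w alternatives untouched, so the ratio interpretation of the answers is preserved even when the respondent's arithmetic was off.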

(Constant sum paired comparison: by combining a constant sum scale and paired comparison methods, we get a constant sum paired comparison; this allows for ratio scaled paired comparison judgement).

Delphi procedure. (Separate hand-out will be given). The Delphi procedure is a modification of the constant sum scale designed to produce agreement among judges.

Reference Alternative. (or: Fractionation or Magnitude scaling). This approach seeks a ratio scale by having respondents compare alternatives to a reference alternative.

E.g.

Reference alternative X=100

Alternative A ___

Alternative B ___

Alternative C ___

Respondents are instructed to indicate how alternatives compare to the reference alternative on some criterion such as preference by putting down a number half as large if the alternative is half as preferred, and so on.

In the example case, a respondent might assign 50 to A, 250 to B, and 130 to C.
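Reading these answers as ratios is simple arithmetic. The short sketch below uses the numbers from the example (reference X = 100; A = 50, B = 250, C = 130); the variable names are ours.

```python
REFERENCE = 100

# Ratings from the example case.
ratings = {"A": 50, "B": 250, "C": 130}

# Each alternative expressed as a multiple of the reference alternative.
relative = {alt: score / REFERENCE for alt, score in ratings.items()}

# Ratios b/w any two alternatives follow directly.
b_vs_a = ratings["B"] / ratings["A"]

print(relative, b_vs_a)
```

Here A is half as preferred as the reference, B is two and a half times as preferred, and B is therefore five times as preferred as A.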

Note: the constant sum approach is used more often than the reference alternative approach.


To conclude:

It may be interesting to consider how crucial the choice of method is. One study (Haley and Case, 1979) compared 13 disparate measures of response to a brand, and found that:

1.  All 13 tested measuring methods are highly correlated.

2.  Awareness and brand choice are somewhat different from the other attitude measures.

3.  Acceptability, 6 point adjective, agreement, quality, 10-point numerical, thermometer, and Stapel (modification of semantic scale) tend to produce predominantly favorable readings.

4.  For purposes of predicting market share, scales which restrict the number of brands getting top ratings (such as constant sum) tend to discriminate better.

5.  With the exception of the constant sum scale, ratings below the midpoint were associated with essentially zero share, and even top-category ratings (e.g. will absolutely definitely buy) tended to be related to only about 50 percent share.

CONSEQUENTLY, IT APPEARS THAT CONSISTENCY IN USE OF A SCALE IS AT LEAST AS IMPORTANT AS THE SPECIFIC SCALE USED.
