MARKETING RESEARCH

CAUSALITY (causal relationship, conditionality) AND EXPERIMENTS.

Note:

The concept of causality implies that if I change a particular variable (e.g., advertising), then another variable (e.g., sales) will change as a result of my actions. If managers can develop an understanding of the causal relations in a market, then they can make “optimal” decisions. Causal inference, therefore, is essential to effective decision making.

Approaches to identify causality:

  1. Deductive approach (using a strong theory/model of how the world behaves to predict the consequences of various actions; when inferences are proven false, a new theory is required). Economists often deduce the consequences of market actions from their models.
  2. Inductive approach (examining data in an attempt to see what they indicate about the world). In practice, a purely inductive approach is never used (deciding whether data or theory should come first is like deciding which came first, the chicken or the egg). Instead, theory (causal notions) leads to data collection, which in turn leads to revised theory, in a never-ending cycle.
  3. Intuitively drawn causal inferences (these often have negative consequences).

Causality model: A → [black box] → B

Incomplete understanding of the process (the black box) may be acceptable in the short run, but can become dangerous strategically.

Establishing causality:

e.g., a 20 percent increase in promotion caused sales to increase 20 percent…

Factors to be considered:

  1. Randomness (the assumption that a 20% increase in promotion caused a 20% increase in sales ignores the essential randomness in the world). Given the limits of human knowledge, it is impossible to predict the consequences of an action with certainty.
  2. Reverse causality (since many promotional budgets are set as a percentage of sales, an equally plausible explanation of the facts is that sales drove promotion because the company followed such a budgeting rule).
  3. Other explanations (in the same period a competitor could have left the market; or management may have targeted the ad-exposed area for extra effort this year, including greater ad support, more salespeople, etc.; or the area may have experienced a strong economic upsurge).

Consequently, in establishing causality, several conditions must be satisfied. These are generally grouped into three major categories:

  1. Concurrent variation. A necessary but insufficient condition for establishing causality b/w two variables (A and B) is that the two move together in a consistent pattern (e.g., when A [ad spending] goes up, B [sales] also goes up). (A small numeric sketch follows the note below.)
  2. Precedence. (e.g., if sales increased first, it is not appropriate to say that promotion caused the sales increase). The idea is straightforward, but unfortunately it is sometimes impossible to determine what came first (e.g., the effect of promotion on sales occurs within one week, but data on promotion are available monthly and data on sales bimonthly).
  3. Elimination of alternative explanations. Unless other plausible explanations are ruled out, no causal relationship can be conclusively established.

Note(!): it is possible to come up with an essentially infinite number of alternative explanations, although not all may be particularly believable. Still, causality can rarely be established with absolute certainty. The best we can hope for, therefore, is to establish causality beyond reasonable doubt.
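To make points 1 and 2 concrete, here is a minimal sketch (in Python, assuming numpy is available) of how one might check for concurrent variation and precedence; the monthly advertising and sales figures are purely hypothetical.

```python
import numpy as np

# Hypothetical monthly series (illustrative only, not real data)
ad    = np.array([10, 12, 11, 15, 14, 18, 17, 20, 19, 22, 21, 25], dtype=float)
sales = np.array([100, 101, 105, 104, 110, 109, 118, 116, 124, 122, 130, 128], dtype=float)

# 1. Concurrent variation: do the two series move together?
r_same_period = np.corrcoef(ad, sales)[0, 1]

# 2. Precedence: does advertising lead sales (ad this month vs. sales next month),
#    or do sales lead advertising (e.g., budgets set as a percentage of sales)?
r_ad_leads    = np.corrcoef(ad[:-1], sales[1:])[0, 1]
r_sales_leads = np.corrcoef(sales[:-1], ad[1:])[0, 1]

print(f"same-period correlation: {r_same_period:.2f}")
print(f"ad leads sales (lag 1):  {r_ad_leads:.2f}")
print(f"sales lead ad (lag 1):   {r_sales_leads:.2f}")
# Even a strong lagged correlation only satisfies conditions 1 and 2;
# alternative explanations (condition 3) still have to be ruled out.
```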

Experiments.

(A straightforward [real-world] experimental approach is often problematic [e.g., actually decreasing the price in order to estimate the resulting increase in sales]).

In designing experiments, the main concerns are about the validity (usefulness) of the experiment; these concerns are traditionally grouped into two categories:

-internal validity (refers to producing “clean” results, which rule out competing explanations);

-external validity (refers to the extent to which the results of an experiment are generalizable).

A perpetual conflict b/w these two exists: internal validity leads to strict controls in a laboratory setting (which may not bear much resemblance to the real world), while realistic (natural) situations admit numerous competing explanations for their results. The craft of marketing research is to balance these two concerns.

Internal validity constraints.

-Non-comparability of groups (selection). Using a criterion to select the people who will take part in the experiment makes the group different from non-selected subjects.

-Lost subjects (mortality). Over the course of the experiment, subjects drop out. Those who drop out differ from those who remain, which further distorts the results.

-Exogenous occurrences (history). During many experiments (especially field experiments), crucial events occur outside the control of the researcher (e.g., an oil embargo, a new product introduction). These events influence the measured results and make it hard to isolate the treatment effect.

-Changes over time (maturation of a sample).

-Effect of the experiment itself (being studied alters subjects’ behavior).

-Instrument variability (e.g., changing rating scales, or switching from LIFO to FIFO inventory methods when measuring profits). The measurement instrument must be kept as constant as possible.

-Luck. (If a subject scored highly on an aided recall test, chances are that s/he was lucky as well as smart.) Extremely high test scores therefore usually overstate true abilities.

External validity is concerned with the extent to which experimental results can be generalized to other [real-world] settings.

E.g., to determine the impact of smoking on individuals, one might propose (within the confines of an experiment) to prevent smokers from smoking and to force non-smokers to smoke. Aside from the logistical and ethical problems of such an experiment, it should be noted that the results would have very low external validity.

Therefore, within any single experiment, internal and external validity concerns must be balanced.

If a certain finding or area of investigation is particularly crucial, then a series of experiments is typically involved. Such studies should include those with strong internal validity (lab experiments) and those with great external validity (field experiments). Only when a result holds up across such a broad range of situations is the cautious researcher willing to claim that a causal relationship has been demonstrated.

Also, long-run impacts (e.g., advertising or promotion may cannibalize sales of future periods) must be accounted for.

EXPERIMENTAL DESIGN: managerial issues.

The following types of variables should be considered:

  1. General conditions (e.g. economic, regulatory).
  2. Competitive actions.
  3. Own marketing mix.
  4. Characteristics of the sample (e.g., people or markets).

In designing an experiment, carefully consider all factors that might influence results and then classify them as:

  1. Those that will be ignored (e.g., age differences).
  2. Those that will be controlled for (that is, each treatment group will be matched in terms of these variables) (e.g., increased awareness due to the treatment).
  3. Those that will be monitored, to see after the fact whether they were important (e.g., time spent by customers analyzing the product design).
  4. Those that will be manipulated in the design (e.g., package size).

Note: variables that are ignored can be taken care of by a randomization process whereby subjects are randomly assigned to treatments. When a variable is considered sufficiently crucial to the design that the researcher is not willing to risk unequal groups in terms of that variable, then assignments are made in order to “balance” the groups in terms of that variable. When a variable is considered a possible key influence, it should be measured (as well as controlled for in some situations). Finally, some key variables become the basis for the actual experimental manipulation.
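A minimal sketch of the difference between pure random assignment (which takes care of ignored variables only on average) and balanced assignment on a variable considered crucial; the subject pool, the age-group attribute, and the group sizes are all invented for illustration.

```python
import random

random.seed(42)

# Hypothetical subject pool: (subject id, age group) pairs, invented for illustration
subjects = [(i, "young" if i % 3 else "older") for i in range(60)]

# Pure randomization: ignored variables (e.g., age) are equalized only on average
shuffled = subjects[:]
random.shuffle(shuffled)
random_groups = {"treatment": shuffled[:30], "control": shuffled[30:]}

# Balanced ("stratified") assignment: split within each age stratum so the
# groups are matched on the variable we are not willing to leave to chance
balanced_groups = {"treatment": [], "control": []}
for stratum in ("young", "older"):
    members = [s for s in subjects if s[1] == stratum]
    random.shuffle(members)
    half = len(members) // 2
    balanced_groups["treatment"] += members[:half]
    balanced_groups["control"]   += members[half:]

for name, groups in [("random", random_groups), ("balanced", balanced_groups)]:
    counts = {g: sum(1 for _, age in groups[g] if age == "older") for g in groups}
    print(name, "older subjects per group:", counts)
```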

  • Selection of the dependent variables (those that measure the impact of the experiment) is sometimes a hard task: it is a trade-off between measuring what is likely to change as a result of the experiment [e.g., ad copy awareness] and what is likely to matter if it does change [e.g., profits].
  • Some causes of the observed consequences are not under the researcher’s control; these variables must be identified as uncontrollable (e.g., general economic trends). Such variables may sometimes have a much greater influence on the result than, say, marketing mix variables (e.g., ad copy).
  • Aggregation (the assumption that the same event or action should, at least on average, produce the same consequence) can be very harmful to experimental conclusions.

E.g., assume that feeding bulk food to weight lifters is beneficial, while feeding it to white-collar workers is detrimental. On average, bulk food would appear to have no impact on people, and such an “average” medical conclusion could be very harmful.
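A tiny numeric illustration of the aggregation problem just described; the segment names and effect values are invented.

```python
# Hypothetical effect of bulk food on a fitness score, by segment (invented numbers)
effects = {
    "weight lifters":       [+3, +2, +4, +3],   # beneficial for this segment
    "white-collar workers": [-3, -2, -4, -3],   # detrimental for this segment
}

overall = [e for seg in effects.values() for e in seg]
print("average effect overall:", sum(overall) / len(overall))   # 0.0: "no impact"
for segment, vals in effects.items():
    print(f"average effect for {segment}:", sum(vals) / len(vals))
# Averaging across dissimilar segments hides two opposite, real effects.
```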

EXPERIMENTAL DESIGN: Basic notions.

The logic underlying a simple experiment is fairly straightforward.

E.g., assume that a company is considering changing its advertising from copy A to copy B. What should it do?

  1. Expose subjects to copy B.
  2. Measure attitude toward the product after exposure.

(Everything seems logical except for one point: how is the absolute value of the result to be evaluated?)

It may turn out that exposure to copy B yields a score of 4 on a 5-point scale, while exposure to copy A would have yielded 4.5. Consequently, B’s seemingly good absolute score may actually indicate an impending disaster.

Consequently, the following two treatments must be set up for testing:

Group 1:  exposure to copy A, then measurement (O1)
Group 2:  exposure to copy B, then measurement (O2)

and, considering the results (e.g., 4.5 for copy A vs. 4.0 for copy B), we see that copy B, although effective in absolute terms, is less effective than copy A.

Alternatively, if we are also concerned about the effect of the premeasure on the results, then the following four-treatment scheme can be designed:

Treatment 1:  premeasure, exposure to copy A, postmeasure
Treatment 2:  premeasure, exposure to copy B, postmeasure
Treatment 3:  exposure to copy A, postmeasure
Treatment 4:  exposure to copy B, postmeasure

Treatments 3 and 4 are included to test the effect (e.g., raised awareness) of the premeasure. By measuring the effect of each copy both with and without the premeasure, the difference between the results gives an estimate of the effect of the premeasure.

Comparing treatments 1 and 2, copy A is better (15 vs. 13); but comparing 3 and 4, copy B is better (15 vs. 14). The premeasure improved the effect of A by one point (15 vs. 14), but reduced the effect of B by two points (13 vs. 15). However, by removing the premeasure in treatments 3 and 4, we have also removed the check that the groups exposed to treatments 3 and 4 started with the same initial attitudes… WE ARE NOT SURE WHICH AD IS BETTER.

Note: in experimental design, three things should be apparent.

  1. The design stage depends heavily on logic and is essentially the process of deciding which factors (variables) could influence the result, so that the effect of each one can be separately isolated by either manipulation or control.
  2. The number of factors that could possibly influence the results is enormous. Therefore, choosing the most important ones is crucial.
  3. Interpretation must be done very carefully, since the observed differences may not be statistically significant. For this reason, interpretation of the results of an experiment almost always involves a statistical analysis, usually analysis of variance (ANOVA), which will be discussed later (a minimal sketch follows this list).
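The sketch below shows the kind of significance check meant in point 3, using a one-way ANOVA on invented 5-point attitude scores for two ad copies (scipy is assumed to be available).

```python
from scipy import stats

# Hypothetical 5-point attitude scores for two treatment groups (invented data)
copy_a = [4.5, 4.0, 4.5, 5.0, 4.0, 4.5, 4.0, 5.0]
copy_b = [4.0, 3.5, 4.5, 4.0, 3.5, 4.0, 4.5, 3.5]

f_stat, p_value = stats.f_oneway(copy_a, copy_b)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
# Only if p is small (e.g., below 0.05) is the observed A-vs-B difference
# unlikely to be due to chance alone; otherwise the difference may not be
# statistically significant, as the note above warns.
```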

Definitions:

Factor: a variable that is explicitly manipulated as part of the experiment (e.g., price, advertising copy).

Levels: the values a factor is allowed to take on (e.g. prices of $100, $200, $300; advertising copy A or B).

Treatment: The combined levels of the factors to which an individual is exposed (e.g. price of $200 and advertising copy A).

Control group: subjects who are exposed to no treatment.

Measurement: the recording of a response of a respondent by any means (observation, survey, etc.).

Single-factor designs:

  1. After-only without control group:

Group 1:  X  O

(useless design because no comparison is available)

  2. Before-after without control group:

Group 1:  O1  X  O2

(the effect of a treatment is estimated by O2-O1; without a control group, however, maturation and testing effects are confounded with the treatment effect)

  3. After-only with a control group:

Group 1:  X  O1
Group 2:      O2

(the effect of a treatment is measured by O1-O2; this design is often used post hoc, when exposure to a treatment [e.g., an ad] is monitored after the fact)

  4. Before-after with one control group:

Group 1:  O1  X  O2
Group 2:  O3      O4

(the advantage of the premeasure is that it allows for slightly unequal groups in terms of key variables, since the difference [change] in the key variable is used to estimate the effect of a treatment. The effect of a treatment is given by (O2-O1)-(O4-O3). The control group is used to estimate the maturation and testing effects)

  5. Four-group, six-study design:

Group 1 (experimental):  O1  X  O2
Group 2 (control):       O3      O4
Group 3 (experimental):      X  O5
Group 4 (control):              O6

(used when an interaction effect between the premeasure and the treatment is expected [e.g., when asking a subject’s opinion about a topic causes the subject to begin thinking about the topic and thus to respond differently]).

The prior level of the variable of interest is estimated as the average of the two premeasures: 0.5(O1+O3).

Consequently, the 4 groups yield estimates of the impact of the experimental treatment (E), premeasurement (M), the interaction b/w the premeasure and the treatment (I), and uncontrolled variables (U) as follows:

Group 1:  O2 - O1 = M + E + I + U
Group 2:  O4 - O3 = M + U
Group 3:  O5 - 0.5(O1 + O3) = E + U
Group 4:  O6 - 0.5(O1 + O3) = U

By solving these four equations with four unknowns, we can estimate each of the separate effects.
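A minimal sketch of that calculation in Python; the observed values O1 through O6 are invented for illustration, and only the algebra follows the design above.

```python
import numpy as np

# Hypothetical observations from the four groups (invented numbers)
O1, O2 = 10.0, 16.0   # Group 1: premeasure, treatment, postmeasure
O3, O4 = 10.0, 11.0   # Group 2: premeasure, postmeasure
O5     = 14.0         # Group 3: treatment, postmeasure
O6     = 10.5         # Group 4: postmeasure only

prior = 0.5 * (O1 + O3)   # estimated prior level for the unpremeasured groups

# The four equations in the unknowns (M, E, I, U):
#   O2 - O1    = M + E + I + U
#   O4 - O3    = M         + U
#   O5 - prior =     E     + U
#   O6 - prior =             U
A = np.array([[1, 1, 1, 1],
              [1, 0, 0, 1],
              [0, 1, 0, 1],
              [0, 0, 0, 1]], dtype=float)
b = np.array([O2 - O1, O4 - O3, O5 - prior, O6 - prior])

M, E, I, U = np.linalg.solve(A, b)
print(f"premeasurement effect M = {M:.2f}")
print(f"treatment effect      E = {E:.2f}")
print(f"interaction           I = {I:.2f}")
print(f"uncontrolled change   U = {U:.2f}")
```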

Multiple-factor designs (designs involving the manipulation or monitoring of two or more factors; the basic idea of such experiments is to simultaneously assess the effects of varying the levels of several factors).

Such studies can become very extensive (e.g., with 4 ad strategies, 3 packages, and 3 colors, I will need 36 stores to run all possible combinations once; in order to estimate interactions [e.g., the unique effect of putting package design III and the red color together], I will need 72 stores).

Finding 36 similar stores willing to participate is practically an impossible task. Consequently, so-called fractional factorial designs (designs using fewer cells than the full factorial) are used.
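A minimal sketch of the full factorial layout just described; the factor names and level labels are illustrative.

```python
from itertools import product

# Hypothetical factor levels (4 x 3 x 3 = 36 combinations)
ad_strategies = ["A1", "A2", "A3", "A4"]
packages      = ["P-I", "P-II", "P-III"]
colors        = ["red", "blue", "green"]

full_factorial = list(product(ad_strategies, packages, colors))
print(len(full_factorial), "cells in the full factorial")                       # 36
print(2 * len(full_factorial), "stores if each cell is replicated for interactions")  # 72

# The simplification schemes below use only a subset of these 36 cells.
```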

Simplification schemes:

  1. Independent factor testing (the assumption that the factors (influencing variables) affect the dependent variable separately; consequently, we need 4 stores to test advertising, 3 stores to test packaging, and 3 stores to test color [10 stores total]).
  2. Orthogonal designs (where it is possible to assume that certain interactions do not occur [e.g., advertising and color do not interact], so subsets of the full factorial array can often be used to estimate the direct effects of the influencing variables).
  3. “Logical designs” (where useless or unfeasible combinations are eliminated).
  4. Latin square (can be applied to a two-factor design; e.g., use only three stores over three time periods by cycling each package through each store; see the sketch below).

[This allows estimation of both the time effect and the package design effect.] The assumptions are: no interaction b/w the factors and no carry-over effect [that is, sales in one period do not influence sales in the next, which is rarely strictly true, because a product can either be stockpiled {canned food} or satisfy demand over a long period of time {a BMW}].
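A minimal sketch of the 3×3 Latin square mentioned in point 4; the store and period labels are invented, and each package appears exactly once in every store and every period.

```python
# 3x3 Latin square: rows = time periods, columns = stores, entries = package design
packages = ["I", "II", "III"]
stores   = ["store 1", "store 2", "store 3"]

latin_square = [[packages[(period + store) % 3] for store in range(3)]
                for period in range(3)]

print("period \\ store   " + "   ".join(stores))
for period, row in enumerate(latin_square, start=1):
    print(f"period {period}         " + "         ".join(row))

# Every package is observed in every store and every period, so (assuming no
# interactions and no carry-over effects) the package effect can be separated
# from store and time effects with only 3 stores x 3 periods = 9 observations.
```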

Note (!): Developing the ability to design successful experiments requires a combination of logic, perseverance, and experience (plus a nontrivial amount of luck). When designing an experiment, cookbook procedures bring only conceptual help, but some considerations are still important:

-always have a control group or result to serve as a baseline (since absolute results are basically meaningless);

-choose a criterion variable which is both measurable and translatable to market results;

-calibrate before the experiment, so that the translation from the experimental criterion variable (e.g. attitude) to the likely market result (e.g., share) is well established;

-be careful not to assume that the result of a one-shot treatment (e.g., price, ad copy) will be repeated with multiple exposures;

-if you want to measure the effect of a particular factor/variable, make sure that it (a) varies (if I only expose subjects to regular cigarettes, I can’t assess the effect of filters on preference) and (b) varies in such a way that its variation is not perfectly related to the variation of other factors (if all low price products were also late entries in the market, it is impossible to know whether their share is a function of a late entry or price);

-be aware that the experiment itself may influence behavior.

Laboratory experiments.

(Lab experiments came from the natural sciences, where laboratories are used to tightly control conditions.) Unfortunately, consumers and businesses alike are likely to realize they are being observed (and consequently their behavior changes). Lab experiments are more concerned with the measurement of subjective factors such as attitudes; measurement of objective outcomes such as actual sales is largely irrelevant for lab experiments.

The limitations of lab experiments are obvious (they are a consequence of this altered behavior).

However, despite these limitations, lab experiments can often be very useful.

E.g.: 264 students agreed to take part in an experiment (whose purpose was, of course, unknown to them). Over three weeks they filled out questionnaires, and during the experiment they were served soft drinks on 12 occasions. All soft drinks were in 12-ounce cans and served cold. As part of the experiment, subjects were denied the opportunity to purchase Coke on the fifth occasion. The purpose of this was to see whether those who had bought Coke in the previous period would switch to the most similar brand, Pepsi. (The detailed results are omitted, but they showed that the majority of individuals did indeed switch to Pepsi when Coke was out of stock.)

Field Experiments.

Field experiments are at the opposite extreme from lab experiments on the realism scale.

Many of the problems related to artificiality are reduced; however, a heavy price is paid for this. E.g., test markets in two cities routinely run budgets of $500,000 and require a minimum of six months to complete.

Besides, field experiments are extremely difficult to control (especially if competitors simultaneously run other promotional activities or product introductions, bringing additional bias to the evaluation of the experimental results).

Natural experiments.

A standard experiment is a situation in which a researcher assigns subjects to treatments in either a random or systematic manner. In a field setting this is sometimes difficult. The concept of a natural experiment is to allow subjects to choose which treatment they receive.

E.g., to measure the effect of an ad, we simply observe (after the fact) the difference in behavior b/w those who happened to see the ad and those who did not. Put differently, performing a natural experiment means treating observational data as though they were the output of an experiment.

The advantage is the absence of artificiality; the disadvantage is that interpretation becomes a serious problem.

Example. To see how a natural experiment works, consider the issue of which medium (TV or magazines) is more effective in increasing perceived knowledge of a new small car.

In this case, subjects were asked their perceived knowledge (on a 10-point scale) of a new car both before and after the running of a major introductory campaign. The subjects were also asked to report which magazines they read and which TV shows they viewed. By using the actual media plan, a measure of potential advertising exposure was established. While this measures potential rather than actual exposure (I may read a magazine and skip the ads), this objective measure seemed preferable to self-reported ad exposure, which can be expected to be both inaccurate and contaminated by attitudes. Based on the ad exposure measures, the 622 subjects were divided into nine categories. Next, the average change in perceived knowledge was calculated for each of the nine possible combinations of TV and magazine advertising exposure. The average changes and the number of subjects in each cell are shown in the table below.
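To show how such a cross-tabulation would be computed, here is a minimal pandas sketch with invented respondent data (the real study’s 622 respondents and their scores are not reproduced here; the exposure levels and scores are hypothetical).

```python
import pandas as pd

# Hypothetical respondents: exposure categories and perceived-knowledge scores (invented)
df = pd.DataFrame({
    "tv_exposure":      ["low", "low", "medium", "high", "high", "medium", "low", "high"],
    "mag_exposure":     ["low", "high", "medium", "low", "high", "high", "medium", "medium"],
    "knowledge_before": [2, 3, 2, 4, 3, 2, 3, 4],
    "knowledge_after":  [3, 5, 4, 5, 6, 4, 3, 6],
})
df["change"] = df["knowledge_after"] - df["knowledge_before"]

# Average change and cell sizes for each TV x magazine exposure combination
table = df.pivot_table(index="tv_exposure", columns="mag_exposure",
                       values="change", aggfunc=["mean", "count"])
print(table)
# Interpretation is the hard part: heavier viewers/readers may differ from
# lighter ones in ways that also affect knowledge, which is exactly the
# natural-experiment problem noted above.
```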