Amalgamation of Partitions from Multiple Segmentation Bases:

A Comparison of Non-Model-Based and Model-Based Methods

Rick L. Andrews[1], Michael J. Brusco[2], Imran S. Currim[3],[4]

Lerner College of Business and Economics, Newark, Delaware 19716

College of Business, Florida State University, Tallahassee, Florida 32306-1110

Paul Merage School of Business, University of California, Irvine, California 92697-3125

July 15, 2008


Abstract

The segmentation of customers on multiple bases is a pervasive problem in marketing research. For example, segmentation service providers partition customers using a variety of demographic and psychographic characteristics, as well as an array of consumption attributes such as brand loyalty, switching behavior, and product/service satisfaction. Unfortunately, the partitions obtained from multiple bases are often not in good agreement with one another, making effective segmentation a difficult managerial task. Therefore, the construction of segments using multiple independent bases often results in a need to establish a partition that represents an amalgamation or consensus of the individual partitions. In this paper, we compare three methods for finding a consensus partition. The first two methods are deterministic, do not use a statistical model in the development of the consensus partition, and are representative of methods used in commercial settings, whereas the third method is based on finite mixture modeling. In a large-scale simulation experiment, the finite mixture model yielded better average recovery of holdout (validation) partitions than its non-model-based competitors. This result calls for important changes in the current practice of segmentation service providers that group customers for a variety of managerial goals related to the design and marketing of products and services.

Keywords: Marketing; Clustering; Market segmentation; Consensus partition; Finite mixture models


1. Introduction

Market segmentation has maintained a venerable position in the marketing research literature since the ground-breaking work of Wendell Smith (1956) more than one-half century ago. The pragmatic benefits of segmentation have been described by Frank et al. (1972), Wind (1978), and Punj and Stewart (1983), among others. Advancements in segmentation methodology have proliferated in the marketing and classification literatures, and an excellent treatment of these developments is provided by Wedel and Kamakura (2000). Because segmentation is a challenging task of considerable importance in industry today, a large number of segmentation service providers, such as Claritas, NPD/Crest, Simmons, MediaMark, and Polk Automotive, have developed a variety of segmentation-based products and services for product and service categories such as retail, restaurant, real estate, financial services, health care, automotive, telecommunications, internet, cable and satellite services, energy, media and ad agencies, travel, and not-for-profit organizations. In addition, Claritas provides segmentation services on nearly all marketing databases from leading providers such as ACNielsen, Gallup, IRI, JD Power, MediaMark, Nielsen Media Research, NFO, NPD, Polk Automotive, Scarborough, and Simmons, as well as nearly all major direct mail list providers, consumer marketing surveys, and audience measurement systems.

One of the most important distinguishing aspects of state-of-the-art segmentation procedures is the presence or absence of a statistical model. Traditional clustering procedures, such as Ward's (1963) hierarchical method or K-means partitioning algorithms (MacQueen, 1967), which are widely used in commercial settings, take the data "as is" and do not posit any statistical model. Hereafter, we refer to methods of this type as "non-model-based." In contrast, some of the academic literature advocates finite mixture models as a preferred approach to clustering because they provide a formal statistical model (e.g., Banfield and Raftery, 1993; McLachlan and Peel, 2000). Mixture model approaches have received considerable attention in the marketing research literature (Dillon and Kumar, 1994; Wedel and Kamakura, 2000); however, their incorporation into marketing practice remains limited. Some of practitioners' hesitancy to adopt model-based approaches to clustering likely stems from insufficient knowledge of their performance relative to non-model-based competitors. Extensive comparisons of model-based and non-model-based methods are slowly emerging in the literature (Andrews et al., 2008; Steinley and Brusco, 2008) and should ultimately provide marketing research analysts with information about the precise conditions under which model-based approaches are apt to be preferred. For example, Andrews et al. (2008) compare model-based and non-model-based procedures for segmenting consumers jointly on two sets of bases, corresponding to households' responses to product and marketing mix strategies and their demographic characteristics. They found that if the manager's primary purpose is to forecast responses to product and marketing mix variables for a new sample of consumers for whom only demographics are available, model-based and non-model-based procedures perform equally well. On the other hand, if developing an understanding of the true segmentation structure in a market is important, as is often the case for the design and marketing of products, the model-based procedure is clearly preferred.

Another critical factor for characterizing segmentation problems is the nature of the data measurements. The most common non-model-based procedures (e.g., K-means), as well as methods based on mixtures of normal distributions, are designed for data that are at least interval-scaled. Although clustering based on metric data is certainly important in marketing research, so are segmentation problems based on categorical data (Green et al., 1988). The viability of model-based methods for segmentation based on categorical data is also recognized in the literature on latent class analysis (Dillon and Kumar, 1994). In contrast, despite the flexibility of non-model-based clustering methods such as p-median algorithms (Brusco and Köhn, 2008) and K-centroid clustering heuristics (Chaturvedi et al., 1997) for clustering data measured on nominal or ordinal scales, their implementation in the marketing literature is limited.

One especially significant segmentation problem pertaining to categorical data arises when the same set of consumers is segmented independently on two or more distinct bases. When different partitions of customers are obtained for different segmentation bases, perhaps with varying numbers of segments across partitions, a natural problem that arises is the establishment of a single consensus partition that best reflects an amalgamation of the individual partitions (Krieger and Green, 1999). [1] The potential for segmenting consumers on different bases has been identified by a number of authors (Krieger and Green, 1996; Ramaswamy et al., 1996; Brusco et al., 2002; Andrews and Currim, 2003a; Brusco et al., 2003). Unfortunately, segment partitions for different bases often do not correlate well with one another. For example, household responses to product and marketing mix characteristics often do not correlate well with household characteristics (Wind, 1978; Green and Krieger, 1991; Gupta and Chintagunta, 1994; Brusco et al., 2002), making segmentation ineffective.

The segmentation of households on different bases, such as household responses to the product or marketing mix and household characteristics, is pervasive across commercial applications as well. For example, Claritas employs information on purchase behaviors and consumer descriptor variables (e.g., demographics, lifestyle) in its PRIZM NE and other associated segmentation systems to increase profitability through customer acquisition, development, and retention. NPD also employs consumer behavior (e.g., shopping behavior, purchases) and descriptor variables (e.g., demographics, lifestyle, attitudes) to reveal groupings of consumers, help retailers better understand the behaviors and characteristics of the buyers they have, and determine how best to build market share. Simmons gives client companies insight into the similarities and differences among consumer segments based on body characteristics, food preferences, and other descriptor variables (e.g., self-image and media usage preferences), so that clients are better able to target products and promotions to groups of people of different body sizes. MediaMark Research employs consumer behavior (actual and intended product usage) and descriptor variables (e.g., demographics, psychographics, lifestyle, and media) for print media (e.g., newspapers and magazines) to support concept and product development, conduct prime prospect analysis to help expand readership, attract new readers, and meet existing reader needs. And R. L. Polk uses a combination of automotive preference and purchase data and demographic and lifestyle data to develop predictive segmentation models, helping client companies communicate with and retain current customers, identify in-market buyers, generate brand awareness and reach, capture new customers, and target 'the right people at the right time with the right offer.' These examples suggest that segmentation based on multiple bases is pervasive in commercial settings.

What options are available to marketing research analysts looking to obtain a consensus partition from multiple independent partitions? One classic approach has deep historical roots in the theory of voting and social choice (Borda, 1784; Condorcet, 1785), and is based on the aggregation of preferences. Régnier (1965) is generally credited with the extension of these principles to clustering. His approach views the set of categorical variables as a system of binary equivalence relations and subsequently obtains the equivalence relation that is at minimum distance from the system of relations in a least-squares sense. The resulting optimization problem can be transformed into a binary integer program and is widely known as the clique partitioning problem (Grötschel and Wakabayashi, 1989). Hereafter, we refer to this approach as the CPP.

Krieger and Green (1999) developed an approach for amalgamating partitions that is conceptually similar to the CPP, although less mathematically tractable for exact solution methods. Their approach, which they call SEGWAY, attempts to find a consensus partition that maximizes a weighted function of agreement indices between the consensus partition and the set of categorical measurements. Krieger and Green opted for Hubert and Arabie’s (1985) adjusted Rand index (ARI) as the agreement measure, which was an excellent choice in light of its reputation in the classification literature. For example, Steinley (2004, p. 394) provides cogent arguments for the superiority of the ARI over classification rates, including ARI’s correction for chance and the greater information it provides by capturing the pattern of classified observations.
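To make the SEGWAY-style objective concrete, the following sketch scores a candidate consensus partition by a weighted sum of ARI values against the P base partitions. It is a minimal illustration under our own assumptions, not Krieger and Green's implementation: the function name weighted_ari, the uniform default weights, and the use of scikit-learn's adjusted_rand_score are ours.

# A minimal sketch of a SEGWAY-style objective (our own illustration):
# score a candidate consensus partition by a weighted sum of adjusted
# Rand indices (ARI) against the P base partitions.
import numpy as np
from sklearn.metrics import adjusted_rand_score

def weighted_ari(consensus, base_partitions, weights=None):
    """consensus: length-N label vector; base_partitions: N x P label matrix."""
    P = base_partitions.shape[1]
    w = np.ones(P) / P if weights is None else np.asarray(weights)
    # Sum the weighted agreement of the consensus with each base partition.
    return sum(w[p] * adjusted_rand_score(base_partitions[:, p], consensus)
               for p in range(P))

A simple heuristic would evaluate many candidate partitions (e.g., from random restarts of a relocation algorithm) and retain the one with the largest weighted ARI; Krieger and Green's actual algorithm additionally supports minimum-ARI threshold constraints for each partition.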

The CPP and SEGWAY methods for consensus partitioning do not employ a statistical model. They take the data measurements “as is” and optimize the appropriate objective criterion using discrete partitioning algorithms. As an alternative to these two non-model-based approaches, we consider latent class models (Lazarsfeld and Henry, 1968) as a third approach.

Each of the three approaches for amalgamation of partitions has its own inherent advantages and limitations. For example, the CPP possesses the benefits of historical precedent and the use of the ubiquitous least-squares approach to data analysis. Another desirable property of the CPP is that it is one of the few discrete partitioning methods that automatically determines the number of clusters via the solution process. Relative to the CPP, SEGWAY offers greater flexibility with respect to differentially weighting the importance of partitions in the objective function, as well as the incorporation of constraints to secure minimum ARI thresholds for each partition. The SEGWAY algorithm is, however, more computationally demanding than the CPP and selection of appropriate weights and thresholds can be a tedious and challenging endeavor.

Whereas the CPP and SEGWAY models are plausible discrete partitioning approaches to the amalgamation of partitions, the latent class method (Lazarsfeld and Henry, 1968; Dillon and Kumar, 1994) is the natural probabilistic approach. The latent class method, which falls within the domain of finite mixture models (FMM), is predicated on the assumption of local independence among the categorical attributes (McLachlan and Peel, 2000, Chapter 5; Grim, 2006). Although somewhat unrealistic, this assumption, which is sometimes referred to as the naive Bayes approach, often yields very satisfactory performance in practice. Using the popular EM algorithm, the segment proportions and the conditional probabilities of each class of each partition are estimated iteratively so as to maximize the log-likelihood. Upon convergence of the algorithm, each customer is assigned to the segment for which the posterior membership probability is largest.
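As an illustration of these mechanics, the sketch below implements a basic EM algorithm for a latent class model over the N × P matrix of categorical segment labels, under the local independence assumption. It is a simplified sketch under our own assumptions (0-indexed integer labels, a fixed number of segments K, random initialization), not the exact estimation procedure used in our study.

# A compact EM sketch for a latent class (finite mixture) model over the
# N x P matrix of categorical segment labels, assuming local independence.
# Illustrative only: 0-indexed integer labels, fixed K, one random start.
import numpy as np

def latent_class_em(X, K, n_iter=200, seed=0, eps=1e-10):
    X = np.asarray(X, dtype=int)
    N, P = X.shape
    C = [X[:, p].max() + 1 for p in range(P)]          # classes per partition
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                           # segment proportions
    theta = [rng.dirichlet(np.ones(C[p]), size=K) for p in range(P)]
    for _ in range(n_iter):
        # E-step: responsibility of segment k for customer n is proportional
        # to pi_k times the product over partitions of P(x_np | segment k).
        log_r = np.log(pi + eps) + sum(np.log(theta[p][:, X[:, p]].T + eps)
                                       for p in range(P))
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate proportions and conditional class probabilities.
        pi = r.mean(axis=0)
        for p in range(P):
            counts = np.zeros((K, C[p]))
            for c in range(C[p]):
                counts[:, c] = r[X[:, p] == c].sum(axis=0)
            theta[p] = (counts + eps) / (counts + eps).sum(axis=1, keepdims=True)
    return r.argmax(axis=1), pi, theta                 # hard segment assignments

In practice, the number of segments K would be selected with an information criterion such as BIC, and multiple random starts would be used to mitigate convergence to local optima.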

Although the CPP, SEGWAY, and FMM approaches for consensus partitioning each have their own merits, little is known about their relative performance when applied to segmentation data (Krieger and Green, 1999). Without such information, it is difficult to offer recommendations to market research practitioners with respect to best practices for amalgamating partitions from different bases. Accordingly, our contribution is a comparison of the consensus partitioning approaches across a broad range of data conditions. The results of our study revealed that the FMM provided better recovery than its non-model-based competitors.

In the next section, we provide a formal description of the CPP, SEGWAY, and FMM approaches. This is followed by the description and results of a simulation experiment comparing the consensus partitioning methods with respect to their fit to holdout (validation) partitions. The paper concludes with a discussion of the findings and suggestions for future research.

2. Amalgamation methods

2.1. Binary equivalence relations and the clique partitioning problem (CPP)

Our development of the problem of aggregating binary equivalence relations and the formulation of the CPP uses the following definitions:

N : the number of customers (or firms) that have been partitioned, indexed 1 ≤ n ≤ N;

P : the number of partitions of the customers to amalgamate, indexed 1 ≤ p ≤ P;

X : an N × P matrix with elements xnp representing the class measurement for customer n in partition p, for 1 ≤ n ≤ N and 1 ≤ p ≤ P;

E(p) : an N × N matrix that defines a binary equivalence relation on partition p, for 1 ≤ p ≤ P. The elements of E(p) are defined as enj(p) = 1 if xnp = xjp and enj(p) = 0 if xnp ≠ xjp, for 1 ≤ n, j ≤ N and 1 ≤ p ≤ P;

E : an N × N matrix defining a "median" relation for the N customers, where enj = 1 if customers n and j are assigned to the same segment, and enj = 0 otherwise, for 1 ≤ n, j ≤ N.

The goal of the problem posed by Régnier (1965) is to find the median relation (Mirkin, 1974, 1979; Barthélemy and Monjardet, 1995), E, that provides the best aggregation of the P binary equivalence relations, E(1), E(2), ..., E(P), as measured by the least-squares loss function:

Minimize: Z = Σp Σn Σj (enj(p) – enj)², with sums over 1 ≤ p ≤ P and 1 ≤ n, j ≤ N. (1)

The loss function is minimized subject to constraints guaranteeing reflexivity, symmetry, transitivity, and binary properties of the median relation, which are, respectively, enforced as follows:

enn = 1 ∀ 1 ≤ n ≤ N, (2)

enj = ejn ∀ 1 ≤ n < j ≤ N, (3)

eni + eij – enj ≤ 1 ∀ 1 ≤ n, i, j ≤ N, (4)

enj ∈ {0, 1} ∀ 1 ≤ n, j ≤ N. (5)
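To make the formulation concrete, the following sketch (our own illustration, assuming 0-indexed labels in X) constructs the binary equivalence relations E(p) and evaluates the loss function (1) for the median relation implied by a candidate consensus labeling.

# A small sketch (our illustration, assuming 0-indexed labels in X): build
# each binary equivalence relation E(p) and evaluate loss (1) for the
# median relation implied by a candidate consensus labeling.
import numpy as np

def equivalence_relation(labels):
    """E with e_nj = 1 iff customers n and j share a class label."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(int)

def cpp_loss(X, consensus):
    """Sum over p of the squared deviations between E and E(p), as in (1)."""
    X = np.asarray(X)
    E = equivalence_relation(consensus)
    return sum(int(((equivalence_relation(X[:, p]) - E) ** 2).sum())
               for p in range(X.shape[1]))

Because E here is induced by a labeling, reflexivity, symmetry, transitivity, and binarity hold by construction; constraints (2) through (5) are needed only when the elements of E are optimized over directly.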

The linearization of (1) is accomplished by expanding the quadratic expression and separating the terms that do not involve the median relation: