Cultural Repertoires: A Market Basket Analysis
Chris Hand¹ and Alan Collins²
1. School of Marketing, Kingston University, Kingston Hill, Kingston upon Thames, Surrey, KT2 7LB, UK
Email:
2. Department of Economics, University of Portsmouth, UK
Abstract
This paper investigates the effect of participation in one cultural pursuit on participation in another. With one or two exceptions, studies of participation have tended to focus solely on one pursuit at a time, using a standard demand function which may include the prices of complements and substitutes.
This paper adopts an alternative, empirical approach to determining the inter-relationships between cultural pursuits and employs a technique found in the marketing and data mining literature: Market Basket Analysis. This allows us to identify the inter-relationships between the cinema, theatre, museums / galleries, concerts / gigs, live sport, playing sport / exercise, watching videos / DVDs and playing computer games using a national survey from the UK.
We find three groups of pursuits which are strongly associated, where participation in one is associated with participation in the others, as well as two groups where participation is negatively associated.
Key words: Leisure, Participation, Market Basket Analysis
1. Introduction
Cultural activities have been the subject of increased attention from Economics and Business researchers in recent years. However, attention has tended to focus on the determinants of participation in individual leisure pursuits. For example, Farrell and Shields (2002) investigated participation in sports in the UK whilst Gray (2003) and Borgonovi (2004) examined attendance at performing arts in the US. The film industry in particular has attracted attention from researchers, with recent studies investigating the determinants of commercial success (e.g. De Vany and Walls, 1999, Collins, Hand and Snell, 2002, Walls, 2005), modelling survival times (e.g. De Vany and Walls, 1997, Jedidi, Krider and Weinberg, 1998), release timing for movies (Krider and Weinberg, 1998), sequential release of films across channels, such as in movie theatres and on video (Lehmann and Weinberg, 2000) and across markets (Elberse and Eliashberg, 2003). Studies of the theatre have addressed similar themes. Johnson and Garbarino (2001) investigated the differences between subscribers and non-subscribers to an off-Broadway theatre, in terms of satisfaction, trust and commitment. Simonoff and La (2003) investigated the determinants of the duration of a play’s run on Broadway whilst Maddison (2004) examined the distribution of Broadway shows’ survival times. Ngobo (2005) investigated the impact of both demographics and satisfaction on upward and downward migration (i.e. the decision to become a subscriber and the decision to purchase tickets less often).
It is already known that those who attend one live art frequently are likely to attend others (Andreason and Belk, 1980). However, the relationship between different leisure pursuits (not just live arts) and whether they are complements or substitutes has received far less attention and has tended to focus on whether sports and arts are substitutes. A second strand of the literature has investigated whether tastes in both music and in leisure pursuits more generally have broadened over time; whether “snobs” have become “omnivores” (e.g. Peterson and Kern, 1996, Holbrook, Weiss and Habich, 2002).
The conventional approach in economics to defining substitutes and complements is to examine cross-price elasticities. As Gapinski (1986) observes, the idea that the prices of substitutes are determinants of demand for the arts is far from new. However, exactly what the substitutes for a particular leisure pursuit are is far from clear. In models of demand for the theatre, the cinema is sometimes included as a substitute (e.g. Touchstone, 1980). Macmillan and Smith (1999) included a proxy for television in their model of cinema attendance, whilst Collins, Hand and Ryder (2005) included video in their study of cinema visit frequency. Typically, however, such studies have included few substitutes and have aimed to explain attendance at a particular arts event, rather than to determine the relationship between different leisure pursuits. In the marketing literature, the relationship between different channels of distribution (e.g. cinema release and release on DVD) has been studied, but from the producer’s perspective rather than the consumer’s (for example, Lehmann and Weinberg, 1998, considered the optimal release time for films on video).
An alternative is to examine consumers’ behaviour over time. Over time, consumers may change from one brand to another, to another and then back to the original brand. As Ehrenberg (1972) has shown, in general, households are loyal to several brands rather than being loyal to only one brand. The brands a household is loyal to are known as the brand portfolio or brand repertoire. The effect of participation in one activity on the likelihood of participation in another has received little recent attention. Montgomery and Robinson (2005) provide a notable exception: their study found little evidence that sports were substituted for arts; rather, sports attendance increased the likelihood of attending arts performances.
This paper takes a different approach, regarding the leisure pursuits people engage in as forming the repertoire of pursuits they are loyal to. These “cultural repertoires” (analogous to brand repertoires) are investigated using a market basket analysis to determine the extent to which cinema, theatre, video / DVD, video / computer games, concerts / gigs, galleries / museums, watching live sports and playing sport / exercising are substitutes or complements.
2. Market Basket Analysis
Market Basket Analysis (MBA) is an exploratory technique which identifies the strength of association between pairs of products purchased from an individual retailer. Such analysis is usually applied to data on shopping behaviour, such as that collected at the point of sale. If applied to grocery shopping, for example, the results of an MBA could inform a supermarket’s pricing strategy. If the supermarket knows that bread and fruit juice tend to be purchased together, it can avoid offering price discounts on both at the same time. In this paper, we apply the same approach to a notional basket of leisure pursuits which we denote consumers’ leisure repertoires.
A Market Basket Analysis determines the degree to which two leisure activities are associated and hence are likely to feature in the same “basket” of leisure pursuits. In its simplest form, an MBA can be seen as a series of pairwise contingency tables. With very large datasets, such contingency tables can be used to filter out pairs of products which are not associated, allowing a more parsimonious model to be estimated. A number of different methods have been employed in the study of market baskets: pairwise comparison (e.g. Julander, 1992), association rules (e.g. Giudici, 2003), Bayesian model search employing Markov Chain Monte Carlo methods (Giudici and Passerone, 2002), neural network models (Decker and Monien, 2003) and the method we employ, log-linear models.
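The pairwise view of an MBA can be illustrated with a minimal sketch. The eight respondents below are invented purely for illustration (they are not CAVIAR data), and the helper names `contingency_table` and `odds_ratio` are hypothetical. The sketch cross-tabulates two binary participation indicators and summarises the association with the cross-product (odds) ratio.

```python
from collections import Counter

# Hypothetical responses: 1 = enjoys the pursuit, 0 = does not.
# Invented for illustration only; these are not survey data.
answers = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]
respondents = [{"cinema": c, "theatre": t} for c, t in answers]


def contingency_table(rows, a, b):
    """Return 2x2 counts indexed as table[value of a][value of b]."""
    counts = Counter((r[a], r[b]) for r in rows)
    return [[counts[(i, j)] for j in (0, 1)] for i in (0, 1)]


def odds_ratio(table):
    """Cross-product ratio of a 2x2 table; a value of 1 means no association."""
    (n00, n01), (n10, n11) = table
    return (n00 * n11) / (n01 * n10)


table = contingency_table(respondents, "cinema", "theatre")
print(table)              # rows: cinema no/yes; columns: theatre no/yes
print(odds_ratio(table))  # above 1: the two pursuits tend to occur together
```

With many pursuits, the same two functions could be applied to every pair as a first screening step before a fuller model is estimated.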
3. Data
In this study we used data from the Cinema and Video Industry Audience Research (CAVIAR) survey. The CAVIAR survey is undertaken annually by BMRB International on behalf of the UK’s Cinema Advertising Association. The data was collected from a sample representative of the UK population (slightly over-sampling younger age groups to reflect the cinema audience). Amongst other things, the survey asks which of a list of leisure pursuits each respondent enjoys participating in. However, the survey captures only participation; data on frequency of participation is collected only for cinema-going and watching videos / DVDs. Hence our data is based on stated preference rather than actual consumption records.
The full CAVIAR data set contains 3106 observations; filtering out respondents under the age of 18 reduced the sample size to 1937. Table 1 shows the number of respondents who stated they enjoyed each leisure pursuit in each age group.
Table 1. Age profile of Leisure Pursuits
Age Group / Cinema / Computer/console games / Theatre / Live sport / Concert/gig / Sport/exercise / Gallery/museum / DVD/video
18 – 24 / 474 / 297 / 113 / 214 / 213 / 304 / 98 / 487
25 – 34 / 296 / 178 / 104 / 121 / 164 / 182 / 112 / 330
35 – 44 / 259 / 117 / 120 / 131 / 140 / 163 / 119 / 272
45 – 54 / 85 / 31 / 71 / 45 / 64 / 56 / 62 / 97
55 – 64 / 34 / 13 / 53 / 21 / 27 / 26 / 37 / 39
65 & over / 34 / 11 / 50 / 25 / 20 / 24 / 35 / 44
Total / 1182 / 647 / 511 / 557 / 628 / 755 / 463 / 1269
Cinema-going and watching DVDs were the most popular, with 1182 and 1269 respondents respectively saying they enjoyed these activities, whilst visiting galleries and museums was the least popular (463 respondents). As might be expected, cinema-going, playing computer / console games and watching DVDs / video were more popular among younger respondents, whilst theatre-going and going to galleries and museums were more popular among older respondents.
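The age pattern in Table 1 can be checked with a short calculation. The sketch below re-enters the counts from Table 1 and computes, for each pursuit, the share of its enthusiasts who are aged 45 or over. Because the number of respondents in each age group is not reported here, these are shares of each pursuit's audience, not participation rates within an age group. The variable names are illustrative.

```python
# Counts copied from Table 1; column order follows the table header.
pursuits = ["Cinema", "Games", "Theatre", "Live sport",
            "Concert/gig", "Sport/exercise", "Gallery/museum", "DVD/video"]
table1 = {
    "18-24": [474, 297, 113, 214, 213, 304, 98, 487],
    "25-34": [296, 178, 104, 121, 164, 182, 112, 330],
    "35-44": [259, 117, 120, 131, 140, 163, 119, 272],
    "45-54": [85, 31, 71, 45, 64, 56, 62, 97],
    "55-64": [34, 13, 53, 21, 27, 26, 37, 39],
    "65+": [34, 11, 50, 25, 20, 24, 35, 44],
}

# Column totals should reproduce the Total row of Table 1.
totals = [sum(column) for column in zip(*table1.values())]

# Share of each pursuit's enthusiasts aged 45 or over.
older_groups = ["45-54", "55-64", "65+"]
share_45_plus = {
    name: sum(table1[age][i] for age in older_groups) / totals[i]
    for i, name in enumerate(pursuits)
}

for name, total in sorted(zip(pursuits, totals), key=lambda item: item[1]):
    print(f"{name:15s} total={total:5d}  aged 45+ share={share_45_plus[name]:.1%}")
```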
4. Log-Linear Model
Log-linear models can be thought of as association tests for n-way contingency tables. Our data set contains eight leisure pursuits: cinema, theatre, watching DVDs / videos, computer / console games, watching live sport, concerts / gigs, sport / exercise and going to galleries or museums. We could investigate the relationship between these activities by considering the 28 2x2 contingency tables obtained by considering each possible pair of leisure activities. However, this would only show marginal associations, and not conditional associations. Hence, we used a log-linear model, rather than separate contingency tables, to run the market basket analysis.
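The distinction between marginal and conditional association can be made concrete with a small sketch. The first part enumerates the distinct pairs among the eight pursuits. The second part uses invented counts for two pursuits, A and B, split by a hypothetical third pursuit C: within each level of C there is no association between A and B, yet the table collapsed over C shows a strong one. All counts and names are assumptions for illustration.

```python
from itertools import combinations

pursuits = ["cinema", "theatre", "video_DVD", "games",
            "live_sport", "concert_gig", "sport_exercise", "gallery_museum"]

# Every unordered pair of pursuits gives one 2x2 contingency table.
pairs = list(combinations(pursuits, 2))
print(len(pairs))


def odds_ratio(table):
    """Cross-product ratio of a 2x2 table of counts."""
    (n00, n01), (n10, n11) = table
    return (n00 * n11) / (n01 * n10)


# Invented counts for pursuits A (rows) and B (columns), split by a
# hypothetical third pursuit C. Not survey data.
table_c_no = [[64, 16], [16, 4]]    # respondents who do not enjoy C
table_c_yes = [[4, 16], [16, 64]]   # respondents who enjoy C

# Collapsing over C gives the marginal A-by-B table.
marginal = [[x + y for x, y in zip(row_no, row_yes)]
            for row_no, row_yes in zip(table_c_no, table_c_yes)]

print(odds_ratio(table_c_no), odds_ratio(table_c_yes))  # conditional on C
print(odds_ratio(marginal))                             # ignoring C
```

In this toy case the pairwise table would suggest that A and B belong together, when the association arises only because both are linked to C. A model containing all pursuits at once separates the two situations.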
A log-linear model predicts the log of the number of observations in each cell of a contingency table using an estimated parameter (λ) for each value of the row and column variables and for each combination of the row and column variables. In general terms, for a two-way contingency table, the predicted number is obtained from a constant (μ), two main effects coefficients which depend on the variables in the row and column of the contingency table and an interaction term which describes the association between the two variables. For example, in a contingency table of whether cinemagoers are also theatregoers, the log of the predicted number of cases in each cell would be as follows:
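\[
\begin{aligned}
\ln(m_{11}) &= \mu + \lambda_{\text{non-cinemagoer}} + \lambda_{\text{non-theatregoer}} + \lambda_{\text{non-cinemagoer and non-theatregoer}} \\
\ln(m_{12}) &= \mu + \lambda_{\text{non-cinemagoer}} + \lambda_{\text{theatregoer}} + \lambda_{\text{theatregoer but non-cinemagoer}} \\
\ln(m_{21}) &= \mu + \lambda_{\text{cinemagoer}} + \lambda_{\text{non-theatregoer}} + \lambda_{\text{cinemagoer but non-theatregoer}} \\
\ln(m_{22}) &= \mu + \lambda_{\text{cinemagoer}} + \lambda_{\text{theatregoer}} + \lambda_{\text{cinemagoer and theatregoer}}
\end{aligned}
\]
where m_ij denotes the predicted number of cases in the cell in the ith row and jth column of the table. In Market Basket Analysis, interest is usually focussed on the interaction between purchases.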
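A numerical sketch may help to show how the four equations fit a table. Written with a λ for every level, the model has more parameters than cells, so an identifying constraint is needed. The sketch assumes reference-cell coding, in which every term involving a non-participation level is set to zero; this is an assumption for illustration, as the constraint used in estimation is not stated here. The counts are invented.

```python
import math

# Invented counts: rows = cinema (no, yes), columns = theatre (no, yes).
counts = [[400, 100],
          [300, 200]]


def saturated_loglinear(table):
    """Closed-form parameters of the saturated 2x2 log-linear model.

    Assumes reference-cell coding: all terms involving the first
    (non-participation) level of either variable are fixed at zero.
    """
    (m11, m12), (m21, m22) = table
    mu = math.log(m11)                                # constant
    lam_row = math.log(m21 / m11)                     # cinemagoer main effect
    lam_col = math.log(m12 / m11)                     # theatregoer main effect
    lam_int = math.log((m11 * m22) / (m12 * m21))     # interaction
    return mu, lam_row, lam_col, lam_int


mu, lam_cinema, lam_theatre, lam_both = saturated_loglinear(counts)

# Rebuild the four cells from the parameters, mirroring the four equations.
fitted = [
    [math.exp(mu), math.exp(mu + lam_theatre)],
    [math.exp(mu + lam_cinema),
     math.exp(mu + lam_cinema + lam_theatre + lam_both)],
]

print(fitted)
print(math.exp(lam_both))  # exponentiated interaction = odds ratio of the table
```

Under this coding the interaction term alone carries the association between the two pursuits, which is why attention centres on it.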
The results of log-linear models are often interpreted in terms of odds ratios, which are arguably easier to interpret than the log-linear coefficients. An odds ratio of one denotes no association, less than one a negative association and greater than one a positive association. In analyses of data sets with a large number of variables, attention may be focused on odds ratios greater than a threshold level (e.g. Giudici and Passerone, 2002, use an odds ratio of 5) rather than reporting all significant associations. Market Basket Analyses are usually conducted on very large datasets; the example presented by Giudici (2003) contains 46,727 observations. With such large datasets, inferential statistical tests can become too sensitive, with odds ratios only marginally different from one being statistically significant. Focusing on the largest odds ratios (the most strongly associated pairs of leisure pursuits) avoids this problem. Alternatively, an odds ratio may be regarded as significant if the lower bound of its 95% confidence interval is greater than 1. Our dataset is sufficiently small that the significance tests based on Z statistics and on confidence intervals coincide.
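The link between the interaction terms and the odds ratio can be written out for a two-way table. In the sketch below, θ denotes the odds ratio, λ_ij the interaction term for cell (i, j), and SE the standard error of an estimated interaction; these symbols, the reference-cell constraint in the second line and the normal-approximation (Wald) interval in the third are assumptions made for illustration.

```latex
\begin{aligned}
\ln\theta &= \ln\frac{m_{11}\,m_{22}}{m_{12}\,m_{21}}
           = \lambda_{11} + \lambda_{22} - \lambda_{12} - \lambda_{21}
  && \text{(constant and main effects cancel)} \\
\theta &= \exp(\lambda_{22})
  && \text{(if } \lambda_{11} = \lambda_{12} = \lambda_{21} = 0\text{)} \\
\text{95\% interval for } \theta &:\;
  \bigl[\exp(\hat\lambda_{22} - 1.96\,\mathrm{SE}),\;
        \exp(\hat\lambda_{22} + 1.96\,\mathrm{SE})\bigr]
\end{aligned}
```

An interval lying wholly above 1 then indicates a positive association, and one lying wholly below 1 a negative association.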
5. Results
Table 2 contains the results of the log-linear model. In order to obtain the odds ratios, we transform the estimated coefficients by exponentiating them (i.e. raising e to the power of the coefficient)¹. The Odds column contains the odds ratio for each pair of leisure pursuits. Where a negative association was found (odds ratio less than 1), the odds against (the reciprocal of the odds ratio) were also calculated in order to compare the strength of the positive and negative associations. To make the table easier to read, it is arranged with the interaction term parameters in descending order of size.
Table 2. Full log-linear model results
Parameter / Estimate / Std. Error / Z / Sig. / Odds / Odds against
Constant / 4.889 / 0.068 / 71.451 / 0.000 / - / -
games / -1.806 / 0.123 / -14.720 / 0.000 / - / -
video_DVD / -0.335 / 0.091 / -3.693 / 0.000 / - / -
Cinema / -0.938 / 0.102 / -9.240 / 0.000 / - / -
Concert_gig / -2.287 / 0.133 / -17.204 / 0.000 / - / -
Gallery_museum / -2.221 / 0.137 / -16.227 / 0.000 / - / -
live_sport / -1.834 / 0.123 / -14.905 / 0.000 / - / -
sport_exercise / -1.505 / 0.112 / -13.399 / 0.000 / - / -
Theatre / -1.912 / 0.128 / -14.921 / 0.000 / - / -
live_sport * sport_exercise / 1.460 / 0.110 / 13.305 / 0.000 / 4.305* / -
Theatre * gallery_museum / 1.388 / 0.121 / 11.504 / 0.000 / 4.005* / -
Cinema * video_DVD / 1.050 / 0.107 / 9.832 / 0.000 / 2.859* / -
games * video_DVD / 1.020 / 0.120 / 8.485 / 0.000 / 2.772* / -
Theatre * concert_gig / 0.813 / 0.119 / 6.819 / 0.000 / 2.254 / -
Cinema * concert_gig / 0.757 / 0.119 / 6.370 / 0.000 / 2.133 / -
Concert_gig * gallery_museum / 0.728 / 0.122 / 5.957 / 0.000 / 2.070 / -
Cinema * theatre / 0.671 / 0.130 / 5.159 / 0.000 / 1.957 / -
Concert_gig * video_DVD / 0.546 / 0.122 / 4.485 / 0.000 / 1.726 / -
sport_exercise * gallery_museum / 0.485 / 0.124 / 3.902 / 0.000 / 1.624 / -
games * live_sport / 0.453 / 0.114 / 3.969 / 0.000 / 1.573 / -
games * sport_exercise / 0.389 / 0.109 / 3.563 / 0.000 / 1.475 / -
Cinema * sport_exercise / 0.388 / 0.112 / 3.466 / 0.001 / 1.474 / -
Cinema * gallery_museum / 0.379 / 0.133 / 2.844 / 0.004 / 1.461 / -
live_sport * concert_gig / 0.313 / 0.120 / 2.613 / 0.009 / 1.368 / -
Cinema * games / 0.268 / 0.113 / 2.381 / 0.017 / 1.308 / -
games * concert_gig / 0.202 / 0.113 / 1.783 / 0.075 / 1.223 / -
Theatre * sport_exercise / 0.191 / 0.122 / 1.557 / 0.119 / 1.210 / -
Concert_gig * sport_exercise / 0.189 / 0.113 / 1.667 / 0.095 / 1.208 / -
Cinema * live_sport / 0.106 / 0.121 / 0.877 / 0.381 / 1.112 / -
live_sport * video_DVD / 0.019 / 0.123 / 0.151 / 0.880 / 1.019 / -
sport_exercise * video_DVD / -0.006 / 0.114 / -0.053 / 0.958 / 0.994 / 1.006
Gallery_museum * video_DVD / -0.010 / 0.131 / -0.075 / 0.940 / 0.990 / 1.010
Theatre * live_sport / -0.116 / 0.133 / -0.875 / 0.382 / 0.890 / 1.123
games * gallery_museum / -0.158 / 0.129 / -1.229 / 0.219 / 0.854 / 1.171
live_sport * gallery_museum / -0.286 / 0.137 / -2.096 / 0.036 / 0.751 / 1.332
Theatre * video_DVD / -0.298 / 0.127 / -2.351 / 0.019 / 0.742 / 1.347
games * theatre / -0.393 / 0.127 / -3.092 / 0.002 / 0.675 / 1.481
* denotes odds ratio significantly greater than 2 (i.e. those with a lower bound of the 95% confidence interval > 2). See appendix for full results.
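The Odds, Odds against and starred entries in Table 2 can be reproduced from the Estimate and Std. Error columns. The sketch below does so for a handful of rows copied from the table. It assumes a normal-approximation 95% interval (estimate ± 1.96 standard errors on the log scale); the function name `summarise` is illustrative. Small differences in the third decimal place from the published odds can arise because the coefficients in the table are rounded.

```python
import math

# (parameter, estimate, standard error) copied from Table 2.
rows = [
    ("live_sport * sport_exercise", 1.460, 0.110),
    ("Theatre * gallery_museum", 1.388, 0.121),
    ("Cinema * video_DVD", 1.050, 0.107),
    ("games * video_DVD", 1.020, 0.120),
    ("Theatre * concert_gig", 0.813, 0.119),
    ("games * theatre", -0.393, 0.127),
]

Z_95 = 1.96  # assumed normal quantile for a 95% interval


def summarise(estimate, std_error):
    """Return the odds ratio and its 95% interval from a log-scale estimate."""
    odds = math.exp(estimate)
    lower = math.exp(estimate - Z_95 * std_error)
    upper = math.exp(estimate + Z_95 * std_error)
    return odds, lower, upper


starred = []
for name, estimate, std_error in rows:
    odds, lower, upper = summarise(estimate, std_error)
    if lower > 2:  # criterion used for the * in Table 2
        starred.append(name)
    # Odds against are reported only for negative associations.
    against = f"{1 / odds:.3f}" if odds < 1 else "-"
    print(f"{name:30s} odds={odds:.3f} "
          f"CI=({lower:.3f}, {upper:.3f}) against={against}")

print(starred)
```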