Measuring Outdoor: How Media Research Uses Traffic Research to Create a Ratings Currency
James Tobolski, Arbitron, Inc.
William McDonald, Arbitron, Inc.
Joshua Chasin, Warp Speed Marketing, Inc.
September 24, 2003
Section 1. Abstract
The field of traffic research is a high-stakes field populated by a school of serious, dedicated, educated, and passionate researchers. Traffic research is high stakes because, for example, a municipality might choose to undertake a $500 million bond offering to build a new road or bridge based on forecasts of traffic flow throughout a metropolitan or other geographic area.
But traffic research is vastly different from media research, at least as practiced in the US. Perhaps the most dramatic difference is the skew in media research toward primary data collection, whereas in traffic research a large body of empirical data already exists (government traffic counts), so much of the advanced work is done in development and deployment of statistical, predictive models.
The Out-of-Home ratings service prototyped by Arbitron in Atlanta represents a new paradigm in media research. Ultimately, of course, all ratings services derive their utility from the extent to which they provide users with insight about exposure to advertising (or, in practice, potential exposure, or “opportunity to see/hear”). But because the core consumer behavior to be measured in an Out-of-Home audience measurement service is traffic, as opposed to media consumption, the prototype methodology in Atlanta was developed by combining the knowledge bases of two wholly different disciplines: media research and traffic research.
This hybrid approach was necessary too because of the great challenges in adequately reporting on opportunity to see for Out-of-Home in the US, where the medium is so fragmented and granular that traditional approaches to audience measurement would fall short. Indeed, the paper draws a parallel to the Internet as another medium new to audience measurement, in which traditional, existing paradigms for audience measurement are currently being challenged.
The resulting system includes components that will be quite familiar to the US media researcher, and some that will be new and different. The primary difference is the extent to which the system relies on a data expansion algorithm, or statistical model, in order to fill all the cells (by demographic and inventory unit) in the market. As this paper will demonstrate, such an approach is not without precedent in the Out-of-Home media measurement arena, and it is the appropriate approach when the behavior to be projected is traffic behavior.
Perhaps the major finding presented in this paper is that primary respondent data collection in and of itself is insufficient as a means for creating Out-of-Home audience estimates. Primary data collection is a vital component in such a system; however, for Out-of-Home it must be supplemented by statistical modeling in order to create the level of granularity necessary for the buying and selling of advertising. This is a conclusion that several international media markets have already reached; it is a new finding within the context of US media measurement.
Finally, this paper will show how, by combining best practice media research with best practice traffic research, the system described herein is able to generate actionable estimates at an extremely granular level, for in excess of 98% of the 7,500+ inventory units in the Atlanta geography covered.
Section 2. The Challenge: When Traditional Audience Measurement Paradigms Fall Short
Out-of-Home is Like the Internet
In a recent article, Media Researcher Erwin Ephron noted that “Outdoor does not compete with TV, radio or print for the bulk of discretionary advertising dollars, because, for one thing, there is no good planning data for the medium.”[1] In the same article, in discussing potential solutions, Ephron goes on to note that: “The travel data can be either site-centric or consumer-centric. (Ironically these are the same measurement issues the Internet is debating.)”[2]
Dr. Joseph Philport, president of the Traffic Audit Bureau (TAB), has observed that the Daily Effective Circulation counts (DECs) provided by the TAB can co-exist in harmony with a US Out-of-Home ratings service. Dr. Philport observed that there was a place for both “site-centric” (the DEC) and “user-centric” (ratings) data.[3]
To those acquainted with the issues confronting Internet measurement, as Ephron notes, this paradigm is quite familiar. And this is appropriate, because like Internet measurement, Out-of-Home exposure measurement poses a daunting challenge to the traditional model of media audience measurement. In short, both media are so granular (the number of inventory units presenting an “opportunity to see” is so vast and fragmented) that traditional media research approaches, which rely on data collection among a random probability sample followed by data cleaning and editing and then weighting and projection to the universe, fall short.
The problem, quite simply, is that the appropriate sample size to provide measurement with sufficient statistical reliability is a function of the average audience size of the media vehicle being measured. On the Internet, there are thousands of web sites, millions of web pages, and innumerable opportunities to serve an exposure to a web visitor. Traditional media research approaches cannot possibly hope to provide sufficiently large sample sizes to measure the granularity of Internet behavior with the same robustness that a radio ratings, TV ratings, or print ratings service does. Consequently, Internet audience reportage has bifurcated into two approaches: site-centric, involving a census of visitors taken from traffic logs for individual sites (but which cannot be combined across sites to create duplication); and user-centric, or sample-based measures, which face sample size challenges but which can generate multi-site duplication patterns.
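The dependence of sample size on audience size can be illustrated with a rough sketch. It assumes simple random sampling and a binomial approximation to the standard error of a rating; neither assumption comes from this paper, and the function name, the example ratings, and the 25% relative-error target are hypothetical choices made only for illustration.

```python
import math


def required_sample_size(rating: float, relative_error: float) -> int:
    """Smallest simple-random-sample size for which the standard error of an
    estimated rating is at most `relative_error` times the rating itself."""
    if not 0 < rating < 1:
        raise ValueError("rating must be a proportion strictly between 0 and 1")
    if relative_error <= 0:
        raise ValueError("relative_error must be positive")
    # Binomial approximation: SE(p) = sqrt(p * (1 - p) / n).
    # Requiring SE(p) / p <= r gives n >= (1 - p) / (p * r**2).
    return math.ceil((1 - rating) / (rating * relative_error ** 2))


if __name__ == "__main__":
    # Hypothetical ratings, from a mass-audience vehicle down to a niche unit.
    for rating in (0.10, 0.01, 0.001):
        print(f"rating {rating:.1%}: n >= {required_sample_size(rating, 0.25)}")
```

Under these assumptions the required sample grows roughly in inverse proportion to the rating, so a vehicle with one hundredth of the audience needs on the order of a hundred times as many respondents for the same relative precision.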
The Internet: With respect to Internet measurement, one innovative approach to user-centric measurement has been developed by comScore. ComScore recruits respondents in a non-random fashion, enabling them to offer an Internet panel of roughly 1.5 million respondents, whose online behavior is tracked and used to generate reports. While some in the media research community may have trouble accepting a methodology that is not built around random probability sampling, comScore’s approach seems a clever way of addressing the research needs of the Internet by getting around practical and economic limits on sample size that deployment of a random probability sample would pose. While the comScore approach has not been vetted by the Media Ratings Council accreditation process, it has been subject to an ARF review.[4]
Out-of-Home: Out-of-Home is generally a local medium, requiring measurement on the local market level. Arbitron has identified over 7,500 pieces of Out-of-Home inventory in Atlanta, the first US market in which the company has deployed a test ratings service.[5] In Chicago, which is under consideration as the first Arbitron expansion market, the company has identified over 13,000 pieces of inventory.
Typically, ratings services require a cell in-tab of 30 or more respondents before reporting an estimate. With 13,000 pieces of inventory and 16 age/sex cells (18-24; 25-34; 35-44; 45-49; 50-54; 55-64; and 65+ for both males and females), this implies that an 18+ sample must be large enough to generate at least 6,240,000 inventory exposures, just among the sample! (16 cells X 30 minimum in-tab X 13,000 inventory units.) And this is a simplistic estimate; it fails to take into account the fact that the inventory on the most heavily-trafficked roads will account for a disproportionate number of the exposures (exposures to inventory in Atlanta, or any US market for that matter, are by no means distributed evenly across the inventory units).
Consequently, there are three important variables to consider in determining the appropriate sample size to measure a major US market, using the traditional media research paradigm:
- How many exposures or impressions can the researcher collect per respondent?
- How are an individual respondent’s exposures distributed across inventory? In other words, does a given respondent generate repeated exposures to the same set of inventory? The higher the intra-clustering effect among individual respondents, the more respondents would be required to cover a broader array of inventory units.
- How deep into the 13,000 pieces of inventory will it be necessary to report usable data, in order to have a viable service?
Under the traditional ratings service methodological approach, these three variables would yield some guidance with respect to the necessary sample sizes for meeting marketplace need.
Out-of-Home Ratings the Traditional Way: How Much is Enough?
Traditionally, media researchers consider minimum counts per survey respondent group cell as the driver in determining total sample size requirements. However, this approach for Out-of-Home would result in unreasonably (and unnecessarily) large sample sizes. (If the survey instrument were an electronic technology, such as GPS, these sample sizes would be cost-prohibitive.)
The Atlanta test provided some empirical data to help estimate how large a survey sample would have to be in order to report on the Out-of-Home medium at the level of granularity to which US media buyers and sellers are accustomed.
The following assumptions are made:
- As noted above: that reportage needs to be sufficiently discrete to create the demographic cell data that serves as building blocks for assembling broader target demographics; that is, 16 age/sex breaks.
- That the vast preponderance of identified inventory must be reportable with non-zero audience estimates at the demographic cell level. (There is a direct relationship between sample size and the percentage of inventory with reportable estimates, in a traditional media measurement construct.)
Key Finding: Respondent Exposures per week: Arbitron estimates that each of the 50 respondents in the Atlanta test who carried personal GPS units was exposed, on average, to approximately 1,500 discrete inventory units per week.[6]
Inventory exposure distributions by inventory type: How do the 1500 exposures distribute by inventory type? For this exercise we will assume four quartiles of inventory class based on the probability that an individual respondent will be exposed to any inventory in that class in a week. This step helps us to distribute the 1500 exposures by inventory type.
To illustrate the sample size calculation, we will walk through inventory class A in table 1 below:
Table 1: Calculation of Target Sample Size
7-Day GPS Instrument
Market Total Inventory Units: 13,000
A / B / C / D / E / F / G / H / I / J
Inventory Class / % of Inventory / Inventory Distribution / Probability of Class Exposure / Class Distribution / Min. Cell Size / # Cells / Col. F X Col. G / Col. C X Col. H / Col. I ÷ Col. E
A / 25% / 3,250 / 0.90 / 563 / 30 / 16 / 480 / 1,560,000 / 2,773
B / 25% / 3,250 / 0.70 / 438 / 30 / 16 / 480 / 1,560,000 / 3,566
C / 25% / 3,250 / 0.50 / 313 / 30 / 16 / 480 / 1,560,000 / 4,992
D / 25% / 3,250 / 0.30 / 188 / 30 / 16 / 480 / 1,560,000 / 8,320
Total / 100% / 13,000 / 2.40 / 1,500
- Column B: We are assuming an equal distribution of inventory by quartile—that is, Inventory class A is the most heavily-trafficked 25% of inventory; class B the second-most-heavily trafficked, and so on. Presumably class A would comprise the largest boards on the busiest roads.
- Column C: Given a market of 13,000 inventory units, 25%, or 3,250, will fall into each quartile.
- Column D: For each quartile, an estimated probability that a given respondent will be exposed to any inventory in that class in a week. Type A is the most heavily-trafficked; here we assume that there is a 90% probability; or, that 90% of the population will be exposed to at least one class A inventory unit in a week.
- Column E: Given a total weekly exposure of 1500 discrete units, we use the exposure probabilities in column D to distribute these 1500 respondent inventory units by quartile. The calculation here is (class probability) / (sum of class probabilities) X (total respondent inventory units per week). For class A: (.9) / (2.4) X 1500, or 563. Over a third of the inventory units to which the average respondent is exposed would fall into the highest quartile.
- Column F: Minimum in-tab cell size required for reporting. We are assuming a count of 30; conceivably the ratings service might require a larger cell in-tab for reporting estimates, driving the final required sample up.
- Column G: For each class, the 16 age/sex cells as above.
- Column H: The product of columns F and G; the bare minimum sample size required would be 480—30 per cell for each of 16 cells. If every respondent was exposed to every inventory unit, 480 would be an appropriate target sample. Obviously this is not a realistic scenario.
- Column I: Total required discrete respondent/exposures. The product of the 480 required respondents multiplied by the inventory units in the class. For class A, with 3,250 inventory units, we would require a total of 1,560,000 discrete respondent/inventory exposures. In other words, 1.56 million instances of one respondent being exposed to a discrete piece of inventory.
- Column J: Final required sample size. We divide the 1.56 million required discrete respondent/inventory exposures by the number of discrete exposures per respondent per week in the class. 563 of the discrete 1500 inventory units to which the average respondent is exposed fall into class A. Therefore, in order to report on all the class A inventory at the inventory/demographic cell level, a sample size of 2,773 respondent-weeks would be required.
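The column-by-column arithmetic of Table 1 can be reproduced with a short sketch. The inputs are the figures given in the table; the function and parameter names are hypothetical, and rounding column J to the nearest whole respondent-week is an assumption chosen to match the printed values.

```python
def table1_sample_sizes(total_units=13_000, weekly_exposures=1_500,
                        class_probs=None, min_cell=30, n_cells=16):
    """Return the column J sample size (respondent-weeks) for each inventory class."""
    if class_probs is None:
        # Column D: assumed probability of weekly exposure to each class.
        class_probs = {"A": 0.90, "B": 0.70, "C": 0.50, "D": 0.30}
    units_per_class = total_units / len(class_probs)   # column C (equal quartiles)
    prob_sum = sum(class_probs.values())                # total of column D
    min_sample = min_cell * n_cells                     # column H = F x G
    required_exposures = units_per_class * min_sample   # column I = C x H
    result = {}
    for name, prob in class_probs.items():
        class_exposures = prob / prob_sum * weekly_exposures      # column E
        result[name] = round(required_exposures / class_exposures)  # column J = I / E
    return result


if __name__ == "__main__":
    for name, size in table1_sample_sizes().items():
        print(f"class {name}: {size:,} respondent-weeks")
```

Because each class has the same number of units and the same cell requirement, column I is constant and the required sample is driven entirely by column E: the fewer weekly exposures a respondent contributes to a class, the more respondent-weeks are needed to cover it.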
Market Considerations
We have seen that in order to measure the 25% of inventory which receives the most discrete exposures, a sample would need to cover 2,773 person-weeks (i.e., 2,773 respondents providing a week of data each.) However, it is an easier task to report on the largest inventory units than to report on the lower-rated units. Arbitron market diligence in the US has led the company to conclude that a service would need to go beyond the top quartile in order to be commercially viable. Indeed the very largest units are often sold based on supply and demand; the industry’s need for ratings is largely driven by a need to develop schedule-level estimates for packages that include all types of inventory, not just the types with the most traffic.
In other words: from a user perspective, depth and granularity of data reported is an important research quality driver in Out-of-Home audience measurement.
To include the second quartile of inventory units, and so report on the 50% of inventory that is most heavily trafficked, would require a sample size of 3,566 person-weeks.
Expanding the service to cover all four quartiles of inventory would require a sample size of 8,320 person-weeks. This is over ten times as large as existing and proposed services based exclusively on 7-day GPS measurement.
The above illustration assumes equal distribution of the 16 age/sex cells within the sample. If the cells are not equally distributed in the sample (e.g., if males 18-24 represent only 5% of the total sample), then sample requirements would need to be somewhat larger.
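As a rough sketch of how much larger, one can assume that the total sample must be scaled up until the scarcest cell reaches the count it would have had under equal distribution. This scaling rule is an illustrative assumption, not a calculation from the Atlanta test, and the function and parameter names are hypothetical.

```python
import math


def inflate_for_smallest_cell(equal_cell_sample: int, n_cells: int,
                              smallest_cell_share: float) -> int:
    """Scale a sample sized for equal cells so the scarcest cell keeps its count.

    Assumption: only the smallest cell's share matters, and the required
    sample grows in proportion to (equal share) / (smallest share).
    """
    if not 0 < smallest_cell_share <= 1:
        raise ValueError("smallest_cell_share must be in (0, 1]")
    equal_share = 1 / n_cells  # 6.25% per cell when there are 16 cells
    if smallest_cell_share >= equal_share:
        return equal_cell_sample  # no cell is scarcer than the equal split
    return math.ceil(equal_cell_sample * equal_share / smallest_cell_share)


if __name__ == "__main__":
    # Four-quartile requirement from Table 1, with one cell at 5% of the sample.
    print(inflate_for_smallest_cell(8_320, 16, 0.05))
```

Under this assumption, a cell at 5% rather than 6.25% of the sample raises the four-quartile requirement by about a quarter, to roughly 10,400 person-weeks.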
Section 3. A Global Perspective
At this point it becomes reasonable to explore ways in which researchers in other countries have grappled with the challenges of Out-of-Home measurement. It is important to note that the media landscape varies greatly by nation, and the reader should bear this in mind. Different countries have different inventory types, sizes, tolerances for clutter, and legislation restricting placement of Out-of-Home advertising.
POSTAR:
Perhaps the best-known Out-of-Home audience measurement system is Great Britain’s POSTAR (a contraction standing for “Poster Audience Research”). POSTAR is a Joint Industry Committee that was charged with developing a way to address “an almost intractable set of problems”[7] in Out-of-Home measurement. POSTAR consists, in essence, of seven steps:
- Traffic counts: As many traffic counts as possible are obtained from the appropriate municipalities, and related to individual poster sites via neural networking. This process uses poster site characteristics to attribute counts to sites even where government data is not available.
- Pedestrian counts: Pedestrian counts from a sample of locations are taken manually. A similar neural networking approach as above is applied to this data to project pedestrian counts for all sites.
- Coverage calculation: A sample is taken measuring person-level travel behavior in great detail. Trips are mapped and related to individual poster sites. This enables the development of daily coverage for different OTS levels, and build of OTS levels over time.
- Dispersion factor: Also from the above survey, a factor was developed to determine the impact of geographic dispersion of campaign inventory on the exposure patterns of campaigns.
- Visibility Adjusted Impacts: A two-stage process, using eye tracking and time exposed to an inventory unit, to develop an adjustment to audience levels at the gross and campaign (but not the inventory-specific, apparently) level.
- Refinements: These include adjustments to accommodate specific user geographies (exposure of persons in geography A to a campaign in geography B), to account for hours of daylight based on seasonality; and, to account for illumination.
- Data access: Assemblage of all this data, factoring, and manipulation into a usable desktop system.[8]
Intriguingly, the POSTAR system does not rely on the traditional media research paradigm. Outside of the pedestrian count work and the national survey to develop factors, there are no “respondents”, per se. In what may be an unfair oversimplification of POSTAR, the methodology may well be characterized as the application of factors and modeling to existent municipal traffic counts.
ROAM:
ROAM (Research on Outdoor Audience Measurement) is a system developed to provide audience measurement for Out-of-Home inventory units in the five major markets in Australia.
Australia represents a unique situation, because the Australian government mandates that an annual sample of 40,000 persons complete one-day travel diaries. Participation in this survey is akin to jury duty in the US; if called, eventually you must serve. Consequently, these 40,000 travel diaries, placed for purposes of civic planning and engineering, enable the Australian media research community to piggyback onto the government effort by taking advantage of data from 40,000 respondents annually, already collected, tabulated, and paid for.
ROAM is an elegant system that takes the raw data from these travel diaries as input, and uses data about nature and demographics of origin and destination points as inputs into a sophisticated computer model that generates inventory-specific data with demographic detail at the inventory unit level. ROAM is provided to users with an advanced mapping system, enabling the user to “see” the performance of different campaigns against different demographics and geographies (for example, how does a campaign of inventory located in the business district work in reaching persons living in a certain suburban area?)[9]