The Convenience of Tobacco: Measuring and Examining Disparity in Tobacco Supply and Demand
By: Jennifer Hubenig
Masters Project
April 30, 2007
Advisor: Dr. Michael Tiefelsdorf
Table of Contents
i. Abstract
ii. Table of Figures, Equations and Tables
1. Introduction
2. Project Objective
3. Literature Review
4. Data Sources
5. Analysis (Methods)
5.1 Creating Supply and Demand Based Estimates
5.1.1 Supply Estimates
5.1.2 Demand Estimates
5.2 Calculating the Supply/Demand Discrepancy
5.3 Regression on Socioeconomic Variables
6. Results and Discussion
6.1 Estimation Results
6.2 Regression Results
7. Conclusion
7.1 Future Work
8. References
i. Abstract:
ii. Table of Figures, Equations and Tables:
Figures
- Illustration of a geometric intersection courtesy of ArcGIS® Desktop Help
- P(i) indicates the proportions of the tract (outlined in red) population assuming an equal population distribution.
- D(i) indicates the national average demand in each tract proportion assuming an equal population distribution.
- The purple area is a Thiessen polygon that includes all the estimated demand in each portion. A sum of this demand gives the total demand for the outlet this polygon is based on.
- S(i) indicates the estimated demand (from supply) in each tract proportion assuming an equal population distribution.
- The blue area is a census tract that includes all the supply-estimated demand from a sum of the supply-estimated demand in each tract portion S(i).
- Scatterplot of Log of Supply-Estimated Demand versus Population-estimated Demand
- R Histogram of the Dependent Variable (left) and Population density (right)
- Distribution of the ratio of population-estimated demand subtracted by national demand
- The ratio map of supply-based/population-based demand estimation
- The ratio map of population-based demand/ population
- The ratio map of supply-based demand/ population
- Scatterplot of the studentized regression residuals versus the total population
- Geoda Significance Map of the regression model residuals.
- Geoda LISA Cluster Map of the regression model residuals.
- Map of the regression model residuals.
Equations
- Calculation of Supply-based Demand
- Calculation of Population-based Demand
- Calculation of Supply-based Demand versus Population-based Demand Ratio
Tables
- CDC Population At-Risk Percentages grouped by Gender, Race, Age, and Income
- Geocoding Percentages with Maptitude and ArcGIS
- Cleanup of Geocoding and Difference of Addition of Outlets “outside” the study area
- Example calculation of population estimated demand in each portion. A sum of this demand gives the total demand for the outlet this polygon is based on.
- R-Output for the Final Regression Model
The Convenience of Tobacco: Measuring and Examining Disparity in Tobacco Supply and Demand
1. Introduction
In 2002 the Centers for Disease Control and Prevention (CDC) declared “Cigarette smoking is the leading preventable cause of death in the United States”[1] making it “responsible for about one in five deaths annually, or about 438,000 deaths per year.”[2] However, cigarette smoking costs not only human lives; it also has a financial cost. According to the CDC, $167 billion is spent each year on health care costs and lost productivity.[2] These deaths and expenditures are not equally distributed among the population; they vary markedly with age, gender, ethnicity, income and geographic region (Schneider et al. 2005). Studies that dissect this distribution of tobacco use are useful for policy implementation since they help to target campaigns and programs designed to decrease the harmful effects of smoking. Policy options include restrictions on advertising, education on the adverse effects of smoking, and restriction of access to tobacco through zoning and/or enforcement of penalties for underage sales. Previous research on the discrepancies between tobacco supply (quantity of tobacco sold and/or the density of tobacco outlets) and demand (estimated number of smokers and smoking intensities) provides a more inclusive portrait of which geographic areas are underserved (a high number of people served by each outlet) or overserved (a low number of people served by each outlet) with regard to the number of tobacco outlets. Explanations for a high number of people per outlet (underserved) include the rapid growth of an area (a time lag between the arrival of the population and of retail outlets), or below-average consumption per customer, which leads outlets to locate in higher-consumption areas instead.
A few explanations for a low number of people per outlet (overserved) include the possible targeting of certain groups by tobacco companies, high individual consumption of tobacco products, and recent emigration from an area, which leaves the excess outlets as a historical artifact.
Tobacco use is ubiquitous; it is not exclusive to urban, rural, Hispanic, White, poor, rich, young or old populations. Therefore, I conducted my research in the Dallas-Fort Worth Metropolitan Statistical Area (MSA)[3] at the census tract level. I selected this area because it includes urban, suburban and rural characteristics, contains a representative proportion of the two largest minority groups in the United States (African American 13.2% and Hispanic 15.6%), and has relatively similar proportions of the population in all income categories used.[4] Previous studies selected only predominantly urban (Hyland et al. 2003) or rural settings (Peterson et al. 2005). Incorporating both settings in my research allows for a comparison across the study area and adds an extra dimension to the study by examining the possible disparity between urban and rural effects.
While my research is not intended to provide a causal explanation of the supply and demand discrepancy, it will contribute to the body of literature concerning the well-established relationship between tobacco consumption and tobacco outlet density by establishing a new technique to explore this relationship. My research will differ in the estimation process of outlet density by estimating the number of smokers in a census tract from its population and comparing it with the number of smokers estimated from outlet density. Other literature focuses solely on the supply side: it estimates tobacco outlet density by taking the number of outlets, dividing it by the kilometers of road network in the study area, and regressing the outlet density on selected socioeconomic variables (Schneider et al. 2005, Hyland et al. 2003, Peterson et al. 2005). The process I am using will calculate an average estimated number of smokers[5] (supply) that tobacco outlets serve per census tract. To estimate the supply from tobacco outlet density, I will use Thiessen polygons, rather than kilometers of road network, to demarcate the market area of each tobacco outlet. This will ensure that the supply estimate is not truncated at administrative boundaries (i.e. census tract boundaries), and, hence, the size of each Thiessen polygon will be indicative of local outlet density (smaller polygons indicate many outlets in the immediate area and larger polygons indicate fewer).
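The link between polygon size and local outlet density can be illustrated with a minimal sketch. It is not the ArcGIS workflow used in this project; it only approximates Thiessen polygon areas by assigning each cell of a regular grid to its nearest outlet. The outlet coordinates, the study-area dimensions and the function names are hypothetical and chosen for illustration only.

```python
import math

def nearest_outlet(point, outlets):
    """Return the index of the outlet closest to point (Euclidean distance)."""
    return min(range(len(outlets)), key=lambda i: math.dist(point, outlets[i]))

def approximate_thiessen_areas(outlets, width, height, cell=1.0):
    """Approximate each outlet's Thiessen polygon area inside a
    width x height rectangle by assigning the center of every grid
    cell to the nearest outlet and summing the cell areas."""
    areas = [0.0] * len(outlets)
    nx, ny = int(width / cell), int(height / cell)
    for ix in range(nx):
        for iy in range(ny):
            center = ((ix + 0.5) * cell, (iy + 0.5) * cell)
            areas[nearest_outlet(center, outlets)] += cell * cell
    return areas

# Hypothetical outlets (coordinates in km) in a 10 km x 10 km area:
# two clustered outlets and one isolated outlet.
outlets = [(1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
areas = approximate_thiessen_areas(outlets, 10.0, 10.0, cell=0.5)
print(areas)
```

In this toy layout the isolated outlet receives the largest market area, while the two clustered outlets split a much smaller region between them.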
A notable difference between my study and the previously established literature is that I am calculating a discrepancy between population-estimated demand and outlet-based supply, whereas the other studies concentrate on the density of the tobacco outlets (supply) only. Tobacco use is deemed an important topic in the public health literature and by the general public, but its spatial dimension is largely neglected. Alcohol use, which has many parallels to tobacco use (Caetano and Clark 1998, Hyland et al. 2003 and Peterson et al. 2005), has been studied extensively from a geographic perspective, and the concept that certain groups in the general population have a higher propensity towards tobacco consumption is well established in the statistical literature. Together these should prompt more geographic research than has been conducted to date (Schneider et al. 2005, p 2). The implications of such geographic research include the policy implementations mentioned earlier, which would have greater success from targeted campaigns as opposed to general ones: well-designed geographic research can help narrow the scope of these programs and campaigns by identifying regions that have both an unwarranted high outlet density and a high estimated proportion of smokers.
In previous literature, African American, Latino (Hyland et al. 2003, Peterson et al. 2005), and low income (Schneider et al. 2005) groups have been characterized by high densities of tobacco outlets. While these are valid categorizations for the Dallas-Fort Worth MSA (African American and Hispanic are the largest minority groups in this region), I would like to extend this hypothesis by testing age groups and gender as well as ethnic groups and income. I expect that other socioeconomic variables also contribute to the explanation of the discrepancy between demand and supply.
2. Project Objective
My research will address the question of which socioeconomic characteristics among age, race, gender and income contribute to the discrepancy between the population-estimated number of smokers and the tobacco outlet (supply) estimated number of smokers served. My research will also describe and test a different way to quantify this discrepancy rather than measuring purely tobacco outlet density.
3. Literature Review
Considering the emphasis that public health and the media place on controlling tobacco use and reducing disparities, there has been little geographic research on estimating tobacco outlet density. A study by Laws et al. in 2002 on 10 neighborhoods in Boston, Massachusetts focused on the percentage of retail outlets that sold tobacco. This study addressed the subject of tobacco outlet density, but it had no geographic component. The earliest piece of literature to research the subject geographically (and one that is highly referenced in subsequent literature) is Dr. Andrew Hyland’s article “Tobacco Outlet Density and Demographics in Erie County, New York” (2003). Hyland’s article was ground-breaking in tobacco research since many studies of a geographic nature had been published for alcohol outlets but not for tobacco outlets. Consumer habits and retail patterns have been noted to be similar between tobacco and alcohol outlets, which provides a compelling reason to study tobacco and to apply similar established geographic hypotheses and methods. Hyland’s study covered 1019 licensed tobacco outlets in the primarily urban areas of the city of Buffalo in Erie County, New York in 1996. In his research, Hyland used Geographic Information Systems (GIS) primarily to locate the tobacco outlets and to divide the number of outlets by each 10 kilometer section of roadway in each census tract. Although many parallels in consumer behavior have been drawn between alcohol and cigarettes (many outlets sell both), a study on tobacco with a geographic component that used GIS software to do more than locate outlets and measure kilometers of roadway had yet to be conducted.
John E. Schneider’s article, “Tobacco Outlet Density and Demographics at the Tract Level of Analysis in Iowa: Implications for Environmentally Based Prevention Initiatives” (2005), evaluates at the tract level the “geographic association between tobacco outlet density and three demographic correlates – income, race, and ethnicity” (Schneider, 1) in Polk County, Iowa. His study primarily focuses on explaining the density of tobacco supply by testing the correlation of density with income, race and ethnicity. While referencing Hyland’s work as the basis for his research, Schneider does mention some differences between his study and Hyland’s. In particular, he references the use of 2003 outlet data and 2000 census data, as opposed to 1996 outlet data and 1990 census data. He also notes that, while the Hyland study focused on the percentage of African Americans and median income as socioeconomic variables, he added the percentage of Latino (Hispanic) people to these variables. However, Schneider, like Hyland, used GIS primarily to locate the tobacco outlets and to divide the number of outlets by each 10 kilometer section of roadway in the census tract.
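The roadway-based density measure described above reduces to a single division. The sketch below is only an illustration of that arithmetic; the function name and the input values are hypothetical and are not taken from Hyland or Schneider.

```python
def outlets_per_10_km(n_outlets, road_km):
    """Tobacco outlet density as outlets per 10 km of roadway in a tract."""
    if road_km <= 0:
        raise ValueError("road_km must be positive")
    return 10.0 * n_outlets / road_km

# Hypothetical tract: 12 outlets along 48 km of roadway.
density = outlets_per_10_km(12, 48.0)
print(density)  # 2.5
```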
N. Andrew Peterson’s article “Tobacco Outlet Density, Cigarette Smoking Prevalence and Demographics at the County Level of Analysis” (2005) is informative in some of the methods employed, but it is less applicable to my research because of the geographic unit used. Like Schneider’s, Peterson’s study takes place in Iowa, but his analysis is conducted at the county level for all 99 counties in Iowa. Peterson, like Schneider and Hyland, calculates tobacco outlet density as the number of outlets per 50 kilometers of roadway. Peterson, like Hyland, also alludes to the parallels between geographic analyses involving alcohol outlets and tobacco outlets (Peterson, 1631) and notes that the success of studies involving alcohol outlets supports research such as mine.
The main differences between the research of Schneider, Hyland and Peterson and my own are the geographic area of the data, the currency of the data, the method by which tobacco outlet density is measured, and the socioeconomic variables studied. In terms of the geographic area studied, Schneider’s study area was primarily rural in Polk County, Iowa, Peterson’s was all of Iowa (mostly rural), and Hyland’s was primarily urban (Buffalo, New York). My research, on the other hand, studies the Dallas-Fort Worth MSA, which encompasses both urban and rural settings. On the topic of data currency, Schneider uses 2003 tobacco license data and 2000 census data, Peterson uses 2002 tobacco license data and 2000 census data, and Hyland uses 1996 tobacco license data and 1990 census data, whereas my study uses 2006 license data (collected from April 2004 to April 2006), a 2005 road network, and 2000 census data. The currency of the data is important since many counties (Collin County is a good example) are expanding rapidly from one year to the next; the more current the data, the more closely the research reflects the actual state of the county. In addition, Schneider, Peterson, and Hyland estimate tobacco outlet density as the number of outlets in a census tract divided by the kilometers of roadway in that tract. While this is accepted as a reliable measure and is used in several studies (Schneider 2005, Hyland 2003, and Peterson 2005 among others), the area of influence of a tobacco outlet is not truncated by census tract boundaries, especially if the outlet lies close to the border. A better approach is to use Thiessen polygons[6] to demarcate the areas of influence. A summation of the proportions of those polygons that lie within each census tract gives a more accurate approximation of how many outlets (or portions thereof) serve the population of that census tract.
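The summation of polygon proportions can be sketched as follows. To keep the example self-contained, axis-aligned rectangles stand in for both the census tract and the Thiessen polygons; this simplification, the coordinates and the function names are assumptions for illustration, and real polygons would require a GIS intersection instead.

```python
def rect_area(r):
    """Area of an axis-aligned rectangle (xmin, ymin, xmax, ymax);
    zero if the rectangle is empty."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

def intersection(a, b):
    """Overlap of two rectangles (may be empty, i.e. have zero area)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

def outlets_serving_tract(tract, market_areas):
    """Sum, over all market areas, the share of each market area that
    lies inside the tract. Each market area belongs to one outlet, so
    the result counts outlets (or portions thereof) serving the tract."""
    return sum(rect_area(intersection(tract, m)) / rect_area(m)
               for m in market_areas)

# Hypothetical tract and three market areas: one fully inside the
# tract, one half inside, and one entirely outside.
tract = (0.0, 0.0, 4.0, 4.0)
market_areas = [(0.0, 0.0, 2.0, 4.0), (2.0, 0.0, 6.0, 4.0), (5.0, 5.0, 7.0, 7.0)]
served = outlets_serving_tract(tract, market_areas)
print(served)  # 1.5
```

An outlet whose market area straddles the tract boundary contributes only the fraction that falls inside the tract, which is the behavior a per-tract outlet count cannot capture.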
Another major difference is that my research intends to estimate demand instead of tobacco outlet density. From this estimate of demand I can test whether the proportion of the population estimated from the density of tobacco outlets is similar to the overall demand predicted from the total population or, in fact, vastly different. Finally, the socioeconomic variables used to explain the density of tobacco outlets in the above studies are race and income; I intend to extend the scope of explanation by adding age and gender.
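The comparison of the two estimates can be expressed as a ratio per census tract. The sketch below assumes the ratio is taken as supply-based demand divided by population-based demand, following the figure titles listed earlier; the function name and the example values are hypothetical.

```python
def demand_ratio(supply_based, population_based):
    """Ratio of supply-estimated demand to population-estimated demand.
    Returns None when the population-based estimate is zero, since the
    ratio is undefined for an empty tract."""
    if population_based == 0:
        return None
    return supply_based / population_based

# Hypothetical tract: outlets imply 300 smokers served, while the
# population-based estimate is 600 smokers.
ratio = demand_ratio(300.0, 600.0)
print(ratio)  # 0.5
```

Under this assumed definition, a ratio near 1 means the two estimates agree, and values far from 1 flag tracts where supply and estimated demand diverge.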
Dr. Scott P. Novak’s article, “Retail Tobacco Outlet Density and Youth Cigarette Smoking: A Propensity-Modeling Approach” (2006), controls for a diverse range of neighborhood characteristics and examines whether tobacco outlet density is related to youth smoking habits (Novak 670). Novak’s study was primarily individual-based, but it differed in the control methods used to improve causal explanations. Novak had “trained raters drive at 5 mph down every street within the selected census tracts. Each side of the block was videotaped, and observer logs were coded to gather information on land use, physical conditions, and patterns of social interaction” (Novak 671). In addition to this interactive neighborhood information collection, the “trained raters” also added codes for any retail location licensed to sell tobacco, with specific emphasis on liquor stores, gas stations, bars, supermarkets, and convenience stores. To calculate outlet density, Novak took the total number of block faces with at least one retail outlet and divided it by the total number of block faces in the tract. Even though this calculation does not divide the number of outlets by kilometers of road network, it still does not accurately portray the areas of influence of stores near tract boundaries. As my project suggests, these areas of influence should be factored into the analysis of density.
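Novak’s block-face measure, as described above, can be written as a short calculation. The function name and the counts below are hypothetical and serve only to illustrate the arithmetic.

```python
def block_face_density(outlets_per_block_face):
    """Share of block faces in a tract that contain at least one
    tobacco retail outlet."""
    if not outlets_per_block_face:
        raise ValueError("tract has no block faces")
    faces_with_outlet = sum(1 for n in outlets_per_block_face if n >= 1)
    return faces_with_outlet / len(outlets_per_block_face)

# Hypothetical tract with eight block faces; two of them have outlets.
density = block_face_density([0, 2, 0, 1, 0, 0, 0, 0])
print(density)  # 0.25
```

Note that a block face with two outlets counts the same as a block face with one, and an outlet just across the tract boundary does not count at all.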
Ying-Chih Chuang’s “Effects of Neighborhood Socioeconomic Status and Convenience Store Concentration on Individual-Level Smoking” (2006) assesses the effects of socioeconomic status[7] at the neighborhood level, and, echoing Novak’s study, Chuang also evaluates the effects of convenience store density on individual levels of smoking (Chuang, 568). One of the most prominent themes arising from the aforementioned literature, and from Chuang’s paper in particular, is the positive correlation between high smoking rates and low socioeconomic status (SES). Chuang hypothesizes that people with low SES in particular “may be more vulnerable to disadvantaged environments as they may be less knowledgeable about the harmful effects of smoking, have fewer resources to stop smoking, and experience more stressors in their daily lives than high SES people” (Chuang 568). The similarity between Chuang’s paper and the rest of the literature reviewed lies solely in the hypothesized effect of SES on smoking status.
All of these articles point to a common conclusion: smoking rates are higher in tracts with a higher density of tobacco outlets (however calculated). My research endeavors to explain this relationship by estimating the proportion of the population that actively smokes[8] and comparing this proportion to the one that would arise based on the distribution of the tobacco outlets. A secondary, and sometimes implicit, conclusion is that this density is associated with low SES. I will assess this hypothesis myself by adding age and gender to the previously established variables of race and income.
4. Data Sources
The socioeconomic variables used to estimate population-based demand are obtained from the United States Census Bureau website, from the decennial 2000 census. The variables used are “Race by Sex by Age” (P145A-H) and “Sex by Earnings in 1999 for the Population 16+ Years with Earnings” (P84) at the census tract level (it is assumed throughout this study that the areal composition of the census tracts in the study area has not changed substantially between 2000 and 2006). The CDC’s Behavioral Risk Factor Surveillance System (BRFSS) provides, for various socioeconomic and demographic groups, the percentage of each group that is at risk of smoking. The groups of interest for this research are classified by gender, race, age, and income level. These groups are broken up as shown in Table 1 with corresponding population-at-risk percentages.