Instructions for a Student Project on an Empirical Analysis of Cigarette Smoking

(Mainly for Wabash students, though this can be adapted for other institutions. Delete this before distributing. These instructions are based on using the JMP files. The CigProjectwithPriceData.xls workbook has instructions and links to NBER web sites for cigarette data, including SAS and Stata programs for the data.)

The files in this folder are designed to enable a student to do an empirical analysis of various questions related to smoking behavior. Since the price elasticity of demand for cigarettes is of particular interest, we provide the material needed to estimate this elasticity.

A codebook (cpsmay99.pdf) is available.

Three data sets are available:

  • CigPrice.xls is an Excel workbook that contains average cigarette price data by state.

The other two data sets contain the same data, but in different formats. You will have to use the codebook to whittle the data set down to a manageable size.

  • CPSMay99.zip is a compressed archive that contains a single JMP file of the CPS May 1999 Smoking Supplement. Although a slim 18.1MB compressed, this file will balloon to 450MB when unzipped. You must have enough space for the file.
  • CPSMay99.dta is a Stata data set which contains all the data. You may not know how to use Stata, but your instructors can help you. If you are a Wabash student, we stand ready to give you a subset of the data in an Excel file containing only the variables you specify. (This data set is too large to transmit over the web. To obtain the data, go to the NBER web page for “Reading Current Population Survey (CPS) Data with SAS, SPSS, or Stata”, and obtain the .do and .dct files for reading in the data as well as the ASCII version of the May 1999 CPS Smoking Supplement, available at the NBER CPS Supplements page.)

Incorporating Price Information:

If the student wishes to include state average cigarette prices in the analysis, the two data sets must be merged. The CPS has a Census code for the state in which each individual resides. The student must figure out these codes and perform the merge. We recommend building an Excel lookup table with three columns (state abbreviation, state numeric code, state average price) and using the VLOOKUP function to assign a price to each observation based on that observation’s state code. For an example of how to use VLOOKUP, see CPSRecode.xls (in the Basic Tools\InternetData\CPS folder).
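For readers working outside Excel, the same lookup logic can be sketched in Python. This is only an illustration of the idea, not the procedure used in CPSRecode.xls: the field names (person_id, state_code, price) are hypothetical, and the codes and prices are made-up placeholders that must be replaced with the real values from the codebook and CigPrice.xls.

```python
# Hypothetical lookup table: state numeric code -> (abbreviation, average price).
# Codes and prices here are placeholders for illustration only; take the real
# values from the codebook and CigPrice.xls.
price_table = {
    1: ("AA", 2.00),
    2: ("BB", 2.50),
}

# Hypothetical CPS observations, each carrying a state code.
observations = [
    {"person_id": 101, "state_code": 1},
    {"person_id": 102, "state_code": 2},
    {"person_id": 103, "state_code": 9},  # code missing from the lookup table
]


def attach_price(rows, table):
    """Return copies of rows with an added 'price' field (None if no match)."""
    merged = []
    for row in rows:
        match = table.get(row["state_code"])
        new_row = dict(row)
        new_row["price"] = match[1] if match is not None else None
        merged.append(new_row)
    return merged


merged = attach_price(observations, price_table)

# Observations whose state code was not found; inspect these before analysis.
unmatched = [r["person_id"] for r in merged if r["price"] is None]
```

Listing the unmatched observations plays the same role as scanning for #N/A results after an exact-match VLOOKUP: it flags state codes that are missing from, or mistyped in, the lookup table.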

Before merging, if JMP is being used, we suggest cutting down the file size to speed up analyses and lessen memory issues.

JMP Suggestions:

To make the JMP file more manageable, we recommend deleting unneeded variables and missing observations. An effective (but by no means the only) way to do this is to use the Tables: Subset command. The idea is to highlight the rows or columns that you want to keep, then execute Tables: Subset and a new data table is created.

For example, since smoking is the key concept in this data set, you may wish to drop all observations where it’s unknown whether or not the person is a smoker. To do so, you need to highlight all of the rows that have data on smoking behavior.

One way to do this is to create a new variable, based on the Smoker recode variable. We created a new column, named it “In Smoker Universe,” and inserted the following formula:

This formula produces a “1” if Smoker recode is greater than 0, and a “0” if Smoker recode is less than or equal to 0. In the code book, Smoker recode is less than 0 if it’s not possible to determine whether the person is a smoker or if the smoking questions weren’t asked for this person (“not in the universe”). We then created a histogram of this variable by using Analyze:Distribution. We clicked on the bar for “1” and had the following display:

Note that rows with a value of 1 for “In Smoker Universe” are selected.
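The rule behind the “In Smoker Universe” column, as described above, can be written out as a short Python sketch. This is not the JMP formula itself; the function name and the sample recode values are hypothetical and only illustrate the rule.

```python
def in_smoker_universe(smoker_recode):
    """Return 1 if Smoker recode is greater than 0, otherwise 0."""
    return 1 if smoker_recode > 0 else 0


# Hypothetical recode values. Values at or below 0 stand in for cases with
# no usable smoking information (for example, "not in the universe").
recodes = [-1, 1, 2, 0, 3, -1]

flags = [in_smoker_universe(value) for value in recodes]

# Number of observations that would be selected by clicking the "1" bar.
selected_count = sum(flags)
```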

Now that the observations with information on smoking behavior are highlighted, we executed Tables: Subset and chose the following options:

This produces a new data table with a little more than half as many observations as before. Notice that the Subset dialog box allows you to produce a smaller data set by drawing a random sample from the original data set.
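The two uses of subsetting mentioned here, keeping only the selected rows and drawing a random sample, can be sketched in Python as follows. The rows, field names, seed, and sample size are hypothetical; this mirrors the idea of the Subset dialog rather than reproducing JMP's behavior.

```python
import random

# Hypothetical observations; smoker_recode values are placeholders.
rows = [
    {"id": 1, "smoker_recode": -1},
    {"id": 2, "smoker_recode": 1},
    {"id": 3, "smoker_recode": 2},
    {"id": 4, "smoker_recode": 0},
    {"id": 5, "smoker_recode": 3},
    {"id": 6, "smoker_recode": -1},
]

# Keep only observations with information on smoking behavior.
kept = [row for row in rows if row["smoker_recode"] > 0]

# Draw a random sample without replacement from the kept rows.
# A fixed seed (an arbitrary choice here) makes the sample reproducible.
rng = random.Random(1999)
sample = rng.sample(kept, k=2)
```

Recording the seed alongside the other subsetting choices makes it possible to regenerate exactly the same smaller data set later.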

The same Tables: Subset procedure can be used to select variables that will be used in your analysis. Click on the column containing the variable name, and hold down the CTRL key as you click on non-contiguous columns that you want to keep. When finished, execute Tables: Subset and you will have a much smaller, more manageable subset of the original, complete data set.
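Selecting variables works the same way in spirit: list the columns to keep and drop the rest. A minimal Python sketch, with hypothetical column names standing in for the variables you pick from the codebook:

```python
# Hypothetical list of variables chosen for the analysis.
keep_columns = ["smoker_recode", "state_code", "age"]

# Hypothetical rows that carry extra variables not needed for the analysis.
rows = [
    {"smoker_recode": 1, "state_code": 1, "age": 30, "unused_a": 0, "unused_b": 5},
    {"smoker_recode": 2, "state_code": 2, "age": 45, "unused_a": 1, "unused_b": 7},
]

# Build a smaller table that contains only the chosen columns, in that order.
subset = [{name: row[name] for name in keep_columns} for row in rows]
```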

Document your work in case you need to reproduce the smaller dataset and so others can see what you have done.
