Notes on Data Files at the JSE Data Archive

04cars.dat
04cars.txt
NAME: 2004 New Car and Truck Data
TYPE: Sample
SIZE: 428 observations, 19 variables

The first variable is text and contains embedded blanks, so you need to read it with column input. Don’t forget the $ to indicate that it is a character (alphanumeric) variable. The remaining variables are delimited with blanks, so list input can be used for them. Asterisks are used to code missing values, so you must replace them with dots before bringing the data into SAS.

The code to input all of the variables is shown here:

/* See http://www.amstat.org/publications/jse/datasets/04cars.txt.

Data from http://www.amstat.org/publications/jse/datasets/04cars.dat -- this file used * as

the missing value code, but SAS uses . -- accordingly, I used Word to find and replace

every asterisk with a dot. */

options pageno=min nodate formdlim='-';

*Produce YesNo format;

proc format; value YN 0='No' 1='Yes';

data cars; infile 'C:\D\Downloads\04cars.dat';

*Notice use of column input for Vehicle_Name, whose values contain embedded blanks;

*Blanks were provided as delimiters, so I could use list input for the remaining variables;

input Vehicle_Name $ 1-45 Sports_Car Sport_Utility_Vehicle Wagon Minivan

Pickup All_Wheel Rear_Wheel Suggested_Retail_Price Dealer_Cost Engine_Size

Cylinders Horsepower City_MPG Highway_MPG Weight Wheel_Base Length Width;

*Apply YN format;

Format Sports_Car -- Rear_Wheel YN. ;

proc means; var Sports_Car -- Width; run;

*Compare sports cars with others on price and engine size;

proc sort; by Sports_Car;

Proc means; var Suggested_Retail_Price Engine_Size; by Sports_Car; run; quit;
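
If you would rather not edit the data file by hand, the asterisks can also be translated inside the DATA step. The following is only a sketch, not the method used above. It assumes the original, unedited 04cars.dat sits at the same path used in the code above, and the data set name cars2 is made up for illustration. The first INPUT statement loads the record and holds it, TRANSLATE swaps every asterisk in the input buffer (_INFILE_) for a dot, and the second INPUT statement reads the variables from the modified buffer.

```sas
data cars2; infile 'C:\D\Downloads\04cars.dat';
*Load the current record into the input buffer and hold it there;
input @;
*Replace every asterisk in the buffer with a dot, the SAS missing value code;
_infile_ = translate(_infile_, '.', '*');
*Go back to column 1 and read the variables from the modified buffer;
input @1 Vehicle_Name $ 1-45 Sports_Car Sport_Utility_Vehicle Wagon Minivan
Pickup All_Wheel Rear_Wheel Suggested_Retail_Price Dealer_Cost Engine_Size
Cylinders Horsepower City_MPG Highway_MPG Weight Wheel_Base Length Width;
run;
```

Note that this changes every asterisk on the line, so an asterisk inside a vehicle name, if there were one, would be changed too.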

93cars.dat (the basic data file)
93cars.txt (the documentation file)
NAME: 1993 New Car Data
TYPE: Sample
SIZE: 93 observations, 26 variables

Asterisks are used to code missing values, so you must replace them with dots before bringing the data into SAS. In the code below I used column input, but list input should work too. Because there are two lines of data for each car, you must tell SAS when to skip to the second line; see the “#2” in the code below.
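
To see the line pointer at work on its own, here is a toy sketch with two made-up cases typed in after a DATALINES statement. The data set name demo, the variable names, the column positions, and the values are all hypothetical and are not taken from 93cars.dat. Each case occupies two lines, and #2 moves the pointer to the second line before the last two variables are read.

```sas
data demo;
*Line 1 of each case holds Name and Price. #2 jumps to line 2 for Weight and Domestic;
input Name $ 1-8 Price 10-13
#2 Weight 1-4 Domestic 6;
datalines;
Alpha    12.5
2500 1
Beta     30.1
3100 0
;
proc print data=demo; run;
```

The printout should show two observations, not four, because SAS consumes both lines of a case on each pass through the DATA step.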

options pageno=min nodate formdlim='-';

title 'Column input, two lines of data per case';

title2 'See http://www.amstat.org/publications/jse/datasets/93cars.txt';

title3 'and http://www.amstat.org/publications/jse/datasets/93cars.dat';

*Produce formats for categorical variables that are coded numerically;

proc format; value DT 1='Rear Wheel' 2='Front Wheel' 3='All Wheel';

value YN 0='No' 1='Yes';

data cars; infile 'C:\D\Downloads\93cars.dat';

*Column input. Variable_Name Columns Variable_Name Columns, etc. $ indicates alphanumeric variable;

*Notice use of # to point to second line for each case;

input Manufacturer $ 1-14 Model $ 15-29 Type $ 30-36 Minimum_Price 38-41 Midrange_Price 43-46 Maximum_Price 48-51 City_MPG 53-54 Highway_MPG 56-57

Air_Bags 59 Drive_Train 61 Cylinders 63 Engine_Liters 65-67 Horsepower 69-71

RPMinute 73-76

#2

RPMile 1-4 Manual_Transmission 6 Fuel_Capacity 8-11 Passengers 13

Length 15-17 Wheelbase 19-21 Width 23-24 U_turn 26-27 Rear_Seat 29-32

Luggage_Capacity 34-35 Weight 37-40 Domestic 42;

*Create variable Sporty from variable Type;

If Type = 'Sporty' then Sporty = 1; Else Sporty = 0;

*Apply formats;

Format Drive_Train DT. Sporty YN. ;

proc means; var Minimum_Price -- Domestic; run;

*Compare Sporty cars with others on price and engine size;

proc sort; by Sporty;

Proc means; var Midrange_Price Engine_Liters; by Sporty;

*What kind of car has a three-cylinder engine?;

data three; set cars; if Cylinders = 3;

proc print; var Manufacturer Model Type; run; quit;

babyboom.dat
babyboom.txt
NAME: Time of Birth, Sex, and Birth Weight of 44 Babies
TYPE: Observational
SIZE: 44 observations, 4 variables

Can you guess the nature of the relationship between “Time of birth recorded on the 24-hour clock” and “Number of minutes after midnight of each birth”?

options pageno=min nodate formdlim='-';

*Produce Sex format;

proc format; value sx 2='Boy' 1='Girl';

data babyboom; infile 'C:\D\StatData\JSE\BabyBoom.dat';

input BirthTime24 Sex BirthWeight Minutes_Midnight;

*Apply sx format;

Format Sex sx. ;

Proc Corr; Var BirthTime24 -- Minutes_Midnight; run;
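
To check your guess about the two time variables, you could compute minutes after midnight from the clock time yourself. This small sketch uses made-up clock times, not the baby data. It assumes the clock time is stored as a number of the form hhmm, so that 1435 means 2:35 in the afternoon. The names clock and Computed_Minutes are hypothetical.

```sas
data clock;
input BirthTime24;
*Hours are the digits above the last two. Minutes are the last two digits;
Computed_Minutes = 60*int(BirthTime24/100) + mod(BirthTime24, 100);
datalines;
5
104
1435
2355
;
proc print data=clock; run;
```

For these four made-up times, Computed_Minutes works out to 5, 64, 875, and 1435.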
