Exercises for Lecture 2
EXERCISE 1 – Scleroderma data again!
1) Import the Scleroderma data. Create the improvement variable, which is the difference between the two mobility scores.
(a) For clinics 45, 46, 48 and 49 only (these have the largest sample sizes), draw a comparative boxplot of the improvement score, separately for the drug and placebo groups. (You should have two comparative boxplots, one for the drug and one for the placebo, and each should compare the improvement scores for these four clinics. You should be able to do this by running the boxplot procedure only once.) Summarize what you see in the plots.
(b) For a more formal analysis, run PROC GLM using the classification variables clinic and treatment, again only using the data in clinics 45, 46, 48 and 49. Be sure to include an interaction term. What does the GLM procedure tell you about the effects of clinic and treatment on the improvement in mobility scores?
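As a starting point for parts (a) and (b), here is a minimal sketch rather than a model answer. It assumes the imported data set is called SCLERO, that clinic is numeric, that the variables are named clinic, treatment, mobility1 and mobility2, and that improvement is defined as mobility2 minus mobility1. All of these names and the direction of the difference are assumptions; adjust them to match your own data.

```sas
/* Assumed names: data set SCLERO with variables clinic (numeric),
   treatment, mobility1 and mobility2. */
data sclero4;
  set sclero;
  where clinic in (45, 46, 48, 49);      /* keep the four largest clinics */
  improvement = mobility2 - mobility1;   /* assumed direction of the difference */
run;

/* PROC BOXPLOT needs the data sorted by the BY variable and,
   within it, by the group variable */
proc sort data=sclero4;
  by treatment clinic;
run;

/* One run of the procedure: the BY statement gives one comparative
   boxplot per treatment group */
proc boxplot data=sclero4;
  plot improvement*clinic;
  by treatment;
run;

/* Two-way model; clinic|treatment expands to
   clinic treatment clinic*treatment */
proc glm data=sclero4;
  class clinic treatment;
  model improvement = clinic | treatment;
run;
quit;
```

The bar notation in the MODEL statement is just shorthand for listing both main effects and the interaction term explicitly.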
2) For the Scleroderma data, we want to create a variable improve which indicates whether or not the mobility score has improved. The four pieces of code below will not always give the same answers. The differences can occur when either or both variables are missing and when the two scores are equal. Think carefully about how each of the four pieces of code handles these situations. Then decide how you would like to handle missing values and equality for your data, and write the code that will do it.
Code 1:
if mobility2 gt mobility1 then improve=1;
else improve=0;

Code 2:
if mobility2 lt mobility1 then improve=0;
else improve=1;

Code 3:
if mobility2=. or mobility1=. then improve=.;
else if mobility2 gt mobility1 then improve=1;
else improve=0;

Code 4:
if mobility2 gt mobility1 then improve=1;
else if mobility1 lt mobility2 then improve=0;
else improve=.;
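One way to explore how these statements behave is to run them on a small set of made-up test cases. The sketch below applies the first piece of code (with the comparison written as mobility2 gt mobility1) to six hypothetical rows covering improvement, decline, equality and each pattern of missing values. The data set name TEST and the values are invented for illustration. Keep in mind that SAS treats a numeric missing value as smaller than any number, so a comparison involving a missing value still evaluates to true or false.

```sas
/* Hypothetical test cases: improvement, decline, equality,
   and each pattern of missing values */
data test;
  input mobility1 mobility2;
  /* The first piece of code, applied to every test row */
  if mobility2 gt mobility1 then improve=1;
  else improve=0;
  datalines;
3 5
5 3
4 4
. 4
4 .
. .
;
run;

proc print data=test;
run;
```

Swapping in each of the other pieces of code and comparing the printed results shows exactly where they disagree.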
EXERCISE 2 - Merging SAS data sets
Create a library MYDATA for storing your permanent SAS data sets. The data for this exercise are part of a larger study that contains 12 data sets with between 15 and 100 variables per data set. The data sets share at least one variable in common (in this case SUBJECT). I’ve picked three of the data sets, ENTRY, MEDHIST, and ANTHRO, and a few variables from each. They are stored as SAS data sets on the CSASS website.
ENTRY
variables   description
SUBJECT     the subject’s id in the study
FREG        treatment group
MEDHIST
variables   description
SUBJECT     the subject’s id in the study
BIRDT       subject’s birthdate
SEX         male (1) and female (2)
ANTHRO
variables   description
SUBJECT     the subject’s id in the study
WT1CD       weight code, 1=lbs/oz, 2=grams, 3=kilograms
WT1         weight at baseline
WT1OZ       if the weight is in lbs/oz, WT1 is the pounds and this is the ounces
DATE        baseline date
After downloading the three data sets, make sure to put them in your library MYDATA (it is fine to download them directly into this library). You need to merge the three data sets by the common variable SUBJECT. In addition to merging the data sets, in the SAME data step, create a new variable WEIGHT for each subject, which is the baseline weight in kilograms, and a new variable AGE, which is the subject’s age in weeks at baseline (remember that taking the difference of two date variables gives the number of days between the two values). The resulting data set should be a permanent SAS data set called Alldata and should be stored in your library MYDATA. Alldata will contain the variables freg, subject, birdt, sex, weight at baseline (in kilograms), baseline date and age at baseline.
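The merge and the two derived variables could be sketched as follows. This is an illustration under stated assumptions, not the required solution: the library path is hypothetical, BIRDT and DATE are assumed to be stored as SAS date values, and WT1CD is assumed to be numeric with the codes listed above. The pound-to-kilogram factor 0.45359237 is the exact definition of the pound.

```sas
libname mydata 'C:\sasdata';   /* hypothetical path; use your own folder */

/* A match-merge requires each data set to be sorted by the BY variable */
proc sort data=mydata.entry;   by subject; run;
proc sort data=mydata.medhist; by subject; run;
proc sort data=mydata.anthro;  by subject; run;

data mydata.alldata;
  merge mydata.entry mydata.medhist mydata.anthro;
  by subject;

  /* Convert the baseline weight to kilograms according to the weight code */
  select (wt1cd);
    when (1) weight = (wt1 + wt1oz/16) * 0.45359237;  /* lbs/oz */
    when (2) weight = wt1 / 1000;                      /* grams */
    when (3) weight = wt1;                             /* already kilograms */
    otherwise weight = .;                              /* unknown or missing code */
  end;

  /* Dates are stored as day counts, so the difference is in days */
  age = (date - birdt) / 7;

  drop wt1cd wt1 wt1oz;
run;
```

The SELECT group is one option; a chain of IF/ELSE IF statements would do the same job.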
Using Alldata, create another permanent SAS data set, Checkdata, in the library MYDATA. Checkdata should contain the observations that have missing values for some of the variables, as well as observations containing obvious errors in a variable’s values (try to use the NMISS function for the quantitative variables). For now, the only obvious errors to worry about are negative ages. Print out Checkdata by treatment group, using reasonable labels for the variables subject, birdt, sex, weight at baseline, baseline date and age at baseline. Make sure to print out the labels.
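A possible shape for the checking step is sketched below. It assumes the library MYDATA is already assigned, that Alldata contains the variables freg, subject, birdt, sex, weight, date and age, and that birdt, weight, date and age are numeric. The label text is only a suggestion.

```sas
data mydata.checkdata;
  set mydata.alldata;
  /* Subsetting IF: keep a row when any numeric variable is missing
     (NMISS counts the missing values among its arguments), when freg
     or sex is missing, or when age is negative. A missing age also
     satisfies age < 0, because missing sorts below every number. */
  if nmiss(birdt, weight, date, age) > 0
     or missing(freg) or missing(sex)
     or age < 0;
run;

/* BY-group printing requires the data to be sorted by the BY variable */
proc sort data=mydata.checkdata;
  by freg;
run;

proc print data=mydata.checkdata label;   /* LABEL option prints the labels */
  by freg;
  var subject birdt sex weight date age;
  label subject = 'Subject ID'
        birdt   = 'Birthdate'
        sex     = 'Sex (1=male, 2=female)'
        weight  = 'Baseline weight (kg)'
        date    = 'Baseline date'
        age     = 'Age at baseline (weeks)'
        freg    = 'Treatment group';
  format birdt date date9.;
run;
```

Without the LABEL option on the PROC PRINT statement, the variable names are printed instead of the labels.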
EXERCISE 3 – Transposing SAS data sets
Suppose the data consist of growth measurements for 3 girls and 2 boys at ages 8, 10, 12 and 14. The program below will read in the data, where y1 is the measurement at age 8, y2 at age 10 and so forth. The data step also creates the age variables a1 to a4.
data growth;
input id sex $ y1 y2 y3 y4;
a1=8;
a2=10;
a3=12;
a4=14;
cards;
01 F 21 20 22 23
04 F 20 24 26 27
06 F 19 20 22 25
07 M 24 27 28 29
12 M 22 21 23 25
;
run;
You want to draw a graph of these 5 growth curves on the same axes (don’t worry about distinguishing between males and females), so that the 4 growth measurements for each subject are connected with straight lines. You want to use SGPLOT to do this, but it requires the data be in the format below.
01 F 21 8
01 F 20 10
01 F 22 12
01 F 23 14
04 F 20 8
......
......
12 M 25 14
One way to do this is to first transpose y1 – y4 and then a1 – a4. If you then do a side-by-side merge of the two transposed data sets, you can get to this rearrangement of the data, but some renaming of variables will be necessary. Once done, you can read a little about SGPLOT to figure out how to plot the 5 growth curves on one set of axes. I haven’t fancied up the graph with labeling and legends, but the idea is for your plot to look like the graph below.
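The transpose-and-merge approach described above might look like the following sketch. It assumes the growth data set has been created by the data step shown earlier and is in id order. The output data set names (ylong, along, long) and the renamed variables y and age are choices made here for illustration.

```sas
/* Transpose the measurements: one row per subject and occasion.
   PROC TRANSPOSE puts the values in a variable called COL1. */
proc transpose data=growth out=ylong(rename=(col1=y)) name=yname;
  by id sex;
  var y1-y4;
run;

/* Transpose the ages in the same way */
proc transpose data=growth out=along(rename=(col1=age)) name=aname;
  by id sex;
  var a1-a4;
run;

/* No BY statement: a one-to-one (side-by-side) merge that pairs
   row k of ylong with row k of along */
data long;
  merge ylong along;
  drop yname aname;
run;

/* One line per subject, connecting the four measurements */
proc sgplot data=long;
  series x=age y=y / group=id;
run;
```

The GROUP= option is what produces a separate connected line for each subject. Without it, SGPLOT would join all 20 points in a single line.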