Home Work 2
Due by 10:00 pm, 1/17 (Sunday)
Note: Please read through the class document. Let me know for any questions.
1. Use the following data (blue font part) as data for CARDS statement.
Write a SAS program, read the data into SAS dataset: work.NFL_score.
Use proc print to review the output data, make sure everything is correct.
Hint: When using ‘Cards;’ statement to read data, the INFILE statement can
use CARDS as file name . For example: INFILE CARDS;
The INFILE statement option sometimes is needed. (see bottom of page 13 and top of page 14 in Day1.doc)
1 2 3 4 5 6
12345678901234567890123456789012345678901234567890123456789012345678
Team Q1 Q2 Q3 Q4 Total W_L Status
Arizona 0 3 3 17 23 L
Baltimore 3 17 3 3 26 W Final
Indianapolis 7 7 13 3 30 W
Houston 7 3 0 14 24 L Final
Buffalo 7 0 0 0 7 L
New England 3 14 14 7 38 W Final
Detroit 7 14 0 0 21 L
Philadelphia 21 21 7 7 56 W Final
St. Louis 0 0 0 3 3 L
Tampa Bay 0 3 7 14 24 W Final
Cleveland 0 10 7 7 24 L
Oakland 3 13 7 3 26 W Final
Carolina 0 10 14 3 27 W
Atlanta 0 10 7 3 20 L Final
Dallas 0 3 14 17 34 W
Chicago 3 0 7 0 10 L Final
Tennessee 3 7 7 14 31 W
New Orleans 0 7 7 0 14 L Final
data work.NFL_score;
infile cards missover;
input Team $1-16 Q1 Q2 Q3 Q4 Total W_L $52-52 Status $57-61;
cards;
Arizona 0 3 3 17 23 L
Baltimore 3 17 3 3 26 W Final
Indianapolis 7 7 13 3 30 W
Houston 7 3 0 14 24 L Final
Buffalo 7 0 0 0 7 L
New England 3 14 14 7 38 W Final
Detroit 7 14 0 0 21 L
Philadelphia 21 21 7 7 56 W Final
St. Louis 0 0 0 3 3 L
Tampa Bay 0 3 7 14 24 W Final
Cleveland 0 10 7 7 24 L
Oakland 3 13 7 3 26 W Final
Carolina 0 10 14 3 27 W
Atlanta 0 10 7 3 20 L Final
Dallas 0 3 14 17 34 W
Chicago 3 0 7 0 10 L Final
Tennessee 3 7 7 14 31 W
New Orleans 0 7 7 0 14 L Final
;
run;
proc print data=work.NFL_score; run;
2. Write your own program to read the following text file into SAS data set named work.bird_text. (See examples on page 13).
Input data: /courses/ddbf9765ba27fe300/bird_text.txt
Filename bird '/courses/ddbf9765ba27fe300/bird_text.txt';
data bird_text;
infile bird truncover;
input line $1-100;
run;
proc print data=bird_text; run;
3. Write your own programs to read the following data into sas dataset.
Make sure to start with libname first.
· Input: /courses/ddbf9765ba27fe300/stocks.sas7bdat
output: work.stocks
libname dd '/courses/ddbf9765ba27fe300';
data work.stocks;
set dd.stocks;
run;
· Input: /courses/ddbf9765ba27fe300/C5_Cust_Ref.txt
Output: work.cust
filename cust '/courses/ddbf9765ba27fe300/C5_Cust_Ref.txt';
data work.cust;
infile cust truncover;
input ID $1-4 name $6-29 Address1 $31-50 Address2 $52-75;
run;
· Input: /courses/ddbf9765ba27fe300/balance.sas7bdat
output: work.balance
libname dd '/courses/ddbf9765ba27fe300';
data work.balance;
set dd.balance;
run;
· Input: sashelp.cars Note: sashelp is a system libref, you can use it directly.
Output: work.cars
data cars;
set sashelp.cars;
run;
· For this one, write code without run.
Input: c:\classdata\sasclass\stocks.sas7bdat
Output: c:\test\stocks_backup.sas7bdat
Libname outd 'c:\test';
Libname ind 'c:\classdata\sasclass';
Data outd.stocks_backup;
Set ind.stocks;
Run;
4. Run programs and fills in the blanks for the following.
a) run code, check log and fill in the blanks; Also run proc contents to compare.
data geoexm; * data geoexm has _9_ variables and _20699_ rows;
set sashelp.geoexm;
run;
proc contents data= sashelp.geoexm; run;
Observations / 20699Variables / 9
For questions bellow, don’t read log before fill in the blanks;
b)
data sample_1; * data sample_1 has _150_ observations;
set geoexm (obs=150);
run;
c)
data samp_2; * data sample_2 has _20490_ observations;
set geoexm (firstobs=210); * random dataset is created in question a);
run;
d)
data samp_3; * data sample_3 has __237__ observations;
set geoexm (firstobs=324 obs=560);
run;
e) *** don’t need to do this question. ;
data samp_4; * data sample_4 has approximately ______
observations;
set geoexm;
if rand<=0.10;
run;
5. Assume this is a large dataset. Very often, you only need a small part of the data for writing test code, or for specific analysis. That way, it is more efficient.
The following questions use /courses/ddbf9765ba27fe300/simulated_card_data.sas7bdat as input data. For each question writes one data step SAS program. Always try to be efficient.
1) Write a program to find out how many rows and how many variables this data has.
libname dd '/courses/ddbf9765ba27fe300';
proc contents data=dd.simulated_card_data;
run;
The data has 4022 rows and 31 variables.
2) Output: work.first100 Read only the first 50 rows.
libname dd '/courses/ddbf9765ba27fe300';
data work.first100;
set dd.simulated_card_data(obs=50);
run;
3) Output: work.last100 Read the last 50 row.
*** Method 1; * Hard coded manually;
data work.last50;
set dd.simulated_card_data(firstobs=3973); * 4022 - 50 + 1 = 3973;
run;
*** Method 2; * Data driven, but not efficient;
data work.last50;
set dd.simulated_card_data nobs=totn;
if _N_ > totn-50;
run;
*** Method 3; * Data driven and efficient;
data work.last50;
firstn=totn-50 +1;
do i=firstn to totn;
set dd.simulated_card_data point=i nobs=totn;
If _error_ then abort;
Output;
end;
stop;
run;
4) Output: work.Row2051 Read row 2001 to 2150
data work.Row2051;
set dd.simulated_card_data(firstobs=2001 obs=2050);
run;
5) Output: work.cof_am only keep: acct_id offer fico_seg chargeoff_am
data work.cof_am;
set dd.simulated_card_data(keep=acct_id offer fico_seg chargeoff_am);
run;
6) Output: work.balance Only keep: acct_id offer fico_seg balance1 to balance12
data work.balance;
set dd.simulated_card_data(keep=acct_id offer fico_seg balance1-balance12);
run;
7) Output: work.Low_fico Only select rows with offer='Offer 2' and fico<=600
Use where statement.
data work.Low_fico;
set dd.simulated_card_data;
where offer='Offer 2' and fico<=600;
run;
8) Output: work.Low_fico Only select rows with offer='Offer 2' and fico<=600
Use if statement.
data work.Low_fico;
set dd.simulated_card_data;
if offer='Offer 2' and fico<=600;
run;
9) Output: work.COF Only select observations that chargeoff_am not equals missing, and only keep variables: acct_id, offer, fico, fico_seg and chargeoff_am.
data work.COF;
set dd.simulated_card_data(keep=acct_id offer fico fico_seg chargeoff_am);
where chargeoff_am ne .;
run;
10) Output: work.mean_bal Create a new variable mean_bal. It’s value equals mean of
balance1 to balance12
data work.mean_bal;
set dd.simulated_card_data;
mean_bal = mean(of balance1-balance12); * method 1;
mean_bal2 = mean(balance1,balance2,balance3,balance4,balance5,balance6,balance7,balance8,balance9,balance10,balance11,balance12); * method 2;
run;