Home Work 2

Due by 10:00 pm, 1/17 (Sunday)

Note: Please read through the class document. Let me know for any questions.

1.  Use the following data (blue font part) as data for CARDS statement.

Write a SAS program, read the data into SAS dataset: work.NFL_score.

Use proc print to review the output data, make sure everything is correct.

Hint: When using ‘Cards;’ statement to read data, the INFILE statement can

use CARDS as file name . For example: INFILE CARDS;

The INFILE statement option sometimes is needed. (see bottom of page 13 and top of page 14 in Day1.doc)

1 2 3 4 5 6

12345678901234567890123456789012345678901234567890123456789012345678

Team Q1 Q2 Q3 Q4 Total W_L Status

Arizona 0 3 3 17 23 L

Baltimore 3 17 3 3 26 W Final

Indianapolis 7 7 13 3 30 W

Houston 7 3 0 14 24 L Final

Buffalo 7 0 0 0 7 L

New England 3 14 14 7 38 W Final

Detroit 7 14 0 0 21 L

Philadelphia 21 21 7 7 56 W Final

St. Louis 0 0 0 3 3 L

Tampa Bay 0 3 7 14 24 W Final

Cleveland 0 10 7 7 24 L

Oakland 3 13 7 3 26 W Final

Carolina 0 10 14 3 27 W

Atlanta 0 10 7 3 20 L Final

Dallas 0 3 14 17 34 W

Chicago 3 0 7 0 10 L Final

Tennessee 3 7 7 14 31 W

New Orleans 0 7 7 0 14 L Final

data work.NFL_score;

infile cards missover;

input Team $1-16 Q1 Q2 Q3 Q4 Total W_L $52-52 Status $57-61;

cards;

Arizona 0 3 3 17 23 L

Baltimore 3 17 3 3 26 W Final

Indianapolis 7 7 13 3 30 W

Houston 7 3 0 14 24 L Final

Buffalo 7 0 0 0 7 L

New England 3 14 14 7 38 W Final

Detroit 7 14 0 0 21 L

Philadelphia 21 21 7 7 56 W Final

St. Louis 0 0 0 3 3 L

Tampa Bay 0 3 7 14 24 W Final

Cleveland 0 10 7 7 24 L

Oakland 3 13 7 3 26 W Final

Carolina 0 10 14 3 27 W

Atlanta 0 10 7 3 20 L Final

Dallas 0 3 14 17 34 W

Chicago 3 0 7 0 10 L Final

Tennessee 3 7 7 14 31 W

New Orleans 0 7 7 0 14 L Final

;

run;

proc print data=work.NFL_score; run;

2.  Write your own program to read the following text file into SAS data set named work.bird_text. (See examples on page 13).

Input data: /courses/ddbf9765ba27fe300/bird_text.txt

Filename bird '/courses/ddbf9765ba27fe300/bird_text.txt';

data bird_text;

infile bird truncover;

input line $1-100;

run;

proc print data=bird_text; run;

3.  Write your own programs to read the following data into sas dataset.

Make sure to start with libname first.

·  Input: /courses/ddbf9765ba27fe300/stocks.sas7bdat

output: work.stocks

libname dd '/courses/ddbf9765ba27fe300';

data work.stocks;

set dd.stocks;

run;

·  Input: /courses/ddbf9765ba27fe300/C5_Cust_Ref.txt

Output: work.cust

filename cust '/courses/ddbf9765ba27fe300/C5_Cust_Ref.txt';

data work.cust;

infile cust truncover;

input ID $1-4 name $6-29 Address1 $31-50 Address2 $52-75;

run;

·  Input: /courses/ddbf9765ba27fe300/balance.sas7bdat

output: work.balance

libname dd '/courses/ddbf9765ba27fe300';

data work.balance;

set dd.balance;

run;

·  Input: sashelp.cars Note: sashelp is a system libref, you can use it directly.

Output: work.cars

data cars;

set sashelp.cars;

run;

·  For this one, write code without run.

Input: c:\classdata\sasclass\stocks.sas7bdat

Output: c:\test\stocks_backup.sas7bdat

Libname outd 'c:\test';

Libname ind 'c:\classdata\sasclass';

Data outd.stocks_backup;

Set ind.stocks;

Run;

4.  Run programs and fills in the blanks for the following.

a)  run code, check log and fill in the blanks; Also run proc contents to compare.

data geoexm; * data geoexm has _9_ variables and _20699_ rows;

set sashelp.geoexm;

run;

proc contents data= sashelp.geoexm; run;

Observations / 20699
Variables / 9

For questions bellow, don’t read log before fill in the blanks;

b)

data sample_1; * data sample_1 has _150_ observations;

set geoexm (obs=150);

run;

c)

data samp_2; * data sample_2 has _20490_ observations;

set geoexm (firstobs=210); * random dataset is created in question a);

run;

d)

data samp_3; * data sample_3 has __237__ observations;

set geoexm (firstobs=324 obs=560);

run;

e) *** don’t need to do this question. ;

data samp_4; * data sample_4 has approximately ______

observations;

set geoexm;

if rand<=0.10;

run;

5. Assume this is a large dataset. Very often, you only need a small part of the data for writing test code, or for specific analysis. That way, it is more efficient.

The following questions use /courses/ddbf9765ba27fe300/simulated_card_data.sas7bdat as input data. For each question writes one data step SAS program. Always try to be efficient.

1)  Write a program to find out how many rows and how many variables this data has.

libname dd '/courses/ddbf9765ba27fe300';

proc contents data=dd.simulated_card_data;

run;

The data has 4022 rows and 31 variables.

2)  Output: work.first100 Read only the first 50 rows.

libname dd '/courses/ddbf9765ba27fe300';

data work.first100;

set dd.simulated_card_data(obs=50);

run;

3)  Output: work.last100 Read the last 50 row.

*** Method 1; * Hard coded manually;

data work.last50;

set dd.simulated_card_data(firstobs=3973); * 4022 - 50 + 1 = 3973;

run;

*** Method 2; * Data driven, but not efficient;

data work.last50;

set dd.simulated_card_data nobs=totn;

if _N_ > totn-50;

run;

*** Method 3; * Data driven and efficient;

data work.last50;

firstn=totn-50 +1;

do i=firstn to totn;

set dd.simulated_card_data point=i nobs=totn;

If _error_ then abort;

Output;

end;

stop;

run;

4)  Output: work.Row2051 Read row 2001 to 2150

data work.Row2051;

set dd.simulated_card_data(firstobs=2001 obs=2050);

run;

5)  Output: work.cof_am only keep: acct_id offer fico_seg chargeoff_am

data work.cof_am;

set dd.simulated_card_data(keep=acct_id offer fico_seg chargeoff_am);

run;

6)  Output: work.balance Only keep: acct_id offer fico_seg balance1 to balance12

data work.balance;

set dd.simulated_card_data(keep=acct_id offer fico_seg balance1-balance12);

run;

7)  Output: work.Low_fico Only select rows with offer='Offer 2' and fico<=600

Use where statement.

data work.Low_fico;

set dd.simulated_card_data;

where offer='Offer 2' and fico<=600;

run;

8)  Output: work.Low_fico Only select rows with offer='Offer 2' and fico<=600

Use if statement.

data work.Low_fico;

set dd.simulated_card_data;

if offer='Offer 2' and fico<=600;

run;

9)  Output: work.COF Only select observations that chargeoff_am not equals missing, and only keep variables: acct_id, offer, fico, fico_seg and chargeoff_am.

data work.COF;

set dd.simulated_card_data(keep=acct_id offer fico fico_seg chargeoff_am);

where chargeoff_am ne .;

run;

10) Output: work.mean_bal Create a new variable mean_bal. It’s value equals mean of

balance1 to balance12

data work.mean_bal;

set dd.simulated_card_data;

mean_bal = mean(of balance1-balance12); * method 1;

mean_bal2 = mean(balance1,balance2,balance3,balance4,balance5,balance6,balance7,balance8,balance9,balance10,balance11,balance12); * method 2;

run;