Home Work 5

Due by 11:00 pm, 2/20(Saturday)

Note: The array home work is significantly harder. Try to start early.

Let me know for any questions. Make sure you run the code in SAS.

1. Input data: /courses/ddbf9765ba27fe300/simulated_card_data.sas7bdat

Output data: work.tt.

a)In the output data, create a new variable COF_Month, it’s value is the month of the account charged off. For example: if the chargeoff1 to chargeoff 9=0, and chargeoff 10-charge12=1 then COF_Month=10. For account that never charge off up to month 12, make COF_Month=. ;

b)In the same data step, create another variable CUM_COF. If any of the chargeoff1 to chargeoff12 equals 1, then CUM_COF=1. Else, CUM_COF=0.

2.. Run the following SAS program to create a sas dataset. In the data, variable DAY’s value is ranged from integer 1 to 7. Browser the data: week_day.

data week_day;

do i=1to100;

day=int(ranuni(54)*7)+1;

output;

end;

run;

a) Use above week_day as input data, create a new dataset temp1, use IF ... THEN look up method to create variable WeekDay according to following.

* Method 1: Using If ... then lookup;

Day=1 WeekDay=’Sun’, day=2 WeekDay=’Mon’, ... day=7 WeekDay=’Sat’.

b)Complete the following program that use ARRY look up method to create variable WeekDay, and complete the unfinished array statement and array look statement.

* Method 2: Using array lookup;

data temp2;

array wd(7) ______* Initialize array;

set week_day;

length WeekDay $3;

WeekDay= ______; * perform array lookup;

run;

c) Write a PROC FORMAT for using PROC FORMAT look up method to create variable WeekDay.

* Method 3: Using proc format lookup; First, write a proc format step.

Proc format specification: format name: wkd,

Format: 1 format to ‘Sun’, 2 format to ‘Mon’, .... 7 format to ‘Sat’.

Then study the following program, and complete the lookup statement using put() function:

data temp3;

set week_day;

length WeekDay $3;

WeekDay= ______; *** Lookup statement;

run;

d) Are the above three methods generate the same results?

3. a) Write a SAS program, sort the following SAS data by TYPE, ORDER.

Input: /courses/ddbf9765ba27fe300/sort_data.sas7bdat

Output: work.sort_data1

Then browser the output.

b) Write a SAS program, sort the following SAS data by TYPE, ORDER and

descending BALANCE.

Input: /courses/ddbf9765ba27fe300/sort_data.sas7bdat

Output: work.sort_data2

Then browser the output.

c)Write a SAS program, sort the following SAS data by TYPE, ORDER,

And use NODUPKEY option.

Input: /courses/ddbf9765ba27fe300/sort_data.sas7bdat

Output: work.sort_data3

Then browser the output.

c)This data has ______TYPES?

d)This data has ______ORDER?

e)Data work.sort_data3 has ______observations?

f)If we sort this data by TYPE only with nodupkey option, the output data will

have ______observations.

g)Using c:\sas_class\classdata\sort_data.sas7bdat as input data, create a data max_bal. It contains only rows of maximum bal for each by group for TYPE AND ORDER like the following:

id type order bal

0292 A 1 996

0328 A 2 995

0360 A 3 979

0015 A 4 987

0115 B 1 921

0107 B 2 980

0081 B 3 969

0129 B 4 973

0232 C 1 996

0059 C 2 985

0201 C 3 983

241 C 4 965

4.In a risk analysis data, there are two by variables, Risk (risk level) and Revol (revolve rate). The data was sorted by risk revol like the following. When the data is set by risk revol, fill in the internal variable values for the following table ( <Tab> Key may be helpful).

Note: The program is for illustration, it cannot be run.

Proc sort data=inlib.indata out=indata;

by risk revol;

run;

data temp;

set indata;

by risk revol;

run;

======

Risk Revol First.Risk Last.Risk First.Revol Last.Revol

High Heavy

High Heavy

High Heavy

High Low

High Middle

High Middle

Low Heavy

Low Low

Low Low

Low Low

Low Middle

5. Run the following program, generate SAS dataset work.bal_delq.

Create a new sas dataset work.get_roll_rate. In this data, read the

work.bal_delq in and add a new variable: Roll_Rate.

Roll Rate = delq_bal / Balance of previous month;

(hint: use retain statement).

data bal_delq;

input date Balance delq_bal;

cards;

200801 234554 .

200802 453635 7505.73

200803 546782 15423.59

200804 678597 21324.50

200805 123454 29179.67

200806 653245 5555.43

200807 456729 44290.11

;

run;

6.Run the following program, browse the output data work.temp in SAS Studio, and also run a proc contents. (Following the examples in the class document).

data temp(keep=group balance);

array gr(4) $6. ('GRP_A', 'GRP_B', 'GRP_C', 'GRP_D');

do i=1to100;

rand=int(4*ranuni(86))+1;

group=gr(rand);

balance=int(ranuni(678)*200)+100;

output;

end;

run;

a)Sort data work.temp by group, output over-write the input.

b)Write a SAS program to calculate the total sum of balance and total count N of the dataUSING method you chose.

c) Write a SAS program to calculate the sum of balance of each group and count N of each group USING RETAIN statement method.

c)Write a SAS program to calculate the sum of balance of each group and count N of each group without USING RETAIN statement.

7. Optional for extral credit. (This question has to be done independently)

Use the SAS dataset /courses/ddbf9765ba27fe300/simulated_card_data.sas7bdat as input dataset.

a)..Sort the simulated_cdat.card_data by OFFER and FICO_SEG, make output data name as work.Card.

b) Use Card data calculate group sum vertically by offer and fico_seg for variables Balance1 – Balance12. Name the sum variables as : sumbal1 – sumbal12. Name the output data as get_sum. Only keep the last row of each by group (Do Not UsePROC SUMMARY, PROC MEANS, OR PROC SQL).

c) Use one to many merge, merge get_sum data with Card data, names the output data as

final. In the same merge data step, calculate grp_pct1 – grp_pct12,

grp_pct(i) = bal(i) / sumbal(i); format grp_pct1 – grp_pct12 as percent8.4.

d). Run the following SAS code, in the output window, if the Sum column are not

1.0000000 then checks your code.

proc means data=final nway sum;

class offer fico_seg;

var grp_pct1-grp_pct12;

run;