Home Work 5
Due by 11:00 pm, 2/20(Saturday)
Note: The array home work is significantly harder. Try to start early.
Let me know for any questions. Make sure you run the code in SAS.
1. Input data: /courses/ddbf9765ba27fe300/simulated_card_data.sas7bdat
Output data: work.tt.
a)In the output data, create a new variable COF_Month, it’s value is the month of the account charged off. For example: if the chargeoff1 to chargeoff 9=0, and chargeoff 10-charge12=1 then COF_Month=10. For account that never charge off up to month 12, make COF_Month=. ;
b)In the same data step, create another variable CUM_COF. If any of the chargeoff1 to chargeoff12 equals 1, then CUM_COF=1. Else, CUM_COF=0.
2.. Run the following SAS program to create a sas dataset. In the data, variable DAY’s value is ranged from integer 1 to 7. Browser the data: week_day.
data week_day;
do i=1to100;
day=int(ranuni(54)*7)+1;
output;
end;
run;
a) Use above week_day as input data, create a new dataset temp1, use IF ... THEN look up method to create variable WeekDay according to following.
* Method 1: Using If ... then lookup;
Day=1 WeekDay=’Sun’, day=2 WeekDay=’Mon’, ... day=7 WeekDay=’Sat’.
b)Complete the following program that use ARRY look up method to create variable WeekDay, and complete the unfinished array statement and array look statement.
* Method 2: Using array lookup;
data temp2;
array wd(7) ______* Initialize array;
set week_day;
length WeekDay $3;
WeekDay= ______; * perform array lookup;
run;
c) Write a PROC FORMAT for using PROC FORMAT look up method to create variable WeekDay.
* Method 3: Using proc format lookup; First, write a proc format step.
Proc format specification: format name: wkd,
Format: 1 format to ‘Sun’, 2 format to ‘Mon’, .... 7 format to ‘Sat’.
Then study the following program, and complete the lookup statement using put() function:
data temp3;
set week_day;
length WeekDay $3;
WeekDay= ______; *** Lookup statement;
run;
d) Are the above three methods generate the same results?
3. a) Write a SAS program, sort the following SAS data by TYPE, ORDER.
Input: /courses/ddbf9765ba27fe300/sort_data.sas7bdat
Output: work.sort_data1
Then browser the output.
b) Write a SAS program, sort the following SAS data by TYPE, ORDER and
descending BALANCE.
Input: /courses/ddbf9765ba27fe300/sort_data.sas7bdat
Output: work.sort_data2
Then browser the output.
c)Write a SAS program, sort the following SAS data by TYPE, ORDER,
And use NODUPKEY option.
Input: /courses/ddbf9765ba27fe300/sort_data.sas7bdat
Output: work.sort_data3
Then browser the output.
c)This data has ______TYPES?
d)This data has ______ORDER?
e)Data work.sort_data3 has ______observations?
f)If we sort this data by TYPE only with nodupkey option, the output data will
have ______observations.
g)Using c:\sas_class\classdata\sort_data.sas7bdat as input data, create a data max_bal. It contains only rows of maximum bal for each by group for TYPE AND ORDER like the following:
id type order bal
0292 A 1 996
0328 A 2 995
0360 A 3 979
0015 A 4 987
0115 B 1 921
0107 B 2 980
0081 B 3 969
0129 B 4 973
0232 C 1 996
0059 C 2 985
0201 C 3 983
241 C 4 965
4.In a risk analysis data, there are two by variables, Risk (risk level) and Revol (revolve rate). The data was sorted by risk revol like the following. When the data is set by risk revol, fill in the internal variable values for the following table ( <Tab> Key may be helpful).
Note: The program is for illustration, it cannot be run.
Proc sort data=inlib.indata out=indata;
by risk revol;
run;
data temp;
set indata;
by risk revol;
run;
======
Risk Revol First.Risk Last.Risk First.Revol Last.Revol
High Heavy
High Heavy
High Heavy
High Low
High Middle
High Middle
Low Heavy
Low Low
Low Low
Low Low
Low Middle
5. Run the following program, generate SAS dataset work.bal_delq.
Create a new sas dataset work.get_roll_rate. In this data, read the
work.bal_delq in and add a new variable: Roll_Rate.
Roll Rate = delq_bal / Balance of previous month;
(hint: use retain statement).
data bal_delq;
input date Balance delq_bal;
cards;
200801 234554 .
200802 453635 7505.73
200803 546782 15423.59
200804 678597 21324.50
200805 123454 29179.67
200806 653245 5555.43
200807 456729 44290.11
;
run;
6.Run the following program, browse the output data work.temp in SAS Studio, and also run a proc contents. (Following the examples in the class document).
data temp(keep=group balance);
array gr(4) $6. ('GRP_A', 'GRP_B', 'GRP_C', 'GRP_D');
do i=1to100;
rand=int(4*ranuni(86))+1;
group=gr(rand);
balance=int(ranuni(678)*200)+100;
output;
end;
run;
a)Sort data work.temp by group, output over-write the input.
b)Write a SAS program to calculate the total sum of balance and total count N of the dataUSING method you chose.
c) Write a SAS program to calculate the sum of balance of each group and count N of each group USING RETAIN statement method.
c)Write a SAS program to calculate the sum of balance of each group and count N of each group without USING RETAIN statement.
7. Optional for extral credit. (This question has to be done independently)
Use the SAS dataset /courses/ddbf9765ba27fe300/simulated_card_data.sas7bdat as input dataset.
a)..Sort the simulated_cdat.card_data by OFFER and FICO_SEG, make output data name as work.Card.
b) Use Card data calculate group sum vertically by offer and fico_seg for variables Balance1 – Balance12. Name the sum variables as : sumbal1 – sumbal12. Name the output data as get_sum. Only keep the last row of each by group (Do Not UsePROC SUMMARY, PROC MEANS, OR PROC SQL).
c) Use one to many merge, merge get_sum data with Card data, names the output data as
final. In the same merge data step, calculate grp_pct1 – grp_pct12,
grp_pct(i) = bal(i) / sumbal(i); format grp_pct1 – grp_pct12 as percent8.4.
d). Run the following SAS code, in the output window, if the Sum column are not
1.0000000 then checks your code.
proc means data=final nway sum;
class offer fico_seg;
var grp_pct1-grp_pct12;
run;