Economics 538 Project Spring 2012 - Due day of final

Hastie-Tibshirani-Friedman (2009, 122) describe the South African Heart Disease Data, which can be loaded with:

b34sexec options ginclude(H_T_F_data.mac) member(SA_HEART); b34srun;

The file contains the following information:

b34sexec data heading('South Africa Heart Disease Data');
/;A retrospective sample of males in a heart-disease high-risk region
/;of the Western Cape, South Africa. There are roughly two controls per
/;case of CHD. Many of the CHD positive men have undergone blood
/;pressure reduction treatment and other programs to reduce their risk
/;factors after their CHD event. In some cases the measurements were
/;made after these treatments. These data are taken from a larger
/;dataset, described in Rousseauw et al, 1983, South African Medical
/;Journal.
/;
* Detail in H_F_T (2009) page 122;
label sbp ='systolic blood pressure ';
label tobacco ='cumulative tobacco (kg) ';
label ldl ='low densiity lipoprotein cholesterol ';
label adiposit =' ';
label famhist ='F history of heart disease (yes=1, ';
label typea ='type-A behavior ';
label obesity =' ';
label alcohol ='current alcohol consumption ';
label age ='age at onset ';
label chd ='response, coronary heart disease ';
input row sbp tobacco ldl adiposit famhist typea
      obesity alcohol age chd;

The means obtained are:

Variable No.  Label                                # Cases      Mean  Std. Dev.  Variance  Maximum   Minimum
ROW        1                                           462   231.935    133.939   17939.5  463.000   1.00000
SBP        2  systolic blood pressure                  462   138.327    20.4963   420.099  218.000   101.000
TOBACCO    3  cumulative tobacco (kg)                  462   3.63565    4.59302   21.0959  31.2000   0.00000
LDL        4  low densiity lipoprotein cholesterol     462   4.74032    2.07091   4.28866  15.3300  0.980000
ADIPOSIT   5                                           462   25.4067    7.78070   60.5393  42.4900   6.74000
FAMHIST    6  F history of heart disease (yes=1,       462  0.415584   0.493357  0.243401  1.00000   0.00000
TYPEA      7  type-A behavior                          462   53.1039    9.81753   96.3840  78.0000   13.0000
OBESITY    8                                           462   26.0441    4.21368   17.7551  46.5800   14.7000
ALCOHOL    9  current alcohol consumption              462   17.0444    24.4811   599.322  147.190   0.00000
AGE       10  age at onset                             462   42.8160    14.6090   213.422  64.0000   15.0000
CHD       11  response, coronary heart disease         462  0.346320   0.476313  0.226874  1.00000   0.00000
CONSTANT  12                                           462   1.00000    0.00000   0.00000  1.00000   1.00000
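As a quick sanity check on the listing, the CHD row can be reconciled with the "roughly two controls per case" remark in the data description. The sketch below is in Python rather than B34S and uses only the numbers printed in the listing; the variable names are illustrative and are not part of any B34S setup.

```python
# Values copied from the CHD row of the means listing.
n = 462
chd_mean = 0.346320
chd_var = 0.226874

# For a 0-1 variable the mean is the share of ones,
# so mean * n recovers the number of CHD cases.
cases = round(chd_mean * n)
controls = n - cases

# Sample variance of a 0-1 variable: p(1-p) * n/(n-1).
p = cases / n
implied_var = p * (1.0 - p) * n / (n - 1)

print("cases:", cases, "controls:", controls)
print("controls per case:", round(controls / cases, 1))
print("implied variance:", round(implied_var, 6), "listed:", chd_var)
```

The listing implies 160 cases and 302 controls, and the variance implied by those counts agrees with the listed variance to the printed precision.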

This dataset has been used to analyze the effect of cumulative tobacco exposure on coronary heart disease.

1. Use logit, probit and OLS to analyze chd=f(tobacco, ). H_T_F used a stepwise approach to obtain the model chd=f(tobacco,ldl,famhist,age). Experiment with other variables such as obesity, sbp and alcohol. You may find that sbp and obesity are significant in some specifications but not in others. Why?
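As a reminder of how the three estimators in problem 1 differ, the sketch below compares how each one turns a linear index into a predicted probability of chd=1. It is written in Python, not B34S, and the index values are hypothetical numbers chosen for illustration, not estimates from the heart disease data. Note also that the coefficients from the three models are on different scales, so the same index value would not arise from all three fits.

```python
import math

def logit_prob(index):
    """Logistic CDF: the response function behind logit."""
    return 1.0 / (1.0 + math.exp(-index))

def probit_prob(index):
    """Standard normal CDF: the response function behind probit."""
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))

def ols_prob(index):
    """Linear probability model: the fitted value is the index itself."""
    return index

# Hypothetical index values, not estimates from the heart disease data.
for index in (-0.5, 0.0, 0.5, 1.2):
    print(index,
          round(logit_prob(index), 3),
          round(probit_prob(index), 3),
          ols_prob(index))
```

Logit and probit always return values strictly between 0 and 1, while the OLS fitted value can fall below 0 or above 1. Keep this in mind when comparing predicted values across the three models.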

2. Using OLS, PROBIT, RCOVER, RDA, PPREG and RANFOREST, repeat the analysis, then report and discuss the confusion matrix that each reports. RDA might require :mxt 10000000. Space for B34S will have to be increased. Note that some models require the left-hand-side variable to be coded 0-1, while others require it to be coded 1-2.
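To make the confusion matrix and the 0-1 versus 1-2 coding concrete, here is a minimal sketch in Python (not B34S). The data are made up, the function name confusion_matrix is illustrative, and the 0.5 cutoff used to turn fitted values into predicted classes is an assumption; the B34S routines may classify differently.

```python
def confusion_matrix(actual, predicted, classes=(0, 1)):
    """Count outcomes: rows are actual classes, columns are predicted classes."""
    position = {c: i for i, c in enumerate(classes)}
    table = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        table[position[a]][position[p]] += 1
    return table

# Made-up outcomes and fitted values, for illustration only.
actual = [0, 0, 1, 1, 0, 1, 0, 0]
fitted = [0.2, 0.6, 0.7, 0.4, 0.1, 0.9, 0.3, 0.5]

# Assumed rule: classify as 1 when the fitted value is at least 0.5.
predicted = [1 if f >= 0.5 else 0 for f in fitted]
cm_01 = confusion_matrix(actual, predicted)

# Recode 0-1 to 1-2 for routines that need class labels 1 and 2.
actual_12 = [a + 1 for a in actual]
predicted_12 = [p + 1 for p in predicted]
cm_12 = confusion_matrix(actual_12, predicted_12, classes=(1, 2))

print(cm_01)
print(cm_12)
```

The two tables are identical because recoding changes only the class labels, not the classification. The diagonal holds the correctly classified observations and the off-diagonal cells hold the two kinds of error.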

3. Using ppreg, mars, gam, olsq and ranforest, develop models for sbp and obesity. Include tobacco in your model. The focus of your analysis is on the effect of cumulative tobacco use, but discuss the findings for the other variables as well.

Software Help. For problem 2 use the setup in problem set # 5, but be sure to recode _argsg if you have any 0-1 variables on the right-hand side.

The software from Table 17.9 in Chapter 17, listed below, is useful for problem 3. Problem 1 setups are in Chapter 3.

Notes:

/;
/; Murder Data estimated with:
/; OLS - PROBIT - RCOVER - RDA - PPREG - RANFOREST
/;
b34sexec options ginclude('b34sdata.mac') macro(murder)$
b34seend$
b34sexec matrix;
call loaddata;
call load(mvconfus :wbsuppl);
call echooff;
cases=dfloat(integers(0,1));
call olsq(d1 t y lf nw :print );
call mvconf2(%y,%yhat,cases,
             'Tests on Murder Data using OLS Model',cfm,1);
call probit(d1 t y lf nw :print );
call tabulate(%names,%lag,%coef,%se,%t);
call mvconf2(%y,%yhat,cases,
             'Tests on Murder Data using probit Model',cfm,1);
call rcover(d1 t y lf nw :print );
call mvconf2(%y,%yhat,cases,
             'Tests on Murder Data using rcover Model kd=10',
             cfm,1);
/; recode d1 1-2
d11=d1+1.;
call rda(d11 t y lf nw :nk 2 :print );
%yhat_1=%yhat-1.;
call mvconf2(d1,%yhat_1,cases,'Tests on Murder Data using rda Model',
             cfm,1);
call ppreg(d11 t y lf nw :class 2 :print);
%yhat_1=%yhat-1.;
call mvconf2(d1,%yhat_1,cases,'Tests on Murder Data using ppreg Model',
             cfm,1);
call print('Tests on Murder Data using Random Forest Model':);
call ranforest(d11 t y lf nw :class 2
               :print :maxtree 20 :vote_yhat);
b34srun;
