Week 3/4 [06+ Sept.] Class Activities
File: week-03-04-10sep07.doc
Directory: \\Muserver2\USERS\B\\baileraj\Classes\sta402\handouts
Week 3 Topic -- REPORT WRITING
* Introduce the Output Delivery System (ODS) for customizing procedure output
* PROC TABULATE for producing nicely-formatted tables
Week 4 Topic – INTRODUCTION TO MODELING PROCS
* REG and GLM primarily
Bonus Material – conversational UNIX
ODS References
Gupta, S. (2003) Quick Results with the Output Delivery System. SAS Institute Inc., Cary, NC USA.
Delwiche LD and Slaughter SJ. (2003) The Little SAS Book: A Primer, 3rd edition. SAS Institute. Cary, NC, USA. [pages 144-157]
Haworth LE (2001) Output Delivery System: The Basics. SAS Institute Inc. Cary, NC USA.
ODS Basics
What is ODS?
* method of delivering output in a variety of formats (other than the default “listing” format)
* options available include HTML, Rich Text Format (RTF), PS, PDF, SAS data sets
Basic ODS Terminology
“destinations” – locations to which ODS routes output (e.g. LISTING, HTML, RTF, PRINTER, PDF, OUTPUT – new data set)
“objects” – output entities created by ODS to store the formatted results
“styles” – font/color/other attributes of a report
Basic syntax of ODS statements
* identify output objects;
ODS TRACE ON </options>;
* open output destination;
ODS destination <FILE=filename>;
* create SAS data set with output object;
ODS OUTPUT output-object-name=SAS-data-set-name;
* [optional] select particular objects for inclusion;
ODS <destination> SELECT output-object-name;
PROC …
PROC …
PROC …
ODS <destination> CLOSE;
ODS TRACE OFF;
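A minimal sketch of the statement sequence above, assuming the SASHELP.CLASS sample data set is available and that the current working directory is writable. The file name ods-sketch.html is hypothetical; BasicMeasures is one of the output objects that ODS TRACE reports for PROC UNIVARIATE.

```sas
/* Sketch only: the file name is a placeholder, not part of the course files */
ODS TRACE ON;                            /* object names are written to the SAS log */
ODS HTML file="ods-sketch.html";         /* open the HTML destination */
ODS HTML SELECT BasicMeasures;           /* keep only this object in the HTML file */
proc univariate data=sashelp.class;
   var height;
run;
ODS HTML CLOSE;                          /* file is complete only after CLOSE */
ODS TRACE OFF;
```

Running the PROC once with ODS TRACE ON and no SELECT is usually the easiest way to discover which object names are available before restricting the output.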
ODS to different file types
A familiar example
proc format;
value totfmt 0='none'
1-HIGH='some'
;
data d1;
infile "\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\ch2-dat.txt" firstobs=16 expandtabs missover pad ;
* infile 'M:\public.www\classes\sta402\SAS-programs\ch2-dat.txt' firstobs=16 expandtabs missover pad ;
input @9 animal 2.
@17 conc 3.
@25 brood1 2.
@33 brood2 2.
@41 brood3 2.
@49 total 2.;
cbrood3 = brood3;
format cbrood3 totfmt.;
label animal = animal ID number;
label conc = Nitrofen concentration;
label brood1 = number of young in first brood;
label brood2 = number of young in 2nd brood;
label brood3 = number of young in 3rd brood;
label total = total young produced in three broods;
proc print;
where conc=0;
run;
/* aside: ODS LISTING open as a default.
You can have multiple destinations open simultaneously.
If you want to close the LISTING destination before
generating output then type ODS LISTING CLOSE; before
issuing the PROC for which output is desired.
*/
/* generate HTML files with objects from 3 PROCs */
ODS TRACE ON;
* ODS HTML file='M:\public.www\classes\sta402\SAS-programs\day6-example.html';
ODS HTML file="\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\ODS-HTML-example.html";
proc plot;
plot total*conc=cbrood3 / vaxis=0 to 40 by 2;
run;
proc freq;
table conc*cbrood3 / nopct nocol chisq trend exact;
run;
proc univariate plot; by conc;
var total;
run;
ODS HTML CLOSE;
ODS TRACE OFF;
/* now generate HTML files with additional linkage info */
ODS TRACE ON;
ODS HTML path='\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs'
body = 'day6-example2.html' /* Output objects */
contents = 'day6-example2-TOC.html' /* Table of contents */
frame = 'day6-example2-frame.html' /* organizes display */
newfile = NONE; /* all results to one file*/
/* old code where M drive referenced vs. specification of the full path
   (inner comments removed: SAS block comments do not nest)
ODS HTML path='M:\public.www\classes\sta402\SAS-programs'
body = 'day6-example2.html'
contents = 'day6-example2-TOC.html'
frame = 'day6-example2-frame.html'
newfile = NONE;
*/
/* comment: newfile=NONE (the default setting) directs all output to the same body file
newfile=PAGE – creates new body file for each page of output
*/
proc plot;
plot total*conc=cbrood3 / vaxis=0 to 40 by 2;
run;
proc freq;
table conc*cbrood3 / nopct nocol chisq trend exact;
run;
proc univariate plot; by conc;
var total;
run;
ODS HTML CLOSE;
ODS TRACE OFF;
/* select on one of the output objects for inclusion */
*ODS HTML file='M:\public.www\classes\sta402\SAS-programs\day6-example3.html';
ODS HTML file="\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\day6-example3.html";
ODS HTML SELECT SSPLOTS;
ODS HTML SHOW; /* write details to SASLOG confirming object sel. */
proc univariate plot; by conc;
var total;
run;
ODS HTML CLOSE;
/* select different destinations */
options orientation=landscape nocenter nodate;
ODS ESCAPECHAR= "^"; /* for fancy formatting later */
/* old program with M drive reference
ODS RTF file='M:\public.www\classes\sta402\SAS-programs\day6-example.rtf';
ODS PDF file='M:\public.www\classes\sta402\SAS-programs\day6-example.pdf';
ODS PS file='M:\public.www\classes\sta402\SAS-programs\day6-example.ps';
*/
ODS RTF file='\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\day6-example.rtf';
ODS PDF file='\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\day6-example.pdf';
ODS PS file='\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\day6-example.ps';
Title 'Plot of number of young vs. Nitrofen concentration^{super a}';
Footnote1 '^{super a}s=some young produced in Brood 3, n= no young produced in Brood 3';
proc plot;
plot total*conc=cbrood3 / vaxis=0 to 40 by 2;
run;
ODS RTF CLOSE;
ODS PDF CLOSE;
ODS PS CLOSE;
ODS to create output data sets
proc sort data=d1; by conc;
ODS TRACE ON; /* see what ODS objects are created by univariate */
proc univariate data=d1; by conc;
var total;
run;
ODS TRACE OFF;
ODS OUTPUT Quantiles=data_quant; /* extract quantiles */
proc univariate data=d1; by conc;
var total;
run;
ODS OUTPUT CLOSE;
proc print data=data_quant;
run;
               Var
Obs    conc    Name     Quantile  Estimate
  1       0    total    100% Max      36.0
  2       0    total    99%           36.0
  3       0    total    95%           36.0
  4       0    total    90%           35.0
  5       0    total    75% Q3        34.0
  6       0    total    50% Median    32.5
  7       0    total    25% Q1        30.0
  8       0    total    10%           25.5
  9       0    total    5%            24.0
 10       0    total    1%            24.0
 11       0    total    0% Min        24.0
...... edited output ......
 45     310    total    100% Max      15.0
 46     310    total    99%           15.0
 47     310    total    95%           15.0
 48     310    total    90%           11.0
 49     310    total    75% Q3         6.0
 50     310    total    50% Median     6.0
 51     310    total    25% Q1         5.0
 52     310    total    10%            2.0
 53     310    total    5%             0.0
 54     310    total    1%             0.0
 55     310    total    0% Min         0.0
/* can also create multiple data sets */
ODS OUTPUT Quantiles(MATCH_ALL=conc_name_macro)=data_quant;
proc univariate data=d1; by conc;
var total;
run;
ODS OUTPUT CLOSE;
proc print data=data_quant;
run;
from the SAS LOG file
NOTE: The data set WORK.DATA_QUANT has 11 observations and 4 variables.
NOTE: The above message was for the following by-group:
Nitrofen concentration=0
NOTE: The data set WORK.DATA_QUANT1 has 11 observations and 4 variables.
NOTE: The above message was for the following by-group:
Nitrofen concentration=80
NOTE: The data set WORK.DATA_QUANT2 has 11 observations and 4 variables.
NOTE: The above message was for the following by-group:
Nitrofen concentration=160
NOTE: The data set WORK.DATA_QUANT3 has 11 observations and 4 variables.
NOTE: The above message was for the following by-group:
Nitrofen concentration=235
NOTE: The data set WORK.DATA_QUANT4 has 11 observations and 4 variables.
NOTE: The above message was for the following by-group:
Nitrofen concentration=310
/* write the data set names to the SAS LOG */
%put The conc_name_macro variable contains the following data sets &conc_name_macro;
76 %put The conc_name_macro variable contains the following data sets &conc_name_macro;
The conc_name_macro variable contains the following data sets DATA_QUANT DATA_QUANT1 DATA_QUANT2
DATA_QUANT3 DATA_QUANT4
/* merge the concentration summary files to create single table */
data c0; set DATA_QUANT;
rename Estimate=C0_Est; key=_n_; drop VarName conc;
data c80; set DATA_QUANT1;
rename Estimate=C80_Est; key=_n_; drop VarName conc;
data c160; set DATA_QUANT2;
rename Estimate=C160_Est; key=_n_; drop VarName conc;
data c235; set DATA_QUANT3;
rename Estimate=C235_Est; key=_n_; drop VarName conc;
data c310; set DATA_QUANT4;
rename Estimate=C310_Est; key=_n_; drop VarName conc;
data all;
merge c0 c80 c160 c235 c310; by key;
drop key;
proc print data=all;
run;
Obs    Quantile      C0_Est   C80_Est  C160_Est  C235_Est  C310_Est
  1    100% Max        36.0      36.0      31.0      27.0        15
  2    99%             36.0      36.0      31.0      27.0        15
  3    95%             36.0      36.0      31.0      27.0        15
  4    90%             35.0      35.5      30.5      25.0        11
  5    75% Q3          34.0      33.0      30.0      21.0         6
  6    50% Median      32.5      32.5      29.0      16.5         6
  7    25% Q1          30.0      29.0      27.0      13.0         5
  8    10%             25.5      26.5      24.5       9.5         2
  9    5%              24.0      26.0      23.0       7.0         0
 10    1%              24.0      26.0      23.0       7.0         0
 11    0% Min          24.0      26.0      23.0       7.0         0
/* extract the rows-observations corresponding to the 5 number summary */
data fivenum; set all;
if _n_=1 or _n_=5 or _n_=6 or _n_=7 or _n_=11;
proc print;
run;
Obs    Quantile      C0_Est   C80_Est  C160_Est  C235_Est  C310_Est
  1    100% Max        36.0      36.0      31.0      27.0        15
  2    75% Q3          34.0      33.0      30.0      21.0         6
  3    50% Median      32.5      32.5      29.0      16.5         6
  4    25% Q1          30.0      29.0      27.0      13.0         5
  5    0% Min          24.0      26.0      23.0       7.0         0
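The one-data-step-per-group merge above grows with the number of BY groups. A possible alternative, sketched here under stated assumptions, is to keep the single stacked Quantiles data set and reshape it with PROC TRANSPOSE. Because ch2-dat.txt may not be at hand, the sketch uses SASHELP.CLASS with SEX as a stand-in for CONC; all data set names (class_sorted, q_all, q_keyed, q_wide, fivenum_sketch) and the Height_ prefix are hypothetical.

```sas
/* Sketch only: SASHELP.CLASS stands in for d1, and sex stands in for conc */
proc sort data=sashelp.class out=class_sorted;
   by sex;
run;

ods exclude all;                        /* suppress printed output; ODS OUTPUT still captures */
ods output Quantiles=q_all;             /* one stacked data set, all BY groups */
proc univariate data=class_sorted;
   by sex;
   var height;
run;
ods exclude none;

/* number the quantile rows within each BY group so the row order survives sorting */
data q_keyed;
   set q_all;
   by sex;
   if first.sex then key=0;
   key+1;
run;

proc sort data=q_keyed;
   by key Quantile;
run;

/* one row per quantile, one column per BY group (Height_F, Height_M) */
proc transpose data=q_keyed out=q_wide(drop=_name_ key) prefix=Height_;
   by key Quantile;
   id sex;
   var Estimate;
run;

/* keep the five-number summary by label rather than by row position */
data fivenum_sketch;
   set q_wide;
   where scan(Quantile, 2, ' ') in ('Max', 'Q3', 'Median', 'Q1', 'Min');
run;

proc print data=fivenum_sketch;
run;
```

Selecting rows by the second word of the Quantile label is a design choice for this sketch; it avoids depending on the row order of the Quantiles object, which the _n_-based subsetting above relies on.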
Using ODS OUTPUT to create dataset in a simulation
/*
Extracting coefficients from simple linear
regression simulation
*/
options formdlim="-" nodate;
/* generate simulation data sets Y ~ N(mu(x)= 3+2x, sigma=2) */
data sims;
do dataset=1 to 1000;
do x=1 to 10;
y = 3 + 2*x + 2*rannor(0);
output;
end;
end;
/* DEBUG: print to check generated data */
proc print data=sims;
run;
/* SORT for data set */
proc sort data=sims; by dataset;
run;
/* USE OUTEST to extract the estimated coefficients */
proc reg data=sims outest=myparms; by dataset;
model y=x;
run;
proc print data=myparms;
run;
/* HISTOGRAM for estimated slope */
proc gchart data=work.myparms;
vbar x;
run;
/* Re-do this with ODS */
*ods trace on; * determine what output objects are constructed;
ods output ParameterEstimates=reg_coefs;
proc reg data=sims; by dataset;
model y=x;
run;
proc print data=reg_coefs;
run;
ods output close;
*ods trace off;
proc print data=reg_coefs;
run;
proc contents data=reg_coefs;
run;
data slopes; set reg_coefs;
if Variable="x";
slope=Estimate;
keep dataset slope;
data intercepts; set reg_coefs;
if Variable="Intercept";
Intercept = Estimate;
keep dataset intercept;
data both; merge slopes intercepts; by dataset;
proc gplot data=both;
title "Plot of estimated slope vs. estimated intercept";
plot slope*intercept;
run;
proc gchart data=both;
title "Sampling distribution of the estimated slope";
vbar slope;
run;
proc gchart data=both;
title "Sampling distribution of the estimated intercept";
vbar intercept;
run;
proc print data=slopes;
run;
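With 1000 BY groups, the printed regression output is rarely wanted. The following sketch, an assumption about how one might tidy the simulation rather than part of the original program, suppresses the printed output while ODS OUTPUT still captures the estimates, then summarizes the slopes numerically. The data set names (sims_sketch, coefs_sketch, slopes_sketch) and the seed 402 are arbitrary choices. For this design (x = 1, ..., 10 and sigma = 2) the slope estimates should average close to 2, with a standard deviation near 2/sqrt(82.5), about 0.22.

```sas
/* Sketch only: names and the seed are hypothetical choices */
data sims_sketch;
   do dataset=1 to 1000;
      do x=1 to 10;
         y = 3 + 2*x + 2*rannor(402);   /* fixed positive seed for reproducibility */
         output;
      end;
   end;
run;

ods exclude all;                         /* nothing is printed for the 1000 fits */
ods output ParameterEstimates=coefs_sketch;
proc reg data=sims_sketch;
   by dataset;                           /* data were generated in dataset order */
   model y=x;
run;
quit;
ods exclude none;                        /* restore normal output */

data slopes_sketch;
   set coefs_sketch;
   where Variable="x";
   slope=Estimate;
   keep dataset slope;
run;

/* numerical summary of the simulated sampling distribution of the slope */
proc means data=slopes_sketch n mean std;
   var slope;
run;
```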
PROC TABULATE (producing fancier results tables in SAS)
PROC TABULATE <option(s)>;
CLASS variable(s) </ options>; * identify classification (grouping) vars, numeric or character;
FREQ variable; * identify variable containing frequency of observation;
TABLE <<page-expression,>
row-expression,>
column-expression</ table-option(s)>;
VAR analysis-variable(s)</ options>; * identify analysis vars;
WEIGHT variable; * identify variable name – e.g. sampling wts;
* FORMATTING related subcommands …
CLASSLEV variable(s) / STYLE=<style-element-name | PARENT> <[style-attribute-specification(s)] >;
KEYLABEL keyword-1='description-1'
<...keyword-n='description-n'>;
KEYWORD keyword(s) / STYLE=<style-element-name | PARENT> <[style-attribute-specification(s)] >;
[* check out results of search for “Tabulate syntax” on www.muohio.edu/quantapps SAS doc]
Comments:
* concatenation (blank) operator
* crossing (*) operator
* format modifiers
* grouping elements (parentheses) operator
* ALL class variable
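The operators listed above can be seen together in one small TABLE statement. This is a sketch using the SASHELP.CLASS sample data set (an assumption; it is not one of the course data sets), and the KEYLABEL text is arbitrary.

```sas
/* Sketch only: SASHELP.CLASS is a stand-in data set */
proc tabulate data=sashelp.class;
   class sex;
   var height weight;
   table sex all,                          /* concatenation: one row per sex, then an ALL row */
         (height weight)*(n mean*f=6.1);   /* grouping (), crossing *, format modifier f=    */
   keylabel all="All students";            /* relabel the ALL row */
run;
```

The comma separates the row dimension from the column dimension; each analysis variable in the parentheses is crossed with each statistic, giving four columns.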
data d1;
infile 'M:\public.www\classes\sta402\SAS-programs\ch2-dat.txt' firstobs=16 expandtabs missover pad ;
input @9 animal 2.
@17 conc 3.
@25 brood1 2.
@33 brood2 2.
@41 brood3 2.
@49 total 2.;
proc tabulate data=d1;
class conc;
var brood1 brood2 brood3 total;
table (brood1 brood2 brood3 total)*conc, min q1 median q3 max;
run;
proc tabulate data=d1;
class conc;
var total;
table conc="Nitrofen Concentration" all, total*(mean var);
run;
Week 04+/- [12+ Sept.] Class Activities
AN INTRODUCTION TO STATISTICAL MODELING
* PROC REG for linear modeling (a very basic introduction)
* PROC GLM for anova models
Other normal response modeling
ANOVA – balanced anova models
Non-normal response modeling
GENMOD – generalized linear models
LOGISTIC – [grouped] binary regression
PROBIT – [grouped] binary regression (INVERSECL)
CATMOD – categorical data modeling
Failure time modeling
LIFEREG – accelerated failure time models
PHREG – Cox’s PH model
And more …
REGRESSION using PROC REG
Basic Model:
Yi = b0 + b1 Xi + ei [“simple linear regression”]
Yi = b0 + b1 Xi1 + b2 Xi2 + b3 Xi3 + b4 Xi4 + b5 Xi5 + ei [“multiple linear regression”]
Error Assumption:
ei ~ indep. N(0, sigma^2)
i=1,2,…,n [observations]
/*
example sas program that does simple linear regression
*/
options ls=75;
data example1;
input year nboats manatees;
cards;
77 447 13
78 460 21
79 481 24
80 498 16
81 513 24
82 512 20
83 526 15
84 559 34
85 585 33
86 614 33
87 645 39
88 675 43
89 711 50
90 719 47
;
/*
WARNING: ODS RTF will place TITLE information along
with SAS date/time/page number as part of a header in
the RTF document. Check out Print Preview or view the
header.
*/
ODS RTF file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\linreg-output.rtf';
proc reg;
title 'Number of Manatees killed regressed on the number of boats registered in Florida';
model manatees = nboats / p r cli clm;
plot manatees*nboats="o" p.*nboats="+" / overlay;
plot r.*nboats r.*p.;
run;
ODS RTF CLOSE;
Analysis of Variance
                           Sum of        Mean
Source             DF     Squares      Square   F Value   Pr > F
Model               1  1711.97866  1711.97866     93.61   <.0001
Error              12   219.44991    18.28749
Corrected Total    13  1931.42857

Root MSE          4.27639    R-Square   0.8864
Dependent Mean   29.42857    Adj R-Sq   0.8769
Coeff Var        14.53141
Parameter Estimates
                    Parameter     Standard
Variable     DF      Estimate        Error   t Value   Pr > |t|
Intercept     1     -41.43044      7.41222     -5.59     0.0001
nboats        1       0.12486      0.01290      9.68     <.0001
Output Statistics
      Dep Var Predicted    Std Error                                                   Std Error   Student
Obs  manatees     Value Mean Predict       95% CL Mean        95% CL Predict  Residual  Residual  Residual
  1   13.0000   14.3827       1.9299   10.1779   18.5876    4.1604   24.6050   -1.3827     3.816    -0.362
  2   21.0000   16.0059       1.7974   12.0896   19.9222    5.8989   26.1130    4.9941     3.880     1.287
  3   24.0000   18.6280       1.5976   15.1472   22.1089    8.6816   28.5745    5.3720     3.967     1.354
  4   16.0000   20.7507       1.4528   17.5853   23.9161   10.9102   30.5911   -4.7507     4.022    -1.181
  5   24.0000   22.6236       1.3420   19.6997   25.5475   12.8582   32.3891    1.3764     4.060     0.339
  6   20.0000   22.4987       1.3488   19.5600   25.4375   12.7288   32.2687   -2.4987     4.058    -0.616
  7   15.0000   24.2468       1.2622   21.4968   26.9968   14.5320   33.9616   -9.2468     4.086    -2.263
  8   34.0000   28.3672       1.1482   25.8656   30.8689   18.7198   38.0147    5.6328     4.119     1.367
  9   33.0000   31.6137       1.1650   29.0753   34.1520   21.9566   41.2707    1.3863     4.115     0.337
 10   33.0000   35.2346       1.2909   32.4221   38.0472   25.5019   44.9673   -2.2346     4.077    -0.548
 11   39.0000   39.1054       1.5187   35.7963   42.4144   29.2178   48.9929   -0.1054     3.998   -0.0264
 12   43.0000   42.8512       1.7974   38.9349   46.7675   32.7442   52.9582    0.1488     3.880    0.0383
 13   50.0000   47.3462       2.1762   42.6048   52.0877   36.8917   57.8007    2.6538     3.681     0.721
 14   47.0000   48.3451       2.2647   43.4109   53.2794   37.8018   58.8884   -1.3451     3.628    -0.371
Output Statistics
                         Cook's
Obs     -2-1 0 1 2            D
  1   |      |      |     0.017
  2   |      |**    |     0.178
  3   |      |**    |     0.149
  4   |    **|      |     0.091
  5   |      |      |     0.006
  6   |     *|      |     0.021
  7   |  ****|      |     0.244
  8   |      |**    |     0.073
  9   |      |      |     0.005
 10   |     *|      |     0.015
 11   |      |      |     0.000
 12   |      |      |     0.000
 13   |      |*     |     0.091
 14   |      |      |     0.027
Sum of Residuals                          0
Sum of Squared Residuals          219.44991
Predicted Residual SS (PRESS)     281.76275
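The pieces of the PROC REG output are tied together by simple arithmetic, which can be checked in a DATA step. This is a sketch: every number is copied from the output above, and the variable names are hypothetical. Because the printed coefficients are rounded, the predicted value for observation 1 (nboats=447) should agree with the listed 14.3827 only to about two decimal places; the other three quantities should match the printed R-Square, Root MSE, and F Value up to rounding.

```sas
/* Sketch only: values are copied from the regression output above */
data _null_;
   b0 = -41.43044;                       /* Intercept estimate */
   b1 = 0.12486;                         /* nboats estimate    */
   pred1   = b0 + b1*447;                /* fitted value for obs 1 */
   rsq     = 1711.97866 / 1931.42857;    /* Model SS / Corrected Total SS */
   rootmse = sqrt(18.28749);             /* square root of Error Mean Square */
   fvalue  = 1711.97866 / 18.28749;      /* Model MS / Error MS */
   put pred1= 8.4 rsq= 6.4 rootmse= 8.5 fvalue= 8.2;   /* results go to the SAS log */
run;
```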
Multiple Regression with indicator variables