SAS: Descriptive Statistics (Dr. Fan)

Supplementary Materials

Based on the data in HTWT.txt

SAS Code:

DATA HTWT;
    INPUT GENDER $ HEIGHT WEIGHT COLLEGE $;
    DATALINES;
M 68.5 155 SCI
F 61.2 99 BSNS
F 63.0 115 BSNS
M 70.0 205 SCI
M 68.6 170 ARTS
F 65.1 125 BSNS
M 72.4 220 ARTS
M 69.5 188 SCI
;
RUN;
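The data step above types the records in with DATALINES. Since the data also live in HTWT.txt, a minimal sketch of reading the file directly is given here. It assumes the file sits in the current working directory, is space-delimited, has no header row, and lists the four fields in the same order as above; the path is a placeholder and none of these details are confirmed by the handout.

```sas
/* Sketch only: 'HTWT.txt' is a placeholder path (assumed to be in the  */
/* current directory), assumed space-delimited with no header row.      */
DATA HTWT;
    INFILE 'HTWT.txt';
    INPUT GENDER $ HEIGHT WEIGHT COLLEGE $;
RUN;

/* List the data set to check that all records were read as expected. */
PROC PRINT DATA=HTWT;
    TITLE "LISTING OF HTWT";
RUN;
```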

For Quantitative Variables: PROC UNIVARIATE

PROC UNIVARIATE gives an extensive set of statistics for quantitative variables, including normality tests (with the option NORMAL) and stemplots and boxplots (with the option PLOT). You can also get histograms, normal Q-Q plots, and normal probability plots with the HISTOGRAM, QQPLOT, and PROBPLOT statements, respectively.

SAS Syntax:

PROC UNIVARIATE DATA=dataname NORMAL PLOT;
    TITLE "the title you want for the output";
    VAR var1 var2 … vari;
    HISTOGRAM var1 var2 … varj;
    QQPLOT var1 var2 … vark;
    PROBPLOT var1 var2 … varl;
RUN;

Example:

We wish to get descriptive statistics including stemplots and boxplots for the variable height. Also verify whether the variable height follows a normal distribution.

SAS Code:

PROC UNIVARIATE DATA=HTWT NORMAL PLOT;
    TITLE "DESCRIPTIVE STATISTICS + PLOTS";
    VAR HEIGHT WEIGHT;
    HISTOGRAM HEIGHT / MIDPOINTS=60 TO 75 BY 5 NORMAL;
    INSET MEAN='Mean' (5.2)
          STD='Standard deviation' (5.2);
    QQPLOT HEIGHT;
    PROBPLOT HEIGHT;
RUN;

Exercise: Obtain descriptive statistics for weight. Is it reasonable to assume weight follows a normal distribution? Does the mean weight of the population significantly differ from 160 lb?
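One possible starting point for the exercise is sketched below. It relies on the MU0= option of the PROC UNIVARIATE statement, which sets the null value used in the "Tests for Location" table; the title text is made up for illustration. Interpreting the normality tests, the Q-Q plot, and the location tests is left to you.

```sas
/* Sketch: NORMAL requests the normality tests, PLOT the stemplot and    */
/* boxplot, and MU0=160 tests H0: mean weight = 160 (Tests for Location). */
PROC UNIVARIATE DATA=HTWT NORMAL PLOT MU0=160;
    TITLE "WEIGHT: NORMALITY CHECK AND TEST OF MEAN = 160";
    VAR WEIGHT;
    QQPLOT WEIGHT;
RUN;
```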

Example:

We wish to get descriptive statistics and a histogram for height, separately for each gender.

SAS Code:

PROC SORT DATA=HTWT;
    BY GENDER;
RUN;

/** NUMERICAL AND VISUAL SUMMARIES FOR QUANTITATIVE VAR'S **/
PROC UNIVARIATE DATA=HTWT NORMAL PLOT;
    TITLE "MORE DESCRIPTIVE STATISTICS + HISTOGRAMS";
    BY GENDER;
    VAR HEIGHT;
RUN;
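As an alternative to sorting and using BY, PROC UNIVARIATE also accepts a CLASS statement, which does not require sorted data. The sketch below is not part of the original example and its title is made up; with a HISTOGRAM statement it should produce one histogram panel per gender in a single comparative display.

```sas
/* Sketch: CLASS splits the analysis by GENDER without a prior PROC SORT. */
PROC UNIVARIATE DATA=HTWT NORMAL;
    TITLE "HEIGHT BY GENDER USING CLASS";
    CLASS GENDER;
    VAR HEIGHT;
    HISTOGRAM HEIGHT;   /* comparative histogram, one panel per gender */
RUN;
```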

For Categorical Variables: PROC FREQ and PROC GCHART

PROC FREQ outputs frequency tables for categorical variables.

PROC GCHART outputs bar charts and pie charts for categorical variables. For numeric variables, use the DISCRETE option to treat each distinct value as its own category. In pie charts, other options include: VALUE=INSIDE places the frequency count inside each slice; PERCENT=INSIDE places the percentage inside each slice; SLICE=OUTSIDE places the slice label outside the slice.

SAS Syntax:

PROC FREQ DATA=dataname;
    TABLES var1 var2 … vari;
RUN;

PROC GCHART DATA=dataname;
    VBAR var1 var2 … vari / DISCRETE;
    PIE var1 var2 … vari / DISCRETE;
RUN;

Example:

Summarize the variables college and gender.

SAS Code:

PROC FREQ DATA=HTWT;
    TITLE "FREQUENCY TABLE";
    TABLES GENDER COLLEGE;
RUN;

/** VISUAL SUMMARY FOR CATEGORICAL VAR'S **/
PROC GCHART DATA=HTWT;
    TITLE "BAR CHART";
    VBAR GENDER COLLEGE;
RUN;

PROC GCHART DATA=HTWT;
    TITLE "PIE CHART";
    PIE GENDER / VALUE=INSIDE
                 PERCENT=INSIDE
                 SLICE=OUTSIDE;
RUN;

Relationship among Variables:

  • 1 categorical + 1 quantitative variables

Example: Draw a plot to illustrate how the distribution of height differs between male and female students. Describe what you see.

/* Sorting the data to be used in the side by side boxplots */

proc sort data=htwt OUT=HTWT_SORTED;
    by gender;
run;

PROC BOXPLOT DATA=HTWT_SORTED;
    TITLE "SIDE-BY-SIDE BOXPLOT FOR GENDER AND HEIGHT";
    PLOT HEIGHT*GENDER; /* QUANTITATIVE*CATEGORICAL */
RUN;

  • 2 categorical variables

Example: Draw a plot to illustrate how the distribution of the colleges of male students differs from that of female students. Describe what you see.

PROC FREQ DATA=HTWT;
    TITLE "TWO-WAY TABLE";
    TABLES GENDER*COLLEGE;
RUN;

PROC GCHART DATA=HTWT;
    TITLE "SIDE-BY-SIDE BAR CHART FOR GENDER AND COLLEGE";
    VBAR COLLEGE / GROUP=GENDER;
RUN;

  • 2 quantitative variables

Example: Draw a plot to illustrate the relationship between height and weight. Describe what you see.

PROC GPLOT DATA=HTWT;
    TITLE "SCATTERPLOT OF WEIGHT VS. HEIGHT";
    PLOT WEIGHT*HEIGHT; /* Y*X */
RUN;

  • 1 categorical + 2 quantitative variables

Example: Draw a plot to illustrate the relationship between height and weight for different genders. Describe what you see.

PROC GPLOT DATA=HTWT;
    TITLE "SCATTERPLOT FOR WEIGHT BY HEIGHT WITH GENDER GROUP";
    PLOT WEIGHT*HEIGHT=GENDER;
RUN;

  • 2 categorical + 1 quantitative variables

Example: Draw a plot to illustrate how the distribution of height differs between male and female students within each college group. Describe what you see.

/* Sorting the data to be used in the side by side boxplots */

Proc Sort Data=HTWT;
    By COLLEGE GENDER;
run;

/* Printing the side by side boxplots of height by college and gender */
Proc Boxplot data=HTWT;
    Title "Boxplot of height by College and Gender";
    Plot Height*GENDER (COLLEGE);
run;
