90-776 Manipulation of Large Data Sets
Homework 2 Solutions
1) Program
/*U:\CLASS\90776\PROGRAMS\HW2P1.SAS performs the tasks in hw 2, problem 1*/
/* by Rob Greenbaum*/
/* date 3/14/1999*/
/* last modified 3/23/1999*/
/* The program creates a permanent SAS data set containing the cloud seeding data : u:\class\90776\data\cloudhw2.sd2 */
options pageno=1;
libname classdat 'u:\class\90776\data\';
/* a. Bring the cloud seeding SAS data set into a permanent SAS data set saved as u:\class\90776\data\cloudhw2.sd2 */
DATA classdat.cloudhw2;
SET classdat.cloud;
/*b. create target*/
target = te+tw;
/*c. create control*/
control = NC+SC+NWC;
/*d. drop variables */
DROP TE TW NC SC NWC;
/* e. label variables*/
LABEL period ='Period number'
seeded ='S=seeded, U=unseeded'
season ='Time of the year'
target ='Target areas (TE+TW)'
control ='Control areas (NC+SC+NWC';
RUN;
/* f. display contents*/
PROC CONTENTS data= classdat.cloudhw2;
RUN;
/* g. means broken down by seeded and unseeded */
PROC MEANS n mean std data = classdat.cloudhw2 nway;
class seeded;
var control target;
/* create an output data set */
OUTPUT OUT=meanseed
n = n_cont n_target
mean=m_cont m_target
std=s_cont s_target;
RUN;
/* h. print out the output data set */
PROC print data=meanseed;
RUN;
/* i. histograms broken down by seeded and unseeded*/
PROC CHART data = classdat.cloudhw2;
VBAR control target / group=seeded;
run;
/* j. plot the control data against the target data */
proc plot data=classdat.cloudhw2;
plot control*target=seeded; /*=seeded causes SAS to display 'S' and 'U'*/
run;
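Steps b and c use the + operator, which returns a missing value whenever any operand is missing. The PROC MEANS output for this data set shows N=54 in each group of 54 observations, so that does not appear to be a problem here. As a sketch only, the following check counts missing area values and shows the SUM function, which ignores missing arguments. The data set name cloudchk and the variable n_miss are hypothetical names chosen for illustration, and the classdat libname from the program above is assumed to be assigned.

```sas
/* Sketch only: cloudchk and n_miss are hypothetical names */
data cloudchk;
  set classdat.cloud;
  /* SUM ignores missing arguments; + would return missing instead */
  target  = sum(te, tw);
  control = sum(nc, sc, nwc);
  /* number of missing area values in this observation */
  n_miss  = nmiss(te, tw, nc, sc, nwc);
run;

/* a maximum of 0 means no area value is missing in any observation */
proc means data=cloudchk n max;
  var n_miss;
run;
```

If the maximum of n_miss is 0, both approaches give the same TARGET and CONTROL values.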
2) Program
/*U:\CLASS\90776\PROGRAMS\HW2P2.SAS performs the tasks in hw 2, problem 2*/
/* by Rob Greenbaum*/
/* date 3/23/1999*/
/* The program uses the permanent SAS data set containing the cloud seeding data that was created in problem 1: u:\class\90776\data\cloudhw2.sd2 */
options pageno=1;
libname classdat 'u:\class\90776\data\';
/* let's check what's in the data set*/
PROC contents data=classdat.cloudhw2;
run;
/* let's find the statistics for numeric variables */
PROC means data=classdat.cloudhw2 n mean min max maxdec=2;
VAR period control target;
run;
/* let's find the frequency distributions for the categorical variables*/
/* First, I need to format the SEEDED variable*/
PROC format;
value $seedfmt 'S'='Seeded' 'U' = 'Unseeded';
run;
PROC freq data = classdat.cloudhw2;
TABLES season seeded;
FORMAT seeded $seedfmt.; /* this attaches the format to the seeded variable*/
run;
1) Output
The SAS System 09:52 Tuesday, March 23, 1999 1
CONTENTS PROCEDURE
Data Set Name: CLASSDAT.CLOUDHW2                 Observations:         108
Member Type:   DATA                              Variables:            5
Engine:        V612                              Indexes:              0
Created:       10:44 Tuesday, March 23, 1999     Observation Length:   40
Last Modified: 10:44 Tuesday, March 23, 1999     Deleted Observations: 0
Protection:                                      Compressed:           NO
Data Set Type:                                   Sorted:               NO
Label:
-----Engine/Host Dependent Information-----
Data Set Page Size: 8192
Number of Data Set Pages: 1
File Format: 607
First Data Page: 1
Max Obs per Page: 203
Obs in First Data Page: 108
-----Alphabetic List of Variables and Attributes-----
#    Variable    Type    Len    Pos    Label
---------------------------------------------------------------
5    CONTROL     Num       8     32    Control areas (NC+SC+NWC
1    PERIOD      Num       8      0    Period number
3    SEASON      Char      8     16    Time of the year
2    SEEDED      Char      8      8    S=seeded, U=unseeded
4    TARGET      Num       8     24    Target areas (TE+TW)
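SEEDED  N Obs  Variable  Label                       N       Mean    Std Dev
-----------------------------------------------------------------------------
S          54  CONTROL   Control areas (NC+SC+NWC   54  4.5051852  3.3630207
               TARGET    Target areas (TE+TW)       54  3.1518519  2.4138508

U          54  CONTROL   Control areas (NC+SC+NWC   54  5.2975926  3.8626293
               TARGET    Target areas (TE+TW)       54  3.4061111  2.5053004
-----------------------------------------------------------------------------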
Without a formal test, mean rainfall appears to be greater in the unseeded periods. For the control areas, the mean rainfall was 4.5 inches per period in the seeded periods and 5.3 inches per period in the unseeded periods. Likewise, for the target areas, the mean rainfall was 3.2 inches per period in the seeded periods and 3.4 inches per period in the unseeded periods. Note that the mean is greater for the control areas because CONTROL sums three areas (NC, SC, NWC), whereas TARGET sums only two (TE and TW).
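The comparison above is informal. One possible way to test it formally is a two-sample t test with PROC TTEST. This is only a sketch: it is not part of the homework, and it is not the analysis used in the paper cited in the codebook, which relied on regression techniques and permutation tests. It assumes the classdat libname is assigned as in the programs above.

```sas
/* Sketch only: two-sample t tests comparing seeded (S) and unseeded (U) */
libname classdat 'u:\class\90776\data\';

proc ttest data=classdat.cloudhw2;
  class seeded;          /* grouping variable with two levels, S and U */
  var control target;    /* one test is reported for each variable */
run;
```

PROC TTEST reports results under both equal and unequal variance assumptions, so the standard deviations in the table above are worth comparing before choosing which line to read.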
OBS  SEEDED  _TYPE_  _FREQ_  N_CONT  N_TARGET   M_CONT  M_TARGET   S_CONT  S_TARGET
  1    S        1      54      54       54     4.50519   3.15185  3.36302   2.41385
  2    U        1      54      54       54     5.29759   3.40611  3.86263   2.50530
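Because CONTROL sums three areas and TARGET sums two, the means in MEANSEED are not directly comparable. As a rough sketch, the step below divides each mean by its number of areas. The data set name perarea and the variables pa_cont and pa_targ are hypothetical names chosen for illustration, and the step must run in the same SAS session as program 1, since MEANSEED is a temporary data set.

```sas
/* Sketch only: perarea, pa_cont and pa_targ are hypothetical names */
data perarea;
  set meanseed;
  pa_cont = m_cont / 3;    /* CONTROL = NC+SC+NWC, three areas */
  pa_targ = m_target / 2;  /* TARGET  = TE+TW, two areas */
run;

proc print data=perarea;
  var seeded pa_cont pa_targ;
run;
```

From the means printed above, this works out to roughly 1.50 (control) and 1.58 (target) inches per area for the seeded periods, and roughly 1.77 and 1.70 for the unseeded periods.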
[Vertical bar chart of Frequency by CONTROL Midpoint (1.5, 4.5, 7.5, 10.5, 13.5, 16.5, 19.5), grouped by SEEDED (S, U); the tallest bar has frequency 22.]
[Vertical bar chart of Frequency by TARGET Midpoint (0, 2, 4, 6, 8, 10, 12), grouped by SEEDED (S, U); the tallest bar has frequency 23.]
Plot of CONTROL*TARGET. Symbol is value of SEEDED.
[Scatter plot: vertical axis "Control areas (NC+SC+NWC", 0 to 25; horizontal axis "Target areas (TE+TW)", 0 to 14; each point is plotted as S or U.]
NOTE: 23 obs hidden.
2) Output
Codebook for Cloud Seeding Data
The SAS cloud seeding data set is saved as u:\class\90776\data\cloudhw2.sd2
These data are those collected in a cloud-seeding experiment in Tasmania between mid-1964 and January 1971. Their analysis, using regression techniques and permutation tests, is discussed in:
Miller, A.J., Shaw, D.E., Veitch, L.G. & Smith, E.J. (1979).
`Analyzing the results of a cloud-seeding experiment in Tasmania',
Communications in Statistics - Theory & Methods, vol.A8(10), 1017-1047.
The data can be found at the web site:
PERIOD Period number
This variable gives the period number of each observation.
Observations / Mean / Minimum / Maximum
108 / 100.57 / 1.00 / 194.00
CONTROL Control areas (NC+SC+NWC
This variable measures the total rainfall for the control areas (north, south, and northwest).
Observations / Mean / Minimum / Maximum
108 / 4.90 / 0.08 / 20.86
TARGET Target areas (TE+TW)
This variable measures the total rainfall for the target areas (east and west).
Observations / Mean / Minimum / Maximum
108 / 3.28 / 0.10 / 12.15
SEASON Time of the year
This variable indicates the season of the year.
Response Category / Count
AUTUMN / 24
SPRING / 32
SUMMER / 24
WINTER / 28
SEEDED S=seeded, U=unseeded
This variable indicates whether the clouds were seeded or unseeded.
Response Category / Count
Seeded / 54
Unseeded / 54