90-776 Manipulation of Large Data Sets

Homework 2 Solutions

1) Program

/*U:\CLASS\90776\PROGRAMS\HW2P1.SAS performs the tasks in hw 2, problem 1*/
/* by Rob Greenbaum*/
/* date 3/14/1999*/
/* last modified 3/23/1999*/
/* The program creates a permanent SAS data set containing the cloud seeding data: u:\class\90776\data\cloudhw2.sd2 */

options pageno=1;

libname classdat 'u:\class\90776\data\';

/* a. Bring the cloud seeding SAS data set into a permanent SAS data set saved as u:\class\90776\data\cloudhw2.sd2 */

DATA classdat.cloudhw2;

SET classdat.cloud;

/*b. create target*/

target = te+tw;

/*c. create control*/

control = NC+SC+NWC;

/* d. drop variables */

DROP TE TW NC SC NWC;

/* e. label variables*/

LABEL period ='Period number'

seeded ='S=seeded, U=unseeded'

season ='Time of the year'

target ='Target areas (TE+TW)'

control ='Control areas (NC+SC+NWC)';

RUN;

/* f. display contents*/

PROC CONTENTS data= classdat.cloudhw2;

RUN;

/* g. means broken down by seeded and unseeded */

PROC MEANS n mean std data = classdat.cloudhw2 nway;

class seeded;

var control target;

/* create an output data set */

OUTPUT OUT=meanseed

n = n_cont n_target

mean=m_cont m_target

std=s_cont s_target;

RUN;

/* h. print out the output data set */

PROC print data=meanseed;

RUN;

/* i. histograms broken down by seeded and unseeded*/

PROC CHART data = classdat.cloudhw2;

VBAR control target / group=seeded;

run;

/* j. plot the control data against the target data */

proc plot data=classdat.cloudhw2;

plot control*target=seeded; /*=seeded causes SAS to display 'S' and 'U'*/

run;

2) Program

/*U:\CLASS\90776\PROGRAMS\HW2P2.SAS performs the tasks in hw 2, problem 2*/
/* by Rob Greenbaum*/

/* date 3/23/1999*/

/* The program uses the permanent SAS data set containing the cloud seeding data that was created in problem 1: u:\class\90776\data\cloudhw2.sd2 */

options pageno=1;

libname classdat 'u:\class\90776\data\';

/* let's check what's in the data set*/

PROC contents data=classdat.cloudhw2;

run;

/* let's find the statistics for numeric variables */

PROC means data=classdat.cloudhw2 n mean min max maxdec=2;

VAR period control target;

RUN;

/* let's find the frequency distributions for the categorical variables*/

/* First, I need to format the SEEDED variable*/

PROC format;

value $seedfmt 'S'='Seeded' 'U' = 'Unseeded';

run;

PROC freq data = classdat.cloudhw2;

TABLES season seeded;

FORMAT seeded $seedfmt.; /* this attaches the format to the seeded variable*/

run;

1) Output

The SAS System 09:52 Tuesday, March 23, 1999 1
CONTENTS PROCEDURE
Data Set Name: CLASSDAT.CLOUDHW2                Observations:          108
Member Type:   DATA                             Variables:             5
Engine:        V612                             Indexes:               0
Created:       10:44 Tuesday, March 23, 1999    Observation Length:    40
Last Modified: 10:44 Tuesday, March 23, 1999    Deleted Observations:  0
Protection:                                     Compressed:            NO
Data Set Type:                                  Sorted:                NO
Label:

-----Engine/Host Dependent Information-----

Data Set Page Size: 8192

Number of Data Set Pages: 1

File Format: 607

First Data Page: 1

Max Obs per Page: 203

Obs in First Data Page: 108

-----Alphabetic List of Variables and Attributes-----

 #  Variable  Type  Len  Pos  Label
 -----------------------------------------------------
 5  CONTROL   Num     8   32  Control areas (NC+SC+NWC
 1  PERIOD    Num     8    0  Period number
 3  SEASON    Char    8   16  Time of the year
 2  SEEDED    Char    8    8  S=seeded, U=unseeded
 4  TARGET    Num     8   24  Target areas (TE+TW)


SEEDED  N Obs  Variable  Label                      N  Mean       Std Dev
--------------------------------------------------------------------------
S          54  CONTROL   Control areas (NC+SC+NWC  54  4.5051852  3.3630207
               TARGET    Target areas (TE+TW)      54  3.1518519  2.4138508

U          54  CONTROL   Control areas (NC+SC+NWC  54  5.2975926  3.8626293
               TARGET    Target areas (TE+TW)      54  3.4061111  2.5053004
--------------------------------------------------------------------------

Without a formal test, rainfall appears to be somewhat greater in the unseeded periods. For the control areas, mean rainfall was 4.5 inches in the seeded periods and 5.3 inches in the unseeded periods. Likewise, for the target areas, mean rainfall was 3.2 inches in the seeded periods and 3.4 inches in the unseeded periods. Note that the control means are larger than the target means because CONTROL sums rainfall over three areas (NC, SC, NWC), while TARGET sums only two (TE and TW).
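Because CONTROL and TARGET add up different numbers of areas, one way to see the last point is to divide each group mean by the number of areas it combines. The sketch below does this in a DATA step. It assumes the group means are typed in by hand from the PROC MEANS output (in practice one could SET the MEANSEED data set from part g and use M_CONT and M_TARGET directly); the data set PER_AREA and the variables CONT_PER_AREA and TARGET_PER_AREA are hypothetical names chosen for illustration.

```sas
/* Sketch (not part of the assignment): per-area group means.           */
/* Assumption: the means are typed in from the PROC MEANS output above; */
/* PER_AREA, CONT_PER_AREA and TARGET_PER_AREA are hypothetical names.  */
data per_area;
  input seeded $ m_cont m_target;
  cont_per_area   = m_cont / 3;    /* CONTROL = NC+SC+NWC, three areas */
  target_per_area = m_target / 2;  /* TARGET  = TE+TW, two areas       */
  datalines;
S 4.5051852 3.1518519
U 5.2975926 3.4061111
;
run;

proc print data=per_area noobs;
  format cont_per_area target_per_area 6.3;
run;
```

With these values the per-area means are about 1.50 (control) and 1.58 (target) for the seeded periods, and 1.77 and 1.70 for the unseeded periods, so once the number of areas is taken into account the control and target means are close.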

OBS  SEEDED  _TYPE_  _FREQ_  N_CONT  N_TARGET  M_CONT   M_TARGET  S_CONT   S_TARGET

  1    S       1       54      54       54     4.50519  3.15185   3.36302  2.41385
  2    U       1       54      54       54     5.29759  3.40611   3.86263  2.50530


[Vertical bar chart (PROC CHART): Frequency of CONTROL at midpoints 1.5, 4.5, 7.5, 10.5, 13.5, 16.5 and 19.5, shown separately for SEEDED = S and SEEDED = U.]


[Vertical bar chart (PROC CHART): Frequency of TARGET at midpoints 0, 2, 4, 6, 8, 10 and 12, shown separately for SEEDED = S and SEEDED = U.]


Plot of CONTROL*TARGET. Symbol is value of SEEDED.

[Scatter plot (PROC PLOT): Control areas (NC+SC+NWC), 0 to 25, on the vertical axis against Target areas (TE+TW), 0 to 14, on the horizontal axis; each point is plotted as S or U.]

NOTE: 23 obs hidden.

2) Output

Codebook for Cloud Seeding Data

The SAS cloud seeding data set is saved as u:\class\90776\data\cloudhw2.sd2

These data are those collected in a cloud-seeding experiment in Tasmania between mid-1964 and January 1971. Their analysis, using regression techniques and permutation tests, is discussed in:

Miller, A.J., Shaw, D.E., Veitch, L.G. & Smith, E.J. (1979). "Analyzing the results of a cloud-seeding experiment in Tasmania." Communications in Statistics - Theory & Methods, vol. A8(10), 1017-1047.

The data can be found at the web site:

PERIOD Period number

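This variable gives the period number of each observation. It is not simply the observation number: its values run from 1 to 194 across only 108 observations.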

Observations / Mean / Minimum / Maximum
108 / 100.57 / 1.00 / 194.00

CONTROL Control areas (NC+SC+NWC)

This variable measures the total rainfall in the three control areas (north, south, and northwest), computed as NC+SC+NWC.

Observations / Mean / Minimum / Maximum
108 / 4.90 / 0.08 / 20.86

TARGET Target areas (TE+TW)

This variable measures the total rainfall in the two target areas (east and west), computed as TE+TW.

Observations / Mean / Minimum / Maximum
108 / 3.28 / 0.10 / 12.15

SEASON Time of the year
This variable indicates the season of the year.

Response Category / Count

AUTUMN / 24
SPRING / 32
SUMMER / 24
WINTER / 28

SEEDED S=seeded, U=unseeded

This variable indicates whether the clouds were seeded or unseeded.

Response Category / Count
Seeded / 54
Unseeded / 54
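As a consistency check between the two problems, the overall CONTROL and TARGET means in this codebook can be recovered from the group means in the problem 1 output. With 54 seeded and 54 unseeded observations, the overall mean is the size-weighted average of the two group means, which here reduces to their simple average. This is a minimal sketch assuming the group means are typed in by hand; the data set OVERALL and its variables are hypothetical names.

```sas
/* Sketch (not part of the assignment): overall mean from group means.  */
/* Assumption: group sizes and means are typed in from the problem 1    */
/* PROC MEANS output; OVERALL and its variables are hypothetical names. */
data overall;
  input varname $ n_s mean_s n_u mean_u;
  /* weighted average of the two group means, weights = group sizes */
  overall_mean = (n_s*mean_s + n_u*mean_u) / (n_s + n_u);
  datalines;
CONTROL 54 4.5051852 54 5.2975926
TARGET  54 3.1518519 54 3.4061111
;
run;

proc print data=overall noobs;
  var varname overall_mean;
  format overall_mean 8.2;
run;
```

This gives 4.90 for CONTROL and 3.28 for TARGET, matching the codebook entries above.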