Stat 519 Multivariate Analysis

Homework #2

R Graphics

The LOST DAYS data set is among the chapter 3 data sets on your textbook CD. It records lost work days per crew member due to injury (LostDPC), crew size (SIZE), foreman age (ForeAge), foreman experience in years (ForeExp), average crew member experience in years (AvgExp), and whether or not the crew customarily uses power tools (Power).

Import this file into R and use it to answer the following questions. Turn in your R input and output, including the requested plots, along with your discussion of the results.
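A minimal import sketch follows. It assumes the CD file has been saved as a comma-separated file with a header row that uses the variable names above; the CD's actual file format and name may differ, and `load_lost_days` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: read the LOST DAYS data, assuming a CSV export whose
# header row uses the variable names given in the assignment text.
load_lost_days <- function(path) {
  lost <- read.csv(path, header = TRUE)
  needed <- c("LostDPC", "SIZE", "ForeAge", "ForeExp", "AvgExp", "Power")
  missing_cols <- setdiff(needed, names(lost))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Treat the power-tool indicator as a grouping factor for later plots
  lost$Power <- factor(lost$Power)
  lost
}

# The five numeric variables used in the questions below
numeric_vars <- c("LostDPC", "SIZE", "ForeAge", "ForeExp", "AvgExp")
```

Usage, with an assumed file name: `lost <- load_lost_days("lostdays.csv")`, then `str(lost)` to confirm the column types before plotting.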

1.  Examine histograms and kernel density estimates of LostDPC. In addition to the defaults, examine at least one additional choice of bin width (histogram) and bandwidth (kde). What do these plots tell you about the distribution of LostDPC?

2.  Examine a strip chart and a boxplot of LostDPC, broken down by Power. What do these charts tell you about the effect of the use of power tools on the distribution of lost days per capita?

3.  Examine scatterplots of LostDPC (y) against each of crew size, foreman age, foreman experience, and average crew experience (x). Discuss your results. Which independent variable appears to have the greatest influence on lost days per capita?

4.  Use the plot3d function from the rgl library to examine 3d scatterplots of this data. Your third variable (z) should be LostDPC, and your second (y) should be the influential variable you identified in problem 3. Try each of the remaining numeric variables as your first (x) variable. (Remember, you’ll need to use the rgl.snapshot command to save your rgl images to import into Word.) Discuss your results. Can you find any interaction effects (that is, behavior of x and y on z that is not obvious from just looking at x vs z and y vs z separately)?

5.  Construct a pairwise scatterplot matrix, a stars plot, and a parallel coordinates plot on the five numeric variables, using color to separate by Power. Examine these plots for additional interesting interactions, outliers, or other features of the data. Discuss your observations.

6.  Construct conditioning plots on some variables that you find interesting and illuminating, and discuss your observations.
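For question 1, one possible starting point is sketched below. It assumes the data are in a data frame `lost` with the column names above; `plot_lostdpc_distribution` is a hypothetical helper, and its `binwidth` (about 20 bins by default) and `adjust` (half the default bandwidth) values are arbitrary illustrative choices rather than recommendations.

```r
# Hypothetical helper for question 1. Assumes a data frame `lost` with a
# numeric column LostDPC. `binwidth` and `adjust` are illustrative values.
plot_lostdpc_distribution <- function(lost, binwidth = NULL, adjust = 0.5) {
  x <- lost$LostDPC
  # Alternative bin width: about 20 bins across the data range
  if (is.null(binwidth)) binwidth <- diff(range(x)) / 20
  # Breaks start at min(x) and end just past max(x), so all data are covered
  alt_breaks <- seq(min(x), max(x) + binwidth, by = binwidth)

  op <- par(mfrow = c(2, 2))
  on.exit(par(op))

  # Histograms: R's default (Sturges) breaks vs. an explicit bin width
  hist(x, main = "Default breaks", xlab = "LostDPC")
  hist(x, breaks = alt_breaks, xlab = "LostDPC",
       main = sprintf("Bin width = %.3g", binwidth))

  # Kernel density estimates: default bandwidth (bw.nrd0) vs. a rescaled one;
  # `adjust` multiplies the default bandwidth
  d_default <- density(x)
  d_alt <- density(x, adjust = adjust)
  plot(d_default, main = sprintf("Default bw = %.3g", d_default$bw))
  plot(d_alt, main = sprintf("adjust = %.2f, bw = %.3g", adjust, d_alt$bw))

  invisible(list(default = d_default, alternative = d_alt))
}
```

Trying both a smaller and a larger `adjust` (for example 0.5 and 2) shows how much of the apparent shape depends on the smoothing choice.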
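For question 2, a sketch using base R's formula interfaces to `stripchart` and `boxplot` is shown below, assuming `lost$Power` is a factor. The helper name `plot_by_power` is hypothetical; jittering is one reasonable choice for a small data set where points would otherwise overlap.

```r
# Hypothetical helper for question 2: LostDPC split by Power.
# Assumes `lost` has numeric LostDPC and a factor (or character) Power column.
plot_by_power <- function(lost) {
  op <- par(mfrow = c(1, 2))
  on.exit(par(op))
  # Jitter separates overlapping points within each Power group
  stripchart(LostDPC ~ Power, data = lost, method = "jitter", vertical = TRUE,
             pch = 1, xlab = "Power", ylab = "LostDPC", main = "Strip chart")
  # boxplot() returns its summary statistics (one column per group)
  b <- boxplot(LostDPC ~ Power, data = lost, xlab = "Power", ylab = "LostDPC",
               main = "Boxplot")
  invisible(b)
}
```

The returned `b$stats` matrix holds the whisker ends, hinges, and median for each group, which can be quoted directly in the discussion.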
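For question 3, the sketch below draws the four scatterplots with a lowess smooth and returns Pearson correlations as a rough numeric companion. The helper name is hypothetical, and correlation only measures linear association, so it should supplement, not replace, reading the plots when judging which variable matters most.

```r
# Hypothetical helper for question 3: LostDPC (y) against each predictor (x).
plot_lostdpc_vs_predictors <- function(lost,
    predictors = c("SIZE", "ForeAge", "ForeExp", "AvgExp")) {
  op <- par(mfrow = c(2, 2))
  on.exit(par(op))
  for (v in predictors) {
    plot(lost[[v]], lost$LostDPC, xlab = v, ylab = "LostDPC",
         main = paste("LostDPC vs", v))
    # lowess() sorts by x, so the smooth can be drawn directly with lines()
    lines(lowess(lost[[v]], lost$LostDPC), col = "red")
  }
  # Named vector of Pearson correlations (linear association only)
  sapply(predictors, function(v) cor(lost[[v]], lost$LostDPC))
}
```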
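For question 4, a sketch that opens one rgl window per remaining predictor is given below. The influential variable from question 3 is passed in as `yvar` rather than assumed; `x_candidates` and `plot_3d_views` are hypothetical helpers, and the code requires the rgl package to be installed.

```r
# Hypothetical helper: the numeric predictors left over once `yvar` is chosen.
x_candidates <- function(yvar,
    predictors = c("SIZE", "ForeAge", "ForeExp", "AvgExp")) {
  setdiff(predictors, yvar)
}

# Hypothetical helper for question 4: one 3d scatterplot per x variable,
# with `yvar` on the y axis and LostDPC on the z axis.
plot_3d_views <- function(lost, yvar) {
  if (!requireNamespace("rgl", quietly = TRUE)) {
    stop("package 'rgl' is required")
  }
  xs <- x_candidates(yvar)
  for (xvar in xs) {
    rgl::open3d()  # a new window for each x variable
    rgl::plot3d(lost[[xvar]], lost[[yvar]], lost$LostDPC,
                xlab = xvar, ylab = yvar, zlab = "LostDPC", size = 5)
  }
  invisible(xs)
}
```

Rotate each window by hand to an informative view first; then, with that window active, `rgl::rgl.snapshot("view.png")` saves it as a PNG (the file name here is only an example).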
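For question 5, the sketch below colors observations by Power in a scatterplot matrix (`pairs`), a stars plot (`stars`), and a parallel coordinates plot (`parcoord` from the MASS package, which ships with standard R installations). The helper name and the `hcl.colors` palette are assumptions; any distinct colors would do.

```r
# Hypothetical helper for question 5: three multivariate views colored by Power.
plot_multivariate_views <- function(lost,
    vars = c("LostDPC", "SIZE", "ForeAge", "ForeExp", "AvgExp")) {
  num <- lost[, vars]
  power <- factor(lost$Power)
  # One color per Power level, mapped to each observation
  level_cols <- hcl.colors(nlevels(power), "Dark 3")
  grp_col <- level_cols[as.integer(power)]

  # Each call below draws a new plot
  pairs(num, col = grp_col, pch = 19, main = "Scatterplot matrix")
  stars(num, col.stars = grp_col, main = "Stars plot")
  MASS::parcoord(num, col = grp_col)
  title("Parallel coordinates (each axis scaled to [0, 1])")
  legend("topright", legend = levels(power), col = level_cols, lty = 1,
         title = "Power", bg = "white")

  invisible(grp_col)
}
```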
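For question 6, base R's `coplot` is one way to build conditioning plots; lattice panels would be an alternative. The sketch below plots LostDPC against `xvar` within overlapping ranges of `givenvar`, with a lowess smooth in each panel. The helper name and the `number`/`overlap` defaults are assumptions, and which pairs of variables are worth conditioning on is left to you.

```r
# Hypothetical helper for question 6: LostDPC vs. `xvar`, conditioned on
# `givenvar`. For a numeric given variable, `number` and `overlap` set the
# conditioning intervals; for a factor (e.g. Power) each level is one panel.
plot_conditioning <- function(lost, xvar, givenvar, number = 4, overlap = 0.25) {
  f <- as.formula(paste("LostDPC ~", xvar, "|", givenvar))
  coplot(f, data = lost, number = number, overlap = overlap,
         panel = panel.smooth)
  invisible(f)
}
```

Usage: `plot_conditioning(lost, "AvgExp", "Power")` or `plot_conditioning(lost, "AvgExp", "SIZE")`, swapping in whichever variables looked promising in the earlier questions.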