CSSS 508: Intro R

2/17/06

Homework 6 Solutions

1) Download hw6prob1.dat from the class website and read it into R.

prob6.1<-as.matrix(read.table("hw6prob1.dat"))

The first two columns are x and y coordinates.

The third column is the group label: 1, 2, 3, 4, or 5.

On the same graphics window, plot the 5 groups:

a) group 1 as red circles

b) group 2 as green crosses

c) group 3 as solid black circles

d) group 4 as blue triangles

e) group 5 as solid yellow squares

Label the x and y axes as “Coordinate 1” and “Coordinate 2” respectively.

Title the graph “Five Bivariate Normal Distributions”.

##Finding the Groups

group1<-prob6.1[prob6.1[,3]==1,]

group2<-prob6.1[prob6.1[,3]==2,]

group3<-prob6.1[prob6.1[,3]==3,]

group4<-prob6.1[prob6.1[,3]==4,]

group5<-prob6.1[prob6.1[,3]==5,]

##Plotting/labeling the plot space

plot(prob6.1[,1],prob6.1[,2],type="n",xlab="Coordinate 1",ylab="Coordinate 2",main="Five Bivariate Normal Distributions")

##Plotting each group

points(group1[,1:2],col=2,pch=1)

points(group2[,1:2],col=3,pch=3)

points(group3[,1:2],col=1,pch=16)

points(group4[,1:2],col=4,pch=2)

points(group5[,1:2],col=7,pch=15)
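The five points() calls above can also be collapsed into a single plot() call by indexing a vector of colors and a vector of symbols with the group label. This is a minimal sketch, not the original solution. Since hw6prob1.dat is not reproduced here, it uses simulated stand-in data; the matrix `sim`, the seed, and the group means are assumptions made only for illustration.

```r
## Simulated stand-in for hw6prob1.dat: x, y, and a group label 1-5
set.seed(508)
grp <- rep(1:5, each = 20)
sim <- cbind(rnorm(100, mean = 2 * grp), rnorm(100, mean = grp), grp)

## Color and symbol codes for groups 1-5, in group order (same codes as above)
cols <- c(2, 3, 1, 4, 7)
pchs <- c(1, 3, 16, 2, 15)

## Indexing by the label column picks the right color/symbol for every row
plot(sim[, 1], sim[, 2], col = cols[sim[, 3]], pch = pchs[sim[, 3]],
     xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = "Five Bivariate Normal Distributions")
```

The trade-off is that the separate group1, ..., group5 matrices are never created, so this form suits plotting better than any later per-group analysis.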

2) Download hw6prob2.dat from the class website and read it into R.

prob6.2<-as.matrix(read.table("hw6prob2.dat"))

In each column are incomes of a different age group (18-35, 36-55, 56-89), three total.

##Finding the Subgroups

age18.35<-prob6.2[,1]

age36.55<-prob6.2[,2]

age56.89<-prob6.2[,3]

a) On the same graphics window, plot three income histograms (not on the same plot; three separate plots in a column), one for each age group and each titled with its age bracket. The x-axis label on all three should be “Income”. All three histograms should cover the exact same range (so you can compare the distributions at a glance); you need to figure out a breaks range that includes all the income values.

par(mfrow=c(3,1))

hist(age18.35,breaks=seq(min(prob6.2),max(prob6.2),length=20),main="Age 18-35",xlab="Income")

hist(age36.55,breaks=seq(min(prob6.2),max(prob6.2),length=20),main="Age 36-55",xlab="Income")

hist(age56.89,breaks=seq(min(prob6.2),max(prob6.2),length=20),main="Age 56-89",xlab="Income")
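The three hist() calls above each rebuild the same breaks vector. One way to make the shared range explicit is to compute the breaks once and reuse them in a loop. This is a minimal sketch under assumptions: the matrix `incomes` is simulated stand-in data (hw6prob2.dat is not reproduced here), and the names `brks` and `titles` are introduced only for illustration.

```r
## Simulated stand-in for hw6prob2.dat: one income column per age group
set.seed(508)
incomes <- cbind(rnorm(50, 30000, 8000),
                 rnorm(50, 55000, 12000),
                 rnorm(50, 40000, 10000))

## One shared set of 20 break points spanning every income value
brks <- seq(min(incomes), max(incomes), length.out = 20)

## Three histograms stacked in a column, all using the same breaks
par(mfrow = c(3, 1))
titles <- c("Age 18-35", "Age 36-55", "Age 56-89")
for (j in 1:3) {
  hist(incomes[, j], breaks = brks, main = titles[j], xlab = "Income")
}
par(mfrow = c(1, 1))
```

Because the breaks run from the overall minimum to the overall maximum, every income in every column falls inside some bin, which hist() requires when breaks are supplied as a vector.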

b) On the same graphics window, plot three boxplots, one for each group. They should be labeled by age bracket and be in three different colors.

par(mfrow=c(1,1))

boxplot(age18.35,age36.55,age56.89,col=c(2,3,4),names=c("Age 18-35","Age 36-55", "Age 56-89"))

title("Income Boxplots by Age Group")
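boxplot() also accepts a data frame and draws one box per column, taking the labels from the column names. This is a minimal sketch of that alternative, assuming simulated stand-in incomes; the data frame `incomes` and its values are hypothetical, not the class data.

```r
## Simulated stand-in incomes, one named column per age group
set.seed(508)
incomes <- data.frame("Age 18-35" = rnorm(50, 30000, 8000),
                      "Age 36-55" = rnorm(50, 55000, 12000),
                      "Age 56-89" = rnorm(50, 40000, 10000),
                      check.names = FALSE)

## One box per column; labels come from the column names
b <- boxplot(incomes, col = c(2, 3, 4), main = "Income Boxplots by Age Group")

## The returned list records the labels and the group sizes
b$names
b$n
```

Setting check.names = FALSE keeps the spaces and hyphens in the column names, so they can serve directly as axis labels.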

3) Generate three random samples, 100 observations each: one from a N(3, 1), one from a N(2, 2), and one from a N(4, 0.5). For each random sample, find the corresponding density values (use dnorm()). You should now have three sets of 100 pairs of data. Plot all three sets on the same graph as lines; i.e., you are drawing 3 different density curves. Each curve should be in a different color and a different line type.

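##Generating the Data; N(m, s) is read here as mean m and standard deviation s, matching the (mean, sd) arguments of rnorm() and dnorm()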

x1<-rnorm(100,3,1)

y1<-dnorm(x1,3,1)

x2<-rnorm(100,2,2)

y2<-dnorm(x2,2,2)

x3<-rnorm(100,4,0.5)

y3<-dnorm(x3,4,0.5)

allx<-c(x1,x2,x3)

ally<-c(y1,y2,y3)

##Plotting the Curves; limits set for whole dataset

plot(sort(x1),y1[order(x1)],type="l",xlim=c(min(allx),max(allx)),ylim=c(min(ally),max(ally)),xlab="x",ylab="Density",col=1,lty=1,lwd=2)

lines(sort(x2),y2[order(x2)],col=2,lty=2,lwd=2)

lines(sort(x3),y3[order(x3)],col=3,lty=3,lwd=2)

title("3 Different Normal Curves")
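The sort()/order() pairing above matters because lines() connects points in the order they are given, so each density value has to stay attached to its own x value after sorting. This is a minimal sketch that checks the pairing and then shows an alternative that evaluates the densities on an evenly spaced grid. The grid departs from the assignment, which asks for random samples; the name `xs` and its range are assumptions made for illustration.

```r
set.seed(508)
x1 <- rnorm(100, 3, 1)
y1 <- dnorm(x1, 3, 1)

## Reordering y1 by order(x1) keeps each density paired with its own x value
all.equal(y1[order(x1)], dnorm(sort(x1), 3, 1))

## Alternative: evaluate each density on an evenly spaced grid (no sorting needed)
xs <- seq(-4, 8, length.out = 200)
plot(xs, dnorm(xs, 4, 0.5), type = "l", col = 3, lty = 3, lwd = 2,
     xlab = "x", ylab = "Density", main = "3 Different Normal Curves")
lines(xs, dnorm(xs, 3, 1), col = 1, lty = 1, lwd = 2)
lines(xs, dnorm(xs, 2, 2), col = 2, lty = 2, lwd = 2)
```

The tallest curve, N(4, 0.5), is drawn first so that the default y-axis limits cover all three curves.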

Extra: If you want, do help(legend) and explore putting a legend identifying the curves on your graph.
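One option that help(legend) describes is a position keyword such as "topleft" in place of x and y coordinates, which keeps the legend inside the plot region whatever values the random samples take. This is a minimal sketch, with a fixed grid `xs` assumed only so that there is an open plot to annotate.

```r
## Any open plot will do; this one draws the three density curves on a grid
xs <- seq(-4, 8, length.out = 200)
plot(xs, dnorm(xs, 4, 0.5), type = "l", col = 3, lty = 3, lwd = 2,
     xlab = "x", ylab = "Density")
lines(xs, dnorm(xs, 3, 1), col = 1, lty = 1, lwd = 2)
lines(xs, dnorm(xs, 2, 2), col = 2, lty = 2, lwd = 2)

## A keyword position places the legend relative to the plot region
lg <- legend("topleft", legend = c("N(3,1)", "N(2,2)", "N(4,0.5)"),
             col = c(1, 2, 3), lty = c(1, 2, 3), lwd = 2)
```

legend() invisibly returns the coordinates of the box it drew, stored here in `lg`, which can help when checking where the legend landed.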

legend(-2,0.75,legend=c("N(3,1)","N(2,2)","N(4,0.5)"),col=c(1,2,3),lty=c(1,2,3),lwd=2)