CSSS 508: Intro R
2/17/06
Homework 6 Solutions
1) Download hw6prob1.dat from the class website and read it into R.
prob6.1<-as.matrix(read.table("hw6prob1.dat"))
The first two columns are x and y coordinates.
The third column is the group label: 1, 2, 3, 4, or 5.
On the same graphics window, plot the 5 groups:
a) group 1 as red circles
b) group 2 as green crosses
c) group 3 as solid black circles
d) group 4 as blue triangles
e) group 5 as solid yellow squares
Label the x and y axes as “Coordinate 1” and “Coordinate 2” respectively.
Title the graph “Five Bivariate Normal Distributions”.
##Finding the Groups
group1<-prob6.1[prob6.1[,3]==1,]
group2<-prob6.1[prob6.1[,3]==2,]
group3<-prob6.1[prob6.1[,3]==3,]
group4<-prob6.1[prob6.1[,3]==4,]
group5<-prob6.1[prob6.1[,3]==5,]
##Plotting/labeling the plot space
plot(prob6.1[,1],prob6.1[,2],type="n",xlab="Coordinate 1",ylab="Coordinate 2",main="Five Bivariate Normal Distributions")
##Plotting each group
points(group1[,1:2],col=2,pch=1)
points(group2[,1:2],col=3,pch=3)
points(group3[,1:2],col=1,pch=16)
points(group4[,1:2],col=4,pch=2)
points(group5[,1:2],col=7,pch=15)
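The five points() calls above can also be collapsed into one call by indexing a lookup vector with the group label. The following is only a sketch of that idea: the matrix demo is simulated stand-in data (it is not hw6prob1.dat), and the names demo, group.col and group.pch are made up for illustration.

```r
## Sketch only: simulated stand-in for the homework data (hypothetical values)
set.seed(1)
demo <- cbind(rnorm(50), rnorm(50), rep(1:5, each = 10))

## Lookup vectors: position g holds the color / symbol used for group g
group.col <- c(2, 3, 1, 4, 7)     # red, green, black, blue, yellow
group.pch <- c(1, 3, 16, 2, 15)   # circle, cross, solid circle, triangle, solid square

## Indexing the lookups by the label column gives one style per point
point.col <- group.col[demo[, 3]]
point.pch <- group.pch[demo[, 3]]

plot(demo[, 1], demo[, 2], col = point.col, pch = point.pch,
     xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = "Five Bivariate Normal Distributions")
```

This works because col and pch in plot() accept one value per point. The group-by-group version above is easier to read when each group needs its own settings.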
2) Download hw6prob2.dat from the class website and read it into R.
prob6.2<-as.matrix(read.table("hw6prob2.dat"))
In each column are incomes of a different age group (18-35, 36-55, 56-89), three total.
##Finding the Subgroups
age18.35<-prob6.2[,1]
age36.55<-prob6.2[,2]
age56.89<-prob6.2[,3]
a) On the same graphics window, plot three income histograms (not on the same plot; three separate plots in a column), one for each age group and each titled with its age bracket. The x-axis label on all three should be “Income”. All three histograms should cover the exact same range (so you can compare the distributions at a glance); you need to figure out a breaks range that includes all the income values.
par(mfrow=c(3,1))
hist(age18.35,breaks=seq(min(prob6.2),max(prob6.2),length.out=20),main="Age 18-35",xlab="Income")
hist(age36.55,breaks=seq(min(prob6.2),max(prob6.2),length.out=20),main="Age 36-55",xlab="Income")
hist(age56.89,breaks=seq(min(prob6.2),max(prob6.2),length.out=20),main="Age 56-89",xlab="Income")
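Since all three histograms share the same breaks, the break points can be computed once and reused. A minimal sketch of that approach follows. It assumes simulated incomes rather than the class file, and the names incomes, common.breaks and brackets are hypothetical.

```r
## Sketch only: simulated stand-in for the three income columns (hypothetical values)
set.seed(2)
incomes <- cbind(rnorm(100, 30000, 8000),
                 rnorm(100, 50000, 12000),
                 rnorm(100, 40000, 10000))

## One set of breaks spanning every income value in the matrix
common.breaks <- seq(min(incomes), max(incomes), length.out = 20)
brackets <- c("Age 18-35", "Age 36-55", "Age 56-89")

## Three stacked histograms, all drawn on the same breaks
par(mfrow = c(3, 1))
for (j in 1:3) {
  hist(incomes[, j], breaks = common.breaks, main = brackets[j], xlab = "Income")
}
par(mfrow = c(1, 1))   # restore the single-plot layout
```

Taking the minimum and maximum over the whole matrix ensures every value falls inside the breaks. Without that, hist() stops with an error.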
b) On the same graphics window, plot three boxplots, one for each group. They should be labeled by age bracket and be in three different colors.
par(mfrow=c(1,1))
boxplot(age18.35,age36.55,age56.89,col=c(2,3,4),names=c("Age 18-35","Age 36-55", "Age 56-89"))
title("Income Boxplots by Age Group")
3) Generate three random samples, 100 observations each: one from a N(3, 1), one from a N(2, 2), and one from a N(4, 0.5). For each random sample, find the corresponding density values (use dnorm()). You should now have three sets of 100 pairs of data. Plot all three sets on the same graph as lines; i.e., you’re drawing three different density curves. Each curve should be in a different color and a different line type.
##Generating the Data (the second parameter of each N(.,.) is passed as the standard deviation)
x1<-rnorm(100,3,1)
y1<-dnorm(x1,3,1)
x2<-rnorm(100,2,2)
y2<-dnorm(x2,2,2)
x3<-rnorm(100,4,0.5)
y3<-dnorm(x3,4,0.5)
allx<-c(x1,x2,x3)
ally<-c(y1,y2,y3)
##Plotting the Curves; limits set for whole dataset
plot(sort(x1),y1[order(x1)],type="l",xlim=c(min(allx),max(allx)),ylim=c(min(ally),max(ally)),xlab="x",ylab="Density",col=1,lty=1,lwd=2)
lines(sort(x2),y2[order(x2)],col=2,lty=2,lwd=2)
lines(sort(x3),y3[order(x3)],col=3,lty=3,lwd=2)
title("3 Different Normal Curves")
Extra: If you want, do help(legend) and explore putting a legend identifying the curves on your graph.
legend(-2,0.75,legend=c("N(3,1)","N(2,2)",
"N(4,0.5)"),col=c(1,2,3),lty=c(1,2,3),lwd=2)
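For comparison, the same three curves can be drawn from an evenly spaced grid of x values instead of the sorted random samples. This is only a sketch of an alternative, not what the problem asks for. The grid range of -4 to 8 and the names xgrid, d1, d2 and d3 are assumptions made for illustration. The legend is placed with the keyword "topleft" rather than explicit coordinates.

```r
## Sketch only: evaluate each density on a fixed grid (assumed range -4 to 8)
xgrid <- seq(-4, 8, length.out = 200)
d1 <- dnorm(xgrid, 3, 1)
d2 <- dnorm(xgrid, 2, 2)      # third argument of dnorm() is the standard deviation
d3 <- dnorm(xgrid, 4, 0.5)

## The grid is already in increasing order, so no sort()/order() step is needed
plot(xgrid, d1, type = "l", ylim = c(0, max(d1, d2, d3)),
     xlab = "x", ylab = "Density", col = 1, lty = 1, lwd = 2)
lines(xgrid, d2, col = 2, lty = 2, lwd = 2)
lines(xgrid, d3, col = 3, lty = 3, lwd = 2)
title("3 Different Normal Curves")

## A keyword position keeps the legend inside the plot whatever the axis limits are
legend("topleft", legend = c("N(3,1)", "N(2,2)", "N(4,0.5)"),
       col = 1:3, lty = 1:3, lwd = 2)
```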