STT 501 R#2 Fall, 2008
#there are several ways to compare the distribution of
#one variable with the distribution of another.
#Side-by-side boxplots are a simple way. Try this with the
#"standard" and "new" therapies in problem 3.6 on page 64:
standard=c(4,15,24,10,1,27,31,14,2,16,32,7,13,36,29,6,12,18,14,15,18,6,13,21,20,8,3,24)
new=c(5,20,29,15,7,32,36,17,15,19,35,10,16,39,27,14,10,16,12,13,16,9,18,33,30,29,31,27)
standard; new
boxplot(standard,new)
#how would you describe the differences you see in these
#distributions?
#
#now let's compare the quantiles in more detail...
quantile(standard); quantile(new)
#or try
quantile(standard,probs=seq(0,1.00,.01))
quantile(new,probs=seq(0,1.00,.01))
#use the IQR function to get the interquartile range (the
#spread of the middle 50% of the data).
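#a minimal sketch of IQR on a small made-up vector (the name
#toy and its values are only for illustration, they are not
#from the text):
toy=c(2,4,6,8,10)
quantile(toy,probs=c(.25,.75)) #the quartiles of toy
IQR(toy) #the 75% quantile minus the 25% quantile
#then try IQR(standard); IQR(new) and compare the spreads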
#we could plot each quantile distribution
#as in R#1 but we're lucky here that both of these vectors
#are the same length (try length(standard); length(new) to
#verify this). this means that the quantiles of each
#correspond to the same percentages...so if we sort both
#vectors and then pair them up we can compare the two
#distributions...
sstand=sort(standard); snew=sort(new)
plot(sstand,snew); abline(0,1)
#adding the "0,1" line puts a line on the plot that
#corresponds to "standard" = "new" - abline(0,1) sketches
#the line with slope=1 through the origin. How would you
#describe the differences you see between the two
#distributions of survival times?
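#if the two vectors had different lengths, sorting and pairing
#them up would not work. one option (not part of this
#exercise) is the built-in qqplot function, which interpolates
#the sorted values of the longer vector down to the length of
#the shorter one. a sketch with two made-up vectors a and b
#(hypothetical values, not from the text):
a=c(3,8,12,15,21,30)
b=c(5,9,11,16,18,24,27,33,40)
length(a); length(b) #unequal lengths
qqplot(a,b); abline(0,1)
#for vectors of equal length qqplot just plots the two sorted
#vectors against each other, so qqplot(standard,new) should
#show the same points as plot(sstand,snew)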
#
#Now let's look at an interesting time series, the global
#temperature averages from 1881 through 2007. first read in
#the data from the text file you may download from the
#website. then use read.table to read it in...
GT=read.table(file=file.choose(),header=TRUE)
attach(GT) #so the columns can be used by name (see names(GT))
GT[1:5,] #to see the data - read the info file to understand
#the variables. get a time plot using the plot function
plot(Year,JD,type="b") #to get both points and lines...
#how would you describe the trend? is there "seasonal"
#variation?
#
#How to plot more than one series on the same axes?
#Let's try to put both the January and June plots on the
#same plot...
#first determine which one has the most variability in its
#values
min(Jan); max(Jan); min(Jun); max(Jun)
#then try
plot(Year,Jan,type="b",ylim=c(-160,160))
#add points for June in red as follows:
points(Year,Jun,col="red")
#then add the lines
lines(Year,Jun,col="red")
#if you had tried to plot(Year,Jun) on top of the January
#plot it would have just replaced the first plot. The
#points and lines functions have the ability to add points
#and lines to an existing plot.
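#a sketch of the same steps on made-up numbers (yr, s1 and s2
#are hypothetical, they are not the temperature data). here
#range() is used to pick a ylim that fits both series, and
#legend() labels them; neither is required by the exercise:
yr=2001:2006
s1=c(-5,3,10,8,15,20)
s2=c(2,-8,12,25,18,30)
yl=range(s1,s2) #smallest and largest value over both series
plot(yr,s1,type="b",ylim=yl)
points(yr,s2,col="red"); lines(yr,s2,col="red")
legend("topleft",legend=c("s1","s2"),col=c("black","red"),lty=1)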