STT 501 R#2 Fall, 2008

#there are several ways to compare the distribution of

#one variable with the distribution of another.

#Side-by-side boxplots are a simple way. Try this with the

#"standard" and "new" therapies in problem 3.6 on page 64:

standard=c(4,15,24,10,1,27,31,14,2,16,32,7,13,36,29,6,12,18,14,15,18,6,13,21,20,8,3,24)

new=c(5,20,29,15,7,32,36,17,15,19,35,10,16,39,27,14,10,16,12,13,16,9,18,33,30,29,31,27)

standard; new

boxplot(standard,new)

#how would you describe the differences you see in these

#distributions?

#

#now let's compare the quantiles in more detail...

quantile(standard); quantile(new)

#or try

quantile(standard,probs=seq(0,1.00,.01))

quantile(new,probs=seq(0,1.00,.01))

#use the IQR function to get the interquartile range (spread

#of the middle 50% of the data)
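
#a minimal sketch of that step: IQR returns the 75th

#percentile minus the 25th percentile of a vector

IQR(standard); IQR(new)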

#we could plot each quantile distribution

#as in R#1 but we're lucky here that both of these vectors

#are the same length (try length(standard); length(new) to

#verify this). this means that the quantiles of each

#correspond to the same percentages...so if we sort both

#vectors and then pair them up we can compare the two

#distributions...

sstand=sort(standard); snew=sort(new)

plot(sstand,snew); abline(0,1)

#adding the "0,1" line puts a line on the plot that

#corresponds to "standard" = "new" - abline(0,1) sketches

#the line with slope=1 through the origin. How would you

#describe the differences you see between the two

#distributions of survival times?
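
#as an alternative sketch, R's built-in qqplot function

#draws this kind of plot directly: it sorts both vectors for

#you and, by interpolating, also works when the two vectors

#have different lengths

qqplot(standard,new); abline(0,1)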

#

#Now let's look at an interesting time series, the global

#temperature averages from 1881 through 2007. first read in

#the data from the text file you may download from the

#website. then use read.table to read it in...
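
#a minimal sketch of the read-in step. the file name

#"GlobalTemp.txt" is an assumption (use the name of the file

#you downloaded), and header=TRUE assumes the first line of

#the file holds the variable names. attach is one way to

#make the columns available by name, as the plots below

#assume.

gtfile="GlobalTemp.txt" #hypothetical file name

if(file.exists(gtfile)){GT=read.table(gtfile,header=TRUE); attach(GT)}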

GT[1:5,] #to see the data - read the info file to understand

#the variables. get a time plot using the plot function

plot(Year,JD,type="b") #to get both points and lines...

#how would you describe the trend? is there "seasonal"

#variation?

#

#How to plot more than one series on the same axes?

#Let's try to put both the January and June plots on the

#same plot...

#first determine which one has the most variability in its

#values

min(Jan); max(Jan); min(Jun); max(Jun)

#then try

plot(Year,Jan,type="b",ylim=c(-160,160))
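
#rather than typing the limits by hand, you could let R

#compute limits that cover both series. a sketch: range

#returns the smallest and largest values, and na.rm=TRUE

#makes it ignore any missing values

plot(Year,Jan,type="b",ylim=range(c(Jan,Jun),na.rm=TRUE))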

#add points for June in red as follows:

points(Year,Jun,col="red")

#then add the lines

lines(Year,Jun,col="red")

#if you had tried to plot(Year,Jun) on top of the January

#plot it would have just replaced the first plot. The

#points and lines functions have the ability to add points

#and lines to an existing plot.
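
#to tell the two series apart you could also add a legend

#to the existing plot. a sketch; the position "topleft" and

#the labels are just one possible choice

legend("topleft",legend=c("Jan","Jun"),col=c("black","red"),lty=1,pch=1)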