STT 501 R#2 Fall, 2008

#there are several ways to compare the distribution of

#one variable with the distribution of another.

#Side-by-side boxplots are a simple way. Try this with the

#"standard" and "new" therapies in problem 3.6 on page 64:

**standard=c(4,15,24,10,1,27,31,14,2,16,32,7,13,36,29,6,12,18,14,15,18,6,13,21,20,8,3,24)**

**new=c(5,20,29,15,7,32,36,17,15,19,35,10,16,39,27,14,10,16,12,13,16,9,18,33,30,29,31,27)**

standard; new

**boxplot(standard,new)**

#how would you describe the differences you see in these

#distributions?
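
#a minimal sketch of the same boxplot with a label under each

#box, plus the five-number summaries the boxes are drawn from.

#the axis label "survival time" is an assumption based on the

#wording later in this handout.

```r
# Survival times from problem 3.6, entered as above
standard <- c(4,15,24,10,1,27,31,14,2,16,32,7,13,36,29,6,12,18,14,15,18,6,13,21,20,8,3,24)
new <- c(5,20,29,15,7,32,36,17,15,19,35,10,16,39,27,14,10,16,12,13,16,9,18,33,30,29,31,27)

# names= puts a label under each box
boxplot(standard, new, names = c("standard", "new"), ylab = "survival time")

# min, lower hinge, median, upper hinge, max for each group
fivenum(standard)
fivenum(new)
```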

#

#now let's compare the quantiles in more detail...

**quantile(standard); quantile(new)**

#or try

**quantile(standard,probs=seq(0,1.00,.01))**

**quantile(new,probs=seq(0,1.00,.01))**

#use the IQR function to get the interquartile range (the spread

#of the middle 50% of the data)
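
#for example, something like the following should work. the

#second form just shows that IQR is the third quartile minus

#the first, using quantile's default method.

```r
standard <- c(4,15,24,10,1,27,31,14,2,16,32,7,13,36,29,6,12,18,14,15,18,6,13,21,20,8,3,24)
new <- c(5,20,29,15,7,32,36,17,15,19,35,10,16,39,27,14,10,16,12,13,16,9,18,33,30,29,31,27)

# interquartile range of each group
IQR(standard); IQR(new)

# the same number built by hand: Q3 - Q1
q <- quantile(standard, probs = c(0.25, 0.75))
iqr_by_hand <- unname(q[2] - q[1])
iqr_by_hand
```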

#we could plot each quantile distribution

#as in R#1 but we're lucky here that both of these vectors

#are the same length (try **length(standard); length(new)** to

#verify this). this means that the quantiles of each

#correspond to the same percentages...so if we sort both

#vectors and then pair them up we can compare the two

#distributions...

**sstand=sort(standard); snew=sort(new)**

**plot(sstand,snew); abline(0,1)**

#adding the "0,1" line puts a line on the plot that

#corresponds to "standard" = "new" - abline(0,1) sketches

#the line with slope=1 through the origin. How would you

#describe the differences you see between the two

#distributions of survival times?
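
#R also has a built-in qqplot function that does the sorting

#and pairing in one step. a sketch, assuming the same two

#vectors; unlike the sort-and-pair approach it also accepts

#vectors of different lengths (it interpolates the longer one).

```r
standard <- c(4,15,24,10,1,27,31,14,2,16,32,7,13,36,29,6,12,18,14,15,18,6,13,21,20,8,3,24)
new <- c(5,20,29,15,7,32,36,17,15,19,35,10,16,39,27,14,10,16,12,13,16,9,18,33,30,29,31,27)

# quantile-quantile plot of new against standard
qq <- qqplot(standard, new, xlab = "standard", ylab = "new")

# reference line: points on it mean the two quantiles are equal
abline(0, 1)

# with equal lengths, qqplot returns the two sorted vectors
head(cbind(qq$x, qq$y))
```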

#

#Now let's look at an interesting time series, the global

#temperature averages from 1881 through 2007. first download

#the text file from the website, then use read.table to

#read it in...

**GT=read.table(file=file.choose(),header=T)**

GT[1:5,] #to see the data - read the info file to understand

#the variables. attach(GT) lets you refer to the columns by

#name (names are case-sensitive); then get a time plot using

#the plot function

**attach(GT)**

**plot(Year,JD,type="b")** #to get both points and lines...

#how would you describe the trend? is there "seasonal"

#variation?
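
#columns of a data frame can also be reached as GT$name or

#through with(), which avoids relying on attached variables.

#a sketch: the small data frame below is made up purely for

#illustration (it is NOT the temperature file), and the column

#names Year and JD are assumptions; check names(GT) for the

#real ones.

```r
# made-up values, only to show the syntax
toy <- data.frame(Year = 2001:2006, JD = c(3, -1, 4, 1, 5, 2))

# list the column names exactly as R sees them
names(toy)

# two equivalent ways to get the time plot
plot(toy$Year, toy$JD, type = "b", xlab = "Year", ylab = "JD")
with(toy, plot(Year, JD, type = "b"))
```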

#

#How to plot more than one series on the same axes?

#Let's try to put both the January and June plots on the

#same plot...

#first find the smallest and largest value of each series, so

#the y-axis can be made wide enough to hold both

**min(Jan); max(Jan); min(Jun); max(Jun)**

#then try

**plot(Year,Jan,type="b",ylim=c(-160,160))**

#add points for June in red as follows:

**points(Year,Jun,col="red")**

#then add the lines

**lines(Year,Jun,col="red")**

#if you had tried to plot(Year,Jun) on top of the January

#plot it would have just replaced the first plot. The

#points and lines functions have the ability to add points

#and lines to an existing plot.
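
#a sketch of the same idea that lets R work out the y-axis

#limits with range() and adds a legend. the data frame here is

#made up for illustration only (it is NOT the temperature

#file); the column names Year, Jan and Jun follow the commands

#above.

```r
# made-up values, only to show the technique
toy <- data.frame(Year = 2001:2006,
                  Jan = c(10, -20, 35, 5, -8, 22),
                  Jun = c(12, 3, -15, 40, 18, 7))

# y-axis limits wide enough for both series
yl <- range(toy$Jan, toy$Jun)

# first series sets up the axes
plot(toy$Year, toy$Jan, type = "b", ylim = yl, xlab = "Year", ylab = "value")

# second series is added to the existing plot; type="b" draws points and lines
lines(toy$Year, toy$Jun, type = "b", col = "red")

# key telling the two series apart
legend("topleft", legend = c("Jan", "Jun"), col = c("black", "red"), lty = 1, pch = 1)
```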