STT2640-Lab 4

Identifying Outliers and Boxplots

Part I: Making a Boxplot by Hand

Problem #1. The diastolic blood pressure readings of 12 randomly selected males aged 45-49 from the Framingham Heart Study were

94, 84, 74, 90, 98, 92, 74, 90, 80, 98, 78, 130

Note that the first 11 observations are the same as Problem #1 in Lab 3. The last data value here is 130 instead of 80.

a)  Compute the five-number summary (minimum, first quartile, median, third quartile, maximum) for the data set by hand.

b)  Make a boxplot by hand for the data set.

c)  Based on the boxplot, do you see any outliers? Explain.
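Once the hand work is done, it can be checked in R. The sketch below is one possible check, not the required method. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, which is what fivenum() returns for this data set; a textbook using a different quartile rule may give slightly different values. The 1.5 x IQR fence rule is the usual boxplot convention and is assumed here, since the lab does not state it. The names bp, five, iqr_hinge, lower_fence and upper_fence are illustrative only.

```r
# Blood pressure readings from Problem #1
bp <- c(94, 84, 74, 90, 98, 92, 74, 90, 80, 98, 78, 130)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(bp)
five

# Fences under the 1.5 * IQR rule (assumed convention),
# using the hinges as the first and third quartiles
iqr_hinge   <- five[4] - five[2]
lower_fence <- five[2] - 1.5 * iqr_hinge
upper_fence <- five[4] + 1.5 * iqr_hinge

# Values outside the fences are flagged as outliers
flagged <- bp[bp < lower_fence | bp > upper_fence]
flagged
```

Comparing `five` and `flagged` with the hand-drawn boxplot shows whether the box edges, whiskers and any isolated points were placed correctly.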

Part II: Tutorial Portion

Step 1. Using R to Compute the z-score for a data value

In R, the functions mean(data) and sd(data) compute the mean and the sample standard deviation of a data set, and the function var(data) computes the sample variance.

> Bloodpressure=c(94,84,74,90,98,92,74,90,80,98,78,130) #Create data set Bloodpressure

> xbar=mean(Bloodpressure) # Compute the mean of the data set

> xbar

[1] 90.16667 # The sample mean blood pressure is 90.16667

> var(Bloodpressure) #Calculate the sample variance

[1] 230.8788

> s=sd(Bloodpressure) # Compute the standard deviation of the data set

> s

[1] 15.1947 # The standard deviation of the data set is 15.1947

> zscore=(130-xbar)/s # Compute the z-score for data value x=130

> zscore

[1] 2.621529 # The z-score of data value 130 is 2.62, i.e. 130 lies about 2.62 standard deviations above the mean, so it is a potential outlier.
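The same formula, z = (x - mean) / sd, can be applied to every observation at once, because R arithmetic works element by element on vectors. The following is a minimal sketch of that idea; the name z is illustrative, and `<-` is used for assignment, which behaves like the `=` used in the tutorial.

```r
Bloodpressure <- c(94, 84, 74, 90, 98, 92, 74, 90, 80, 98, 78, 130)

# z-score of every observation: (x - mean) / sd, applied element-wise
z <- (Bloodpressure - mean(Bloodpressure)) / sd(Bloodpressure)
round(z, 2)

# Observation with the largest absolute z-score
most_extreme <- Bloodpressure[which.max(abs(z))]
most_extreme

# sd() is the square root of var(), so this comparison returns TRUE
all.equal(sd(Bloodpressure), sqrt(var(Bloodpressure)))
```

Scanning the rounded z-scores makes it easy to see which observations, if any, lie unusually far from the mean.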

Step 2. Make a boxplot for the data set

> windows() # Open a new graphics window so that earlier graphs are not overwritten (Windows only; dev.new() works on any platform)

> boxplot(Bloodpressure) # Make a boxplot of the data set Bloodpressure. Do you see any outliers?
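To read off exactly which points the boxplot draws as outliers, the numbers behind the plot can be extracted with boxplot.stats(), which is part of base R. This is a small sketch of that approach; the name stats is illustrative, and the default whisker coefficient of 1.5 is assumed.

```r
Bloodpressure <- c(94, 84, 74, 90, 98, 92, 74, 90, 80, 98, 78, 130)

# boxplot.stats() returns the numbers behind the plot without drawing it
stats <- boxplot.stats(Bloodpressure)

# Lower whisker end, lower hinge, median, upper hinge, upper whisker end
stats$stats

# Points beyond the whiskers, which boxplot() draws as individual points
stats$out
```

Any value listed in stats$out appears as an isolated point on the boxplot, so the list can be compared directly with what is seen in the graphics window.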

Part III: Lab Portion

Problem #2. Given the data set 10, 80, 80, 85, 85, 85, 85, 90, 90, 90, 90, 90, 95, 95, 95, 95, 100, 100, write an R script to complete the following:

a)  Compute the mean and standard deviation of the data set.

b)  Compute the z-score for data point x=10. Is it a potential outlier? Explain.

c)  Make a boxplot for the data set. Are there any outliers? Explain.

Your lab report should include the following:

1)  Solutions to Problem #1 from Part I

2)  R script file for Problem #2 in Part III

3)  R output obtained using your R script for Problem #2 (you may put your R script and output onto one page)

4)  Your answers to Problem #2, including the boxplot