STT2640-Lab 4
Identifying Outliers and Boxplots
Part I. Making a boxplot by hand
Problem #1. The diastolic blood pressure readings of 12 randomly selected males aged 45-49 from the Framingham Heart Study were
94, 84, 74, 90, 98, 92, 74, 90, 80, 98, 78, 130
Note that the first 11 observations are the same as Problem #1 in Lab 3. The last data value here is 130 instead of 80.
a) Compute the five-number summary for the data set by hand.
b) Make a boxplot by hand for the data set.
c) Based on the boxplot, do you see any outliers? Explain.
Part II: Tutorial Portion
Step 1. Using R to Compute the z-score for a data value
In R, the functions mean(data) and sd(data) compute the mean and the sample standard deviation of a data set, respectively, and the function var(data) computes the sample variance.
> Bloodpressure=c(94,84,74,90,98,92,74,90,80,98,78,130) #Create data set Bloodpressure
> xbar=mean(Bloodpressure) # Compute the mean of the data set
> xbar
[1] 90.16667 # The sample mean blood pressure is 90.16667
> var(Bloodpressure) #Calculate the sample variance
[1] 230.8788
> s=sd(Bloodpressure) # Compute the standard deviation of the data set
> s
[1] 15.1947 # The standard deviation of the data set is 15.1947
> zscore=(130-xbar)/s #compute the z-score for data value x=130
> zscore
[1] 2.621529 # The z-score of the data value 130 is 2.62, so this value is a potential outlier.
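The same calculation can be applied to every observation at once, because arithmetic on an R vector works element by element. The following is a minimal sketch, not part of the original lab: the variable names z and cutoff are illustrative, and the cutoff of 2 is an assumption, so use whatever rule your course specifies for a "potential outlier".

```r
# Same data as in the tutorial above
Bloodpressure <- c(94, 84, 74, 90, 98, 92, 74, 90, 80, 98, 78, 130)

# Vectorized z-scores: the mean is subtracted from every element at once
z <- (Bloodpressure - mean(Bloodpressure)) / sd(Bloodpressure)
round(z, 2)

# Assumption: treat |z| > 2 as a potential outlier (illustrative cutoff only)
cutoff <- 2
Bloodpressure[abs(z) > cutoff]
```

With this cutoff, only observations whose z-score is larger than 2 in absolute value are listed, which makes it easy to scan a whole data set rather than checking one value at a time.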
Step 2. Make a boxplot for the data set
> windows() #Open a new graphics window so that earlier graphs are not overwritten (Windows only; dev.new() works on any platform).
> boxplot(Bloodpressure) #Make a boxplot of the data set Bloodpressure. Do you see any outliers?
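To see which numbers lie behind the picture, the sketch below uses the base R functions fivenum() and boxplot.stats(). It is an illustration rather than part of the original lab, and the names five, spread, lower_fence and upper_fence are made up for this example. Note that fivenum() reports Tukey's hinges, which can differ slightly from quartiles computed by hand with another rule, so treat the output as a check, not as the required hand method.

```r
Bloodpressure <- c(94, 84, 74, 90, 98, 92, 74, 90, 80, 98, 78, 130)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(Bloodpressure)
five

# Fences placed 1.5 times the hinge spread beyond each hinge
# (1.5 is the default coefficient used by boxplot())
spread <- five[4] - five[2]
lower_fence <- five[2] - 1.5 * spread
upper_fence <- five[4] + 1.5 * spread
c(lower_fence, upper_fence)

# Observations that boxplot() draws as individual points beyond the whiskers
boxplot.stats(Bloodpressure)$out
```

Any value listed by the last command falls outside the fences and appears as a separate point on the boxplot.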
Part III: Lab Portion
Problem #2. Consider the data set 10, 80, 80, 85, 85, 85, 85, 90, 90, 90, 90, 90, 95, 95, 95, 95, 100, 100. Write an R script to complete the following:
a) Compute the mean and standard deviation of the data set.
b) Compute the z-score for data point x=10. Is it a potential outlier? Explain.
c) Make a boxplot for the data set. Are there any outliers? Explain.
Your lab report should include the following:
1) Solutions to Problem #1 from Part I
2) R script file for Problem #2 in Part III
3) R output obtained using your R script for Problem #2 (you may put your R script and output onto one page)
4) Your answers to Problem #2 including the boxplot