Final Exam

STAT 850

Fall 2016

Complete the problems below. Make sure to fully explain all answers and show your work to receive full credit! This includes all R code and output used to complete a problem!

1) (30 total points) This problem examines the t probability density function.

a) (10 points) Construct a plot of a t probability density function with 5 degrees of freedom. Include appropriate x-axis and y-axis labels on the plot rather than those given by default.

b) (10 points) Starting from the code in part a), create a function named tplot() that will produce the same plot for any user-specified degrees of freedom. The degrees of freedom must be an argument specified in the function call. Make sure the plot indicates what degrees of freedom were used, and run the function with 10 degrees of freedom.

c) (10 points) Starting from the code in part a), create four t-distribution plots for 5, 10, 20, and 30 degrees of freedom. These plots should be put in a single graphics window arranged in a 2×2 grid. Also, these plots should be created using a loop.

2) (24 total points) This problem examines a data set investigating the relationship between cheese taste and the amount of acetic acid (natural log of the amount) in a particular cheese. The data can be downloaded from the graded materials web page of my course website, and it is located in the file cheese.csv. Below are the first few observations:

Taste score / Natural log of acetic acid
12.3 / 4.543
20.9 / 5.159
 / 

Using this data, complete the following.

a) (6 points) Read the data into R and print the first six observations. If you are unable to complete this part, I will help you, but this will result in a four-point deduction.

b) (6 points) Sort the data by the taste score. Print the first six observations of this sorted data.

c) (6 points) Create a new data frame that has only those observations with a natural log of acetic acid value greater than 6.0 and a taste score greater than 40. Print the data frame. Do not sort the data to create this new data frame.

d) (6 points) Estimate and state the regression model that uses taste score as the response and the natural log of acetic acid as the explanatory variable (the log transformation is already applied to the data in cheese.csv).

3) (12 points) Why is R referred to as an “object-oriented language”? What is the main benefit of being this type of language?

4) (34 total points) Answer the following questions.

a) (6 points) Below is some R code and output.

> A <- 2

> a

Error: object 'a' not found

Why was the error message printed? Explain.

b) (8 points) What is a data frame in R? In your answer, describe a data frame’s relationship with an R list.

c) (8 points) What is the difference between the intended use of ifelse() and if() {} else {}?

d) (6 points) Why is the “search path” in R important to know?

e) (6 points) What is the distinction between a “matrix” and a “vector” in R? Note that this question is not asking for the mathematical distinction that would be discussed in a course on matrix algebra.

5) (3 points extra credit) The New York Times article “Data Analysts Captivated by R’s Power” from January 6, 2009 included the following quote from an individual working for a major statistical software company:

I think it addresses a niche market for high-end data analysts that want free, readily available code. We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.

Who said this, and what company did they work for?