Assignment #1

STAT 850

Fall 2016

Complete the problems below. Within each part, include your SAS program code, all corresponding output, and any additional information needed to explain your answer. Your SAS code and output should be formatted in a manner similar to the lecture notes.

1)(35 total points) The National Football League (NFL) holds a scouting combine every year for college football players who would like to play football in the NFL. These players go through a number of evaluations during the combine so that NFL teams can assess their ability. For more information, please see and

The nfl_combine_2014_noNA.csv data file is available from my course website, and it contains information on some of the players who participated in the 2014 combine. The columns in the data file represent the following information:

  • Player: Name of player being evaluated
  • College: College that the player attended
  • Position: The position of the player, where DB = defensive back, LB = linebacker, OL = offensive lineman, RB = running back, S = safety, TE = tight end, WO = wide receiver; players who played other positions were excluded from the data file
  • OverallGrade: The overall grade of the player based on the evaluations
  • Height: Height in inches
  • ArmLength: Arm length in inches
  • Weight: Weight in pounds
  • Dash40: 40-yard dash time in seconds
  • BenchPress: Number of bench press repetitions of 225 pounds
  • VerticalJump: Vertical jump in inches
  • BroadJump: Broad jump in inches
  • Cone3Drill: 3-cone drill time in seconds
  • Shuttle20: 20-yard shuttle run in seconds

Using these data, complete the problems below. While you are welcome to use any football knowledge to help answer questions, it is not needed to perform well on this assignment.

a)(4 points) Read the data into SAS using proc import and print the first five observations using proc print.

b)(4 points) Sort the players by their 40-yard dash times. Print only the name and 40-yard dash time of each offensive lineman.

c)(4 points) Find the mean 40-yard dash time for each position. Which position has the fastest players on average? Which position has the slowest players on average?

d)Using proc means, find the mean, standard deviation, and sample size for the 40-yard dash times of offensive linemen and of wide receivers (each position separately). Export these values into a data set in the following ways:

i)(3 points) output statement

ii)(3 points) ods statement

e)(5 points) Assuming these players are a simple random sample from a population of all players, we can perform statistical inference procedures to make inferences about this population. With this assumption, perform a two-sample t-test with unequal variances to test the equality of means for the 40-yard dash times of offensive linemen and wide receivers. More formally, we can state the hypotheses as

H0: OL – WR = 0

Ha: OL – WR 0

where position denotes the mean for particular position. This hypothesis test should be performed by showing the correct test statistic and p-value equationswith their values AND without using a SAS procedure to automatically find these values. Your write-up here should be formal by including statements of hypotheses, test statistic, p-value, critical value, and decision with reasoning.

f)This problem involves a new procedure, proc ttest, to perform the same types of calculations as in part e).

i)(3 points) Show the main syntax help page available in SAS for this procedure. A screen capture will suffice to obtain full credit.

ii)(3 points) Perform the calculations for the test using proc ttest. Indicate which components of the output allow one to perform the test.

iii)(3 points) Use ods trace to determine the name of the table that contains the p-value for the test.

iv)(3 points) Use the ods statement to create a data set with the p-value for the test. Print this data set.

2)(20 total points) “Stability testing” is performed by pharmaceutical companies to determine the shelf life for drug products. Typically, part of a drug batch (like a number of pills) is put into storage in a controlled temperature and humidity environment. At regular time points, an item is taken out of storage and testing is performed on it. A common response measured on each item is potency. Over time, the potency of a drug will usually degrade, so the Food and Drug Administration (FDA) has set a lower limit of 95% of the desired potency level, above which the drug needs to remain. The exact time point where the potency goes below this limit is the shelf life. This shelf life (say, 4 months) then is added to the manufacturing date of a drug to find the expiration date, which is what consumers often see printed on drug packaging.

The shelf life is found with the help of regression models. To show how this is done, below is a simulated data set where the potency of a drug has been measured over time in months. Suppose a single pill has been measured at each time point.

Time / Potency
3 / 1.0155450
6 / 0.9835495
9 / 0.9957994
12 / 0.9836627
15 / 0.9863230
18 / 0.9945146
21 / 0.9995710
24 / 0.9679062
30 / 0.9690051
36 / 0.9891509
48 / 0.9674187
60 / 0.9557498

For example, the pill taken out of storage at 3 months had a potency of 101.55% of the desired potency level. Using these data, complete the following problems.

a)(4 points) Use a data step with the datalines statement to create a SAS data set containing the data in the previous table. Print the data set using proc print.

b)(5 points) Estimate and state the sample regression model with time as the explanatory variable and potency as the response variable. Use proc reg to perform the estimation and make sure that no plots are produced by the procedure. Interpret the relationship between time and potency as given by the model.

c)(4 points) Is there sufficient evidence to indicate a linear relationship between time and potency? Use the appropriate statistical inference methods to make this judgment.
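For reference, the usual inference for a linear relationship in simple linear regression is a t-test on the slope, sketched below. The notation ($\beta_1$ for the population slope, $\hat{\beta}_1$ for its estimate, $MSE$ for the mean squared error, $S_{xx}$ for the corrected sum of squares of time) is assumed here and may differ from the lecture notes.

```latex
\[
H_0: \beta_1 = 0 \quad \text{vs.} \quad H_a: \beta_1 \neq 0,
\qquad
t = \frac{\hat{\beta}_1}{\sqrt{MSE / S_{xx}}},
\qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
\]
```

Under $H_0$ and the usual model assumptions, $t$ follows a t distribution with $n - 2$ degrees of freedom, which is $12 - 2 = 10$ for the data above.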

d)(4 points) Use proc reg again as in part b), but include the plot with 95% confidence interval bands for the expected potency. No other plots should be included in the output! I recommend using the SAS help to determine the correct coding specification.

e)(3 points) The FDA has guidelines to determine the shelf life of a drug. Specifically, a 95% confidence interval band plot (like in part d)) is used to find the time where the lower band intersects a horizontal line drawn at a 95% potency level. The corresponding time point where this occurs is the shelf life. Using the plot in part d), approximate what the shelf life would be for these data. Note that you do not need to use SAS to draw the line at a 95% potency level.
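To make the graphical rule concrete, the lower band can be written as a function of time. The sketch below assumes the band in part d) is the usual two-sided 95% pointwise confidence interval for the mean response in simple linear regression. The symbols $L(x)$, $\hat{\beta}_0$, $\hat{\beta}_1$, $MSE$, and $S_{xx}$ are notation assumed here, not taken from the lecture notes.

```latex
\[
L(x) = \hat{\beta}_0 + \hat{\beta}_1 x
       - t_{0.975,\,n-2}\,
         \sqrt{MSE\left(\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right)},
\qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
\]
```

Under this reading, the shelf life is the smallest time $x > 0$ at which $L(x) = 0.95$. Reading that crossing point off the plot is all that part e) requires.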
