U of Washington Stat302 Winter 2017 Final

Name: ______    Student ID: ______

Part I. Python and Python vs. R (40 points)

Evaluate each expression in Python. For the first three, evaluate in both Python (left) and R (right):

a = [4,7,1,2,3]        a = c(4,7,1,2,3)

a[1]                   a[1]

a[-1]                  a[-1]

a[1:3]                 a[1:3]

[2] + a[1:]

a + a

a[1] * 3

[a[1]] * 3

[i % 2 for i in a]

{i % 2 for i in a}

[i for i in a if i%2 == 1]

[a[a[0]]]

[("A" if x=="a" else x) for x in list("cat")]

["A" for x in list("cat") if x is not "a"]

b = [1,2,3]

c = b + b

d = [b,b]

b[1] = 4

L = {k*k : k for k in range(1000)}

L[81]

Part II (16 points, this shouldn’t be hard):

7 questions, 2 points each:

In the Python scientific computing ecosystem, which library does what? Match each description with the package:

Function optimization and root finding          A   Base Python (standard library)
Math functions like sin() and exp()             B   Matplotlib
Matrices and arrays                             C   NumPy
Reading and writing data tables, time series    D   Pandas
Regression models and statistical tests         E   SciPy
Scatterplots and histograms                     F   SymPy
Symbolic algebra, like differentiation          G   StatsModels

2 points: Suppose you were guessing at the last question at random (matching 7 answers to 7 questions, no repetitions). What's the probability you'll get every question right? Write an expression (in any of the 3 languages) to calculate this probability. You can use algebra or simulation or call a library function, it's up to you. (If this seems hard, you’re overthinking it)

For 10 extra credit points (this is hard, only if you have extra time): Same situation. What's the probability that you'll get exactly 0 questions right?
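
Both matching questions above can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original exam: the helper name `match_count_distribution` is hypothetical, and it assumes each of the 7! possible orderings of the answers is equally likely.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def match_count_distribution(n):
    """Return {k: probability of exactly k correct matches} for n items.

    Hypothetical helper: it enumerates every ordering of n answers, so it
    is only practical for small n (7! = 5040 orderings here).
    """
    counts = {}
    for guess in permutations(range(n)):
        # Position i is answered correctly when the guessed label equals i.
        correct = sum(1 for i, g in enumerate(guess) if i == g)
        counts[correct] = counts.get(correct, 0) + 1
    total = factorial(n)
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}


dist = match_count_distribution(7)
p_all_right = dist[7]   # only one ordering out of 7! is fully correct
p_none_right = dist[0]  # orderings with no correct position at all
print(p_all_right, p_none_right, float(p_none_right))
```

Using `Fraction` keeps the probabilities exact, so the result can be compared directly with an algebraic answer such as `1 / factorial(7)`.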

Part III. SAS (6 questions * 4 points = 24 pts)

What does each statement block (there are 6 here) output, or if it doesn’t produce output, what does it do?

data Kittens;

input Name $ WeightLBS TailLengthIn Furriness;

cards;

Sherst 15 12 99

Sloan 18 10 80

;

run;

proc print data=Kittens;

where WeightLBS ge 16;

run;

data Cats;

set Kittens;

LBSPerIn = WeightLBS / TailLengthIn;

run;

data X1;

do i=1,2,5;

output;

end;

run;

data X2;

do i=1,2,5;

output;

end;

do j=1 to 3;

output;

end;

run;

data X3;

do i=1 to 3;

do j=1 to 3;

k = i * j;

if i > j then output;

end;

end;

run;

Part IV. Regression (5 questions * 4 points = 20 pts)

                            OLS Regression Results
==============================================================================
Dep. Variable:                   TOTEMP   R-squared:                     0.995
Model:                              OLS   Adj. R-squared:                0.992
Method:                   Least Squares   F-statistic:                   330.3
Date:                  Sun, 01 Feb 2015   Prob (F-statistic):         4.98e-10
Time:                          09:32:37   Log-Likelihood:              -109.62
No. Observations:                    16   AIC:                           233.2
Df Residuals:                         9   BIC:                           238.6
Df Model:                             6
Covariance Type:              nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const      -3.482e+06    8.9e+05     -3.911      0.004     -5.5e+06  -1.47e+06
GNPDEFL       15.0619     84.915      0.177      0.863     -177.029    207.153
GNP           -0.0358      0.033     -1.070      0.313       -0.112      0.040
UNEMP         -2.0202      0.488     -4.136      0.003       -3.125     -0.915
ARMED         -1.0332      0.214     -4.822      0.001       -1.518     -0.549
POP           -0.0511      0.226     -0.226      0.826       -0.563      0.460
YEAR        1829.1515    455.478      4.016      0.003      798.788   2859.515
==============================================================================
Omnibus:                          0.749   Durbin-Watson:                 2.559
Prob(Omnibus):                    0.688   Jarque-Bera (JB):              0.684
Skew:                             0.420   Prob(JB):                      0.710
Kurtosis:                         2.434   Cond. No.                   4.86e+09
==============================================================================

(this example is from the StatsModels reference - link)

1) What fraction of the variance is explained by the regression model?

2) Which variables are significant at the 5% level?

3) If YEAR=2010 and GNPDEFL and other variables are all 0, what’s the predicted value of TOTEMP (write out what numbers to add/multiply, don’t bother calculating)?

4) If you ran this experiment again (i.e. applied the same model to data generated by the same process), by how much would the coefficient of ARMED typically change? (pick the one number reported on this table that best answers this)

5) Write out a command/expression to run this model (in R or Python or SAS – your choice)
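
The quantities in the summary table are related by simple arithmetic, which the sketch below illustrates in plain Python. It is not part of the original exam: the helper `predict` is hypothetical, and the coefficients are the rounded values printed in the table, so any result is approximate.

```python
# Values copied from the printed summary table (rounded as shown there).
n_obs = 16
df_model = 6
r_squared = 0.995

coef = {
    "const": -3.482e06,
    "GNPDEFL": 15.0619,
    "GNP": -0.0358,
    "UNEMP": -2.0202,
    "ARMED": -1.0332,
    "POP": -0.0511,
    "YEAR": 1829.1515,
}


def predict(values):
    """Hypothetical helper: intercept plus the sum of coef * value.

    Regressors missing from `values` are treated as 0.
    """
    slope_terms = sum(
        coef[name] * values.get(name, 0.0) for name in coef if name != "const"
    )
    return coef["const"] + slope_terms


# Residual degrees of freedom: observations minus regressors minus intercept.
df_resid = n_obs - df_model - 1

# Adjusted R-squared penalizes R-squared for the number of regressors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

print(df_resid, round(adj_r_squared, 3), predict({"YEAR": 2010}))
```

The recomputed residual degrees of freedom and adjusted R-squared agree with the table to the precision shown, which is a quick check that the table was read correctly.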