Public Policy 605, Gerontology 604

Statistics II

Analysis of Variance and Chi-Square Tests

1. Review

Probability distributions, the normal distribution, testing differences in two sample means, correlation.

Read:

  • Crown, Statistical Models..., Chapter 2 Inferential Statistics and Measures of Association, Alternative Concepts of Probability, Useful Probability Distributions: The Normal Distribution, Testing Differences in Two Sample Means, Correlation, pp. 5-9, 20-22, 23-25.

2. Analysis of Variance

One-way analysis of variance, two-way analysis of variance.
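
As an illustration of the one-way case, the F statistic compares the variation between group means with the variation within groups. The sketch below is not taken from the course materials; the function name and the data are made up for the example.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)                               # number of groups
    n = sum(len(g) for g in groups)               # total observations
    grand = mean(x for g in groups for x in g)    # grand mean
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```

The resulting F would be compared with the F distribution on (df_between, df_within) degrees of freedom.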

Read:

  • Introductory Statistics, Chapter 10 Analysis of Variance (ANOVA) (with appendices).
  • Statistical Models..., Chapter 2 Inferential Statistics and Measures of Association, One-Way Analysis of Variance, pp. 22-23.

Labs:

  • 1 Analysis of Variance.
  • Stata Data File Sysage.dta (self-extracting file).

Problem Sets:

  • 1 Analysis of Variance.

3. Chi-Square Tests

Tests for multinomials: goodness of fit; tests for independence: contingency tables.
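
For a sense of how the test of independence works, the Pearson statistic sums (observed - expected)^2 / expected over the cells of the table, where each expected count is row total times column total divided by the grand total. This is a minimal sketch with a made-up 2x2 table; the function name is hypothetical.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Made-up contingency table of counts.
table = [[10, 20],
         [20, 10]]
chi2, df = chi_square_independence(table)
print(chi2, df)
```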

Read:

  • Introductory Statistics, Chapter 17 Chi-Square Tests.
  • Statistical Models..., Chapter 2 Inferential Statistics and Measures of Association, The Chi-square Test, pp. 19-20.

Labs:

  • 2 Chi-Square Tests.

Problem Sets:

  • 2 Chi-Square Tests.

Ordinary Least Squares Regression, Classical Linear Regression

4. Fitting a Line by Least Squares

Ordinary Least Squares (OLS), introduction to statistical models and estimators.
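
As a small illustration of the OLS estimator for a single regressor, the slope is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The data and the function name below are made up for the example.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar   # line passes through (xbar, ybar)
    return intercept, slope

# Made-up data.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
intercept, slope = fit_line(x, y)
print(intercept, slope)
```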

Read:

  • Introductory Statistics, Chapter 11 Fitting a Line (with appendices).
  • Statistical Models..., Chapter 1 Introduction to Multivariate Modeling.
  • Guide to Econometrics, Chapter 1 Introduction.

Problem Sets:

  • 3 Least Squares Computation.

5. Correlation and Regression

Simple correlation, correlation and regression.
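
One way to see the link between the two ideas: the correlation coefficient r equals the regression slope rescaled by the ratio of the standard deviations, r = b * s_x / s_y. The sketch below checks this numerically on made-up data; the function name is hypothetical.

```python
import math

def correlation_and_slope(x, y):
    """Return (r, slope, s_x, s_y) for paired data."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)        # simple correlation
    slope = sxy / sxx                     # OLS slope of y on x
    s_x = math.sqrt(sxx / (n - 1))        # sample standard deviations
    s_y = math.sqrt(syy / (n - 1))
    return r, slope, s_x, s_y

# Made-up data.
r, slope, s_x, s_y = correlation_and_slope([1, 2, 3, 4], [2, 4, 5, 8])
print(r, slope * s_x / s_y)   # the two printed values should agree
```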

Read:

  • Introductory Statistics, Chapter 15 Correlation, Sections 15-1 and 15-2 (with appendices).

6. The Statistics of Bivariate Regression

The regression model and assumptions, the disturbance term, sampling properties, confidence intervals and hypothesis testing.

Read:

  • Introductory Statistics, Chapter 12 Simple Regression (with appendices).
  • Statistical Models..., Chapter 3 Regression Analysis, up to Multiple Regression, pp. 27-38.
  • Guide to Econometrics, Chapter 1, Section 1.2 The Disturbance Term, pp. 2-4.

Handouts:

  • Hypothesis Testing.
  • Sample Regression Output.

Labs:

  • 3 Bivariate Regression Model.
  • 4 Regression Statistics.
  • Stata Program Illustrating the Sampling Properties (self-extracting archive file).
  • Help File for Above Program.

Problem Sets:

  • 4 Generating a Regression Model in Stata.
  • 5 Regression Statistics.

7. Elementary Matrix Algebra

Matrices and vectors, matrix operations, scalar (inner) products, matrix multiplication, identity matrix, inverse matrix, application to regression.
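
To make the application to regression concrete, the OLS coefficients solve the normal equations, b = (X'X)^(-1) X'y. The sketch below works this out for a two-column X (a constant and one regressor) using plain lists; the helper names and data are made up, and the explicit inverse is only practical for the 2x2 case shown.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    """Matrix product: each entry is the inner product of a row of a and a column of b."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Made-up data: a column of ones for the constant, then the regressor.
X = [[1, 1], [1, 2], [1, 3], [1, 4]]
y = [[2], [4], [5], [8]]

Xt = transpose(X)
coef = matmul(inverse_2x2(matmul(Xt, X)), matmul(Xt, y))   # (X'X)^(-1) X'y
print(coef)   # first row is the intercept, second row the slope
```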

Read:

  • Matrix Algebra: An Introduction:
      • Chapter 1 Introduction
      • Chapter 2 Elementary Operations and the Inverse of a Matrix
          • The Inverse of a Square Matrix, pp. 33-35.
          • Application of the Inverse of a Matrix to the Solution of a System of Equations, pp. 38-41.
          • Application in Regression Analysis, pp. 41-46.

Datasets:

  • U.S. Air Pollution Dataset (self-extracting archive).
  • U.S. Air Pollution Dataset (same as above, zip archive).

Labs:

  • 5 Matrix Algebra.

Problem Sets:

  • 6 Matrix Algebra.

8. Multiple Regression

The regression model and its OLS fit, confidence intervals and statistical tests, when to drop or keep regressors, interpretation of regression coefficients, direct and indirect effects, bias from omitting confounding regressors, path analysis, correlation in multiple regression, beta coefficients, adjusted R-squared, mechanical properties of OLS.

Read:

  • Introductory Statistics
      • Chapter 13 Multiple Regression
      • Chapter 15 Section 15-4 Correlation in Multiple Regression
  • Statistical Models..., Chapter 3 Regression Analysis, pp. 38-44
      • Multiple Regression
      • Beta Coefficients
      • Adjusted R2
      • A Multiple Regression Example
  • Guide to Econometrics, Technical Notes, "The OLS estimator has several well-known mechanical properties...", pp. 52-53.

Handouts:

  • Common Multiple Regression Notation.

Labs:

  • 6 Confounding Variables Bias.

Problem Sets:

  • 7 Multiple Regression Basics.
  • 8 Air Pollution.

9. Dummy Variables

Categorical variables, parallel lines for two categories, several categories, operationalizing a categorical variable as a set of dummy variables, the reference category method versus including all categories without a constant, main effects, interactions, fully-interactive models, ANOVA as regression with dummy variables.
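
A small sketch of the reference category method: one 0-1 column is created for every category except the reference. In a regression of y on a constant and these dummies alone, the OLS intercept equals the reference-category mean and each dummy coefficient equals that category's mean minus the reference mean, which is the sense in which one-way ANOVA can be run as a regression. The variable names and data below are made up.

```python
from statistics import mean

def make_dummies(values, reference):
    """Return (levels, rows): one 0-1 column per non-reference category."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Made-up data: a three-category variable and an outcome.
region = ["north", "south", "west", "south", "north", "west"]
y = [10.0, 14.0, 19.0, 16.0, 12.0, 21.0]

levels, dummies = make_dummies(region, reference="north")

def group_mean(level):
    return mean(yi for r, yi in zip(region, y) if r == level)

# Coefficients implied by the group means (no regression routine needed here).
intercept = group_mean("north")
effects = {level: group_mean(level) - intercept for level in levels}
print(levels, dummies[:3], intercept, effects)
```

Including a column for every category together with a constant would make the columns sum to the constant, which is why one category is omitted or the constant is dropped.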

Read:

  • Introductory Statistics, Chapter 14 Regression Extensions, pp. 434-448
      • Section 14-1 Dummy (0-1) Variables
      • Section 14-2 Analysis of Variance (ANOVA) by Regression
  • Statistical Models..., Chapter 3 Regression Analysis, pp. 63-70
      • Categorical Explanatory Variables in Regression
      • Dummy Variables with Multiple Categories
      • Regression and Two-Way ANOVA
      • Interaction Terms
  • Guide to Econometrics, Chapter 14 Dummy Variables

Handouts:

  • Interpreting Dummy Variable Coefficients.
  • The Systolic Blood Pressure Model.
  • The First Commandment for Specifying Non-Ordered Categorical Variables.

Labs:

  • 7 Dummy Variables Versus ANOVA.
  • A Stata Command for Creating Dummy Variables and Interactions.
  • dummy.ado (self-extracting archive file).
  • dummy.ado (same as above, zip archive file).

10. The F-Test and the Chow Test

Testing a joint hypothesis with the F-test, interval estimation for a parameter vector, likelihood ratio, Wald, and Lagrange multiplier tests, bootstrapping, testing hypotheses involving linear constraints on parameters, nested models, the Chow test.
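
For nested models, the F statistic compares the residual sum of squares (RSS) of the restricted and unrestricted fits. The Chow test is the special case in which the restricted model pools two subsamples and the unrestricted model fits each subsample separately. The sketch below uses the standard textbook formulas with made-up RSS values; the function names are hypothetical.

```python
def nested_f(rss_restricted, rss_unrestricted, q, n, k):
    """F statistic for q linear restrictions.

    n is the sample size and k the number of coefficients
    (including the constant) in the unrestricted model.
    """
    numerator = (rss_restricted - rss_unrestricted) / q
    denominator = rss_unrestricted / (n - k)
    return numerator / denominator

def chow_f(rss_pooled, rss_1, rss_2, n_1, n_2, k):
    """Chow test: k coefficients estimated in each of two subsamples."""
    rss_separate = rss_1 + rss_2
    # Pooling imposes k restrictions; the separate fits use 2k coefficients.
    return nested_f(rss_pooled, rss_separate, q=k, n=n_1 + n_2, k=2 * k)

# Made-up residual sums of squares.
f_nested = nested_f(rss_restricted=120.0, rss_unrestricted=100.0, q=2, n=30, k=5)
f_chow = chow_f(rss_pooled=150.0, rss_1=40.0, rss_2=60.0, n_1=20, n_2=20, k=2)
print(f_nested, f_chow)
```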

Read:

  • Statistical Models..., Chapter 3 Regression Analysis, Specialized F-Tests, pp. 44-47
  • Guide to Econometrics
      • Chapter 4 Interval Estimation and Hypothesis Testing
      • General Notes to Chapter 14
          • "Dummy variables play an important role...", p. 229.
          • "The advantage of the dummy variable variant of the Chow test...", p. 229.
          • "The Chow test as described earlier cannot be performed when...", pp. 230-231.

Handouts:

  • F-Tests for Hypotheses Involving Linear Constraints on Coefficients.
  • Chow Test Example.
  • Fully-Interactive Version of the Chow Test Example.
  • An F-Test for Ordered Categorical Variables.

Problem Sets:

  • 9 Chow Test.

11. Nonlinear Specifications

The meaning of linearity, quadratic and polynomial specifications, the exponential function and logarithmic specifications, diagnosis by inspection of residuals, RESET specification tests, whether or not R-squares of different specifications can be compared.
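
As one example of a logarithmic specification, a model y = a * x^b becomes linear in its parameters after taking logs, ln y = ln a + b ln x, and the slope b can then be read as an elasticity. The sketch below fits the logged model to made-up, noise-free data generated with a = 3 and b = 2; the function name is hypothetical.

```python
import math

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Made-up data following y = 3 * x**2 exactly.
x = [1.0, 2.0, 4.0, 8.0]
y = [3.0 * xi ** 2 for xi in x]

# Regress ln(y) on ln(x): the model is linear in the logged variables.
log_a, elasticity = fit_line([math.log(v) for v in x], [math.log(v) for v in y])
print(math.exp(log_a), elasticity)
```

Because the dependent variable here is ln y rather than y, the R-squared of this fit is not directly comparable with that of a regression of y on x.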

Read:

  • Introductory Statistics, Chapter 14, pp. 449-466
      • 14-3 Simplest Nonlinear Regression
      • 14-4 Nonlinearity Removed by Logs
      • 14-5 Diagnosis by Residual Plots
  • Statistical Models..., Chapter 4 Nonlinearities and Categorical Explanatory Variables, pp. 53-63
      • The Meaning of Linearity
      • Polynomial Regression
      • Exponential Functions
      • Logarithmic Functions
      • Comparing Alternative Model Specifications
  • Guide to Econometrics
      • Paragraph preceding Chapter 6, Section 6.3 on RESET test, p. 96
      • Chapter 6, Section 6.3 Nonlinearity
      • Chapter 6, General Notes, "Thursby and Schmidt suggest that the best variant of RESET...", p. 104.
      • Chapter 6, General Notes, Section 6.3 Nonlinearity, pp. 104-107, especially:
          • "A distinct danger in using the highest R-squared criterion...", p. 105.
          • "Recursive residuals ...", p. 105.
      • Chapter 6, Technical Notes, "Below is a summary of the more popular nonlinear functional forms...", pp. 108-109.

Handouts:

  • Exponential and Logarithm Rules.
  • Interpretation of Coefficients in Logarithmic Models.

Labs:

  • 8 Nonlinear Specification and Diagnosis.
  • reset.ado, reset.hlp, and cps78wnh.dta Stata files (self-extracting archive file).
  • same as above (zip archive file).

Problem Sets:

  • 10 Polynomial and Logarithmic Specifications.
  • 11 Alternative Specifications of the Air Pollution Model.

15. Multicollinearity

Definition of multicollinearity. Perfect multicollinearity and the indeterminacy of coefficient estimates. Near multicollinearity leads to high standard errors of regression coefficients. How to diagnose a multicollinearity problem. The variance inflation factor (VIF). Some strategies for dealing with multicollinearity. A useful formula for the variance of a regression coefficient.
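
To illustrate why near multicollinearity inflates standard errors, a standard textbook expression for the variance of coefficient j is sigma^2 / (SST_j * (1 - R_j^2)), where SST_j is the total variation in regressor j and R_j^2 comes from regressing it on the other regressors. The factor 1 / (1 - R_j^2) is the VIF. This may or may not be the exact formula the course has in mind; the function names and numbers are made up.

```python
def vif(r_squared_j):
    """Variance inflation factor for a regressor with auxiliary R-squared r_squared_j."""
    if not 0.0 <= r_squared_j < 1.0:
        raise ValueError("R-squared must be in [0, 1); 1 means perfect multicollinearity")
    return 1.0 / (1.0 - r_squared_j)

def coef_variance(sigma2, sst_j, r_squared_j):
    """Variance of coefficient j: sigma^2 / (SST_j * (1 - R_j^2))."""
    return sigma2 / (sst_j * (1.0 - r_squared_j))

# The VIF grows without bound as the auxiliary R-squared approaches 1.
for r2 in (0.0, 0.5, 0.9, 0.99):
    print(r2, vif(r2))
```

With perfect multicollinearity the auxiliary R-squared is 1, the denominator is zero, and the coefficient cannot be estimated, which the sketch signals with an error.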

Read:

  • Introductory Statistics, Chapter 15, section 15-5, Multicollinearity.
  • Statistical Models..., Chapter 5, Violations of Regression Assumptions, pp. 72-77.
  • Guide to Econometrics
      • Chapter 11, Violating Assumption Five: Multicollinearity.
      • General Notes to Chapter 11: Everything except notes on the ridge estimator, Stein estimator, or Bayesian estimator.
      • Technical Notes to Chapter 11.
      • Chapter 12, Incorporating Extraneous Information (skim).

Handouts:

  • Multicollinearity Definition.

Labs:

  • 9 Multicollinearity.
  • Tables from this lab.
  • ado and help files for this lab.

16. Heteroskedasticity

Assumptions of the linear regression model. Consequences of heteroskedastic disturbances. The GLS solution. Two-stage EGLS estimators. Diagnosing and testing for heteroskedasticity. Breusch-Pagan test. Stata's hettest command. Heteroskedasticity-consistent estimates of standard errors of regression coefficients. The robust option in Stata.
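
As a rough illustration of heteroskedasticity-consistent standard errors, the sketch below computes the slope's standard error in a bivariate regression two ways: the conventional formula, which assumes a constant disturbance variance, and the basic White form, which weights each squared residual by its squared deviation of x. It is written in Python rather than Stata, uses made-up data, and omits any finite-sample adjustment, so statistical packages may report slightly different robust values.

```python
import math

def slope_standard_errors(x, y):
    """Return (slope, conventional SE, heteroskedasticity-consistent SE)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Conventional: one common error variance s^2 = RSS / (n - 2).
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conventional = math.sqrt(s2 / sxx)
    # Robust (unadjusted White form): each observation keeps its own e_i^2.
    se_robust = math.sqrt(sum(d * d * e * e for d, e in zip(dx, resid))) / sxx
    return slope, se_conventional, se_robust

# Made-up data.
slope, se_conventional, se_robust = slope_standard_errors([1, 2, 3, 4], [2, 4, 5, 8])
print(slope, se_conventional, se_robust)
```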

Read:

  • Statistical Models..., Chapter 5, Violations of Regression Assumptions, pp. 77-87.
  • Guide to Econometrics
      • Chapter 8, Violating Assumption Three: Nonspherical Disturbances.
          • Section 8.1 Introduction
          • Section 8.2 Consequences of Violation
          • Section 8.3 Heteroskedasticity
      • General Notes to Chapter 8, Section 8.3 Heteroskedasticity.
      • Technical Notes to Chapter 8:
          • "The heteroskedasticity-consistent estimator...", p. 133
          • Section 8.3 Heteroskedasticity, pp. 134-135.

Handouts:

  • Heteroskedasticity.

Labs:

  • 10 Heteroskedasticity.
  • data, ado, and help files for this lab. (zip archive file)

20. Probit and Logit

Dichotomous dependent variables as censored latent continuous variables with critical threshold values. The probability of being above or below the critical threshold value. The likelihood function for probit and logit models. The linear probability model and the origin of logit models as data-admissible models. Interpreting logit output. Odds and odds ratios. Hypothesis testing. The likelihood ratio test.
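
A brief sketch of the logit quantities named above, using made-up coefficients rather than estimates from any course dataset: the logistic function keeps predicted probabilities between 0 and 1, the odds ratio for a one-unit change in x equals exp(b1), and the likelihood ratio statistic is twice the gap between the unrestricted and restricted log-likelihoods. All function names are hypothetical.

```python
import math

def logistic(z):
    """Logit probability: always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

def log_likelihood(b0, b1, x, y):
    """Log-likelihood of a one-regressor logit model for 0-1 outcomes y."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def likelihood_ratio_stat(ll_restricted, ll_unrestricted):
    """LR statistic; compared with a chi-square on the number of restrictions."""
    return 2.0 * (ll_unrestricted - ll_restricted)

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -1.0, 0.5
odds_ratio = odds(logistic(b0 + b1 * 3.0)) / odds(logistic(b0 + b1 * 2.0))
print(odds_ratio, math.exp(b1))   # the two printed values should agree
```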

Read:

  • Statistical Models..., Chapter 6, Linear Probability, Logit, and Probit Models, pp. 104-123.
  • Guide to Econometrics, Chapter 15, Qualitative and Limited Dependent Variables:
      • Section 15.1, Dichotomous Dependent Variables
      • General Notes to Chapter 15, Section 15.1
      • Technical Notes to Chapter 15, Section 15.1

Handouts:

  • Probit and Logit Models.
  • Figure 2 for prior handout (.bmp file).
  • Likelihood Ratio Test.

Labs:

  • 11 Evaluating Probit Models.
  • PSID76.dta for Lab 11 (self-extracting archive file).
  • 12 Evaluating Logit Models.
  • cancer.exe for Lab 12 (self-extracting archive file).

Problem Sets:

  • 12 Mortgage Choice.
  • Data set for Problem 12 (self-extracting archive file).