CLASS SESSIONS
Monday, June 13, 2011 – Friday, June 17, 2011

1:00PM to 5:00PM

INSTRUCTOR
Katherine M. Keyes, PhD MPH
Columbia University Epidemiology Merit Fellow
(212) 543-5002

722 West 168th Street, 2nd Floor, Suite 229C

COURSE DESCRIPTION

This course will provide participants with practical skills to analyze data arising from complex epidemiologic sampling designs. Complex survey data violate typical assumptions about simple random samples of independent observations, thus requiring specialized statistical techniques. National Health and Nutrition Examination Survey (NHANES) data will be used for applied demonstrations, illustrating concepts applicable to all data sets arising from complex survey designs. We will discuss the theory behind complex sampling strategies and the necessity of applying appropriate statistical techniques to analyze these data and make valid inferences. We will demonstrate the appropriate use of sampling weights in the NHANES data and how the appropriate weight is specific to the research question being asked. We will demonstrate how to obtain basic descriptive statistics, appropriate variance estimates, regression parameters, and survival analysis output in SAS and SAS-callable SUDAAN software.
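As an informal illustration of why sampling weights and design information matter, the sketch below computes a weighted mean and a Taylor-linearized standard error that accounts for strata and primary sampling units (PSUs), treating PSUs as sampled with replacement. It is written in Python only for illustration and is not the SAS or SUDAAN code used in class; the function name `weighted_mean_se` and the toy records are hypothetical and invented for this example.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_se(records):
    """Weighted mean and linearized SE for (stratum, psu, weight, y) records.

    Assumes a stratified cluster design with PSUs treated as sampled
    with replacement (a common approximation; stated here as an assumption).
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w

    # Linearized contribution of each observation, summed within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / total_w

    # Between-PSU variability within each stratum, summed over strata.
    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in psus.values())
    return mean, sqrt(variance)


# Toy data, made up for illustration: (stratum, psu, weight, outcome)
toy = [
    (1, 1, 100.0, 120.0), (1, 1, 100.0, 124.0),
    (1, 2, 300.0, 130.0), (1, 2, 300.0, 134.0),
    (2, 1, 200.0, 110.0), (2, 1, 200.0, 114.0),
    (2, 2, 400.0, 140.0), (2, 2, 400.0, 144.0),
]

mean, se = weighted_mean_se(toy)
unweighted_mean = sum(record[3] for record in toy) / len(toy)
print(f"unweighted mean: {unweighted_mean:.1f}")
print(f"weighted mean:   {mean:.1f}")
print(f"design-based SE: {se:.3f}")
```

In this toy example the weighted mean (131.0) differs from the unweighted mean (127.0) because respondents with larger weights represent more of the population. The standard error is built from variation between PSU totals within strata rather than from individual observations.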

COURSE LEARNING OBJECTIVES

1. Understand the theoretical justifications for complex survey designs, and why special statistical procedures are needed.

2. Learn how to find publicly available data online, download and organize datasets, and manipulate data for analysis.

3. Use SUDAAN software to analyze complex survey data, and be able to do basic frequency and crosstab analysis, linear and logistic regression, and Cox proportional hazards models.

RECOMMENDED COURSE READING LIST

The recommended textbook is:

Research Triangle Institute (2008). SUDAAN Language Manual, Release 10.0. Research Triangle Park, NC: Research Triangle Institute.


COURSE STRUCTURE

Class time is 20 hours total. The structure of each class will be:

1:00-2:00 Lecture

2:00-3:30 Guided exercise

3:30-3:45 Break

3:45-5:00 Independent research project

Each class is divided into three parts:

Lecture. Dr. Keyes will give an overview of the topic of the day, review basic statistical theory and practice, and introduce the use of SUDAAN software for that topic.

Guided exercise. Dr. Keyes will provide example code for the topics of the day and an in-class assignment sheet. Students will run the code provided and answer the in-class assignment questions. We will discuss the answers as a group.

Independent research project. On the first day of class, students will identify a research question of interest in the NHANES data. By the end of the week, students will have completed an analysis of this research question on their own using the tools discussed in lecture.

COURSE SCHEDULE

Session 1 – Introduction to complex survey data
6-13-11 / Lecture:
Discuss the structure of common complex sampling designs
Discuss finding publicly available data, downloading and organizing data
Discuss merging data in SAS and managing and manipulating study variables
In-Class Assignment: Locate variables, download data, append and merge data, identify, recode, and evaluate missing data in SAS
Independent practice: Select a research question in NHANES that will guide your independent practice throughout the week. Locate variables, download data, append and merge data, identify, recode, and evaluate missing data in SAS.
Required reading:
NHANES general data release document: http://www.cdc.gov/nchs/data/nhanes/nhanes_05_06/general_data_release_doc_05_06.pdf
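The Session 1 data-management steps (merge files, recode special codes to missing, count missing values) can be sketched informally as follows. The sketch uses Python only for illustration; in class these steps are done in SAS. It assumes the files are linked by the respondent sequence number SEQN, and it assumes that codes 7 and 9 mean "refused" and "don't know" for the example variable; the toy rows are invented, and the actual codes should always be confirmed in the codebook for each variable.

```python
# Assumed special codes for this example variable; confirm in the codebook.
REFUSED_OR_UNKNOWN = {7, 9}

# Toy rows, made up for illustration (not real NHANES records).
demographics = [
    {"SEQN": 1, "RIAGENDR": 1, "DMDEDUC2": 3},
    {"SEQN": 2, "RIAGENDR": 2, "DMDEDUC2": 9},
    {"SEQN": 3, "RIAGENDR": 2, "DMDEDUC2": 7},
]
examination = [
    {"SEQN": 1, "BPXSY1": 118.0},
    {"SEQN": 3, "BPXSY1": None},
]


def merge_on_seqn(left, right):
    """Left-join two lists of row dicts on the SEQN identifier."""
    right_by_id = {row["SEQN"]: row for row in right}
    merged = []
    for row in left:
        combined = dict(row)
        combined.update(right_by_id.get(row["SEQN"], {}))
        merged.append(combined)
    return merged


def recode_missing(rows, variable, missing_codes):
    """Return copies of rows with the listed codes replaced by None."""
    recoded = []
    for row in rows:
        value = row.get(variable)
        recoded.append({**row, variable: None if value in missing_codes else value})
    return recoded


merged = recode_missing(
    merge_on_seqn(demographics, examination), "DMDEDUC2", REFUSED_OR_UNKNOWN
)

# Count missing values per variable; a row without the key counts as missing.
n_missing = {
    var: sum(row.get(var) is None for row in merged)
    for var in ("DMDEDUC2", "BPXSY1")
}
print(n_missing)
```

Keeping every respondent from the left-hand file, rather than dropping unmatched rows, makes it easy to see how many people lack an examination record before any analysis decisions are made.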
Session 2 – Univariate statistics
6-14-11 / Lecture:
Introduction to SUDAAN software basic language
Create macros in SAS to output large quantities of univariate statistics
Review application of basic statistical tests (what tests, when, why, assumptions)
In-Class Assignment: Practice the CROSSTAB, RATIO, and DESCRIPT procedures in SUDAAN to do univariate analysis and bivariate statistics including chi-square and t-test
Independent practice: Students will create a table of univariate and bivariate statistics associated with their research question of interest, categorize variables appropriately, and use chi-square and t-tests. Students will also extract and code potential confounders such as sex, race, age, and income.
Session 3 – Regression
6-15-11 / Lecture:
Introduction to linear, logistic, Poisson, and polytomous regression procedures in SUDAAN
Review application of regression procedures (what procedure, when, why, assumptions)
Discuss variable categorization (when are categories necessary, how should you categorize your variables)
In-Class Assignment: REGRESS, RLOGIST, MULTILOG, and LOGLINK procedures in SUDAAN
Independent practice: Students will conduct and interpret regression procedures related to their research question of interest.
Session 4 – Survival analysis
6-16-11 / Lecture:
Introduction to survival analysis and Cox proportional hazards models in SUDAAN
Discuss differences between Kaplan-Meier and life table
Discuss assumptions of Cox proportional hazards models
In-Class Assignment: KAPMEIER, SURVIVAL procedures in SUDAAN
Independent practice: Students will conduct and interpret survival analyses related to their research question of interest.
Session 5 – Special procedures and options
6-17-11 / Lecture:
Discuss data imputation: theories and assumptions
Discuss weight development
In-Class Assignment: HOTDECK for imputation and WTADJUST for creating weights
Independent practice: Students will finish their independent research project and practice imputing data for at least one of their variables of interest.
