In-Class Group Project Part 1

(due at the end of class)

The in-class data analysis project will examine the relationship between race and prevalent diabetes. The data for this project come from the 2007 Behavioral Risk Factor Surveillance System (BRFSS), available on the CDC website, for the states of South Carolina and Pennsylvania. A subset of the BRFSS dataset has been placed on the class website.

Part 1 of the project focuses on becoming familiar with the dataset and on understanding the study population and study design of the BRFSS. Before class, you should have read Chapter 3, “Choosing the Study Subjects: Specification, Sampling, and Recruitment.”

  1. Getting the data and data dictionary
     1. Download the class dataset, called ‘scpadata’, from the class webpage. Remember that you need a libname statement in SAS to use the data.
     1. Create a temporary dataset called ‘scpa’ from scpadata.
     1. Download the 2007 data dictionary from the CDC website. It is called ‘codebook_07’.
     1. Download the BRFSS User Guide.
  1. The data
     1. How many observations are there in the dataset?
     1. How many variables are there in the dataset?
     1. How many observations are there for South Carolina and for Pennsylvania? (You will need to look up the state codes in the data dictionary.)
     1. Which state has more people in the database with diabetes? Which state has a higher prevalence of diabetes?
     1. What are the mean and median ages of the people from SC and PA in the database?
  1. The study population
     1. Define the target population for the BRFSS. Include the inclusion and exclusion criteria.
     1. Define the accessible, or source, population for the BRFSS. Include the inclusion and exclusion criteria that distinguish the source population from the target population. Why do you think this source population was chosen? What are its strengths and weaknesses?
     1. List the groups of people who are systematically excluded from the BRFSS source population.
     1. What is a sampling frame? What sampling frame is used for the BRFSS?
     1. Define the intended sample for the BRFSS.
     1. Define the actual sample for the BRFSS.
     1. What was the overall response rate for the BRFSS in 2007? What were the response rates in SC and PA?
     1. In general, why are data weighted? In the BRFSS, which variables are used in the weighting, and why?
     1. Is the BRFSS population-based? What is the difference between population-based and clinical study populations?
     1. In general, are results from a population-based study or from a clinic-based study more generalizable? Does generalizability affect internal or external validity?
     1. In what types of studies would it be more important to use a population-based study population? Why?
     1. If you invite members of the community to participate in a cohort study, is this population-based research? Please explain your answer.
  1. The study design
     1. What is the study design for the BRFSS?
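
For the “Getting the data and data dictionary” and “The data” questions, the SAS program below is a minimal starting sketch, not the required solution. It assumes scpadata was saved as a SAS dataset in a local folder; the folder path and the libref `classdat` are hypothetical placeholders. The variable names (`_STATE`, `DIABETE2`, `AGE`, and the survey design variables `_FINALWT`, `_STSTR`, `_PSU`) and the AGE codes 7 (don't know) and 9 (refused) are assumed to follow the 2007 BRFSS codebook. Confirm them in codebook_07 and in the PROC CONTENTS output, because the class subset may name or code them differently.

```sas
/* Hypothetical folder: replace with the location where you saved scpadata. */
libname classdat "C:\MyFolder\BRFSS";

/* Create a temporary (WORK library) copy of the class dataset. */
data scpa;
    set classdat.scpadata;
run;

/* The attributes table reports the number of observations and variables. */
proc contents data=scpa;
run;

/* Observations per state; look up the _STATE codes in codebook_07. */
proc freq data=scpa;
    tables _state;
run;

/* Diabetes responses by state: counts and unweighted row percentages. */
proc freq data=scpa;
    tables _state*diabete2 / nocol nopercent;
run;

/* Assumed codes 7 (don't know) and 9 (refused) are set to missing
   so they are not averaged as if they were real ages. */
data scpa_age;
    set scpa;
    if age in (7, 9) then age = .;
run;

/* Mean and median age for each state. */
proc means data=scpa_age n mean median;
    class _state;
    var age;
run;

/* Optional: weighted row percentages using the survey design variables,
   assuming _FINALWT, _STSTR and _PSU are included in the class subset. */
proc surveyfreq data=scpa;
    strata _ststr;
    cluster _psu;
    weight _finalwt;
    tables _state*diabete2 / row;
run;
```

PROC FREQ reports unweighted counts and percentages, which describe the respondents in the file. The PROC SURVEYFREQ step applies the BRFSS weights and design variables, so its row percentages are weighted estimates rather than simple sample proportions; comparing the two is one way to see why weighting matters.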