BIOS 500 PROJECT

FALL 2004

Overview

Why: One of the objectives of this course is to give you the opportunity to acquire the skills necessary for the appropriate statistical analysis of real data – the kind you will use for your thesis and later in the workplace. You will need to formulate research questions, translate them into statistical hypotheses, choose an appropriate statistical method to investigate the hypotheses, prepare the relevant data and, finally, draw conclusions. At each step, decisions will have to be made based on scientific, statistical thinking. Lecture and lab presentations, homework and exams are structured with this objective in mind. However, the skills learned can only be synthesized when you get your hands on a real dataset and actually perform a “start-to-finish” analysis.

Who: Students will work in groups of four to complete this assignment. Groups will be assigned by random selection from all sections of the BIOS 500 course. Collaborative research is a workplace reality, and many players have a hand in the final project reports and journal articles. We therefore anticipate that you will benefit in multiple ways from working on this project with your fellow students.

What: We have prepared a variety of datasets for you to choose from. You and your group will look over the available data and come up with research questions that you are interested in answering. Each student in the group will have primary responsibility for developing and testing one research question. However, all members will review the methods and accuracy of the analyses performed by the others in the group and participate in the final product. In the course of the project you will cover the following processes:

  1. formulating the general research questions
  2. formally stating the statistical hypotheses you wish to test
  3. cleaning the data
  4. manipulating the data (creating new variables, recoding old variables, subsetting)
  5. producing basic descriptive statistics
  6. examining the data to decide on the most appropriate data analysis procedures
  7. analyzing the data using SAS (no other statistical software may be used)
  8. drawing conclusions (on the basis of your sample, making inferences about the population from which you have sampled)
  9. writing a clear and succinct report describing your methods, results, and conclusions.

When: This project will be completed over the course of the semester. As you learn new statistical tools you will be using them to complete the steps necessary to draw conclusions about your hypotheses. Here is the tentative timeline:

Sep 19-26: Form your groups. Review datasets posted on web.

Sep 27-Oct 3: Group should meet to choose a dataset, formulate research questions and complete page 5 of this document. Group should sign up for an appointment with a BIOS 500 faculty member (signup sheet will be posted outside GCR 336).

Oct 4-15: Meet with BIOS 500 faculty member to discuss research questions.

Oct 16-28: Perform data cleaning and descriptive statistical analyses for relevant variables.

Oct 29: Turn in tables showing basic descriptive statistics to your BIOS 500 instructor.

Oct 30-Dec 6: Perform each of the four statistical hypothesis tests. Write report.

Dec 7: Turn in report to your BIOS 500 instructor.

Work on this project will primarily be done outside of lab hours.

Grading: Lab counts for 25% of your BIOS 500 grade: the lab homework is worth 9%, the lab quiz 1% and this project 15%.

Instructions

There are a number of datasets posted on the project section of the lab website. They represent both different health topics and different types and sizes of studies. More information about each dataset is posted on the web. Please read the descriptions of the datasets and variables, get together with your group and discuss possible research questions of interest and translate your research questions into statistical hypotheses. Research questions will be stated fairly generally, while the statistical hypotheses will be more specific, detailed and technical statements involving statistical parameters whose values are to be investigated. For example, based on one of the datasets, you might formulate the following research question and corresponding statistical hypothesis: “I think there will be a relationship between education level and eating more fruits and vegetables. Specifically, I hypothesize that the average daily consumption of fruits and vegetables will be higher for high school graduates than for people who have not completed high school.”
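As a sketch of this translation, the example above could be written formally as follows. The symbols are illustrative assumptions, not taken from any dataset documentation: $\mu_{HS}$ and $\mu_{NHS}$ stand for the population mean daily consumption of fruits and vegetables among high school graduates and among people who have not completed high school.

```latex
\[
H_0:\ \mu_{HS} = \mu_{NHS}
\qquad \mbox{versus} \qquad
H_A:\ \mu_{HS} > \mu_{NHS}
\]
```

The null hypothesis states that the two population means are equal. The alternative is one-sided here because the research question predicts a direction (higher consumption among graduates).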

Your group must choose only one dataset to explore, so this may take some diplomacy and persuasion. Eventually you must agree on 4 hypotheses of most interest to all of you, using only one of the datasets. We encourage you to look at the literature (try out Medline and other library services) and talk with students outside your group.

Two types of hypotheses must be included. The first might be described as comparing an average value of a trait with a benchmark value. Examples of this would be: “I think that the average cholesterol level in Asian immigrant males in our study is lower than 200 mg/dl” and “I think that individuals with type A personalities exercise more than the national norm.”
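For instance, the cholesterol example could be stated in one common form as follows, where $\mu$ is an illustrative symbol for the population mean cholesterol level (mg/dl) among Asian immigrant males and 200 is the benchmark value from the example:

```latex
\[
H_0:\ \mu = 200
\qquad \mbox{versus} \qquad
H_A:\ \mu < 200
\]
```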

The second kind of hypothesis we want you to include is one that compares the average value of a trait for two groups in the study. Examples of this would be: “I think that the average blood pressure of Asian immigrant males will be lower than that of European immigrant males (where data for both groups are included in your study)” or “If we compare offspring of mothers who drank alcohol in the first trimester to offspring of mothers who did not, the average birth weight will be higher in the non-drinking group (where data for both groups are included in your study).”

Please consider substantial (non-trivial) issues when developing your hypotheses; in other words, each hypothesis should have the potential to affect some aspect of public health.

Once the group has made some tentative decisions, fill out the attached form (page 5 of this document) and set a time to meet with a BIOS 500 faculty member. Signup sheets will be posted outside GCR 336. The faculty member will help you refine your research questions and statistical hypotheses, if needed.

Please complete this sheet and bring it with you when the group meets with the BIOS 500 faculty member.

Group Members:

Name          Email          Phone #

1.

2.

3.

4.

Dataset Selected:

Research Question 1: ______

Corresponding Statistical Hypotheses: ______

______

Research Question 2: ______

Corresponding Statistical Hypotheses: ______

______

Research Question 3: ______

Corresponding Statistical Hypotheses: ______

______

Research Question 4: ______

Corresponding Statistical Hypotheses: ______

______
Format of Final Report (to be turned in by 4:30pm Tuesday, Dec 7, 2004)

We would like the report to resemble a journal article. Please use the journal Epidemiology as your guide. It should contain the following sections:

  • Title page with alphabetical list of authors
  • Abstract
  • Introduction and brief literature review that provide some background and motivation for your hypotheses
  • Methods
  • Results, including table(s) and figure(s)
  • Discussion and conclusions, including limitations and implications
  • References

This report should be a maximum of 5 typed pages, not including the SAS log. For consistency, please use Times New Roman font, size 12, double-spaced.

A man who uses a great many words to express his meaning is like a bad
marksman who, instead of aiming a single stone at an object, takes up
a handful and throws at it in hopes he may hit.

-Samuel Johnson,
lexicographer (1709-1784)

Additionally, we require the SAS log recording your run of the program in which you prepare your data and perform all analyses (more on that below).

The SAS log. We anticipate that each group member will have prepared clean variables, created the variables needed for their analyses, and run a statistical test. Once each person’s program has been debugged, please collate the programs into a single program (one data step, multiple procedures). This program must follow the guidelines we give in the lab notebook (page 18), with header, comments, spacing, etc. Run the program, print the log and attach it to your report.
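As a rough illustration only, a collated program with one data step followed by several procedures might be laid out like the sketch below. Everything specific in it is a hypothetical placeholder: the file name, library path, dataset name, variable names (chol, educ, fruitveg, hsgrad) and the benchmark of 200. The header and comment conventions you must actually follow are the ones in the lab notebook, not the ones shown here.

```sas
/*--------------------------------------------------------------
  Program:  project.sas        (hypothetical file name)
  Authors:  group members, listed alphabetically
  Purpose:  Prepare the data and run all hypothesis tests
  Note:     All dataset and variable names are placeholders
--------------------------------------------------------------*/

libname proj 'C:\bios500\project';    /* hypothetical path */

/* Single data step: all cleaning, recoding and new variables */
data work.analysis;
   set proj.rawdata;                  /* hypothetical dataset */

   /* Cleaning: set impossible cholesterol values to missing */
   if not missing(chol) and chol <= 0 then chol = .;

   /* New variable: 1 = high school graduate, 0 = not,
      left missing when education is missing */
   if educ >= 12 then hsgrad = 1;
   else if not missing(educ) then hsgrad = 0;
run;

/* Basic descriptive statistics for the analysis variables */
proc means data=work.analysis n nmiss mean std min max;
   var chol fruitveg;
run;

/* One-sample t test: mean cholesterol against a benchmark of 200 */
proc ttest data=work.analysis h0=200;
   var chol;
run;

/* Two-sample t test: fruit and vegetable intake by hsgrad group */
proc ttest data=work.analysis;
   class hsgrad;
   var fruitveg;
run;
```

Keeping all data preparation in the single data step means every procedure reads the same cleaned dataset, so the log shows one preparation pass followed by each group member's analysis. By default PROC TTEST reports two-sided p-values, so a directional hypothesis needs to be interpreted accordingly.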
