Data Set 5

Low birth weight has concerned physicians for years because infant mortality rates and birth defect rates are very high for low birth weight babies. A woman's behaviour during pregnancy (including diet, smoking habits, and receiving prenatal care) can greatly alter the chances of carrying the baby to term and, consequently, of delivering a baby of normal birth weight.

The goal of this study[1] was to identify risk factors associated with giving birth to a low birth weight baby (weighing less than 2500 grams). Data were collected on 189 women, 59 of whom had low birth weight babies and 130 of whom had normal birth weight babies. The observed predictor variables have been shown to be associated with low birth weight in the obstetrical literature; the study aimed to ascertain whether these variables were also important in the population served by the medical centre where the data were collected.

Description of data:

ID Identification Code

LOW Low Birth Weight (0 = Birth Weight >= 2500 g, 1 = Birth Weight < 2500 g)

AGE Age of the Mother in Years

LWT Weight of Mother in Pounds at the Last Menstrual Period

RACE Race (1 = White, 2 = Black, 3 = Other)

SMOKE Smoking Status During Pregnancy (1 = Yes, 0 = No)

PTL History of Premature Labor (0 = None, 1 = One, etc.)

HT History of Hypertension (1 = Yes, 0 = No)

UI Presence of Uterine Irritability (1 = Yes, 0 = No)

FTV Number of Physician Visits During the First Trimester

BWT Birth Weight in Grams


Preliminary analysis:

The initial suggestion for this study was to do a logistic regression using LOW as the binary response. However, a more standard approach would be to do a linear regression using the continuous variable BWT as the response. In the analysis section of your report, describe why the latter approach is preferable.
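
As a rough illustration of the two modelling options, the sketch below fits both a linear regression on BWT and a logistic regression on LOW using only NumPy. The data are SYNTHETIC and generated purely for illustration (they are not the study data), only two of the predictors are mimicked, and the helper names `fit_ols` and `fit_logistic` are hypothetical. Note that LOW is constructed from BWT by the 2500 g cut-off, exactly as in the variable description.

```python
import numpy as np

# SYNTHETIC data for illustration only; these are NOT the study data.
rng = np.random.default_rng(0)
n = 189
lwt = rng.normal(130, 30, n)        # mother's weight in pounds (made up)
smoke = rng.integers(0, 2, n)       # smoking indicator (made up)
bwt = 2400 + 4.0 * lwt - 280 * smoke + rng.normal(0, 700, n)  # grams (made up)

# LOW is a dichotomised version of BWT, per the variable description.
low = (bwt < 2500).astype(float)

# Design matrix: intercept, centred LWT, SMOKE.
X = np.column_stack([np.ones(n), lwt - lwt.mean(), smoke])


def fit_ols(X, y):
    """Least-squares coefficients for a linear regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def fit_logistic(X, y, n_iter=25):
    """Logistic regression coefficients via Newton-Raphson iterations."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        grad = X.T @ (y - p)                  # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


beta_ols = fit_ols(X, bwt)           # effects on birth weight in grams
beta_logit = fit_logistic(X, low)    # effects on the log-odds of LOW = 1
p_hat = 1.0 / (1.0 + np.exp(-X @ beta_logit))

print("OLS coefficients (grams):", beta_ols)
print("Logistic coefficients (log-odds):", beta_logit)
```

The two sets of coefficients are on different scales (grams versus log-odds), so they are not directly comparable; the sketch is only meant to show how each response would enter a model.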

The challenging part of this project is to formulate the question of interest such that it is answerable using statistical methods. In particular, since there is correlation among the predictor variables, we cannot assess the importance of each variable separately. Note that there is no one “right” solution here; you are simply required to propose a reasonable approach.
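
One possible way to gauge how strongly the predictors are correlated is to inspect their correlation matrix and variance inflation factors (VIFs). The sketch below is a minimal NumPy version under stated assumptions: the predictor values are SYNTHETIC stand-ins for AGE, LWT and SMOKE, and the helper `vif` is a hypothetical name, not part of any library. It is not a proposed solution to the project, only one diagnostic that could inform it.

```python
import numpy as np


def vif(P):
    """Variance inflation factor for each column of the predictor matrix P.

    Column j is regressed on all other columns (plus an intercept), and
    VIF_j = 1 / (1 - R_j^2).
    """
    P = np.asarray(P, dtype=float)
    n, k = P.shape
    out = np.empty(k)
    for j in range(k):
        y = P[:, j]
        others = np.column_stack([np.ones(n), np.delete(P, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        centred = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centred @ centred)
        out[j] = 1.0 / (1.0 - r2)
    return out


# SYNTHETIC predictors for illustration only; these are NOT the study data.
rng = np.random.default_rng(1)
n = 189
age = rng.normal(23, 5, n)
lwt = 100 + 1.3 * age + rng.normal(0, 25, n)   # built to correlate with age
smoke = rng.integers(0, 2, n).astype(float)

P = np.column_stack([age, lwt, smoke])
corr = np.corrcoef(P, rowvar=False)   # pairwise correlations among predictors
vifs = vif(P)

print("Correlation matrix:\n", corr)
print("VIFs (AGE, LWT, SMOKE):", vifs)
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values indicate that its coefficient is estimated less precisely because of overlap with the remaining predictors.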


[1] Hosmer and Lemeshow, Applied Logistic Regression, Wiley (1989).