STAT 601 ~ Assignment #1 (Due Monday, June 4th by 11:59 PM)
75 points
Note: this assignment has been shortened from the one shown in the walk-through; problems 2, 5, and 8 have been removed.
Review the following:
Narrated PowerPoint Lectures under headings 1 – 4
Non-narrated PowerPoint Lectures under headings 1 – 4 (optional)
Lecture Handouts 1 – 4 (both blank and annotated versions are available)
1. An excerpt taken from “Lactation suppression: a pilot study” published in the Australian Journal of Advanced Nursing (1986) is as follows:
“The first method (experimental group) was based on a new approach which claimed that post-partum breast discomfort in non-breastfeeding women would be reduced if milk was extracted from the breasts. The second approach (hospital policy) involved adhering to the normal hospital methods. … Non-breastfeeding women admitted to Ward A were assigned to the experimental group (n = 95) and non-breastfeeding women admitted to Ward B acted as the control group or hospital policy group (n = 57). … Three times daily, participants entered details on the questionnaire relating to self-assessment of pain, use of analgesia and methods employed in suppressing lactation.”
Describe any design problems that are evident. Also discuss biases or confounding that might have arisen from any design deficiencies. (4 pts.)
3. An experiment was carried out to measure the effect of growth hormone on girls affected by a growth disorder called Turner’s syndrome. All 34 girls in the study were given the growth hormone. Their heights were measured at the time the hormone was first administered and again one year later. What are some problems with a study conducted in this way? (2 pts.)
4. A survey of nurse managers at major hospitals was taken. The survey revealed that one nurse manager in five was under medication for stress and almost half had visited doctors because of the pressure they were under. These figures came from the 250 questionnaires returned from the 2500 that were sent out. How reliable do you think the results are and why? (3 pts.)
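As a starting point for question 4, the response rate can be worked out directly from the two counts given in the problem. The short sketch below does only that arithmetic; the function name response_rate is a hypothetical helper, not something from the course materials.

```python
def response_rate(returned, sent):
    """Return the percentage of questionnaires that came back."""
    return 100 * returned / sent


# Counts taken from the problem statement: 250 returned out of 2500 sent.
rate = response_rate(250, 2500)
print(f"Response rate: {rate:.1f}%")
```

The calculation says nothing by itself about who chose to respond; that judgment is the point of the question.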
6. As part of a clinical study several variables were measured for each patient. Classify each of the variables measured according to its data type (C = Continuous, O = Ordinal or N = Nominal). (6 pts.)
a) Age
b) Gender
c) Marital Status
d) Race
e) Previous hospitalization?
f) Anxiety Score
g) Cholesterol level
h) Smoking status
i) Alcohol consumption
j) Family history of cancer?
k) Blood pressure
l) White blood cell count
7. Using the article labeled “VA Cost Study” on the D2L site, answer the following questions using Table 1 on pg. 1378 of the paper. (8 pts.)
a) Classify each of the variables according to data type (C = continuous, O = ordinal, N = nominal).
b) How many survey responders were there? Nonresponders?
c) How many subjects and what percentage of the responders had a prior stroke?
d) Which poststroke Modified Rankin Scale score was most prevalent among the responders?
e) How many subjects and what percentage of the nonresponders had cognitive impairment at discharge?
f) On which characteristics/variables do the responders and nonresponders differ significantly? Later in the course we will examine different statistical methods that are appropriate for making these types of decisions based upon our data.
Problems 9 – 12 - Right Heart Catheterization
EFFECTIVENESS OF RIGHT HEART CATHETERIZATION IN CRITICALLY ILL PATIENTS (JAMA, 1996), Connors et al.
An excerpt from the abstract:
OBJECTIVE: To examine the association between the use of right heart catheterization (RHC) during the first 24 hours of care in the intensive care unit (ICU) and subsequent survival, length of stay, intensity of care, and cost of care.
DESIGN: Prospective cohort study conducted at U.S. teaching hospitals between 1989 and 1994.
SUBJECTS: A total of 5735 critically ill adult patients receiving care in an ICU for 1 of 9 pre-specified disease categories (MOSF w/ sepsis, MOSF w/ malignancy, lung cancer, COPD, coma, colon cancer, cirrhosis, CHF, ARF).
The variables that make up this database are described below:
Demographics and Disease Category
Variable name   Variable definition
Age             Age
Sex             Sex
Race            Race (white, black, other)
Edu             Years of education
Income          Income (Under $11k, $11 – $25k, $25 – $50k, > $50k)
Ninsclas        Medical insurance (Private, Private & Medicare, No Insurance, Medicare, Medicaid, Medicaid & Medicare)
Cat1            Primary disease category (MOSF w/ sepsis, MOSF w/ malignancy, lung cancer, COPD, coma, colon cancer, cirrhosis, CHF, ARF)
Categories of Admission Diagnosis
Diagnosis variables are all coded as (Y or N)
Resp            Respiratory Diagnosis
Card            Cardiovascular Diagnosis
Neuro           Neurological Diagnosis
Gastr           Gastrointestinal Diagnosis
Renal           Renal Diagnosis
Meta            Metabolic Diagnosis
Hema            Hematologic Diagnosis
Seps            Sepsis Diagnosis
Trauma          Trauma Diagnosis
Ortho           Orthopedic Diagnosis
Das2d3pc        DASI (Duke Activity Status Index)
Dnr1            DNR status on day 1 (Yes or No)
Ca              Cancer (Yes, No, Metastatic)
Surv2md1        Logistic model estimate of the probability of surviving 2 months
dth30           Patient died within 30 days? (Yes or No)
Aps1            APACHE score
Scoma1          Glasgow Coma Score
Wtkilo1         Weight
Temp1           Temperature
Meanbp1         Mean blood pressure
Resp1           Respiratory rate
Hrt1            Heart rate
Pafi1           PaO2/FIO2 ratio
Paco21          PaCO2
Ph1             pH
Wblc1           WBC
Hema1           Hematocrit
Sod1            Sodium
Pot1            Potassium
Crea1           Creatinine
Bili1           Bilirubin
Alb1            Albumin
Categories of Comorbidities Illness (0 = no, 1 = yes)
Cardiohx        Acute MI, Peripheral Vascular Disease, Severe Cardiovascular Symptoms (NYHA-Class III), Very Severe Cardiovascular Symptoms (NYHA-Class IV)
Chfhx           Congestive Heart Failure
Dementhx        Dementia, Stroke or Cerebral Infarct, Parkinson’s Disease
Psychhx         Psychiatric History, Active Psychosis or Severe Depression
Chrpulhx        Chronic Pulmonary Disease, Severe Pulmonary Disease, Very Severe Pulmonary Disease
Renalhx         Chronic Renal Disease, Chronic Hemodialysis or Peritoneal Dialysis
Liverhx         Cirrhosis, Hepatic Failure
Gibledhx        Upper GI Bleeding
Malighx         Solid Tumor, Metastatic Disease, Chronic Leukemia/Myeloma, Acute Leukemia, Lymphoma
Immunhx         Immunosuppression, Organ Transplant, HIV Positivity, Diabetes Mellitus Without End Organ Damage, Diabetes Mellitus With End Organ Damage, Connective Tissue Disease
Transhx         Transfer (> 24 Hours) from Another Hospital
Amihx           Definite Myocardial Infarction
Swang1*         Right Heart Catheterization (RHC vs. No RHC)
Sadmdte         Study Admission Date
Dthdte          Date of Death
Lstctdte        Date of Last Contact
Dschdte         Hospital Discharge Date
Death*          Death at any time up to 180 Days (Yes or No)
Ptid            Patient ID (for labeling purposes only)
The dataset in JMP format is available under the Datasets section of the website and is called Right Heart Catheterization.
QUESTIONS AND TASKS
9) Complete the following table for comparing and contrasting the two treatment groups in this study on the following demographics. (10 pts.)
Characteristics          No RHC (n = 3551)      RHC (n = 2184)
of the Subjects            n          %           n          %
Sex
  Female
  Male
Race
  Black
  White
  Other
Income
  < $11,000
  $11,000 - $25,000
  $25,000 - $50,000
  > $50,000
10) Use appropriate descriptive methods to examine the distribution of the APACHE score (variable name = aps1). Discuss and/or report the following summary statistics:
a) Mean, median, range, SD, 25th percentile (Q1), 75th percentile (Q3), interquartile range (IQR), and the coefficient of variation (CV) (3 pts.)
b) How would you characterize the distributional shape for the APACHE scores? (1 pt.)
c) Which is a better measure of typical value, the mean or the median, or doesn’t it matter? Explain. (2 pts.)
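For reference, the summary statistics named in part (a) of question 10 are defined as in the sketch below. This is an illustration only, using Python's standard library and a small made-up list of scores rather than the aps1 values. The helper name describe is hypothetical, and the quartile rule used here ("inclusive") is one of several conventions, so JMP may report slightly different Q1 and Q3 values on real data.

```python
import statistics


def describe(values):
    """Return the summary statistics listed in part (a) for a list of numbers."""
    ordered = sorted(values)
    mean = statistics.mean(ordered)
    sd = statistics.stdev(ordered)  # sample SD (n - 1 in the denominator)
    # Quartiles by the "inclusive" rule; other software may use a different rule.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "mean": mean,
        "median": median,
        "range": ordered[-1] - ordered[0],
        "sd": sd,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "cv_percent": 100 * sd / mean,  # CV = SD as a percentage of the mean
    }


# Made-up scores for illustration only; these are NOT values from the study.
scores = [20, 35, 41, 48, 54, 60, 62, 75, 90]
summary = describe(scores)
for name, value in summary.items():
    print(f"{name:>10}: {value:.2f}")
```

Comparing the mean with the median, and the distances from the median to Q1 and Q3, gives a first numerical hint about skewness before looking at a histogram or boxplot.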
The next questions ask you to examine the relationship between two variables from this study. In question 11 you will examine the relationship between 30-day mortality (dth30) and whether or not the patient had a right heart catheter (Swan-Ganz line; swang1) put in. In question 12 you will compare the APACHE scores of those who had a heart catheter put in to the scores of those who did not. You will probably want to review the PowerPoint lecture on examining relationships between variables first.
11) Compare the 30-day mortality rates of patients who had a right heart catheter put in to those who did not. Construct an appropriate graphical display and summary table to do this. (Use variables: dth30 = 30-day mortality indicator & swang1 = RHC or No RHC) Summarize the important findings from this analysis in one or two sentences. Be sure to incorporate all relevant computer output in your assignment that you turn in. (6 pts.)
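The summary table asked for in question 11 is a two-way table of counts with the percentage who died within each treatment group. The sketch below shows that layout in Python on made-up records; the counts are NOT from the study, the helper name mortality_by_group is hypothetical, and the "Yes"/"No" coding is an assumption that mirrors the description of dth30.

```python
from collections import Counter


def mortality_by_group(records):
    """Tabulate deaths within each group.

    records is an iterable of (group, died) pairs, where died is "Yes" or "No".
    """
    counts = Counter(records)
    table = {}
    for group in sorted({g for g, _ in counts}):
        deaths = counts[(group, "Yes")]
        total = deaths + counts[(group, "No")]
        table[group] = {
            "n": total,
            "deaths": deaths,
            "pct_died": 100 * deaths / total,  # row percentage
        }
    return table


# Made-up records for illustration only; these are NOT the study data.
records = (
    [("RHC", "Yes")] * 3 + [("RHC", "No")] * 7
    + [("No RHC", "Yes")] * 2 + [("No RHC", "No")] * 8
)
table = mortality_by_group(records)
for group, row in table.items():
    print(f"{group:>7}: n = {row['n']}, died = {row['deaths']} ({row['pct_died']:.1f}%)")
```

Percentages are computed within each treatment group (row percentages) because the groups differ in size, so raw death counts alone are not comparable.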
12) Compare the APACHE scores of patients who had a right heart catheter put in to the scores of those who did not. Use an appropriate graphical display and supporting summary statistics to compare these two groups of patients. (Use variables: aps1 and swang1)
a) Summarize your findings from this analysis in one or two sentences, citing appropriate numerical results in your discussion. Be sure to incorporate all relevant computer output in your assignment when you turn it in. (6 pts.)
b) Given your results from problems 10 and 11 what can you say about the possible confounding effect of the APACHE score? (2 pts.)
13) Birth Weights of Infants Born in North Carolina
All parts of this question also require the use of JMP and the low birth weight dataset described on pg. 62 of the Daniels text (if you don’t have the book, don’t worry; you don’t need it). The dataset is called NCbirth.JMP and is available on the course website under Datasets.
Data description:
The North Carolina State Center for Health Statistics and Howard W. Odum Institute for Research in Social Science at the University of North Carolina at Chapel Hill make publicly available birth and infant death data for all children born in the state of North Carolina. These data can be accessed at:
The data contained in NCbirth.JMP represent a random sample of n = 800 births in North Carolina in 2001. The variables and their coding are described in the table on the following page.
In addition to those described above, the following variables were created and added to the file NCbirth.JMP:
White?      Coded as White or Non-White (dichotomous version of RACEMOM)
Hispanic?   Coded as Non-Hisp or Hisp (dichotomous version of HISPMOM)
a) Construct a display and obtain summary statistics for comparing the birth weight in grams (tgrams) of infants born to mothers who smoked during pregnancy vs. those who did not. Discuss the important results from your descriptive analysis in a short written paragraph. (3 pts.)
b) A researcher believes that smoking during pregnancy is associated with increased risk of having a baby prematurely (gestational age < 36 weeks). What do these data suggest? Construct an appropriate display and give supporting statistics to answer this question. Summarize your findings with one or two sentences. (3 pts.)
c) The researcher also believes that smoking during pregnancy is associated with an increased risk of having an infant with a low birth weight. What do these data suggest? Construct an appropriate display and give supporting statistics to answer this question. Summarize your findings with one or two sentences. (3 pts.)
d) Which variable has more variation, weight gained during pregnancy by mother or birth weight of infant in grams? Justify your answer using the appropriate statistics. (3 pts.)
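When two variables are measured in different units, as in part (d), one common way to put their variation on the same footing is the coefficient of variation, CV = 100 × SD / mean. The sketch below illustrates the idea in Python on made-up numbers. The values, the variable names, and the pound units for weight gain are assumptions for illustration and are NOT taken from NCbirth.JMP.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation: sample SD as a percentage of the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Made-up values for illustration only; these are NOT from NCbirth.JMP.
weight_gain_lb = [15, 22, 30, 35, 48]            # assumed to be in pounds
birth_weight_g = [2900, 3200, 3400, 3600, 3900]  # grams

for label, values in [("weight gain", weight_gain_lb), ("birth weight", birth_weight_g)]:
    print(f"{label:>12}: SD = {statistics.stdev(values):.1f}, CV = {cv_percent(values):.1f}%")
```

In this made-up example the birth weights have the larger SD in raw units but the smaller CV, which is why comparing raw SDs across different measurement scales can mislead.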
g) Complete the table below for comparing white mothers to non-white mothers. For nominal variables report frequency (with % in parentheses) and for numeric variables report the mean (with SD in parentheses). (10 pts.)
                                    White mothers    Non-white mothers
Variable                            (n = 604)        (n = 195)
Mother’s Age (yrs.)
Gestation (weeks)
Birth weight (g)
Marital Status     Married
                   Not married
Smoking Status     Smoked
                   Did not smoke
Low Birth Weight?  Yes
                   No
Premature?         Yes
                   No