STAT 310 – Homework #8 ~ Correlation, Simple Linear Regression, and Multiple Linear Regression (40 pts.) (DUE 4/4/08)

1 – Hyperactivity in Children

Maria Mathias conducted a study of hyperactive children. She measured the children’s attitude, hyperactivity, and social behavior before and after treatment. These data contain the ages of the 31 subjects in the study and the improvement scores from pre-treatment to post-treatment for attitude (ATT), social behavior (SOC), and hyperactivity (HYP). A negative HYP score indicates an improvement in hyperactivity; a positive ATT or SOC score indicates improvement. (Datafile: Hyperactive Children.JMP)

a) Use appropriate statistical tests to determine if the treatment resulted in statistically significant improvement in the three measured outcomes. Summarize your findings for the researcher. (6 pts. ~ 2 pts. for each outcome)
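Because each ATT, SOC, and HYP value is already a pre-to-post change score, a one-sample t test of H0: μ = 0 on the change scores is equivalent to a paired t test. The sketch below is a plain-Python cross-check on what JMP reports, not a replacement for it; the function name and the short list of scores are hypothetical and are not the course data. Use a one-sided alternative in the direction of improvement: Ha: μ > 0 for ATT and SOC, Ha: μ < 0 for HYP.

```python
import math
import statistics


def one_sample_t(diffs, mu0=0.0):
    """t statistic and degrees of freedom for H0: mean change = mu0."""
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD with an n - 1 divisor
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical change scores for illustration only (not the course data).
example_att = [2, 5, -1, 4, 3]
t_att, df_att = one_sample_t(example_att)
```

With 31 subjects there are 30 degrees of freedom. Assuming α = 0.05, the one-sided critical value from a t table is about 1.697, so H0 would be rejected when t exceeds 1.697 for ATT or SOC, or falls below -1.697 for HYP; equivalently, read JMP's one-sided p-value.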

b) Perform an analysis to determine whether there is evidence that age (years) is correlated with any of the three outcome variables. Use Analyze > Multivariate Methods > Multivariate to find a correlation matrix, plot a scatterplot matrix, and conduct significance tests for the correlations between age and the outcome variables.

Summarize the results of the correlation tests. (3 pts.)
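JMP reports the correlation test p-values directly. As a hedged sketch of what it computes, the test of H0: ρ = 0 uses t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom. The helper names below are hypothetical; rerunning pearson_r on lists with one subject removed is the same comparison part (d) asks for.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_t(r, n):
    """t statistic for H0: rho = 0, referred to a t distribution with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
```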

c) Looking at the scatterplot matrix, do there appear to be any outliers in these data? If so, which subject is the outlier? (1 pt.)

d) Repeat your correlation analysis from part (b) with the outlier identified in part (c) deleted. To temporarily delete this case, highlight the row in the spreadsheet and select both Exclude/Unexclude and Hide/Unhide options from the Rows pull-down menu. Do the results change substantially? Discuss any obvious changes. (2 pts.)

Now perform the simple linear regression of change in attitude after treatment (ATT) on age of the child (AGE). Answer the following questions based on the results.

e) Perform the overall regression usefulness test (i.e., H0: Regression is not useful vs. Ha: Regression is useful) to formalize your initial investigation of these variables. What is your decision for this test? Write a conclusion in everyday language for this test. (2 pts.)
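For simple linear regression the overall usefulness test is the ANOVA F test, F = MSR/MSE with 1 and n − 2 degrees of freedom. The following plain-Python sketch shows where the entries of JMP's Analysis of Variance table come from; the function name and toy numbers are illustrative only.

```python
def slr_anova(x, y):
    """Least-squares fit of y on x plus the ANOVA quantities for the overall F test."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx            # slope
    b0 = my - b1 * mx         # intercept
    fitted = [b0 + b1 * a for a in x]
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))  # error sum of squares
    sst = sum((b - my) ** 2 for b in y)                 # total sum of squares
    ssr = sst - sse                                     # regression sum of squares
    msr = ssr / 1             # 1 model degree of freedom
    mse = sse / (n - 2)       # n - 2 error degrees of freedom
    return {"b0": b0, "b1": b1, "SSR": ssr, "SSE": sse, "F": msr / mse}


# Toy data for illustration only.
res = slr_anova([1, 2, 3, 4], [2, 4, 5, 8])
```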

f) Perform the test of whether the slope of the regression line differs from zero (i.e., H0: β1 = 0 vs. Ha: β1 ≠ 0). What is your decision for this test? Write a conclusion in everyday language for this test. (1 pt.)
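The slope test uses t = b1/SE(b1), where SE(b1) = sqrt(MSE/Sxx), with n − 2 degrees of freedom. In simple linear regression this t statistic squared equals the overall F statistic, so the two tests always reach the same decision. A minimal sketch with a hypothetical function name:

```python
import math


def slope_t(x, y):
    """t statistic and df for H0: slope = 0 in simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    mse = sse / (n - 2)
    se_b1 = math.sqrt(mse / sxx)  # standard error of the slope
    return b1 / se_b1, n - 2


t, df = slope_t([1, 2, 3, 4], [2, 4, 5, 8])  # toy data only
```

On these toy data t is about 7.18, and t² is about 51.57, which is the overall F statistic for the same data.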

g) What is the R-Square (R²) value for this analysis? In the context of this problem, carefully explain what this number is measuring. (2 pts.)
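R² is the fraction of the total variation in the response that the fitted line accounts for: R² = 1 − SSE/SST. In simple linear regression it also equals the square of the correlation r between the two variables. A short sketch with a hypothetical helper name:

```python
def r_squared(y, fitted):
    """Proportion of the variation in y explained by the fitted values."""
    my = sum(y) / len(y)
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))  # unexplained variation
    sst = sum((b - my) ** 2 for b in y)                 # total variation
    return 1 - sse / sst


# Toy observed values and their least-squares fitted values (illustration only).
r2 = r_squared([2, 4, 5, 8], [1.9, 3.8, 5.7, 7.6])
```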

h) Using JMP, create a scatterplot of the data with the estimated regression line.

In the context of this problem, carefully interpret the y-intercept and slope of your estimated regression line. Again, carefully explain what these numbers are measuring. (You need to do more than say they are the y-intercept and slope of the line.) (2 pts.)

i) Discuss whether or not the assumptions for this procedure are being met. Also, identify any outliers in the data set. (4 pts.)
Checking the assumptions:
  • Model appropriateness: Make sure no trends remain in the residual plot.
  • Constant variance: Make sure there are no megaphone (fan-shaped) patterns in the residual plot.
  • Independence: This does not need to be checked, because these data were not collected over time.
  • Normality: Make a histogram of the residuals and check that they are approximately normally distributed.
  • Outliers: Any observation whose residual falls outside ±2*RMSE is considered a possible outlier.
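The ±2*RMSE outlier rule can be applied to the residuals directly. In the sketch below, RMSE is computed as sqrt(SSE/(n − p)), where p is the number of estimated coefficients (2 for simple linear regression), which matches the Root Mean Square Error in a regression report. The function name is hypothetical.

```python
import math


def flag_outliers(y, fitted, n_params):
    """Indices of observations whose residual lies outside +/- 2 * RMSE.

    n_params is the number of estimated coefficients (2 for simple linear
    regression: intercept and slope), so RMSE = sqrt(SSE / (n - n_params)).
    """
    resid = [b - f for b, f in zip(y, fitted)]
    sse = sum(e * e for e in resid)
    rmse = math.sqrt(sse / (len(y) - n_params))
    return [i for i, e in enumerate(resid) if abs(e) > 2 * rmse]
```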

j) Given these results, what do we estimate to be the mean improvement in attitude for a 10-year-old child? (1 pt.)

k) Sally is a hyperactive 12-year-old child. What do we estimate her change in attitude will be following treatment? (1 pt.)
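The estimated mean improvement for 10-year-olds and the predicted improvement for one 12-year-old both come from the same formula, ŷ = b0 + b1·x; what differs is the uncertainty. The standard error for a mean response uses MSE·(1/n + (x0 − x̄)²/Sxx), while a single new child adds a full MSE on top of that. This sketch (hypothetical name, toy data, not the course data) returns all three quantities so the distinction is visible.

```python
import math


def slr_estimate(x, y, x0):
    """Point estimate at x0 plus standard errors for the mean and for one new case."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    mse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)) / (n - 2)
    y_hat = b0 + b1 * x0                      # same number for both questions
    h0 = 1 / n + (x0 - mx) ** 2 / sxx         # leverage of the point x0
    se_mean = math.sqrt(mse * h0)             # uncertainty in the mean response
    se_new = math.sqrt(mse * (1 + h0))        # uncertainty for a single new case
    return y_hat, se_mean, se_new


y_hat, se_mean, se_new = slr_estimate([1, 2, 3, 4], [2, 4, 5, 8], 2.5)
```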

2 – Caregiver Burden of Senile Dementia Patients

Son et al., in their paper “Korean Adult Child Caregivers of Older Adults with Dementia” in the Journal of Gerontological Nursing (2003), examined the relationship between the burden on family caregivers and the general characteristics of the family member with senile dementia. The dependent variable (response) is caregiver burden as measured by the Korean Burden Inventory (KBI). Scores on this response range from 28 to 140, with higher scores indicating greater burden. The explanatory variables (predictors) were the following:

  • ADL = total activities of daily living (low scores indicate that the elderly perform activities independently).
  • MEM = memory and behavioral problems (higher scores indicate more problems).
  • COG = cognitive impairment (lower scores indicate a greater degree of cognitive impairment).

(Datafile: Caregiver Burden.JMP)

a) Perform an analysis to determine whether there is evidence that caregiver burden is correlated with any of the three variables describing the family member with dementia. Use Analyze > Multivariate Methods > Multivariate to find a correlation matrix, plot a scatterplot matrix, and conduct significance tests for the correlations between caregiver burden (KBI) and the three predictors (ADL, MEM, and COG). Summarize the results of the correlation tests. (3 pts.)
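A correlation matrix is Pearson's r computed for every pair of columns. When reading the signs, recall that a lower COG score means more cognitive impairment, so a negative correlation between KBI and COG would indicate that more impairment goes with greater burden. The helper name and the example columns below are hypothetical, not the Caregiver Burden data.

```python
import math


def corr_matrix(columns):
    """Pairwise Pearson correlations for a dict {name: list of values}."""
    def r(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)

    names = list(columns)
    return {(a, b): r(columns[a], columns[b]) for a in names for b in names}


# Made-up columns for illustration only.
m = corr_matrix({"KBI": [1, 2, 3], "ADL": [2, 4, 6], "COG": [3, 2, 1]})
```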

Now perform the multiple linear regression of caregiver burden (KBI) on the three characteristics of the older family member with dementia and answer the following.

b) Perform the overall regression usefulness test (i.e., H0: Regression is not useful vs. Ha: Regression is useful) to formalize your initial investigation of these variables. What is your decision for this test? Write a conclusion in everyday language for this test. (2 pts.)
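With k predictors the overall F test has k and n − k − 1 degrees of freedom, and F can be written in terms of R²: F = (R²/k)/((1 − R²)/(n − k − 1)). This one-line sketch (hypothetical name) is handy for checking the Analysis of Variance table; for the full model here, k = 3.

```python
def overall_f(r2, n, k):
    """Overall F statistic for H0: all k slopes are zero, with (k, n - k - 1) df."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))
```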

c) Examine the results of the significance test for each of the individual predictors (i.e., H0: βi = 0 vs. Ha: βi ≠ 0). What are your conclusions from these tests? (3 pts.)
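Each individual test uses t = b_j/SE(b_j), where SE(b_j) = sqrt(MSE·[(XᵀX)⁻¹]_jj) and there are n − p degrees of freedom (p = k + 1 coefficients including the intercept). The pure-Python sketch below solves the normal equations with a small Gauss-Jordan inverse. It is meant only to show where the t ratios in JMP's Parameter Estimates table come from, is not numerically robust for ill-conditioned data, and uses hypothetical function names.

```python
import math


def invert(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def coef_t_stats(X, y):
    """OLS coefficients, their t statistics for H0: beta_j = 0, and the error df.

    X holds one row of predictor values per observation (no intercept column;
    a column of ones is added here).
    """
    rows = [[1.0] + list(r) for r in X]
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    sse = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
              for r, yi in zip(rows, y))
    mse = sse / (n - p)
    t = [b[i] / math.sqrt(mse * inv[i][i]) for i in range(p)]
    return b, t, n - p
```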

d) Use backward elimination to simplify your original model. Use to determine which terms are statistically significant in your model. Write out the estimated regression model, i.e. … (1 pt.)
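Backward elimination is a loop: fit the full model, remove the term with the largest p-value if it exceeds the chosen significance level, refit, and repeat until every remaining term is significant. The sketch below captures only that loop. fit_pvalues is a hypothetical caller-supplied function (for example, one that wraps the p-values you read from JMP after each refit), and alpha is whatever significance level the assignment specifies.

```python
def backward_eliminate(terms, fit_pvalues, alpha):
    """Drop the least significant term one at a time until every p-value <= alpha.

    fit_pvalues(terms) is a hypothetical function that refits the model on the
    given list of terms and returns a dict {term: p-value} for those slopes.
    """
    current = list(terms)
    while current:
        pvals = fit_pvalues(current)
        worst = max(current, key=lambda name: pvals[name])
        if pvals[worst] <= alpha:
            break                 # every remaining term is significant
        current.remove(worst)     # refit without the weakest term on the next pass
    return current
```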

e) What is the R-Square (R²) value for your final model? In the context of this problem, carefully explain what this number is measuring. (2 pts.)

f) Discuss whether or not the assumptions for this procedure are being met. Also, identify any outliers in the data set. (2 pts.)
Checking the assumptions:
  • Model appropriateness: Make sure no trends remain in the residual plot.
  • Constant variance: Make sure there are no megaphone (fan-shaped) patterns in the residual plot.
  • Independence: This does not need to be checked, because these data were not collected over time.
  • Normality: Make a histogram of the residuals and check that they are approximately normally distributed.
  • Outliers: Any observation whose residual falls outside ±2*RMSE is considered a possible outlier.

g) Given these results, what do we estimate to be the mean caregiver burden, as measured by the KBI, for families taking care of dementia patients with an ADL score of 75, a MEM score of 45, and a COG score of 10? (1 pt.)

h) Weng is an 80-year-old suffering from dementia, with an ADL score of 25, a MEM score of 15, and a COG score of 20. What do we estimate the caregiver burden (KBI) to be for the family that takes care of him? (1 pt.)
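Both prediction questions use the same fitted equation, KBI-hat = b0 + b1·ADL + b2·MEM + b3·COG, with any term dropped during backward elimination given a coefficient of 0. The coefficients passed to this hypothetical helper are placeholders to be replaced with your own estimates; they are not results from the data.

```python
def predict_kbi(coefs, adl, mem, cog):
    """KBI-hat = b0 + b1*ADL + b2*MEM + b3*COG, using coefficients from your own fit.

    Pass 0.0 for the coefficient of any predictor removed from the final model.
    """
    b0, b1, b2, b3 = coefs
    return b0 + b1 * adl + b2 * mem + b3 * cog
```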