Assignment: Multiple Regression
Student Name:
Grade:
[Please download a copy of this document and put all your answers in this document. Submit your assignment electronically through the assignment dropbox. Name this assignment file Amr_FirstInitialLastName.doc.]
[You must include the statistical software output you used to obtain the statistics that support your analysis. Please place it right after your answer for each part of the question.]
Problem I. Linear Regression
In a research study on the effect of smoking on infant birth weight, the following four variables were recorded:
- bweight: Birth Weight in grams
- geststn: Gestation period in weeks
- smoke: Mother’s smoking status (1=smoked; 0=did not smoke)
- age: Mother’s Age
The data is stored in the file at the following address:
(SPSS Data File)
(EXCEL Data File)
Use the linear regression modeling technique to answer the following questions:
- Make a scatter plot for birth weight versus length of gestation period and a scatter plot for birth weight versus mother’s age, and describe the relation between each pair of variables.
[Paste the graphs here!]
[Describe here!]
- Make a scatter plot for birth weight versus length of gestation period, using mother’s smoking status as a categorical factor (markers) variable. Does smoking status appear to be a significant factor affecting birth weight?
[Paste the graph here!]
[Describe here!]
- Run the regression analysis and check for multicollinearity using the variance inflation factor (VIF), with a cutoff of 4. Are gestation period in weeks, mother’s age, and mother’s smoking status significant factors in predicting an infant’s birth weight? Is there a multicollinearity problem? (Include a copy of the coefficient table from the software output and interpret it.)
[Paste output table from statistical software here!]
[Describe here!]
- Find the linear regression equation for predicting the average infant birth weight using only the significant predictor variables.
[Write the linear regression equation here!]
- Estimate the average weight of infants from mothers aged 25 who smoke and have a gestation period of 38 weeks with a 95% confidence interval. [Use the model with only the significant predictor variables mentioned in 3.]
- Estimate the weight of an infant from a mother aged 25 who smokes and has a gestation period of 38 weeks with a 95% confidence interval. [Use the model with only the significant predictor variables mentioned in 3.]
- Perform a two independent samples t-test to see if there is a significant difference between the average infant birth weights for mothers who smoked versus mothers who did not smoke. Does the result contradict the result in 3? If yes, why?
[Paste the statistical software output on two independent samples t-test here!]
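As a rough illustration of the VIF check in Problem I, the sketch below computes each predictor's VIF as the corresponding diagonal entry of the inverse of the predictors' correlation matrix, which equals 1 / (1 - R²) from regressing that predictor on the others. It is only a sketch under stated assumptions: the data are randomly generated stand-ins (not the assignment data file), and the helper names `pearson`, `invert`, and `vif` are hypothetical. Your statistical software reports the same quantity in its coefficient table.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I] becomes [I | A^-1].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(predictors):
    """Return {name: VIF} for a dict mapping predictor names to data columns."""
    names = list(predictors)
    corr = [[1.0 if a == b else pearson(predictors[a], predictors[b]) for b in names]
            for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Made-up data for illustration only; replace with the columns from the data file.
random.seed(0)
n = 200
geststn = [random.gauss(39, 2) for _ in range(n)]
age = [random.gauss(27, 5) for _ in range(n)]
smoke = [1 if random.random() < 0.3 else 0 for _ in range(n)]

vifs = vif({"geststn": geststn, "age": age, "smoke": smoke})
for name, value in vifs.items():
    flag = "possible multicollinearity" if value > 4 else "below cutoff"
    print(f"{name}: VIF = {value:.2f} ({flag})")
```

Because the stand-in columns are generated independently, their VIFs should come out close to 1. Values above the cutoff of 4 would indicate that a predictor is largely explained by the other predictors.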
Problem II. Logistic Regression on Risk Behavior Study
In a Risk Behavior Survey study on college students the following variables were observed:
Risk Behavior Variable: Wore a seatbelt the last time driving a car
Risk Factor Variables:
- Sex (0=Female, 1=Male)
- Race (1=White, 2=Black, 3=Other)
- Live on campus (0=Off Campus, 1=On Campus)
The data is stored at the following address:
(SPSS Data File)
(EXCEL Data File)
Use the logistic regression technique to answer the following questions: (Do not use stepwise regression.)
- Report the frequency and percentage distribution for each of the four variables observed in this study.
- Find the significant factor(s) that affect whether or not students wear a seatbelt.
[Paste the output from statistical software containing parameter (coefficient) estimation information here!]
- Use the odds ratios to explain how each of the significant factors affects the risk behavior.
- Use the logistic regression model to estimate the probability that a randomly selected white female student living off campus will not wear a seatbelt. (Use only significant variables. Do not use insignificant variables even if they are mentioned.)
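The odds-ratio and probability questions in Problem II both follow from the fitted coefficients: an odds ratio is exp(b) for a coefficient b, and a predicted probability is the inverse logit of the linear predictor. The sketch below shows the arithmetic only. The intercept, the coefficient values, the choice of which variables appear, and the coding of the outcome as 1 = did not wear a seatbelt are all assumptions made for illustration, not results from the survey data. The function names are hypothetical as well.

```python
import math


def predicted_probability(intercept, coefficients, values):
    """Inverse logit of intercept + sum(b_k * x_k) over the listed predictors."""
    linear = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(coefficient)


# Hypothetical fitted values (NOT estimated from the assignment data).
# Outcome is assumed to be coded 1 = did not wear a seatbelt.
intercept = -1.0
coefficients = {"sex_male": 0.5, "on_campus": -0.3}

# A female student living off campus sits at the reference level of both dummies.
student = {"sex_male": 0, "on_campus": 0}

p = predicted_probability(intercept, coefficients, student)
print(f"Estimated P(no seatbelt) = {p:.3f}")
for name, b in coefficients.items():
    print(f"{name}: odds ratio = {odds_ratio(b):.2f}")
```

An odds ratio above 1 means the factor raises the odds of the modeled outcome, and one below 1 means it lowers them. If Race turns out to be significant in your model, it would enter as dummy variables against a reference category in the same way.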