Assignment 3: Inferential Statistics II Name:
Grade:
[Please download a copy of this document and put all your answers in this document.]
[You can study with other students on this assignment, but you must write the answers yourself. If your paper is the same as another student's, both of you will receive a low grade.]
I. Contingency Table Analysis
Investigation of Food Poisoning: A local health department received many reports from city residents with signs and symptoms of acute gastrointestinal illness, including nausea, vomiting, diarrhea, fever, and abdominal pain. All of these cases occurred over a weekend following a graduation party held in a family's backyard. Investigators from the local health department were sent to interview all residents who attended the party in order to find out what may have caused the outbreak. The objective of this study is to analyze data on the various foods eaten at the party to determine the most likely cause of the outbreak.
Data Link:
Data was collected by the epidemiological team, who interviewed those persons known to have been present at the party. Not all attendees ate at the same time, but all ate within four and a half hours of one another. The data obtained from the interviews consisted of the following: age, sickness, types of food eaten, and time to onset of illness. The food at the party consisted of: baked ham, spinach, mashed potatoes, cabbage, jello, rolls, brown bread, milk, coffee, water, cake, vanilla ice cream, chocolate ice cream, and fruit salad. All ill persons had attended the party, and family members who had not attended the party did not become ill. There were no laboratory data on any food sources or patient specimens.
Please do your best to identify the one food that is most likely the source of the problem, and perform a chi-square test of independence, at the 5% level of significance, to see whether eating that food is significantly associated with the illness. (Please attach the chi-square test table that shows the p-value of the test, and also report the odds ratio.)
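As a rough cross-check on the SPSS output, the chi-square test and odds ratio for a single food can be sketched in Python using only the standard library. This is a minimal illustration, not the required method: the function name `chi_square_2x2` and the counts in the example are hypothetical and are not taken from the assignment data. It computes the Pearson chi-square statistic without a continuity correction, so it corresponds to the "Pearson Chi-Square" row of a 2x2 crosstab, and it assumes that no cell count is zero.

```python
import math

def chi_square_2x2(ill_ate, well_ate, ill_not, well_not):
    """Pearson chi-square test (df = 1) and odds ratio for a 2x2 table.

    Rows: ate the food / did not eat it. Columns: ill / not ill.
    """
    observed = ((ill_ate, well_ate), (ill_not, well_not))
    row_totals = (ill_ate + well_ate, ill_not + well_not)
    col_totals = (ill_ate + ill_not, well_ate + well_not)
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of food and illness.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    # Odds of illness among those who ate the food, relative to those who did not.
    odds_ratio = (ill_ate * well_not) / (well_ate * ill_not)
    return chi2, p_value, odds_ratio

# Hypothetical counts for illustration only (NOT the assignment data).
chi2, p_value, odds_ratio = chi_square_2x2(30, 10, 5, 25)
print(f"chi-square = {chi2:.3f}, p = {p_value:.6f}, odds ratio = {odds_ratio:.2f}")
```

Repeating the calculation for each food and comparing the odds ratios and p-values is one way to narrow down the most likely source.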
II. Linear Regression
In a research study on the effect of smoking on infant birth weight, four variables were recorded, as follows.
- Birth Weight in grams
- Gestation period in weeks
- Mother’s smoking status (smoker or non-smoker)
- Mother’s Age
The data is stored in a file at the following address:
Use the linear regression modeling technique to answer the following questions:
- Make a scatter plot for birth weight versus length of gestation period and a scatter plot for birth weight versus mother’s age, and describe the relation between each pair of variables.
[Paste the graphs here!]
[Describe here!]
- Make a scatter plot for birth weight versus length of gestation period, using mother’s smoking status as a categorical marker variable. Does smoking status appear to have a noticeable effect on birth weight?
[Paste the graph here!]
[Describe here!]
- Run the regression analysis and check the multicollinearity condition using VIF, with a cutoff of 4. Are gestation period in weeks, mother’s age, and mother’s smoking status significant factors in predicting an infant’s birth weight? Is there a multicollinearity problem? (Show the SPSS coefficient table and interpret.)
[Paste SPSS output table here!]
[Describe here!]
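To make the VIF column of the coefficient table less of a black box: the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The sketch below computes it with the standard library only. The helper names (`solve`, `r_squared`, `vif`) and the small data set are hypothetical and serve only as an illustration; SPSS remains the tool the assignment asks for.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r_squared(target, predictors):
    """R-squared from a least-squares regression of target on predictors (with intercept)."""
    n = len(target)
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bv * rv for bv, rv in zip(beta, r)) for r in rows]
    mean_t = sum(target) / n
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot

def vif(columns):
    """VIF for each named predictor: 1 / (1 - R^2 against the other predictors)."""
    result = {}
    for name, col in columns.items():
        others = [c for other, c in columns.items() if other != name]
        result[name] = 1.0 / (1.0 - r_squared(col, others))
    return result

# Hypothetical values for illustration only (NOT the assignment data).
predictors = {
    "gestation": [36, 38, 40, 39, 37, 41, 35, 40],
    "age": [22, 30, 27, 35, 24, 31, 29, 26],
    "smoke": [1, 0, 0, 1, 1, 0, 1, 0],
}
for name, value in vif(predictors).items():
    print(f"VIF({name}) = {value:.3f}")
```

A VIF of 1 means the predictor is uncorrelated with the others. Under the cutoff stated in the question, values above 4 would indicate a multicollinearity problem.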
- Estimate the average weight of infants from mothers aged 30 who smoke and have a gestation period of 36 weeks using a 95% confidence interval. [Use the model with all the predictor variables from the regression analysis above.]
[Put your confidence interval here!]
- Estimate the weight of an individual infant whose mother is aged 30, smokes, and has a gestation period of 36 weeks, using a 95% confidence interval. [Use the model with all the predictor variables from the regression analysis above.]
[Put your confidence interval here!]
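The two interval questions above differ in what is being estimated: the first targets the mean birth weight of all infants with the given predictor values, while the second targets a single infant's weight, and that second interval is usually called a prediction interval. The sketch below shows the difference for the simpler case of one predictor, using only the standard library. The function name, the data, and the choice of a single predictor are assumptions made for illustration. The assignment's model has three predictors, for which the same idea applies with a matrix form of the leverage term. The critical value `t_crit` must be supplied from a t table (2.306 is the two-sided 95% value for 8 degrees of freedom).

```python
import math

def mean_and_prediction_intervals(x, y, x0, t_crit):
    """Simple linear regression of y on x, with intervals at x = x0.

    Returns (fitted value, interval for the mean response,
    interval for a single new observation).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual variance with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)

    fit = intercept + slope * x0
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = math.sqrt(mse * leverage)          # uncertainty in the mean response
    se_new = math.sqrt(mse * (1.0 + leverage))   # adds the scatter of one observation

    mean_ci = (fit - t_crit * se_mean, fit + t_crit * se_mean)
    pred_int = (fit - t_crit * se_new, fit + t_crit * se_new)
    return fit, mean_ci, pred_int

# Hypothetical gestation (weeks) and birth weight (grams), NOT the assignment data.
weeks = [34, 35, 36, 37, 38, 39, 40, 41, 38, 40]
grams = [2200, 2500, 2600, 2900, 3100, 3300, 3400, 3600, 3000, 3500]
fit, mean_ci, pred_int = mean_and_prediction_intervals(weeks, grams, 36, 2.306)
print(f"fit = {fit:.1f}")
print(f"95% interval for the mean: ({mean_ci[0]:.1f}, {mean_ci[1]:.1f})")
print(f"95% interval for one infant: ({pred_int[0]:.1f}, {pred_int[1]:.1f})")
```

Both intervals are centered on the same fitted value, but the interval for an individual infant is always wider because it includes the variation of a single observation around the mean.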
- Find the linear regression equation for predicting the average infant birth weight using only the significant predictor variables.
[Write the linear regression equation here!]
- Perform a two-independent-samples t-test to see whether there is a significant difference between the average infant birth weights for mothers who smoked and mothers who did not. Does the result contradict the result of the regression analysis above? If yes, why?
[Paste the SPSS output on two independent samples t-test here!]