The dataset shrine.txt contains the following variables:
Shrine: Believed to be the number of visitors to the shrine
Rain: The amount of rainfall (units are unknown)
Crop: The crop yield (what was grown is unknown)
Children: The number of children born that year (area included in the study is unknown)
Deaths: The number of deaths that year (assumed to be same area as children)
City Size: The size of Erehwon, the nearest city. Due to a number of catastrophes and wars,
this cannot be used to put the tablets in chronological order.
Wonders: This variable is completely unknown, except for the translation of the word “wonder”
Chief: Each tablet is marked with the word “Chief” or “Chieftess”. It is known that Erehwon was
ruled by both genders and that the ruler changed quite frequently.
1) Examine the relationship between shrine and the other variables. Do you see any plots that suggest a non-linear relationship, the need for a log transformation, or non-constant variance? You do not need to include the plots; simply explain which variables you believe need further exploration.
2) Look at the relationships between the covariates. Do you see any that suggest collinearity?
3) Include all the variables in the model (without any transformations, curves, or interactions).
Show the output for the model.
4) Include log transformations or non-linear curves where you believe they are appropriate.
Show the output for the best model.
5) Include an interaction between chief and the other covariates. This allows the slope of each variable to change depending on the value of chief. Determine where such an interaction is appropriate.
Show the output for the best model.
6) There is a significant interaction between two of the numerical variables. Find the interaction and find a way to explain its effect. For both variables, explain how that variable affects the number of visits to the shrine.
7) The best model that I could find had an R² of 92.9% with S = 42.4278.
Find the best model you can, and show the output from the regression for it.
Add the residual plot from your model.
For each variable in your model, explain the effect it has on the variable shrine.
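As a rough illustration of the screening in questions 1, 2, and 4, the sketch below computes pairwise correlations between covariates and checks whether a log transformation straightens a relationship. It is only one possible approach, not a required method. The function names, the 0.8 cutoff, and all numbers are assumptions made up for the example; none of them come from shrine.txt, and a plot should still be examined before deciding anything.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def flag_collinear(columns, threshold=0.8):
    """Return (name_a, name_b, r) for each covariate pair with |r| >= threshold.

    The 0.8 default is an assumed rule of thumb, not a value from the assignment.
    """
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

def log_improves_linearity(x, y):
    """True if |corr(log x, y)| exceeds |corr(x, y)|. Every x must be positive."""
    raw = abs(pearson(x, y))
    logged = abs(pearson([math.log(v) for v in x], y))
    return logged > raw

# Made-up covariates for illustration only (hypothetical names and values).
demo = {
    "var_a": [10, 12, 15, 19, 24, 30],
    "var_b": [11, 13, 15, 20, 23, 31],
    "var_c": [5, 1, 4, 2, 6, 3],
}
print(flag_collinear(demo))

# Made-up response that is exactly linear in log(x).
x = [1, 2, 4, 8, 16, 32, 64]
y = [3 + 2 * math.log(v) for v in x]
print(log_improves_linearity(x, y))
```

A high pairwise correlation is only a hint of collinearity, and a correlation comparison says nothing about non-constant variance, so both checks are a supplement to the plots rather than a replacement.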
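For questions 5 and 6, it may help to see what an interaction term does to a slope. The sketch below assumes a model of the form y = b0 + b_x*x + b_z*z + b_xz*x*z, where z is either a 0/1 code for chief or a second numerical variable. The helper names, the choice of "Chief" as the baseline, and the coefficients are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
def chief_dummy(label):
    """Code the ruler as 0/1. Using "Chief" as the baseline (0) is an arbitrary choice."""
    return 1.0 if label == "Chieftess" else 0.0

def interaction_row(x, z):
    """Predictor columns for y = b0 + b_x*x + b_z*z + b_xz*x*z (intercept not included)."""
    return [x, z, x * z]

def slope_of_x(b_x, b_xz, z):
    """Change in y per one-unit increase in x while the other variable is held at z."""
    return b_x + b_xz * z

# Categorical case, with made-up coefficients b_x = 2.0 and b_xz = 0.5.
print(interaction_row(4.0, chief_dummy("Chieftess")))
print(slope_of_x(2.0, 0.5, chief_dummy("Chief")))
print(slope_of_x(2.0, 0.5, chief_dummy("Chieftess")))

# Numerical case, with made-up coefficients b_x = 4.0 and b_xz = -0.5.
for z in (2.0, 8.0, 10.0):
    print(z, slope_of_x(4.0, -0.5, z))
```

In the numerical case the slope of x shrinks as z grows and eventually changes sign, which is why the effect of one variable in an interaction can only be described at stated values of the other.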
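The quantities named in questions 3 and 7 can be reproduced by hand on a small example. The sketch below fits a least-squares model through the normal equations and then computes the residuals, R², and S. It assumes that S denotes the residual standard error, sqrt(SSE / (n - number of coefficients)), as in typical regression output. The function names and the data are made up for illustration; a statistics package would normally do this work, and the residuals it returns are what would be plotted against the fitted values.

```python
import math

def ols(rows, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    p = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(coef, rows):
    """Fitted values for each row of predictors."""
    return [sum(b * v for b, v in zip(coef, [1.0] + list(r))) for r in rows]

def fit_summary(y, fitted, n_coef):
    """Return (residuals, R-squared, S) with S = sqrt(SSE / (n - n_coef))."""
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    mean_y = sum(y) / len(y)
    sse = sum(e * e for e in resid)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return resid, 1 - sse / sst, math.sqrt(sse / (len(y) - n_coef))

# Made-up data with a single predictor; more columns per row work the same way.
rows = [[1], [2], [3], [4], [5]]
y = [2, 4, 5, 4, 5]
coef = ols(rows, y)
resid, r2, s = fit_summary(y, predict(coef, rows), n_coef=len(coef))
print(coef, r2, s)
print(resid)
```

A categorical variable such as chief would first have to be coded as 0/1 before it could be passed in as a column.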
Rubric Shrine data
Points are the amounts that may be deducted from 100 (to a minimum of zero)
Score / Points / Description
5 / Title, looks nice, formatted well
5 / Proper English and grammar
5 / Intro explains value of report
5 / What program was used
10 / What alpha level was used
5 / Describe the distribution of each response variable
5 / Examined the data with a plot
10 / Discussed the distribution of shrine visits
15 / Found errors
5 / Justified how the errors were dealt with
10 / Searched for non-linear forms
10 / Justified non-linear terms
10 / Searched for categorical interactions
10 / Justified categorical interactions
10 / Searched for numerical interactions
10 / Justified numerical interactions
5 / Explained effect of rain
10 / Explained effect of crop
15 / Explained effect of children
15 / Explained effect of deaths
10 / Explained effect of city size
15 / Explained effect of wonders
10 / Explained effect of chief
15 / Discussed prediction of shrine visits
15 / Discussed the accuracy of the prediction
5 / Mentioned R²
20 / Justified the model with residuals
10 / Explained all output in the report
5 / Conclusion summarized results