The data set “foxbooks.txt” shows:

  • The number of customers (Customers) recorded on each of 400 days, not in any order. This is the response we want to predict.
  • The temperature that day (temperature), in degrees Fahrenheit.
  • The humidity that day (humidity); humidity can be over 100%.
  • The number of advertisements per hour paid for that day (adnum).
  • The in-store marketing budget in dollars, set by a computer each day (marketing).
  • Whether a “Children’s” hour was held at the bookstore that day (childrens).
  • Whether the cashier was “patient” or “mean” and whether they were “fast” or “slow”, giving four cashier categories (cashier).
  • The number of free treats given to customers (treats).
  • The number of books marked at a discount price that day (discounts).
Part A
  1. One data point is clearly an error.
  2. Find and fix that error first.
  3. Then fit the model Customers ~ temperature + humidity + adnum + marketing + childrens + cashier + treats + discounts.
  4. Plot fit$residuals against each of the other variables.
  5. In these residual plots you should see “bowtie” (fan) shapes, which indicate interactions between numerical variables.
  6. Fix the bowties by including the appropriate interaction(s).
  Question: Which variables were interacting?
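The Part A workflow (find the bad point, fit the additive model, look for bowties, add an interaction) can be sketched in base R. Because foxbooks.txt is not reproduced here, this is a minimal sketch on simulated data: y stands in for Customers, and x1, x2 are hypothetical placeholder predictors, not columns of the real file and not a hint about which real variables interact. The planted value of -999 is likewise an invented example of an impossible entry.

```r
# Hypothetical stand-in for foxbooks.txt: y plays the role of Customers,
# and x1, x2 are placeholder predictors, not columns of the real data.
set.seed(1)
n <- 400
dat <- data.frame(x1 = runif(n, 0, 10), x2 = runif(n, 0, 10))
dat$y <- 5 + 2 * dat$x1 + 3 * dat$x2 + 1.5 * dat$x1 * dat$x2 + rnorm(n)

# Plant one impossible value to mimic a data-entry error, then locate it.
dat$y[17] <- -999
print(summary(dat$y))        # the minimum stands out immediately
bad <- which(dat$y < 0)      # a count of customers can never be negative
dat <- dat[-bad, ]           # drop the row (or correct it if the true value is known)

# Additive model with no interaction, like the model in step 3.
fit <- lm(y ~ x1 + x2, data = dat)

# Residuals against each predictor. An interaction left out of the model
# shows up as a bowtie: narrow spread in the middle, wide at both ends.
par(mfrow = c(1, 2))
plot(dat$x1, fit$residuals, xlab = "x1", ylab = "residuals")
plot(dat$x2, fit$residuals, xlab = "x2", ylab = "residuals")

# x1 * x2 expands to x1 + x2 + x1:x2; refitting should remove the bowtie.
fit_int <- lm(y ~ x1 * x2, data = dat)
plot(dat$x1, fit_int$residuals, xlab = "x1", ylab = "residuals")
plot(dat$x2, fit_int$residuals, xlab = "x2", ylab = "residuals")
```

On the real data the same plots are made for every predictor in the step 3 model; the bowtie appears in the plot of a variable whose slope depends on another variable.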
Part B
  1. Fit the model that still contains all the variables, plus the interaction(s) you found in Part A.
  2. Now one variable shows curvature in the residual plots.
  3. Fix the curvature by including the appropriate polynomial terms (keep the linear term as well).
  Question: Which variable was it, and what degree did the polynomial need?
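Checking for curvature and choosing the polynomial degree can be sketched as follows, again on simulated data: x and z are hypothetical placeholders, and the true curve here is deliberately quadratic. I(x^2) tells lm() to use x squared as a predictor; poly(x, 2, raw = TRUE) is an equivalent spelling.

```r
# Hypothetical data: y curves in x (a quadratic) and is linear in z.
# x and z are placeholders, not columns of foxbooks.txt.
set.seed(2)
n <- 400
dat <- data.frame(x = runif(n, 0, 10), z = runif(n, 0, 10))
dat$y <- 20 + 3 * dat$z + 4 * dat$x - 0.6 * dat$x^2 + rnorm(n)

fit1 <- lm(y ~ x + z, data = dat)
plot(dat$x, fit1$residuals, xlab = "x", ylab = "residuals")  # arch shape

# Raise the degree one step at a time, always keeping the lower-order terms.
fit2 <- lm(y ~ x + I(x^2) + z, data = dat)
fit3 <- lm(y ~ x + I(x^2) + I(x^3) + z, data = dat)

# Sequential F-tests: each row asks whether the added term improves the fit.
comparison <- anova(fit1, fit2, fit3)
print(comparison)
plot(dat$x, fit2$residuals, xlab = "x", ylab = "residuals")  # arch gone
```

A reasonable stopping rule is the lowest degree whose residual plot no longer bends and whose next-higher term adds little in the F-test.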
Part C
  1. Fit the model with all the variables, the interactions from Part A, and the polynomial terms from Part B.
  2. The residual plots should now show signs of an interaction involving a categorical variable.
  3. There are two categorical variables: childrens has two categories and cashier has four.
  4. You should be able to tell which variables interact from these plots alone.
  5. To spot it, color the residual plots by category with col=data$childrens or col=data$cashier.
  6. Fix the pattern in the residuals by including the correct interaction.
  Question: Which two variables needed to interact?
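A categorical-by-numerical interaction shows up when the residuals are colored by category: each color forms its own tilted band instead of one flat cloud. The sketch below uses simulated data with a hypothetical numeric x and a two-level factor g standing in for a variable like childrens. One practical caveat: if the column was read in as text, col=data$childrens stops with an invalid color name error (read.table() and read.csv() default to stringsAsFactors = FALSE since R 4.0.0), so the sketch converts categories to palette indices with as.integer() on a factor.

```r
# Hypothetical data: the slope of y on x differs between the two levels of g.
# x and g are placeholders (think of g as a two-level factor like childrens).
set.seed(3)
n <- 400
dat <- data.frame(x = runif(n, 0, 10),
                  g = factor(sample(c("no", "yes"), n, replace = TRUE)))
slope <- ifelse(dat$g == "yes", 4, 1)
dat$y <- 10 + slope * dat$x + rnorm(n)

fit_add <- lm(y ~ x + g, data = dat)

# Color each point by its category; as.integer() turns factor levels
# into palette indices 1, 2, ...
plot(dat$x, fit_add$residuals, col = as.integer(dat$g),
     xlab = "x", ylab = "residuals")
legend("topleft", legend = levels(dat$g),
       col = seq_along(levels(dat$g)), pch = 1)

# x * g gives each category its own slope for x.
fit_int <- lm(y ~ x * g, data = dat)
```

With a four-level factor such as cashier the idea is the same; x * g then adds three slope-adjustment terms, one per non-reference level.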
Part D
  1. Fit the model with all the variables and the terms needed from Parts A, B, and C.
  2. The obvious patterns are now gone. Check whether any subtle patterns remain that should be included.
  3. Add whatever is needed until the residuals are as good as they can be.
  4. Look at summary(fit). Several terms will not be significant.
  5. Find a final model by removing the terms you believe are not needed.
  6. Note that there is more than one correct answer here.
  Submit: Copy and paste the output of summary(fit) for your final model.
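One way to prune terms is to remove them one at a time and check each removal with a partial F-test or AIC. The sketch uses simulated data in which the hypothetical predictor x3 has no real effect. Note that drop1() respects marginality for interactions (it will not offer to drop x1 while x1:x2 is in the model), but it treats I(x^2) as unrelated to x, so keeping the linear term under a polynomial is up to you.

```r
# Hypothetical data: x1 and x2 matter, x3 is pure noise.
set.seed(4)
n <- 400
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
dat$y <- 3 + 5 * dat$x1 + 2 * dat$x2 + rnorm(n)

full <- lm(y ~ x1 + x2 + x3, data = dat)
print(summary(full))                  # look for terms with large p-values

# drop1() reports an F-test for removing each term on its own.
print(drop1(full, test = "F"))

# Remove one candidate term, then confirm with a partial F-test and AIC.
reduced <- update(full, . ~ . - x3)
print(anova(reduced, full))
print(AIC(full, reduced))             # smaller AIC is preferred
```

Because removing one term changes the p-values of the others, refitting after each removal is safer than dropping every non-significant term at once.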
Part E
  1. Let’s make sure you know how to create the graphs you will need for the report.
  2. For the interaction you found in Part A:
       • Graph one of the variables with three different lines based on the second variable.
       • Graph the second variable with three different lines based on the first variable.
  3. For the curved variable from Part B:
       • Graph the curve (without the data, zoomed in to show the curve).
  4. For the interaction in Part C:
       • Graph the numerical variable with a different line for each category.
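The interaction graphs can be drawn from the fitted model rather than the raw data: predict over a grid of one variable while the other is held at a few fixed values. This sketch simulates a hypothetical x1-by-x2 interaction and uses the 10th, 50th, and 90th percentiles of the second variable for the three lines; that choice of percentiles, and the helper name interaction_lines(), are assumptions for illustration.

```r
# Hypothetical data with a numerical interaction between x1 and x2.
set.seed(5)
n <- 400
dat <- data.frame(x1 = runif(n, 0, 10), x2 = runif(n, 0, 10))
dat$y <- 5 + 2 * dat$x1 + 3 * dat$x2 + 1.5 * dat$x1 * dat$x2 + rnorm(n)
fit <- lm(y ~ x1 * x2, data = dat)

# Predicted y against `xvar`, one line per chosen value of `byvar`.
# Any other predictors in a real model would need fixed values in `grid` too.
interaction_lines <- function(fit, dat, xvar, byvar,
                              probs = c(0.1, 0.5, 0.9)) {
  xs <- seq(min(dat[[xvar]]), max(dat[[xvar]]), length.out = 50)
  levels_by <- quantile(dat[[byvar]], probs)
  preds <- sapply(levels_by, function(b) {
    grid <- data.frame(xs, b)
    names(grid) <- c(xvar, byvar)
    predict(fit, newdata = grid)
  })
  matplot(xs, preds, type = "l", lty = 1, col = 1:3,
          xlab = xvar, ylab = "predicted y")
  legend("topleft", legend = paste(byvar, "=", round(levels_by, 1)),
         col = 1:3, lty = 1)
  invisible(preds)
}

p12 <- interaction_lines(fit, dat, "x1", "x2")  # x1 on the axis, lines for x2
p21 <- interaction_lines(fit, dat, "x2", "x1")  # and the reverse
```

The same predict-over-a-grid idea covers the other two graphs: for the Part B curve, draw a single predicted line over a narrow x range that shows the bend, and for the Part C interaction, replace the percentiles with the levels of the categorical variable.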
Part F
  1. Let’s show that you have a good model by making a prediction for June 4th, when:
       • The temperature will be 47°F.
       • The humidity will be 75%.
       • There will be one ad per hour.
       • The marketing budget will be $20.
       • There will be a children’s hour at the bookstore.
       • A patient, fast cashier will be at the front.
       • There will be 100 treats available.
       • There will be 40 books offered at a discount.
  Question: How many customers do you predict will come?
  Question: How much can you reasonably expect your answer to be off by?
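A single-day prediction and its uncertainty come from predict() with interval = "prediction". The sketch uses a simulated model with a hypothetical numeric x1 and factor g; for the real data, newdata needs one column for every predictor in the final model, numbers in the same units as the file (for example, whether humidity is stored as 75 or 0.75 should be checked), and factor values spelled exactly as they appear in the data. Using the half-width of a 95% prediction interval as "how far off" is one common choice, not the only one.

```r
# Hypothetical data and model; x1 and g are placeholders.
set.seed(6)
n <- 400
dat <- data.frame(x1 = runif(n, 30, 90),
                  g = factor(sample(c("no", "yes"), n, replace = TRUE)))
dat$y <- 50 + 1.2 * dat$x1 + 15 * (dat$g == "yes") + rnorm(n, sd = 8)
fit <- lm(y ~ x1 + g, data = dat)

# One row describing the new day; factor values must match the data's levels.
new_day <- data.frame(x1 = 47, g = factor("yes", levels = levels(dat$g)))

# interval = "prediction" covers a single new day, not just the mean.
pred <- predict(fit, newdata = new_day, interval = "prediction", level = 0.95)
print(pred)                                  # columns fit, lwr, upr
half_width <- (pred[, "upr"] - pred[, "lwr"]) / 2
print(sigma(fit))                            # residual standard error
```

The half-width is always a little more than 1.96 residual standard errors, because it also includes the uncertainty in the fitted coefficients.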

Bonus for interested students: the cashier variable has some additional structure that you are not required to catch, but that helps explain the variability across the cashier levels.