DSCI 425 Supervised Learning (65 Pts.)

Assignment 3 – Neural Networks for Regression

PROBLEM 1 – Predicting the age of an abalone

The age of an abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope, a boring and time-consuming task. Other measurements, which are easier to obtain, are often used instead to predict the age. Further information, such as weather patterns and location (hence food availability), may be required to solve the problem.

Attribute Information:
Given are the attribute name, attribute type, measurement unit, a brief description, and the variable name. The number of rings is the value to predict. These data are contained in the data frame Abalone.
Name / Data Type / Measurement Unit / Description / Variable Name
Length / continuous / mm / Longest shell measurement / length
Diameter / continuous / mm / Perpendicular to length / diam
Height / continuous / mm / With meat in shell / height
Whole weight / continuous / grams / Whole abalone / whole.weight
Shucked weight / continuous / grams / Weight of meat / shucked.weight
Viscera weight / continuous / grams / Gut weight (after bleeding) / visc.weight
Shell weight / continuous / grams / After being dried / shell.weight
Rings / integer / -- / +1.5 gives the age in years / Rings
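
The rule in the Rings row (rings + 1.5 gives the age in years) can be sketched in R as below. The helper name rings_to_age is hypothetical, and the ring counts are made-up values for illustration, not rows from the Abalone data frame.

```r
# Hypothetical helper: convert ring counts to approximate age in years
# using the rule from the table (age = Rings + 1.5).
rings_to_age <- function(rings) {
  rings + 1.5
}

# Made-up ring counts, for illustration only
example_rings <- c(7, 10, 15)
example_age <- rings_to_age(example_rings)
print(example_age)
```

Because the conversion only adds a constant, a model that predicts Rings well predicts age equally well, so modeling Rings directly loses nothing.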

a) In R – Develop an “optimal” neural network regression model to predict Rings. Provide CV results (split-sample would be easiest) for three models you considered in finding your “optimal” model. Obviously your “optimal” model should be the best of the three considered. Include all your R code and summaries of your CV results. (15 pts.)
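
One way a split-sample comparison of three candidate networks might be organized is sketched below. It assumes the nnet package (one possible choice; the assignment does not prescribe a package), a hypothetical helper named split_sample_rmse, and a simulated stand-in data frame rather than the Abalone data. The candidate sizes and decay values are arbitrary illustrations, not recommended settings.

```r
library(nnet)  # single-hidden-layer networks; an assumed choice of package

# Hypothetical helper: fit a network on a training split and return the
# root mean squared error (RMSE) of its predictions on the held-out split.
split_sample_rmse <- function(data, response, size, decay,
                              train_frac = 0.7, seed = 1) {
  set.seed(seed)  # same seed -> same split for every candidate model
  n <- nrow(data)
  train_idx <- sample(n, size = floor(train_frac * n))
  train <- data[train_idx, , drop = FALSE]
  valid <- data[-train_idx, , drop = FALSE]

  # Standardize predictors using training-set means and SDs only
  x_cols <- setdiff(names(data), response)
  mu <- sapply(train[x_cols], mean)
  sdv <- sapply(train[x_cols], sd)
  train[x_cols] <- scale(train[x_cols], center = mu, scale = sdv)
  valid[x_cols] <- scale(valid[x_cols], center = mu, scale = sdv)

  # linout = TRUE gives a linear output unit, as needed for regression
  fit <- nnet(reformulate(x_cols, response), data = train,
              size = size, decay = decay,
              linout = TRUE, maxit = 500, trace = FALSE)
  pred <- predict(fit, newdata = valid)
  sqrt(mean((valid[[response]] - pred)^2))
}

# Simulated stand-in data (NOT the Abalone data), only to show the call pattern
set.seed(425)
sim <- data.frame(x1 = runif(200), x2 = runif(200))
sim$y <- 3 * sim$x1 - 2 * sim$x2^2 + rnorm(200, sd = 0.1)

# Three candidate settings, all evaluated on the same split
candidates <- data.frame(size = c(2, 5, 10), decay = c(0.01, 0.01, 0.1))
candidates$rmse <- mapply(
  function(s, d) split_sample_rmse(sim, "y", size = s, decay = d),
  candidates$size, candidates$decay
)
print(candidates)
```

If the Abalone data frame contains only the numeric columns listed in the table, a call such as split_sample_rmse(Abalone, "Rings", size = 5, decay = 0.01) would follow the same pattern. Because a single split can be noisy, repeating the comparison over several seeds gives a steadier picture.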

b) In JMP – Develop an “optimal” neural network regression model to predict Rings. Provide CV (train/validation or train/validation/test set) results for three models you considered in the process of finding your “optimal” model. Also show a Diagram of your “optimal” model. (15 pts.)

c) Use profilers in JMP to explore the fitted “optimal” model. Include two cool plots (or sequence of plots) you found and discuss them. (6 pts. – 3 pts. each)

PROBLEM 2 – Listing price of homes in the Twin Cities metro area

These data are contained in the TwinCities.csv file on the website. The variable descriptions are below.

Variable / Info / Description
ID / Label / MLS ID Number
Address / Label / Street Address
CITY / Label / Minneapolis, St. Paul, Shoreview, Woodbury, Maplewood, West St. Paul
STATE / Label / MN (for all)
ZIP / Label / Zip Code
ListPrice / Response (Y) / Current List Price ($)
BEDS / / # of Bedrooms
BATHS / / # of Bathrooms (can be fractional)
Location / / Name of neighborhood or region in the Twin Cities metro area. Don’t use for this assignment!
SQFT / / Square footage of home (ft²)
LotSize / / Square footage of lot (ft²) – missing for several of the homes in these data.
YearBuilt / / Year the home was built; could be used to create a new variable called Age = 2014 - YearBuilt
ParkingSpots / / # of Parking Spots (I assume off-street parking)
HasGarage / Nominal / Garage or No Garage
DOM / / Days on the market, i.e., the number of days the home has been listed for sale.
BeenReduced / Nominal / Has the price been reduced from the original listing price? (Y or N)
OriginalList / ------ / Original listing price. Don’t use as a predictor!!!
BeenReduced2 / / Has the price been reduced from the original listing price? (Y or N) – this is calculated differently than the one above. Use one or the other BUT NOT both!
ReductAmt / ------ / Amount of the reduction from the original listing price if it has been reduced. Don’t use as a predictor!!!
PerReduct / ------ / Percent reduction from the original listing price. I wouldn’t use this predictor either, but it might be OK to use.
LastSaleDate / Date / MM/DD/YY of most recent previous sale of the home. Do not use!
LastSaleDiff / ------ / Current List Price – Last Sale Price. Don’t use!
SoldPrev / Nominal / Has the home been sold previously? (Y or N) This one should be OK to use!
LastSalePrice / / Price the home sold for the last time it sold. Don’t use!
Realty / / Realty company the home is listed with. Don’t use!
Latitude / / Latitude (degrees)
Longitude / / Longitude (degrees)
ShortSale / / Is more money owed on the home than what the asking price is? (Y or N)
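
The derived variable suggested in the YearBuilt row, Age = 2014 - YearBuilt, can be sketched in R as follows, along with a quick count of missing LotSize values. The helper name add_home_age is hypothetical, and the rows are made-up values for illustration, not records from TwinCities.csv. In JMP, the same column could be created with a formula column instead.

```r
# Hypothetical helper: add an Age column as suggested in the YearBuilt row
add_home_age <- function(homes, reference_year = 2014) {
  homes$Age <- reference_year - homes$YearBuilt
  homes
}

# Made-up rows for illustration only (not from TwinCities.csv)
example_homes <- data.frame(
  YearBuilt = c(1925, 1987, 2006),
  LotSize   = c(6098, NA, 10454)
)
example_homes <- add_home_age(example_homes)
print(example_homes)

# Count homes with a missing lot size before deciding how to handle them
n_missing_lot <- sum(is.na(example_homes$LotSize))
print(n_missing_lot)
```

Counting the missing LotSize values first helps in deciding whether to keep that predictor, since rows with missing inputs may otherwise be dropped or handled differently by the fitting software.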

a) In JMP – find an “optimal” neural network regression model for list price or some function of list price using the data in the file TC Homes (Train).JMP. Do this by again showing the validation results of three models, one being your optimal model, and discussing your choice. Use Modeling Utilities > Make Validation Column to create Training and Validation sets or Training, Validation, and Test sets, your choice. Include the cross-validation results for your “optimal” model. Also include a Diagram of your final model. You are only allowed to use the predictors shaded in the table above. The response has also been shaded, but in a different color. (15 pts.)

b) Again use profilers in JMP to find three interesting plots (or sequences of plots) based on the fitted model and explain what they show. (9 pts. – 3 pts. each)

c) Make predictions for the test cases in the file TC Homes (Test).JMP. To do this you will need to Save Prediction Formula from your optimal neural network model from part (a) and then Copy and Paste it into the TC Homes (Test).JMP file. The predicted home prices for these homes should then be computed using the model formula. Submit the TC Homes (Test).JMP file, with your predictions added, along with this assignment. (5 pts.)
