STAT 415 – Multivariate Statistics (a.k.a. Unsupervised Learning)

Final Project and Oral Presentation

Goal: The goal of your analysis is to find a way to increase sales of orthopedic equipment to U.S. hospitals. To that end, you are asked to identify hospitals in the provided database where the sales gain could be maximized. This can be done by identifying (1) the most promising hospitals where the company is currently not making sales and (2) hospitals where sales are currently being made but, given their characteristics, could potentially be increased.

Analyses: I expect you to demonstrate your abilities with some of the methods we covered in this course in the process of achieving the goal described above. For example, I definitely expect you to explore these data graphically, identify outliers, perform a principal component/factor analysis, and perform a cluster analysis. You may also need to employ some methods you have seen in other courses, such as regression analysis.
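As a rough illustration of two of the steps named above (outlier identification and principal component analysis), here is a minimal Python sketch. It is not a required approach and not a solution template. The function name `pca_scores` and the demo data are hypothetical, the data are randomly generated stand-ins rather than the hospital data, and the sketch assumes a correlation-based PCA (columns standardized first). Cluster analysis and graphics are not shown.

```python
import numpy as np

def pca_scores(X):
    """Standardize the columns of X, then return PCA scores,
    eigenvalues, and the proportion of variance per component."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Standardizing first makes this a PCA of the correlation matrix
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s                  # one row of component scores per observation
    eigvals = s**2 / (n - 1)        # variances of the components
    var_ratio = eigvals / eigvals.sum()
    return scores, eigvals, var_ratio

# Randomly generated stand-in data (NOT the project data): 50 rows, 4 columns
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
X_demo[:, 1] += 2 * X_demo[:, 0]    # make two columns correlated

scores, eigvals, var_ratio = pca_scores(X_demo)

# Squared Mahalanobis distance of each row from the centroid,
# computed from the full set of component scores
d2 = np.sum(scores**2 / eigvals, axis=1)

# Rows with the largest distances are candidates for a closer look
candidates = np.argsort(d2)[::-1][:5]
print(var_ratio.round(3))
print(candidates)
```

Large values of `d2` only flag observations for inspection. Whether a flagged hospital is an error, an unusual case, or an especially promising one is a judgment you need to make and justify.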

Final Products: Working in groups of up to three, you should prepare the following:

  • A report containing all relevant output (best put in an appendix) and a clear write-up of your recommendations to the company, given the goal of increasing orthopedic equipment sales.
  • An oral presentation with PPT summarizing the important components of your analysis and discussing your recommendations. All groups will make their oral presentations during the scheduled final exam time, Monday, May 4th, from 1:00 – 3:00 pm.

Data Description:

The variables in this data set are:

  • ZIP – U.S. postal code of the hospital
  • HospID – hospital identification number
  • City – city the hospital is located in.
  • State – state or U.S. territory the hospital is located in.
  • Beds – number of hospital beds in the facility.
  • RBeds – number of rehabilitation beds.
  • Outpatients – number of outpatient visits
  • Admin – administrative costs (in $1000’s per year)
  • Inpatient – revenue from inpatient care (in $1000’s per year)
  • Hip95 – number of hip operations performed in 1995.
  • Knee95 – number of knee operations performed in 1995.
  • SalesYr – sales of rehabilitation/orthopedic equipment since Jan. 1st ($1000’s).
  • Sales12 – sales of rehabilitation/orthopedic equipment for the last 12 months. ($1000’s).
  • Teach – indicator of whether or not the hospital is a teaching hospital (1 = yes, 0 = no).
  • Trauma – indicator of whether or not the hospital has a trauma unit (1 = yes, 0 = no).
  • Rehab – indicator of whether or not the hospital has a rehabilitation unit (1 = yes, 0 = no).
  • TeTrRe – all combinations of the indicators above (for example: 000 = no on all three, 110 = teaching hospitals with a trauma unit but NOT a rehab unit.)
  • Hip96 – number of hip operations in 1996.
  • Knee96 – number of knee operations in 1996.
  • Femur96 – number of femur operations in 1996.
  • logBeds – log(Beds + 1)
  • logRBeds – log(RBeds + 1)
  • LogOut – log(Outpatients+1)
  • logAdmin – log(Admin+1)
  • logInpat – log(Inpatient+1)
  • logHip95 – log(Hip95+1)
  • logKnee95 – log(Knee95+1)
  • logSales – log(SalesYr+1)
  • logSales12 – log(Sales12 + 1)
  • NoSales – no sales since January 1st? (Y = yes, N = no)
  • NoSales12 – no sales in the last 12 months? (Y = yes, N = no)
  • logHip96 – log(Hip96+1)
  • logKnee96 – log(Knee96+1)
  • logFem96 – log(Femur96+1)
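
To make the derived variables concrete, the sketch below recomputes a few of them for a single made-up record. It assumes natural logarithms (the base is not stated in the list above) and assumes that NoSales12 is "Y" exactly when Sales12 is zero. The function name `add_derived` and the example values are hypothetical. Adding 1 before taking logs keeps hospitals with a count or sales figure of zero in the data, since log(0 + 1) = 0.

```python
import math

def add_derived(row):
    """Return a copy of one hospital record with a few derived columns added."""
    out = dict(row)
    out["logBeds"] = math.log1p(row["Beds"])          # log(Beds + 1), natural log assumed
    out["logSales12"] = math.log1p(row["Sales12"])    # log(Sales12 + 1)
    # Assumed definition: "Y" when there were no sales in the last 12 months
    out["NoSales12"] = "Y" if row["Sales12"] == 0 else "N"
    # Concatenate the three 0/1 indicators, e.g. 1, 1, 0 -> "110"
    out["TeTrRe"] = f'{row["Teach"]}{row["Trauma"]}{row["Rehab"]}'
    return out

# Made-up record for illustration only (not taken from the data set)
example = {"Beds": 120, "Sales12": 0, "Teach": 1, "Trauma": 1, "Rehab": 0}
derived = add_derived(example)
print(derived)
```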