MASTER IN SUPPLY CHAIN, TRANSPORT AND MOBILITY (UPC)

ACADEMIC YEAR 2017-18, Q1: LABORATORY TEST

Transport and Logistics Data Analysis (Anàlisi de Dades de Transport i Logística, ADTL)

(Date: 2/11/2017, 17:00-19:00 h. Place: Room H5.3)

Lecturer: Lídia Montero Mercadé

Office: Building C5, D217

Rules: A calculator, statistical tables, and the official R and RStudio reference documents (without lengthy annotations) are allowed. Internet access will be available, but emailing and chatting are strictly forbidden. Mobile phones must be switched off.

Quiz duration: 1 h 30 min

Date for posting marks: before 16/11/17, on the subject's website.

Open office: 16/11/17 at 13:00 (C5-217).

Problem 1: Each question is worth 1 point.

Specifications are given for 428 new vehicles for the year 2004. The recorded variables include price, measurements relating to the size of the vehicle, and fuel efficiency (from …). The dataset contains 19 variables:

  • Vehicle Name
  • Sports Car? (1=yes, 0=no)
  • Sport Utility Vehicle? (1=yes, 0=no)
  • Wagon? (1=yes, 0=no)
  • Minivan? (1=yes, 0=no)
  • Pickup? (1=yes, 0=no)
  • All-Wheel Drive? Factor (no, yes)
  • Rear-Wheel Drive? Factor (no, yes)
  • Suggested Retail Price: what the manufacturer thinks the vehicle is worth, including adequate profit for the automaker and the dealer (U.S. Dollars)
  • Dealer Cost (or "invoice price"): what the dealership pays the manufacturer (U.S. Dollars)
  • Engine Size (liters)
  • Number of Cylinders (-1 if rotary engine)
  • Horsepower
  • City Miles Per Gallon
  • Highway Miles Per Gallon
  • Weight (Pounds)
  • Wheel Base (inches)
  • Length (inches)
  • Width (inches)

Missing values are denoted with *.

SOURCE: Kiplinger's Personal Finance, December 2003, vol. 57, no. 12, pp. 104-123, http:/
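The data are supplied as an R workspace, but if the raw text file with `*` codes were read directly, the missing-value marker could be declared at read time. The sketch below is only an illustration under stated assumptions: it uses a hypothetical three-row, whitespace-separated extract (made-up values; the column names `hmpg` and `weight` are illustrative, and real vehicle names containing spaces would need a different separator). It computes per-variable and per-observation missing counts with base R. It is not the course's `countNA` function, whose code is not shown here; the names `mis_col` and `mis_ind` simply mirror the output elements described in question 2.

```r
# Hypothetical extract in a whitespace-separated layout, with '*' marking
# missing values (made-up rows, reduced set of columns, illustration only).
raw <- "name cmpg hmpg weight
CarA 20 28 3000
CarB * 25 *
CarC 18 * 3500"

# na.strings = "*" converts every '*' field to NA while reading.
cars <- read.table(text = raw, header = TRUE, na.strings = "*")

mis_col <- colSums(is.na(cars))  # number of missing values per variable
mis_ind <- rowSums(is.na(cars))  # number of missing values per observation

# Rank variables and observations by their number of missing values.
sort(mis_col, decreasing = TRUE)
order(mis_ind, decreasing = TRUE)[1:2]  # the 2 most critical rows
```

With ties in the counts, as in this toy extract, several variables can be equally critical, so the ranking should be read together with the counts themselves.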

Load the 04Cars.RData file into your current R or RStudio session. City miles per gallon (cmpg) will be the numeric target, and car type (f.cartype) the target factor.

  1. Using the binary indicators originally included in the dataset, create a new factor variable named f.cartype that indicates car type, classified as Sports Car, SUV, Wagon, Minivan or Pickup. Summarize the resulting factor.
  2. Determine the presence of missing values by rows and by columns, and select the 2 most critical rows and the 2 most critical columns according to the number of missing values. You can use the countNA function provided in the 04Cars workspace: once the function is loaded, the mis_col element of its output contains the total number of missing values per variable, and mis_ind contains the total number of missing values per observation.
  3. The numeric target is cmpg. Summarize the response variable numerically and graphically, and interpret the results. Can cmpg be considered normally distributed?
  4. Calculate the upper threshold for identifying severe outliers in cmpg. Do any cars satisfy this criterion? How many? Do the global outliers remain outliers once the car type factor is taken into account?
  5. Which numeric variables are statistically associated with the response (cmpg)? Indicate the suitable measures of association and/or tests that support your answer. Assess whether the association of each available variable with cmpg is linear.
  6. Describe the profile of the numeric target cmpg using the tools available in the FactoMineR package. Hint: do not include the raw variables Sports, suv, wagon, minivan and pickup; use f.cartype instead.
  7. Can the mean cmpg be argued to be the same for all car type levels (f.cartype)? Which pairs of groups show non-significant differences in mean city miles per gallon (cmpg)?
  8. Can the variance of urban consumption (cmpg) be argued to be the same for all car type levels (f.cartype)? Which groups are likely to have a greater dispersion of cmpg than the others?
  9. Describe the profile of the target factor f.cartype using the tools available in the FactoMineR package. Indicate the most relevant numeric variables in the dataset, both globally and for each f.cartype level.
  10. Is car type independent of four-wheel drive availability? Identify the car types that tend to lack four-wheel drive.
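The following R sketch illustrates three of these tasks on a made-up eight-row data frame (not the 04Cars data). It covers building a car-type factor from the binary indicators (question 1), a severe-outlier threshold assuming the common Q3 + 3 * IQR convention (question 4), and a car type by drive contingency table (question 10). The indicator names Sports, suv, wagon, minivan and pickup come from the hint in question 6. The column name `awd`, the level labels, and the use of the All-Wheel Drive factor as the four-wheel drive indicator are assumptions, and the actual workspace may differ.

```r
# Toy data frame with made-up values for illustration only (NOT the 04Cars data).
# Indicator names follow the hint in question 6; 'awd' is a hypothetical name
# for the All-Wheel Drive factor.
df <- data.frame(
  Sports  = c(1, 0, 0, 0, 0, 0, 0, 0),
  suv     = c(0, 1, 1, 0, 0, 0, 0, 0),
  wagon   = c(0, 0, 0, 1, 0, 0, 0, 0),
  minivan = c(0, 0, 0, 0, 1, 0, 0, 0),
  pickup  = c(0, 0, 0, 0, 0, 1, 0, 0),
  awd     = factor(c("no", "yes", "yes", "no", "no", "no", "yes", "no"),
                   levels = c("no", "yes")),
  cmpg    = c(18, 15, 16, 20, 19, 14, 24, 46)
)

# Question 1: one factor level per binary indicator. Rows with no indicator
# set (regular cars) stay NA; if a row had two indicators, the later one wins.
types <- c(Sports = "SportsCar", suv = "SUV", wagon = "Wagon",
           minivan = "Minivan", pickup = "Pickup")
f <- rep(NA_character_, nrow(df))
for (v in names(types)) f[df[[v]] == 1] <- types[[v]]
df$f.cartype <- factor(f, levels = unname(types))
summary(df$f.cartype)

# Question 4: severe-outlier upper threshold Q3 + 3 * IQR (Tukey's outer fence).
upper.severe <- quantile(df$cmpg, 0.75, names = FALSE) + 3 * IQR(df$cmpg)
severe <- which(df$cmpg > upper.severe)  # row indices above the threshold

# Question 10: contingency table of car type against AWD availability
# (rows with NA car type are dropped by table()).
tab <- table(df$f.cartype, df$awd)
prop.table(tab, margin = 1)  # share of "no"/"yes" AWD within each car type
ft <- fisher.test(tab)       # exact test, suited to a tiny table like this one
ft$p.value
```

On the real data, `chisq.test(tab)` together with its standardized residuals (`chisq.test(tab)$stdres`) is a common way to see which car types deviate from independence; the exact Fisher test is used here only because the toy table is too small for the chi-squared approximation. For question 4, repeating the threshold computation within each f.cartype level (for example with `tapply`) shows whether the global outliers remain outliers within their group.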
