ME/ECE 539
Introduction to Artificial Neural Networks and Fuzzy Systems
Final Project
Estimation of car gas consumption in city cycle with ANN
Ernst Schermann
Introduction
Almost every car driver is interested in how much gas his car consumes, because the number of miles per gallon is directly connected to his purse. In the US, low gasoline prices long kept this issue unimportant, but rising oil prices and the environmental movement are slowly changing the attitude of car manufacturers, who now pay more attention to the gas consumption of their cars. People also increasingly consider the miles-per-gallon rating when they buy a car. This rating depends on many things, but certain features are more important than others; for example, for simple physical reasons the weight of the car matters more than the manufacturer. It is interesting to see whether it is possible to predict the miles-per-gallon rating for a set of cars of roughly the same technology and age. That would let a consumer estimate the rating of the car he wants to buy without having to rely on possibly false information and without extensive trials. Even if this estimate deviates somewhat from the real value, information from an independent source based on a large enough data set would be valuable.
In this project I try to develop a neural network that delivers this estimate. I decided to use a multi-layer perceptron (MLP) network with back-propagation as the learning method. The main tasks are to process the given data and to find optimal parameters for the multi-layer perceptron.
Data Description:
The available data is quite old: the cars represented are from the model years 1977 to 1981. For the general estimation ability of the network, however, the time span the cars come from is not important. The technology that matters for fuel consumption was almost the same for all of them, so the gas consumption estimate depends largely on the selected features for almost every car. There are many more specific features for each manufacturer and even for each type of car, but the ones available are the most important, and besides that I did not have the possibility to include more. The features and their impact are described below; most of them would be named on the spot by anyone asked about car characteristics:
1. Cylinders: This is an important property of a car and the amount of gas burnt certainly depends on the number of cylinders a car has. A discrete feature.
2. Displacement: This measure is important for the moment the motor can deliver and has a direct influence on gas consumption. A continuous feature.
3. Horsepower: As a measure of engine power, horsepower has an impact on fuel consumption, although the relationship can be non-linear. A continuous feature.
4. Weight: The heavier a car is the more power a motor needs to accelerate it and the more fuel is burnt to produce this energy. A continuous feature.
5. Acceleration: This feature is closely related to horsepower, displacement and weight, but carries additional information. Better (more efficient) cars can produce more acceleration with less energy, so this feature is also important. A continuous feature.
6. Model year: When a car is new, it needs less gas than an older one; the wear of engine parts is the reason this feature is included. A continuous feature.
7. Region of origin: Europe, United States or Japan. There are different philosophies and different backgrounds or even laws involved in car manufacturing, for example the strict exhaust norms in Europe or the love for big cars in the United States. These have a profound impact on fuel consumption. A number for each region is assigned.
8. MPG: The target value is the miles-per-gallon ratio, and the network performance will be measured by its ability to predict it.
Each of the 398 instances also has a string with the name of the car. This name is unique for each instance (the same car model can appear more than once, but then it is a modification of the base model). The name of the car was removed, as there was no reason and no possibility to include it in the estimation process. There were also some instances with unknown values for horsepower, which I removed. 392 data samples remained to work with.
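As an illustration of this cleaning step, here is a minimal sketch assuming the raw data has already been converted to a numeric matrix in which unknown horsepower entries appear as NaN; the two rows and the column order follow the UCI file layout but are only examples.

% Illustrative rows in the UCI column order:
% mpg, cylinders, displacement, horsepower, weight, acceleration, year, origin
raw = [18  8  307  130  3504  12.0  70  1;
       25  4   98  NaN  2046  19.0  71  1];   % unknown horsepower -> NaN
clean = raw(~isnan(raw(:,4)), :);             % keep only rows with known horsepower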
Data Preprocessing
The features are of completely different kinds and have completely different ranges, so some preprocessing was required before they could be fed to the network. I wanted all features to be weighted equally, because it was not clear which features were more important than others, so each feature dimension was normalized to zero mean and unit variance. After that I randomized the order of the data, because the original data set was sorted from older to newer cars. I reserved 92 samples as the final testing set, on which the multi-layer perceptron with the best training results would be tested to determine the estimation accuracy. This left 300 samples for training. I did not want my net to simply memorize the samples during training, so cross-validation was the method of choice to prevent overfitting and to get a good estimate of the generalization error. I chose the simplest form, 3-way cross-validation, so that the remaining data could be divided into equal sets of 100 samples. I then formed three training sets by combining two of the 100-sample chunks in every combination and used the remaining chunk as the validation set (see Mpg_preproc.m in the Appendix). With the training data ready, I could proceed to the development of the neural net.
MLP Development
There is no model of car fuel consumption available; I do not know what order it may have, or whether it is linear or nonlinear, although nonlinear is more likely. The multi-layer perceptron provides a nonlinear input-output mapping of a general nature (Haykin 1999). Also from Haykin is the following statement on function approximation, which seems important to me when developing an MLP:
“The problem with multilayer perceptrons using a single hidden layer is that the neurons therein tend to interact with each other globally. In complex situations this interaction makes it difficult to improve the approximation at one point without worsening it at some other point…”
As the complexity of the car fuel consumption model is not known and the underlying physical principles, although describable, are highly nonlinear (friction, for example, is certainly involved), MLPs with one and with two hidden layers should both be evaluated to determine the best performance. As a first step, only multi-layer perceptrons with one hidden layer are considered.
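To make the training step concrete, the following is a minimal sketch of one back-propagation update with momentum for a perceptron with a single hidden layer, assuming tanh hidden units, a linear output unit and a squared-error cost. It is illustrative only: the actual training in this project was done with the bp.m routines from class, and the variable names below (W1, W2, dW1, dW2) are my own.

% One backprop update for a single-hidden-layer MLP (sketch)
d = 7;                    % number of input features
h = 6;                    % hidden neurons
alpha = 0.1;              % learning rate
mu = 0.7;                 % momentum
W1 = 0.1*randn(h, d+1);   % hidden-layer weights (incl. bias)
W2 = 0.1*randn(1, h+1);   % output weights (incl. bias)
dW1 = zeros(size(W1)); dW2 = zeros(size(W2));

x = randn(d,1); t = randn;        % one (feature vector, target) sample

% Forward pass
z1 = W1*[x; 1];                   % hidden net activation
a1 = tanh(z1);                    % hidden output
y  = W2*[a1; 1];                  % linear output (estimated mpg)

% Backward pass
e      = y - t;                                % output error
delta2 = e;                                    % linear output unit
delta1 = (W2(1:h)' * delta2) .* (1 - a1.^2);   % tanh derivative

% Gradient-descent step with momentum
dW2 = -alpha * delta2 * [a1; 1]' + mu * dW2;
dW1 = -alpha * delta1 * [x; 1]'  + mu * dW1;
W2  = W2 + dW2;
W1  = W1 + dW1;

In the actual training this update is repeated over K randomly drawn samples per epoch, for NumEp epochs, which is exactly the role the parameters K and NumEp play below.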
There are five parameters that I considered varying to find the best-fitting configuration. All of them have a profound impact on the ability of the net to generalize and to capture the most appropriate model order: the number of neurons in the hidden layer h, the learning rate α, the momentum μ, the number of epochs to run NumEp, and the number of data samples in every epoch K. Based on my previous experience I assigned the values in Tab. 1 to be tested for these parameters.
h / 3 / 5 / 6 / 8 / 10 / 15 / 20
α / 0.01 / 0.05 / 0.1 / 0.2 / - / - / -
μ / 0.4 / 0.5 / 0.6 / 0.7 / 0.8 / - / -
NumEp / 200 / 500 / 1000 / 1500 / 2000 / - / -
K / 20 / 40 / 60 / - / - / - / -
Tab. 1: Parameter values to be tested
The parameters h, α and NumEp are more important than the others, so I paid closer attention to them. If α is small, more epochs are needed to converge to a solution. For a fixed value of μ, K = 40 and each value of α, I wrote a program that delivers a table of the mean error on the validation sets for each combination of h and NumEp. This table is obtained by running each cross-validation 5 times and averaging the resulting error when the weights are applied to the validation sets. An example is given in Tab. 2 for α = 0.1; the columns represent different numbers of neurons in the hidden layer and the rows the number of epochs to run.
 / h = 3 / h = 5 / h = 6 / h = 8 / h = 10 / h = 15 / h = 20
NumEp=200 / 0.1929 / 0.3094 / 0.2326 / 0.1372 / 0.1815 / 0.2748 / 0.2719
NumEp=500 / 0.233 / 0.2544 / 0.2198 / 0.2216 / 0.2597 / 0.2549 / 0.2404
NumEp=1000 / 0.2541 / 0.2691 / 0.2823 / 0.2502 / 0.2436 / 0.2729 / 0.2582
NumEp=1500 / 0.2722 / 0.2545 / 0.2552 / 0.2565 / 0.2731 / 0.252 / 0.2698
NumEp=2000 / 0.2491 / 0.2525 / 0.2624 / 0.2515 / 0.2358 / 0.2376 / 0.2593
Tab. 2: Averaged error for α = 0.1 and varying h and NumEp
The error is computed as the average of the absolute differences between the real and the estimated miles-per-gallon values. For this problem this value is more meaningful than the commonly used squared error.
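As a small illustration of this choice, assuming short vectors of true and estimated mpg values:

% Error measures on a validation chunk (illustrative values)
t = [18 26 33 21];          % true mpg
y = [20 24 30 22];          % estimated mpg
mae = mean(abs(y - t));     % mean absolute error, used here (2.0 mpg)
mse = mean((y - t).^2);     % squared error, penalises outliers more strongly (4.5)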
Although the results with α = 0.01 were the best for every epoch number, I decided not to discard the other learning rates as failures, because I did not know how they would perform when the epoch size and the momentum were changed. From each of these four tables I chose the combination of h and NumEp with the smallest error and trained the network with these combinations, fixed K and varying μ, to determine the best values for momentum and epoch size. An example table for K = 40 is given in Tab. 3:
 / h = 10, α = 0.01, NumEp = 1000 / h = 6, α = 0.05, NumEp = 200 / h = 8, α = 0.1, NumEp = 200
μ = 0.4 / 0.0802 / 0.1127 / 0.2053
μ = 0.5 / 0.0826 / 0.1168 / 0.2309
μ = 0.6 / 0.0818 / 0.1203 / 0.2135
μ = 0.7 / 0.0763 / 0.1015 / 0.192
μ = 0.8 / 0.0743 / 0.1048 / 0.2635
Tab. 3: Momentum variation with K = 40
After all these trials it turned out that several configurations achieve a small enough tuning error. It was now time to determine the best weights for these configurations on the full training set of 300 samples and to test the network on the unused testing set of 92 samples. The testing errors for the different configurations, averaged over 10 trials, are shown in Tab. 4. The errors there are already scaled back and represent the real mean deviation from the target values in miles per gallon.
Configuration / Averaged error
h = 6, α = 0.1, μ = 0.7, NumEp = 1000, K = 40 / 3.376
h = 8, α = 0.1, μ = 0.8, NumEp = 1000, K = 40 / 3.057
h = 6, α = 0.2, μ = 0.8, NumEp = 1500, K = 60 / 3.124
h = 8, α = 0.05, μ = 0.7, NumEp = 500, K = 40 / 2.988
h = 6, α = 0.2, μ = 0.7, NumEp = 1000, K = 60 / 2.853
h = 20, α = 0.1, μ = 0.8, NumEp = 1000, K = 40 / 3.443
Tab. 4: Error on the testing set
The results are not encouraging. The target values in the testing set range between 15 and 46 miles per gallon, so the errors are not catastrophic, but I have not yet discussed the base case. Compared to the values in the study of Ross Quinlan, who compares different learning approaches on this data set, these results are poor. For a fair comparison they are presented in Tab. 5; they can also be found in his article "Combining Instance-Based and Model-Based Learning" from 1993, where this data set was used among others to evaluate different approaches to estimating continuous values.
I hoped to get better results by using two hidden layers, but that proved to be an illusion. I tried many configurations with 3 to 8 neurons in each hidden layer and varying learning rates and numbers of epochs, but the average testing error never fell below 3.6 miles per gallon, which is worse than the best results with one hidden layer.
default procedure / 6.53
instances alone / 2.72
regression / 2.61
regression + instances / 2.37
model trees / 2.11
model trees + instances / 2.18
neural nets (one hidden layer, backprop) / 2.02
neural nets + instances / 2.06
Tab. 5: Averaged error for different methods in the base-case study
Software
Several MATLAB files were written or modified for this project. They include Mpg_preproc.m to preprocess the data and create the files for training and testing, Mpg_alpha_calc.m and Mpg_mhu_calc.m to create the tables used to decide on the best parameters, Mpg_fintest.m to perform the final testing, and others. Some files of Prof. Hu, such as bp.m and bpconfig.m, were modified and some were renamed; I apologize for that. Two files are included in the Appendix as examples.
Conclusion:
The neural net used in the base-case study had a logistic function in the neurons of the hidden layer and a linear function for the output neuron. My intention was to improve the performance of the net by using nonlinear functions for all neurons, but it seems that the underlying real model of car fuel consumption is not "very" nonlinear. The best results obtained by my neural net were 30-40% worse than the best results obtained in the study. I still have hope that better results are possible with a different approach, perhaps not using multi-layer perceptrons and back-propagation but radial basis functions as the estimator. Support vector machines would also be possible, especially because I think the model should not be very nonlinear.
Reference:
Haykin, Simon: Neural Networks: A Comprehensive Foundation
Prentice Hall, 2nd edition, 1999
Quinlan, Ross: Combining Instance-Based and Model-Based Learning,
In Proceedings of the Tenth International Conference on Machine
Learning, 1993
ftp://ftp.ics.uci.edu/pub/machine-learning-databases/
Appendix:
% File: Mpg_preproc.m
%
% Author: Ernst Schermann
% Final Project ME 539
%
% Mpg_preproc.m: In this file data preprocessing and data partitioning
% is performed
% Load the raw data matrix; the function form of load is used because the
% hyphen in the file name is not a valid MATLAB variable name
Mpg = load('Auto-mpg_txt.txt');
% Calculating mean and std deviation for each of the feature dimensions
for i=2:size(Mpg,2)
Mean(i) = mean(Mpg(:,i));
Std(i) = std(Mpg(:,i));
end
% Normalizing the feature dimensions
for j=2:size(Mpg,2)
Mpg(:,j) = (Mpg(:,j)-Mean(j))/Std(j);
end
% The target value is the first column; move it so that it becomes the last column
intermed = Mpg(:,1);
Mpg(:,1) = [];
Mpg = [Mpg intermed];
% Randomize the sample order (the original data is sorted by model year)
Mpg = Mpg(randperm(size(Mpg,1)),:);
% Partitioning the data into a training and a testing set
% The testing set will only be used to analyse net performance at the very end
testMpg = Mpg(1:92,:);
trainMpg = Mpg(93:392,:);
% Saving the training and testing file for final testing
save('Mpg_ftrain', 'trainMpg','-ASCII');
save('Mpg_ftest', 'testMpg','-ASCII');
% Partitioning the training set in 3 sets for cross-validation
t1 = trainMpg(1:100,:);
t2 = trainMpg(101:200,:);
t3 = trainMpg(201:300,:);
% Creating training sets for cross-validation
Mpg_train1 = [t1;t2];
Mpg_train2 = [t2;t3];
Mpg_train3 = [t3;t1];
% Creating training and testing files for cross-validation
train1 = 'Mpg_tr1';
train2 = 'Mpg_tr2';
train3 = 'Mpg_tr3';
test1 = 'Mpg_tt1';
test2 = 'Mpg_tt2';
test3 = 'Mpg_tt3';
save(train1, 'Mpg_train1','-ASCII');
save(train2, 'Mpg_train2','-ASCII');
save(train3, 'Mpg_train3','-ASCII');
% These are the testing files for each validation run
save(test1, 't3','-ASCII');
save(test2, 't1','-ASCII');
save(test3, 't2','-ASCII');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% File: Mpg_alpha_calc
%
% Author: Ernst Schermann
% Final Project ME 539
%
% Mpg_alpha_calc: This file begins the process of developing the best network
% topology and determines the tables of errors for alpha and number of epochs
clear all
% Load and pre-process data
Mpg_preproc
% The best model should be determined with the following
% possible parameters
% Number of neurons in hidden layer to test
hvec = [3 5 6 8 10 15 20];
% Vector of learning rates to test
alphavec = [0.01 0.05 0.1 0.2];
% Vector of momentum rates to test
mhuvec = [0.4 0.5 0.6 0.7 0.8];
% Vector of overall number of epochs to run
epochvec = [200 500 1000 1500 2000];
% Vector of epoch sizes to test
epsizevec = [20 40 60];
% Default values
hdef = 6;
alphadef = 0.1;
mhudef = 0.8;
nepochdef = 1000;
epsizedef = 40;
% Creating the h-nepoch table
Errsqr = zeros(length(epochvec),length(hvec));
Errabs = zeros(length(epochvec),length(hvec));
for epochcount=1:length(epochvec)
    for hcount=1:length(hvec)
        % Five runs for each combination to get a meaningful average
        for trialcount=1:5
            % 3-way cross-validation
            for cv=1:3
                % Momentum and epoch size stay at default values; the learning
                % rate is fixed to one entry of alphavec for this table
                % (change the index to produce the tables for the other rates)
                alpha = alphavec(1);
                mom = mhudef;
                K = epsizedef;
                nepoch = epochvec(epochcount);
                h = hvec(hcount);
                bp_FinProj
                % Error from each validation run
                Ecvsqr(cv) = SSsqr;
                Ecvabs(cv) = SSabs;
            end
            % Averaged error over the 3 cross-validation folds
            Etrialsqr(trialcount) = sum(Ecvsqr)/3;
            Etrialabs(trialcount) = sum(Ecvabs)/3;
        end
        % Averaged error over the 5 cross-validation trials
        Errsqr(epochcount,hcount) = sum(Etrialsqr)/5;
        Errabs(epochcount,hcount) = sum(Etrialabs)/5;
    end
end