Dan Metzger

CS539

Project

Traffic Analysis

My goal is to develop a program that will be able to manage traffic using neural networks. For this project I searched the internet for actual data from roadway detection devices, and I eventually found data for a stretch of road in Phoenix, Arizona. This data was gathered using Inductive Loop Detectors (ILDs). An ILD is a loop of wire embedded in the road. It detects a vehicle because the vehicle's metal body changes the loop's magnetic field, and therefore its inductance. Using these ILDs we can find the count of cars, the speed they are going, the size of the vehicle, whether it is a car or a truck, and how congested the particular lane is.

The data is available here: ftp://www.azfms.com/pub/traffic/15min/2003_12/

I downloaded the first 5 days of December 2003 to run my project on.

The data that I found contains the ILD logs of a 4-lane highway from December 2003, with samples running through the 17th. The logs are split into a separate file for each day of that 17-day period, and data samples were taken every 15 minutes. Each day's log file was over 30 megabytes in size, so I knew early on that preparing the data for the MLP would be the most difficult part of my project. Each day contains 49,824 data samples, and each sample holds 180 columns of information.

I found that this data came from many ILD stations in Arizona. There were 519 different stations in the logs. By looking through the Arizona Freeway Management System (azfms.com) website, I found the headings for each of the 180 columns of data in their logs. Once I knew what the data was, I was ready to begin partitioning it into a usable form.

First I decided to look at only one of the 519 stations. Starting with row 1, I took every 519th row of the 49,824 data samples, which gives one row for station 1 in each 15-minute interval. I put these rows into a matrix named station1. Then I loaded the data from the 2nd and 3rd of December and appended the station 1 rows from those days. This gave me 288 data points (3 days × 96 samples per day), each with 180 columns. To find the data that mattered, I went into the Excel spreadsheet provided by the AZFMS. From it I learned that 4 lanes, b, c, d, and e, had cars traveling through them. For each of these lanes the data provided both a volume of traffic and a speed of traffic. I decided to train my MLP on the volume of each of these lanes.
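The strided row selection described above can be checked on a small synthetic example. This is a sketch, not the original script. Like the 1:519:49824 loop, it assumes that every 15-minute block lists all 519 stations in the same order with station 1 first. The matrix `day` is made-up stand-in data, not an AZFMS log. The sketch also confirms the row count: 519 stations × 96 samples per day = 49,824 rows.

```matlab
% Sketch: extracting one station from a day file by strided row indexing.
% ASSUMPTION: every 15-minute block lists all 519 stations in the same
% order, with station 1 first (the 1:519:49824 loop relies on this too).
nStations = 519;
nSlots = 24 * 60 / 15;          % 96 fifteen-minute samples per day
nRows = nStations * nSlots;     % 519 * 96 = 49,824 rows per day file

% Synthetic stand-in for one day file (NOT the real log):
% column 1 = station number, column 2 = time slot number.
stationId = repmat((1:nStations)', nSlots, 1);
slot = kron((1:nSlots)', ones(nStations, 1));
day = [stationId, slot];

% Every 519th row starting at row s belongs to station s, one row per slot.
s = 1;
oneStation = day(s:nStations:end, :);
disp(size(oneStation))          % 96 rows, 2 columns
```

To select a different station, change `s`. For example, `day(2:nStations:end, :)` gives station 2, as long as the ordering assumption holds.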

But traffic conditions depend as much on the time of day as on the volume of cars. I needed to put the time of day into the input features, but the time of day was stored in the format HH:MM:SS, and Matlab's load command cannot read the ':' character. At first I tried converting every ':' into a space, but that splits each time into three separate columns. Because the samples are exactly 15 minutes apart, a single counter carries the same information. My MLP will be predicting the traffic level, and the time of day will be a very important factor. For the MLP to take the time into consideration, the time must be a plain number. I decided to express the time as a number between 1 and 96, because a 24-hour day contains 96 samples spaced 15 minutes apart. I created this repeating 1 to 96 count by going through the entire data set and appending a column that counts modulo 96.
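Below is a sketch of two ways to produce the 1 to 96 time index. Part (a) is an alternative that the report did not use: MATLAB's `sscanf` can read the `HH:MM:SS` string directly, including the `:` separators, so the slot number would come from the timestamp itself rather than from row position. Part (b) is the row-position counter described above, written as `mod(k-1, 96) + 1` so that the count wraps after exactly 96 rows. The variable names are illustrative.

```matlab
% (a) From a timestamp string: the ':' characters in the format are matched literally.
hms = sscanf('07:45:00', '%d:%d:%d');                   % column vector [7; 45; 0]
slotFromClock = floor((60*hms(1) + hms(2)) / 15) + 1;   % 00:00 -> 1, 23:45 -> 96

% (b) From row position: a counter that wraps every 96 rows.
% mod(k-1, 96) + 1 keeps the period at exactly 96 across several days.
nRows = 288;                                            % 3 days x 96 samples
time = mod((0:nRows-1)', 96) + 1;
```

A row counter assumes that no 15-minute row is ever missing from a log. Deriving the index from the timestamp, as in (a), stays correct even if a row is missing.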

After that step I had 5 input features: the time index and the volumes of the four lanes. Next I needed to choose an output. In the Excel file I found that the logs include an average speed for this road. I decided to use my MLP to test for a relationship between the time of day, the number of cars on the road, and the speed the cars are traveling at. I added the speed column to the training and testing sets as the output, and was ready to start configuring the MLP.

To decide which MLP configuration to use, I tested several of them with a program (TestMLP.m) that runs the MLP 50 times and records the classification rate (Crate) of each run. I compared the mean, maximum, and minimum Crates. The simplest MLP, with 2 hidden layers of 3 neurons each, had the highest mean Crate when predicting the speed of traffic on the 4th of December after training on the previous days. However, the mean Crates of all four configurations were within one percentage point of each other.
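TestMLP.m is not reproduced in this report, so the following is only a hedged sketch of the bookkeeping it is described as doing: compute a classification rate for each of 50 runs, then report the mean, minimum, and maximum. `crate` and `runOnce` are hypothetical names. `runOnce` draws random labels only so that the script runs on its own. In the real program, each run would train a fresh MLP and score it on December 4.

```matlab
% Classification rate (Crate): percentage of test samples whose
% predicted class matches the target class.
crate = @(pred, target) 100 * mean(pred(:) == target(:));

% Toy check with made-up labels (not project data): 3 of 4 correct.
c = crate([1 2 2 3], [1 2 3 3]);

% HYPOTHETICAL placeholder for "train a fresh MLP and return its Crate".
% Here it only scores random labels for 96 samples so the script is runnable.
runOnce = @() crate(randi(3, 1, 96), randi(3, 1, 96));

nRuns = 50;
crates = zeros(nRuns, 1);
for k = 1:nRuns
    crates(k) = runOnce();
end
fprintf('Mean %.4f%%  Min %.4f%%  Max %.4f%%\n', mean(crates), min(crates), max(crates));
```

With one day of test data (96 samples), every Crate is a multiple of 100/96 ≈ 1.04 percentage points. The per-run values in the table are consistent with this. For example, 83.3333% = 80/96 and 90.6250% = 87/96.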

Here are the results of the different MLP configurations (50 runs each):

Configuration / 9-3-3-1 / 9-3-3-3-1 / 9-9-9-1 / 9-12-12-1
Mean Crate / 86.5000% / 85.8958% / 86.3333% / 86.0833%
Minimum Crate / 83.3333% / 82.2917% / 82.2917% / 83.3333%
Maximum Crate / 90.6250% / 88.5417% / 90.6250% / 90.6250%

The data files that I used can be found here:

ftp://www.azfms.com/pub/traffic/15min/2003_12/phx_20031201_15min.txt.gz

ftp://www.azfms.com/pub/traffic/15min/2003_12/phx_20031202_15min.txt.gz

ftp://www.azfms.com/pub/traffic/15min/2003_12/phx_20031203_15min.txt.gz

ftp://www.azfms.com/pub/traffic/15min/2003_12/phx_20031204_15min.txt.gz

Below is the code that I used to prepare the data for the MLP:

%Loads the 4 days in December that the MLP will analyze
load phx_20031201_15min;
load phx_20031202_15min;
load phx_20031203_15min;
load phx_20031204_15min;

%Take every sample from station 1. Because there are 519 stations in the
%data, every 519th row (starting at row 1) is the next result from station 1.
%Days 1 to 3 of December are stacked to form the training set (3 x 96 = 288 rows).
station1 = [phx_20031201_15min(1:519:end,:);
            phx_20031202_15min(1:519:end,:);
            phx_20031203_15min(1:519:end,:)];

%This creates a column vector time that counts 1 through 96 and then starts
%over, once for each day in the data. It becomes the time index of the day;
%each increase is 15 minutes that has gone by. (mod(k-1,96)+1 wraps after
%exactly 96 rows; the earlier mod(ct,97) version drifted by one slot per day.)
row = size(station1, 1);
time = mod((0:row-1)', 96) + 1;

%Each lane's volume is the sum of its truck and car volume columns:
%lane b = columns 36 and 37, lane c = 47 and 48, lane d = 58 and 59,
%lane e = 69 and 70.
laneB = station1(:,36) + station1(:,37);
laneC = station1(:,47) + station1(:,48);
laneD = station1(:,58) + station1(:,59);
laneE = station1(:,69) + station1(:,70);
%Column 114 is the output vector, the average speed of all cars in all
%lanes. The MLP will try to predict how fast the cars are moving based on
%the time of day and how many cars are present.
newstat1 = [time, laneB, laneC, laneD, laneE, station1(:,114)];

%Clear out the temporary variables
clear time temp test1 newtest1;

%Take the testing samples from station 1 on December 4. Because there are
%519 stations in the data, every 519th row is the next result from station 1.
test1 = phx_20031204_15min(1:519:end,:);

%This creates the time index from 1 to 96.
row = size(test1, 1);
time = mod((0:row-1)', 96) + 1;

%Each lane's volume is the sum of its truck and car volume columns:
%lane b = columns 36 and 37, lane c = 47 and 48, lane d = 58 and 59,
%lane e = 69 and 70.
laneB = test1(:,36) + test1(:,37);
laneC = test1(:,47) + test1(:,48);
laneD = test1(:,58) + test1(:,59);
laneE = test1(:,69) + test1(:,70);
%This is the final addition to the testing set: the output, the average
%speed that all cars are moving at. With this as the output we can see how
%well the MLP learned the road.
newtest1 = [time, laneB, laneC, laneD, laneE, test1(:,114)];

With the data that I found and the MLP approach I used, many other configurations were possible. I could have used the speed of the cars in the individual lanes as the input and matched it with the volume of the entire highway. I could also have looked for a correlation between the time of day and the speed that people drive. The data on this site goes back an entire year. That makes it possible to plot a full year of traffic history and find trends and connections between the month, the time of day, the lanes in use, the speed, or any other piece of information gathered by an Inductive Loop Detector.
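As a starting point for the trend analysis described above, the sketch below uses `accumarray` to average the speed in each 15-minute slot across days. It assumes rows laid out like `newstat1`, with the time index in the first column and the average speed in the last. The four rows of numbers are invented for illustration and do not come from the AZFMS logs.

```matlab
% Sketch: average speed for each 15-minute time slot across several days.
% ASSUMPTION: rows laid out like newstat1 (column 1 = time index 1-96,
% columns 2-5 = lane volumes, last column = average speed).
% The values below are made up for illustration only.
data = [ 1 10 12  9  8 60;
         2 11 13 10  9 58;
         1 14 15 12 10 56;
         2 13 12 11 10 62];
slot  = data(:, 1);
speed = data(:, end);

% Group the speeds by slot and take the mean of each group;
% slots with no data are filled with 0.
meanSpeedBySlot = accumarray(slot, speed, [96 1], @mean);
```

Over a full year of files, `plot(1:96, meanSpeedBySlot)` would show the average daily speed profile. If a month column were added, grouping on it would work the same way.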

My MLP was able to predict the speed of traffic with a mean classification rate of 86.5%, and 90.6% in its best run, knowing only the time of day and the number of cars in each lane. This project showed me that a multilayer perceptron trained with back propagation can learn useful patterns from real-world sensor data. I found that data for almost anything is available, and if we find the right way to use that data, we can create a machine that learns.