Travel Time Distributions Using CoPilot GPS Tracks

Santiago Arroyo

Laura Friese

Alain Kornhauser

January 14, 2003

NJ TIDE Paper 2003-1

Transportation Information and Decision Engineering (TIDE) Center

Princeton University, Princeton, New Jersey

We began our project by choosing the cities to analyze. For the urban region we chose Denver, CO, which gave us 101,363 points of CoPilot data and 15,858 points of Qualcomm data and is crossed by two interstates, I-70 and I-25. For the rural area we selected Staunton, VA, with 74,773 points of CoPilot data and 12,889 points of Qualcomm data, and Interstates I-81 and I-64.

All the data was created in CoPilot and analyzed in S-PLUS. To recreate the Denver data in CoPilot, map Denver, CO, then zoom out twice. To recreate the Staunton data, map Staunton, VA, then zoom out five times.

DENVER, COLORADO

CoPilot data

Links: We started by analyzing links separately, choosing the links with the most data points, since this makes the analysis more significant. For the CoPilot data, we began with the link that had the most data points (Link=781, Grid=27057316), which had a volume of 1848 for direction 2 (this link is a two-way local road):

We can see that many of the points do not seem to belong to this link; however, because there is no horizontal link here (which seems to be the right one), CoPilot associates the red and green data with link 781 (vertical). We can also see that, for this data with low snap values, the speed covers all ranges:

We can see that the quality of fit is good for high speeds, while for low ones (below 15 mph) we can have all sorts of values, from 0.4 to 1.1. Clearly, values above 0.93 are wrong, so for further analysis, these points are removed, along with points whose snap value is below 0.5, since the amount of data allows us to do so. When doing this we get the following histogram and distributions for the speed:

The two-headed distribution is the empirical one, and comparing it to theoretical ones, we see that none of the distributions seem to fit the data.
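The snap-value filter described above can be sketched as follows. This is a minimal Python illustration, not the S-PLUS code used for the analysis. The record layout (`speed` and `snap` fields) and the name `filter_by_snap` are assumptions made for the example, the sample points are made up, and the bounds are treated as inclusive, which the text does not specify.

```python
from typing import NamedTuple


class GpsPoint(NamedTuple):
    """Hypothetical record: speed in mph and snap value (quality of fit)."""
    speed: float
    snap: float


def filter_by_snap(points, low=0.5, high=0.93):
    """Keep points whose snap value lies in [low, high].

    The defaults mirror the thresholds quoted in the text (values below 0.5
    and above 0.93 are discarded). Inclusive bounds are an assumption.
    """
    return [p for p in points if low <= p.snap <= high]


# Made-up sample points, for illustration only.
sample = [
    GpsPoint(speed=12.0, snap=0.45),  # dropped: snap below 0.5
    GpsPoint(speed=8.0, snap=1.05),   # dropped: snap above 0.93
    GpsPoint(speed=31.0, snap=0.82),  # kept
    GpsPoint(speed=5.0, snap=0.60),   # kept
]
kept = filter_by_snap(sample)
```

Keeping the thresholds as parameters makes it easy to reuse the same filter with the 0.4 lower bound applied later to the truck data.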

The same behavior appears for the opposite direction, where we have 1100 points:

We also tried to see if this two-headed behavior was due to different speeds at different times of day (above, right), but we found that the whole range of speeds appears for all the times measured.

For the quality of fit, we get the following distribution:

We see how a significant accumulation of points below a snap value of 0.75 prevents the distribution for the quality of fit from being normal. As we saw before, these low snap values correspond to points whose speed is low; however, we cannot discard them, since clearly they are not outliers. It is difficult to know whether their snap value is low because of the speed or whether, because of the low speed, they correspond to another link and that is why the snap value is low. Clearly, none of the distributions we have fits this behavior.

Another link was also tested, this time on I-70, and we found that the results were somewhat different. First of all, there were no low speeds and no low snap values; however, a clear direct relationship between the snap values and the speed was evident:

In general, we found this pattern several times; that is, there is a better quality of fit for higher speeds. The distribution for the snap values is similar to the one obtained before, though to a lesser degree (the scale is different).
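One way to put a number on "not normal" is the Kolmogorov-Smirnov distance between the empirical distribution and a normal distribution with the same mean and standard deviation. The text does not say how the comparison with theoretical distributions was made, so this is a substitute technique shown only as a sketch. The function name and both sample lists are made up for illustration.

```python
from statistics import NormalDist, mean, stdev


def ks_distance_to_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    gap = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return gap


# Made-up snap values: a main cluster near 0.90 and a second one near 0.60.
two_headed = [0.58, 0.60, 0.61, 0.62, 0.88, 0.89, 0.90, 0.90, 0.91, 0.92]
# Made-up snap values spread symmetrically around 0.90.
single_peak = [0.85, 0.87, 0.88, 0.89, 0.90, 0.90, 0.91, 0.92, 0.93, 0.95]

d_two = ks_distance_to_normal(two_headed)
d_one = ks_distance_to_normal(single_peak)
```

A larger distance means a worse match to the normal shape; in this toy example the two-headed sample gives the larger value.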

The speed distribution was “smoother” than for the local road, but the “two-headed” behavior still appeared (obviously due to the different quality of fit):

It is clear that if it were not for the “head” at the low speed values, the speed would have a normal distribution. We also note, comparing this speed graph to the one obtained for the previous link, that the average for interstates is higher. This is rather obvious, but what is important is that because of this higher speed we get a smoother distribution.

However, for the opposite direction, we get a similar snap density but different density for speed (this behaves more like a normal density):

Direction / Mean Speed / Stdev Speed / Mean Snap / Stdev Snap / Volume
1 / 61.1083 / 8.455042 / 0.9049595 / 0.02624387 / 494
2 / 64.37827 / 4.876851 / 0.908125 / 0.02446556 / 336
Both / 62.43205 / 7.396192 / 0.906241 / 0.0255712 / 830

The low speeds seem to appear more on Direction 1, maybe due to more frequent traffic jams or the inclination of the road.
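The "Both" row of the table above can be cross-checked from the two per-direction rows alone. The sketch below pools group sizes, means and standard deviations. It assumes the reported standard deviations are sample standard deviations (divisor n - 1), which the text does not state; the function name `pool` is made up for the example.

```python
import math


def pool(groups):
    """Combine (n, mean, sample_stdev) triples into one (n, mean, stdev)."""
    total_n = sum(n for n, _, _ in groups)
    pooled_mean = sum(n * m for n, m, _ in groups) / total_n
    # Sum of squares about the pooled mean: within-group plus between-group.
    ss = sum((n - 1) * s ** 2 + n * (m - pooled_mean) ** 2
             for n, m, s in groups)
    return total_n, pooled_mean, math.sqrt(ss / (total_n - 1))


# (volume, mean speed, stdev speed) for Directions 1 and 2, from the table.
directions = [(494, 61.1083, 8.455042), (336, 64.37827, 4.876851)]
n_both, mean_both, sd_both = pool(directions)
# Compare with the "Both" row: volume 830, mean 62.43205, stdev 7.396192.
```

The pooled standard deviation is larger than a simple weighted average of the two because it also includes the gap of about 3 mph between the two direction means.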

Analyzing other links, we found that as the roads became less local, the speed distribution more closely resembled the one shown above. For local roads (type=8), the speed density looked more like that of the first link analyzed, and sometimes its distribution approached a flat plateau, suggesting, though not exactly, a uniform distribution:

Chain of links: Next, we looked at a chain of three links that includes the first link analyzed. The snap vs. speed plot is very similar to the one obtained for the link:

The snap and speed distributions also showed the same behavior as the link itself, but to a lesser extent:

Notice how the snap concentration for low values is smaller (in relative terms) than the one for the link itself, and the speed distribution strongly resembles the density obtained for the link; so again, we cannot find a regular distribution to fit to the speed values. This also happens for the chain of links containing the second link analyzed (interstate); the distributions resemble the ones obtained for the link.

Road Type: Next, we analyzed the speed distribution and quality of fit by road type. For this particular urban area, we have all the road types except 2 and 5 (ferry). What we do is group all the points of a certain road type in the data and obtain the different distributions. For example, if we pick road type 3 (6163 points), the speed distribution is:

We see how this speed is again not exactly normal but skewed to the right. However, the quality of fit does present a normal behavior.

If we pick another road type, like local ones (8), then both the speed and snap distributions change:

We see how the quality of fit has a left head (which we have seen in the individual links already). As for the speed, for this road type the distribution is concentrated at low speeds with a longer tail toward high speeds, resembling a gamma or lognormal function.

We then compared the descriptive statistics for each road type:

Road Type / Mean Speed / Stdev Speed / Mean Snap / Stdev Snap / Volume
1 / 55.64934 / 11.62169 / 0.8865312 / 0.04670799 / 24516
3 / 28.73289 / 10.33909 / 0.8237709 / 0.02861672 / 6163
4 / 29.95902 / 10.6849 / 0.8112911 / 0.0412376 / 5166
6 / 26.0136 / 9.695896 / 0.8165954 / 0.03800906 / 37305
7 / 32.53183 / 13.1067 / 0.8106576 / 0.04610697 / 4106
8 / 19.03992 / 9.501062 / 0.7879132 / 0.07606606 / 22877

From this table we can see something we have mentioned before; that is, as the speed increases, so does the quality of fit. We also noticed the differences in mean speed, which of course are higher for interstates, slightly lower for ramps, and lower for the other types, the lowest being the local roads.
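The link between speed and quality of fit in the road-type table can be summarized with a correlation coefficient over the six rows. This is only a rough sketch: it uses the per-type means copied from the table, not the underlying points, and six aggregated values, with the interstate row standing well apart from the rest, are too few to support a firm conclusion. The function name `pearson` is made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Mean speed and mean snap for road types 1, 3, 4, 6, 7, 8 (from the table).
mean_speed = [55.64934, 28.73289, 29.95902, 26.0136, 32.53183, 19.03992]
mean_snap = [0.8865312, 0.8237709, 0.8112911, 0.8165954, 0.8106576, 0.7879132]

r = pearson(mean_speed, mean_snap)  # strongly positive for these six rows
```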

Truck data

Links: Next, we analyzed the truck data. The procedure followed was the same as for the CoPilot data, except that we also differentiated by time of day. We picked different links, since the links that had significantly large amounts of Qualcomm data did not have associated CoPilot data. Additionally, the links with the highest volume were interstates, and in most cases the data points were unreliable, as for this link, which had the highest volume:

We picked another link which we thought had better data; of course, the number of points was not as large (around 70). For the link with the most reliable data (road type=1, US 36 I-70), we chose time of day 3 (it had the most points), direction 2, and we found the following relationship between quality of fit and speed:

It is strongly different from the relationships we had obtained for the CoPilot data. First of all, we have a large number of points with low snap values, and we also have some with very high speeds. Clearly there are many erroneous points, and for the analysis we only kept data with snap values between 0.4 and 0.93. The quality of fit and speed distributions obtained were:

Both density functions have the same shape, a little skewed to the right. They are not exactly normal, and the average snap values were lower than for the links analyzed with the CoPilot data. If we do the same analysis for the other direction, then the following distributions are obtained:

The difference in behavior between the two directions is obvious, and may be due to the fact that the amount of data is low and different for each one, or to traffic conditions. Nevertheless, some statistics, like speed, are similar for both directions:

Direction / Mean Speed / Stdev Speed / Mean Snap / Stdev Snap / Volume
1 / 36.42609 / 12.16866 / 0.7128986 / 0.1229192 / 69
2 / 37.65595 / 13.51827 / 0.6572619 / 0.09376144 / 84
Both / 37.10131 / 12.89997 / 0.6823529 / 0.1110462 / 153

We extend this analysis for the different times of day (knowing that for many of them the data will be insufficient, and we aggregate both directions to alleviate this), and obtain the following table:

Time of day / Mean Speed / Stdev Speed / Mean Snap / Stdev Snap / Volume
1 / 23.8625 / 18.86084 / 0.63875 / 0.1323888 / 8
2 / 30.34151 / 13.68928 / 0.719434 / 0.08753933 / 53
3 / 37.10131 / 12.89997 / 0.6823529 / 0.1110462 / 153
4 / 23.45161 / 11.72581 / 0.6862903 / 0.1066515 / 62
5 / 35.53333 / 13.88778 / 0.7538095 / 0.06568685 / 21
6 / 34.22 / 9.417393 / 0.67525 / 0.1110553 / 40
7 / 40.89091 / 10.58152 / 0.6963636 / 0.1059413 / 55
8 / 37.46818 / 12.98737 / 0.6968182 / 0.1324993 / 22
Total / 34.10097 / 13.53373 / 0.6924155 / 0.1078424 / 414

This table shows how the behavior is similar for all times of day for this particular link, except for times of day 1 and 4. For 1, clearly there are not enough points to draw any conclusion, and the high standard deviation for the quality of fit makes the results even less reliable. For time of day 4 (3pm-6pm) we suspect we are dealing with rush hour on this particular link, since the speed is lower than for all other times of day. For time of day 2 the same appears to happen, to a lesser extent. The distribution for speed, however, is very different for the different times of day, while the quality of fit exhibits similar two-headed behaviors. For example, for time of day 4 we get the following speed and quality of fit distributions:

We see how the speed now resembles somewhat a gamma or lognormal distribution.
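To see which gamma or lognormal curve such a shape would correspond to, the parameters can be estimated from the mean and standard deviation by the method of moments. This is a sketch of one standard approach, not necessarily the fitting procedure used in S-PLUS. It plugs in the time of day 4 row of the table above (both directions aggregated), and the function names are made up for the example.

```python
import math


def gamma_from_moments(mean, sd):
    """Gamma (shape, scale) with the given mean and standard deviation."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


def lognormal_from_moments(mean, sd):
    """Lognormal (mu, sigma) of log-speed with the given mean and sd."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


# Mean speed and stdev for time of day 4, taken from the table.
tod4_mean, tod4_sd = 23.45161, 11.72581

shape, scale = gamma_from_moments(tod4_mean, tod4_sd)
mu, sigma = lognormal_from_moments(tod4_mean, tod4_sd)
```

Because the mean is almost exactly twice the standard deviation in this row, the gamma shape parameter comes out close to 4. Whether either curve actually fits would still have to be checked against the data.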

Chain of Links: We then chose a chain of links that contained the link previously analyzed. This is a set of 9 links on US 36 with direction 2, the same as before:

We get 173 data points for time of day 3 (about twice as many as for the link alone), whose quality of fit and speed distributions are the following:

We see how the quality of fit resembles the one obtained for the link, but with a smaller bump to the left. This would mean that the new data points have a higher quality of fit value and, as we see on the right, mostly low speeds, which reduces the second head of the speed density compared to the one obtained for the link. This can also be seen in the mean speed: we get an average value of 34.48 mph for the whole chain of links, about 3 mph lower than the value for the link. The fact that we include portions closer to ramps may have something to do with this. If we compare the quality of fit against the speed, we see how all the different speeds have the whole range of snap values:

We also analyzed the main statistics for the different times of day, summarized in the following table:

Time of day / Mean Speed / Stdev Speed / Mean Snap / Stdev Snap / Volume
1 / 25.75333 / 17.46244 / 0.6373333 / 0.1273054 / 15
2 / 28.14505 / 13.6003 / 0.6658559 / 0.184265 / 111
3 / 34.44311 / 14.00404 / 0.6552994 / 0.1658749 / 334
4 / 23.79225 / 11.02949 / 0.6531783 / 0.1664378 / 129
5 / 37.16667 / 14.95006 / 0.665641 / 0.1830702 / 39
6 / 32.79495 / 12.59895 / 0.6305051 / 0.1744459 / 99
7 / 40.47236 / 10.97637 / 0.6830081 / 0.1493315 / 123
8 / 36.63902 / 13.15254 / 0.647561 / 0.1917783 / 41
Total / 32.83962 / 13.99299 / 0.6571717 / 0.1685832 / 891

We see that the results are similar to the ones obtained for the link.
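Tables like the ones above come from grouping the points by a label (direction, time of day, or road type) and computing the same five statistics for each group. A minimal sketch of that grouping follows. The dictionary field names, the function name `summarize` and the three sample records are assumptions made for illustration; the original tables were produced in S-PLUS.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(records, key):
    """Group records by `key` and return per-group speed/snap statistics."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    table = {}
    for label, rows in sorted(groups.items()):
        speeds = [r["speed"] for r in rows]
        snaps = [r["snap"] for r in rows]
        table[label] = {
            "mean_speed": mean(speeds),
            # A standard deviation needs at least two points; else None.
            "sd_speed": stdev(speeds) if len(rows) > 1 else None,
            "mean_snap": mean(snaps),
            "sd_snap": stdev(snaps) if len(rows) > 1 else None,
            "volume": len(rows),
        }
    return table


# Made-up records, for illustration only.
records = [
    {"time_of_day": 3, "speed": 40.0, "snap": 0.70},
    {"time_of_day": 3, "speed": 30.0, "snap": 0.60},
    {"time_of_day": 4, "speed": 20.0, "snap": 0.65},
]
by_tod = summarize(records, "time_of_day")
```

Passing a different `key` (for example a hypothetical `"road_type"` or `"direction"` field) would give the other breakdowns from the same function.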

An analysis was also made for the link (and chain of links) with the greatest number of data points (on I-270), but, as we have said, the data seems too poor and clearly belongs to another road. The results obtained (mean speeds of 18 mph, for example) are thus clearly useless.

Road Type: Finally, we analyzed the different road types as we did before with the CoPilot data. It is clear, however, that most of the data is for road type 1, since Level 1 was used when loading the data into CoPilot. We also chose time of day 3 since, as we have seen, it is representative of the overall data. We obtain the following values for the speed and quality of fit:

Road Type / Mean Speed / Stdev Speed / Mean Snap / Stdev Snap / Volume
1 / 24.84693 / 16.8508 / 0.5759912 / 0.1717506 / 10195
3 / 19.55099 / 13.51089 / 0.6746672 / 0.1236534 / 1322
4 / 22.96908 / 16.82245 / 0.5992647 / 0.1714613 / 816
6 / 21.04959 / 13.27991 / 0.6218513 / 0.135854 / 632
7 / 23.21929 / 13.17945 / 0.6242577 / 0.1353156 / 1839
8 / 15.21034 / 7.430648 / 0.5810345 / 0.09943931 / 29

In general, there are very low speed values for all road types, and again, their standard deviations are high. This is clearly because the data is so sparse and the snap values so low that most of the points do not seem to correspond to the roads they were snapped to.

The quality of fit showed, however, different densities depending on the road type. Some snap values had distributions similar to the one obtained for a link alone, while some others were like this:

For road type 3 the quality of fit is clearly normal, while for road type 4 a two-headed distribution appears. We should take into account that the number of data points for Qualcomm is not as high as for CoPilot, which makes these results less reliable.

We obtained the following speed distributions, knowing all we have said about how suspicious the data is when fitting to the links:

Although we would tend to think that the speed has a lognormal or gamma distribution for the different road types, the dispersion of the data prevents us from concluding that, since these low speed values are probably due mostly to data points not being associated with the correct links.

STAUNTON, VIRGINIA

CoPilot data

First, we will analyze the CoPilot data for one link, namely Link 18 in Grid 10217740. This link is part of Interstate 81, northeast of Staunton. The following graph is a scatterplot of the data that includes all snap values:

This data is clearly in four distinct groups, the largest being the data with high speed and snap values. Also, the data shows that there is a strong relationship between speed and snap value: higher speeds have higher snap values. Since snap values below 0.5 and above 0.93 are likely errors, we removed them from the data for the following analysis. The next two graphs plot the density of the snap and speed distributions along with the densities for common distributions; neither seems to match any known distribution:

The following graph is a histogram of the speed distribution:

This data shows that there are two collections of speeds, one centering around 70 mph, the other around 20 mph. The rest of the analysis for this link will try to discover what is causing this split.
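A split like this can be located numerically with a one-dimensional two-means clustering, which finds two centers and the threshold halfway between them. This is a substitute technique shown only as a sketch; the split in the text was read off the histogram. The function name and the sample speeds (chosen to mimic groups near 20 mph and 70 mph) are made up for illustration.

```python
def split_two_groups(values, iterations=50):
    """1-D two-means clustering: returns (low_center, high_center, threshold)."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        threshold = (low + high) / 2.0
        lows = [v for v in values if v <= threshold]
        highs = [v for v in values if v > threshold]
        if not lows or not highs:
            break  # all values fall on one side; nothing to split
        new_low = sum(lows) / len(lows)
        new_high = sum(highs) / len(highs)
        if (new_low, new_high) == (low, high):
            break  # centers stopped moving
        low, high = new_low, new_high
    return low, high, (low + high) / 2.0


# Made-up speeds in mph, for illustration only.
speeds = [18, 20, 22, 21, 68, 70, 72, 71, 69, 70]
slow_center, fast_center, cut = split_two_groups(speeds)
slow_points = [v for v in speeds if v <= cut]
```

Once the points are separated this way, each group can be cross-tabulated against direction or snap value to look for the cause of the split.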

Perhaps the difference in the data is due to direction of travel. This next chart shows the means and standard deviations for snap and speed values by direction:

Sample / Snap Mean / Snap SD / Speed Mean / Speed SD / # Data Points
Both Directions / 0.871 / 0.035 / 68.5 / 12.6 / 1594
Direction 1 / 0.877 / 0.046 / 65.5 / 16.6 / 820
Direction 2 / 0.864 / 0.015 / 71.8 / 4.2 / 774

Direction 2 has a higher mean speed; perhaps Direction 2 is downhill on the highway. However, the standard deviation for Direction 1 is much higher than Direction 2’s; consider the scatter plots of speed vs. snap values for each direction (all snap values included):

Direction 1 Direction 2

Interestingly, the low speeds are almost entirely in Direction 1. These next two histograms are of the speed by direction without data with snap values above 0.93 and below 0.5:

Direction 1 Direction 2

Clearly, the low speeds on this link are due to Direction 1. Two possibilities are that there was a traffic jam only in Direction 1 when this data was collected or that these points should have been snapped to a different road that has slower traffic. Here is the CoPilot map of Link 18 (highlighted link):

Next to and almost parallel to Interstate 81 is Link 322, part of State Route 1916, a Class 8 local road. Direction 1 is from A to B, which would run on the side of Interstate 81 closest to State Route 1916. Here is a zoomed-in section of the above map with the CoPilot GPS points, clearly showing that the northbound traffic (Direction 1) is closer to State Route 1916 than the southbound traffic (Direction 2). Note that Interstate 81 is in blue and Link 322 of State Route 1916 is highlighted: