CS 730 – 000

Hoang Vo

SIGKDD 2012 Paper

Mining large-scale, sparse GPS traces for map inference: comparison of approaches, SIGKDD 2012

Summary

The authors of the paper present solutions to the problem of road map derivation from GPS traces. The issue was that GPS data has low resolution and also low sampling frequency.

The side point was the comparison between the three state-of-the-art approaches: KDE algorithm, TC1 inference method and k-means algorithm. The differences are whether they work with individual points or inferred segments between consecutive data points.

How does it build upon, extend, or differ from other work

This workextends the existing Kernel Density Estimation to the KDE point and line segment methods and Trace Merging algorithms to TC1.

Key ideas in the paper

- There is a need to develop map from cheap and coarse-grained GPS trace data.

-Sparse data adaptation of the KDE algorithm was developed.

- There is a trade-off of oversensitive detection of roads that do not exist against missing sparsely sampled roads.

- The k-means method performed the worst, as it simply infer roads from noise that are incorrect. The authors clearly showed that the method is not suitable for map inference.

- As the sampling interval increases, the KDE algorithm using line segments perform much worse; the precision plunges quickly. On the other hand, the KDE algorithm using individual points has a better precision till a certain point, but it still suffers from the low “recall” effect, where roads are considered noises and discarded.

- The TC1 algorithm outperforms the other methods, despite producing some wring road segments when the sampling rate becomes higher.

What is novel or interesting

We can update maps easily using existing or new low-cost data obtained using low-resolution.

The paper proposed a good evaluation framework to compare methods instead of the traditional “eyeball-like” approaches. The new measure is called the “F-metric.”

It is interesting that higher sampling rates do not increase the performance of TC1. Actually, after a certain threshold, clusters start to merge incorrectly.

2. Positive/Strong Points

- S1: Comparing to previous papers studying GPS data with high resolution and high sampling frequency, this paper addresses a more realistic

- S2: The paper provided both quantitative and qualitative evaluation showing the contrasts between the three aforementioned methods.

- S3 The authors attempt to modify the existing algorithms to achieve better accuracy.

- S4 The authors clearly distinguish precision and recall as measures of “correctness” of the method.

- S5 The authors provide intuitive reasoning to explain the phenomena of one method outperforming others in term of recall or precision, even though

Negative/Weak Points

- W2 If the real roads are zigzagged and the real road density is high or the reverse might be true, one method is always better than the other. How do we know when which method is better? Some inherent prior knowledge about the map/region must be obtained to determine what the most suitable method is.

- W3 There is no assumption about the accuracy – GPS sensors might have inherent imprecision

- W4 Only two datasets were evaluated – Chicago and Shanghai, which are datasets with different characteristics. In each dataset, the structures of roads are similar to each other, so parameter tuning on different regions of the same datasets might not be very effective.

- W5The Chicago shuttle dataset is very small; the authors used the entire data set for both parameter tuning and cross-validating.

- W6 Some steps in the algorithms, especially in the TC1 algorithm, use heuristics, which is intuitive, but it does not guarantee the correctness. In particular, the trace segment selection assumes a vehicle would likely follow the straight line from a to b if a and b has “roughly” the same bearing. Perhaps the author should clarify this selection step, since the data sampled is quite sparse.

Research Questions and Points for Discussion

- D1 Can we apply the KDE methods into other fields?

- D2 The author suggested there might be a better way to extract the road centerline where there is a large number of data points between perpendicular roads and using single-linkage clustering would join everything into one cluster. What are possible alternatives?

- D3Can we apply any of the modified methods in extracting paths in a building – 3D sampling?

- D4The authors could develop a new approach to solve the trade-off between oversensitivity and missing roads by tuning parameters in the Segment selection step, even though this might be dataset-dependent.

- D5 The authors stated that currently there is no method to detect one-way and two-way roads, even though this is a very important feature.

- D6 The authors mentioned these methods work well for low resolution and low sampling rate data. What is your expectation of each method’s performance if the data set has a high resolution and high sampling rates?

- D7 How much effect does noise have in the precision/recall/F-measure of each method, since we know different GPS sensors might report inaccurately?