2009 OnPathAnomalyDetection

From GM-RKB
Jump to navigation Jump to search

Subject Headings: Outlier Detection Task, Graph Path Distance Function, Perimeter-based Graph Path Distance Function, Spatial-Region-based Graph Path Distance Function.

Notes

Cited By

Quotes

Author Keywords

Path; Anomaly detection; Distance metrics

Abstract

The purpose of anomaly (outlier) detection is to find a small group of objects that are numerically distant from the rest of the data set. It is generally applied to identifying anomalies from normal patterns and events in urban traffic flow, trends in air quality change, human activities in urban environments, route quality assurance and control, etc. Traditional research in this field has focused on cases where objects can be represented as points or sequences, and popular methods include clustering, distribution, and distance-based methods. In this paper, we present a new type of outlier detection, path outlier detection, in a large spatial graph typically representing a transportation network. Two types of abnormal paths are presented and their corresponding applications are discussed. To perform path anomaly detection, we propose two fundamental distance metrics, spatial-region-based and perimeter-based, along with path segmentation based metrics to capture local feature differences and to jointly combine similarity and dissimilarity. Search algorithms for three distance metrics and outlier detection algorithms are provided to detect abnormal paths and potentially assist with identifying abnormal events or phenomena which occur during a trip. Experiments were performed on synthetic data sets that correspond to two real-world scenarios, and the results show that the efficient perimeter-based distance metric is very effective when used with path segmentation to capture local features and global features, and to combine similarity and dissimilarity.

1. Introduction and background

2. Related work

2.1. Similarity metrics for sequences 2.2. Similarity metrics between clusters 2.3. Similarity metrics between trajectories

3. Distance metrics for paths

Concepts of graph theory, including graph, path, path distance, and shortest path are presented, followed by two fundamental distance or dissimilarity measures for paths: Perimeter Based (PB), and Spatial-Region-Based (SRB). Path segmentation based metrics to capture local feature differences and to combine similarity and dissimilarity together are then presented.

3.1. Basic concepts

3.2. Distance metrics

Possible factors considered in a distance measure for path dissimilarity are a path’s starting vertex and ending vertex, length, the intermediate vertices and their orders, shared edges with other paths, etc. Different distance metrics may use different factors to capture characteristics of a path, which depend on the actual application. A constraint imposed on the path dissimilarity computation is that paths are located in the same graph. Based on these factors, several distance metrics are proposed and their corresponding search algorithms are provided for path outlier detection.

3.2.1. Perimeter-based distance (PB distance)

This approach uses the perimeter of the region R, formed by two paths, P1 and P2, the shortest path from P1’s starting vertex, O1, to P2’s staring vertex, O2, and the shortest path from P1’s ending vertex, D1, to P2’s ending vertex, D2, to measure the distance between these two paths. …

3.2.2. Spatial-region-based distance (SRB distance)

Given graph G (hV, Ei) and two paths P1 = hvs. . .vei and P2 = hvs0,. . .,ve0i, the region-based distance dSR(P1, P2) between paths P1 and P2 is defined as.

  • dRðP1; P2Þ ¼ dGðms; m0sÞ þ dGðme; m0eÞ þXe 2E jej; where
  • E ¼ feje 2 E ^ e 2 Rg; ð4Þ

where R is the spatial region determined (enclosed) by P1, P2, Ps(vs, vs), and Ps(ve, ve), E* is the collection of the edges identified through the spatial intersection of the graph and the region R, the same as the one discussed in Section 3.2.1.

The motivation of SRB distance is that the space area of the region R would intuitively provide useful information about the dissimilarity between these two paths.

SRB distance has two major advantages. First, it captures the spatial closeness between two paths. If two paths are more spatially distant, intuitively the value of the SR distance between these two paths will be larger. This is straightforward when the paths to be compared have close start vertices and close end vertices. In the special case where two paths have reversed positions about the start and end vertices, which means the start vertex of path P1 is close to the end vertex of path P2 and the start vertex of P1 is close to the end vertex of P2, the region may not be well identified. However, the SRB distance considers the distance between the start (or end) vertices.

3.2.3. Path segmentation to capture local features

3.2.4. Combine similarity and dissimilarity with path segmentation

4. Algorithms

4.1. Outlier detection algorithm

4.2. Perimeter-based distance with path segmentation (PBPS distance)

4.3. PSB distance

4.4. Spatial-region-based distance (SRB distance)

5. Experiments and result analysis

5.1. Experiment settings

5.2. Accuracy comparison

5.3. False positive analysis

5.4. False negative analysis

6. Conclusion

7. Future research


,

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2009 OnPathAnomalyDetectionQifeng Lu
Feng Chen
Kathleen Hancock
On Path Anomaly Detection in a Large Transportation NetworkJournal of Computers, Environment, and Urban Systems10.1016/j.compenvurbsys.2009.07.0092009