2004 ASurveyofOutlierDetectionMethod

(Hodge & Austin, 2004) ⇒ Victoria Hodge, and Jim Austin. (2004). “A Survey of Outlier Detection Methodologies.” In: Artificial Intelligence Review Journal, 22(2). doi:10.1023/B:AIRE.0000045502.10941.a9

Subject Headings: Outlier Detection, Outlier Detection Algorithm.

Notes

2009

(Chandola et al., 2009) ⇒ Varun Chandola, Arindam Banerjee, and Vipin Kumar. (2009). “Anomaly Detection: A survey.” In: ACM Computing Surveys, 41(3). doi:10.1145/1541880.1541882

Cited By

Quotes

Author Keywords

anomaly - detection - deviation - noise - novelty - outlier - recognition

Abstract

Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as such to purify the data for processing. The original outlier detection methods were arbitrary but now, principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.

1.0 Introduction

Outlier detection encompasses aspects of a broad spectrum of techniques. Many techniques employed for detecting outliers are fundamentally identical but with different names chosen by the authors. For example, authors describe their various approaches as outlier detection, novelty detection, anomaly detection, noise detection, deviation detection or exception mining. In this paper, we have chosen to call the technique outlier detection although we also use novelty detection where we feel appropriate but we incorporate approaches from all five categories named above. Additionally, authors have proposed many definitions for an outlier with seemingly no universally accepted definition. We will take the definition of Grubbs (Grubbs, 1969) and quoted in Barnett & Lewis (Barnett and Lewis, 1994): An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs.

A further outlier definition from Barnett & Lewis (Barnett and Lewis, 1994) is: An observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data. In figure 2, there are five outlier points labelled V, W, X, Y and Z which are clearly isolated and inconsistent with the main cluster of points. The data in the figures in this survey paper is adapted from the Wine data set (Blake and Merz, 1998).

John (John, 1995) states that an outlier may also be “surprising veridical data”, a point belonging to class A but actually situated inside class B so the true (veridical) classification of the point is surprising to the observer. Aggarwal (Aggarwal and Yu, 2001) notes that outliers may be considered as noise points lying outside a set of defined clusters or alternatively outliers may be defined as the points that lie outside of the set of clusters but are also separated from the noise. These outliers behave differently from the norm. In this paper, we focus on the two definitions quoted from (Barnett and Lewis, 1994) above and do not consider the dual class-membership problem or separating noise and outliers.
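As a rough illustration of the Grubbs-style definition above (an observation that "appears to deviate markedly from other members of the sample"), the following minimal sketch flags values in a one-dimensional sample using a modified z-score based on the median and the median absolute deviation. This is not a method taken from the survey; the function name `flag_outliers`, the sample `readings` and the conventional cut-off of 3.5 are assumptions chosen only for illustration.

```python
from statistics import median


def flag_outliers(sample, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    Illustrative sketch only: the name, the threshold and the choice of
    a median/MAD score are assumptions, not taken from the survey.
    """
    med = median(sample)
    # Median absolute deviation: a spread estimate that a few extreme
    # values barely influence, unlike the standard deviation.
    mad = median([abs(x - med) for x in sample])
    if mad == 0:
        # No measurable spread, so this simple score cannot be computed.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the bulk of the data is roughly normal.
    return [x for x in sample if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical sensor readings with one value far from the rest.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
print(flag_outliers(readings))
```

A median-based score is used here because a single extreme value inflates the mean and standard deviation and can mask itself; whether a flagged value is then removed or investigated depends on the application.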

…

A more exhaustive list of applications that utilise outlier detection is:

Fraud detection - detecting fraudulent applications for credit cards, state benefits or detecting fraudulent usage of credit cards or mobile phones.
Loan application processing - to detect fraudulent applications or potentially problematical customers.
Intrusion detection - detecting unauthorised access in computer networks.
Activity monitoring - detecting mobile phone fraud by monitoring phone activity or suspicious trades in the equity markets.
Network performance - monitoring the performance of computer networks, for example to detect network bottlenecks.
Fault diagnosis - monitoring processes to detect faults in motors, generators, pipelines or space instruments on space shuttles for example.
Structural defect detection - monitoring manufacturing lines to detect faulty production runs for example cracked beams.
Satellite image analysis - identifying novel features or misclassified features.
Detecting novelties in images - for robot neotaxis or surveillance systems.
Motion segmentation - detecting image features moving independently of the background.
Time-series monitoring - monitoring safety critical applications such as drilling or high-speed milling.
Medical condition monitoring - such as heart-rate monitors.
Pharmaceutical research - identifying novel molecular structures.
Detecting novelty in text - to detect the onset of news stories, for topic detection and tracking or for traders to pinpoint equity, commodities, FX trading stories, outperforming or under performing commodities.
Detecting unexpected entries in databases - for data mining to detect errors, frauds or valid but unexpected entries.
Detecting mislabelled data in a training data set.

Outliers arise because of human error, instrument error, natural deviations in populations, fraudulent behaviour, changes in behaviour of systems or faults in systems. How the outlier detection system deals with the outlier depends on the application area.

…
