Outlier Detection (OD) Task

An Outlier Detection (OD) Task is a detection task that detects outlier observations in a data set.

AKA: Anomaly Detection.
Context:
- It can range from being a Univariate Outlier Detection Task to being a Multivariate Outlier Detection Task (such as a bivariate outlier detection task).
- It can range from being a Numerical Outlier Detection Task to being a Categorical Outlier Detection Task.
- It can range from being a Unordered-Data Outlier Detection Task to being a Sequential-Data Outlier Detection Task (such as a temporal outlier detection task).
- It can range from being a I.I.D. Outlier Detection Task to being a Non-I.I.D. Outlier Detection Task.
- It can be solved by an Outlier Detection System (that applies an outlier detection algorithm).
- It can find interesting outliers.
- …
Example(s):
- Temporal Anomaly Detection, such as: fraudulent event detection.
- Spatial Outlier Detection, such as ..
- a Financial Event Outlier Detection Task.
- …
Counter-Example(s):
- a Direct Marketing Task.
- a Disease Diagnosis Task.
See: Out-of-Distribution Detection, Pattern Detection Task, Cost-Benefit Matrix, Distance Measure, Statistical Process Control.

References

2021

(Yang, Zhou et al., 2021) ⇒ Jingkang Yang, Kaiyang Zhou, Yixuan Li, and Ziwei Liu. (2021). “Generalized Out-of-Distribution Detection: A Survey.” arXiv preprint arXiv:2110.11334
- QUOTE:
  Figure 2: Exemplar problem settings for tasks under generalized OOD detection framework. Tags on test images refer to model’s expected predictions. (a) In sensory anomaly detection, test images with covariate shift will be considered as OOD. No semantic shift occurs in this setting. (b) In semantic anomaly detection and one-class novelty detection, normality/ID images belong to one class. Test images with semantic shift will be considered as OOD. No covariate shift occurs in this setting. (c) In multi-class novelty detection, ID images belong to multiple classes. Test images with semantic shift will be considered as OOD. No covariate shift occurs in this setting. (d) Open set recognition is identical to multi-class novelty detection in the task of detection, with the only difference that open set recognition further requires in-distribution classification. (e) Out-of-distribution detection is a super-category that covers semantic AD, one-class ND, multi-class ND, and open-set recognition, which canonically aims to detect test samples with semantic shift without losing the ID classification accuracy. (f) Outlier detection does not follow a train-test scheme. All observations are provided. It fits in the generalized OOD detection framework by defining the majority distribution as ID. Outliers can have any distribution shift from the majority samples.

2013

(Aggarwal, 2013) ⇒ Charu C. Aggarwal. (2013). “Outlier Analysis." Springer Publishing Company, Incorporated. ISBN:1461463955, 9781461463955 doi:10.1007/978-1-4614-6396-2
- QUOTE: … key applications of these methods as applied to diverse domains such as credit card fraud detection, intrusion detection, medical diagnosis, earth science, web log analytics, and social network analysis are covered.

(Hauskrecht et al., 2013) ⇒ Milos Hauskrecht, Iyad Batal, Michal Valko, Shyam Visweswaran, Gregory F Cooper, and Gilles Clermont. (2013). “Outlier Detection for Patient Monitoring and Alerting.” In: Journal of Biomedical Informatics, 46(1). doi:10.1016/j.jbi.2012.08.004

2012

(Chandola et al., 2012) ⇒ Varun Chandola, Arindam Banerjee, and Vipin Kumar. (2012). “Anomaly Detection for Discrete Sequences: A Survey.” In: IEEE Transactions on Knowledge and Data Engineering Journal, 24(5). doi:10.1109/TKDE.2010.235

2009

(Chandola et al., 2009) ⇒ Varun Chandola, Arindam Banerjee, and Vipin Kumar. (2009). “Anomaly Detection: A survey.” In: ACM Computing Surveys, 41(3) doi:10.1145/1541880.1541882
- QUOTE: Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior. These non-conforming patterns are often referred to as anomalies, outliers, discordant observations, exceptions, aberrations, surprises, peculiarities or contaminants in different application domains. Of these, anomalies and outliers are two terms used most commonly in the context of anomaly detection; sometimes interchangeably. Anomaly detection finds extensive use in a wide variety of applications such as fraud detection for credit cards, insurance or health care, intrusion detection for cyber-security, fault detection in safety critical systems, and military surveillance for enemy activities.
  The importance of anomaly detection is due to the fact that anomalies in data translate to significant (and often critical) actionable information in a wide variety of application domains.

2005

(Ben-Gal, 2005) ⇒ Irad E. Ben-Gal. (2005). “Outlier Detection.” In: Maimon O. and Rockach L. (Eds.) Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers," Kluwer Academic Publishers. ISBN:0387244352.
- ABSTRACT: Outlier detection is a primary step in many data-mining applications. We present several methods for outlier detection, while distinguishing between univariate vs. multivariate techniques and parametric vs. nonparametric procedures. In presence of outliers, special attention should be taken to assure the robustness of the used estimators. Outlier detection for data mining is often based on distance measures, clustering and spatial methods.

2004

(Hodge & Austin, 2004) ⇒ Victoria Hodge, and Jim Austin. (2004). “A Survey of Outlier Detection Methodologies.” In: Artificial Intelligence Review, 22(2). doi:10.1023/B:AIRE.0000045502.10941.a9

2003

(Markou & Singh, 2003) ⇒ Markos Markou, and Sameer Singh. (2003). “Novelty Detection: A Review — part 1: Statistical Approaches.” In: Signal processing, 83(12).
- QUOTE: Novelty detection is the identification of new or unknown data or signal that a machine learning system is not aware of during training. Novelty detection is one of the fundamental requirements of a good classification or identification system since sometimes the test data contains information about objects that were not known at the time of training the model. In this paper we provide state-of-the-art review in the area of novelty detection based on statistical approaches.

(Rousseeuw & Leroy, 2003) ⇒ Peter J. Rousseeuw, and Annick M. Leroy. (2003). “Robust Regression and Outlier Detection." Wiley-IEEE. ISBN:0471488550

2000

(Breunig et al., 2000) ⇒ Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. (2000). “LOF: identifying density-based local outliers.” In: Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD 2000). doi:10.1145/335191.335388

1998

(Knorr & Ng, 1998) ⇒ E. Knorr, and Raymond Ng. (1998). “Algorithms for Mining Distance-based Outliers in Large Data Sets.” In: Proceedings of the 24th International Conference on Very Large Databases (VLDB 1998).
- NOTES: It defines outliers as those data points (vectors) with values different from those of the remaining set of data.