Missing Data Value
- AKA: Undefined Data Value, Missing Value, Unknown Data Value, Unlabeled, Missing Values.
- It can represent:
- It can violate a Data Attribute Constraint that requires a Data Attribute not to contain a Missing Data Value.
- It can break a Data Analysis Algorithm's Input Assumptions.
- See: Testing Record, Data Value, Recorded Data Value.
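The two effects noted above (a violated constraint and a broken input assumption) can be illustrated with a minimal sketch. It assumes that a missing data value is encoded as `None` or as a floating-point NaN. The names `temperatures` and `violates_not_missing` are hypothetical and exist only for this illustration.

```python
import math

# Hypothetical attribute values; NaN stands in for a missing data value.
temperatures = [21.5, float("nan"), 19.0]


def violates_not_missing(values):
    """Return True if any value is missing (None or NaN)."""
    return any(
        v is None or (isinstance(v, float) and math.isnan(v))
        for v in values
    )


# A naive mean assumes every input is an observed number.
# A single NaN propagates through the sum and makes the result NaN.
naive_mean = sum(temperatures) / len(temperatures)

# Restricting the computation to observed values gives a usable result.
observed = [v for v in temperatures if not math.isnan(v)]
observed_mean = sum(observed) / len(observed)

print(violates_not_missing(temperatures))
print(math.isnan(naive_mean))
print(observed_mean)
```

Checking the constraint before running an analysis makes the failure explicit instead of letting a NaN silently contaminate downstream results.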
- In Statistics, missing values are a common occurrence, and several statistical methods have been developed to deal with them. A missing value means that no Data Value is stored for a variable in a given Observation. Modern Statistical Packages have made dealing with missing values much easier; they often use Maximum Likelihood Estimation for Summary Statistics, Confidence Intervals, etc.
- Techniques for dealing with missing values
- Imputation (Statistics)
- EM imputation, i.e. expectation-maximization imputation (see expectation-maximization algorithm)
- Full Information Maximum Likelihood Estimation
- Indicator Variable
- Listwise Deletion/casewise deletion
- Pairwise Deletion
- Mean Substitution
- MCAR (missing completely at random)
- Censoring (Statistics)
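Three of the simpler techniques listed above (listwise deletion, mean substitution, and an indicator variable) can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `records` data and all function names are hypothetical, and `None` is assumed to mark a missing data value.

```python
from statistics import mean

# Hypothetical records; None marks a missing data value.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000},
]


def listwise_deletion(rows):
    """Keep only the rows in which every attribute is present."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_substitution(rows, key):
    """Replace missing values of `key` with the mean of its observed values."""
    fill = mean(r[key] for r in rows if r[key] is not None)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]


def add_indicator(rows, key):
    """Add a 0/1 indicator variable recording whether `key` was missing."""
    return [{**r, f"{key}_missing": int(r[key] is None)} for r in rows]


complete = listwise_deletion(records)       # drops the two incomplete rows
imputed = mean_substitution(records, "age")  # fills the missing age
flagged = add_indicator(records, "age")      # keeps rows, marks missingness

print(len(complete))
print(imputed[1]["age"])
print([r["age_missing"] for r in flagged])
```

Each function returns new rows and leaves `records` unchanged, so the techniques can be compared on the same input. Listwise deletion discards information from partially observed rows, while mean substitution keeps every row but shrinks the variance of the filled attribute.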
- (Witten & Frank, 2000) ⇒ Ian H. Witten, and Eibe Frank. (2000). “Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations.” Morgan Kaufmann.
- QUOTE: Most datasets encountered in practice … contain missing values. … You have to think carefully about the significance of missing values. They may occur for a number of reasons, such as malfunctioning measurement equipment, changes in experimental design during data collection, and collation of several similar but not identical datasets.
- (Kohavi & Provost, 1998) ⇒ Ron Kohavi, and Foster Provost. (1998). “Glossary of Terms.” In: Machine Learning 30(2-3).
- QUOTE: Missing value: The value for an attribute is not known or does not exist. There are several possible reasons for a value to be missing, such as: it was not measured; there was an instrument malfunction; the attribute does not apply, or the attribute's value cannot be known. Some algorithms have problems dealing with missing values.