Concept Name 
Concept Definition 
Synonyms

Accuracy Metric 
An Accuracy Metric is a Classification Model Performance Metric based on the Proportion of a Classifier's Correct Predictions among all of its Predictions on Unseen Labeled Testing Records.
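
A minimal sketch of accuracy computed as correct predictions divided by all predictions, assuming the predicted and actual labels arrive as two equal-length Python lists. The function name `accuracy` and the sample labels are illustrative only.

```python
def accuracy(predictions, actuals):
    """Fraction of testing records whose predicted label equals the actual label."""
    if len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must have the same length")
    if not actuals:
        raise ValueError("at least one testing record is required")
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)


# Hypothetical labels for five unseen testing records.
actuals = ["spam", "ham", "ham", "spam", "ham"]
predictions = ["spam", "ham", "spam", "spam", "ham"]
print(accuracy(predictions, actuals))  # 4 correct out of 5
```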

Accuracy Estimation 
An Accuracy Estimation Process is a Validation Process that approximates the true value of a Classification Model's Accuracy based on a Data Sample.
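
One common way to approximate the true accuracy from a sample is a point estimate with a normal-approximation interval. The sketch below assumes each testing record has been reduced to a 1 (correct) or 0 (incorrect) outcome. The name `estimate_accuracy`, the default `z=1.96` and the sample are assumptions made for illustration, not a prescribed procedure.

```python
import math


def estimate_accuracy(correct_flags, z=1.96):
    """Return (point_estimate, lower, upper) for accuracy from 0/1 outcomes.

    z=1.96 gives an approximate 95% interval under the normal approximation.
    """
    n = len(correct_flags)
    if n == 0:
        raise ValueError("sample must not be empty")
    p_hat = sum(correct_flags) / n
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)
    # Clamp the interval to the valid range of a proportion.
    lower = max(0.0, p_hat - z * std_err)
    upper = min(1.0, p_hat + z * std_err)
    return p_hat, lower, upper


# Hypothetical sample: 90 correct predictions out of 100 testing records.
sample = [1] * 90 + [0] * 10
p_hat, lower, upper = estimate_accuracy(sample)
print(f"accuracy ~ {p_hat:.2f} (approx. 95% interval {lower:.3f} to {upper:.3f})")
```

The normal approximation is weakest for small samples or accuracies near 0 or 1, so the interval should be read as a rough guide.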

Association Learning Task 
An Association Learning Task is a Learning Task that requires the discovery of Associations.

Categorical Set 
A Categorical Set is an Unordered Set that is a Finite Set.

Classification Function 
A Classification Function is a Function whose Function Range is a Categorical Set (with Categorical Data Values). 
Classifier

Confounding Variable 
A Confounding Variable is a Random Variable in a Statistical Model that Correlates with both a Dependent Variable and an Independent Variable. 
Confounder

Confusion Matrix 
A Confusion Matrix is a Matrix that represents the counts of a Probabilistic Classification Function's Predictions with respect to the Actuals on some Labeled Learning Set.
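
As a sketch of how such counts could be tabulated, assuming binary labels held in two equal-length lists. The helper `confusion_counts` and the row/column layout (rows are actuals, columns are predictions) are illustrative choices, since conventions differ between sources.

```python
from collections import Counter


def confusion_counts(predictions, actuals):
    """Count (actual, predicted) label pairs over a labeled set."""
    return Counter(zip(actuals, predictions))


# Hypothetical labels for six records.
actuals = [True, True, False, False, True, False]
predictions = [True, False, False, True, True, False]
counts = confusion_counts(predictions, actuals)

# Lay the counts out as a matrix: rows are actuals, columns are predictions.
labels = [True, False]
matrix = [[counts[(a, p)] for p in labels] for a in labels]
print(matrix)
```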

Cost-Benefit Function 
A Cost-Benefit Function is an Ordinal-Valued Function that assigns a Value to each Choice.

Data Cleaning Task 
A Data Cleaning Task requires the Detection and Removal of Erroneous Data Values and Data Records. 

Data Mining Activity 
A Data Mining Activity is an Activity performed by a Data Mining Practitioner to solve a Data Mining Task. 

Data Mining Discipline 
A Data Mining Discipline is an Academic Discipline that focuses on Data Analysis of large datasets from real-world problems.

Data Mining Task 
A Data Mining Task requires the automated Discovery of Patterns, typically to support human Decision Making.

Data Mining Practice 
A Data Mining Practice is the Applied Practice of solving Real-World Data Mining Tasks.

Data Record Attribute 
A Data Record Attribute is a 2-Tuple composed of a Value and a Metadata Record that represents a single property of a Data Record.

Data Record Set 
A Data Record Set is a set of Data Records that share the same Data Record Schema. 
Dataset

Eager Learning Algorithm 
An Eager Learning Algorithm is a Learning Algorithm that involves a Training Phase (to induce a Total Predictive Function). 

Error Rate Metric 
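An Error Rate Metric is the Complement of an Accuracy Metric (i.e. one minus the Accuracy): the Proportion of a Classifier's Incorrect Predictions among all of its Predictions.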
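
A small worked sketch showing that the error rate and the accuracy sum to one. The function name `error_rate` and the labels are hypothetical.

```python
def error_rate(predictions, actuals):
    """Fraction of testing records that were predicted incorrectly."""
    if not actuals or len(predictions) != len(actuals):
        raise ValueError("need two non-empty lists of equal length")
    wrong = sum(p != a for p, a in zip(predictions, actuals))
    return wrong / len(actuals)


# Hypothetical labels for four testing records; two predictions are wrong.
actuals = [1, 0, 1, 1]
predictions = [1, 1, 1, 0]
err = error_rate(predictions, actuals)
acc = 1 - err
print(err, acc)
```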

Exploratory Data Analysis Task 
An Exploratory Data Analysis Task is a Data Analysis Task that aims to formulate Hypotheses. 


False Negative Rate 
A False Negative Rate is a Predictive Relation Performance Metric that is based on the Probability that a Predictive Relation will make the Incorrect Prediction of mapping a True Test Instance to a Negative Prediction.

False Positive Rate 
A False Positive Rate is a Predictive Relation Performance Metric that is based on the Probability that a Predictive Relation will Incorrectly Predict that a False Test Instance is a True Test Instance (i.e. make a Positive Prediction). 
FPR, Type 1 Error Rate
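
Both the False Positive Rate and the False Negative Rate can be estimated from the four cells of a binary confusion matrix. The sketch below assumes the counts are already available as integers. The function names and the counts are illustrative, with "actual negative" standing for a False Test Instance and "actual positive" for a True Test Instance.

```python
def false_positive_rate(fp, tn):
    """FP / (FP + TN): share of actual negatives that were predicted positive."""
    return fp / (fp + tn)


def false_negative_rate(fn, tp):
    """FN / (FN + TP): share of actual positives that were predicted negative."""
    return fn / (fn + tp)


# Hypothetical counts from a binary confusion matrix.
tp, fn, fp, tn = 40, 10, 5, 45
print(false_positive_rate(fp, tn))  # 5 of 50 actual negatives
print(false_negative_rate(fn, tp))  # 10 of 50 actual positives
```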

Feature Vector 
See Vectorized Learning Record.

Finite Ordered Set 
A Finite Ordered Set is an Ordered Set that is a Finite Set. 
Ordinal Set

IID Sample 
An IID Sample is a Random Variable Set where all Random Variables are in a Statistical Independence Relation and in an Identical Distribution Relation.

Information Extraction Task 
An Information Extraction Task requires populating a Data Structure from the Data contained in a set of Artifacts.

Information Retrieval Task 
An Information Retrieval Task requires the identification of Artifacts from a Corpus that are relevant to a specified Query.

Instance-based Learning Algorithm 
An Instance-based Learning Algorithm is a Learning Algorithm that does not generalize in a language more general than that of the instances themselves.

Lazy Learning Algorithm 
A Lazy Learning Algorithm is a Supervised Learning Algorithm that does not involve a Training Phase.
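
A one-nearest-neighbor classifier is a familiar example of learning without a Training Phase: the training records are only stored, and all work happens when a prediction is requested. The sketch below assumes numeric feature vectors and Euclidean distance. The function name and the data are hypothetical.

```python
import math


def nearest_neighbor_predict(training_records, query):
    """Predict the label of `query` from the closest stored training record.

    training_records: list of (feature_vector, label) pairs.
    No model is induced ahead of time; the records are scanned per query.
    """
    if not training_records:
        raise ValueError("at least one training record is required")
    _, closest_label = min(
        training_records,
        key=lambda record: math.dist(record[0], query),
    )
    return closest_label


# Hypothetical labeled records with two numeric features each.
training_records = [((1.0, 1.0), "a"), ((5.0, 5.0), "b"), ((1.5, 0.5), "a")]
print(nearest_neighbor_predict(training_records, (4.0, 4.5)))  # nearest is (5.0, 5.0)
```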

Learning Record Attribute 
A Learning Record Attribute is a Data Attribute of a Learning Record. 
Feature

Learning Record 
A Learning Record is a Data Record that can be used as Input to a Learning Task.
Example, Instance

Machine Learning Research 
Machine Learning Research is a Research Domain that investigates Machines improving their Performance over time (such as via Reasoning with Inductive Logic).

Missing Data Value 
A Missing Data Value is a Data Record Attribute with no Data Value.

Model-based Learning Algorithm 
A Model-based Learning Algorithm is a Learning Algorithm that represents its Predictive Model in a Formal Language that is more general than the Formal Language used to describe the Data.

Numeric Interval 
A Numeric Interval is a Contiguous Numeric Subsequence of a Formal Number Sequence.

OLAP Task 
An OLAP (Online Analytical Processing) Task is an Interactive Data Analysis Task that is restricted to summarizing past behavior.

Optimization Task 
An Optimization Task is a General Task Type where an Optimal Solution must be provided (one that optimizes a Cost Function).

Post-hoc Analysis Task 
A Post-hoc Analysis Task analyzes collected Data Records that were not intentionally collected to test a Hypothesis.

Precision Metric 
A Precision Metric is a Performance Metric of the Probability that a given Classification Model's Positive Prediction is a Correct Prediction. 
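
Precision is usually estimated as true positives divided by all positive predictions. A minimal sketch, assuming the counts come from a binary confusion matrix (the function name and the counts are illustrative):

```python
def precision(tp, fp):
    """TP / (TP + FP); undefined when no positive predictions were made."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        raise ValueError("precision is undefined without positive predictions")
    return tp / predicted_positive


# Hypothetical counts: 40 correct and 10 incorrect positive predictions.
print(precision(tp=40, fp=10))  # 40 of 50 positive predictions were correct
```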

Predictive Function 
A Predictive Function is a Function that can Map a Learning Record to a Target Value. 
Model, Target Function

Randomized Controlled Experiment 
A Randomized Controlled Experiment is a Scientific Experiment that tests a Treatment on a Randomly created Treatment Group and a Placebo on a Distinct and Randomly created Control Group. 
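
The random creation of two distinct groups can be sketched as a shuffle followed by a split. This is only one simple assignment scheme, assumed here for illustration. The function name `randomize_groups`, the `seed` parameter and the subject identifiers are hypothetical.

```python
import random


def randomize_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)  # local generator so the global state is untouched
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subject identifiers 0..9.
subjects = list(range(10))
treatment, control = randomize_groups(subjects, seed=0)
print(len(treatment), len(control))
```

Because every subject appears in exactly one of the two lists, the groups are distinct by construction.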

Recall Metric 
A Recall Metric is a Performance Metric for a Predictive Relation that Estimates the Probability of a True Positive Prediction (a Correct Prediction for a True Test Instance). 
Sensitivity, True Positive Rate
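
Recall can be estimated directly from label lists by looking only at the records that are actually positive. A sketch under that assumption (the function name, the `positive` parameter and the labels are illustrative):

```python
def recall(predictions, actuals, positive=True):
    """Share of actual positives that the classifier predicted as positive."""
    predicted_for_positives = [
        p for p, a in zip(predictions, actuals) if a == positive
    ]
    if not predicted_for_positives:
        raise ValueError("recall is undefined without actual positives")
    hits = sum(p == positive for p in predicted_for_positives)
    return hits / len(predicted_for_positives)


# Hypothetical labels: three actual positives, two of which are found.
actuals = [True, True, True, False, False]
predictions = [True, False, True, True, False]
print(recall(predictions, actuals))
```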

Regression Algorithm 
A Regression Algorithm is a Supervised Learning Algorithm that can solve a Regression Task. 
Regressor

Sequence 
A Sequence is a Multiset of Sequence Members in a Total Order Relation.

Semi-Supervised Learning Task 
A Semi-Supervised Learning Task is a Supervised Learning Task that also has access to Unlabeled Training Records.

Set 
A Set is an Abstract Entity that can Represent Zero or more Distinct Set Members.

Statistical Hypothesis Test 
A Statistical Hypothesis Test is a Data Analysis Task that seeks to Validate a Hypothesis. 
Confirmatory Data Analysis

Supervised Learning Task 
A Supervised Learning Task is a Learning Task where some Labeled Training Records are provided.

Target Attribute 
A Target Attribute is a Learning Record Attribute whose behavior is to be modeled by a Supervised Learning Task.

Testing Record 
A Testing Record is a Data Record with a Target Class that is not available during a Learning Task's Training Phase (it is held out to measure Performance).

Text Mining Task 
A Text Mining Task is a Data Mining Task whose input largely involves Text Data. 
Text Analysis

Training Record 
A Training Record is a Data Record that is available during a Learning Task's Training Phase. 
Case, Exemplar, Example

True Negative Rate 
A True Negative Rate is the Probability that a Predictive Logic Relation will correctly map a False Test Instance to a Negative Prediction. 
Specificity
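
The True Negative Rate is commonly estimated as true negatives divided by all actual negatives, which makes it the complement of the false positive rate. A brief sketch with hypothetical counts and an illustrative function name:

```python
def true_negative_rate(tn, fp):
    """TN / (TN + FP): share of actual negatives that were predicted negative."""
    actual_negatives = tn + fp
    if actual_negatives == 0:
        raise ValueError("rate is undefined without actual negatives")
    return tn / actual_negatives


# Hypothetical counts over 50 actual negatives.
tn, fp = 45, 5
tnr = true_negative_rate(tn, fp)
fpr = fp / (tn + fp)
print(tnr, fpr)  # the two rates sum to 1
```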

Tuple 
A Tuple is a Finite Sequence of Fixed Sequence Length.

Unsupervised Learning Task 
An Unsupervised Learning Task is a Learning Task where no Labeled Training Cases are provided. 

Vector 
A Vector is a Number Tuple that Represents a point in some Vector Space. 
