Gabor Melli Data Mining Glossary

A Gabor Melli Data Mining Glossary is a data mining glossary maintained by Gabor Melli.

Context:
- It is derived from Gabor Melli's Data Analysis Ontology.
- …
Counter-Example(s):
- (IBM, 2002).
- (Kohavi & Provost, 1998).
- (Valpola, 2000).
See: Statistics Glossary, Machine Learning Glossary.

Summary

Concept Name	Concept Definition	Synonyms
Accuracy Metric	An Accuracy Metric is a Classification Model Performance Metric based on the Proportion of a Classifier's Correct Predictions among all of its Predictions on Unseen Labeled Testing Records.
Accuracy Estimation	An Accuracy Estimation Process is a Validation Process that approximates the true value of a Classification Model's Accuracy based on a Data Sample.
Association Learning Task	An Association Learning Task is a Learning Task that requires the discovery of Associations.
Categorical Set	A Categorical Set is an Unordered Set that is a Finite Set.
Classification Function	A Classification Function is a Function whose Function Range is a Categorical Set (with Categorical Data Values).	Classifier
Confounding Variable	A Confounding Variable is a Random Variable in a Statistical Model that Correlates with both a Dependent Variable and an Independent Variable.	Confounder
Confusion Matrix	A Confusion Matrix is a Matrix that represents the counts of a Probabilistic Classification Function's Predictions with respect to the Actuals on some Labeled Learning Set.
Cost-Benefit Function	A Cost-Benefit Function is an Ordinal-Valued Function that assigns a Value to each Choice.
Data Cleaning Task	A Data Cleaning Task requires the Detection and Removal of Erroneous Data Values and Data Records.
Data Mining Activity	A Data Mining Activity is an Activity performed by a Data Mining Practitioner to solve a Data Mining Task.
Data Mining Discipline	A Data Mining Discipline is an Academic Discipline that focuses on Data Analysis of large datasets from real-world problems.
Data Mining Task	A Data Mining Task requires automated Discovery of Patterns typically to support human Decision making.
Data Mining Practice	A Data Mining Practice is the Applied Practice of solving Real-World Data Mining Tasks.
Data Record Attribute	A Data Record Attribute is a 2-Tuple composed of a Value and a Metadata Record that represents a single property of a Data Record.
Data Record Set	A Data Record Set is a set of Data Records that share the same Data Record Schema.	Dataset
Eager Learning Algorithm	An Eager Learning Algorithm is a Learning Algorithm that involves a Training Phase (to induce a Total Predictive Function).
Error Rate Metric	An Error Rate Metric is the Complement of an Accuracy Metric (one minus the Accuracy).
Exploratory Data Analysis Task	An Exploratory Data Analysis Task is a Data Analysis Task that aims to formulate Hypotheses.
False Negative Rate	A False Negative Rate is a Predictive Relation Performance Metric that is based on the Probability that a Predictive Relation will make the Incorrect Prediction of mapping a True Test Instance to a Negative Prediction.
False Positive Rate	A False Positive Rate is a Predictive Relation Performance Metric that is based on the Probability that a Predictive Relation will Incorrectly Predict that a False Test Instance is a True Test Instance (i.e. make a Positive Prediction).	FPR, Type 1 Error Rate
Feature Vector	See Vectorized Learning Record.
Finite Ordered Set	A Finite Ordered Set is an Ordered Set that is a Finite Set.	Ordinal Set
IID Sample	An IID Random Variable Set is a Random Variable Set where all random variables are in a Statistical Independence Relation and in an Identical Distribution Relation.
Information Extraction Task	An Information Extraction Task requires populating a Data Structure from the Data contained in a set of Artifacts.
Information Retrieval Task	An Information Retrieval Task requires the identification of Artifacts from a Corpus that are relevant to a specified Query.
Instance-based Learning Algorithm	An Instance-based Learning Algorithm is a Learning Algorithm that does not generalize in a Language more general than the one used to describe the instances themselves.
Lazy Learning Algorithm	A Lazy Learning Algorithm is a Supervised Learning Algorithm that does not involve a Training Phase.
Learning Record Attribute	A Learning Record Attribute is a Data Attribute of a Learning Record.	Feature
Learning Record	A Learning Record is a Data Record that can be used as Input to a Learning Task.	Example, Instance
Machine Learning Research	A Machine Learning Research is a Research Domain that investigates Machines improving Performance over time (such as via Reasoning with Inductive Logic).
Missing Data Value	A Missing Data Value is a Data Record Attribute with no Data Value.
Model-based Learning Algorithm	A Model-based Learning Algorithm is a Learning Algorithm that represents its Predictive Model in a Formal Language that is more general than the Formal Language used to describe the Data.
Numeric Interval	A Numeric Interval is a Contiguous Numeric Subsequence of a Formal Number Sequence.
OLAP Task	An Online Analytical Processing (OLAP) Task is an Interactive Data Analysis Task that is restricted to summarizing past behavior.
Optimization Task	A Cost Function Optimization Task is a General Task Type where an Optimal Solution must be provided (that optimizes a Cost Function).
Posthoc Analysis Task	A Posthoc Analysis Task analyzes collected Data Records that were not intentionally collected to test a Hypothesis.
Precision Metric	A Precision Metric is a Performance Metric based on the Probability that a given Classification Model's Positive Prediction is a Correct Prediction.
Predictive Function	A Predictive Function is a Function that can Map a Learning Record to a Target Value.	Model, Target Function
Randomized Controlled Experiment	A Randomized Controlled Experiment is a Scientific Experiment that tests a Treatment on a Randomly created Treatment Group and a Placebo on a Distinct and Randomly created Control Group.
Recall Metric	A Recall Metric is a Performance Metric for a Predictive Relation that Estimates the Probability of a True Positive Prediction (a Correct Prediction for True Test Instances).	Sensitivity, True Positive Rate
Regression Algorithm	A Regression Algorithm is a Supervised Learning Algorithm that can solve a Regression Task.	Regressor
Sequence	A Sequence is a Multiset of Sequence Members in a Total Order Relation.
Semi-Supervised Learning Task	A Semi-Supervised Learning Task is a Supervised Learning Task with access to Unlabeled Training Records.
Set	A Set is an Abstract Entity that can Represent Zero or more Distinct Set Members.
Statistical Hypothesis Test	A Statistical Hypothesis Test is a Data Analysis Task that seeks to Validate a Hypothesis.	Confirmatory Data Analysis
Supervised Learning Task	A Supervised Learning Task is a Learning Task where some Labeled Training Records are provided.
Target Attribute	A Target Attribute is a Learning Record Attribute whose behavior is to be modeled by a Supervised Learning Task.
Testing Record	A Testing Record is a Data Record with a Target Class that is not available during a Learning Task's Training Phase (it is held out to evaluate the learned Model).
Text Mining Task	A Text Mining Task is a Data Mining Task whose input largely involves Text Data.	Text Analysis
Training Record	A Training Record is a Data Record that is available during a Learning Task's Training Phase.	Case, Exemplar, Example
True Negative Rate	A True Negative Rate is the Probability that a Predictive Logic Relation will correctly map a False Test Instance to a Negative Prediction.	Specificity
Tuple	A Tuple is a Finite Sequence of Fixed Sequence Length.
Unsupervised Learning Task	An Unsupervised Learning Task is a Learning Task where no Labeled Training Cases are provided.
Vector	A Vector is a Number Tuple that Represents a point in some Vector Space.
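
Example:
Several of the entries above (Confusion Matrix, Accuracy Metric, Error Rate Metric, Precision Metric, Recall Metric, False Positive Rate, False Negative Rate, True Negative Rate) can be derived from the same four counts. The following is a minimal sketch, not part of the glossary itself, for the binary case only. It assumes the conventional reading in which accuracy is the share of correct predictions among all predictions and error rate is one minus accuracy. The names `ConfusionCounts`, `confusion_counts`, `ratio`, and `metrics` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 confusion matrix (hypothetical helper type)."""
    tp: int  # True Test Instance, Positive Prediction
    fp: int  # False Test Instance, Positive Prediction
    fn: int  # True Test Instance, Negative Prediction
    tn: int  # False Test Instance, Negative Prediction


def confusion_counts(actuals, predictions):
    """Tally the four cells from parallel sequences of booleans."""
    if len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must have the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(actuals, predictions):
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def metrics(c):
    """Derive the glossary's performance metrics from the four counts."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = ratio(c.tp + c.tn, total)
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": ratio(c.tp, c.tp + c.fp),
        "recall": ratio(c.tp, c.tp + c.fn),
        "false_positive_rate": ratio(c.fp, c.fp + c.tn),
        "false_negative_rate": ratio(c.fn, c.tp + c.fn),
        "true_negative_rate": ratio(c.tn, c.fp + c.tn),
    }


# Made-up testing records: three True instances and five False instances.
actuals = [True, True, True, False, False, False, False, False]
predictions = [True, True, False, True, False, False, False, False]
example = metrics(confusion_counts(actuals, predictions))
```

On this made-up data the counts are 2 true positives, 1 false positive, 1 false negative, and 4 true negatives, so accuracy is 6/8 and both precision and recall are 2/3. Recall and False Negative Rate share a denominator (all True Test Instances) and sum to one, as do True Negative Rate and False Positive Rate (all False Test Instances).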

Future Entry Log
