Gabor Melli Data Mining Glossary
A Gabor Melli Data Mining Glossary is a Data Mining Glossary maintained by Gabor Melli.
AKA: Gabor Melli's Data Mining Glossary.
Context: It is derived from Gabor Melli's Data Analysis Ontology.
See: Statistics Glossary, Machine Learning Glossary.
Summary
Concept Name / Concept Definition / Synonyms
Accuracy Metric
An Accuracy Metric is a Classification Model Performance Metric based on the proportion of a Classifier's Correct Predictions among all of its predictions on unseen Labeled Testing Records.
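As a minimal sketch, assuming actual and predicted labels arrive as equal-length Python lists (the helper name `accuracy` is hypothetical), the metric is the count of correct predictions divided by the total number of predictions:

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels (hypothetical helper)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

# 3 of the 4 predictions match the actual labels.
print(accuracy(["spam", "ham", "ham", "spam"], ["spam", "ham", "spam", "spam"]))  # 0.75
```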
Accuracy Estimation
An Accuracy Estimation Process is a Validation Process that approximates the true value of a Classification Model's Accuracy based on a Data Sample.
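One common accuracy estimation process is the holdout method: shuffle the data sample, train on one part, and measure accuracy on the part the model never saw. The sketch below assumes records are `(features, label)` pairs; `holdout_accuracy`, `train_fn`, and the toy `majority_class_trainer` are hypothetical names used only for illustration, and other estimators (such as cross-validation) are equally valid.

```python
import random
from collections import Counter

def holdout_accuracy(records, train_fn, test_fraction=0.3, seed=0):
    """Estimate a model's accuracy on a held-out part of a labeled data sample.

    records:  list of (features, label) pairs (assumed format)
    train_fn: hypothetical callable mapping training records to a predict(features) function
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    predict = train_fn(train)                 # fit only on the training portion
    correct = sum(1 for x, y in test if predict(x) == y)
    return correct / len(test)                # accuracy on records unseen during training

def majority_class_trainer(train):
    """Toy learner for illustration: always predicts the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: majority

data = [((i,), "pos" if i % 4 else "neg") for i in range(20)]
estimate = holdout_accuracy(data, majority_class_trainer)
```

The estimate varies with the random split (`seed`), which is why it only approximates the model's true accuracy.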
Association Learning Task
An Association Learning Task is a Learning Task that requires the discovery of Associations.
Categorical Set
A Categorical Set is an Unordered Set that is a Finite Set.
Classification Function
A Classification Function is a Function whose Function Range is a Categorical Set (with Categorical Data Values).
Synonyms: Classifier.
Confounding Variable
A Confounding Variable is a Random Variable in a Statistical Model that Correlates with both a Dependent Variable and an Independent Variable.
Synonyms: Confounder.
Confusion Matrix
A Confusion Matrix is a Matrix that represents the counts of a Probabilistic Classification Function's Predictions with respect to the Actual values on some Labeled Learning Set.
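A minimal sketch, assuming the classifier's outputs have already been turned into hard class labels (for example by thresholding predicted probabilities); `confusion_matrix` here is a hypothetical helper, with rows indexed by actual label and columns by predicted label:

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows = actual label, columns = predicted label, cells = prediction counts."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

actual    = ["pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "neg", "neg", "pos", "neg"]
for row in confusion_matrix(actual, predicted, labels=["pos", "neg"]):
    print(row)
# [1, 1]
# [1, 2]
```

The diagonal holds the correct predictions; every off-diagonal cell counts one kind of error.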
Cost-Benefit Function
A Cost-Benefit Function is an Ordinal-Valued Function that assigns a Value to each Choice.
Data Cleaning Task
A Data Cleaning Task requires the Detection and Removal of Erroneous Data Values and Data Records.
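A minimal sketch of one data cleaning strategy, assuming records are dicts of attribute values: it detects missing (`None`) or rule-violating values and removes the whole record. The function `clean_records` and the example `age_rule` are hypothetical; real cleaning tasks may instead repair or impute individual values.

```python
def clean_records(records, is_valid):
    """Drop every record that has a missing (None) or invalid attribute value.

    records:  list of dicts mapping attribute name -> value (assumed layout)
    is_valid: hypothetical predicate (attribute, value) -> bool encoding domain rules
    """
    return [
        r for r in records
        if all(v is not None and is_valid(k, v) for k, v in r.items())
    ]

def age_rule(attribute, value):
    # Example domain rule: ages must lie in a plausible human range.
    return attribute != "age" or 0 <= value <= 130

raw = [
    {"name": "Ann", "age": 34},
    {"name": "Bob", "age": None},   # missing value
    {"name": "Cy", "age": 412},     # erroneous value
]
print(clean_records(raw, age_rule))  # [{'name': 'Ann', 'age': 34}]
```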
Data Mining Activity
A Data Mining Activity is an Activity performed by a Data Mining Practitioner to solve a Data Mining Task.
Data Mining Discipline
A Data Mining Discipline is an Academic Discipline that focuses on Data Analysis of large datasets from real-world problems.
Data Mining Task
A Data Mining Task requires automated Discovery of Patterns, typically to support human Decision making.
Data Mining Practice
A Data Mining Practice is the Applied Practice of solving Real-World Data Mining Tasks.
Data Record Attribute
A Data Record Attribute is a 2-Tuple composed of a Value and a Metadata Record that represents a single Property of a Data Record.
Data Record Set
A Data Record Set is a Set of Data Records that share the same Data Record Schema.
Synonyms: Dataset.
Eager Learning Algorithm
An Eager Learning Algorithm is a Learning Algorithm that involves a Training Phase (to induce a Total Predictive Function).
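As an illustration, a nearest-centroid classifier is one simple eager learner: its training phase induces a function defined on every possible feature vector, after which the training records are no longer consulted. The sketch assumes `(feature_vector, label)` pairs; `train_nearest_centroid` is a hypothetical name, and the choice of algorithm is ours, not the glossary's.

```python
import math
from collections import defaultdict

def train_nearest_centroid(records):
    """Eager learning sketch: the training phase induces a predictive function up front.

    records: list of (feature_vector, label) pairs (assumed format).
    """
    grouped = defaultdict(list)
    for x, label in records:
        grouped[label].append(x)
    # Training phase: summarize each class by the mean of its feature vectors.
    centroids = {
        label: tuple(sum(col) / len(xs) for col in zip(*xs))
        for label, xs in grouped.items()
    }

    def predict(x):
        # Only the induced centroids are used; the training records are not needed.
        return min(centroids, key=lambda label: math.dist(centroids[label], x))

    return predict

predict = train_nearest_centroid([((0, 0), "a"), ((0, 2), "a"), ((10, 10), "b")])
print(predict((1, 1)), predict((9, 8)))  # a b
```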
Error Rate Metric
An Error Rate Metric is the complement of an Accuracy Metric, i.e. Error Rate = 1 - Accuracy.
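A minimal sketch, assuming equal-length lists of actual and predicted labels (`error_rate` is a hypothetical helper): counting the incorrect predictions gives the same number as one minus the accuracy.

```python
def error_rate(actual, predicted):
    """Fraction of incorrect predictions, i.e. 1 - accuracy (hypothetical helper)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)

# 1 of the 4 predictions is wrong, so accuracy is 0.75 and error rate is 0.25.
print(error_rate([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.25
```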
Exploratory Data Analysis Task
An Exploratory Data Analysis Task is a Data Analysis Task that aims to formulate Hypotheses.
False Negative Rate
A False Negative Rate is a Predictive Relation Performance Metric that is based on the Probability that a Predictive Relation will make the Incorrect Prediction of mapping a True Test Instance to a Negative Prediction.
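A minimal sketch for a binary task, assuming the positive (true) class is given by the hypothetical `positive` parameter: the rate is estimated as FN / (FN + TP), the share of actual positive instances that received a negative prediction.

```python
def false_negative_rate(actual, predicted, positive=True):
    """FN / (FN + TP): share of actual positive instances predicted negative."""
    predictions_on_positives = [p for a, p in zip(actual, predicted) if a == positive]
    if not predictions_on_positives:
        raise ValueError("no actual positive instances")
    misses = sum(1 for p in predictions_on_positives if p != positive)
    return misses / len(predictions_on_positives)

actual    = [True, True, True, True, False]
predicted = [True, False, True, True, False]
print(false_negative_rate(actual, predicted))  # 0.25
```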
False Positive Rate
A False Positive Rate is a Predictive Relation Performance Metric that is based on the Probability that a Predictive Relation will Incorrectly Predict that a False Test Instance is a True Test Instance (i.e. make a Positive Prediction).
Synonyms: FPR, Type 1 Error Rate.
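A minimal sketch for a binary task, assuming the positive class is given by the hypothetical `positive` parameter: the rate is estimated as FP / (FP + TN), the share of actual negative (false) instances that received a positive prediction.

```python
def false_positive_rate(actual, predicted, positive=True):
    """FP / (FP + TN): share of actual negative instances predicted positive."""
    predictions_on_negatives = [p for a, p in zip(actual, predicted) if a != positive]
    if not predictions_on_negatives:
        raise ValueError("no actual negative instances")
    false_alarms = sum(1 for p in predictions_on_negatives if p == positive)
    return false_alarms / len(predictions_on_negatives)

actual    = [False, False, False, False, True]
predicted = [True, False, False, False, True]
print(false_positive_rate(actual, predicted))  # 0.25
```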
Feature Vector
See Vectorized Learning Record.
Finite Ordered Set
A Finite Ordered Set is an Ordered Set that is a Finite Set.
Synonyms: Ordinal Set.
IID Sample
An IID Random Variable Set is a Random Variable Set where all Random Variables are in a Statistical Independence Relation and in an Identical Distribution Relation.
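A small sketch, assuming a finite population of records: drawing with replacement makes every draw independent of the others and identically (uniformly) distributed over the population, so the result is an IID sample from its empirical distribution. Drawing without replacement (e.g. `random.Random.sample`) would break independence. The helper name `iid_sample` is hypothetical.

```python
import random

def iid_sample(population, n, seed=None):
    """Draw n items with replacement: independent, identically distributed draws."""
    rng = random.Random(seed)
    return rng.choices(population, k=n)  # each draw is uniform over the whole population

sample = iid_sample(["r1", "r2", "r3"], n=10, seed=1)
```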
Information Extraction Task
An Information Extraction Task requires populating a Data Structure from the Data contained in a set of Artifacts.
Information Retrieval Task
An Information Retrieval Task requires the identification of Artifacts from a Corpus that are relevant to a specified Query.
Instance-based Learning Algorithm
An Instance-based Learning Algorithm is a Learning Algorithm that does not generalize into a language more general than the one used to describe the instances themselves.
Lazy Learning Algorithm
A Lazy Learning Algorithm is a Supervised Learning Algorithm that does not involve a Training Phase.
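The 1-nearest-neighbor classifier is a standard example of a lazy learner: it induces nothing up front and simply stores the labeled records, deferring all work to prediction time. The sketch assumes `(feature_vector, label)` pairs and Euclidean distance; the class name `OneNearestNeighbor` is hypothetical.

```python
import math

class OneNearestNeighbor:
    """Lazy learning sketch: no model is induced; labeled records are simply stored."""

    def __init__(self, records):
        # records: list of (feature_vector, label) pairs (assumed format)
        self.records = list(records)

    def predict(self, x):
        # All the work happens at query time: scan every stored record and
        # return the label of the one closest to x (Euclidean distance).
        _, label = min(self.records, key=lambda r: math.dist(r[0], x))
        return label

knn = OneNearestNeighbor([((0, 0), "a"), ((0, 2), "a"), ((10, 10), "b")])
print(knn.predict((1, 1)), knn.predict((9, 8)))  # a b
```

The trade-off is that prediction cost grows with the number of stored records.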
Learning Record Attribute
A Learning Record Attribute is a Data Attribute of a Learning Record.
Synonyms: Feature.
Learning Record
A Learning Record is a Data Record that can be used as Input to a Learning Task.
Synonyms: Example, Instance.
Machine Learning Research
Machine Learning Research is a Research Domain that investigates how Machines can improve their Performance over time (such as via Reasoning with Inductive Logic).
Missing Data Value
A Missing Data Value is a Data Record Attribute with no Data Value.
Model-based Learning Algorithm
A Model-based Learning Algorithm is a Learning Algorithm that represents its Predictive Model in a Formal Language that is more general than the Formal Language used to describe the Data.
Numeric Interval
A Numeric Interval is a Contiguous Numeric Subsequence of a Formal Number Sequence.
OLAP Task
An Online Analytical Processing (OLAP) Task is an Interactive Data Analysis Task that is restricted to summarizing past behavior.
Optimization Task
A Cost Function Optimization Task is a General Task Type where an Optimal Solution (one that optimizes a Cost Function) must be provided.
Posthoc Analysis Task
A Posthoc Analysis Task analyzes Data Records that were not collected specifically to test a Hypothesis.
Precision Metric
A Precision Metric is a Performance Metric that measures the Probability that a given Classification Model's Positive Prediction is a Correct Prediction.
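A minimal sketch for a binary task, assuming the positive class is given by the hypothetical `positive` parameter: precision is estimated as TP / (TP + FP), i.e. among the instances the model flagged as positive, the fraction that actually are positive.

```python
def precision(actual, predicted, positive=True):
    """TP / (TP + FP): share of positive predictions that are correct."""
    actual_when_flagged = [a for a, p in zip(actual, predicted) if p == positive]
    if not actual_when_flagged:
        raise ValueError("no positive predictions")
    hits = sum(1 for a in actual_when_flagged if a == positive)
    return hits / len(actual_when_flagged)

actual    = [True, False, True, True]
predicted = [True, True, True, False]
print(precision(actual, predicted))  # 0.6666666666666666
```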
Predictive Function
A Predictive Function is a Function that can Map a Learning Record to a Target Value.
Synonyms: Model, Target Function.
Randomized Controlled Experiment
A Randomized Controlled Experiment is a Scientific Experiment that tests a Treatment on a Randomly created Treatment Group and a Placebo on a Distinct and Randomly created Control Group.
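The random assignment step can be sketched as a shuffle followed by a split into two disjoint halves. This assumes a simple two-arm design with equal group sizes; `randomize_groups` is a hypothetical helper, and practical designs often add stratification or blocking.

```python
import random

def randomize_groups(subjects, seed=None):
    """Randomly assign subjects to two disjoint groups: treatment and control.

    The treatment group would then receive the Treatment, the control group the Placebo.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

treatment, control = randomize_groups([f"subject-{i}" for i in range(10)], seed=42)
```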
Recall Metric
A Recall Metric is a Performance Metric for a Predictive Relation that Estimates the Probability of a True Positive Prediction (a Correct Prediction for a True Test Instance).
Synonyms: Sensitivity, True Positive Rate.
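A minimal sketch for a binary task, assuming the positive class is given by the hypothetical `positive` parameter: recall is estimated as TP / (TP + FN), the share of actual positive instances that received a positive prediction. It always equals one minus the false negative rate.

```python
def recall(actual, predicted, positive=True):
    """TP / (TP + FN): share of actual positive instances predicted positive."""
    predictions_on_positives = [p for a, p in zip(actual, predicted) if a == positive]
    if not predictions_on_positives:
        raise ValueError("no actual positive instances")
    hits = sum(1 for p in predictions_on_positives if p == positive)
    return hits / len(predictions_on_positives)

actual    = [True, True, True, True, False]
predicted = [True, True, True, False, True]
print(recall(actual, predicted))  # 0.75
```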
Regression Algorithm
A Regression Algorithm is a Supervised Learning Algorithm that can solve a Regression Task.
Synonyms: Regressor.
Sequence
A Sequence is a Multiset of Sequence Members in a Partial Order Relation.
Semi-Supervised Learning Task
A Semi-Supervised Learning Task is a Supervised Learning Task that also has access to Unlabeled Training Records.
Set
A Set is an Abstract Entity that can Represent Zero or more Distinct Set Members.
Statistical Hypothesis Test
A Statistical Hypothesis Test is a Data Analysis Task that seeks to Validate a Hypothesis.
Synonyms: Confirmatory Data Analysis.
Supervised Learning Task
A Supervised Learning Task is a Learning Task where some Labeled Training Records are provided.
Target Attribute
A Target Attribute is a Learning Record Attribute whose behavior is to be modeled by a Supervised Learning Task.
Testing Record
A Testing Record is a Data Record with a Target Class that is not available during a Learning Task's Training Phase.
Text Mining Task
A Text Mining Task is a Data Mining Task whose input largely involves Text Data.
Synonyms: Text Analysis.
Training Record
A Training Record is a Data Record that is available during a Learning Task's Training Phase.
Synonyms: Case, Exemplar, Example.
True Negative Rate
A True Negative Rate is the Probability that a Predictive Logic Relation will correctly map a False Test Instance to a Negative Prediction.
Synonyms: Specificity.
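A minimal sketch for a binary task, assuming the positive class is given by the hypothetical `positive` parameter: the true negative rate is estimated as TN / (TN + FP), which always equals one minus the false positive rate.

```python
def true_negative_rate(actual, predicted, positive=True):
    """TN / (TN + FP): share of actual negative instances correctly predicted negative."""
    tn = sum(1 for a, p in zip(actual, predicted) if a != positive and p != positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    if tn + fp == 0:
        raise ValueError("no actual negative instances")
    return tn / (tn + fp)

actual    = [False, False, False, False, True]
predicted = [True, False, False, False, True]
print(true_negative_rate(actual, predicted))  # 0.75
```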
Tuple
A Tuple is a Finite Sequence of Fixed Sequence Length.
Unsupervised Learning Task
An Unsupervised Learning Task is a Learning Task where no Labeled Training Cases are provided.
Vector
A Vector is a Number Tuple that Represents a Point in some Vector Space.
Future Entry Log
Confirmatory Data Analysis / Statistical Hypothesis Test.
Coverage
Cross-Validation
Data Value
Induction Algorithm / Inducer
Inductive Logic Programming
Kernel Machine
Knowledge Discovery
Model Deployment
Payoff
Relational Learning
Resubstitution Accuracy
Web Mining