Gabor Melli Data Mining Glossary

A Gabor Melli Data Mining Glossary is a Data Mining Glossary maintained by Gabor Melli.



Summary

| Concept Name | Concept Definition | Synonyms |
| Accuracy Metric | An Accuracy Metric is a Classification Model Performance Metric based on the Proportion of a Classifier's Correct Predictions among all of its Predictions on Unseen Labeled Testing Records. | |
| Accuracy Estimation | An Accuracy Estimation Process is a Validation Process that approximates the true value of a Classification Model's Accuracy based on a Data Sample. | |
| Association Learning Task | An Association Learning Task is a Learning Task that requires the discovery of Associations. | |
| Categorical Set | A Categorical Set is an Unordered Set that is a Finite Set. | |
| Classification Function | A Classification Function is a Function whose Function Range is a Categorical Set (with Categorical Data Values). | Classifier |
| Confounding Variable | A Confounding Variable is a Random Variable in a Statistical Model that Correlates with both a Dependent Variable and an Independent Variable. | Confounder |
| Confusion Matrix | A Confusion Matrix is a Matrix that represents the counts of a Probabilistic Classification Function's Predictions with respect to the Actuals on some Labeled Learning Set. | |
| Cost-Benefit Function | A Cost-Benefit Function is an Ordinal-Valued Function that assigns a Value to each Choice. | |
| Data Cleaning Task | A Data Cleaning Task requires the Detection and Removal of Erroneous Data Values and Data Records. | |
| Data Mining Activity | A Data Mining Activity is an Activity performed by a Data Mining Practitioner to solve a Data Mining Task. | |
| Data Mining Discipline | A Data Mining Discipline is an Academic Discipline that focuses on Data Analysis of large datasets from real-world problems. | |
| Data Mining Task | A Data Mining Task requires the automated Discovery of Patterns, typically to support human Decision Making. | |
| Data Mining Practice | A Data Mining Practice is the Applied Practice of solving Real-World Data Mining Tasks. | |
| Data Record Attribute | A Data Record Attribute is a 2-Tuple composed of a Value and a Metadata Record that represents a single Property of a Data Record. | |
| Data Record Set | A Data Record Set is a Set of Data Records that share the same Data Record Schema. | Dataset |
| Eager Learning Algorithm | An Eager Learning Algorithm is a Learning Algorithm that involves a Training Phase (to induce a Total Predictive Function). | |
| Error Rate Metric | An Error Rate Metric is the Complement of an Accuracy Metric (i.e. one minus the Accuracy). | |
| Exploratory Data Analysis Task | An Exploratory Data Analysis Task is a Data Analysis Task that aims to formulate Hypotheses. | |
| False Negative Rate | A False Negative Rate is a Predictive Relation Performance Metric that is based on the Probability that a Predictive Relation will make the Incorrect Prediction of mapping a True Test Instance to a Negative Prediction. | |
| False Positive Rate | A False Positive Rate is a Predictive Relation Performance Metric that is based on the Probability that a Predictive Relation will Incorrectly Predict that a False Test Instance is a True Test Instance (i.e. make a Positive Prediction). | FPR, Type 1 Error Rate |
| Feature Vector | See Vectorized Learning Record. | |
| Finite Ordered Set | A Finite Ordered Set is an Ordered Set that is a Finite Set. | Ordinal Set |
| IID Sample | An IID Random Variable Set is a Random Variable Set where all Random Variables are in a Statistical Independence Relation and in an Identical Distribution Relation. | |
| Information Extraction Task | An Information Extraction Task requires populating a Data Structure from the Data contained in a set of Artifacts. | |
| Information Retrieval Task | An Information Retrieval Task requires the identification of Artifacts from a Corpus that are relevant to a specified Query. | |
| Instance-based Learning Algorithm | An Instance-based Learning Algorithm is a Learning Algorithm that does not generalize in terms of a language more general than that of the instances themselves. | |
| Lazy Learning Algorithm | A Lazy Learning Algorithm is a Supervised Learning Algorithm that does not involve a Training Phase. | |
| Learning Record Attribute | A Learning Record Attribute is a Data Attribute of a Learning Record. | Feature |
| Learning Record | A Learning Record is a Data Record that can be used as Input to a Learning Task. | Example, Instance |
| Machine Learning Research | A Machine Learning Research is a Research Domain that investigates Machines improving Performance over time (such as via Reasoning with Inductive Logic). | |
| Missing Data Value | A Missing Data Value is a Data Record Attribute with no Data Value. | |
| Model-based Learning Algorithm | A Model-based Learning Algorithm is a Learning Algorithm that represents its Predictive Model in a Formal Language that is more general than the Formal Language used to describe the Data. | |
| Numeric Interval | A Numeric Interval is a Contiguous Numeric Subsequence of a Formal Number Sequence. | |
| OLAP Task | An Online Analytical Processing (OLAP) Task is an Interactive Data Analysis Task that is restricted to summarizing past behavior. | |
| Optimization Task | A Cost Function Optimization Task is a General Task Type where an Optimal Solution must be provided (that optimizes a Cost Function). | |
| Posthoc Analysis Task | A Posthoc Analysis Task analyzes collected Data Records that were not intentionally collected to test a Hypothesis. | |
| Precision Metric | A Precision Metric is a Performance Metric of the Probability that a given Classification Model's Positive Prediction is a Correct Prediction. | |
| Predictive Function | A Predictive Function is a Function that can Map a Learning Record to a Target Value. | Model, Target Function |
| Randomized Controlled Experiment | A Randomized Controlled Experiment is a Scientific Experiment that tests a Treatment on a Randomly created Treatment Group and a Placebo on a Distinct and Randomly created Control Group. | |
| Recall Metric | A Recall Metric is a Performance Metric for a Predictive Relation that Estimates the Probability of a True Positive Prediction (a Correct Prediction for a True Test Instance). | Sensitivity, True Positive Rate |
| Regression Algorithm | A Regression Algorithm is a Supervised Learning Algorithm that can solve a Regression Task. | Regressor |
| Sequence | A Sequence is a Multiset of Sequence Members in a Partial Order Relation. | |
| Semi-Supervised Learning Task | A Semi-Supervised Learning Task is a Supervised Learning Task with access to Unlabeled Training Records. | |
| Set | A Set is an Abstract Entity that can Represent Zero or more Distinct Set Members. | |
| Statistical Hypothesis Test | A Statistical Hypothesis Test is a Data Analysis Task that seeks to Validate a Hypothesis. | Confirmatory Data Analysis |
| Supervised Learning Task | A Supervised Learning Task is a Learning Task where some Labeled Training Records are provided. | |
| Target Attribute | A Target Attribute is a Learning Record Attribute whose behavior is to be modeled by a Supervised Learning Task. | |
| Testing Record | A Testing Record is a Data Record with a Target Class that is available during a Learning Task's Testing Phase. | |
| Text Mining Task | A Text Mining Task is a Data Mining Task whose input largely involves Text Data. | Text Analysis |
| Training Record | A Training Record is a Data Record that is available during a Learning Task's Training Phase. | Case, Exemplar, Example |
| True Negative Rate | A True Negative Rate is the Probability that a Predictive Logic Relation will correctly map a False Test Instance to a Negative Prediction. | Specificity |
| Tuple | A Tuple is a Finite Sequence of Fixed Sequence Length. | |
| Unsupervised Learning Task | An Unsupervised Learning Task is a Learning Task where no Labeled Training Cases are provided. | |
| Vector | A Vector is a Number Tuple that Represents a Point in some Vector Space. | |
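
The performance metrics listed in the table above (Accuracy, Error Rate, Precision, Recall, False Positive Rate, False Negative Rate and True Negative Rate) can all be read off the four cells of a two-class Confusion Matrix. The Python sketch below is one illustrative way to do that, using the standard textbook formulas. The names `BinaryConfusionMatrix` and `confusion_from_labels` are hypothetical helpers introduced here and are not part of the glossary. The sketch assumes Boolean labels and non-zero denominators.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BinaryConfusionMatrix:
    """Counts of predictions versus actuals for a two-class problem (hypothetical helper)."""
    tp: int  # True Test Instances given a Positive Prediction
    fp: int  # False Test Instances given a Positive Prediction
    fn: int  # True Test Instances given a Negative Prediction
    tn: int  # False Test Instances given a Negative Prediction

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    # Each metric assumes its denominator is non-zero.
    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def error_rate(self) -> float:
        # Complement of accuracy: incorrect predictions over all predictions.
        return (self.fp + self.fn) / self.total

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Also called Sensitivity or True Positive Rate.
        return self.tp / (self.tp + self.fn)

    def false_positive_rate(self) -> float:
        return self.fp / (self.fp + self.tn)

    def false_negative_rate(self) -> float:
        return self.fn / (self.fn + self.tp)

    def true_negative_rate(self) -> float:
        # Also called Specificity.
        return self.tn / (self.tn + self.fp)


def confusion_from_labels(actuals: Iterable[bool],
                          predictions: Iterable[bool]) -> BinaryConfusionMatrix:
    """Tally a confusion matrix from paired actual and predicted labels."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(actuals, predictions):
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return BinaryConfusionMatrix(tp=tp, fp=fp, fn=fn, tn=tn)


if __name__ == "__main__":
    # Made-up labels, purely for illustration.
    actuals = [True, True, False, False, True]
    predictions = [True, False, True, False, True]
    cm = confusion_from_labels(actuals, predictions)
    print(cm)
    print("accuracy:", cm.accuracy(), "error rate:", cm.error_rate())
    print("precision:", cm.precision(), "recall:", cm.recall())
```

Under these formulas the Recall and the False Negative Rate sum to one, as do the True Negative Rate and the False Positive Rate, so a caller only needs to report one of each pair.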

Future Entry Log

Confirmatory Data Analysis / Statistical Hypothesis Test
Coverage
Cross-Validation
Data Value
Induction Algorithm / Inducer
Inductive Logic Programming
Kernel Machine
Knowledge Discovery
Model Deployment
Payoff
Relational Learning
Resubstitution Accuracy
Web Mining