Gabor Melli Data Mining Glossary

From GM-RKB
Revision as of 02:29, 30 September 2018 by Gmelli

A Gabor Melli Data Mining Glossary is a data mining glossary maintained by Gabor Melli.



Summary

Concept Name Concept Definition Synonyms
Accuracy Metric An Accuracy Metric is a Classification Model Performance Metric based on the Proportion of a Classifier's Correct Predictions among all of its Predictions on Unseen Labeled Testing Records.
Accuracy Estimation An Accuracy Estimation Process is a Validation Process that approximates the true value of a Classification Model's Accuracy based on a Data Sample.
Association Learning Task An Association Learning Task is a Learning Task that requires the discovery of Associations.
Categorical Set A Categorical Set is an Unordered Set that is a Finite Set.
Classification Function A Classification Function is a Function whose Function Range is a Categorical Set (with Categorical Data Values). Classifier
Confounding Variable A Confounding Variable is a Random Variable in a Statistical Model that Correlates with both a Dependent Variable and an Independent Variable. Confounder.
Confusion Matrix A Confusion Matrix is a Matrix that represents the counts of a Probabilistic Classification Function's Predictions with respect to the Actuals on some Labeled Learning Set.
Cost-Benefit Function A Cost-Benefit Function is an Ordinal-Valued Function that assigns a Value to each Choice.
Data Cleaning Task A Data Cleaning Task requires the Detection and Removal of Erroneous Data Values and Data Records.
Data Mining Activity A Data Mining Activity is an Activity performed by a Data Mining Practitioner to solve a Data Mining Task.
Data Mining Discipline A Data Mining Discipline is an Academic Discipline that focuses on Data Analysis of large datasets from real-world problems.
Data Mining Task A Data Mining Task requires automated Discovery of Patterns typically to support human Decision making.
Data Mining Practice A Data Mining Practice is the Applied Practice of solving Real-World Data Mining Tasks.
Data Record Attribute A Data Record Attribute is a 2-Tuple composed of a Value and a Metadata Record that represents a single property of a Data Record.
Data Record Set A Data Record Set is a set of Data Records that share the same Data Record Schema. Dataset
Eager Learning Algorithm An Eager Learning Algorithm is a Learning Algorithm that involves a Training Phase (to induce a Total Predictive Function).
Error Rate Metric An Error Rate Metric is the Complement of an Accuracy Metric (one minus the Accuracy).
Exploratory Data Analysis Task An Exploratory Data Analysis Task is a Data Analysis Task that aims to formulate Hypotheses.
False Negative Rate A False Negative Rate is a Predictive Relation Performance Metric that is based on the Probability that a Predictive Relation will make the Incorrect Prediction of mapping a True Test Instance to a Negative Prediction.
False Positive Rate A False Positive Rate is a Predictive Relation Performance Metric that is based on the Probability that a Predictive Relation will Incorrectly Predict that a False Test Instance is a True Test Instance (i.e. make a Positive Prediction). FPR, Type 1 Error Rate
Feature Vector See Vectorized Learning Record.
Finite Ordered Set A Finite Ordered Set is an Ordered Set that is a Finite Set. Ordinal Set
IID Sample An IID Random Variable Set is a Random Variable Set where all random variables are in a Statistical Independence Relation and in an Identical Distribution Relation.
Information Extraction Task An Information Extraction Task requires populating a Data Structure from the Data contained in a set of Artifacts.
Information Retrieval Task An Information Retrieval Task requires the identification of Artifacts from a Corpus that are relevant to a specified Query.
Instance-based Learning Algorithm An Instance-based Learning Algorithm is a Learning Algorithm that does not generalize in terms of a higher language than the instances themselves.
Lazy Learning Algorithm A Lazy Learning Algorithm is a Supervised Learning Algorithm that does not involve a Training Phase.
Learning Record Attribute A Learning Record Attribute is a Data Attribute of a Learning Record. Feature
Learning Record A Learning Record is a Data Record that can be used as Input to a Learning Task. Example, Instance
Machine Learning Research Machine Learning Research is a Research Domain that investigates Machines improving Performance over time (such as via Reasoning with Inductive Logic).
Missing Data Value A Missing Data Value is a Data Record Attribute with no Data Value.
Model-based Learning Algorithm A Model-based Learning Algorithm is a Learning Algorithm that represents its Predictive Model in a Formal Language that is more general than the Formal Language used to describe the Data.
Numeric Interval A Numeric Interval is a Contiguous Numeric Subsequence of a Formal Number Sequence.
OLAP Task An Online Analytical Processing (OLAP) Task is an Interactive Data Analysis Task that is restricted to summarizing past behavior.
Optimization Task A Cost Function Optimization Task is a General Task Type where an Optimal Solution must be provided (that optimizes a Cost Function).
Posthoc Analysis Task A Posthoc Analysis Task analyzes collected Data Records that were not intentionally collected to test a Hypothesis.
Precision Metric A Precision Metric is a Performance Metric of the Probability that a given Classification Model's Positive Prediction is a Correct Prediction.
Predictive Function A Predictive Function is a Function that can Map a Learning Record to a Target Value. Model, Target Function
Randomized Controlled Experiment A Randomized Controlled Experiment is a Scientific Experiment that tests a Treatment on a Randomly created Treatment Group and a Placebo on a Distinct and Randomly created Control Group.
Recall Metric A Recall Metric is a Performance Metric for a Predictive Relation that Estimates the Probability of a True Positive Prediction (a Correct Prediction for True Test Instances). Sensitivity, True Positive Rate
Regression Algorithm A Regression Algorithm is a Supervised Learning Algorithm that can solve a Regression Task. Regressor
Sequence A Sequence is a Multiset of Sequence Members in a Partial Order Relation.
Semi-Supervised Learning Task A Semi-Supervised Learning Task is a Supervised Learning Task with access to Unlabeled Training Records.
Set A Set is an Abstract Entity that can Represent Zero or more Distinct Set Members.
Statistical Hypothesis Test A Statistical Hypothesis Test is a Data Analysis Task that seeks to Validate a Hypothesis. Confirmatory Data Analysis
Supervised Learning Task A Supervised Learning Task is a Learning Task where some Labeled Training Records are provided.
Target Attribute A Target Attribute is a Learning Record Attribute whose behavior is to be modeled by a Supervised Learning Task.
Testing Record A Testing Record is a Data Record with a Target Class that is not available during a Learning Task's Training Phase.
Text Mining Task A Text Mining Task is a Data Mining Task whose input largely involves Text Data. Text Analysis
Training Record A Training Record is a Data Record that is available during a Learning Task's Training Phase. Case, Exemplar, Example
True Negative Rate A True Negative Rate is the Probability that a Predictive Logic Relation will correctly map a False Test Instance to a Negative Prediction. Specificity
Tuple A Tuple is a Finite Sequence of Fixed Sequence Length.
Unsupervised Learning Task An Unsupervised Learning Task is a Learning Task where no Labeled Training Cases are provided.
Vector A Vector is a Number Tuple that Represents a point in some Vector Space.
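
Several entries in the table (Accuracy Metric, Error Rate Metric, Precision Metric, Recall Metric, False Positive Rate, False Negative Rate, True Negative Rate) can all be read off a two-class Confusion Matrix. The sketch below is one possible illustration, not part of the glossary itself: it uses the standard count-based forms of these metrics, and the names `BinaryConfusionMatrix` and `confusion_from_labels` are hypothetical. It assumes boolean labels and non-zero denominators.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BinaryConfusionMatrix:
    """Counts of predictions versus actuals (hypothetical helper class)."""
    tp: int  # True Test Instances given a Positive Prediction
    fn: int  # True Test Instances given a Negative Prediction
    fp: int  # False Test Instances given a Positive Prediction
    tn: int  # False Test Instances given a Negative Prediction

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def accuracy(self) -> float:
        # Correct predictions over all predictions.
        return (self.tp + self.tn) / self.total

    def error_rate(self) -> float:
        # Incorrect predictions over all predictions.
        return (self.fp + self.fn) / self.total

    def precision(self) -> float:
        # Share of Positive Predictions that are correct.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Share of True Test Instances predicted positive (sensitivity).
        return self.tp / (self.tp + self.fn)

    def false_positive_rate(self) -> float:
        # Share of False Test Instances predicted positive.
        return self.fp / (self.fp + self.tn)

    def false_negative_rate(self) -> float:
        # Share of True Test Instances predicted negative.
        return self.fn / (self.fn + self.tp)

    def true_negative_rate(self) -> float:
        # Share of False Test Instances predicted negative (specificity).
        return self.tn / (self.tn + self.fp)


def confusion_from_labels(
    actuals: Iterable[bool], predictions: Iterable[bool]
) -> BinaryConfusionMatrix:
    """Tally a confusion matrix from paired actual and predicted labels."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(actuals, predictions):
        if actual and predicted:
            tp += 1
        elif actual and not predicted:
            fn += 1
        elif not actual and predicted:
            fp += 1
        else:
            tn += 1
    return BinaryConfusionMatrix(tp=tp, fn=fn, fp=fp, tn=tn)


# Toy labeled testing records (made-up values, for illustration only).
cm = confusion_from_labels(
    actuals=[True, True, True, False, False],
    predictions=[True, True, False, True, False],
)
print(cm)
print("accuracy:", cm.accuracy(), "error rate:", cm.error_rate())
print("precision:", cm.precision(), "recall:", cm.recall())
```

Under these definitions the accuracy and error rate sum to one, as do the recall and false negative rate, and the true negative rate and false positive rate.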

Future Entry Log