# Gabor Melli Data Mining Glossary

A Gabor Melli Data Mining Glossary is a data mining glossary maintained by Gabor Melli.

## Summary

| Concept Name | Concept Definition | Synonyms |
| --- | --- | --- |
| Accuracy Metric | An Accuracy Metric is a Classification Model Performance Metric based on the Proportion of a Classifier's Correct Predictions among all of its Predictions on Unseen Labeled Testing Records. | |
| Accuracy Estimation | An Accuracy Estimation Process is a Validation Process that approximates the true value of a Classification Model's Accuracy based on a Data Sample. | |
| Association Learning Task | An Association Learning Task is a Learning Task that requires the discovery of Associations. | |
| Categorical Set | A Categorical Set is an Unordered Set that is a Finite Set. | |
| Classification Function | A Classification Function is a Function whose Function Range is a Categorical Set (with Categorical Data Values). | Classifier |
| Confounding Variable | A Confounding Variable is a Random Variable in a Statistical Model that Correlates with both a Dependent Variable and an Independent Variable. | Confounder |
| Confusion Matrix | A Confusion Matrix is a Matrix that represents the counts of a Probabilistic Classification Function's Predictions with respect to the Actuals on some Labeled Learning Set. | |
| Cost-Benefit Function | A Cost-Benefit Function is an Ordinal-Valued Function that assigns a Value to each Choice. | |
| Data Cleaning Task | A Data Cleaning Task requires the Detection and Removal of Erroneous Data Values and Data Records. | |
| Data Mining Activity | A Data Mining Activity is an Activity performed by a Data Mining Practitioner to solve a Data Mining Task. | |
| Data Mining Discipline | A Data Mining Discipline is an Academic Discipline that focuses on Data Analysis of large datasets from real-world problems. | |
| Data Mining Task | A Data Mining Task requires the automated Discovery of Patterns, typically to support human Decision Making. | |
| Data Mining Practice | A Data Mining Practice is the Applied Practice of solving Real-World Data Mining Tasks. | |
| Data Record Attribute | A Data Record Attribute is a 2-Tuple composed of a Value and a Metadata Record that represents a single property of a Data Record. | |
| Data Record Set | A Data Record Set is a set of Data Records that share the same Data Record Schema. | Dataset |
| Eager Learning Algorithm | An Eager Learning Algorithm is a Learning Algorithm that involves a Training Phase (to induce a Total Predictive Function). | |
| Error Rate Metric | An Error Rate Metric is the complement of an Accuracy Metric (one minus the Accuracy). | |
| Exploratory Data Analysis Task | An Exploratory Data Analysis Task is a Data Analysis Task that aims to formulate Hypotheses. | |
| False Negative Rate | A False Negative Rate is a Predictive Relation Performance Metric that is based on the Probability that a Predictive Relation will make the Incorrect Prediction of mapping a True Test Instance to a Negative Prediction. | |
| False Positive Rate | A False Positive Rate is a Predictive Relation Performance Metric that is based on the Probability that a Predictive Relation will Incorrectly Predict that a False Test Instance is a True Test Instance (i.e. make a Positive Prediction). | FPR, Type 1 Error Rate |
| Feature Vector | See Vectorized Learning Record. | |
| Finite Ordered Set | A Finite Ordered Set is an Ordered Set that is a Finite Set. | Ordinal Set |
| IID Sample | An IID Sample (IID Random Variable Set) is a Random Variable Set where all Random Variables are in a Statistical Independence Relation and in an Identical Distribution Relation. | |
| Information Extraction Task | An Information Extraction Task requires populating a Data Structure from the Data contained in a set of Artifacts. | |
| Information Retrieval Task | An Information Retrieval Task requires the identification of Artifacts from a Corpus that are relevant to a specified Query. | |
| Instance-based Learning Algorithm | An Instance-based Learning Algorithm is a Learning Algorithm that does not generalize in terms of a higher language than the instances themselves. | |
| Lazy Learning Algorithm | A Lazy Learning Algorithm is a Supervised Learning Algorithm that does not involve a Training Phase. | |
| Learning Record Attribute | A Learning Record Attribute is a Data Attribute of a Learning Record. | Feature |
| Learning Record | A Learning Record is a Data Record that can be used as Input to a Learning Task. | Example, Instance |
| Machine Learning Research | Machine Learning Research is a Research Domain that investigates Machines improving Performance over time (such as via Reasoning with Inductive Logic). | |
| Missing Data Value | A Missing Data Value is a Data Record Attribute with no Data Value. | |
| Model-based Learning Algorithm | A Model-based Learning Algorithm is a Learning Algorithm that represents its Predictive Model in a Formal Language that is more general than the Formal Language used to describe the Data. | |
| Numeric Interval | A Numeric Interval is a Contiguous Numeric Subsequence of a Formal Number Sequence. | |
| OLAP Task | An Online Analytical Processing (OLAP) Task is an Interactive Data Analysis Task that is restricted to summarizing past behavior. | |
| Optimization Task | A Cost Function Optimization Task is a General Task Type where an Optimal Solution must be provided (that optimizes a Cost Function). | |
| Posthoc Analysis Task | A Posthoc Analysis Task analyzes collected Data Records that were not intentionally collected to test a Hypothesis. | |
| Precision Metric | A Precision Metric is a Performance Metric of the Probability that a given Classification Model's Positive Prediction is a Correct Prediction. | |
| Predictive Function | A Predictive Function is a Function that can Map a Learning Record to a Target Value. | Model, Target Function |
| Randomized Controlled Experiment | A Randomized Controlled Experiment is a Scientific Experiment that tests a Treatment on a Randomly created Treatment Group and a Placebo on a Distinct and Randomly created Control Group. | |
| Recall Metric | A Recall Metric is a Performance Metric for a Predictive Relation that estimates the Probability of a True Positive Prediction (a Correct Prediction for True Test Instances). | Sensitivity, True Positive Rate |
| Regression Algorithm | A Regression Algorithm is a Supervised Learning Algorithm that can solve a Regression Task. | Regressor |
| Sequence | A Sequence is a Multiset of Sequence Members in a Partial Order Relation. | |
| Semi-Supervised Learning Task | A Semi-Supervised Learning Task is a Supervised Learning Task with access to Unlabeled Training Records. | |
| Set | A Set is an Abstract Entity that can Represent Zero or more Distinct Set Members. | |
| Statistical Hypothesis Test | A Statistical Hypothesis Test is a Data Analysis Task that seeks to Validate a Hypothesis. | Confirmatory Data Analysis |
| Supervised Learning Task | A Supervised Learning Task is a Learning Task where some Labeled Training Records are provided. | |
| Target Attribute | A Target Attribute is a Learning Record Attribute whose behavior is to be modeled by a Supervised Learning Task. | |
| Testing Record | A Testing Record is a Data Record with a Target Class that is not available during a Learning Task's Training Phase. | |
| Text Mining Task | A Text Mining Task is a Data Mining Task whose input largely involves Text Data. | Text Analysis |
| Training Record | A Training Record is a Data Record that is available during a Learning Task's Training Phase. | Case, Exemplar, Example |
| True Negative Rate | A True Negative Rate is the Probability that a Predictive Logic Relation will correctly map a False Test Instance to a Negative Prediction. | Specificity |
| Tuple | A Tuple is a Finite Sequence of Fixed Sequence Length. | |
| Unsupervised Learning Task | An Unsupervised Learning Task is a Learning Task where no Labeled Training Cases are provided. | |
| Vector | A Vector is a Number Tuple that Represents a point in some Vector Space. | |
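
Several of the performance metrics in this glossary (Accuracy, Error Rate, Precision, Recall, False Positive Rate, False Negative Rate, True Negative Rate) can all be read off a binary Confusion Matrix. The following is a minimal Python sketch of that relationship, not part of the glossary itself. The names `ConfusionCounts`, `count_outcomes`, and `metrics` are hypothetical, the labels are made-up illustration data, and the sketch assumes a binary task in which every denominator is non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # True Test Instance, Positive Prediction
    fn: int  # True Test Instance, Negative Prediction
    fp: int  # False Test Instance, Positive Prediction
    tn: int  # False Test Instance, Negative Prediction


def count_outcomes(actuals, predictions):
    """Tally predictions against the actuals of labeled testing records."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(actuals, predictions):
        if actual and predicted:
            tp += 1
        elif actual and not predicted:
            fn += 1
        elif not actual and predicted:
            fp += 1
        else:
            tn += 1
    return ConfusionCounts(tp=tp, fn=fn, fp=fp, tn=tn)


def metrics(c):
    """Derive the glossary's metrics; assumes all denominators are non-zero."""
    total = c.tp + c.fn + c.fp + c.tn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "error_rate": (c.fp + c.fn) / total,          # equals 1 - accuracy
        "precision": c.tp / (c.tp + c.fp),
        "recall": c.tp / (c.tp + c.fn),               # true positive rate
        "false_positive_rate": c.fp / (c.fp + c.tn),
        "false_negative_rate": c.fn / (c.tp + c.fn),  # equals 1 - recall
        "true_negative_rate": c.tn / (c.fp + c.tn),   # specificity
    }


# Made-up labels for ten testing records (illustration only).
actuals = [True, True, True, True, False, False, False, False, False, False]
predictions = [True, True, True, False, True, False, False, False, False, False]

counts = count_outcomes(actuals, predictions)
result = metrics(counts)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this sketch the rates come in complementary pairs: Recall plus False Negative Rate is one, and True Negative Rate plus False Positive Rate is one. Both members of a pair divide by the same count of True (or False) Test Instances.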