Unsupervised Learning Task

An [[Unsupervised Learning Task]] is a [[data-driven learning task]] with no [[labeled training case]]s.

* <B>Context:</B>
** <B>[[Task Input|input]]:</B> [[Learning Data Records]], a [[Dataset]] <code>X</code> (without [[target variable|target output]] y).
** <B>[[Task Output|output]]:</B> [[Model]] that [[Predict]]s a [[Test Case]]'s [[Cluster]].
** It can be solved by an [[Unsupervised Learning System]] (that implements an [[unsupervised learning algorithm]]; see the sketch below) or manually by a human (e.g. visually).
* <B>Example(s):</B>
** a [[Data-Driven Clustering Task]].
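
The following is a minimal sketch of such a task, assuming Python with the scikit-learn library and a [[k-means]] clusterer as one possible [[Unsupervised Learning System]]; the data values, the choice of two clusters, and the variable names are illustrative assumptions rather than part of the task definition.

<pre>
# A minimal sketch of an unsupervised learning task, assuming scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

# Task input: learning data records X, with no target output y.
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.2],
              [8.0, 9.0], [8.2, 9.1], [7.9, 8.8]])

# One possible Unsupervised Learning System: a k-means clusterer.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Task output: a model that predicts a test case's cluster.
test_case = np.array([[8.1, 9.0]])
print(model.predict(test_case))  # cluster index assigned to the test case
print(model.labels_)             # cluster assignments for the learning records
</pre>
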
== References ==

=== 2019 ===

* (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/unsupervised_learning Retrieved:2019-12-4.
** '''Unsupervised learning''' is a type of self-organized [[Hebbian learning]] that helps find previously unknown patterns in a data set without pre-existing labels. It is also known as [[self-organization]] and allows modeling [[Probability density function|probability densities]] of given inputs.<ref name="Hinton99a"></ref> It is one of the three main categories of machine learning, along with [[supervised learning|supervised]] and [[reinforcement learning]]. [[Semi-supervised learning]] has also been described, and is a hybridization of supervised and unsupervised techniques. <P> Two of the main methods used in unsupervised learning are [[principal component analysis|principal component]] and [[cluster analysis]]. [[Cluster analysis]] is used in unsupervised learning to group, or segment, datasets with shared attributes in order to extrapolate algorithmic relationships. Cluster analysis is a branch of [[machine learning]] that groups data that has not been [[labeled data|labelled]], classified, or categorized. Instead of responding to feedback, cluster analysis identifies commonalities in the data and reacts based on the presence or absence of such commonalities in each new piece of data. This approach helps detect anomalous data points that do not fit into either group. A central application of unsupervised learning is in the field of [[density estimation]] in [[statistics]],<ref name="JordanBishop2004"></ref> though unsupervised learning encompasses many other domains involving summarizing and explaining data features. It can be contrasted with supervised learning by saying that whereas supervised learning intends to infer a [[conditional probability distribution]] <math display="inline">p_X(x\,|\,y)</math> conditioned on the label <math display="inline">y</math> of input data, unsupervised learning intends to infer an [[a priori probability]] distribution <math display="inline">p_X(x)</math>. <P> [[Generative adversarial networks]] can also be used with unsupervised learning, though they can also be applied to supervised and reinforcement techniques.
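
As a minimal illustration of the density-estimation view described above (a sketch separate from the quoted passage), one can fit an a priori distribution <math display="inline">p_X(x)</math> from unlabeled inputs with a mixture model; the use of scikit-learn's <code>GaussianMixture</code> and the synthetic two-mode data are assumptions of the sketch.

<pre>
# A minimal sketch of inferring p_X(x) from unlabeled inputs, assuming scikit-learn.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),   # unlabeled inputs drawn from
               rng.normal(5.0, 1.0, size=(100, 2))])  # two unknown modes; no labels y

# Density estimation: fit a model of p_X(x) without any conditioning label y.
density_model = GaussianMixture(n_components=2, random_state=0).fit(X)

# log p_X(x) for new inputs; unusually low values flag anomalous points.
new_points = np.array([[0.1, -0.2], [20.0, 20.0]])
print(density_model.score_samples(new_points))
</pre>
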
=== 2017a ===
 
* (Triplet & Foucher, 2017) ⇒ Thomas Triplet, and Samuel Foucher ([[2017]]). [https://link.springer.com/referenceworkentry/10.1007%2F978-3-319-17885-1_1625 "Clustering of Geospatial Big Data in a Distributed Environment"]. In: [https://link.springer.com/referencework/10.1007/978-3-319-17885-1 Encyclopedia of GIS] pp 236-246
 
* (Triplet & Foucher, 2017) ⇒ Thomas Triplet, and Samuel Foucher ([[2017]]). [https://link.springer.com/referenceworkentry/10.1007%2F978-3-319-17885-1_1625 "Clustering of Geospatial Big Data in a Distributed Environment"]. In: [https://link.springer.com/referencework/10.1007/978-3-319-17885-1 Encyclopedia of GIS] pp 236-246
 
** QUOTE: [[Clustering]], sometimes called [[unsupervised learning]]/[[classification]] or [[exploratory data analysis]], is one of the most fundamental steps in understanding a [[dataset]], aiming to discover the unknown nature of [[data]] through the separation of a [[finite dataset]], with little or no ground truth, into a [[finite]] and [[discrete set]] of “natural,” hidden [[data structure]]s. Given a set of n points in a [[two-dimensional space]], the purpose of [[clustering]] is to group them into a number of sets based on similarity [[measure]]s and [[distance vector]]s. [[Clustering]] is also useful for compression purpose in [[large database]]s (Daschiel and Datcu 2005). The term [[Unsupervised Learning]] is sometimes used in some fields (i.e., in [[Machine Learning]] and [[Data Mining]]). [[Clustering]] will usually aim at creating [[homogeneous group]]s that are maximally separable. It is a fundamental tool in [[Knowledge Discovery and Data (KDD) mining]] when looking for meaningful patterns (Alam et al. 2014). [[Geographical Knowledge Discovery (GKD)]] is seen as an extension of [[KDD]] to the case of [[spatial data]] (Miller 2010).
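
Below is a minimal sketch of grouping a small set of points in a [[two-dimensional space]] into two groups by a distance-based similarity measure, assuming scikit-learn's agglomerative ([[hierarchical clustering|hierarchical]]) clusterer; the coordinates and the number of groups are illustrative assumptions, not taken from the quoted passage.

<pre>
# A minimal sketch: group n points in a 2-D space by Euclidean distance,
# with no ground truth, assuming scikit-learn.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

points = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4],
                   [5.0, 5.2], [5.3, 4.9], [5.1, 5.0]])

# Hierarchical (agglomerative) clustering with Ward linkage.
grouping = AgglomerativeClustering(n_clusters=2, linkage="ward").fit(points)
print(grouping.labels_)  # e.g. [0 0 0 1 1 1]: two "natural", well-separated groups
</pre>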
 
=== 2017b ===
 
* (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Unsupervised_learning Retrieved:2017-11-26.
 
** '''Unsupervised machine learning''' is the [[machine learning]] task of inferring a function to describe hidden structure from "unlabeled" data (a classification or categorization is not included in the observations). Since the examples given to the learner are unlabeled, there is no evaluation of the accuracy of the structure that is output by the relevant algorithm — which is one way of distinguishing unsupervised learning from [[supervised learning]] and [[reinforcement learning]]. <P> A central case of unsupervised learning is the problem of [[density estimation]] in [[statistics]],<ref name="JordanBishop2004">Jordan, Michael I.; Bishop, Christopher M. ([[2004]]). “Neural Networks". In Allen B. Tucker. Computer Science Handbook, Second Edition (Section VII: Intelligent Systems). Boca Raton, FL: Chapman & Hall/CRC Press LLC. ISBN 1-58488-360-X.</ref> though unsupervised learning encompasses many other problems (and solutions) involving summarizing and explaining key features of the data. <P> Approaches to unsupervised learning include:
 
*** [[data clustering|clustering]]
**** [[k-means]]
**** [[mixture models]]
**** [[hierarchical clustering]],
*** [[anomaly detection]]
*** [[Artificial neural network|Neural Networks]]
**** [[Hebbian Learning]]
**** [[Generative Adversarial Networks]]
*** Approaches for learning [[latent variable model]]s such as
**** [[Expectation–maximization algorithm]] (EM)
**** [[Method of moments (statistics)|Method of moments]]
**** [[Blind signal separation technique]]s, e.g.,
***** [[Principal component analysis]],
***** [[Independent component analysis]],
***** [[Non-negative matrix factorization]],
***** [[Singular value decomposition]]. <ref> [[Ranjan Acharyya|Acharyya, Ranjan]] (2008); ''A New Approach for Blind Source Separation of Convolutive Sources'', (this book focuses on unsupervised learning with Blind Source Separation) </ref>
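
As an illustrative sketch (separate from the quoted article), two of the approaches listed above can be combined: [[Principal component analysis|principal component analysis]] for unsupervised dimensionality reduction, followed by a [[mixture models|mixture model]] fitted with the [[Expectation–maximization algorithm|EM algorithm]]; the use of scikit-learn and the random stand-in data are assumptions.

<pre>
# An illustrative sketch combining two listed approaches, assuming scikit-learn:
# PCA for unsupervised dimensionality reduction, then an EM-fitted mixture model.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))  # stands in for a real unlabeled dataset with 10 features

X_reduced = PCA(n_components=2).fit_transform(X)  # project onto principal components
mixture = GaussianMixture(n_components=3, random_state=0).fit(X_reduced)  # latent variable model

print(mixture.predict(X_reduced[:5]))  # latent component assignments for a few records
</pre>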
 
  
 
=== 2011 ===

=== 2009 ===

=== 2008 ===

* (Redei, 2008) ⇒ George P. Rédei. (2008). "Unsupervised Learning". In: Encyclopedia of Genetics, Genomics, Proteomics and Informatics, pp 2067-2067.
** QUOTE: Identifies new, so far undetected, shared pattern(s) of sequences in macromolecules and determines the positive and negative representatives of the pattern(s). The information permits correlations between structure and function in languages as well as in proteins without direct human intervention in the details.

=== 2000 ===

=== 1998 ===
 
__NOTOC__
[[Category:Concept]] [[Category:Machine Learning]]
 