Difference between revisions of "Noisy Dataset"

From GM-RKB
(References)
(ContinuousReplacement)
(Tag: continuous replacement)
Line 13: Line 13:
----
----

== References ==

=== 2017 ===
* ([[Sammut & Webb, 2017]]) ⇒ [[Claude Sammut]], and [[Geoffrey I. Webb]]. ([[2017]]). [https://link.springer.com/referenceworkentry/10.1007/978-1-4899-7687-1_957 “Noise”]. In: ([[Sammut & Webb, 2017]]) [https://doi.org/10.1007/978-1-4899-7687-1_957 DOI:10.1007/978-1-4899-7687-1_957].
Line 20: Line 22:
*** In [[supervised learning]], <i>[[classification error]]</i> means that a [[training example]] has an incorrect [[class label]].
:: In addition to [[error]]s, [[training example]]s may have [[missing attribute value]]s. That is, the [[value]]s of some [[attribute]]s are not recorded. <P>[[Noisy data]] can cause [[learning algorithm]]s to [[fail to converge]] to a [[concept description]] or to build a [[concept description]] that has poor [[classification accuracy]] on [[unseen example]]s. This is often due to [[overfitting]].

=== 2009 ===
* ([[2009_ExploitingWikipediaAsExternalKn|Hu et al., 2009]]) ⇒ [[Xiaohua Hu]], Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou. ([[2009]]). “Exploiting Wikipedia as External Knowledge for Document Clustering.” In: Proceedings of [[ACM SIGKDD]] Conference ([[KDD-2009]]). [http://dx.doi.org/10.1145/1557019.1557066 doi:10.1145/1557019.1557066]

Revision as of 03:35, 21 February 2019

A Noisy Dataset is a dataset whose data records contain measurement error (or measurement uncertainty).
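
As a rough illustration of the definition, the following Python sketch turns a clean dataset into a noisy one by adding measurement error to every recorded value. The zero-mean Gaussian noise model, the function name add_measurement_noise, and the sample records are assumptions made for this example only; they do not come from the cited sources.

```python
import random

def add_measurement_noise(records, sigma, rng):
    """Return a copy of `records` with zero-mean Gaussian noise added to each value.

    The Gaussian model is an assumption for illustration; real measurement
    error may follow a different distribution.
    """
    return [[value + rng.gauss(0.0, sigma) for value in record] for record in records]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # hypothetical data records
noisy = add_measurement_noise(clean, sigma=0.1, rng=rng)

for clean_record, noisy_record in zip(clean, noisy):
    print(clean_record, "->", noisy_record)
```

The function returns a new list and leaves the clean records untouched, so the two versions can be compared directly.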



References

2017

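  • (Sammut & Webb, 2017) ⇒ Claude Sammut, and Geoffrey I. Webb. (2017). “Noise”. In: (Sammut & Webb, 2017) DOI:10.1007/978-1-4899-7687-1_957.
    • … In supervised learning, classification error means that a training example has an incorrect class label.

In addition to errors, training examples may have missing attribute values. That is, the values of some attributes are not recorded.

Noisy data can cause learning algorithms to fail to converge to a concept description or to build a concept description that has poor classification accuracy on unseen examples. This is often due to overfitting.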
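
The effect of classification error on accuracy can be sketched in a few lines of Python. This is a minimal toy example under stated assumptions, not a method from the cited entry: the target concept (x >= 0.5), the 20% label-flip rate, the 1-nearest-neighbour learner, and all function names are hypothetical choices made for illustration. The learner reproduces every training label, including the flipped ones, which is the overfitting behaviour described above.

```python
import random

def true_label(x):
    """Hypothetical target concept: class 1 when x >= 0.5, else class 0."""
    return 1 if x >= 0.5 else 0

def flip_labels(examples, rate, rng):
    """Flip each binary class label with probability `rate` (classification error)."""
    return [(x, 1 - y) if rng.random() < rate else (x, y) for x, y in examples]

def predict_1nn(train, x):
    """Predict the label of the training example whose attribute is closest to x."""
    return min(train, key=lambda example: abs(example[0] - x))[1]

def accuracy(train, examples):
    """Fraction of `examples` whose label the 1-NN learner predicts correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in examples)
    return hits / len(examples)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean_train = [(x, true_label(x)) for x in (rng.random() for _ in range(200))]
noisy_train = flip_labels(clean_train, rate=0.2, rng=rng)
unseen = [(x, true_label(x)) for x in (rng.random() for _ in range(1000))]

train_acc = accuracy(noisy_train, noisy_train)  # learner evaluated on its own noisy labels
unseen_clean = accuracy(clean_train, unseen)    # learner trained without label noise
unseen_noisy = accuracy(noisy_train, unseen)    # learner trained with label noise

print("accuracy on noisy training set:", train_acc)
print("unseen accuracy, clean training:", unseen_clean)
print("unseen accuracy, noisy training:", unseen_noisy)
```

Because 1-nearest-neighbour memorises its training set, it scores perfectly on the noisy labels it was given while doing worse on unseen examples than the same learner trained on clean labels. A learner with more smoothing (for example, voting over several neighbours) would typically be less sensitive to the flipped labels.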

2009

  • (Hu et al., 2009) ⇒ Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou. (2009). “Exploiting Wikipedia as External Knowledge for Document Clustering.” In: Proceedings of ACM SIGKDD Conference (KDD-2009). doi:10.1145/1557019.1557066
    • … There are two major issues for this approach: (1) the coverage of the ontology is limited, even for WordNet or Mesh, (2) using ontology terms as replacement or additional features may cause information loss, or introduce noise.

2008