tf-idf Scoring Function


A tf-idf Scoring Function is a scoring function for a vocabulary member relative to a multiset that is based on the product of a tf measure and an idf measure, [math]\displaystyle{ \operatorname{tf}() \times \operatorname{idf}() }[/math].
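As a concrete illustration of this definition, the following is a minimal sketch in Python (not a canonical implementation; the function names, the raw-count tf variant, and the log-scaled idf variant are choices made for the example) that scores a term against one document in a small corpus by multiplying a tf measure and an idf measure:

    import math

    def tf(term, document):
        # Term frequency: raw count of the term in the document (a bag of words).
        return document.count(term)

    def idf(term, corpus):
        # Inverse document frequency: log of (number of documents / documents containing the term).
        docs_with_term = sum(1 for doc in corpus if term in doc)
        return math.log(len(corpus) / docs_with_term) if docs_with_term else 0.0

    def tf_idf(term, document, corpus):
        # tf-idf score: the product tf() x idf(), as in the definition above.
        return tf(term, document) * idf(term, corpus)

    corpus = [
        "the cat sat on the mat".split(),
        "the dog sat on the log".split(),
        "cats and dogs play".split(),
    ]
    print(tf_idf("cat", corpus[0], corpus))  # 1 * log(3/1) ~ 1.10: rare term, higher weight
    print(tf_idf("the", corpus[0], corpus))  # 2 * log(3/2) ~ 0.81: frequent term, idf dampens it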



References

2015

  • (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Tf–idf Retrieved:2015-2-22.
    • tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining.

      The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.

      Variations of the tf–idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query. tf–idf can be successfully used for stop-words filtering in various subject fields including text summarization and classification.

      One of the simplest ranking functions is computed by summing the tf–idf for each query term; many more sophisticated ranking functions are variants of this simple model.
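The simplest ranking function mentioned in the passage above can be sketched as follows, reusing the hypothetical tf_idf() and corpus from the earlier example: the score of a document for a query is the sum of the tf-idf weights of the query terms, and documents are then ranked by that score.

    def score(query_terms, document, corpus):
        # Simplest ranking function from the passage above:
        # sum the tf-idf weight of each query term with respect to the document.
        return sum(tf_idf(term, document, corpus) for term in query_terms)

    query = "cat mat".split()
    ranking = sorted(corpus, key=lambda doc: score(query, doc, corpus), reverse=True)
    # ranking[0] is the document with the highest summed tf-idf for the query terms.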

2007

  1. Note that in the description of tf*idf weights, the word “document” is traditionally used since the original motivation was to retrieve documents. While the chapter will stick with the original terminology, in a recommendation system, the documents correspond to a text description of an item to be recommended. Note that the equations here are representative of the class of formulae called tf*idf. In general, tf*idf systems have weights that increase monotonically with term frequency and decrease monotonically with document frequency.