# tf-idf Score

A tf-idf Score is a non-negative real number produced by a tf-idf function for a vocabulary member relative to a member of a multiset set (e.g. a term relative to a document in a corpus).

• Context:
• It can (typically) increase with respect to Set Member Frequency (vocab members that occur frequently within a single multiset/document are more informative about it than members that occur rarely).
• It can (typically) increase with respect to IDF Score (vocab members that appear in many multisets/documents across the entire corpus are less informative than members that appear in few).
• It can be a member of a tf-idf Vector.
• Example(s):
• $0$, when every multiset in the set contains the member (the IDF Score is then $\log(1) = 0$).
• $0.0046...$ for $\operatorname{tf-idf}(\text{“quaint”},\text{doc}_{184}, \text{Newsgroups 20 corpus})$, i.e. $\frac{\log_{10}(200)}{500} = \frac{4}{2{,}000} \times \log_{10}(\frac{8{,}000}{40})$, if the word “quaint” occurs 4 times in document $\text{doc}_{184}$, which has 2,000 words, and is contained in 40 documents from a corpus with 8,000 documents.
• Counter-Example(s):
• See: TF-IDF Ranking Function.
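
The arithmetic in the "quaint" example can be checked with a minimal sketch. It assumes the simplest tf-idf variant (relative term frequency multiplied by the logarithm of the inverse document fraction) and a base-10 logarithm; the function name `tf_idf` and its parameter names are illustrative and do not come from any particular library.

```python
import math


def tf_idf(term_count, doc_length, n_docs, doc_freq, base=10):
    """Hypothetical helper: tf-idf = (term_count / doc_length) * log(n_docs / doc_freq).

    The log base is an assumption; other bases rescale every score by a constant.
    """
    if doc_length <= 0 or n_docs <= 0:
        raise ValueError("doc_length and n_docs must be positive")
    if not 0 < doc_freq <= n_docs:
        raise ValueError("doc_freq must be in the range 1..n_docs")
    tf = term_count / doc_length             # relative frequency within the document
    idf = math.log(n_docs / doc_freq, base)  # rarer terms receive a larger idf
    return tf * idf


# "quaint": 4 occurrences in a 2,000-word document, found in 40 of 8,000 documents.
score = tf_idf(4, 2000, 8000, 40)
print(round(score, 4))  # 0.0046

# A term contained in every document scores zero, because log(1) = 0.
print(tf_idf(4, 2000, 8000, 8000))  # 0.0
```

With a natural logarithm (`base=math.e`) the same inputs give roughly 0.0106, so reported scores are only comparable when the base is held fixed.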

## References

### 2007

1. Note that in the description of tf*idf weights, the word “document” is traditionally used since the original motivation was to retrieve documents. While the chapter will stick with the original terminology, in a recommendation system, the documents correspond to a text description of an item to be recommended. Note that the equations here are representative of the class of formulae called tf*idf. In general, tf*idf systems have weights that increase monotonically with term frequency and decrease monotonically with document frequency.
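
The monotonic behaviour described in this note can be illustrated with a short sketch. It assumes one representative member of the tf*idf class (relative term frequency times a base-10 log of the inverse document fraction); the helper names and the sample counts are hypothetical and chosen only for illustration.

```python
import math


def tf_idf(term_count, doc_length, n_docs, doc_freq):
    # One representative tf*idf formula (an assumption, not the only variant).
    return (term_count / doc_length) * math.log10(n_docs / doc_freq)


def is_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))


def is_decreasing(values):
    return all(a > b for a, b in zip(values, values[1:]))


# Vary term frequency while holding document frequency fixed.
by_term_frequency = [tf_idf(count, 2000, 8000, 40) for count in (1, 2, 4, 8)]

# Vary document frequency while holding term frequency fixed.
by_document_frequency = [tf_idf(4, 2000, 8000, df) for df in (10, 40, 400, 8000)]

print(is_increasing(by_term_frequency))      # True
print(is_decreasing(by_document_frequency))  # True
```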