Topical Coherence Measure

From GM-RKB

A Topical Coherence Measure is a topic modeling measure that evaluates the degree of semantic relatedness among the top words in a topic model.
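As a rough illustration of the definition above, the sketch below scores a topic by averaging Normalized Pointwise Mutual Information (NPMI) over all pairs of its top words. The function name `npmi_coherence`, the toy corpus, and the choice to estimate probabilities from document-level co-occurrence are assumptions made for this example, not part of any particular library.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of a topic's top words.

    Probabilities are estimated as the fraction of documents that contain
    a word (or both words of a pair). This estimator is an assumption of
    the sketch; other schemes (e.g. sliding windows) are possible.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            # The words never co-occur: use the NPMI lower bound.
            scores.append(-1.0)
        elif p12 == 1.0:
            # Both words occur in every document, so NPMI is 0/0;
            # treating it as 0.0 is a convention of this sketch.
            scores.append(0.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["climate", "change", "global", "warming"],
    ["climate", "change", "emissions"],
    ["stock", "market", "investment"],
    ["market", "growth", "policy"],
]
print(npmi_coherence(["climate", "change"], docs))
print(npmi_coherence(["climate", "change", "market"], docs))
```

Scores near 1 indicate words that almost always appear together, while scores near -1 indicate words that never do; a realistic evaluation would use a much larger reference corpus.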

  • Context:
    • It can (often) be used to estimate how interpretable a topic is to humans, by measuring how similar the topic's top words are to one another.
    • ...
  • Example(s):
    • CV Coherence Score.
    • UMass Coherence Score.
    • UCI Coherence Score.
    • Word2vec Coherence Score.
    • one based on Normalized Pointwise Mutual Information.
    • one based on Micro F1.
    • a One-All Coherence Measure, which evaluates the coherence of a topic by comparing each single word with the set of all other words. For instance, in a topic with words ["climate", "change", "global", "warming", "emissions"], the measure would assess the relationship of "climate" to the set {"change", "global", "warming", "emissions"}, repeat this for every other word, and then average the results.
    • a One-Any Coherence Measure, which assesses coherence by evaluating the relationship between each single word and any subset of the other words. For example, in a topic with words ["neural", "networks", "learning", "algorithm", "data"], it would compare "neural" with {"networks"}, with {"networks", "learning"}, and so on through every non-empty subset of the remaining words, and then aggregate the scores.
    • an Any-Any Coherence Measure, which looks at coherence by considering relationships between any pair of disjoint word subsets within a topic. For a topic containing ["economic", "growth", "market", "investment", "policy"], it would evaluate single-word pairs like "economic-growth", "market-investment", and "growth-policy", as well as pairs of larger subsets, to assess the overall coherence.
    • a Boolean Window Model or PMI Coherence Score, which estimates word probabilities from a Boolean sliding window over the text and scores word pairs with Pointwise Mutual Information (PMI). In a topic model producing words like ["quantum", "mechanics", "particle", "physics", "energy"], PMI compares how often two of these words co-occur within a fixed-size window of text against how often they would co-occur if they were independent, so a high value indicates coherence.
    • a Difference Measure (called Interest in Association Rule Mining), which is adapted from association rule mining and evaluates the 'difference' or 'interest' between subsets of words. For a topic with words ["social", "media", "networking", "online", "communication"], this measure would compare how often a subset like ["social", "media"] occurs when ["networking", "online"] is also present against how often it occurs overall, to determine the topic's coherence.
    • ...
  • Counter-Example(s):
  • See: Topic Modeling, Text Analysis Measure, Semantic Analysis, Topic Modeling Technique.
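The One-All, One-Any, and Any-Any examples above differ mainly in how a topic's word set is split into pairs of subsets before any scoring happens. The following sketch enumerates those pairs under the usual reading of each scheme; the function names are hypothetical, and the words are assumed to be distinct.

```python
from itertools import combinations


def nonempty_subsets(words):
    """All non-empty subsets of `words`, as tuples in the original order."""
    return [c for r in range(1, len(words) + 1) for c in combinations(words, r)]


def one_all(words):
    """Pair each single word with the set of all remaining words."""
    return [((w,), tuple(x for x in words if x != w)) for w in words]


def one_any(words):
    """Pair each single word with every non-empty subset of the remaining words."""
    pairs = []
    for w in words:
        rest = [x for x in words if x != w]
        pairs.extend(((w,), subset) for subset in nonempty_subsets(rest))
    return pairs


def any_any(words):
    """Pair every non-empty subset with every disjoint non-empty subset."""
    pairs = []
    for left in nonempty_subsets(words):
        rest = [x for x in words if x not in left]
        pairs.extend((left, right) for right in nonempty_subsets(rest))
    return pairs


topic = ["climate", "change", "global"]
for scheme in (one_all, one_any, any_any):
    print(scheme.__name__, len(scheme(topic)))
```

The number of pairs grows linearly for One-All but exponentially for One-Any and Any-Any, which is one reason coherence is usually computed over only a small number of top words per topic.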


References

2024

  • Zvornicanin, E. (2022). "When Coherence Score is Good or Bad in Topic Modeling?" Baeldung on Computer Science.
    • QUOTE: We can use the coherence score in topic modeling to measure how interpretable the topics are to humans. In this case, topics are represented as the top N words with the highest probability of belonging to that particular topic. Briefly, the coherence score measures how similar these words are to each other.
    • QUOTE: In this article, we introduced the intuition behind the topic modeling concept. Also, we explained in detail the LDA algorithm that is one of the most popular methods for solving this task. In the end, we resolve the problem of determining the meaning of the coherence score and how to know when this score is good or bad.

2017