Shifted PPMI Matrix

From GM-RKB

A Shifted PPMI Matrix is a word-context PMI matrix in which each cell is computed as [math]\displaystyle{ \max(0, \operatorname{PMI}(w, c) - \log k) }[/math], where [math]\displaystyle{ w }[/math] is a word, [math]\displaystyle{ c }[/math] is a context, and [math]\displaystyle{ k }[/math] is the number of negative samples.
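
The cell formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `sppmi_matrix`, the dense count matrix, and the use of the natural logarithm are assumptions made for the example, and PMI is estimated here as log(#(w,c) · N / (#(w) · #(c))), with N the total number of observed pairs.

```python
import numpy as np

def sppmi_matrix(counts, k=1):
    """Shifted positive PMI from a word-by-context co-occurrence count matrix.

    counts[i, j] is the number of times word i was seen with context j.
    k is the number of negative samples (k=1 gives plain PPMI).
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()                                 # N: all observed pairs
    word_totals = counts.sum(axis=1, keepdims=True)      # #(w), shape (W, 1)
    context_totals = counts.sum(axis=0, keepdims=True)   # #(c), shape (1, C)

    # PMI(w, c) = log( #(w,c) * N / (#(w) * #(c)) )
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (word_totals * context_totals))

    # Unobserved pairs give -inf (or nan for empty rows/columns);
    # treat both as -inf so they are clipped to 0 below.
    pmi[~np.isfinite(pmi)] = -np.inf

    # Shift by log(k), then clip negative values to zero.
    return np.maximum(0.0, pmi - np.log(k))

# Hypothetical toy counts: two words, two contexts.
counts = np.array([[2, 0],
                   [0, 2]])
m1 = sppmi_matrix(counts, k=1)
m2 = sppmi_matrix(counts, k=2)
```

With these toy counts, each observed pair has PMI equal to log 2, so a shift of k = 2 removes it entirely. Larger values of k therefore make the matrix sparser.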



References

2014

[[ 0.    0.83  0.83  0.49  0.49  0.    0.49  0.13  0.    0.    0.    0.  ]
 [ 0.83  0.    1.16  0.    0.    0.83  0.    0.    0.98  0.    0.    0.  ]
 [ 0.83  1.16  0.    0.    0.    0.13  0.    0.47  0.98  0.    0.    0.  ]
 [ 0.49  0.    0.    0.    0.49  0.    1.18  0.83  0.    0.    0.    0.  ]
 [ 0.49  0.    0.    0.49  0.    0.    0.49  0.13  0.    0.    0.83  1.05]
 [ 0.    0.83  0.13  0.    0.    0.    0.    0.13  1.05  0.    0.    0.  ]
 [ 0.49  0.    0.    1.18  0.49  0.    0.    0.83  0.    0.    0.    0.  ]
 [ 0.13  0.    0.47  0.83  0.13  0.13  0.83  0.    0.29  0.    0.    0.  ]
 [ 0.    0.98  0.98  0.    0.    1.05  0.    0.29  0.    0.    0.    0.  ]
 [ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    2.37  1.9 ]
 [ 0.    0.    0.    0.    0.83  0.    0.    0.    0.    2.37  0.    2.08]
 [ 0.    0.    0.    0.    1.05  0.    0.    0.    0.    1.9   2.08  0.  ]]
With no neural network training and no parameter tuning, we can directly take the rows of this SPPMI matrix as the word vectors.
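
As a rough illustration of using the rows directly as word vectors, the sketch below copies the 12 × 12 matrix quoted above and compares rows by cosine similarity. The choice of cosine similarity and the helper name `cosine_similarities` are assumptions for the example; the quoted source does not say which similarity measure it uses, and the words behind each row are not given here.

```python
import numpy as np

# The SPPMI matrix quoted above; row i is the vector for word i.
SPPMI = np.array([
    [0.,   0.83, 0.83, 0.49, 0.49, 0.,   0.49, 0.13, 0.,   0.,   0.,   0.  ],
    [0.83, 0.,   1.16, 0.,   0.,   0.83, 0.,   0.,   0.98, 0.,   0.,   0.  ],
    [0.83, 1.16, 0.,   0.,   0.,   0.13, 0.,   0.47, 0.98, 0.,   0.,   0.  ],
    [0.49, 0.,   0.,   0.,   0.49, 0.,   1.18, 0.83, 0.,   0.,   0.,   0.  ],
    [0.49, 0.,   0.,   0.49, 0.,   0.,   0.49, 0.13, 0.,   0.,   0.83, 1.05],
    [0.,   0.83, 0.13, 0.,   0.,   0.,   0.,   0.13, 1.05, 0.,   0.,   0.  ],
    [0.49, 0.,   0.,   1.18, 0.49, 0.,   0.,   0.83, 0.,   0.,   0.,   0.  ],
    [0.13, 0.,   0.47, 0.83, 0.13, 0.13, 0.83, 0.,   0.29, 0.,   0.,   0.  ],
    [0.,   0.98, 0.98, 0.,   0.,   1.05, 0.,   0.29, 0.,   0.,   0.,   0.  ],
    [0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   2.37, 1.9 ],
    [0.,   0.,   0.,   0.,   0.83, 0.,   0.,   0.,   0.,   2.37, 0.,   2.08],
    [0.,   0.,   0.,   0.,   1.05, 0.,   0.,   0.,   0.,   1.9,  2.08, 0.  ],
])

def cosine_similarities(matrix):
    """Pairwise cosine similarity between the rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # Guard against all-zero rows (none occur in this matrix).
    unit = matrix / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T

sims = cosine_similarities(SPPMI)

# Index of the most similar *other* row for each word.
masked = sims - 2.0 * np.eye(len(SPPMI))   # push self-similarity below any real value
nearest = masked.argmax(axis=1)
```

Because every entry of the matrix is non-negative, all similarities fall between 0 and 1.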

... The SPPMI-SVD method simply factorizes the sparse SPPMI matrix using Singular Value Decomposition (SVD), rather than the gradient descent methods of word2vec/GloVe, and uses the (dense) left singular vectors as the final word embeddings.
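
The factorization step described above might look like the following sketch. It uses dense `np.linalg.svd` on a tiny matrix for clarity; a large sparse SPPMI matrix would normally call for a truncated sparse solver instead. The function name `svd_embeddings`, the parameter `dim`, and the 3 × 3 input (the lower-right block of the matrix quoted earlier, used only as toy data) are assumptions for the example.

```python
import numpy as np

def svd_embeddings(sppmi, dim):
    """Return the top-`dim` left singular vectors of an SPPMI matrix.

    Row i of the result is a dense `dim`-dimensional embedding for word i.
    """
    sppmi = np.asarray(sppmi, dtype=float)
    # NumPy returns singular values in descending order, so the first
    # `dim` columns of `u` correspond to the largest singular values.
    u, s, vt = np.linalg.svd(sppmi, full_matrices=False)
    return u[:, :dim]

# Toy input: the lower-right 3x3 block of the matrix quoted earlier.
block = np.array([
    [0.,   2.37, 1.9 ],
    [2.37, 0.,   2.08],
    [1.9,  2.08, 0.  ],
])
emb = svd_embeddings(block, dim=2)
```

The signs of singular vectors are not unique, so individual coordinates can differ between solvers while distances and angles between embeddings stay the same.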