Pointwise Mutual Information
✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.
Pointwise Mutual Information (PMI)
quantifies how much more (or less) often two events co-occur
than we would expect if they were independent,
by comparing their joint probability to the product of their individual probabilities.
A little more concretely, for two words it is the log of the probability of seeing them together, divided by the product of the probabilities of seeing each at all:
PMI(word1,word2) = log( P(word1,word2) / ( P(word1) * P(word2) ) )
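That formula can be sketched with plain counts. A minimal example, assuming a hypothetical toy corpus represented as a list of observed bigrams (the words and counts are made up for illustration):

```python
from collections import Counter
from math import log2

# Hypothetical toy corpus: observed (word1, word2) bigrams.
bigrams = [("new", "york"), ("new", "york"), ("new", "car"),
           ("old", "york"), ("old", "car"), ("old", "car")]

n = len(bigrams)
pair_counts = Counter(bigrams)                    # joint counts
w1_counts = Counter(w1 for w1, _ in bigrams)      # marginal counts, first slot
w2_counts = Counter(w2 for _, w2 in bigrams)      # marginal counts, second slot

def pmi(w1, w2):
    # log2( P(w1,w2) / (P(w1) * P(w2)) ) -- base 2 gives bits,
    # but any base works for relative ranking
    p_joint = pair_counts[(w1, w2)] / n
    return log2(p_joint / ((w1_counts[w1] / n) * (w2_counts[w2] / n)))

print(pmi("new", "york"))  # positive: co-occurs more often than chance
print(pmi("new", "car"))   # negative: co-occurs less often than chance
```

Positive PMI means the pair occurs together more often than independence would predict; negative means less often.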
Notes:
- the base of the log isn't important here - PMI is a unitless score typically used only for relative ranking
- 'Pointwise' mostly just points out that we are looking at one specific pair of outcomes at a time, rather than whole variables
- Mutual Information without the 'pointwise' is defined on whole variables/distributions - it is the expected value of PMI over all possible event pairs
See also:
- K. Church & P. Hanks (1990), "Word Association Norms, Mutual Information, and Lexicography"
- C. Manning & H. Schütze (1999), "Foundations of Statistical Natural Language Processing"
- G. Bouma (2006), "Normalized (pointwise) mutual information in collocation extraction"
- W. Croft et al. (2010), "Search Engines: Information Retrieval in Practice"
- F. Role & M. Nadif (2011), "Handling the Impact of Low Frequency Events on Co-occurrence based Measures of Word Similarity - A Case Study of Pointwise Mutual Information"
- T. Mikolov et al. (2013), "Distributed Representations of Words and Phrases and their Compositionality"