Power law

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Power laws[1], or power-law distributions, are often mentioned when someone wishes to indicate certain statistical properties, typically a fairly specific falloff of probability.


(In statistics) Examples include the Zipf distribution[2], the zeta distribution[3], the Pareto distribution[4], and a few others.


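Other things that come to mind: Stevens's power law, the idea that the perceived intensity of a stimulus grows as a power of its physical magnitude, so that many stimuli need a multiplicative increase in magnitude to be felt as roughly linearly stronger.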



Zipf's law, Zipfian word distributions

Linguists often specifically know Zipf's law, which refers to the observation that when you count words, not only are a few very common and a lot fairly rare, but for most words it also holds that a word's frequency in a text is roughly inversely proportional to its rank.

There are various ways to graph this, see e.g. the graphs in the various references.
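As a compact way of writing down the observation above, here is a sketch of the usual form of the relation. The symbols are assumptions introduced here for illustration: f(r) is the frequency of the word at rank r, C is a constant (roughly the frequency of the top-ranked word), and s is an exponent that is close to 1 for the inverse-proportional case described above.

```latex
f(r) \approx \frac{C}{r^{s}}, \qquad s \approx 1
\quad\Longrightarrow\quad
\log f(r) \approx \log C - s \log r
```

The second form is why such data is often graphed on log-log axes: a power law shows up there as a roughly straight line with slope -s.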


This has implications, such as that the top so-many terms in some text account for the bulk of that text. In corpora of text, half the word use is often covered by the top 200 or so words that occur in that text.
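To check this kind of coverage claim on your own text, a minimal sketch might look like the following. The function name `top_n_coverage` and the crude regex tokenization are assumptions made for illustration; a real analysis would want a proper tokenizer.

```python
from collections import Counter
import re


def top_n_coverage(text, n):
    """Fraction of all word tokens accounted for by the n most frequent words."""
    # Crude tokenization (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    # Sum the counts of the n most common distinct words.
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(words)


sample = "the cat and the dog and the bird saw the cat"
print(top_n_coverage(sample, 2))
```

On a toy sentence like this the number means little; the effect described above only becomes visible on a reasonably large body of text.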

In languages that have function words (which is many of them), those are likely to take most or all places in the top ten while carrying near-zero semantic value (more, but still relatively little, once you consider syntax).


For example, you may find that in some English documents:

  • the (rank 1) occurs as ~7% of all words
  • of (rank 2) occurs ~3.5%
  • and (rank 3) occurs ~2.9%
  • By that inverse proportionality, you can estimate that something at rank 10 would occur ~0.7% (7% / 10)
  • something at rank 1000 would occur ~0.007% (7% / 1000)
  • ...etc.
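The arithmetic behind those estimates can be sketched as below. The function name `zipf_share`, the default of 7% for the top word, and the exponent of exactly 1 are assumptions taken from the example figures above, not properties of any particular corpus.

```python
def zipf_share(rank, top_share=0.07, exponent=1.0):
    """Estimated share of all words taken by the word at the given rank.

    Assumes an idealized Zipf relation: share = top_share / rank ** exponent.
    """
    return top_share / rank ** exponent


for rank in (1, 2, 3, 10, 1000):
    print(f"rank {rank}: ~{zipf_share(rank):.4%}")
```

Note that the idealized estimate for rank 3 comes out near 2.3%, while the list above gives ~2.9% for "and". That gap is the "roughly" in "roughly inversely proportional": real counts follow the trend without matching it exactly.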


This largely still holds when you e.g. count n-grams or otherwise use larger units.
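Counting larger units works the same way as counting words. A minimal sketch, assuming whitespace-split tokens and a hypothetical helper named `ngram_counts`:

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens, keyed by a tuple of those tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


tokens = "thank you very much and thank you again".split()
for gram, count in ngram_counts(tokens, 2).most_common(3):
    print(" ".join(gram), count)
```

Sorting the resulting counts by frequency gives the same kind of rank-versus-frequency list as for single words, which you can then inspect or graph.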

For example, if you analyse emails or some chat logs, or just people interacting, the top sentences by count consist largely of formalities, interjections, and (other) daily social interactions.

See also