Phraseology, Collocations, Distributional Similarity

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


In linguistics, idiomaticity, idiomaticness, or just idiom can refer to the concept of "among all the possible realizations, this is the one the language settled on", so referring to the particular syntax, grammar, and other structural patterns a language has.

Idiomatic may more specifically point at the most preferred realization and/or patterns: whatever does not evoke a "that's a weird way to say that" or "that's not how that typically works in a sentence" reaction.

Consider, for example, that many common verbs have a preferred adpositional particle, e.g. work on.

Outside of linguistics, an idiom tends to refer more specifically to any figurative, non-literal phrase used to express an idea, and can easily include figures of speech and such.

Figures of speech


A figure of speech is, roughly, anything that is more figurative than purely literal.

Since we do this a lot, there's a large number of things that fit the description. In fact, it isn't a stretch to include non-literal meaning used anywhere, from arguments, to storytelling, to cinema, to politics, and more.

We also have a bunch of names that try to group the different degrees and types of doing this, some overlapping with each other, and with other concepts around semantics.

Such lists reveal that this is primarily about rhetoric: the details matter to literature and storytelling, and play out more in semantics and pragmatics.

Linguistics doesn't deal with most of that directly, but can still detect such figurative use (and some applications have to).

For example:

  • allusion - casual/brief reference (explicit or implicit) to another person, place, event, etc.
  • meiosis - intentional understatement [1]
e.g. 'the pond' to refer to the Atlantic Ocean, 'the Troubles' for the Northern Irish conflict
  • litotes - understatement that uses a negation to express a positive, e.g. using "not bad" to mean pretty good.
actual meaning can depend on context
e.g. 'not bad' could have any literal meaning from 'not entirely horrible as such' to 'excellent'.

  • oxymoron - conjunction of words with intentionally contradictory meaning (see also contradiction in terms, paradox)

  • metaphor - implied comparative description that implies some sort of similarity
usually by equating things with no direct relation.
Often used to economically imply certain properties.
Similar to, but different from, simile, which is an explicit comparison
  • allegory - sustained metaphor, usually tying in various metaphors related to an initial one[2]
  • parable - anecdotal extended metaphor intending to make a (often didactic or moral) point [3]
  • catachresis - a mix of more than one metaphor (by design or not) (verify) [4]

  • hyperbole - exaggeration used for emphasis
    • auxesis - hyperbole relying on word choice, e.g. 'tome' for a book, 'laceration' for a scratch
    • adynaton - extreme hyperbole, suggesting impossibility [5]

  • metonymy and synecdoche - reference to proximate object, often metaphorical
for example
'the law' to refer to the police
'hired hands'
'bricks and mortar' [6]
'bread' for food in general
equating a university's actions with its board
  • irony - intentional implication of the opposite of the standard meaning (verify)

  • tropes - less-literal reference often understood as a replacement, e.g. in rhetoric, storytelling
when approached as "what we do in storytelling", many of the above apply, particularly the ones that play on meaning, twist meaning, or lead to contrasted interpretations



More of a device in literature and rhetoric (than in linguistics directly), tropes are rhetorical figures of speech understood specifically as a replacement with a less-literal meaning.

Many also rely on a play, twist, or approximation of words or meaning, and on contrasts, which makes them most associated with rhetoric, storytelling and cinema, where there is specific focus on how concepts are conveyed.

In particular, we often infer concepts from patterns we recognize, without having them spelled out, and often use layers of contextual meaning.

For example, in writing and speaking, tropes are often employed for the more colorful result that is more interesting to read or listen to, and is often explained as a part of rhetoric.

In particular, visual storytelling has its own conventions, as it can both add visual metaphor and more easily hide details, as well as rely on consistent symbolism, no matter whether it makes sense or not. [7][8]

Metonymy, Synecdoche


Metonymy (from Greek via Latin, literally something like 'change of name') is using one name/entity for another thing.

These tend to be associative, culturally specific/embedded phrases, usually to refer to a more complex concept with a brief term.

(Does this include figures of speech? Just some? Or is there a specific separation?(verify))

Examples of metonymy:

  • The Crown to refer to the British monarchy. Similarly, Washington sometimes to refer to the United States government
  • "Nixon bombed Hanoi" ('Nixon' referring to the armed forces controlled by Nixon)
  • "A dish" referring to a part of a meal
  • "The car rear-ended me" ('me' referring to the car that the speaker was driving)
  • "Bread and circuses" to refer to superficial appeasement
  • "The pen is mightier than the sword" (the pen mostly referring to publication, 'the sword' referring to a show of force).
  • "Lend me your ears", meaning listen to me, give me your attention

Note that the reference does not necessarily carry any shared properties. The British monarchy is not crown-shaped, food isn't like the plate it's on, appeasement does not need to take the form of food and distracting entertainment, a driver doesn't resemble their car, publication isn't done with a pen, and Nixon and the bombers only shared as much as being in the same chain of command.

Contrasted with

  • metaphor, which intentionally compares across domains
  • analogy, which works by similarity, often explicit comparison, and is usually used to communicate a shared quality/property. In contrast, metonymy works by contiguity/proximity and is used to suggest associations.

Synecdoche is a subset of metonymy where one of the two things is part of the other.

For example,

saying that there are hungry mouths to feed,
or referring to your car as your wheels.

The distinction between metonymy and synecdoche is not always clear.

For example, "The White House said..." could refer to the President, his staff, or both with or without the distinction.

You could argue that both are part of the one concept - or that actions of one are distinct and only associated with actions of the other.

Synecdoche is a figure of speech where a term is used to refer to something else:

  • referring to a whole by a part (perhaps the most common variant)
    • Example: 'hands' to refer to workers
  • referring to a part by the whole
    • Example: "The city put up a sign", "The world treated him badly", 'the police', 'the company'
  • referring to a wider class by example
    • Example: 'bug' (for various insects, spiders, and such), "give us our daily bread" (food), using brand names like kleenex, xeroxing, googling
  • referring to an example by a wider class
    • Example: milk (usually meaning cow's milk)
  • referring to an object made from a material by that material
    • Example: threads (clothing), silver (for cutlery)
  • referring to contents by its container (also relying on context)
    • Example: keg (of beer)

Some examples are more complex, such as "silver" (material used for a relatively uncommon realization of cutlery), "the press" (a printing device referring to the news media, but also commonly to a team from)

Synecdoche can be the source or realization of various fallacies, including fallacy of division, hasty generalization, and more.




When a figure of speech is non-literal, conveys a self-contained message, and is recognized as such (typically because it has become lexicalized enough to be recognized, and reproduced fairly faithfully), we tend to call it a saying (or idiom), or something more specific.

These can be comments, references, observations, reactions, aphorisms, and the like.

We have dozens of variants of that, including:

  • Aphorism – a saying that contains a general, observational truth; "a pithy expression of wisdom or truth".
adages or proverbs refer to those that are widely known, and/or in long-term use
  • Cliché or bromide – a saying that is overused and probably unoriginal
which are platitudes when not very applicable, useful, or meaningful

  • Idiom – a phrase that means more than the sum of its parts
often mainly (or only) has a non-literal interpretation.
The meaning is more than compositional, perhaps not compositional at all, so hearing it for the first time may give no meaning
(There are other meanings for idiom, related to expression, but they are rarer and usually mentioned by their meaning)

  • Epithet – a byname - a saying or word used as a nickname, already having been widely associated with the person, idea, or thing being referred to.
including those added to a name, e.g. Richard the Lion-Heart
but more often adjectival characterization, e.g. Star-crossed lovers (when referring to Romeo and Juliet)

  • Maxim - An instructional saying about a principle, or rule for behavior.
Which occasionally makes it an aphorism as well
  • Motto – a saying used to concisely state outlook or intentions.
  • Mantra – a repeated saying, e.g. in meditation, religion, mysticism,

  • Epigram – a (written) saying or poem commenting on a particular person, idea, or thing.
Often clever and/or poetic, otherwise they tend to be witticisms.
Often making a specific point. Often short. Can be cliche or platitude.
  • Witticism – a saying that is concise and, preferably, also clever and/or amusing.
Also quips - which are often more in-the-moment.

Also related:

This can apply to idioms and the like, and can be informal names where a formal one also exists.



Substituted phrases and/or double-meaninged phrases

Substituted phrases

Euphemism replaces a word or phrase with another, while preserving most of the meaning.

Typically the replacement is more figurative, regularly a metaphor.

The intent is often to say something without saying it directly, for reasons like:

softening emotional blows (e.g. passed away instead of died),
tact to avoid potential offense (student is not working to his full potential, developing countries)
avoiding rude-sounding words (pretty much any word for toilet, including toilet itself originally, is a fairly polite reference to the place where we poop),
but probably the most fun and thereby the most common is hinting at dirty deeds. To the point where any unknown phrase, particularly in the form <verb>ing the <adjective>d <noun>, potentially makes us giggle.
see also innuendo, double entendre
not mentioning implications, often doubletalk, e.g. downsizing the department (firing a bunch of people), collateral damage (we murdered some civilians we didn't mean to), special interrogation (torture).
powerful sounding business bullshit[9]

(not the best example because these tend to creatively obfuscate meaning in ways that are much less generally known than doubletalk)

A dysphemism or cacophemism replaces a word/phrase with a more offensive one, say, Manson-Nixon line for Mason–Dixon line.

Cacophemism refers to the more blatant and intentionally offensive variation.

Multiple meanings

A double entendre is a phrase/sentence that can be interpreted in different ways - mostly in cases where at least one of them is dirty.

The extra meaning is usually intentionally there, but the fact that it can be masked by the more direct (and clean) reading gives some deniability, though depending on how you say it, not much.

The words themselves don't necessarily hint at the second meaning. The understanding may come from context and both parties thinking the same thing - a complicity.

If you go looking, you can find a lot of unintentional ones, like anything you can respond to with "that's what she said".

A single entendre isn't really a thing, though is used to point out when people didn't quite manage to make their entendre double, and mainly manage a single vaguely vulgar meaning.

Innuendo uses language to allude to additional meaning, yet with a wording that leaves some plausible deniability (without that deniability it would be clear insinuation).

Innuendo can be fun and good-natured (and is then much closer to double entendre, which also only works when both parties understand the suggested meaning). More often, though, innuendo is used specifically to imply (often clearly imply, but only imply) something negative: to disparage, to hint at foul play, or to plant seeds of doubt about someone or their reputation (see e.g. the early stages of many American presidential runs).

Innuendo, like euphemisms, does not have to be sexual, though this perhaps is as common as it is assumed.

Double entendre does not have to be intentional, innuendo (and single entendre) is.

Puns use either multiple meanings of a word, or similar-sounding words, for humour or rhetorical effect. We mostly know them for the really bad ones.



Phraseology studies and describes the context in which a word is used, a mainly descriptive approach.

Concepts in the area



Collocations are statistically idiosyncratic sequences: series of words that occur together more often than chance alone would suggest.

This points to anything between words used together out of historical/idiosyncratic habit, MWEs (multi-word expressions), grammatical patterns, very specific idioms, or anything in between - 'any set of words better treated as a single token'
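As a rough illustration of "more often than chance would suggest", the sketch below scores adjacent word pairs with pointwise mutual information (PMI), which compares the observed pair probability with what independent words would give. The function name bigram_pmi and the toy sentence are made up for this example, and PMI is only one of several possible measures.

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """Return {(w1, w2): PMI in bits} for every adjacent word pair in tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi            # observed probability of the pair
        p_w1 = unigrams[w1] / n_uni      # probability of each word on its own
        p_w2 = unigrams[w2] / n_uni
        # PMI > 0 means the pair is seen more often than independence predicts
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

# Toy corpus (invented); real analysis needs far more text
tokens = ("strong tea and strong coffee but the tea was not the strong tea "
          "that the other tea drinkers like").split()
scores = bigram_pmi(tokens)
for pair in [("strong", "tea"), ("the", "tea")]:
    print(pair, round(scores[pair], 2))
```

On this toy text 'strong tea' scores higher than 'the tea', even though both pairs contain the same second word, because 'strong' is followed by 'tea' more consistently.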

People have slightly varied focus.

Say, language learning/teaching may

  • point at some grammatical idiosyncrasies, like which prepositions tend to sit next to which verbs, and which verbs conventionally go with specific nouns (see examples below), as this can matter to the correctness of sentences
  • point out that a lot of collocations are not compositional, so when some adjective-noun combination doesn't seem to make direct sense (e.g. bright idea), you can assume it's some sort of expression you should look up
  • overlap strongly with technical terms and jargon - things that carry strong meaning but are not compositional.


Linguistics sometimes focuses on preferences between words that lexically speaking might be interchangeable, but practically have preferences (which you can see as not being purely compositional)


  • adjective-noun, often a preferred adjective used to make a noun stronger or more specific, e.g. maiden voyage, excruciating pain, bright idea, spare time, broad daylight, stiff breeze, strong tea. Alternative adjectives would typically be understood but be considered somewhat unusual
  • adverb-adjective, e.g. downright amazed, fully aware
  • verb-adverb, e.g. prepare feverishly, wave frantically
  • verb-preposition pairs, e.g. agree with
  • verb-noun, e.g. we make rather than take a decision, we make rather than tidy a bed
  • noun combinations: a surge of anger, a bar of soap, a ceasefire agreement
  • ...and more

In a more applied sense

  • Collocations matter to translation.
In theory they are harder in that word-for-word translation will be wrong (not being compositional),
but at the same time, may be easier when you detect these (or happen to, as statistical translation might), as their meaning is more singular.
  • natural language generation would like to know these preferred combinations
  • we might also be able to suggest that certain words are more likely to appear than others, helping spelling correction, OCR correction, etc.
  • collocation analysis may focus more on things that appear together more than you would expect, but it is sometimes also useful to note combinations that are unusual, i.e. less likely than expected
  • some usage reveals cultural attitudes, e.g. which adjectives we use for the behaviour of specific groups
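The spelling/OCR correction point in the list above can be sketched as follows: given the previous word and a few candidate corrections, prefer the candidate that most often follows that word. The function name rank_candidates, the add-one smoothing, and the toy corpus are assumptions made for this illustration, not a description of any particular system.

```python
from collections import Counter

def rank_candidates(prev_word, candidates, bigram_counts, unigram_counts):
    """Order candidate words by smoothed P(candidate | prev_word), most likely first."""
    vocab_size = len(unigram_counts)

    def prob(word):
        # Add-one smoothing so unseen pairs still get a small non-zero probability
        return (bigram_counts[(prev_word, word)] + 1) / (unigram_counts[prev_word] + vocab_size)

    return sorted(candidates, key=prob, reverse=True)

# Toy corpus (invented) standing in for counts from a large collection
corpus = "she made a decision and he made a bed then they made a decision".split()
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

# Suppose OCR produced "a derision" and these are the plausible readings
ranked = rank_candidates("a", ["derision", "decision", "bed"], bigram_counts, unigram_counts)
print(ranked)
```

The same counts could be read the other way around: a pair that scores far below the alternatives is a candidate for the "unusual combination" case.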

If you see 'collocation analysis' mentioned as a method near some math, it is the statistics that helps reveal them, including the assumptions that method makes (e.g. how the baseline expectation is defined) and the filtering to avoid cases you don't care about, and not just the human-curated-and-categorized cases.
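To make the role of the baseline and of filtering concrete, the sketch below computes an expected count under an independence assumption and compares two association scores on a toy text: PMI, and the common t-score approximation (observed - expected) / sqrt(observed). The function name association_scores, the min_count parameter, and the choice of these two measures are assumptions for illustration only.

```python
import math
from collections import Counter

def association_scores(tokens, min_count=1):
    """Per adjacent pair: observed count, expected count under independence, PMI, t-score."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    rows = {}
    for (w1, w2), observed in bigrams.items():
        if observed < min_count:
            continue  # filtering: skip pairs with too little evidence
        # Baseline assumption: the two words occur independently of each other
        expected = unigrams[w1] * unigrams[w2] / n
        rows[(w1, w2)] = {
            "observed": observed,
            "expected": expected,
            "pmi": math.log2(observed / expected),
            "t": (observed - expected) / math.sqrt(observed),
        }
    return rows

# Toy text (invented): one repeated pair, one pair of words seen only once
tokens = ("strong tea is nice . " * 4 + "purple stapler .").split()
rows = association_scores(tokens)
for pair in [("strong", "tea"), ("purple", "stapler")]:
    print(pair, {k: round(v, 2) for k, v in rows[pair].items()})
```

Here PMI ranks the one-off pair 'purple stapler' above 'strong tea', while the t-score ranks them the other way around, and min_count=2 drops the one-off pair entirely. Which behaviour is wanted depends on the application.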

Collocation analysis


See also:

  • W Croft et al. (2010) "Search engines: information retrieval in practice"
  • F Role, M Nadif (2011), "Handling the Impact of Low Frequency Events on Co-occurrence based Measures of Word Similarity - A Case Study of Pointwise Mutual Information"

Hidden pragmatism
What do you compare?
Dealing with sparsity, and avoiding an explosion of data




Figure of speech


Computational aspects

Phrase chunking, phrase identification

Distributional Similarity



Named entities


Work on named entities usually refers to finding/recognizing phrases that are (used as) nominal compounds.

Systems often deal with entities such as persons, organizations, locations, named objects, and such, particularly when they can work from known lists.

The same systems often also extract simple references such as times and dates, and quantities such as monetary values and percentages.
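A minimal sketch of the two ingredients just mentioned, assuming a tiny hand-made list of known names plus a few regular expressions for dates, money and percentages. The GAZETTEER contents, the PATTERNS, the function name find_entities and the example sentence are all invented for illustration; real systems use much larger lists and usually statistical models on top.

```python
import re

# Hypothetical known-name lists; a real system would load curated resources
GAZETTEER = {
    "PERSON": ["Richard Nixon"],
    "ORGANIZATION": ["White House"],
    "LOCATION": ["Hanoi", "Atlantic Ocean"],
}

# Simple patterns for non-name references (ISO dates, currency amounts, percentages)
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "MONEY": re.compile(r"[$€£]\s?\d+(?:[.,]\d+)*"),
    "PERCENT": re.compile(r"\b\d+(?:\.\d+)?\s?%"),
}

def find_entities(text):
    """Return sorted (start, end, label, surface) tuples for list and pattern matches."""
    found = []
    for label, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                found.append((m.start(), m.end(), label, m.group()))
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), label, m.group()))
    return sorted(found)

# Made-up sentence, only meant to exercise each entity type
text = ("On 2024-01-15 the White House discussed a $2.5 million grant, "
        "about 3% of the budget, with officials from Hanoi.")
entities = find_entities(text)
for start, end, label, surface in entities:
    print(label, repr(surface))
```

List lookup like this only finds names it already knows and cannot tell apart different uses of the same string, which is part of why recognition and classification are often treated as separate tasks.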

Specific tasks in the area may be referred to / known as:

  • Entity Extraction
  • Entity Identification (EI)
  • Named Entity Extraction (NEE)
  • Named Entity Recognition (NER)
  • Named Entity Classification (NEC)
  • ...and others.
