Figures of speech, expressions, phraseology, etc.

From Helpful
-->


=Idiomaticity, idioms=


In linguistics, '''idiomaticity''', '''idiomaticness''', or just '''idiom''' can refer to the sense of
"among all the possible realizations, this is the one the language uses (or uses most)",
i.e. the least [[marked]] way of expressing a thing.


A lot of such idiomaticity is ''relatively'' specific to a language.
This is also relevant to language learning, because you probably have to think about it harder
in any language you are not native in.


For example, consider that
* a lot of common verbs have a preferred [[adposition]]al [[particle]], e.g. work ''on''
* in English, you say "I am '''a''' plumber" (you need the [[indefinite article]])
:: not "I am plumber", which is how various other languages would do it


Sometimes these are single correct ways, sometimes fixed sequences,
but of at least equal interest is studying the limited flexibility of these patterns.


https://en.wikipedia.org/wiki/Idiom_(language_structure)




Outside of linguistics, '''idioms''' are usually understood as [[figures of speech]]
and other such figurative language.

You could see this as one of the most everyday cases of the wider concept of idiomaticity.


https://en.wikipedia.org/wiki/Idiom


=Figures of speech=
{{stub}}


A figure of speech is any use of a word/phrase where the intended meaning deviates from the literal meaning -- anything that is more figurative than purely literal.


It's not much of a stretch to go beyond word choice and include any sort of induced non-literal meaning used in arguments, in literature, in cinema, in politics, and in other kinds of storytelling.


They seem constrained only by creativity and other people's ability to understand the result,
and since we do this a ''lot'', there's a large amount of things that fit the description, and it's something we split up into more things to discuss.
So we have a bunch more words in the area, some overlapping with each other,
and some from semantics and pragmatics that happen to be quite relevant.
Such lists reveal that this is for a large part related to our habit of rhetoric.


* '''allusion''' - casual/brief reference (explicit or implicit) to another person, place, event, etc.<!--
:: calling something your achilles heel is a reference to a whole tale, though the meaning is something being a weakness in some way-->


* '''meiosis''' - intentional understatement [https://en.wikipedia.org/wiki/Meiosis_(figure_of_speech)]
:: e.g. 'the pond' to refer to the Atlantic Ocean, 'the troubles' for the Northern Irish conflict


* '''litotes''' - understatement that uses a negation to express a positive, e.g. using "not bad" to mean pretty good.
: actual meaning can depend on context
:: e.g. 'not bad' could have any literal meaning from 'not entirely ''horrible'' as such' to 'excellent'.


* '''oxymoron''' - conjunction of words with intentionally contradictory meaning (see also contradiction in terms, paradox)
:: e.g. act naturally, old news, minor crisis, oxymoron (roughly means sharp dull)
:: sometimes less intentional, e.g. original copy
:: civil war would apply, except you can argue both that it's a [[calque]] (loan translation) just pointing out it's between citizens, and/or that it's [[Equivocation|equivocating]] civil in the sense of polite and in the sense of groups ''within'' the same state.


* '''[[irony]]''' - intentional implication of the opposite of the standard meaning {{verify}}


* '''metaphor''' - implied comparative description that implies some sort of similarity
: usually by equating things with no direct relation.
: Often used to economically imply certain properties.
: Similar to, but different from, simile, which is an explicit comparison


* '''allegory''' - sustained metaphor, usually tying in various metaphors related to an initial one [https://en.wikipedia.org/wiki/Allegory]


* '''parable''' - anecdotal extended metaphor intending to make a (often didactic or moral) point [https://en.wikipedia.org/wiki/Parable]


* '''catachresis''' - a mix of more than one metaphor (by design or not) {{verify}} [https://en.wikipedia.org/wiki/Catachresis]


* '''[[tropes]]''' - less-literal reference often understood as a replacement, e.g. in rhetoric, storytelling
:: when approached as "what we do in storytelling", many of the above apply, particularly the ones that play on meaning, twist meaning, or lead to contrasted interpretations


* '''hyperbole''' - exaggeration meant to be used as stress
** auxesis - hyperbole relying on word choice, e.g. 'tome' for a book, 'laceration' for a scratch
** adynaton - extreme hyperbole, suggesting impossibility [https://en.wikipedia.org/wiki/Adynaton]


* '''metonymy''' and '''synecdoche''' - reference to proximate object, often metaphorical
: for example
:: 'the law' to refer to the police
:: 'hired hands'
:: 'bricks and mortar' [https://en.wikipedia.org/wiki/Brick_and_mortar]
:: 'bread' for food in general
:: equating a university's actions with its board


* [[hendiadys]]


* an implied [[analogy]]


* stylistic reasons ([[rhythm]], [[rhyme]])


* [[rhetoric]]


* a [[trope]]


* a play on words


* [[euphemisms]]


===Tropes===
{{stub}}


More of a device in literature and rhetoric (than in linguistics directly), '''tropes''' are [[:Category:rhetoric|rhetorical]] [[figure of speech|figures of speech]] understood specifically as a replacement with a less-literal meaning.


Many also rely on a play, twist, or approximation of words or meaning, and on contrasts, so this includes things like
* [[hyperbole]]
* [[metaphor]]
* [[metonymy]] and [[synechdoche|synecdoche]]
* [[catachresis]]
* [[meiosis]]
* [[oxymorons]]
* [[irony]]
* [[litotes]]


This makes them most associated with rhetoric, storytelling and cinema,
where there is specific focus on ''how'' concepts are conveyed.


In particular, we often ''imply'' concepts from patterns we recognize,
without having to spell them out, and often use layers of contextual meaning.


For example, in writing and speaking, tropes are often employed for the more colorful result that is more interesting to read or listen to, and are often explained as a part of [[rhetoric]].


Visual storytelling in particular has its own conventions,
as it can both add visual metaphor and more easily hide details,
as well as rely on ''consistently'' doing symbolism, whether or not it makes sense.
{{comment|([https://en.wikipedia.org/wiki/Trope_(cinema)][https://tvtropes.org/])}}


====Metonymy, Synecdoche====
{{stub}}


'''Metonymy''' (meaning something like 'change of name') is using one name/entity for another thing.


Related to [[meronym]]/[[holonym]], but often with a specific part that is also ''representative'' of the whole.


For example:
* The Crown to refer to the British monarchy. Similarly, Washington sometimes to refer to the United States government
* "Nixon bombed Hanoi" ('Nixon' referring to the armed forces controlled by Nixon)
* "A dish" referring to a part of a meal
* "The car rear-ended me" ('me' referring to the car that the speaker was driving)
* "Bread and circuses" to refer to superficial appeasement
* "The pen is mightier than the sword" ('the pen' mostly referring to publication of ideas, 'the sword' referring to a show of force)
* "Lend me your ears", meaning listen to me, give me your attention
<!--
* using word association to convey a mood {{verify}}

* A lot of abstract jargon ends up being metonymous
-->


Metonymy tends to be associative,
usually to refer to a more complex concept with a brief term.


The reference does not necessarily carry any shared properties.
The British monarchy is not crown-shaped,
food isn't like the plate it's on,
appeasement does not need to take the form of food and distracting entertainment,
a driver doesn't resemble their car,
publication isn't done with a pen,
Nixon and the bombers only shared as much as being in the same chain of command.


A number are likely to be culturally embedded, and somewhat local.
<!--
(Does this include figures of speech? Just some? Or is there a specific separation?{{verify}})
-->


Contrasted with
* [[metaphor]], which intentionally compares across domains
* [[analogy]], which works by similarity, often explicit comparison, and is usually used to communicate a shared quality/property. In contrast, metonymy works by contiguity/proximity and is used to suggest associations.


'''Synecdoche''' is a subset of metonymy where one name is part of another.


For example,
: saying that there are hungry mouths to feed,
: or referring to your car as your wheels.


The distinction between metonymy and synecdoche is not always clear.
: For example, "The White House said..." could refer to the President, his staff, or both, with or without the distinction.
You could argue that both are part of the one concept - or that actions of one are distinct and only associated with actions of the other.


Synecdoche is a [[figure of speech]] where a term is used to refer to something else, for example by:


* referring to a whole by a part (perhaps the most common variant)
** Example: 'hands' to refer to workers


* referring to a part by the whole
** Example: "The city put up a sign", "The world treated him badly", 'the police', 'the company'


* referring to a wider class by example
** Example: 'bug' (for various insects, spiders, and such), give us our daily bread (food), using brand names like kleenex, xeroxing, googling


* referring to an example by a wider class
** Example: milk (usually meaning cow's milk)


* referring to an object made from a material by that material
** Example: threads (clothing), silver (for cutlery)


* referring to contents by their container (also relying on context)
** Example: keg (of beer)


Some examples are more complex, such as
"silver" (material used for a relatively uncommon realization of cutlery),
"the press" (a printing device referring to the news media, but also commonly to a team from)


Synecdoche can be the source or realization of various fallacies, including fallacy of division, hasty generalization, and more.


===Schemes===


In linguistics, a scheme is a rhetorical figure of speech that, usually,
draws attention by making a change to either the most neutral or the most expectable way of putting something.


This includes
* recognizable parallels between clauses, climax (ordering by importance) and antithesis (juxtaposition)
* changes in typical word order (inversion, [[parenthesis]] to change flow, [[apposition]])
* omission ([[ellipsis]] of words, omission of conjunctions)
* repetition (sound on adjacent words, words in adjacent clauses, words in different senses, words from the same root, etc.)


https://en.wikipedia.org/wiki/Scheme_(linguistics)


===Sayings===
{{stub}}


When we have figures of speech that are non-literal, refer to a self-contained message,
and that we recognize as such (typically because they have become [[lexicalized]] enough to be recognized, and reproduced fairly faithfully),
we tend to call that a '''saying''' (or '''idiom'''), or something more specific.


These can be comments, references, observations, reactions, aphorisms, and the like.


We have dozens of variants of that, including:
* '''Aphorism''' – a saying that contains a general, observational truth; "a pithy expression of wisdom or truth".
: '''adages''' or '''proverbs''' refer to those that are widely known, and/or in long-term use


* '''Cliché''' or '''bromide''' – a saying that is overused and probably unoriginal
: which are '''platitudes''' when not very applicable, useful, or meaningful


* '''[[Idiom]]''' – a phrase that means more than the sum of its parts
: often mainly (or only) has a non-literal interpretation.
: More than compositional, perhaps not at all, and hearing it for the first time may give no meaning
: (There are other meanings for idiom, related to expression, but they are rarer and usually mentioned by their meaning)


* '''Epithet''' – a byname: a saying or word used as a nickname, already having been widely associated with the person, idea, or thing being referred to.
: including those added to a name, e.g. Richard ''the Lion-Heart''
: but more often adjectival characterization, e.g. star-crossed lovers (when referring to Romeo and Juliet)


* '''Maxim''' – an instructional saying about a principle, or rule for behavior.
: Which occasionally makes it an [[aphorism]] as well


* '''Motto''' – a saying used to concisely state outlook or intentions.


* '''Mantra''' – a repeated saying, e.g. in meditation, religion, mysticism.


* '''Epigram''' – a (written) saying or poem commenting on a particular person, idea, or thing.
: Often clever and/or poetic, otherwise they tend to be witticisms.
: Often making a specific point. Often short. Can be cliché or platitude.


* '''Witticism''' – a saying that is concise and, preferably, also clever and/or amusing.
: Also '''quips''' - which are often more in-the-moment.


Also related:
* [[colloquialism]] is something that originated in verbal speech.
: This can apply to idioms and the like, and to informal names where a formal one also exists.


=Phraseology=


'''Phraseology''' usually refers to a part of linguistics
that studies and describes the context in which a word is used, with a mainly descriptive approach.


===Collocations===
{{stub}}


'''Collocations''' are statistically idiosyncratic sequences: series of words that occur together more often than just chance would suggest.

For example, the word sequences "pretend to", "as a matter of fact", "leaves all parties", "downright amazed", "good news".


Some are about [[idiomaticity]] - the sequences of words we ''prefer'' to express a thing, among other possible sequences of words that would do so, but still primarily [[compositional]] and readily understood among those alternatives.
Say, when you VERB a decision, you probably say you ''make'' it. Other verbs like 'take' would be ''understood'', but 'make' would be preferred.


Some of them are expressions that have much more figurative than compositional meaning.


And these come in variants (not just weird symbolic things your grandpa always said),
so we might call these sayings, figures of speech, MWEs, or other things.
When studying them you care about more specific aspects - e.g. some are fixed, others are not, etc.


Also, people have varied reasons to focus on just some.


For example:
* Collocations are useful in language learning/teaching, which may
** point at some grammatical idiosyncrasies, such as which [[prepositions]] tend to pair with which verbs, and which verbs tend to be how you ''do'' specific nouns (see examples below), as this can matter to the correctness of sentences (related to [[idiomaticity]])
** point out that a lot of collocations are not compositional, so when some adjective-noun combination doesn't seem to make direct sense (e.g. bright idea), you can assume it's some sort of expression you should look up
** overlap strongly with technical terms and jargon - things that carry strong meaning but are not compositional.
** ...see e.g. http://www.ozdic.com/ for examples


* Collocations matter to translation.
: collocations make translation harder in that word-for-word translation will be wrong (not being compositional)
: since collocation analysis points out that a sequence is idiosyncratic, it can make it easier to detect that, which helps focus on learning what it might correspond to


* natural language ''generation'' would like to know these preferred combinations


* we might also be able to suggest that certain words are more likely to appear than others, helping spelling correction, OCR correction, etc.


* collocation may focus more on things that appear more often than you would expect, but it is sometimes also useful to note what is ''unusual'', i.e. ''less'' likely <!--
: for example, spammers have recently discovered that using synonyms may subvert spam filters-->


* some uses reveal cultural attitudes, e.g. which adjectives we use for behaviour of specific groups


* linguistics may study smaller idiomatic preferences - say, in "VERB a decision", you would probably prefer 'make' over 'take' or most other verbs. Similarly,
:: [[adjective]]-[[noun]], often the preferred adjective used to make a noun stronger or more specific, e.g. maiden voyage, excruciating pain, bright idea, spare time, broad daylight, stiff breeze, strong tea. Alternative adjectives would typically be ''understood'' but be considered somewhat unusual
:: [[adverb]]-[[adjective]], e.g. downright amazed, fully aware
:: [[verb]]-[[adverb]], e.g. prepare feverishly, wave frantically
:: [[verb]]-[[preposition]] pairs, e.g. agree with, care about
:: [[verb]]-[[noun]], e.g. we make rather than take a decision, we make rather than tidy a bed
:: [[noun]] combinations: a ''surge'' of ''anger'', a ''bar'' of ''soap'', a ''ceasefire'' ''agreement''


* it might bring up other patterns useful to natural language parsing - e.g. we agree with someone, we agree on something


====Collocation analysis====
<!--
If you see 'collocation analysis' mentioned as a method near some math,
it points at something that takes unstructured text
and finds such less-usual sequences, based on statistics.

...and not the human-curated-and-categorized cases of interesting things,
which at most finds those known sequences/patterns in new text.

There is no singular method - the simplest variants are pretty noisy;
adding filtering and assumptions is cleaner but ''may'' remove some interesting things.

Systems that implement collocation analysis, and the articles that describe them,
are often clear about the final scoring,
but may vary in all the steps before this and call it pragmatism.

Some give much cleaner results than others, for a handful of different reasons.
We would like to know what these are, at least roughly.
-->


====Collocation's reference probabilities====
<!--
Asking "do these words occur together more often than the occurrence of each individually would suggest?"
implies you already have probabilities for each word.

While this isn't about comparing texts,
you ''do'' have to have a baseline of how likely each word is (comparatively).

For example, compare:
: if you train it on legal text and then run it on that legal text,
: if you train it on general text and then run it on that legal text
...chances are the output is ''similar'', but the first may show fewer formulaic phrases.

Not because the ''formulaic sequences'' are learned,
but just because those formulaic sequences are common and increase the counts for their constituent words.

(For similar reasons, the larger your document set is, the more you can even get away with training on the data itself)

Given a good training set, we can usually deal with unseen documents well,
though there is a small question in what to do with words that we haven't seen before.

Generally we assume those are rare, so assign those a probability near the bottom of what we ''do'' have.
-->


====Filters and assumptions in collocation analysis====
<!--
Some things it does well out of the box. Consider, for example, a letter repeating a full name. We have a probability for each of the words that make up its parts, but repeating that sequence is statistically unusual.

Similarly, [[idiomatic]] preferences (and MWEs and other combinations out of historical/idiosyncratic habit)
tend to come out fairly well,
because the least-marked sequence among alternatives ''implies'' seeing specific words together more often.

Yet the simplest methods may output a good portion of sequences like
"that for the", where yes, we understand that appears more often, but it's not ''interesting''.

Maybe we are primarily interested in, say, the kinds of terms this book uses that other books do not.
Perhaps just the noun phrases.
Maybe you want to be able to count that with and/or without the adjectives stuck on front of them.

That all suggests that we do some POS analysis of the text we are dealing with and remove everything
not matching a particular pattern.

That ''is'' going to give you cleaner output.
...just be sure that everything this removes isn't interesting to you.
Which is hard.
-->

<!--
Possibly all documents of that same corpus, though a lot of real-world systems will also care for these systems to say
something useful about unseen records.

It is also potentially useful to compare different sets of documents,
to answer questions like "what kind of language do we see a lot more in legal documents than in general use".

We typically either
* compare a document to the statistics of a larger corpus it is part of
* find collocations in a large corpus (basically forgetting its structure).

A larger corpus is a good idea in general, giving a more stable and relatively neutral reference.

...with some care, anyway. If your comparison is newspaper text, or legal text, or fiction,
there are a number of constructions that won't be used, some that will be used more often,
there will be topics that won't be mentioned, there will be specific styles, etc.

(and note that one of the issues with PMI is that it over-values ''rare'' occurrences)
-->


====Choices in collocation math - dealing with sparsity and with an explosion of data====
<!--
When finding things in a document, you probably individually score every possible sequence.

The math is somewhat against us here.
((We work in probabilities, because comparing ''counts'' of one phrase versus another won't get you very far.))

'''It's sort of explosive ''yet'' has sparse meaning'''
* the set of all possible word combinations is by nature an incredibly large set
:: say, if your language has 50000 possible words, then there are 2.5 billion 2-grams, and 125 trillion 3-grams
:: most documents will not use anywhere near all possible words

* the ''combinations'' that we will actually find - at all, or more than rarely - are incredibly sparse compared to the ''possible'' combinations.
:: if you were e.g. to make a 50000-by-50000 table for 2-grams, that's a ''huge'' table
::: almost all cells would be 0
::: and many that aren't zero will be 1
:: So we tend to count this sparsely. {{comment|(We don't have an n-by-n matrix, we have a hashmap with entries. (Though note this is more about "keeping it in a single PC's RAM to keep access times reasonable", not a hard limitation on what you couldn't or shouldn't do))}}

* When you introduce [[skip-grams]], our estimations aren't hindered as much by this kind of sparsity -- but you get even more entries.

* When you introduce n-grams for n&ge;3, to find longer collocations, you also get a lot more entries.

* because words in languages have [[Power_law|Zipfian tendencies]], most of the combinations we record will involve one or more semantically-empty function words from the top of that list (the, of, and, be, to, a, in, that)

* if you ignore very common words in a stopword-like way, that would also be (fairly arbitrarily) removing the ability to deal with phrases that involve them
: which is a reasonable amount of them. Consider e.g. rock and roll, through the grapevine, etc.


'''The values in there are also worth a think'''
* we overvalue things involving words that are rare
: you can clean up a lot of results saying "ignore anything that involves very-rare unigrams"
: ...but whether that is useful, or removes what you are looking for, depends a lot on what you are doing this analysis for.

Many methods will try to correct for how (un)informative words are,
for example by comparing ''combined'' appearance against ''expected'' appearance.

We look to things like [[log likelihood ratio]], Pointwise Mutual Information (PMI), which are similar ideas (also related to entropy, and (naive) Bayes).

It varies exactly how that estimation works, the assumptions you make, how thorough the model is (many do ''not'' go as far as distribution estimation or [[smoothing sparse data|smoothing]] the inherently sparse data), or even preferences you build in.

One major detail is that various mathematical approaches (e.g. plain 'chance of combination divided by chance of appearing individually') will overvalue the rare - including tokens that are rare for any reason, not uncommonly because they are misspelled - while all actual phrases are further down the list.

And that makes some sense, in that we are looking for unusual things,
yet we are typically looking for ''patterns'' that are less usual,
which almost by definition lie somewhere ''between'' completely regular and extremely unusual.

It may be useful to have a 'how unusual' parameter to the model,
because it also lets you tune it between 'specialized jargon' levels and 'general associations' levels.

If you have an endless source of text (say, the internet),
then you could choose to filter in only evidence that is relatively clear,
particularly if you're doing things like skip-grams.

You could enter skip-gram evidence ''only'' if there is more than one occurrence in a document and/or the thing occurs in multiple documents,
the assumption being that such rare occurrences are often mistakes.

We could keep a long-term vocabulary based on such ''filtered'' evidence,
so that multiple passes over the same dataset will avoid the bulk of low-evidence n-grams.

This is still bias, yes.
A single document mentioning it twice
would weigh more than a thousand mentioning it once,
but these are ''reasonable'' assumptions in many cases.

You might be tempted to filter out the top as well, but it would barely make a difference - it will remove the majority of ''counts'', yes, but only a tiny amount of ''entries''.

* https://github.com/rtapiaoregui/collocater
** the varied data file seems to suggest mainly detecting known ones?
* https://radimrehurek.com/gensim/models/phrases.html
* https://pitt.libguides.com/textmining/collocation
* https://python.plainenglish.io/collocation-discovery-with-pmi-3bde8f351833

'''Sliding windows and skip-grams'''
For a text of k words we see at most k-1 distinct 2-grams,
yet they come from a space of up to n<sup>2</sup> possible 2-grams (for a vocabulary of n words).
-->
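As a rough illustration of PMI scoring, here is a minimal Python sketch for adjacent word pairs. PMI compares how often a pair occurs with how often it would occur if the two words were independent: PMI(x,y) = log<sub>2</sub>( p(x,y) / (p(x) p(y)) ). It assumes the text is already tokenized into a list of words, uses plain relative frequencies (no smoothing), and keeps counts in collections.Counter objects, i.e. hash maps that only hold the pairs that actually occur. The function name bigram_pmi and its min_count cutoff are illustrative assumptions, not any particular library's API.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=1):
    """Score adjacent word pairs by pointwise mutual information.

    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), with p estimated as plain
    relative frequencies (no smoothing). Pairs seen fewer than `min_count`
    times are skipped, since plain PMI tends to rank rare pairs highest.
    """
    unigrams = Counter(tokens)                   # word -> count
    bigrams = Counter(zip(tokens, tokens[1:]))   # (word, next word) -> count
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (x, y), c_xy in bigrams.items():
        if c_xy < min_count:
            continue
        p_xy = c_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # Pairs that occur most "more often than chance" come first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    tokens = "strong tea and the cat and the dog and strong tea".split()
    for pair, score in bigram_pmi(tokens, min_count=2):
        print(pair, round(score, 3))
    # ('strong', 'tea') 2.597
    # ('and', 'the') 2.012
```

Raising min_count is the "ignore anything very rare" kind of cleanup: a pair seen once, made of two words that are each seen once, gets close to the highest score plain PMI can give, so without a cutoff the top of the list tends to fill up with rare tokens (misspellings included). Whether the cutoff also removes what you are looking for depends on the analysis.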

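A sketch of counting skip-grams (word pairs within a small window, not only adjacent ones) across several documents, and of only entering a pair as evidence when it repeats: at least twice within one document, or in at least two documents. The window size, both thresholds, and the names skipgram_pairs and filtered_skipgram_counts are hypothetical choices for illustration; documents are assumed to be pre-tokenized lists of words.

```python
from collections import Counter


def skipgram_pairs(tokens, window=3):
    """Yield ordered (left, right) word pairs at most `window` positions apart.

    window=1 gives plain adjacent 2-grams; larger windows also pair words
    that have up to window-1 other words in between.
    """
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            yield left, tokens[j]


def filtered_skipgram_counts(documents, window=3, min_in_doc=2, min_docs=2):
    """Count skip-gram pairs over several documents, keeping only pairs with
    repeated evidence: seen at least `min_in_doc` times within one document,
    or seen in at least `min_docs` different documents."""
    total = Counter()           # pair -> occurrences over all documents
    doc_freq = Counter()        # pair -> number of documents containing it
    max_in_one_doc = Counter()  # pair -> highest count within a single document
    for tokens in documents:
        local = Counter(skipgram_pairs(tokens, window))
        for pair, c in local.items():
            total[pair] += c
            doc_freq[pair] += 1
            max_in_one_doc[pair] = max(max_in_one_doc[pair], c)
    return {
        pair: c
        for pair, c in total.items()
        if max_in_one_doc[pair] >= min_in_doc or doc_freq[pair] >= min_docs
    }


if __name__ == "__main__":
    docs = [
        "strong black tea".split(),
        "strong tea".split(),
        "a b".split(),
    ]
    print(filtered_skipgram_counts(docs, window=3))
    # {('strong', 'tea'): 2}
```

With window=3, "strong black tea" still contributes the pair ('strong', 'tea'), which is what lets it add up with the adjacent occurrence in the second document. The filter is a deliberate bias: a pair seen twice in one document is kept, while a pair seen once in a single document is dropped however plausible it is.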

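For comparison with PMI, a sketch of Dunning's log-likelihood ratio (G<sup>2</sup>) for the same adjacent-pair setup. For each pair (x, y) it builds a 2x2 table over all adjacent pairs (x then y, x then something else, something else then y, everything else), and sums k ln(k/E) over the four cells, where k is the observed count and E the count expected if first and second word were independent; G<sup>2</sup> is twice that sum. It tends to be less dominated by very rare pairs than plain PMI. The function name llr_bigram_scores is illustrative, and tokenization and sentence boundaries are left out.

```python
import math
from collections import Counter


def llr_bigram_scores(tokens):
    """Score adjacent word pairs with Dunning's log-likelihood ratio (G^2).

    For each pair (x, y), the 2x2 contingency table over all adjacent pairs is
        k11 = count(x followed by y)
        k12 = count(x followed by something else)
        k21 = count(something else followed by y)
        k22 = all remaining pairs
    and G^2 = 2 * sum(k * ln(k / E)) over the cells, E being the count
    expected under independence (row total * column total / n).
    """
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)
    pair_counts = Counter(pairs)
    first_counts = Counter(x for x, _ in pairs)   # word in first position
    second_counts = Counter(y for _, y in pairs)  # word in second position

    scores = {}
    for (x, y), k11 in pair_counts.items():
        k12 = first_counts[x] - k11
        k21 = second_counts[y] - k11
        k22 = n - k11 - k12 - k21
        observed = [[k11, k12], [k21, k22]]
        rows = [k11 + k12, k21 + k22]
        cols = [k11 + k21, k12 + k22]
        g2 = 0.0
        for i in range(2):
            for j in range(2):
                k = observed[i][j]
                if k > 0:  # cells with k == 0 contribute 0 (0 * ln 0 taken as 0)
                    expected = rows[i] * cols[j] / n
                    g2 += k * math.log(k / expected)
        scores[(x, y)] = 2.0 * g2
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    tokens = "strong tea and the cat and the dog and strong tea".split()
    for pair, score in llr_bigram_scores(tokens)[:2]:
        print(pair, round(score, 3))
    # ('strong', 'tea') 10.008
    # ('and', 'the') 6.189
```

On this toy text, PMI gives ('strong', 'tea') the same score as one-off pairs like ('the', 'cat'), while G<sup>2</sup> separates the repeated pair clearly, because it weighs how much evidence there is and not only the ratio.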
===MWEs===
<!--
'''Multi-Word Expressions''' (MWEs) are any word combinations that are syntactically and/or semantically [[idiosyncratic]].
The term was coined (relatively) recently, by computational research that noticed there are many sequence of words we use together out of habit and/or for specific meaning worth noting.
(cf. collocations, which tends to have more specific focus)
They are clearly something more structured than just words coexisting (and there are some typologies of MWEs),
but are not compounds or phrases, and not always strongly structured.
MWE suggests a focus on finding idiosyncratic uses, typically not strictly [[compositional]],
often expressing something that cannot be expressed very simply with more typical/literal use of words.
Consider idioms, [[figures of speech]], institutionalized phrases - and arguably [[phrasal verbs]], [[nominal compounds]] (and named entities), (some) terminology, and others.
The same analyses and researchers also go into the more mundane sort of collocations mentioned above.
Universal dependencies splits MWEs into three types:
* fixed - entirely fixed sequences, that work as function words or short adverbials
::
* flat: exocentric (headless) things that are only somewhat fixed
:: dates
:: names (and any titles, except when appositional modifiers are more appropriate)


{{stub}}


==Substituted phrases and/or double-meaninged phrases==
* compound:
:: noun compounds
:: verb compounds (other languages do this more than english)
:: serial verbs
:: particle verbs like 'put up' (compound:prt)
 
 
 
MWEs are interesting for various computational [[NLP]] applications, including [[translation]], general [[parsing]], [[information retrieval]], [[natural language generation]], (computational) [[lexicography]], and more.
 


{{info|Note that some of this moves ''well'' out of phraseology, into 'meanings and words are complex, okay' territory}}
One subdivision of MWEs is '''institutionalized phrases''' with compositional syntax and semantics/pragmatics but which still occur together frequently, versus '''lexicalized''', which have idiosyncratic syntax and/or semantics/pragmatics,  


Lexicalized phrases vary in how flexibly they can be used, inflected, altered.


===Substituted phrases===


('Substituted phrases' is not a term from linguistics, but seems a useful one to group euphemism and some related concepts)
'''Institutionalized phrases''' are those that are syntactically and semantically compositional,  
but ''occur'' with unusually ([[markedly]]) high frequency.




The further names usually refer to specific properties or natures of phrases.


'''Euphemism''' replaces a words/phrases with others, while preserving most meaning.


Typically the replacement is a form that is less direct - more figurative, possibly metaphor, or another reason to pick an nearby meaning.
'''Idioms''' are lexicalized phrases that often come from metaphors. Some are compositional and intuitive enough to be directly understandable. Others are historical, and long since institutionalized, fossilized, or otherwise habitual.


The intent is often to say something without saying it ''directly'', for reasons like:
* softening emotional blows (e.g. passed away instead of died)
* tact, to avoid potential offense (a student is not working to their full potential, developing countries)
* understatement, e.g. 'more than a few' for 'a lot'
* avoiding rude-sounding words (pretty much ''any'' other word used for toilet, including toilet ''itself'' originally, is a fairly polite reference to the place where we poop)
* but probably the most fun, and thereby the most common, is hinting at sexiness. To the point where any unknown phrase, particularly in the form &lt;verb>ing the &lt;adjective>d &lt;noun>, potentially makes us giggle.
:: compare with [[innuendo]], [[double entendre]]
* not mentioning implications, often doubletalk, e.g. downsizing the department (firing a bunch of people), collateral damage (we murdered some civilians we didn't mean to), special interrogation (torture).
* powerful-sounding business bullshit[https://ig.ft.com/sites/guffipedia/chief-manifesto-catalyst/] (not the best example, because these tend to creatively obfuscate meaning in ways that are much less generally known than doubletalk)


There is often a distinction made between fixed idioms (like 'by and large' and 'face to face') and those that can be modified to some degree (reordered, creatively augmented, etc.).


Commonly used idioms see some cross-language adoption over time, so some become fairly universal and translatable. Others are very hard to translate well because their meaning depends on their background, and literal translation often results in hilarity.


Other cases of collocations are things that could be read compositionally, but are not meant to be interpreted entirely literally.
For example, in 'taking a photo' (which is frequent), 'taking' is a support verb of sorts: the phrase does not refer to picking one up or stealing it. Instead it is considered equivalent to making a photo.


'''Terminology''' (a.k.a. jargon; there is a difference, but broadly the distinction is not well defined) develops in relative isolation, and disambiguates within its own context.


Terminology includes new terms, but also a lot of existing words that take on new/specific meanings, and sometimes altered syntax, in the context of a particular subject area.






'''[[Phrasal verbs]]''' act like (or are) idiomatic phrases, often centering around a [[verb]] + [[particle]], for example to look down on, to drop in, put off, spell out, set off, let down, break down, etc. They vary in transitivity.


A '''dysphemism''' and cacophemism replaces a word/phrase with a more offensive one, say, Manson-Nixon line for the [https://en.wikipedia.org/wiki/Mason%E2%80%93Dixon_line Mason–Dixon] line.


'''Cacoponism''' refers to the more blatant and intentionally offensive variation.


* http://en.wikipedia.org/wiki/Euphemism
* http://en.wikipedia.org/wiki/Dysphemism


'''Nominal compounds''' are a specific form of noun [[compound]]s: a pair of nouns that act as a single reference,
often with the first acting as an adjective to the second, e.g. city park.


They differ from noun phrases in general in that while they seem compositional, they are often somewhat lexicalized themselves,
and often mean something more specific than their compositional meaning.
Consider washing machine, cat food, newspaper.
...though some feel on the edge of "but what else would it be?", e.g. raincoat, bookshelf.


'''Cranberry expressions''' are MWEs/phrases that contain '''cranberry words''', which are words that are rarely or never seen elsewhere.
Examples of cranberry words include {{example|hotcakes}}, which is almost only seen in "selling like hotcakes" (and plays on that), nonsense words like {{example|caboodle}}, and various others.


Cranberry words are rather irregular lexical items.
They are [[lexically fixed]] but have no literal meaning.
Note their idiomatic nature and the structural similarity to [[collocations]].
There are related terms, such as ''(phraseologically) bound words''.


===Multiple meanings===
'''Polysemy''' refers to a word having multiple distinct possible meanings {{comment|(or, if you're going to get [[semiotic]], any sign/symbol that does)}}.


Usually these are multiple ''related'' meanings, and this can have useful generalizing value.
In fact, in many languages, many words have a ''small'' amount of this.


====See also====
* http://en.wikipedia.org/wiki/Collocation
* [[Computational aspects of phrases and clauses]]


<!--
Arguably, a lot of emotive words are usefully vague.
-->


===Phraseme===
<!--
'''Phraseme''' seems to mean something like "a multi-morphemic thing where at least ''some'' of it is not freely chosen".


In other cases, a lot of different senses got assigned to the same word somehow,
or to specific


In English, run, set, sound all have more than a dozen distinct uses.
Common verbs tend to have this... problem, or feature, depending on how you see it.


We can fuzzily argue for a ''primary'' sense,
often the first one you'd learn,
and the first one you'd think of when that word is not in any context.
-->


=Sayings with figurative meaning=
Most languages have established phrases that have figurative meaning, or that are otherwise syntactically and/or semantically [[idiosyncratic]].


<!--
Arguably, a lot of emotive words are usefully vague.
Say, 'love' has many and varied interpretations, and even if you try to catch it in definitions, you will easily find a dozen distinct-enough things people consistently include in the concept, so love is arguably one of the more polysemous words around.
Definitions also go into [[selectional restriction]], semantic restrictions on combining concepts.
-->


Aside from ''words'' in dictionaries, though, we also have ''phrasing'' that plays with meanings.


They also have to be learned, because when unfamiliar they create confusion - not just intentional contradiction<!--,
and there will probably always be a spectrum between "so common they are almost part of everyday grammar"
and "I'm a forty-year-old native speaker and have learned that differently / have never heard that"-->.




A '''phraseme''' is often understood as an utterance in which at least some parts cannot be freely chosen,
related to [[idiomaticity]] in the sense of "among all the possible realizations, this is the one the language uses".


As such, terms like
* set phrase
* fixed expression
* idiomatic phrase
* multiword expression
* idiom
...suggest phrases that should be reproduced exactly.


And if "you know what they say" is true, that makes sense,
in that at least some of those sayings are assumed to be ''references'' to previously-decided wisdom {{comment|(but maybe let's leave the [[epistemology]] to the philosophers)}}.


So while figurative language ''in general'' allows some amount of creativity,
the just-mentioned concepts generally resist it.


A '''double entendre''' is a phrase/sentence that can be interpreted in different ways. The term is mostly used for cases where at least one of those readings is dirty.


The extra meaning is intentionally there, but the fact that it can be masked by the more direct (and clean) reading gives some deniability,
though depending on how you say it, not much.


The words themselves don't necessarily hint at the second meaning. The understanding may come from context and both parties thinking the same thing - a complicity.


If you go looking, you can find a lot of unintentional ones, like anything you can "that's what she said" to.


'''Innuendo''' uses language to allude to additional meaning,
yet with wording that leaves some plausible deniability
{{comment|(without that deniability it would be clear insinuation)}}.


Innuendo can be fun and good-natured (and is then much closer to double entendre, which also only works when both parties understand the suggested meaning),
but innuendo is more often used specifically to imply (often ''clearly'' imply, but ''only'' imply), and specifically to imply something negative - to disparage, to hint at foul play, to plant seeds of doubt about someone, their reputation, or such {{comment|(see e.g. the early stages of many American presidential runs)}}.


Innuendo, like euphemism, does not have to be sexual, though sexual innuendo perhaps ''is'' as common as is assumed.


Double entendre does not have to be intentional; innuendo (and single entendre) is.


A '''single entendre''' isn't really a thing, though the term is used to point out when people didn't quite manage to make their entendre double, and only managed a single, vaguely vulgar meaning.


'''Puns''' use either multiple meanings of a word, or similar-sounding words, for humour or rhetorical effect.
We mostly know them for the really bad ones.
<!--To say that innuendo is an Italian suppository would be a pun.-->


See also:
* [[Polysemy]]
* http://en.wikipedia.org/wiki/Innuendo
* http://en.wikipedia.org/wiki/Double_entendre
* http://en.wikipedia.org/wiki/Euphemism


==Figures of speech==
{{stub}}


A figure of speech is any use of a word/phrase where the intended meaning deviates from the literal meaning -- anything that is more figurative than purely literal.


This is a fairly open-ended idea, and can lean e.g.
on the choice of words,
playing with the meaning of words (tropes),
changing the structure of sentences (schemes), and more.


It's not much of a stretch to go beyond word choice and include
non-literal meaning as used
in arguments (a bunch of rhetorical devices),
in literature,
in cinema,
in politics,
and in other kinds of storytelling.


Arguably, where e.g. [[phrasemes]] include everything from set phrases to the ''concept'' of the creative freedom that lets us communicate non-literally, figures of speech could be seen as the exercise of that creativity, and not only the most established ones (...arguable; the definitions vary and are fuzzy).


In some sense,
figures of speech seem constrained only by creativity and other people's ability to understand the result,
and since we do this a ''lot'',
there are a large number of other concepts that fit this description,
and it's something we would want to split up to discuss better.
So we have a bunch more words in the area, some overlapping with each other,
and some from semantics and pragmatics that happen to be quite relevant.
Such lists reveal that this is for a large part related to our habit of rhetoric.


For example:


===Allusion===
* '''allusion''' - casual/brief reference (explicit or implicit) to another person, place, event, etc.<!--
:: calling something your achilles heel is a reference to a whole tale, though the meaning is something being a weakness in some way-->


===Meiosis===
* '''meiosis''' - intentional understatement [https://en.wikipedia.org/wiki/Meiosis_(figure_of_speech)]
:: e.g. 'the pond' to refer to the Atlantic Ocean, 'the troubles' for the Northern Irish conflict


===Litotes===
* '''litotes''' - understatement that uses a negation to express a positive, e.g. using "not bad" to mean pretty good.
: actual meaning can depend on context
:: e.g. 'not bad' could have any literal meaning from 'not entirely ''horrible'' as such' to 'excellent'.


===Oxymoron===
* '''oxymoron''' - conjunction of words with intentionally contradictory meaning (see also contradiction in terms, paradox)
:: e.g. act naturally, old news, minor crisis, oxymoron (itself roughly meaning 'sharp-dull')
:: sometimes less intentional, e.g. original copy
:: civil war would apply, except that you can argue that it is a [[calque]] (loan translation) just pointing out it is between civilians, and/or that it is [[Equivocation|equivocating]] between civil in the sense of polite and civil in the sense of groups ''within'' the same state.




===Irony===
* '''[[irony]]''' - intentional implication of the opposite of the standard meaning {{verify}}


[[Category:Linguistics]]
[[Category:Rhetoric]]
[[Category:Semantics]]
[[Category:Pragmatics]]
[[Category:Reference]]


===Metaphor===
* '''metaphor''' - implied comparative description that suggests some sort of similarity
: usually by equating things with no direct relation.
: Often used to economically imply certain properties.
: Similar to, but different from, a simile, which is an explicit comparison


====Allegory====
* '''allegory''' - sustained metaphor, usually tying in various metaphors related to an initial one[https://en.wikipedia.org/wiki/Allegory]


===Parable===
* '''parable''' - anecdotal extended metaphor intending to make a (often didactic or moral) point [https://en.wikipedia.org/wiki/Parable]


===Catachresis===
* '''catachresis''' - a mix of more than one metaphor (by design or not) {{verify}} [https://en.wikipedia.org/wiki/Catachresis]


===Tropes===
A trope is a less-literal reference, something often understood as a replacement, e.g. in rhetoric or storytelling.


People have slightly varied focus.
When approached as "what we do in storytelling", a lot of the others in this list apply to some degree,
particularly the ones that play on meaning, twist meaning, or lead to contrasting interpretations.


----


More of a device in literature and rhetoric (than in linguistics directly), '''tropes''' are [[:Category:rhetoric|rhetorical]] [[figure of speech|figures of speech]] understood specifically as a replacement with a less-literal meaning.


Many also rely on a play, twist, or approximation of words or meaning, and on contrasts, so this includes things like
* [[hyperbole]]
* [[metaphor]]
* [[metonymy]] and [[synechdoche]]
* [[catachresis]]
* [[meiosis]]
* [[oxymorons]]
* [[irony]]
* [[litotes]]


This makes them most associated with rhetoric, storytelling and cinema,
where there is specific focus on ''how'' concepts are conveyed.


In particular, we often ''imply'' concepts from patterns we recognize,
without having to spell them out, and often use layers of contextual meaning.


For example, in writing and speaking, tropes are often employed for the more colorful result that is more interesting to read or listen to, and are often explained as a part of [[rhetoric]].


Visual storytelling in particular has its own conventions,
as it can both add visual metaphor and more easily hide details,
as well as rely on ''consistently'' doing symbolism, no matter whether it makes sense or not.
{{comment|([https://en.wikipedia.org/wiki/Trope_(cinema)] [https://tvtropes.org/])}}


===Schemes===
In linguistics, a scheme is a rhetorical figure of speech that, usually,
draws attention by making a change to either the most neutral or the most expectable way of putting something.


This includes recognizable parallels between clauses, climax (ordering by importance) and antithesis (juxtaposition),
changes in typical word order (inversion, [[parenthesis]] to change flow, [[apposition]]),
omission ([[ellipsis]] of words, omission of conjunctions),
and repetition (sound on adjacent words, words in adjacent clauses, words in different senses, words from the same root, etc.)


https://en.wikipedia.org/wiki/Scheme_(linguistics)


=Phraseology=


'''Phraseology''' studies and describes the context in which a word is used,
a mainly descriptive approach.


==Concepts in the area==


===Collocations===
{{stub}}


'''Collocations''' are statistically idiosyncratic sequences: series of words that occur together more often than chance alone would suggest.


Put another way, of all the possible sequences of words, collocation analysis points out the sequences that come up more often - say, "pretend to", "as a matter of fact", "leaves all parties", "downright amazed", "good news".


* Collocations are useful in language learning/teaching, where they may
** point at some grammatical idiosyncrasies, like which prepositions tend to sit next to which verbs, and which verbs tend to be how you ''do'' specific nouns (see examples below), as this can matter to the correctness of sentences
** point out that a lot of collocations are not compositional, so when some adjective-noun combination doesn't seem to make direct sense (e.g. bright idea), you can assume it's some sort of expression you should look up
** overlap strongly with technical terms and jargon - things that carry strong meaning but are not compositional.
** ...see e.g. http://www.ozdic.com/ for examples


* Collocations matter to translation.
: collocations make translation harder in that word-for-word translation will be wrong (as they are not compositional)
: since collocation analysis points out that a sequence is idiosyncratic, it can make that easier to detect, which helps focus on learning what it might correspond to


* natural language ''generation'' would like to know these preferred combinations


* we might also be able to suggest that certain words are more likely to appear than others, helping spelling correction, OCR correction, etc.


* collocation analysis may focus more on things that appear more often than you would expect, but it is sometimes also useful to note the ''unusual'', i.e. ''less'' likely, combinations <!--
: for example, spammers have recently discovered that using synonyms may subvert spam filters-->


* some usage reveals cultural attitudes, e.g. which adjectives we use for the behaviour of specific groups


* linguistics may study smaller idiomatic preferences - say, in "VERB a decision", you would probably prefer 'make' over 'take' or most other verbs. Similarly,
:: [[adjective]]-[[noun]], often the preferred adjective used to make a noun stronger or more specific, e.g. maiden voyage, excruciating pain, bright idea, spare time, broad daylight, stiff breeze, strong tea. Alternative adjectives would typically be ''understood'' but considered somewhat unusual
:: [[adverb]]-[[adjective]], e.g. downright amazed, fully aware
:: [[verb]]-[[adverb]], e.g. prepare feverishly, wave frantically
:: [[verb]]-[[preposition]] pairs, e.g. agree with, care about
:: [[verbs]]-[[noun]], e.g. we make rather than take a decision, we make rather than tidy a bed
:: [[noun]] combinations: a ''surge'' of ''anger'', a ''bar'' of ''soap'', a ''ceasefire'' ''agreement''


* it might bring up other patterns useful to natural language parsing - e.g. we agree with someone, we agree on something


===Collocation analysis===


If you see 'collocation analysis' mentioned as a method near some math,
it points at some statistics that help list such less-usual sequences
...and not the human-curated-and-categorized cases of interesting things.


There is no singular method. The simplest variants are pretty noisy;
adding filtering and assumptions is cleaner but ''may'' remove some interesting things.


Systems that implement collocation analysis, and the articles that describe them,
are often clear about the final scoring,
but may vary in all the steps before this, and call it pragmatism.


Some give much cleaner results than others, for a handful of different reasons.
We would like to know what these are, at least roughly.
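Collocations are word pairs that co-occur more often than chance would suggest, and the simplest scoring is noisy. Both points can be sketched with pointwise mutual information (PMI) over adjacent word pairs, PMI(x, y) = log2(P(x, y) / (P(x) P(y))). This is a minimal sketch under several assumptions: the input is already tokenized, all probabilities are estimated from that same token list (no reference corpus, no smoothing), and the function name and the <code>min_count</code> cutoff are illustrative rather than any particular library's API.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=1):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), with all probabilities
    estimated from this one token list (no reference corpus, no smoothing).
    Pairs seen fewer than `min_count` times are skipped, because plain PMI
    tends to rank pairs of rare words highest.
    """
    if len(tokens) < 2:
        return []
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1

    scores = {}
    for (x, y), count_xy in bigram_counts.items():
        if count_xy < min_count:
            continue
        p_xy = count_xy / n_bigrams
        p_x = unigram_counts[x] / n_unigrams
        p_y = unigram_counts[y] / n_unigrams
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))

    # Highest-scoring (most "surprisingly frequent") pairs first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

On the toy input <code>["a", "b", "a", "b", "c", "d"]</code>, the pair <code>("c", "d")</code> ranks first with the default <code>min_count=1</code>, even though each of its words occurs only once. With <code>min_count=2</code>, only the repeated pair <code>("a", "b")</code> survives. This is the simplest form of the noisy-versus-filtered trade-off. A count cutoff is only one option; real systems differ in how they filter and in which association measure they use.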


====The reference probabilities====
Asking "do these words occur together more often than the occurrence of each individually would suggest?"
implies you already have probabilities for each word.


While this isn't about comparing texts,
you ''do'' have to have a baseline of how likely each word is (comparatively).


For example, compare:
: if you train it on legal text and then run it on that legal text,
: if you train it on general text and then run it on that legal text
...chances are the output is ''similar'', but the first may show fewer formulaic phrases.


This is not because the ''formulaic sequences'' are learned,
but just because those formulaic sequences are common and increase the counts for their constituent words.


(For similar reasons, the larger your document set is, the more you can even get away with training on the data itself)


Given a good training set, we can usually deal with unseen documents well,
though there is a small question of what to do with words that we haven't seen before.


Generally we assume those are rare, so we assign them a probability near the bottom of what we ''do'' have.


====Filters and assumptions====
Some things it does well out of the box. Consider, for example, a letter repeating a full name: we have probabilities for the words that make up its parts, but repeating that sequence is statistically unusual.


Similarly, [[idiomatic]] preferences (and MWEs and other combinations out of historical/idiosyncratic habit)
tend to come out fairly well,
because the least-marked sequence among alternatives ''implies'' seeing specific words together more often.


Yet the simplest methods may output a good portion of sequences like
"that for the", where yes, we understand that appears more often, but it's not ''interesting''.


Maybe we are primarily interested in, say, the kinds of terms this book uses that other books do not.
Perhaps just the noun phrases.
Maybe you want to be able to count that with and/or without the adjectives stuck on the front of them.


That all suggests that we do some POS analysis of the text we are dealing with, and remove everything
not matching a particular pattern.


That ''is'' going to give you cleaner output.


...just be sure that nothing this removes is interesting to you,
which is hard.


Possibly all documents of that same corpus, though a lot of real-world systems will also care for these systems to say
something useful about unseen records.


It is also potentially useful to compare different sets of documents,
to answer questions like "what kind of language do we see a lot more in legal documents than in general use".
<!--
(Does this include figures of speech? Just some? Or is there a specific separation?{{verify}})
-->


We typically either
* compare a document to the statistics of a larger corpus it is part of, or
* find collocations in a large corpus (basically forgetting its structure).


A larger corpus is a good idea in general, giving a more stable and relatively neutral reference.


...with some care, anyway. If your comparison is newspaper text, or legal text, or fiction,
there are a number of constructions that won't be used, some that will be used more often,
there will be topics that won't be mentioned, there will be specific styles, etc.


(and note that one of the issues with PMI is that it overvalues ''rare'' occurrences)


====Choices in math - and avoiding an explosion of data====
Just counting how often words occur, compared to just the counts of other pairs, won't get you very far.


The math is somewhat against us here.


* the set of all possible word combinations is by nature an incredibly large set
:: say, if your language has 50000 possible words, then there are 2.5 billion 2-grams, and 125 trillion 3-grams


* the combinations that we will actually find in text are incredibly sparse
:: if you were e.g. to make a 50000-by-50000 table for 2-grams, most cells would be 0


* because words in languages have [[Power_law|Zipfian tendencies]], most of the combinations we record will involve one or more semantically-empty function words from the top of that list (the, of, and, be, to, a, in, that)


* if you ignore very common words in a stopword-like way, that would also be (fairly arbitrarily) removing the ability to deal with phrases that involve them
: which is a reasonable number of them. Consider e.g. rock and roll, through the grapevine, etc.


* we overvalue things involving words that are rare
: you can clean up a lot of results by saying "ignore anything that involves very-rare unigrams"
: ...but whether that is useful, or removes what you are looking for, depends a lot on what you are doing this analysis for.


Many methods will try to correct for how (un)informative words are,
for example by comparing ''combined'' appearance against ''expected'' appearance.


We look to things like the [[log likelihood ratio]] and Pointwise Mutual Information (PMI), which are similar ideas (also related to entropy, and (naive) Bayes).


It varies exactly how that estimation works, the assumptions you make, how thorough the model is (many do ''not'' go as far as distribution estimation or [[smoothing sparse data|smoothing]] the inherently sparse data), or even the preferences you build in.


One major detail is that various mathematical approaches (e.g. plain 'chance of combination divided by chance of appearing individually') will overvalue the rare - including tokens that are rare for any reason, not unusually because they are misspelled - while all actual phrases end up further down the list.


And that makes some sense, in that we are looking for unusual things,
yet we are typically looking for ''patterns'' that are less usual,
which almost by definition lie somewhere ''between'' completely regular and extremely unusual.


It may be useful to have a 'how unusual' parameter to the model,
because it also lets you tune it between 'specialized jargon' levels and 'general associations' levels.


https://github.com/rtapiaoregui/collocater
* the varied data file seems to suggest mainly detecting known ones?


https://radimrehurek.com/gensim/models/phrases.html


https://pitt.libguides.com/textmining/collocation


https://python.plainenglish.io/collocation-discovery-with-pmi-3bde8f351833


===Hyperbole===
* '''hyperbole''' - exaggeration used for emphasis
** auxesis - hyperbole relying on word choice, e.g. 'tome' for a book, 'laceration' for a scratch
** adynaton - extreme hyperbole, suggesting impossibility [https://en.wikipedia.org/wiki/Adynaton]


===Circumlocution===
{{stub}}


Circumlocution refers to using an unnecessarily large number of words to express an idea in a roundabout way.


It frequently turns up as indirect and/or long-winded descriptions where more succinct ones exist,
less-direct figures of speech where clearer ones exist, or such.


Circumlocution may be done
: to avoid revealing information,
: to be intentionally vague (e.g. with [[equivocation]]),
: in language acquisition, teaching meaning via description
: to work around not knowing a term in another language, getting there via description
: to work around aphasia
: to avoid saying specific words ([[euphemism]], [[cledonism]]),
: to construct [[euphemisms]] in the [[innuendo]] sense
: to construct [[equivocations]]
: and (other) varied [[rhetoric]]
: creatively, to set up similes


It can also refer to avoiding complex words, and/or inflected/derived terms (see e.g. [[Periphrasis]]), and usefully so.


Dictionaries are often intentionally somewhat circumlocutory, to avoid a lot of entries depending on other entries you would have to look up.


===metonymy, synecdoche===
'''Metonymy''' (meaning something like 'change of name') and '''synecdoche''' are
figurative language using one name/entity for another thing - reference to a proximate object, often metaphorical.


'''Metonyms''' are cases that relate part and whole (see also [[meronym]]/[[holonym]]),
often specifically with a specific part that is ''representative'' of the whole,
often to catch a more complex concept with a brief term in an associative way.


For example:
* "{{example|The Crown}}" to refer to the British monarchy. Similarly, Washington is sometimes used to refer to the United States government
* "{{example|Nixon bombed Hanoi}}" ('Nixon' referring to the armed forces controlled by Nixon)
* "{{example|A dish}}" referring to a part of a meal
* "{{example|The car rear-ended me}}" ('me' referring to the car that the speaker was driving)
* "{{example|Bread and circuses}}" for superficial appeasement
* "{{example|The pen is mightier than the sword}}" (the pen mostly referring to publication of ideas, 'the sword' referring to a show of force).
* "{{example|Lend me your ears}}", meaning listen to me, give me your attention
* "{{example|the law}}" to refer to the police
* "{{example|hired hands}}"
* "{{example|bricks and mortar}}" [https://en.wikipedia.org/wiki/Brick_and_mortar]
* "{{example|bread}}" for food in general


The reference frequently does not carry any directly shared properties.
The British monarchy is not crown-shaped,
food isn't like the dish it's on,
appeasement does not need to take the form of food and distracting entertainment,
a driver doesn't resemble their car,
publication isn't done with a pen,
Nixon and the bombers only shared as much as being in the same chain of command.


A number of these are likely to be culturally embedded, and somewhat local.


Contrasted with
* [[metaphor]], in that metaphors intentionally compare across domains
* [[analogy]], which works by similarity, often explicit comparison, and is usually used to communicate a shared quality/property. In contrast, metonymy works by contiguity/proximity and is used to suggest associations.


'''Synecdoche''' is a specific kind of metonymy that deals with the part-and-whole relationship
* referring to a whole by a part (perhaps the most common variant)
** Example: 'hands' to refer to workers, 'suits' for businessmen, 'your wheels' to refer to your car, hungry mouths to feed, 'Downing Street' or 'Number 10' for the office of the Prime Minister of the United Kingdom.
:: Sometimes only in specific expressions, e.g. boots refers to soldiers only in a few expressions like 'boots on the ground'
* referring to a part by the whole
** Example: "{{example|The city put up a sign}}", "{{example|The world treated him badly}}", {{example|the police}}, {{example|the company}}
* referring to a wider class by example
** Example: {{example|bug}} (for various insects, spiders, and such), {{example|give us our daily bread}} (food), using [[genericised trademarks|brand names]] like {{example|kleenex}}, {{example|xeroxing}}, {{example|googling}}
* referring to an example by a wider class
** Example: {{example|milk}} (usually meaning cow's milk, but recently less so)
* referring to an object made from a material by that material
** Example: {{example|threads}} (clothing), {{example|silver}} (for cutlery)
* referring to contents by their container (also relying on context)
** Example: {{example|keg}} (of beer)


Some examples are more complex, such as
* "silver" (material used for a relatively uncommon realization of cutlery),
* "the press" (a printing device referring to the news media, but also commonly to a team doing everything else)
* "The White House said..." could refer to the president, their staff, the press secretary/(ies), or even that day's speaker (but probably not, because they only read out what was written by someone else).
:: also meaning the distinction between metonymy and synecdoche is not always clear.


Synecdoche is also a way to commit some fallacies, including [[fallacy of division]], [[hasty generalization]], and more.


===...and more===
* [[hendiadys]]
* an implied [[analogy]]
* stylistic reasons ([[rhythm]], [[rhyme]])
* [[rhetoric]]
* [[euphemisms]]


===More on tropes===
{{stub}}


===Sayings===
{{stub}}


When we have figures of speech that are non-literal, refer to a self-contained message,
and that we recognize as such (typically because they have become [[lexicalized]] enough to be recognized, and reproduced fairly faithfully),
we tend to call that a '''saying''' (or '''idiom'''), or something more specific.


These can be comments, references, observations, reactions, aphorisms and the like.


We have dozens of variants of that, including:
* '''Aphorism''' – a saying that contains a general, observational truth; "a pithy expression of wisdom or truth".
: '''adages''' or '''proverbs''' refer to those that are widely known, and/or in long-term use
* '''Cliché''' or '''bromide''' – a saying that is overused and probably unoriginal
: which are '''platitudes''' when not very applicable, useful, or meaningful
* '''[[Idiom]]''' – a phrase that means more than the sum of its parts
: often mainly (or only) has a non-literal interpretation.
: More than compositional, perhaps not at all, and hearing it for the first time may give no meaning
: (There are other meanings for idiom, related to expression, but they are rarer and usually mentioned by their meaning)
* '''Epithet''' – a byname - a saying or word used as a nickname, already having been widely associated with the person, idea, or thing being referred to.
: including those added to a name, e.g. Richard ''the Lion-Heart''
: but more often adjectival characterization, e.g. star-crossed lovers (when referring to Romeo and Juliet)
* '''Maxim''' - an instructional saying about a principle, or rule for behavior.
: Which occasionally makes it an [[aphorism]] as well
* '''Motto''' – a saying used to concisely state outlook or intentions.
* '''Mantra''' – a repeated saying, e.g. in meditation, religion, mysticism
* '''Epigram''' – a (written) saying or poem commenting on a particular person, idea, or thing.
: Often clever and/or poetic; otherwise they tend to be witticisms.
: Often making a specific point. Often short. Can be a cliché or platitude.
* '''Witticism''' – a saying that is concise and, preferably, also clever and/or amusing.
: Also '''quips''' - which are often more in-the-moment.


Also related:
* a [[colloquialism]] is something that originated in spoken language.
: This can apply to idioms and the like, and can be informal names where a formal one also exists,


===MWEs===
'''Multi-Word Expressions''' (MWEs) are any word combinations that are syntactically and/or semantically [[idiosyncratic]].


The term was coined (relatively) recently, by computational research that noticed there are many sequences of words we use together out of habit and/or for specific meaning worth noting
(cf. collocations, which tends to have a more specific focus).


They are clearly something more structured than just words coexisting (and there are some typologies of MWEs),
but are not compounds or phrases, and not always strongly structured.


MWE suggests a focus on finding idiosyncratic uses, typically not strictly [[compositional]],
often expressing something that cannot be expressed very simply with more typical/literal use of words.


Consider idioms, [[figures of speech]], institutionalized phrases - and arguably [[phrasal verbs]], [[nominal compounds]] (and named entities), (some) terminology, and others.


The same analyses and researchers also go into the more mundane sort of collocations mentioned above.


Universal Dependencies splits MWEs into three types:
* fixed: entirely fixed sequences that work as function words or short adverbials
* flat: exocentric (headless) things that are only somewhat fixed
:: dates
:: names (and any titles, except when appositional modifiers are more appropriate)
* compound:
:: noun compounds
:: verb compounds (other languages do this more than English)
:: serial verbs
:: particle verbs like 'put up' (compound:prt)




MWEs are interesting for various computational [[NLP]] applications, including [[translation]], general [[parsing]], [[information retrieval]], [[natural language generation]], (computational) [[lexicography]], and more.
One subdivision of MWEs is '''institutionalized phrases''', with compositional syntax and semantics/pragmatics but which still occur together frequently, versus '''lexicalized''' phrases, which have idiosyncratic syntax and/or semantics/pragmatics.


Lexicalized phrases vary in how flexibly they can be used, inflected, or altered.


'''Institutionalized phrases''' are those that are syntactically and semantically compositional,
but ''occur'' with unusually ([[markedly]]) high frequency.


The further names usually refer to specific properties or natures of phrases.


'''Idioms''' are lexicalized phrases that often come from metaphors, some compositional and intuitive enough to be directly understandable, others historical and long since institutionalized, fossilized, or otherwise habitual.


There is often a distinction made between fixed idioms (like 'by and large' and 'face to face') and those that are modifiable to some degree (reordered, creatively augmented, etc.).


Commonly used idioms see some cross-language adoption over time, so some may become fairly universal and translatable. Others are very hard to translate well because their meaning depends on their background, with literal translation often resulting in hilarity.


Other cases of collocations are things that could be read compositionally, but are not meant to be interpreted entirely literally.
For example, in 'taking a photo' (which is frequent), 'taking' is a support verb of sorts - the phrase does not refer to picking one up or stealing it. Instead it is considered equivalent to making a photo.


'''Terminology''' (a.k.a. jargon; there is a difference, but broadly speaking it is not well defined) is a relatively isolated development, and disambiguates within its own context.


Terminology includes new terms, but also a lot of existing words that take on new/specific meanings, and sometimes altered syntax, in the context of a particular subject area.


'''[[Phrasal verbs]]''' act like / are idiomatic phrases, often centered around a [[verb]] + [[particle]], for example, to look down on, to drop in, put off, spell out, set off, let down, break down, etc. They vary in transitivity.


'''Nominal Compounds''' are a specific form of noun [[compound]]s. They are a pair of nouns that act as a single reference,
often with the first acting as an adjective to the second, e.g. city park.


They differ from noun phrases in general in that they seem compositional, but are often somewhat lexicalized themselves,
and often mean something more specific than their compositional meaning.
Consider washing machine, cat food, newspaper.
...though some feel on the edge of "but what else would it be?", e.g. raincoat, bookshelf.


'''Cranberry expressions''' refer to MWEs/phrases that contain '''cranberry words''', which are words that are rarely or never seen elsewhere.
Examples of cranberry words include {{example|hotcakes}}, which is almost only seen in "selling like hotcakes" (and plays on that), nonsense words like {{example|caboodle}}, and various others.


Cranberry words are rather irregular lexical items.
They are [[lexically fixed]] but have no literal meaning.
Note their idiomatic nature and the structural similarity to [[collocations]].


There are related terms, such as ''(phraseologically) bound words'',


====See also====
* http://en.wikipedia.org/wiki/Collocation
* [[Computational aspects of phrases and clauses]]


{{stub}}


==Substituted phrases and/or double-meaninged phrases==
{{info|Note that some of this moves ''well'' out of phraseology, into 'meanings and words are complex, okay' territory}}


===Substituted phrases===
('Substituted phrases' is not a term from linguistics, but seems a useful one to group euphemism and some related concepts)


'''Euphemism''' replaces words/phrases with others, while preserving most of the meaning.
Typically the replacement is a form that is less direct - more figurative, possibly metaphor, or another reason to pick a nearby meaning.


The intent is often to say something without saying it ''directly'', for reasons like:
* softening emotional blows (e.g. passed away instead of died),
* tact to avoid potential offense (student is not working to their full potential, developing countries)
* understatement, e.g. 'more than a few' for 'a lot'
* avoiding rude sounding words (pretty much ''any'' other word used for toilet, including toilet ''itself'' originally, is a fairly polite reference for the place where we poop)
* but probably the most fun and thereby the most common is hinting at sexiness. To the point where any unknown phrase, particularly in the form &lt;verb>ing the &lt;adjective>d &lt;noun>, potentially makes us giggle.
:: compare with [[innuendo]], [[double entendre]]
* not mentioning implications, often doubletalk, e.g. downsizing the department (firing a bunch of people), collateral damage (we murdered some civilians we didn't mean to), special interrogation (torture).
* powerful-sounding business bullshit[https://ig.ft.com/sites/guffipedia/chief-manifesto-catalyst/]
(not the best example because these tend to creatively obfuscate meaning in ways that are much less generally known than doubletalk)


A '''dysphemism''' (or cacophemism) replaces a word/phrase with a more offensive one, say, Manson-Nixon line for [https://en.wikipedia.org/wiki/Mason%E2%80%93Dixon_line Mason–Dixon] line.
'''Cacoponism''' refers to the more blatant and intentionally offensive variation.


* http://en.wikipedia.org/wiki/Euphemism
* http://en.wikipedia.org/wiki/Dysphemism


===Multiple meanings===


'''Polysemy''' refers to a word referring to multiple distinct possible meanings {{comment|(or, if you're going to get [[semiotic]], any sign/symbol that does)}}.


Usually multiple ''related'' meanings, and this can have useful generalizing value.
In fact, in many languages, many words have a ''small'' amount of this.


<!--
Arguably, a lot of emotive words are usefully vague.
-->
-->


<!--
In other cases, a lot of different senses got assigned to the same word somehow,
or to specific


In English, run, set, and sound all have more than a dozen distinct uses.


Common verbs tend to have this... problem, or feature, depending on how you see it.


We can fuzzily argue for a ''primary'' sense,
often the first one you'd learn,
and the first one you'd think of when that word is not in any context.
-->


====Phraseme====
<!--
'''Phraseme''' seems to mean something like "multi-morphemic thing where at least ''some'' of it is not freely chosen".
-->


=Unsorted=


===Figure of speech===


====Circumlocution====
{{stub}}


Circumlocution refers to using an unnecessarily large number of words to express an idea in a roundabout way.


It frequently turns up as indirect and/or long-winded descriptions where more succinct ones exist,
less-direct figures of speech where clearer ones exist, or such.


Circumlocution may be done
: to avoid revealing information,
: to be intentionally vague (e.g. with [[equivocation]]),
: in language acquisition, teaching meaning via description
: to work around not knowing a term in another language and getting there via description
: to work around aphasia
: to avoid saying specific words ([[euphemism]], [[cledonism]]),
: to construct [[euphemisms]] in the [[innuendo]] sense
: to construct [[equivocations]]
: and (other) varied [[rhetoric]]
: creatively, to set up similes


It can also refer to avoiding complex words, and/or inflected/derived terms (see e.g. [[Periphrasis]]), and usefully so.


Dictionaries are often intentionally somewhat circumlocutory, to avoid a lot of entries depending on other entries you would have to look up.


<!--
Arguably, a lot of emotive words are usefully vague.
Say, 'love' has many and varied interpretations, and even if you try to catch it in definitions you will easily find a dozen distinct-enough things people consistently include in the concept, so love is arguably one of the more polysemous words around.
-->


Aside from ''words'' in dictionaries, though, we also have ''phrasing'' that plays with meanings.


A '''double entendre''' is a phrase/sentence that can be interpreted in different ways, mostly used for cases where at least one reading is dirty.


The extra meaning is intentionally there, but the fact that it can be masked by the more direct (and clean) reading gives some deniability,
though depending on how you say it, not much.


The words themselves don't necessarily hint at the second meaning. The understanding may come from context and both parties thinking the same thing - a complicity.


If you go looking, you can find a lot of unintentional ones, like anything you can "that's what she said" to.


A '''single entendre''' isn't really a thing, though the term is used to point out when people didn't quite manage to make their entendre double, and mainly managed a single vaguely vulgar meaning.


'''Innuendo''' uses language to allude to additional meaning,
yet with a wording that leaves some plausible deniability
{{comment|(without that deniability it would be clear insinuation)}}.


Innuendo can be fun and good natured (and is then much closer to double entendre, which also only works when both parties understand the suggested meaning),
but innuendo is more often specifically used to imply (often ''clearly'' imply, but ''only'' imply), and to specifically imply something negative - to disparage, to hint at foul play, plant seeds of doubt about someone, their reputation, or such {{comment|(see e.g. the early stages of many American presidential runs)}}.


Innuendo, like euphemisms, does not have to be sexual, though this perhaps ''is'' as common as it is assumed.


Double entendre does not have to be intentional; innuendo (and single entendre) is.


'''Puns''' use either multiple meanings of a word, or similar-sounding words, for humour or rhetorical effect.
We mostly know them for the really bad ones.
<!--To say that innuendo is an Italian suppository would be a pun.-->


See also:
* [[Polysemy]]
* http://en.wikipedia.org/wiki/Innuendo
* http://en.wikipedia.org/wiki/Double_entendre
* http://en.wikipedia.org/wiki/Euphemism


==Computational aspects==
<!--
Idiosyncratic uses can be found with relatively mechanical methods,
so doing so computationally can be fairly efficient.


See also related pages like [[Phraseology]], [[Statistical semantics]].
-->


==Phrase chunking, phrase identification==
<!--
Chunking often refers to phrase chunking, which identifies and marks [[phrases]], usually focusing on [[noun phrases]], [[verb phrases]], and sometimes [[prepositional phrases]].


Phrase identification is in many ways the same problem, but can refer to a more loosely described search for compositional word groups.


Note that there is an upper limit on this. Consider for example that the use of multiple modifiers on the same noun is only so compositional, humans may disagree on details, and converting to semantic representation may depend on real-world knowledge.


Approaches include:
* marking of known phrases
* looking for word/POS patterns (statistics)
* rough estimation by breaking on various [[closed class]] words and punctuation
-->


==Compounds==


http://wiki.apertium.org/wiki/Compounds


[[Category:Linguistics]]
[[Category:Rhetoric]]
[[Category:Semantics]]
[[Category:Pragmatics]]
[[Category:Reference]]


==Named entities==
{{stub}}


Named entity recognition usually refers to finding/recognizing phrases that are (used as) [[nominal compounds]].


Systems often deal with entities such as persons, organizations, locations, named objects, and such,
particularly when they can work from known lists.
The same systems often also extract simple references such as times and dates,
and quantities such as monetary values and percentages.
 
 
Specific tasks in the area may be referred to / known as:
* Entity Extraction
* Entity Identification (EI)
* Named Entity Extraction (NEE)
* Named Entity Recognition (NER)
* Named Entity Classification (NEC)
* ...and others.
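Working "from known lists" (often called a gazetteer) can be sketched as greedy longest-match lookup over token spans. This is a minimal illustration, not how any particular NER system works: it assumes whitespace-tokenized input and a small, hypothetical hand-made list, and real systems typically combine such lists with statistical or neural models to handle unseen names and ambiguity (e.g. 'Washington' as a person or a place).

```python
def gazetteer_tag(tokens, gazetteer, max_len=4):
    """Greedy longest-match lookup of token spans in a known-entity list.

    `gazetteer` maps lowercased token tuples to an entity type.
    Returns a list of (start, end, type) spans, with `end` exclusive.
    Longer matches win, so "New York Times" is not split into "New York".
    """
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in gazetteer:
                match = (i, i + n, gazetteer[key])
                break
        if match:
            spans.append(match)
            i = match[1]  # continue right after the matched span
        else:
            i += 1
    return spans


# Hypothetical toy gazetteer, for illustration only.
GAZETTEER = {
    ("new", "york"): "LOCATION",
    ("new", "york", "times"): "ORGANIZATION",
    ("richard",): "PERSON",
}

tokens = "Richard read the New York Times in New York".split()
for start, end, label in gazetteer_tag(tokens, GAZETTEER):
    print(" ".join(tokens[start:end]), "->", label)
```

This prints Richard as PERSON, New York Times as ORGANIZATION, and the second New York as LOCATION; preferring the longest match is what keeps the newspaper from being tagged as a place.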




===See also===
* http://en.wikipedia.org/wiki/Named_entity


* [[:Category:Named entities]]


* [[Text summarization]]





Latest revision as of 10:57, 4 July 2024



This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Idiomaticity, idioms

In linguistics, idiomaticity, idiomaticness, or just idiom can refer to the sense of "among all the possible realizations, this is the one the language uses (or uses most)", the least marked way of expressing a thing.

A lot of such idiomaticity is relatively specific to a language.

This is also relevant to language learning, because you probably have to think about this harder in any language you are not native in.


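For example, consider that

  • a lot of common verbs have a preferred adpositional particle, e.g. work on
  • in English, you say "I am a plumber" (you need the indefinite article),
not "I am plumber", which is how various other languages would do it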


Sometimes these are single correct ways, sometimes fixed sequences, but of at least equal interest is studying the limited flexibility of these patterns.

https://en.wikipedia.org/wiki/Idiom_(language_structure)


Outside of linguistics, idioms are usually understood as figures of speech and other such figurative language.

You could see this as one of the most everyday cases of the wider concept of idiomaticity.

https://en.wikipedia.org/wiki/Idiom


Phraseology

Phraseology usually refers to a part of linguistics that studies and describes the context in which a word is used, and it takes a mainly descriptive approach.


Collocations

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Collocations are statistically idiosyncratic sequences: series of words that occur together more often than chance alone would suggest.

For example, sequences of words "pretend to", "as a matter of fact", "leaves all parties", "downright amazed", "good news".
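One common way to make "more often than chance would suggest" concrete is pointwise mutual information (PMI) over adjacent word pairs. This is a sketch under stated assumptions, not the method behind any particular tool: it assumes pre-tokenized, lowercased input, looks only at directly adjacent words, and uses raw counts without smoothing. The `min_count` cutoff is an arbitrary illustrative choice; PMI is known to overrate rare pairs, which is one reason other measures (t-score, log-likelihood) are also used.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), estimated from raw counts.
    Positive scores mean the pair occurs together more often than it would
    if the two words were independent. Pairs seen fewer than `min_count`
    times are skipped, because PMI is unreliable for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_words = len(tokens)
    n_pairs = len(tokens) - 1

    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_pairs
        p_x = unigrams[x] / n_words
        p_y = unigrams[y] / n_words
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))

    # Highest-scoring (most "sticky") pairs first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy usage; a real corpus would be far larger.
text = ("the good news is that the news is good . good news travels fast . "
        "the news was bad but the good news came later")
tokens = text.split()
for pair, score in bigram_pmi(tokens):
    print(pair, round(score, 2))
```

On a text this small, ('news', 'is') actually outranks ('good', 'news'), a reminder that such scores only become meaningful on much larger corpora.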


Some are about idiomaticity - the sequences of words we prefer for expressing a thing, among other possible sequences that would also do so, while still being primarily compositional and readily understood. Say, when you VERB a decision, you probably say you make it. Other verbs like 'take' would be understood, but 'make' would be preferred.


Some of them are expressions that have much more figurative than compositional meaning.

And these come in variants (not just weird symbolic things your grandpa always said), so we might call these sayings, figures of speech, MWEs, or other things. When studying them you care about more specific aspects - e.g. some are fixed, others are not, etc.


Also, people have varied reasons to focus on just some.

  • Collocations are useful in language learning/teaching, which may
    • point at some grammatical idiosyncrasies, such as which prepositions tend to pair with which verbs, and which verbs tend to go with specific nouns (see examples below), as this can matter to the correctness of sentences (related to idiomaticity)
    • point out that a lot of collocations are not compositional, so when some adjective-noun combination doesn't seem to make direct sense (e.g. bright idea), you can assume it's some sort of expression you should look up
    • overlap strongly with technical terms and jargon - things that carry strong meaning but are not compositional.
    • ...see e.g. http://www.ozdic.com/ for examples
  • Collocations matter to translation.
collocations make translation harder, in that word-for-word translation will be wrong (as they are not compositional)
since collocation analysis points out that a sequence is idiosyncratic, it can make such cases easier to detect, which helps focus on learning what they might correspond to
  • natural language generation would like to know these preferred combinations
  • we might also be able to suggest that certain words are more likely to appear than others, helping spelling correction, OCR correction, etc.
  • collocation analysis may focus on things that appear more often than you would expect, but it is sometimes also useful to note the unusual, i.e. combinations that are less likely than expected
  • some usage reveals cultural attitudes, e.g. which adjectives we use for the behaviour of specific groups
  • linguistics may study smaller idiomatic preferences - say, in "VERB a decision", you would probably prefer 'make' over 'take' or most other verbs. Similarly,
adjective-noun, often the preferred adjective used to make a noun stronger or more specific, e.g. maiden voyage, excruciating pain, bright idea, spare time, broad daylight, stiff breeze, strong tea. Alternative adjectives would typically be understood, but be considered somewhat unusual
adverb-adjective, e.g. downright amazed, fully aware
verb-adverb, e.g. prepare feverishly, wave frantically
verb-preposition pairs, e.g. agree with, care about
verb-noun, e.g. we make rather than take a decision, we make rather than tidy a bed
noun combinations: a surge of anger, a bar of soap, a ceasefire agreement
  • it might bring up other patterns useful to natural language parsing - e.g. we agree with someone, we agree on something
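The "VERB a decision" preference mentioned above can be looked at very crudely by counting what comes just before a noun. This is a hypothetical sketch, not a standard tool: it assumes lowercased, whitespace-tokenized text, treats the word before any skipped determiners as "the verb" without real part-of-speech tagging, and does not lemmatize (so 'make' and 'made' are counted separately). The determiner list is a small illustrative assumption.

```python
from collections import Counter

# Small, illustrative determiner list (an assumption, not exhaustive).
DETERMINERS = {"a", "an", "the", "this", "that", "my", "your", "our", "their", "his", "her"}


def preceding_words(tokens, noun, max_skip=2):
    """Count the word found right before each occurrence of `noun`,
    skipping up to `max_skip` determiners in between.

    Crude by design: the counted word is *assumed* to be the verb,
    since no part-of-speech tagging is done.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != noun:
            continue
        j = i - 1
        skipped = 0
        # Walk left past determiners such as "a" or "the".
        while j >= 0 and tokens[j] in DETERMINERS and skipped < max_skip:
            j -= 1
            skipped += 1
        if j >= 0:
            counts[tokens[j]] += 1
    return counts


text = ("we must make a decision today . she made the decision . "
        "they make a decision soon . he took a decision . we reached a decision")
counts = preceding_words(text.split(), "decision")
print(counts.most_common())
```

On this toy text 'make' comes out on top with two hits, while 'made' is counted on its own, which is why real analyses usually lemmatize and POS-tag first.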


Collocation analysis

Collocation's reference probabilities

Filters and assumptions in collocation analysis

Choices in collocation math - dealing with sparsity and with an explosion of data

MWEs

Phraseme

Sayings with figurative meaning

Most languages have established phrases that have figurative meaning, or that are otherwise syntactically and/or semantically idiosyncratic.


They also have to be learned, because when you have not seen them before, they create confusion - not just intentional contradiction.


A phraseme is often understood as an utterance in which at least some parts cannot be freely chosen, related to idiomaticity in the sense of "among all the possible realizations, this is the one the language uses".

As such, terms like

  • set phrase
  • fixed expression
  • idiomatic phrase
  • multiword expression
  • idiom

...suggest phrases that should be reproduced exactly.

And if "you know what they say" is true, that makes sense, in that at least some of those sayings are assumed to be references to previously-decided wisdom (but maybe let's leave the epistemology to the philosophers).

So while figurative language in general allows some amount of creativity, the just-mentioned concepts generally resist it.


Figures of speech

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

A figure of speech is any use of a word/phrase where the intended meaning deviates from the literal meaning -- anything that is more figurative than purely literal.

This is a fairly open-ended idea, and can lean e.g. on the choice of words, playing with the meaning of words (tropes), changing the structure of sentences (schemes), and more.

It's not much of a stretch to go beyond word choice and include non-literal meaning as used in arguments (a bunch of rhetorical devices), in literature, in cinema, in politics, and in other kinds of storytelling.

Arguably, where e.g. phrasemes include everything from set phrases to the creative freedom that lets us communicate non-literally, figures of speech could be seen as the exercise of that creativity, and not only the most established ones (...arguable. The definitions vary and are fuzzy).


In some sense, figures of speech seem constrained only by creativity and by other people's ability to understand the result, and since we do this a lot, there is a large number of related concepts that fit this description, which we would want to split up to discuss better.

So we have a bunch more words in the area. Some overlapping with each other. And some from semantics and pragmatics that happen to be quite relevant. Such lists reveal that this is for a large part related to our habit of rhetoric.

For example:


Allusion

  • allusion - casual/brief reference (explicit or implicit) to another person, place, event, etc.

Meiosis

  • meiosis - intentional understatement [1]
e.g. 'the pond' to refer to the Atlantic Ocean, 'the Troubles' for the Northern Irish conflict


Litotes

  • litotes - understatement that uses a negation to express a positive, e.g. using "not bad" to mean pretty good.
actual meaning can depend on context
e.g. 'not bad' could mean anything from 'not entirely horrible as such' to 'excellent'.

Oxymoron

  • oxymoron - conjunction of words with intentionally contradictory meaning (see also contradiction in terms, paradox)
e.g. act naturally, old news, minor crisis, oxymoron (roughly means sharp dull)
sometimes less intentional, e.g. original copy
'civil war' would apply, except that you can argue both that it's a calque (loan translation) just pointing out it's between citizens, and/or that it's equivocating between civil in the sense of polite and in the sense of groups within the same state.


Irony

  • irony - intentional implication of the opposite of the standard meaning (verify)


Metaphor

  • metaphor - implied comparative description that implies some sort of similarity
usually by equating things with no direct relation.
Often used to economically imply certain properties.
Similar to, but different from, simile, which is an explicit comparison

Allegory

  • allegory - sustained metaphor, usually tying in various metaphors related to an initial one[2]


Parable

  • parable - anecdotal extended metaphor intending to make a (often didactic or moral) point [3]

Catachresis

  • catachresis - a mix of more than one metaphor (by design or not) (verify) [4]


Tropes

A trope is a less-literal reference, something often understood as a replacement, e.g. in rhetoric, storytelling.

When approached as "what we do in storytelling", a lot of the others in this list apply to some degree, particularly the ones that play on meaning, twist meaning, lead to contrasted interpretations.


More of a device in literature and rhetoric (than in linguistics directly), tropes are rhetorical figures of speech understood specifically as a replacement with a less-literal meaning.

Many also rely on a play, twist, or approximation of words or meaning, and on contrasts, so this includes things like


Which makes them most associated with rhetoric, storytelling and cinema, where there is specific focus on how concepts are conveyed.

In particular, we often imply concepts via patterns we recognize, without having to spell them out, and often use layers of contextual meaning.


For example, in writing and speaking, tropes are often employed for the more colorful result that is more interesting to read or listen to, and is often explained as a part of rhetoric.


In particular, visual storytelling has its own conventions, as it can both add visual metaphor and more easily hide details, as well as rely on consistently applied symbolism, whether or not it makes sense. [5][6]

Schemes

Hyperbole

  • hyperbole - exaggeration meant to be used as stress
    • auxesis - hyperbole relying on word choice, e.g. 'tome' for a book, 'laceration' for a scratch
    • adynaton - extreme hyperbole, suggesting impossibility [7]


Circumlocution

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Circumlocution refers to using an unnecessarily large number of words to express an idea in a roundabout way.


It frequently turns up as indirect and/or long-winded descriptions where more succinct ones exist, less-direct figures of speech where clearer ones exist, or such.


Circumlocution may be done

to avoid revealing information,
to be intentionally vague (e.g. with equivocation),
in language acquisition, teaching meaning via description
to work around not knowing a term in another language and getting there via description
to work around aphasia
to avoid saying specific words (euphemism, cledonism),
to construct euphemisms in the innuendo sense
to construct equivocations
and (other) varied rhetoric
creatively, to set up similes


It can also refer to avoiding complex words, and/or inflected/derived terms (see e.g. Periphrasis), and usefully so.

Dictionaries are often intentionally somewhat circumlocutory, to avoid a lot of entries depending on other entries you would have to look up.

metonymy, synecdoche

Metonymy (meaning something like 'change of name') and synecdoche are figurative language that uses one name/entity for another thing - a reference to a proximate object, often metaphorical.


Metonyms are cases that relate associated things, such as part and whole (see also meronym/holonym), often using a specific part that is representative of the whole, to catch a more complex concept with a brief term in an associative way.

For example:

  • "The Crown" to refer to the British monarchy. Similarly, "Washington" is sometimes used to refer to the United States government
  • "Nixon bombed Hanoi" ('Nixon' referring to the armed forces controlled by Nixon)
  • "A dish" referring to a part of a meal
  • "The car rear-ended me" ('me' referring to the car that the speaker was driving)
  • "Bread and circuses" for superficial appeasement
  • "The pen is mightier than the sword" (the pen mostly referring to publication of ideas, 'the sword' referring to a show of force).
  • "Lend me your ears", meaning listen to me, give me your attention
  • "the law" to refer to the police
  • "hired hands"
  • "bricks and mortar" [8]
  • "bread" for food in general

The reference frequently does not carry any directly shared properties. The British monarchy is not crown-shaped, food isn't like the dish it's on, appeasement does not need to take the form of food and distracting entertainment, a driver doesn't resemble their car, publication isn't done with a pen, Nixon and the bombers only shared as much as being in the same chain of command.

A number are likely to be culturally embedded, and somewhat local.

Contrasted with

  • metaphor, in that metaphors intentionally compare across domains
  • analogy, which works by similarity, often explicit comparison, and is usually used to communicate a shared quality/property. In contrast, metonymy works by contiguity/proximity and is used to suggest associations.


Synecdoche is a specific kind of metonymy that deals with the part-and-whole relationship

  • referring to a whole by a part (perhaps the most common variant)
    • Example: 'hands' to refer to workers, 'suits' for businessmen, 'your wheels' to refer to your car, hungry mouths to feed, 'Downing Street' or 'Number 10' for the office of the Prime Minister of the United Kingdom.
Sometimes only in specific expressions, e.g. boots refers to soldiers only in a few expressions like 'boots on the ground'
  • referring to a part by the whole
    • Example: "The city put up a sign", "The world treated him badly", the police, the company
  • referring to a wider class by example
    • Example: bug (for various insects, spiders, and such), give us our daily bread (food), using brand names like kleenex, xeroxing, googling
  • referring to an example by a wider class
    • Example: milk (usually meaning cow's milk, but recently less so),
  • referring to an object made from a material by that material
    • Example: threads (clothing), silver (for cutlery),
  • referring to contents by its container (also relying on context)
    • Example: keg (of beer),


Some examples are more complex, such as

  • "silver" (material used for a relatively uncommon realization of cutlery),
  • "the press" (a printing device referring to the news media, but also commonly to a team doing everything else)
  • "The White House said..." could refer to the president, their staff, the press secretary (or secretaries), or even that day's speaker (but probably not, because they only read out what was written by someone else).
This also means the distinction between metonymy and synecdoche is not always clear.


Synecdoche is also a way to commit some fallacies, including the fallacy of division, hasty generalization, and more.

...and more



More on tropes

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Sayings

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


When we have non-literal figures of speech that form a self-contained message we recognize as such (typically because they have become lexicalized enough to be recognized, and are reproduced fairly faithfully), we tend to call that a saying (or idiom), or something more specific.

Can be comments, references, observations, reactions, and aphorisms and the like.


We have dozens of variants of that, including:

  • Aphorism – a saying that contains a general, observational truth; "a pithy expression of wisdom or truth".
adages or proverbs refer to those that are widely known, and/or in long-term use
  • Cliché or bromide – a saying that is overused and probably unoriginal
which are platitudes when not very applicable, useful, or meaningful


  • Idiom – a phrase that means more than the sum of its parts
often mainly (or only) has a non-literal interpretation.
More than compositional, perhaps not compositional at all, so hearing it for the first time may give you no meaning
(There are other meanings for idiom, related to expression, but they are rarer and usually mentioned by their meaning)


  • Epithet – a byname: a saying or word used as a nickname, already widely associated with the person, idea, or thing being referred to.
including those added to a name, e.g. Richard the Lion-Heart
but more often an adjectival characterization, e.g. star-crossed lovers (when referring to Romeo and Juliet)


  • Maxim - An instructional saying about a principle, or rule for behavior.
which occasionally makes it an aphorism as well
  • Motto – a saying used to concisely state outlook or intentions.
  • Mantra – a repeated saying, e.g. in meditation, religion, or mysticism.


  • Epigram – a (written) saying or poem commenting on a particular person, idea, or thing.
Often clever and/or poetic, otherwise they tend to be witticisms.
Often makes a specific point. Often short. Can be a cliché or platitude.
  • Witticism – a saying that is concise and, preferably, also clever and/or amusing.
Also quips, which are often more in-the-moment.


Also related:

This can apply to idioms and the like, and can be informal names where a formal one also exists,


-->


This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Substituted phrases and/or double-meaninged phrases

🛈 Note that some of this moves well out of phraseology, into 'meanings and words are complex, okay' territory


Substituted phrases

('Substituted phrases' is not a term from linguistics, but seems a useful one to group euphemism and some related concepts)


Euphemism replaces words or phrases with others while preserving most of the meaning.

Typically the replacement is less direct: more figurative, possibly a metaphor, or otherwise chosen for a nearby meaning.

The intent is often to say something without saying it directly, for reasons like:

  • softening emotional blows (e.g. passed away instead of died),
  • tact to avoid potential offense (e.g. 'student is not working to their full potential', 'developing countries')
  • understatement, e.g. 'more than a few' for 'a lot'
  • avoiding rude-sounding words (pretty much every word used for the toilet, including 'toilet' itself originally, is a fairly polite reference to the place where we poop)
  • but probably the most fun, and therefore the most common, is hinting at sexiness. To the point where any unknown phrase, particularly in the form <verb>ing the <adjective>d <noun>, potentially makes us giggle.
compare with innuendo and double entendre
  • not mentioning implications, often doubletalk, e.g. downsizing the department (firing a bunch of people), collateral damage (we murdered some civilians we didn't mean to), special interrogation (torture).
  • powerful-sounding business bullshit[9]

(not the best example because these tend to creatively obfuscate meaning in ways that are much less generally known than doubletalk)



Dysphemism and cacophemism replace a word/phrase with a more offensive one, e.g. Manson-Nixon line for Mason–Dixon line.


Cacoponism refers to the more blatant and intentionally offensive variation.



Multiple meanings

Polysemy refers to a word having multiple distinct possible meanings (or, if you're going to get semiotic, any sign/symbol that does).


These are usually related meanings, which can have useful generalizing value. In fact, in many languages, many words have a small amount of polysemy.



Aside from words in dictionaries, though, we also have phrasing that plays with meanings.


A double entendre is a phrase/sentence that can be interpreted in different ways, usually where at least one reading is dirty.

The extra meaning is intentionally there, but the fact that it can be masked by the more direct (and clean) reading gives some deniability, though, depending on how you say it, not much.

The words themselves don't necessarily hint at the second meaning. The understanding may come from context and both parties thinking the same thing: a complicity.

If you go looking, you can find a lot of unintentional ones, like anything you can "that's what she said" to.


A single entendre isn't really a thing, though the term is used to point out when people didn't quite manage to make their entendre double, and mainly managed a single, vaguely vulgar meaning.


Innuendo uses language to allude to additional meaning, yet with a wording that leaves some plausible deniability (without that deniability it would be clear insinuation).

Innuendo can be fun and good-natured (and is then much closer to double entendre, which also only works when both parties understand the suggested meaning). More often, though, innuendo is used specifically to imply (often clearly imply, but only imply) something negative: to disparage, to hint at foul play, or to plant seeds of doubt about someone, their reputation, or such (see e.g. the early stages of many American presidential runs).

Innuendo, like euphemism, does not have to be sexual, though perhaps it is as often sexual as is commonly assumed.

Double entendre does not have to be intentional; innuendo (and single entendre) is.


Puns use either multiple meanings of a word, or similar-sounding words, for humour or rhetorical effect. We mostly know them for the really bad ones.

See also: