Natural language typology

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)


A simple classification is whether a language is natural or not:

  • Natural languages are those that evolved and changed naturally by being used by large groups
  • Constructed languages (conlangs), artificial languages, and planned languages are (near-)synonyms for languages that are usually created by small groups or committees, mostly for experimentation, as toy languages, or sometimes as workable and easy-to-use imitations of natural languages. Languages like Interlingua, Esperanto, and Ido were designed to be adopted by large groups of people.
  • Auxiliary languages are those that are meant as exchange languages, secondary to their speakers' native languages. This includes constructed languages (e.g. Interlingua, Esperanto, Ido), but languages such as English and French are also commonly used as auxiliary languages.

  • Formal languages, in automata (machine) theory and generative linguistics, are mathematically described and tractable (though possibly divergent and infinite) languages: a set of words over an alphabet, typically specified by a grammar. These are mainly useful for:
    • logical analysis and constructing languages such as programming languages, with the explicit requirement that they be unambiguous (language and compiler design) and
    • computational linguistics: Analysing and modelling natural language to understand its complexity, and to parse and generate it mechanically. (grammar design, statistical methods, and more)
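As a minimal sketch of the "alphabet plus grammar defines a set of words" idea, the toy grammar below generates words of the form a...ab...b with equally many of each. The grammar itself and the names `ALPHABET`, `RULES` and `derive` are made up for illustration.

```python
ALPHABET = {"a", "b"}

# Toy context-free grammar (an assumption for illustration):
#   S -> a S b | (empty)
RULES = {"S": [["a", "S", "b"], []]}

def derive(symbols, depth):
    """Yield every word derivable from `symbols` using at most `depth` rule applications."""
    if not symbols:
        yield ""
        return
    head, rest = symbols[0], symbols[1:]
    if head in RULES:
        if depth == 0:
            return  # out of budget: abandon this derivation
        for expansion in RULES[head]:
            yield from derive(expansion + rest, depth - 1)
    else:
        # Terminal symbol: keep it and derive the remainder.
        for tail in derive(rest, depth):
            yield head + tail

words = sorted(derive(["S"], 3), key=len)
```

The language itself is infinite; the depth limit only bounds how much of it is enumerated.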

Information encoding: morphology, modification, and more

Phonological details


To set up two contrasts:

  • many languages use tone/pitch to convey paralinguistic information: emotion, emphasis, or making it clearer that something is a question,
  • many use loudness to convey stress/accent.

A tonal language is one that uses tone/pitch to distinguish (or inflect) between words/lexemes that otherwise match in their consonants and vowels.

Chinese is probably the best known of the tonal languages (most/all dialects), but there are various others in Asia, Africa, South America and the Pacific.
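A small sketch of what "distinguishing words by tone" means in practice, using the well-known Standard Mandarin syllable "ma". The glosses are simplified, and the `LEXICON` layout and function names are hypothetical.

```python
# Keys are (segments, tone number); values are rough English glosses.
LEXICON = {
    ("ma", 1): "mother",    # mā
    ("ma", 2): "hemp",      # má
    ("ma", 3): "horse",     # mǎ
    ("ma", 4): "to scold",  # mà
}

def lookup(segments, tone):
    """Look up a word by its consonants/vowels plus its tone."""
    return LEXICON.get((segments, tone))

def candidates(segments):
    """All glosses that share the same segments when tone is ignored."""
    return sorted(gloss for (seg, _tone), gloss in LEXICON.items() if seg == segments)
```

Ignoring tone leaves four candidates for "ma"; including it picks out exactly one.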

A pitch-accent language is one where stress/accent comes from pitch, not loudness.

This seems to include e.g. Serbo-Croatian, Slovene, Turkish, Japanese, Norwegian, Swedish, and various local dialects

There doesn't seem to be an established name for our perceived default of not doing these things.(verify)

Note that various non-tonal languages actually do have a handful of cases where pitch or stress disambiguates between what would otherwise be homophones.

When ignoring cognitive frameworks, you could argue that most information in language is encoded in things like (morphological) word marking, word order (syntactic), reasonable word and concept reference (which often relates to marking and word order), and such.

This is indeed what most typology we have come up with so far focuses on - syntactic patterns, grammatical patterns, marking.

(The degree to which we include semantics and pragmatics seems to vary per researcher's preference and optimism, which is reasonable, as it is neither ignorable nor quite as central.)

Each way a language does this gives you an entry to add to your language typology.

Lots of entries, in fact, with potentially meaningful distinctions at every level, from phonemes to morphemes to affixes to syllables to compounds to phrases to clauses to sentences.

...though we tend to keep it to the ones that seem to help us more broadly.

So typology can relate to any of these levels, and to any description in which a language differs from other languages.

...and it doesn't necessarily matter how fundamental or settled the ideas are, how much overlap there is, or how much disagreement there is between said models.

You can e.g. argue for nominative-accusative languages, which mark the direct objects of transitive verbs somehow (...which distinguishes them from subjects),
e.g. with articles.
And e.g. German then also marks these articles according to the noun's grammatical gender.
And for ergative-absolutive languages, which mark the subjects of transitive verbs (...distinguishing them from objects, and from subjects of intransitive verbs).
e.g. Basque morphologically marks the subject of a transitive verb (with the ergative suffix -k).
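The two alignment types can be summarized as a small table over three roles: S (subject of an intransitive verb), A (subject of a transitive verb) and P (object of a transitive verb). This is a sketch using the conventional role labels; the dictionary layout and function name are assumptions for illustration.

```python
ALIGNMENTS = {
    "nominative-accusative": {"S": "nominative", "A": "nominative", "P": "accusative"},
    "ergative-absolutive": {"S": "absolutive", "A": "ergative", "P": "absolutive"},
}

def patterns_like_intransitive_subject(alignment):
    """Which transitive roles get the same case as the intransitive subject S."""
    marking = ALIGNMENTS[alignment]
    return sorted(role for role in ("A", "P") if marking[role] == marking["S"])
```

The difference comes down to which transitive role is grouped with S: the subject (A) in nominative-accusative systems, the object (P) in ergative-absolutive ones.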

Predicate constituent order

The order of constituents in predicates is another qualification of sentence structure. This is also called word order (...typology), but the term constituent is more correct, as the units may not only be words but usually also (often short) phrases.

The abbreviations SVO, SOV, VSO, VOS, OVS, and OSV refer to the order of the subject and object around transitive verbs, primarily in main clauses.

Most languages use SVO or SOV as their dominant order, although all six possible variations appear in some language or other.

There are further categories you could base on the above, with terms such as 'OV languages' and 'Subject-initial'. Some of these simply group the types described by the three-letter acronyms above.
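The six orders, and the coarser groupings built on them, can be enumerated mechanically. A minimal sketch; the helper names are made up for illustration.

```python
from itertools import permutations

# All six orderings of subject (S), verb (V) and object (O).
ORDERS = ["".join(p) for p in permutations("SVO")]

def is_ov(order):
    """True when the object precedes the verb ('OV languages')."""
    return order.index("O") < order.index("V")

def is_subject_initial(order):
    """True when the subject comes first."""
    return order.startswith("S")

ov_orders = [o for o in ORDERS if is_ov(o)]
subject_initial = [o for o in ORDERS if is_subject_initial(o)]
```

Here `ov_orders` groups SOV, OSV and OVS, while `subject_initial` groups SVO and SOV.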

Related things that are studied:

  • the order of intransitive verbs (VS or SV)
  • the object order in ditransitive (bitransitive) verbs (This is usually a much more specific discussion)
  • possible differences in word order restrictions between main and dependent (subordinate) clauses. For example, V2 is an abbreviation of 'verb-second', the property that a main clause always has the (finite) verb as its second constituent. Such languages often use another order (or orders) in dependent clauses and/or as marked forms, and perhaps an extra form or two for particular styles of prose.
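A sketch of the verb-second idea over pre-segmented constituents, using German sentences as the example. The role labels and the `is_verb_second` helper are hypothetical; actually segmenting a sentence into constituents is the hard part and is assumed to be done already.

```python
def is_verb_second(constituents):
    """constituents: list of (text, role) pairs; role "V" marks the finite verb."""
    roles = [role for _text, role in constituents]
    return len(roles) >= 2 and roles[1] == "V"

# "Ich lese heute das Buch" / "Heute lese ich das Buch" (I read the book today)
main_1 = [("Ich", "S"), ("lese", "V"), ("heute", "ADV"), ("das Buch", "O")]
main_2 = [("Heute", "ADV"), ("lese", "V"), ("ich", "S"), ("das Buch", "O")]
# "..., dass ich heute das Buch lese" (... that I read the book today)
subordinate = [("dass", "COMP"), ("ich", "S"), ("heute", "ADV"), ("das Buch", "O"), ("lese", "V")]
```

Both main clauses keep the verb in second position whatever comes first, while the subordinate clause is verb-final.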

Note also that many languages allow more than one. Exactly how and where differs per language.

When a language uses a marked secondary form, that form regularly has a specific function. For example:

  • Spanish is typically SVO, but allows VSO to put emphasis on the verb and also OVS to put emphasis on the object(verify) [1].
  • Modern English is SVO but allows OSV in subordinate clauses, particularly as a poetic form.

The language may simply be flexible, often categorizable per clause type:

  • German is SVO in the main clause (more precisely, verb-second), and often SOV in dependent ones; Dutch is fairly similar
  • Russian is primarily SVO. It apparently also allows OVS(verify), and in intransitive use it seems to do both SV and VS.
  • Japanese is primarily SOV, though OVS is also occasionally observed.
  • Turkish is primarily SOV, but is somewhat flexible to other forms.

Other order details

More specific


Grammatical features


Grammatical features refer to basically any part of a sentence that expresses information in the sentence.

The more common ones add information like relations in time, number, (grammatical) gender, place (locatives), and such. In many languages these are also marked and play a role in agreement (and in things like referent resolution).

Such cases are a fundamental part of the grammar of a language (and we could describe a typology of languages based on them).

Note that you can easily think of dozens of grammatical features that are less central and much more abstract, but for which the details are still apparently internalized by fluent-enough speakers (arguably part of what makes you fluent).

Or that even native speakers don't entirely agree on, like the countability of some nouns (e.g. math, source code).

Common features (more frequently modified for / used in agreement)

Grammatical number (noun)

Many languages mark number, distinguishing singular and plural. One fish, Two fishes.

Often part of agreement in nouns, pronouns, adjectives and verbs.

Grammatical gender (noun)

In various languages, nouns are split into masculine/feminine or masculine/feminine/neuter. When nouns are not marked, respective articles and pronouns often signal gender instead.

Note that, for any of these, there may be additional details to a language's grammar. For example, various languages make fewer gender distinctions in plural (definite) articles than in singular ones (in the nominative, German has der/die/das in the singular but only die in the plural).

"Noun class" (noun)

Noun classes are any categorical groupings that are useful for some reason or other.

Because a lot of languages do this primarily based on grammatical gender, it's sometimes just considered that.

In other cases, things like animate/inanimate and countable/uncountable make this a more general concept, and calling it something more general can be clearer.

Grammatical case (noun)

Grammatical case (often just 'case') is a grammatical category for words, determined by their function in a sentence.

'Case' still tends to be a group of grammatical features that (mostly) apply to nouns

  • well, mostly nouns (but possibly also pronouns and parts of noun phrases), because the focus is often on being more specific about subjects and objects
Some languages go beyond that, e.g. to also mark adpositions, numerals, adjectives, superlatives or comparatives.
  • You could argue for dozens of cases, but people tend to only call it a case when it is marked and/or affects sentence ordering, as then the grammatical correctness (and/or meaning) of a sentence depends on understanding them.
For example, Finnish marks for more cases than many other languages.
  • Sometimes case is about indicating relations by marking instead of other means.
Consider how many languages use sentence order (instead of marking) to indicate who the subject and object are (see predicate constituent order).
Historically, 'case' was used only for languages that explicitly indicate function via inflection (like Latin does for a lot of things[2], and some modern languages do for just one or two things). This is now loosened somewhat, e.g. also used to mark up function analysed in other ways.(verify)

Some more common cases include:

  • nominative case - subject of a verb (In English this is often referred to as subjective case)
  • accusative case - direct object
  • vocative - indicates an addressee
  • genitive/possessive case - implies ownership
  • dative - person/thing something is given to

Example: In "John, I gave the bean to Alice's sister,"

  • "John" is vocative
  • "I" is nominative/subjective,
  • "the bean" is accusative,
  • "Alice's" is genitive/possessive
  • "Alice's sister" is dative,

((verify) all of this example)

How many cases a language has is a somewhat fuzzy concept.

If you count anything that plays a role in the semantics in any way at all? Easily dozens.

Most argue you should not count anything that does not do much at a sentence level (marking, order, agreement, other grammatical effects). In which case it's just a few.

For example

  • Latin
has six (main) cases: nominative, accusative, dative, genitive, vocative, and ablative
Locative is arguable.
  • English
modern English marks only the genitive on nouns (pronouns still distinguish forms like I/me/my; Middle and Old English had more)
and relies on things like positioning and disambiguation for most other things
  • Finnish
is mildly crazy, e.g. marking various locative-style things, including:
Adessive case ("on" / "at")
Inessive case ("in")
Elative case ("out of")
Illative case ("into")
Allative case ("onto")
Ablative case ("from" / "off")
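The six Finnish locative cases listed above form a regular grid: internal versus external location, crossed with being at, moving from, or moving to. A sketch of that grid, with the word talo ("house") as the example; the dictionary layout and `case_for` helper are made up for illustration, and vowel harmony variants of the endings are ignored.

```python
# (location type, direction) -> (case name, form of "talo")
LOCATIVE_CASES = {
    ("internal", "at"): ("inessive", "talossa"),    # in the house
    ("internal", "from"): ("elative", "talosta"),   # out of the house
    ("internal", "to"): ("illative", "taloon"),     # into the house
    ("external", "at"): ("adessive", "talolla"),    # on/at the house
    ("external", "from"): ("ablative", "talolta"),  # from/off the house
    ("external", "to"): ("allative", "talolle"),    # onto/to the house
}

def case_for(location, direction):
    """Name of the case for one cell in the grid."""
    name, _example = LOCATIVE_CASES[(location, direction)]
    return name
```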

Language courses and linguists may not typically dive deep into cases, for various reasons.

  • vocative is often not mentioned in English courses.
There is little to learn about it that people won't pick up intuitively, and little to gain from giving it a big name.

  • objective case refers to things that can be objects, which is why it is sometimes used as a synonym for accusative case, but it can also include dative, and apparently also ablative
Using overlapping concepts like that is not going to clarify much to language learners or linguists.
  • some are hard to describe, very specific, archaic, or otherwise do not name a clear grammatical effect.
For example, ablative and locative in many languages arguably play more at/from the semantic level.
  • ablative case is not marked for in a lot of languages
Latin uses it somewhat regularly, so it is important to know about when analysing Latin
in many other languages it covers a range of semantic relations, anything like "from", "with", "in", "by" (in a sense often compared to prepositions), which makes it difficult to summarize concisely.

Grammatical tense (verb)
Grammatical person (verb)

(Deictic) Reference to participants.

Consider for example I am versus we are.

Person often interacts with plurality and gender (consider I, we, she, he, it). Languages' pronoun systems vary, e.g. in the number of persons they have, how many pronouns they use, and whether/how they explicitly mark for these properties. (This also varies over time. Middle and Old English had more varied and more explicitly marked forms.)

Pragmatically seen, the function of grammatical person seems to be to let speakers refer to more people less ambiguously.
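A sketch of person and number as features that a verb agrees with, using English pronouns and the present tense of "to be". The table layout and the `with_be` helper are assumptions for illustration.

```python
# pronoun -> (person, number); "you" also serves as plural, with the same verb form
PRONOUN_FEATURES = {
    "I": (1, "sg"), "you": (2, "sg"),
    "he": (3, "sg"), "she": (3, "sg"), "it": (3, "sg"),
    "we": (1, "pl"), "they": (3, "pl"),
}

# Only two feature combinations get their own form; everything else uses "are".
BE_PRESENT = {(1, "sg"): "am", (3, "sg"): "is"}

def with_be(pronoun):
    """Combine a pronoun with the agreeing present-tense form of 'to be'."""
    features = PRONOUN_FEATURES[pronoun]
    return f"{pronoun} {BE_PRESENT.get(features, 'are')}"
```

English marks only a few of the possible distinctions on the verb; a language with richer agreement would need more entries in a table like `BE_PRESENT`.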

Definiteness (noun)


Grammatical aspect
Grammatical mood

(a.k.a. mode)

Mood is modality (relation to reality or truth) signaled grammatically, e.g. with affixes.

...where modality expresses how a speaker considers what they are saying, e.g. as a statement of fact, a desire, or a command.

Modifiers, quantifiers


In a narrower meaning, modification is any morphological process which changes a root or stem.


Modification is also used to refer to semantic alteration - often a (more accurate) description of another element (often phrase-level(verify)).

Modifiers may be optional in that their removal leaves behind a grammatically correct sentence (a grammatical modifier seems to refer to this(verify)), or one which can be made correct with very little polishing.

Modifiers are often interjected words, phrases, or clauses. It is probably a good idea to think of modification as structure that uses words for content, rather than just words.

Note that sometimes, the difference between a clause being restrictive and non-restrictive/descriptive is very subtle. For example, compare:

  • The officer helped the civilians who had been shot
    • restrictive.
    • In this case meaning that the officer helped only those civilians who had been shot
  • The officer helped the civilians, who had been shot
    • non-restrictive
    • In this case implying that all the civilians had been shot
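For written English, the comma convention in the two examples above can be turned into a rough heuristic. This is only a sketch: the regular expression and the `classify_relative_clause` name are made up for illustration, and it handles just the simple who/which pattern, not relative clauses in general.

```python
import re

# A relative pronoun, optionally preceded by a comma.
RELATIVE = re.compile(r"(,?)\s+(who|whom|whose|which)\b", re.IGNORECASE)

def classify_relative_clause(sentence):
    """Guess restrictiveness from the comma before the relative pronoun."""
    match = RELATIVE.search(sentence)
    if match is None:
        return None  # no relative pronoun found
    return "non-restrictive" if match.group(1) else "restrictive"

restrictive = classify_relative_clause("The officer helped the civilians who had been shot")
descriptive = classify_relative_clause("The officer helped the civilians, who had been shot")
```

Spoken language has no commas, so there the distinction has to come from intonation and context instead.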

Restrictive modification

Also known as: defining modification, identifying modification, essential modification, and necessary modification.

Can be marked

  • syntactically
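    • commas (in written English, restrictive clauses are written without them, non-restrictive clauses with them)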

Descriptive modification

Also known as: non-restrictive modification, non-defining modification, non-identifying modification, or unnecessary modification.

Non-restrictive modification is regularly used to avoid repetition (like pronouns).

In the first example above ("The officer helped the civilians who had been shot"), there is no comma before "who". Therefore, what follows is a restrictive clause (not all of the civilians had been shot).

Non-restrictive example: "The officer helped the civilians, who had been shot."

Non-restrictive modifiers add description, but are not pragmatically necessary for reference.

Descriptive modifier