Labeling in linguistics
Latest revision as of 23:07, 20 April 2024
Lexical categories / POS
A tagset is the set of lexical categories that a POS tagger annotates data with. For example, the Penn tagset was used to create the Penn Treebank.
Tagsets are usually primarily based on parts of speech, sometimes with common linguistic features added.
The few basic POS tags are widely agreed on, but there is always room for discussion about the details, the special cases, and the degree of normalisation (many specific tags vs. a few tags with features attached). These choices also affect how well a tagset can be applied to languages other than English.
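To make the "specific tags vs. tags with features" trade-off concrete, here is a minimal sketch. It assumes a hand-picked, simplified mapping of three Penn verb tags onto one coarse tag plus feature-value pairs named in the Universal Dependencies style. The table and the `decompose` helper are hypothetical illustrations, not part of any tagger.

```python
# Simplified, illustrative mapping (an assumption, not an official conversion table):
# each specific Penn tag becomes one coarse tag plus a set of features.
PENN_TO_COARSE = {
    "VBD": ("VERB", {"Tense": "Past", "VerbForm": "Fin"}),
    "VBN": ("VERB", {"Tense": "Past", "VerbForm": "Part"}),
    "VBZ": ("VERB", {"Number": "Sing", "Person": "3",
                     "Tense": "Pres", "VerbForm": "Fin"}),
}


def decompose(penn_tag):
    """Return (coarse_tag, 'Feat=Val|Feat=Val') for a known Penn tag."""
    coarse, feats = PENN_TO_COARSE[penn_tag]
    # Sort feature names so the string form is stable.
    feat_string = "|".join(f"{name}={value}" for name, value in sorted(feats.items()))
    return coarse, feat_string


for tag in ("VBD", "VBN", "VBZ"):
    print(tag, "->", decompose(tag))
```

All three specific tags collapse into the same coarse tag, and the distinctions move into the features. This is the sense in which a feature-based tagset is more normalised.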
UPOS
Penn
TEI
C5, a.k.a. BNC basic
61 tags. See e.g. [4]
(C6)
The same as C7 except for handling of punctuation.
C7, a.k.a. BNC enriched
146 tags. See e.g. [5]
CLAWS
Various versions; the current one is C8.
The latest program seems to be referred to as CLAWS4. See [6]
CLAWS1 tagset - 132 tags - [7]
CLAWS2 tagset - 166 tags - [8]
C5 - [9]
C6 - [10]
C7 - [11]
C8 - [12]
https://ucrel.lancs.ac.uk/claws/
Corpus-specific
...usually meaning 'not generally used'. See also corpora.
Parole
Brown
London-Lund
LOB, SEC
POW
German
STTS (Stuttgart-Tübingen-TagSet)
54 tags
Dependencies
Stanford dependency representation
https://downloads.cs.stanford.edu/nlp/software/dependencies_manual.pdf
Universal Dependencies
Broad overview: https://universaldependencies.org/u/dep/
https://nlp.stanford.edu/pubs/USD_LREC14_paper_camera_ready.pdf
https://universaldependencies.org/introduction.html
https://en.wikipedia.org/wiki/Universal_Dependencies
Universal Parts of Speech
The Universal Dependencies proposal: http://universaldependencies.org/docs/u/pos/index.html
ADJ: adjective
ADP: adposition
ADV: adverb
AUX: auxiliary verb
CONJ: coordinating conjunction (renamed CCONJ in UD v2)
DET: determiner
INTJ: interjection
NOUN: noun
NUM: numeral
PART: particle
PRON: pronoun
PROPN: proper noun
PUNCT: punctuation
SCONJ: subordinating conjunction
SYM: symbol
VERB: verb
X: other
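A small sketch of how this tag list might be used in practice: store the tags in a lookup table and flag anything in tagged data that is not a universal tag. The `unknown_tags` helper and the sample sentence are made up for illustration, and the table uses the tag names exactly as listed here.

```python
# The 17 universal POS tags as listed above.
UPOS = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary verb",
    "CONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}


def unknown_tags(tagged_tokens):
    """Return the sorted set of tags in (token, tag) pairs that are not in UPOS."""
    return sorted({tag for _, tag in tagged_tokens if tag not in UPOS})


# Hypothetical sample: the last token carries a Penn tag rather than a universal one.
sample = [("the", "DET"), ("dog", "NOUN"), ("barks", "VBZ")]
print(unknown_tags(sample))
```

A check like this is a cheap way to notice when data annotated with another tagset (Penn, CLAWS, ...) has been mixed in.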