FoLiA notes
Latest revision as of 00:46, 10 August 2023
FoLiA (Format for Linguistic Annotation) is a format for annotating text resources,
in theory for rich and interoperable linguistic annotation, for things like transcription, corpora (glossaries, dictionaries, thesauri, wordnets, etc.), and processing.
While it is serialized as somewhat complex XML, libraries should make it reasonable to read and alter.
It was presented as a better alternative to ad hoc storage,
which you tend to spend time figuring out for each dataset.
It is unopinionated in the sense that
- it does not restrict you to a particular label set or linguistic theory
- it allows marking up many different kinds of things
- all vocabulary sets need to be explicitly referenced (a SKOS / RDF thing, but don't let that scare you off).
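As a rough sketch of what "explicitly referenced" means in practice: a FoLiA document declares, in its metadata, which vocabulary set each annotation type uses. The fragment below is simplified and not validated against the FoLiA schema, the set URLs under example.org are hypothetical placeholders, and the helper name `declared_sets` is made up for illustration. It uses only Python's standard library rather than a dedicated FoLiA library.

```python
import xml.etree.ElementTree as ET

# FoLiA's XML namespace.
FOLIA_NS = "http://ilk.uvt.nl/folia"
NS = {"f": FOLIA_NS}

# Simplified fragment; the set URLs are hypothetical placeholders.
DOC = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="example" version="2.0">
  <metadata type="native">
    <annotations>
      <pos-annotation set="https://example.org/sets/my-pos"/>
      <lemma-annotation set="https://example.org/sets/my-lemma"/>
    </annotations>
  </metadata>
  <text xml:id="example.text"/>
</FoLiA>"""


def declared_sets(xml_text):
    """Map each declared annotation type to the vocabulary set it references."""
    root = ET.fromstring(xml_text)
    declared = {}
    for decl in root.findall("f:metadata/f:annotations/*", NS):
        # Tags look like "{namespace}pos-annotation"; keep only the local name.
        local_name = decl.tag.split("}", 1)[1]
        declared[local_name] = decl.get("set")
    return declared


if __name__ == "__main__":
    for annotation_type, set_url in declared_sets(DOC).items():
        print(annotation_type, "->", set_url)
```

Because the sets are only referenced by URL, the format itself stays neutral about which labels are valid; that is left to whatever the set defines.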
It deals separately with things like
- inline annotations of individual elements
- (inline) annotations of spans of elements
- subtoken annotations, for morphology and phonology
- document structure
- higher-level things like arbitrary selections and arbitrary relations
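To illustrate the difference between the first two kinds, here is a minimal sketch: part-of-speech tags sit inline inside each word element, while a named entity is a span annotation that points at words by id. The fragment is simplified (it omits the metadata declarations a real document needs), the ids and class labels are made up, and `read_annotations` is a hypothetical helper built on the standard library only.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
NS = {"f": FOLIA_NS}
# ElementTree exposes the xml:id attribute under the XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified fragment: declarations omitted, ids and classes are illustrative.
DOC = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="ex" version="2.0">
  <text xml:id="ex.text">
    <s xml:id="ex.s.1">
      <w xml:id="ex.s.1.w.1"><t>Ada</t><pos class="PROPN"/></w>
      <w xml:id="ex.s.1.w.2"><t>Lovelace</t><pos class="PROPN"/></w>
      <w xml:id="ex.s.1.w.3"><t>wrote</t><pos class="VERB"/></w>
      <entities>
        <entity xml:id="ex.s.1.entity.1" class="per">
          <wref id="ex.s.1.w.1"/>
          <wref id="ex.s.1.w.2"/>
        </entity>
      </entities>
    </s>
  </text>
</FoLiA>"""


def read_annotations(xml_text):
    """Return (inline POS tags, span entities resolved to their text)."""
    root = ET.fromstring(xml_text)
    words = {}     # word id -> token text, used to resolve span references
    pos_tags = []  # inline annotation: one (text, class) pair per word
    for w in root.iter(f"{{{FOLIA_NS}}}w"):
        text = w.findtext("f:t", namespaces=NS)
        words[w.get(XML_ID)] = text
        pos = w.find("f:pos", NS)
        pos_tags.append((text, pos.get("class") if pos is not None else None))

    entities = []  # span annotation: (class, text of the referenced words)
    for entity in root.iter(f"{{{FOLIA_NS}}}entity"):
        tokens = [words[ref.get("id")] for ref in entity.findall("f:wref", NS)]
        entities.append((entity.get("class"), " ".join(tokens)))
    return pos_tags, entities


if __name__ == "__main__":
    pos_tags, entities = read_annotations(DOC)
    print(pos_tags)
    print(entities)
```

Keeping spans as references rather than wrapping the words means several overlapping span layers can coexist over the same tokens.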
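See also:
- https://proycon.github.io/folia/
- https://folia.readthedocs.io/en/latest/introduction.html
- https://www.researchgate.net/publication/261215684_FoLiA_A_practical_XML_format_for_linguistic_annotation_-_A_descriptive_and_comparative_study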
There is also a FoLiA Query Language (FQL) that lets you select from and also edit documents: https://folia.readthedocs.io/en/latest/fql.html
There are web annotation tools like FLAT, which build on a document server.
What does it annotate?
Things like:
- relatively mechanical structure
  - on the macro level (e.g. paragraphs, headings, divisions, lists, figures), plus the ability to define terms and create glossaries and such
  - on a smaller level (e.g. whitespace, tokens, morphemes)
- more semantic things like quotes, events, and the difference between utterances and sentences
- additional annotation types, e.g. phonetic, sentiment, language, POS, lemma, sense, reference
- larger annotation, like spans and span relations
- corrections
...although it may not be advisable to use it for everything it can do at once.
https://folia.readthedocs.io/en/latest/introduction.html#annotation-types
What does it look like?
https://github.com/proycon/folia/tree/master/examples
Who or what uses it?
Universities, mainly.