FoLiA notes

This article/section is a stub: probably a pile of half-sorted notes and a first version that is not well-checked, so it may have incorrect bits. (Feel free to ignore, or tell me)


FoLiA (Format for Linguistic Annotation) is a format for annotating text resources. It is intended for rich and interoperable linguistic annotation, for things like transcription, corpora (glossaries, dictionaries, thesauri and wordnets, etc.), and processing.

While it is serialized as somewhat complex XML, libraries should make it reasonably easy to read and alter.


It was presented as a better alternative to ad hoc storage, which you tend to spend time figuring out anew for each dataset.

It is unopinionated in the sense that

  • it does not restrict you to a particular label set or theory
  • it allows marking up of different things
  • all vocabulary sets need to be explicitly referenced (a SKOS / RDF thing, but don't let that scare you off)
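
To make the "explicitly referenced sets" point concrete, here is a minimal sketch using only Python's standard library rather than a FoLiA library. The XML is a hand-written, simplified fragment in the spirit of FoLiA (declarations under metadata, annotations pointing at a set); it has not been validated against the FoLiA schema, and the example.org set URLs and the helper names declared_sets and undeclared_uses are made up for illustration.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"

# Simplified, hand-written fragment; the set URLs are hypothetical.
DOC = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="example" version="2.0">
  <metadata>
    <annotations>
      <pos-annotation set="https://example.org/sets/pos"/>
      <lemma-annotation set="https://example.org/sets/lemma"/>
    </annotations>
  </metadata>
  <text xml:id="example.text">
    <s xml:id="example.s.1">
      <w xml:id="example.s.1.w.1">
        <t>cats</t>
        <pos set="https://example.org/sets/pos" class="N"/>
        <lemma set="https://example.org/sets/lemma" class="cat"/>
      </w>
    </s>
  </text>
</FoLiA>
"""


def declared_sets(root):
    """Map each annotation type (e.g. 'pos') to the set URLs declared for it."""
    declared = {}
    path = f"{{{FOLIA_NS}}}metadata/{{{FOLIA_NS}}}annotations/*"
    for decl in root.iterfind(path):
        local = decl.tag.split("}", 1)[1]  # strip the namespace part
        if local.endswith("-annotation") and "set" in decl.attrib:
            kind = local[: -len("-annotation")]
            declared.setdefault(kind, set()).add(decl.get("set"))
    return declared


def undeclared_uses(root):
    """List (annotation type, set URL) pairs used in the text but not declared."""
    declared = declared_sets(root)
    problems = []
    text = root.find(f"{{{FOLIA_NS}}}text")
    for elem in text.iter():
        used = elem.get("set")
        if used is None:
            continue
        local = elem.tag.split("}", 1)[1]
        if used not in declared.get(local, set()):
            problems.append((local, used))
    return problems


root = ET.fromstring(DOC)
print(declared_sets(root))
print(undeclared_uses(root))
```

A real FoLiA library does this kind of bookkeeping for you; the sketch only shows why a document that names its vocabularies up front can be checked mechanically.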


It deals separately with things like

  • inline annotations of individual elements
  • (inline) annotations of spans of elements
  • subtoken, for morphology and phonology
  • document structure
  • higher-level things like arbitrary selections and arbitrary relations
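
The difference between inline annotation of individual elements and annotation of spans can be sketched as follows, again with only Python's standard library. The fragment is hand-written and simplified (no metadata or declarations, not schema-validated): part-of-speech tags sit inside each word element, while the entity lives in a separate layer and points back at words by id. The class labels and the helper names inline_pos and entity_spans are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

NS = {"f": "http://ilk.uvt.nl/folia"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified, hand-written fragment: inline <pos> per word, plus one span layer.
DOC = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="example" version="2.0">
  <text xml:id="example.text">
    <s xml:id="example.s.1">
      <w xml:id="example.s.1.w.1"><t>Ada</t><pos class="NNP"/></w>
      <w xml:id="example.s.1.w.2"><t>Lovelace</t><pos class="NNP"/></w>
      <w xml:id="example.s.1.w.3"><t>wrote</t><pos class="VBD"/></w>
      <w xml:id="example.s.1.w.4"><t>programs</t><pos class="NNS"/></w>
      <entities>
        <entity xml:id="example.s.1.entity.1" class="person">
          <wref id="example.s.1.w.1"/>
          <wref id="example.s.1.w.2"/>
        </entity>
      </entities>
    </s>
  </text>
</FoLiA>
"""


def inline_pos(root):
    """Return (word text, POS class) pairs read from inside each word element."""
    return [
        (w.findtext("f:t", namespaces=NS), w.find("f:pos", NS).get("class"))
        for w in root.iterfind(".//f:w", NS)
    ]


def entity_spans(root):
    """Return (entity class, [word texts]) by resolving word references by id."""
    words = {
        w.get(XML_ID): w.findtext("f:t", namespaces=NS)
        for w in root.iterfind(".//f:w", NS)
    }
    return [
        (e.get("class"), [words[ref.get("id")] for ref in e.iterfind("f:wref", NS)])
        for e in root.iterfind(".//f:entities/f:entity", NS)
    ]


root = ET.fromstring(DOC)
print(inline_pos(root))
print(entity_spans(root))
```

Because spans only reference tokens rather than wrapping them, several overlapping span layers can coexist over the same words without nesting conflicts.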


See also:


There is also a FoLiA Query Language (FQL) that lets you select from, and also edit, documents. [1]

There are also web annotation tools, such as FLAT, that build on a document server.



What does it annotate?

Things like:

  • relatively mechanical structure
      • on the macro level (e.g. paragraphs, heads, divisions, lists, figures), plus the ability to define terms and create glossaries and such
      • on a smaller level (e.g. whitespace, tokens, morphemes)
  • more semantic things like quotes, events, the difference between utterances and sentences
  • additional annotation types, e.g. phonetic, sentiment, language, POS, lemma, sense, reference
  • larger annotation, like spans and span relations
  • corrections

...although it may not be advisable to use it for everything it can do at once.


https://folia.readthedocs.io/en/latest/introduction.html#annotation-types


What does it look like?

https://github.com/proycon/folia/tree/master/examples


Who or what uses it?

Universities, mainly.