FoLiA notes: Difference between revisions

Latest revision as of 00:46, 10 August 2023

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

FoLiA (Format for Linguistic Annotation) is a format for annotating text resources, in theory for rich and interoperable linguistic annotation, for things like transcription, corpora (glossaries, dictionaries, thesauri, wordnets, etc.), and processing.

While it is serialized as somewhat complex XML, libraries should make it reasonable to read and alter.
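As a rough idea of what reading that XML involves when you do not use a FoLiA-specific library, here is a minimal sketch with only Python's standard library. The embedded fragment is hand-written and simplified (it leaves out the metadata and declarations a real document needs), and the ids, classes and the `token_rows` helper are made up for this example.

```python
import xml.etree.ElementTree as ET

# FoLiA elements live in this XML namespace.
NS = {"f": "http://ilk.uvt.nl/folia"}
# ElementTree exposes xml:id under this expanded attribute name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hand-written, simplified fragment: not a complete or validated FoLiA document.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="example" version="2.0">
  <text xml:id="example.text">
    <p xml:id="example.p.1">
      <s xml:id="example.p.1.s.1">
        <w xml:id="example.p.1.s.1.w.1"><t>Cats</t><pos class="N"/><lemma class="cat"/></w>
        <w xml:id="example.p.1.s.1.w.2"><t>sleep</t><pos class="V"/><lemma class="sleep"/></w>
      </s>
    </p>
  </text>
</FoLiA>"""


def token_rows(xml_text):
    """Return (id, text, pos class, lemma class) for every <w> token."""
    root = ET.fromstring(xml_text)
    rows = []
    for w in root.iterfind(".//f:w", NS):
        pos = w.find("f:pos", NS)
        lemma = w.find("f:lemma", NS)
        rows.append((
            w.get(XML_ID),
            w.findtext("f:t", default="", namespaces=NS),
            pos.get("class") if pos is not None else None,
            lemma.get("class") if lemma is not None else None,
        ))
    return rows


for row in token_rows(SAMPLE):
    print(row)
```

This only covers the simplest case (one text, one POS and one lemma per token); a dedicated library is still the more sensible route for anything beyond a quick look.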

It was presented as a better alternative to ad hoc storage formats, which you tend to spend time figuring out anew for each dataset.

It is unopinionated in the sense that:

it does not restrict you to a particular label set or theory

it allows marking up many different kinds of things

all vocabulary sets need to be explicitly referenced (a SKOS / RDF thing, but don't let that scare you off).
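To make the "explicitly referenced" point a little more concrete, here is a small sketch that compares the sets declared in a document's metadata with the sets actually named on annotations. It uses a hand-written, simplified fragment; the example.org set URLs and the `undeclared_sets` helper are made up, and the check is much cruder than what a real validator does (for instance, it ignores annotations that omit `set` and rely on a default).

```python
import xml.etree.ElementTree as ET

NS = {"f": "http://ilk.uvt.nl/folia"}

# Hand-written, simplified fragment; the example.org set URLs are made up.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="example" version="2.0">
  <metadata>
    <annotations>
      <pos-annotation set="https://example.org/sets/pos-a"/>
      <pos-annotation set="https://example.org/sets/pos-b"/>
    </annotations>
  </metadata>
  <text xml:id="example.text">
    <s xml:id="example.s.1">
      <w xml:id="example.s.1.w.1">
        <t>Cats</t>
        <pos class="N" set="https://example.org/sets/pos-a"/>
        <pos class="NOUN" set="https://example.org/sets/pos-c"/>
      </w>
    </s>
  </text>
</FoLiA>"""


def undeclared_sets(xml_text):
    """Return the sets named on annotations but missing from the declarations."""
    root = ET.fromstring(xml_text)
    annotations = root.find("f:metadata/f:annotations", NS)
    # Iterating an element yields its direct children, i.e. the declarations.
    declared = {d.get("set") for d in annotations if d.get("set")}
    text = root.find("f:text", NS)
    used = {el.get("set") for el in text.iter() if el.get("set")}
    return sorted(used - declared)


print(undeclared_sets(SAMPLE))
```

In this fragment the `pos-c` set is used on a token but never declared, so it is the one the helper reports.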

It deals separately with things like:

inline annotations of individual elements
(inline) annotations of spans of elements
subtoken annotation, for morphology and phonology
document structure
higher-level things like arbitrary selections and arbitrary relations
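As an illustration of how span annotation differs from annotation on a single token, the sketch below resolves an entity span back to the tokens it covers. It assumes the span sits in a layer next to the tokens and points at them by id with `wref` elements; the fragment is hand-written and simplified, and the ids, the `per` class and the `entity_spans` helper are made up for this example.

```python
import xml.etree.ElementTree as ET

NS = {"f": "http://ilk.uvt.nl/folia"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hand-written, simplified fragment: not a complete or validated FoLiA document.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="example" version="2.0">
  <text xml:id="example.text">
    <s xml:id="example.s.1">
      <w xml:id="example.s.1.w.1"><t>Ada</t></w>
      <w xml:id="example.s.1.w.2"><t>Lovelace</t></w>
      <w xml:id="example.s.1.w.3"><t>wrote</t></w>
      <w xml:id="example.s.1.w.4"><t>programs</t></w>
      <entities>
        <entity xml:id="example.s.1.entity.1" class="per">
          <wref id="example.s.1.w.1"/>
          <wref id="example.s.1.w.2"/>
        </entity>
      </entities>
    </s>
  </text>
</FoLiA>"""


def entity_spans(xml_text):
    """Return (class, covered text) for every <entity> span."""
    root = ET.fromstring(xml_text)
    # Index token text by xml:id so the wref pointers can be resolved.
    text_by_id = {
        w.get(XML_ID): w.findtext("f:t", default="", namespaces=NS)
        for w in root.iterfind(".//f:w", NS)
    }
    spans = []
    for entity in root.iterfind(".//f:entity", NS):
        words = [text_by_id[ref.get("id")] for ref in entity.iterfind("f:wref", NS)]
        spans.append((entity.get("class"), " ".join(words)))
    return spans


print(entity_spans(SAMPLE))
```

Because the span only refers to tokens rather than wrapping them, several span layers can cover the same tokens without having to nest inside each other.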

See also:

https://proycon.github.io/folia/

https://folia.readthedocs.io/en/latest/introduction.html

https://www.researchgate.net/publication/261215684_FoLiA_A_practical_XML_format_for_linguistic_annotation_-_A_descriptive_and_comparative_study

There is also a FoLiA Query Language (FQL) that lets you select from and also edit documents: https://folia.readthedocs.io/en/latest/fql.html

There are web annotation tools like FLAT, which build on a document server.

What does it annotate?

Things like:

relatively mechanical structure

at the macro level (e.g. paragraphs, headings, divisions, lists, figures), plus the ability to define terms and create glossaries and such

at a smaller level (e.g. whitespace, tokens, morphemes)

more semantic things like quotes, events, the difference between utterances and sentences

additional annotation types, e.g. phonetic, sentiment, language, POS, lemma, sense, reference

larger annotation, like spans and span relations

corrections

...although it may not be advisable to use it for everything it can do at once.

https://folia.readthedocs.io/en/latest/introduction.html#annotation-types

What does it look like?

https://github.com/proycon/folia/tree/master/examples

Who or what uses it?

Universities, mainly.

