Data annotation notes: Difference between revisions

From Helpful
Line 163: Line 163:
https://github.com/explosion/spaCy/tree/master/extra/example_data/ner_example_data


-->
===IOB===
<!--
IOB (Inside, Outside, Beginning), a.k.a. BIO, is a format
used to mark sequences larger than a single token,
such as in named entity recognition.
This seems to come from the habit of taking the tagging that comes out of a [[chunker]]
and annotating it in a separate data stream.
IOB can be seen as the concept of marking each token as
- the beginning of a sequence
- inside a sequence
- outside a sequence (which ends any ongoing sequence, or simply is not part of one)
Related variants: IOB/IOB2/BILUO
-->
-->
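As a rough illustration of the beginning/inside/outside idea in the notes above, here is a minimal sketch that converts token spans to per-token tags and back. It assumes the IOB2 convention (every sequence starts with a B- tag), which is only one of the variants mentioned. The function names, the example sentence, and the PER/LOC labels are made up for illustration and do not come from any particular tool.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, label); a hypothetical layout
Span = Tuple[int, int, str]


def spans_to_iob2(n_tokens: int, spans: List[Span]) -> List[str]:
    """Turn token-index spans into one tag per token (IOB2 convention)."""
    tags = ["O"] * n_tokens              # default: outside any sequence
    for start, end, label in spans:
        tags[start] = "B-" + label       # first token begins the sequence
        for i in range(start + 1, end):
            tags[i] = "I-" + label       # remaining tokens are inside it
    return tags


def iob2_to_spans(tags: List[str]) -> List[Span]:
    """Recover (start, end, label) spans from a list of IOB2 tags."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # A sequence continues only on an I- tag carrying the same label.
        continues = label is not None and tag == "I-" + label
        if label is not None and not continues:
            spans.append((start, i, label))  # O, B-, or another label ends it
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # An I- tag with no preceding B- is ignored in this sketch.
    if label is not None:
        spans.append((start, len(tags), label))  # sequence ran to the end
    return spans


tokens = ["Ada", "Lovelace", "visited", "New", "York", "."]
spans = [(0, 2, "PER"), (3, 5, "LOC")]
tags = spans_to_iob2(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

For the example sentence the tags come out as B-PER, I-PER, O, B-LOC, I-LOC, O, and `iob2_to_spans(tags)` gives back the original spans. Two adjacent sequences with the same label stay distinguishable because the second one starts with its own B- tag.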

Revision as of 20:56, 23 February 2024

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Tools

Online, open source

Label Studio


doccano


ML-Annotate


brat


annotator.js


Annotation Lab (a.k.a. NLP Lab)


(mostly online or self-hosted)


datagym


LightTag


Label Your Data


Prodigy


LabelBox


CVAT

GUI, open source

LabelImg

MAE (Multi-document Annotation Environment)


YEDDA


ELAN


Praat


Phon

Unsorted

ipyannotations

  • text (images overall)
  • python notebook


poplar


VGG Oxford University

  • varied


Annotation data formats

IOB