Linguistics software

This hasn't been updated for a while, so could be outdated (particularly if it's about something that evolves constantly, such as software).
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

See also Linguistic data and resources

Mostly POS taggers

Stanford Log-linear POS Tagger

http://nlp.stanford.edu/software/tagger.shtml

Open source and freely usable (verify)


Trigrams'n'Tags

A statistical POS tagger that is trained from tagged data, so both the language and the tagset are up to you.

http://www.coli.uni-saarland.de/~thorsten/tnt/

Licensed/commercial, no charge for research use. (verify)
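To illustrate what "trainable for any language and tagset" means in practice, here is a minimal sketch of a most-frequent-tag baseline tagger. This is not TnT's algorithm (TnT is a trigram-based tagger) and not its interface; the function names, the toy training data and the tag labels are all made up for the example. The point is only that the tagset is whatever appears in the training data.

```python
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Build a baseline model: each word's most frequent tag,
    plus the overall most frequent tag for unknown words."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            per_word[word.lower()][tag] += 1
            overall[tag] += 1
    lexicon = {w: counts.most_common(1)[0][0] for w, counts in per_word.items()}
    default_tag = overall.most_common(1)[0][0]
    return lexicon, default_tag

def tag(words, model):
    """Tag each word with its most frequent training tag,
    falling back to the default tag for unseen words."""
    lexicon, default_tag = model
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]

# Toy training data; the tag labels are hypothetical, any tagset would do.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("chases", "VERB"), ("birds", "NOUN")],
]

model = train(TRAIN)
result = tag(["The", "cat", "barks"], model)
print(result)
```

Swapping in training data for another language or another tagset changes the output labels without touching the code, which is the property the taggers in this list advertise. Real taggers add context (neighbouring tags) and better handling of unknown words.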


TreeTagger

http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html

Open source, freely usable for personal/research but not other/commercial usage. (verify)

TATOO

http://www.issco.unige.ch/staff/robert/tatoo/tatoo.html

Licensed/commercial (verify)


MuTBL

µ-TBL


CLAWS

POS tagger for English

http://ucrel.lancs.ac.uk/claws/

Lingua-EN-Tagger

http://search.cpan.org/~acoburn/Lingua-EN-Tagger/

Open source and freely usable (verify)


FnTbl

Tagger, chunker, more? (verify)


ACOPOST

A Collection of POS Taggers

http://acopost.sourceforge.net/

Open source and freely usable (verify)


Xtag



QTAG

http://www.english.bham.ac.uk/staff/omason/software/qtag.html

(Is this also the 'Birmingham tagger'?) (verify)

Open source and freely usable (verify)


Language Technology's LT POS

http://www.ltg.ed.ac.uk/software/pos/index.html

http://www.ltg.ed.ac.uk/software/ttt/

Licensed/commercial (verify)



SFST

Stuttgart Finite State Transducer Tools (SFST)




SVMTool

A Support Vector Machine based tool applied to POS tagging, with good results.

Open source (LGPL) and freely usable (verify)





MINIPAR

A parser for English for basic dependency relations

Seems to have its own ~20-item tagset, and detects ~30 types of relations.

License: free for non-commercial use.
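As a rough picture of what "basic dependency relations" look like as data, here is a small sketch that stores a parse as (head, relation, dependent) triples and queries it. This is not MINIPAR's output format or API; the class, the helper function and the relation labels are assumptions for illustration, and the parse is written by hand rather than produced by a parser.

```python
from typing import NamedTuple, Optional

class Dependency(NamedTuple):
    head: str        # the governing word
    relation: str    # label of the relation (illustrative labels only)
    dependent: str   # the word that depends on the head

# Hand-written analysis of "The dog chased a cat"; not real parser output.
PARSE = [
    Dependency("chased", "subj", "dog"),
    Dependency("chased", "obj", "cat"),
    Dependency("dog", "det", "The"),
    Dependency("cat", "det", "a"),
]

def dependents_of(head: str, parse, relation: Optional[str] = None):
    """Return the dependents of a head word, optionally filtered by relation label."""
    return [
        d.dependent
        for d in parse
        if d.head == head and (relation is None or d.relation == relation)
    ]

subjects = dependents_of("chased", PARSE, "subj")
print(subjects)
```

A dependency parser with ~30 relation types would, in this picture, emit triples whose relation field takes one of ~30 labels.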




Stanford NLP

Has a POS tagger, Named entity recognition, and more

Java

Open Source, License: GPL


Larger toolsets

Bow

"A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering", though when I used it, it was mostly for classification and basic information retrieval. Consists of:

  • Rainbow (document classification)
  • Crossbow (clustering, classification)
  • Arrow (document retrieval)




Cogito

Commercial.

Apparently has morphological, grammatical, and even logical analysis.



NLTK

Natural Language Toolkit (NLTK) is a collection of Python modules and sample data, set up for interactive exploration of a good number of computational linguistic methods related to tagging, parsing, extraction, clustering and classifying.

Open source. License: Apache 2.0 for the code; the documentation is under a Creative Commons Noncommercial, No Derivative Works license.




OpenNLP

A community project that collects a number of NLP tools.



License: varies with subproject (verify). Often LGPL, Apache 2.0, or similar (verify).





LinguaStream

IDE-based processing

Free for educational use

Java



Lingpipe

Geared somewhat to extraction tasks. Deals with POS tagging, phrase chunking, entities and their relations, and simple summaries.


Free for research use, on the condition that the resulting product is released under a free software license. Unrestricted commercial use requires a paid license.




GATE

GATE (General Architecture for Text Engineering) is a GUI for text processing, geared somewhat to extraction tasks, which also has an API.

Has a POS tagger, named entity extraction, sentence splitting, and other things.

License: LGPL



Digital sonata

http://www.digitalsonata.com/default.aspx


Freeling

Deals with POS tagging, tokenization, named entities, sentence splitting, and some morphology.


Implemented in C++ (and a little Perl).

License: GPL (since version 2), with some data under other licenses (primarily the parts from WordNet), and additional language support data (Catalan, Spanish, Italian, Galician) under the licenses of their data sources.


NooJ

  • .NET


Unsorted

Lexed

Lexicalizer from INRIA




YamCha

NER, chunking, more? (verify)



PosTech

Korean and Chinese morphological analysis

License: free for research use only; commercial use requires a license.