Linguistics software
⌛ This hasn't been updated for a while, so it could be outdated (particularly if it's about something that evolves constantly, such as software or research).
See also Linguistic data and resources
Mostly POS taggers
Stanford Log-linear POS Tagger
http://nlp.stanford.edu/software/tagger.shtml
Open source and freely usable (verify)
Trigrams'n'Tags (TnT)
A statistical POS tagger that can be trained on data for any language, using whatever tagset that data uses.
http://www.coli.uni-saarland.de/~thorsten/tnt/
Licensed/commercial, no charge for research use. (verify)
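As far as I know, TnT is a second-order (trigram) hidden Markov model tagger, and "train on your own language and tagset" means the tag inventory comes entirely from the training data. The idea can be sketched with a simpler first-order (bigram) HMM plus Viterbi decoding. This is a minimal sketch under stated assumptions, not TnT's implementation: the toy sentences, the tags, and the add-one smoothing are illustrative choices (TnT itself uses interpolated smoothing and suffix analysis for unknown words).

```python
from collections import Counter, defaultdict
import math

def train(tagged_sentences):
    """Count tag transitions and word emissions from lists of (word, tag).
    The tagset is simply whatever tags appear in the training data."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            w = word.lower()
            trans[prev][tag] += 1
            emit[tag][w] += 1
            vocab.add(w)
            prev = tag
    return trans, emit, sorted(emit), vocab

def log_prob(counter, key, n_outcomes):
    """Add-one (Laplace) smoothed log probability (an assumption, not TnT's smoothing)."""
    total = sum(counter.values())
    return math.log((counter[key] + 1) / (total + n_outcomes))

def viterbi(words, model):
    """Return the most probable tag sequence under the bigram HMM."""
    trans, emit, tags, vocab = model
    if not words:
        return []
    n_words = len(vocab) + 1  # +1 reserves probability mass for unseen words
    n_tags = len(tags)
    # best[i][tag] = (log score of best path ending in tag at i, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0].lower(), n_words), None)
             for t in tags}]
    for i in range(1, len(words)):
        w = words[i].lower()
        column = {}
        for t in tags:
            e = log_prob(emit[t], w, n_words)
            column[t] = max((best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                            for p in tags)
        best.append(column)
    # Backtrack from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

train_data = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]
model = train(train_data)
print(viterbi(["The", "cat", "barks"], model))  # → ['DT', 'NN', 'VBZ']
```

Swapping in a different tagset only means supplying differently tagged training sentences. The tagging code itself does not change.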
TreeTagger
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html
Open source; free for personal/research use, but not for other (e.g. commercial) use. (verify)
TATOO
http://www.issco.unige.ch/staff/robert/tatoo/tatoo.html
Licensed/commercial (verify)
MuTBL
µ-TBL
CLAWS
POS tagger for English
http://ucrel.lancs.ac.uk/claws/
Lingua-EN-Tagger
http://search.cpan.org/~acoburn/Lingua-EN-Tagger/
Open source and freely usable (verify)
FnTbl
Tagger, chunker, and possibly more (verify)
ACOPOST
A Collection of POS Taggers
http://acopost.sourceforge.net/
Open source and freely usable (verify)
Xtag
QTAG
http://www.english.bham.ac.uk/staff/omason/software/qtag.html
(Is this also the 'Birmingham tagger'? (verify))
Open source and freely usable (verify)
Language Technology Group's LT POS
http://www.ltg.ed.ac.uk/software/pos/index.html
http://www.ltg.ed.ac.uk/software/ttt/
Licensed/commercial (verify)
SFST
Stuttgart Finite State Transducer Tools (SFST)
SVMTool
A Support Vector Machine based tool applied to POS tagging, with good results.
Open source (LGPL) and freely usable (verify)
MINIPAR
A dependency parser for English that extracts basic dependency relations
Seems to have its own ~20-item tagset, and detects ~30 types of relations.
License: free for non-commercial use.
Stanford NLP
Includes a POS tagger, named entity recognition, and more
Java
Open Source, License: GPL
Larger toolsets
Bow
"A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering", though when I used it, it was mostly classification and basic Information Retrieval. Consists of:
- Rainbow (document classification)
- Crossbow (clustering, classification)
- Arrow (document retrieval)
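For the document classification side (Rainbow), a classic method is multinomial naive Bayes; if I remember correctly it is Rainbow's default, with other methods available (verify). The sketch below shows that technique in plain Python with add-one smoothing. It is not Bow's code or API, and the toy documents and labels are made up for illustration.

```python
from collections import Counter, defaultdict
import math

def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()

def train_nb(docs):
    """docs: list of (text, label). Returns per-class document counts,
    per-class word counts, and the training vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return class_docs, word_counts, vocab

def classify(text, model):
    """Pick the label maximizing log P(label) + sum of log P(word | label)."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total = sum(word_counts[label].values())
        score = math.log(class_docs[label] / n_docs)
        for w in tokenize(text):
            if w not in vocab:
                continue  # words never seen in training are ignored
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

training = [
    ("the match ended with a late goal", "sports"),
    ("the team won the cup final", "sports"),
    ("the parliament passed the budget", "politics"),
    ("the minister resigned after the vote", "politics"),
]
model = train_nb(training)
print(classify("a goal in the final", model))  # → sports
```

Working in log space avoids underflow when documents are long, which is why the scores are sums rather than products.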
Cogito
Commercial.
Apparently has morphological, grammatical, and even logical analysis.
NLTK
Natural Language Toolkit (NLTK) is a collection of Python modules and sample data, set up for interactive exploration of a good number of computational linguistic methods related to tagging, parsing, extraction, clustering and classification.
Open Source, License: Noncommercial, No Derivative Works
OpenNLP
A community project that collects a number of tools
Includes:
- OpenNLP maxent [2]
- OpenCCG, a Combinatory Categorial Grammar parser/generator [3] [4]
- ...and more, see http://opennlp.sourceforge.net/projects.html
License: varies with subproject (verify); often LGPL, Apache 2.0, or similar (verify).
LinguaStream
IDE-based processing
Free for educational use
Java
Lingpipe
Geared somewhat to extraction tasks. Deals with POS tagging, phrase chunking, entities and their relations, simple summaries
Free for research use (with the restriction that the resulting product must be released under a free software license); unrestricted commercial use requires a paid license.
GATE
GATE (General Architecture for Text Engineering) is a GUI for text processing, geared somewhat to extraction tasks, which also has an API.
Has a POS tagger, named entity extraction, sentence splitting, and other components
License: LGPL
See also:
- http://en.wikipedia.org/wiki/General_Architecture_for_Text_Engineering
- ANNIE, the IE component
Digital sonata
http://www.digitalsonata.com/default.aspx
Freeling
Deals with POS tagging, tokenization, Named entities, sentence splitting, some morphology
Implemented in C++ (and a little Perl).
License: GPL (since version 2), with some data under other licenses (primarily the parts from WordNet), and additional language support data (Catalan, Spanish, Italian, Galician) under the licenses of their data sources.
NooJ
- .NET
Unsorted
Lexed
Lexicalizer from INRIA
YamCha
PosTech
Korean and Chinese Morphological Analysis
License: research only; commercial use needs license