Linguistics software


⌛ This hasn't been updated for a while, so could be outdated (particularly if it's about something that evolves constantly, such as software or research).

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Mostly POS taggers

Stanford Log-linear POS Tagger

http://nlp.stanford.edu/software/tagger.shtml

Open source and freely usable (verify)

Trigrams'n'Tags

A statistical POS tagger that allows training choices in both language and the tagset used.

http://www.coli.uni-saarland.de/~thorsten/tnt/

Licensed/commercial, no charge for research use. (verify)
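To make "statistical POS tagger" a bit more concrete: TnT is a trigram HMM tagger, and the general idea can be sketched with a simpler bigram HMM decoded with Viterbi. This is only an illustration of the technique, not TnT's implementation; the toy corpus, the add-one smoothing, and all function names are assumptions made up for this example. Because the model is just counts over (word, tag) pairs, swapping the training data swaps both the language and the tagset.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: sentences as lists of (word, tag) pairs.
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]
START = "<s>"  # pseudo-tag that precedes every sentence


def train(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sent in sentences:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` given the counts."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit):
    """Return the most probable tag sequence under the bigram HMM."""
    if not words:
        return []
    tags = list(emit)
    n_tags = len(tags)
    # Vocabulary size, plus one slot for any unknown word.
    n_words = len({w for c in emit.values() for w in c}) + 1
    # best[i][t] = (log score of best path ending in tag t at i, previous tag)
    best = [{}]
    for t in tags:
        score = log_prob(trans[START], t, n_tags) + log_prob(emit[t], words[0], n_words)
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


trans, emit = train(TRAIN)
print(viterbi(["the", "dog", "sleeps"], trans, emit))  # ['DT', 'NN', 'VBZ']
```

A real tagger would add a trigram context, better smoothing, and suffix-based handling of unknown words; the sketch only shows the count-then-decode structure.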

TreeTagger

http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html

Open source; freely usable for personal and research purposes, but not for commercial use. (verify)

TATOO

http://www.issco.unige.ch/staff/robert/tatoo/tatoo.html

Licensed/commercial (verify)

MuTBL

µ-TBL

See also:

http://www.ling.gu.se/~lager/mutbl.html

CLAWS

POS tagger for English

http://ucrel.lancs.ac.uk/claws/

Lingua-EN-Tagger

http://search.cpan.org/~acoburn/Lingua-EN-Tagger/

Open source and freely usable (verify)

FnTbl

Tagger, chunker, more? (verify)

See also:

http://nlp.cs.jhu.edu/~rflorian/fntbl/

ACOPOST

A Collection of POS Taggers

http://acopost.sourceforge.net/

Open source and freely usable (verify)

Xtag

See also:

http://www.cis.upenn.edu/~xtag/

QTAG

http://www.english.bham.ac.uk/staff/omason/software/qtag.html

Is this also the 'Birmingham tagger'? (verify)

Open source and freely usable (verify)

Language Technology's LT POS

http://www.ltg.ed.ac.uk/software/pos/index.html

http://www.ltg.ed.ac.uk/software/ttt/

Licensed/commercial (verify)

SFST

Stuttgart Finite State Transducer Tools (SFST)

See also:

http://www.ims.uni-stuttgart.de/projekte/gramotron/SOFTWARE/SFST.html

SVMTool

A Support Vector Machine-based tool, applied to POS tagging with reportedly good results.

Open source (LGPL) and freely usable (verify)
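Classifier-based taggers like this typically turn each token into a feature vector built from a sliding window of surrounding words and already-assigned tags, and let the classifier pick the tag. A rough sketch of that feature-extraction step follows; the feature names, the window size, and the function itself are assumptions for illustration and are not taken from SVMTool.

```python
def window_features(words, i, prev_tags, size=2):
    """Build a feature dict for words[i] from a +/- `size` token window.

    `prev_tags` holds the tags already assigned to words[0:i]
    (left-to-right tagging), so only left-hand tag context is available.
    """
    feats = {}
    # Word forms in the window, padded at the sentence boundaries.
    for off in range(-size, size + 1):
        j = i + off
        feats[f"w[{off}]"] = words[j].lower() if 0 <= j < len(words) else "<pad>"
    # Tags of up to `size` preceding tokens, nearest first.
    for k, tag in enumerate(reversed(prev_tags[-size:]), start=1):
        feats[f"t[-{k}]"] = tag
    # A few simple word-shape features for the current token.
    word = words[i]
    feats["suffix3"] = word[-3:].lower()
    feats["is_capitalized"] = word[:1].isupper()
    feats["has_digit"] = any(c.isdigit() for c in word)
    return feats


feats = window_features(["The", "dog", "barks"], 1, ["DT"])
print(feats["w[-1]"], feats["w[0]"], feats["t[-1]"])  # the dog DT
```

The resulting dicts would then be vectorized and fed to whatever classifier is used; that part is left out here.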


MINIPAR

A parser for English for basic dependency relations

Seems to have its own ~20-item tagset, and detects ~30 types of relations.

License: free for non-commercial use.

See also:

http://www.cs.ualberta.ca/~lindek/minipar.htm

Stanford NLP

Has a POS tagger, Named entity recognition, and more

Java

Open Source, License: GPL

See also:

http://nlp.stanford.edu/software/index.shtml

Larger toolsets

Bow

"A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering", though when I used it, it mostly covered classification and basic information retrieval. Consists of:

Rainbow (document classification)
Crossbow (clustering, classification)
Arrow (document retrieval)

See also:

http://www.cs.cmu.edu/~mccallum/bow/
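For a sense of what statistical document classification of this kind amounts to, here is a minimal multinomial naive Bayes sketch in plain Python. It does not use Bow or mirror Rainbow's code; naive Bayes is assumed here as a representative method, and the class name, labels, and training sentences are invented for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace-separated, lowercased words."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of documents
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def train(self, docs):
        for text, label in docs:
            self.doc_counts[label] += 1
            for w in text.lower().split():
                self.word_counts[label][w] += 1
                self.vocab.add(w)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            # Log prior plus add-one smoothed log likelihood of each word.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in text.lower().split():
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][w] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy training set.
nb = NaiveBayes()
nb.train([
    ("the striker scored a late goal", "sports"),
    ("the team won the match", "sports"),
    ("parliament passed the budget vote", "politics"),
    ("the minister lost the election", "politics"),
])
print(nb.classify("the team scored a goal"))  # sports
```

Real toolkits add proper tokenization, stopword handling, feature selection, and evaluation on held-out data on top of this core.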

Cogito

Commercial.

Apparently offers morphological, grammatical, and even logical analysis.

See also:

http://www.expertsystem.net/page.asp?id=1521

NLTK

Natural Language Toolkit (NLTK) is a collection of Python modules and sample data, set up for interactive exploration of a good number of computational linguistic methods related to tagging, parsing, extraction, clustering and classification.

Open source. License: Apache 2.0 for the code; the documentation is under a Creative Commons Noncommercial, No Derivative Works license.


OpenNLP

An umbrella project and community that collects a number of NLP tools.

Includes:

OpenNLP maxent
OpenCCG, a Combinatory Categorial Grammar parser/generator
...and more, see http://opennlp.sourceforge.net/projects.html

License: varies with subproject (verify); often LGPL, Apache 2.0, or similar (verify).

See also:

http://opennlp.sourceforge.net/

LinguaStream

IDE-based processing

Free for educational use

Java


Lingpipe

Geared somewhat to extraction tasks. Deals with POS tagging, phrase chunking, entities and their relations, and simple summaries.

Free for research use, on condition that the resulting product is released under a free software license; unrestricted commercial use requires a paid license.


GATE

GATE (General Architecture for Text Engineering) is a GUI for text processing, geared somewhat to extraction tasks, which also has an API.

Has a POS tagger, Named entity extraction, sentence splitting, and other things

License: LGPL

See also:

http://en.wikipedia.org/wiki/General_Architecture_for_Text_Engineering
ANNIE, the IE component

Digital sonata

http://www.digitalsonata.com/default.aspx

Freeling

Deals with POS tagging, tokenization, named entities, sentence splitting, and some morphology.

Implemented in C++ (and a little Perl).

License: GPL (since version 2), with some data under other licenses (primarily the parts from WordNet), and additional language support data (Catalan, Spanish, Italian, Galician) under the licenses of their data sources.
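Tokenization and sentence splitting are the first steps in most of these toolkits. A naive regex-based sketch of both is below, mainly to show where the simple approach breaks; the patterns and function names are assumptions for this example and have nothing to do with how FreeLing does it.

```python
import re

# Split on whitespace that follows ., ! or ? and precedes a capital or a quote.
_SENT_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'])")
# A token is a word (allowing internal hyphens/apostrophes) or one punctuation mark.
_TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def split_sentences(text):
    """Naive sentence splitter; knows nothing about abbreviations."""
    return [s for s in _SENT_END.split(text.strip()) if s]


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)


for sent in split_sentences("It rained. We stayed in! Why not?"):
    print(tokenize(sent))
```

The obvious failure case is abbreviations: "Dr. Smith arrived." is split after "Dr.", which is why real splitters carry abbreviation lists or trained models.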

NooJ

.NET

See also:

http://www.nooj4nlp.net/

Unsorted

Lexed

Lexicalizer from INRIA

See also:

http://www.lionel-clement.net/lexed

YamCha

NER, chunking, more? (verify)

See also:

http://chasen.org/~taku/software/yamcha/

PosTech

Korean and Chinese Morphological Analysis

License: research only; commercial use needs license
