Linguistics software

See also Linguistic data and resources

Mostly POS taggers

Stanford Log-linear POS Tagger

Open source and freely usable (verify)


A statistical POS tagger that allows training choices in both language and the tagset used.
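To make "training choices in both language and the tagset" concrete, here is a minimal sketch of the simplest statistical tagger: a most-frequent-tag (unigram) baseline. It is not how any of the tools listed here work internally. The function names, the toy training data and the tag labels are all made up for illustration. The point is only that both the language and the tagset come entirely from whatever tagged data you train on.

```python
from collections import Counter, defaultdict

def train_unigram_tagger(tagged_sentences):
    """Learn the most frequent tag per word from (word, tag) training data."""
    counts = defaultdict(Counter)   # word -> Counter of tags seen for it
    all_tags = Counter()            # overall tag frequencies
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    model = {word: tags.most_common(1)[0][0] for word, tags in counts.items()}
    default_tag = all_tags.most_common(1)[0][0]  # fallback for unseen words
    return model, default_tag

def tag(tokens, model, default_tag):
    """Tag each token with its learned tag, or the fallback if unseen."""
    return [(tok, model.get(tok.lower(), default_tag)) for tok in tokens]

# Hypothetical toy training data; the tagset is whatever the data uses.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("old", "ADJ"), ("cat", "NOUN"),
     ("chases", "VERB"), ("mice", "NOUN")],
]

model, default_tag = train_unigram_tagger(TRAIN)
result = tag(["The", "cat", "barks", "loudly"], model, default_tag)
print(result)
```

Real taggers add context (neighbouring words and tags) and better handling of unknown words, which is where most of their accuracy comes from.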

Licensed/commercial, no charge for research use. (verify)


Open source, freely usable for personal/research but not other/commercial usage. (verify)


Licensed/commercial (verify)



POS tagger for English


Open source and freely usable (verify)


tagger, chunker, more?(verify)

A Collection of POS Taggers

Open source and freely usable (verify)


(is this also the 'Birmingham tagger'?)(verify)

Open source and freely usable (verify)

Language Technology's LT POS

Licensed/commercial (verify)


Stuttgart Finite State Transducer Tools (SFST)

A Support Vector Machine based tool applied to POS tagging, with good results.

Open source (LGPL) and freely usable (verify)

A parser for English for basic dependency relations

Seems to have its own ~20-item tagset, and detects ~30 types of relations.

License: free for non-commercial use.

Stanford NLP

Has a POS tagger, Named entity recognition, and more


Open Source, License: GPL

Larger toolsets


"A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering", though when I used it, it was mostly for classification and basic information retrieval. Consists of:

  • Rainbow (document classification)
  • Crossbow (clustering, classification)
  • Arrow (document retrieval)
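As a rough idea of what statistical document classification involves, here is a minimal multinomial naive Bayes sketch using only the standard library. It is an illustration of the general technique, not the toolkit's own code or interface. The function names and the four training documents are invented for the example.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label). Collect word counts per label."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(text, word_counts, label_counts, vocab):
    """Return the label with the highest log-probability for the text."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing so unseen pairs are not zero.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy corpus.
DOCS = [
    ("the striker scored a late goal", "sport"),
    ("the team won the match", "sport"),
    ("parliament passed the new law", "politics"),
    ("the minister gave a speech", "politics"),
]

wc, lc, vocab = train(DOCS)
label_a = classify("the team scored", wc, lc, vocab)
label_b = classify("the minister passed a law", wc, lc, vocab)
print(label_a, label_b)
```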

Apparently has morphological, grammatical, and even logical analysis.

Natural Language Toolkit (NLTK) is a collection of Python modules and sample data, set up for interactive exploration of a good number of computational linguistic methods related to tagging, parsing, extraction, clustering and classifying.

Open Source. License: Apache 2.0 for the code; the documentation is under Creative Commons Attribution-Noncommercial-No Derivative Works (verify)

A community project that collects a number of tools


License: Varies with subproject(verify). Often LGPL, Apache 2.0, or such(verify).

IDE-based processing

Free for educational use


Geared somewhat to extraction tasks. Deals with POS tagging, phrase chunking, entities and their relations, simple summaries

Free for research (on the condition that the resulting product is released under a free software license); unrestricted commercial use requires a paid license.

GATE (General Architecture for Text Engineering) is a GUI for text processing, geared somewhat to extraction tasks, which also has an API.

Has a POS tagger, Named entity extraction, sentence splitting, and other things

License: LGPL

Digital sonata


Deals with POS tagging, tokenization, Named entities, sentence splitting, some morphology
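For a sense of what sentence splitting and tokenization mean at their most basic, here is a naive regex-based sketch. It is not this toolkit's method, and the function names are made up. It will mis-split on abbreviations such as "Dr. Smith", which is exactly the kind of case real tools handle with abbreviation lists or trained models.

```python
import re

def split_sentences(text):
    """Naively split after ., ! or ? when followed by whitespace and a capital."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())

def tokenize(sentence):
    """Split into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "The tagger ran. It found 12 nouns! Was that right?"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
print(sentences)
print(tokens[0])
```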

Implemented in C (and a little Perl).

License: GPL (since version 2), with some data under other licenses (primarily the parts from WordNet), and additional language support data (Catalan, Spanish, Italian, Galician) under the licenses of their data sources.


  • .NET

Lexicalizer from INRIA

NER, Chunking, more?(verify)

Korean and Chinese Morphological Analysis

License: research only; commercial use needs license