Linguistics software

⌛ This hasn't been updated for a while, so could be outdated (particularly if it's about something that evolves constantly, such as software or research).
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

See also Linguistic data and resources

Mostly POS taggers

Stanford Log-linear POS Tagger

Open source and freely usable (verify)


A statistical POS tagger that allows training choices in both language and the tagset used.
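To illustrate what "training choices in both language and tagset" means in practice, here is a minimal sketch of a trainable tagger. It is not the log-linear model this tagger uses, just a most-frequent-tag baseline in plain Python. The function name `train_unigram_tagger`, the toy training data, and the tag names are all made up for the example. The point is only that the language and the tagset come entirely from whatever tagged data you train on.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sentences, default_tag):
    """Learn the most frequent tag for each word from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    # Keep only the single most frequent tag per word.
    model = {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

    def tag(tokens):
        # Unknown words fall back to the caller-supplied default tag.
        return [(tok, model.get(tok.lower(), default_tag)) for tok in tokens]

    return tag


# Hypothetical toy training data; the tagset is invented for illustration.
training = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
tagger = train_unigram_tagger(training, default_tag="NOUN")
result = tagger(["the", "cat", "barks", "loudly"])
print(result)
```

Swapping in training data for another language, or with another tagset, changes the output tags without touching the code. Real statistical taggers add context features on top of this idea.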

Licensed/commercial, no charge for research use. (verify)


Open source, freely usable for personal/research but not other/commercial usage. (verify)


Licensed/commercial (verify)



See also:


POS tagger for English


Open source and freely usable (verify)


Tagger, chunker, more? (verify)

See also:


A Collection of POS Taggers

Open source and freely usable (verify)


See also:


(is this also the 'Birmingham tagger'?)(verify)

Open source and freely usable (verify)

Language Technology's LT POS

Licensed/commercial (verify)


Stuttgart Finite State Transducer Tools (SFST)

See also:


A Support Vector Machine based tool applied to POS tagging, with good results.

Open source (LGPL) and freely usable (verify)

See also:


A parser for English for basic dependency relations

Seems to have its own ~20-item tagset, and detects ~30 types of relations.

License: free for non-commercial use.

See also:

Stanford NLP

Has a POS tagger, Named entity recognition, and more


Open Source, License: GPL

See also:

Larger toolsets


"A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering", though when I used it, it was mostly for classification and basic information retrieval. Consists of:

  • Rainbow (document classification)
  • Crossbow (clustering, classification)
  • Arrow (document retrieval)
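For a rough idea of what a document classification step involves, here is a small sketch in plain Python. It uses naive Bayes with add-one smoothing as a stand-in technique. It does not use this toolkit's interface or claim to match its methods, and the function names and toy documents are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """Collect per-label word counts from (label, text) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for label, text in labelled_docs:
        words = text.lower().split()
        word_counts[label].update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior plus add-one smoothed log likelihood of each word.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, two documents per label.
docs = [
    ("linguistics", "the tagger labels each word with a part of speech"),
    ("linguistics", "the parser finds dependency relations between words"),
    ("cooking", "boil the pasta and add salt to the sauce"),
    ("cooking", "bake the bread until the crust is golden"),
]
model = train_naive_bayes(docs)
label = classify("a parser and a tagger for words", model)
print(label)
```

A real toolkit would add proper tokenization, stopword handling, and feature selection; this only shows the train-then-classify shape of the task.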

See also:



Apparently has morphological, grammatical, and even logical analysis.

See also:


Natural Language Toolkit (NLTK) is a collection of Python modules and sample data, set up for interactive exploration of a good number of computational linguistic methods related to tagging, parsing, extraction, clustering and classifying.

Open Source. License: Apache 2.0 for the source code; the documentation is under a Creative Commons Noncommercial, No Derivative Works license. (verify)

See also:


A community project that collects a number of tools.


License: Varies with subproject (verify). Often LGPL, Apache 2.0, or similar (verify).

See also:


IDE-based processing

Free for educational use


See also:


Geared somewhat to extraction tasks. Deals with POS tagging, phrase chunking, entities and their relations, and simple summaries.

Free for research use, with the restriction that the resulting product must be released under a free software license. Unrestricted commercial use requires a paid license.

See also:


GATE (General Architecture for Text Engineering) is a GUI for text processing, geared somewhat to extraction tasks, which also has an API.

Has a POS tagger, Named entity extraction, sentence splitting, and other things

License: LGPL

See also:

Digital sonata


Deals with POS tagging, tokenization, Named entities, sentence splitting, some morphology

Implemented in C (and a little Perl).

License: GPL (since version 2), with some data under other licenses (primarily the parts from WordNet), and additional language support data (Catalan, Spanish, Italian, Galician) under the licenses of their data sources.


  • .NET

See also:



Lexicalizer from INRIA

See also:


NER, chunking, more? (verify)

See also:


Korean and Chinese Morphological Analysis

License: research only; commercial use needs license