Linguistic Tools

The tools listed below support morphological analysis, natural language processing, tagging, transcription, concordance building, speech analysis, and other kinds of linguistic research.



  • AGTK (Annotation Graph Toolkit): "a suite of software components for building tools for annotating linguistic signals, time-series data which documents any kind of linguistic behavior (e.g. audio, video). The internal data structures are based on annotation graphs" (Open source, Java-based/Windows)

  • BaseLing: "an online database for linguistic data...a repository of pedagogical exercises for use by students and instructors" (Free, web-based)

  • CiteLing: a "master bibliography (*.bib) for all your linguistic needs" (Free, web-based)

  • CLaRK: "XML-based system for corpora development"; used for corpora markup, dictionary compilation for human users, and corpora investigation (Free software written in Java)

  • CMLaTeX: "a collection of information, links, advice, and bits of code for Linguists who use LaTeX"; also includes downloads for linguistic and LaTeX resources (Free, web-based)

  • Computing Optimality with Python: "a tutorial on implementing OT in Python"; includes chunks of Python code and a how-to (Free, web-based)

  • ConstraintWiki: "a repository for constraints used in Optimality Theory analyses" (Free, web-based)

  • EXMARaLDA: a system for creating, managing and analysing spoken language corpora

  • Erculator: a "web-based software that lets you create Optimality Theoretic (OT) tableaux, check their consistency, make inferences about winning candidates and plausible constraint rankings within and across tableaux, explore language typologies, and generate images (png, ps, pdf, Latex) of tableaux for direct inclusion in Word, LaTeX, and other documents" (Free, web-based)

  • JGAAP: a Java-based, modular program for textual analysis, text categorization, and authorship attribution

  • MonoConc: a "concordance (text searching) program...used in the analysis of English or other texts...also produces wordlists and collocation information" (Commercial, Windows)

  • Natural Language Toolkit: "Open source Python modules, linguistic data and documentation for research and development in natural language processing, supporting dozens of NLP tasks, with distributions for Windows, Mac OSX and Linux." (Free, Windows/Mac/Linux)

  • Praat (doing phonetics by computer): "a computer program with which you can analyse, synthesize, and manipulate speech, and create high-quality pictures for your articles and thesis" (Open source, cross-platform)

  • Saplo: a text analysis API offering text recommendations, text filtering, text categorization, automatic tagging, automatic related-article suggestions, and sentiment analysis (Free trial, with special offers for researchers and universities; web-based)

  • Stanford POS Tagger: "a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc." (Open source, cross-platform)

  • TACTWeb: text analysis software that makes TACT TDB databases available on the web to both TACT and non-TACT users (Free, web-based)

  • TIGERSearch: a search program that "lets you explore linguistically annotated texts. For example, a lexicographer or terminologist can use TIGERSearch to find out about lexical properties of a word like the collocations the word is used in." (Open source, cross-platform)

  • Toolbox (The Field Linguist’s Toolbox): "a data management and analysis tool for field linguists. It is especially useful for maintaining lexical data, and for parsing and interlinearizing text, but it can be used to manage virtually any kind of data." (Free, cross-platform)

  • Transana: "a computer program that allows researchers to transcribe and analyze large collections of video and audio data" (Commercial and open-source versions; Windows/Mac)

  • Transformer: "a software tool for scientists who work with transcribed linguistic data. It addresses conversation analysts, phoneticians, anthropologists, and other social scientists who want to analyze digital audio or video data and language. The Transformer is a program to manage and convert transcribed linguistic and aligned data in a quick, safe, and easy way." (Commercial, Windows)

  • VoiceWalker: "a transcriber's tool, designed to help you transcribe audio or video recordings. VoiceWalker lets you play back the sound in a controlled way, with the benefit of being able to systematically step (or 'walk') through a recording, repeating short segments for a specified number of repetitions, then moving on to the next segment" (Free, Windows)

  • WMatrix: "a software tool for corpus analysis and comparison. It provides a web interface to the USAS and CLAWS corpus annotation tools, and standard corpus linguistic methodologies such as frequency lists and concordances. It also extends the keywords method to key grammatical categories and key semantic domains." (Annual subscription, web-based)
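Several of the tools above (MonoConc, WMatrix) build concordances and word lists. As a rough illustration of what those outputs are, here is a minimal sketch of a keyword-in-context (KWIC) concordance and a frequency list in plain Python. It assumes a naive regex tokenizer (lowercased letters and apostrophes only), and the function names `tokenize`, `kwic`, and `wordlist` are hypothetical; this is not how any of the listed programs is implemented. NLTK's `Text.concordance` method provides a ready-made equivalent.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of keyword."""
    keyword = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def wordlist(tokens, n=10):
    # Frequency list: the n most common tokens with their counts.
    return Counter(tokens).most_common(n)


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat. The dog saw the cat.")
    # Right-align the left context so the keyword forms a centered column.
    for left, kw, right in kwic(tokens, "cat", width=2):
        print(f"{left:>15} [{kw}] {right}")
    print(wordlist(tokens, 2))
```

Aligning the keyword in a single column is the defining layout of a KWIC display; real concordancers add sorting by left or right context, lemma matching, and collocation statistics on top of this basic scan.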




See also:


