Best Open Source Windows Linguistics Software 2026

Linguistics Software for Windows

Linguistics Windows Information Analysis Clear Filters

Browse free open source Linguistics software and projects for Windows below. Use the toggles on the left to filter open source Linguistics software by OS, license, language, programming language, and project status.

Find Hidden Risks in Windows Task Scheduler
Free diagnostic script reveals configuration issues, error patterns, and security risks. Instant HTML report.

Windows Task Scheduler might be hiding critical failures. Download the free JAMS diagnostic tool to uncover problems before they impact production—get a color-coded risk report with clear remediation steps in minutes.

Download Free Tool
AI-powered service management for IT and enterprise teams
Enterprise-grade ITSM, for every business

Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.

Try it Free
1

iramuteq

IRAMUTEQ : Interface de R pour les Analyses Multidimensionnelles de Textes et de Questionnaires. Logiciel de traitement de données pour des corpus texte ou de type individus/caractères. Permet notamment de réaliser des analyses de type "ALCESTE"

Downloads: 810 This Week

Last Update: 2024-11-03
See Project
2

Wordcorr

Data management for comparative linguistics

Wordcorr automates the tedious and risky process of tabulating and managing the sound correspondences used in working out the historical development of natural languages. Initial support was from NSF.

4 Reviews

Downloads: 12 This Week

Last Update: 2013-01-05
See Project
3

SPPAS

SPPAS - the automatic annotation and analyses of speech

SPPAS is a scientific computer software package written and maintained by Brigitte Bigi of the Laboratoire Parole et Langage, in Aix-en-Provence, France. Available for free, with open source code, there is simply no other package for linguists to simple use in the automatic annotations of speech, the analyses of any kind of annotated data and the conversion of annotated files. SPPAS is able to produce automatically speech annotations from a recorded speech sound and its orthographic transcription. SPPAS is helpful for the analysis of any annotated data: estimate statistical distributions, make requests, manage files, visualize annotations. SPPAS offers a file converter from/to a wide range of formats: xra, TextGrid, eaf, trs... <https://sppas.org>

Downloads: 11 This Week

Last Update: 2026-01-26
See Project
4

BioC

We describe a simple XML format to share text documents and annotation

A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented. Project files contain: - simple code to hold/read/write data and perform sample processing. - BioC-formatted corpora - BioC tools that work with BioC corpora BioC goals - simplicity - interoperability - broad use - reuse There should be little investment required to learn to use a format or a software module to process that format. We are interested in reuse, and we focus on common NLP tasks that are broadly useful for textmining.

Downloads: 6 This Week

Last Update: 2016-08-08
See Project
Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
5

Ghawwas_V4

An open source system for Arabic corpora processing

Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions: a. Frequency list for single word and N-Grams b. Concordance c. Collocation (MI, CHI Squared, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient) d. Lexical patterns search e. Two corpora frequency profile comparison based on MI, CHI, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient f. Accept Windows and UTF-8 character encoding g. Accept TXT, DOC, DOCX, RTF and HTML formats h. Export the processing results in CSV file format

1 Review

Downloads: 3 This Week

Last Update: 2018-12-09
See Project
6

HanNanum - Korean POS Tagger

HanNanum is a Korean Morphological Analyzer and POS Tagger. A plug-in component-based architecture is adapted to the new Java version for flexible use. You can find the work flow for morphological analysis, POS tagging, noun extraction, etc. Contact: kschoi@kaist.ac.kr hjjeong@world.kaist.ac.kr

2 Reviews

Downloads: 1 This Week

Last Update: 2015-08-02
See Project
7

Rhyme Analyzer

A lyrical analysis and classification tool focused specifically on rhyming style in rap lyrics. Functions include phonetic transcription, rhyme visualization, and rapper classification.

Downloads: 2 This Week

Last Update: 2013-04-23
See Project
8

Wikipedia Categories

SQL file and code for finding the categories and taxonomy of Wikipedia. Includes source code. Go to Downloads to get both.

1 Review

Downloads: 1 This Week

Last Update: 2013-04-09
See Project
9

Aramorpher

Based on the Buckwalter Morphological Analyzer (Version 1.0) for doing Arabic stemming and POS tagging. Includes a rewrite of the original Perl script, with better documentation and more flexible options, and a C++ interface (usable as a library or app).

Downloads: 1 This Week

Last Update: 2016-08-09
See Project
Grafana: The open and composable observability platform
Faster answers, predictable costs, and no lock-in built by the team helping to make observability accessible to anyone.

Grafana is the open source analytics & monitoring solution for every database.

Learn More
10

Khawas

An Arabic Corpora Processing Tool

The new version is available at https://sourceforge.net/projects/ghawwasv4/

Downloads: 1 This Week

Last Update: 2014-08-02
See Project
11

Maui Topic Indexer

Maui is a multi-purpose automatic topic indexing algorithm. Given a document, Maui automatically identifies its topics. Depending on the task topics are tags, keywords, keyphrases, vocabulary terms, descriptors or Wikipedia titles.

Downloads: 1 This Week

Last Update: 2014-04-25
See Project
12

Text Expander, Inverse summarizer

Expand text, inverse summarizer

IT WILL WORK WITH A JAVA DEVELOPMENT KIT 1.7 ONLY !!! This program is a data-miner and a knowledge-miner. It does exactly the opposite of what the text summarizers do. A text summarizer produces a shortened text given some text as an input. An inverse summarizer takes the shortened input, a similar or a same text and does the process in reverse. This results in an expanded text. It can be used with any text or notes that have the knowledge gaps. It is a great aid to any creative work and it simply pin-points to data that may be of some relevance. How to run this program? 1. Make sure JAVA 1.7 development is installed and running/compiling properly with all environment variables properly set. 2. Uncompress the LITE release into a desired directory. 3. Go to a src/ directory with a Terminal/Console/Command-Prompt and do the following: javac Te2.java java Te2 The program should open.

Downloads: 1 This Week

Last Update: 2016-02-28
See Project
13

Welsh Natural Language Toolkit

WNLT is a suite of open source natural language modules for the Welsh

The project supports the Welsh Language Technology domain with a set of NLP tools that drive innovation and advance the development of sophisticated textual analysis solutions. The WNLT project delivers four core NLP modules; a) Word Segmentation for separating text into words b) Sentence Boundary Disambiguation for finding sentence boundaries c) Part of Speech Tagger for determining the part of speech of each word d) Morphological Analyser for identifying the root form (lemma) of words. The modules are written in JAVA and ‘wrapped’ for execution under the General Architecture for Text Engineering (GATE) framework. The project also includes CYMRIE an adapted version for Welsh of the GATE - ANNIE Named Entity Recognition (NER) application for a range of entities such as Persons, Organisations, Locations, and date and time expressions.

Downloads: 1 This Week

Last Update: 2016-11-29
See Project
14

CRFSharp

CRFSharp is a .NET(C#) implementation of Conditional Random Field

CRFSharp(aka CRF#) is a .NET(C#) implementation of Conditional Random Fields, an machine learning algorithm for learning from labeled sequences of examples. It is widely used in Natural Language Process (NLP) tasks, for example: word breaker, postagging, named entity recognized, query chunking and so on. CRF#'s mainly algorithm is the same as CRF++ written by Taku Kudo. It encodes model parameters by L-BFGS. Moreover, it has many significant improvement than CRF++, such as totally parallel encoding, optimizing memory usage and so on. Currently, when training corpus, compared with CRF++, CRF# can make full use of multi-core CPUs and only uses very low memory, and memory grow is very smoothly and slowly while amount of training corpus, tags increase. with multi-threads process, CRF# is more suitable for large data and tags training than CRF++ now. For example, in machine with 64GB, CRF# encodes model with more than 4.5 hundred million features quickly.

Downloads: 0 This Week

Last Update: 2015-08-03
See Project
15

Colloquium QDA

A free and open source qualitative ethnographic interview coding tool.

Colloquium QDA is a tool for custom coding and analyzing qualitative ethnographic interviews. To run, make sure you first have JRE 8 or later installed (http://www.oracle.com/technetwork/java/javase/downloads/). Colloquium QDA is an open source cross-platform Java Swing app utilizing an embedded Java DB with Lucene integrated search.

Downloads: 0 This Week

Last Update: 2017-01-23
See Project
16

DCTFinder

Extract title and creation time from web page.

Web pages do not offer reliable metadata concerning their creation date and time. However, getting the document creation time is a necessary step for allowing to apply temporal normalization systems to web pages. DCTFinder is a system that parses a web page and extracts from its content the title and the creation date of this web page. DCTFinder combines heuristic title detection, supervised learning with Conditional Random Fields (CRFs) for document date extraction, and rule-based creation time recognition. DCTFinder is released under CeCILL free software license agreement. The system is described in the following paper (see 'Files' section): Xavier Tannier. "Extracting News Web Page Creation Time with DCTFinder". Proceedings of the 9th Language Resources and Evaluation Conference. Reykjavik, Iceland.

Downloads: 0 This Week

Last Update: 2016-10-21
See Project
17

ELIA(eye-tracking for psycholinguistics)

ELIA(Eyegaze Language Integration Analysis) supports the analysis of eye-tracking data for studies in language processing. ELIA eases early analysis of data to enable iterative development of experiments in response to spoken language.

1 Review

Downloads: 0 This Week

Last Update: 2013-04-24
See Project
18

EyeMap - Eye Movement Data Analyzer

EyeMap is a visualization and analysis tool for text reading eye movement data. It can process Unicode, proportion/non-proportion and spaced/unspaced reading materials, which supports various languages and experiment methods.

1 Review

Downloads: 0 This Week

Last Update: 2013-08-10
See Project
19

FALCON - Text Search Java Project

JSON based text search Java Project

----------------- - What is it? - ----------------- The "Falcon Search" is a JAVA API and tool to search inside the documents. It was originally started to search the content in pdf files under the project "HAWK Search". Searching with this tool is query-based not word-based as in most of the document search tools OR document readers. It also takes care of jumbling of words within query and spelling mistakes. Commonly used techniques in this project are Natural Language Processing, Information Extraction and Question-Answering Architecture. ---------------------- - Latest Version - ---------------------- Details of latest version can be found on project website - http://geekdadaji.com --------------------------- - CONTACT DETAILS - --------------------------- CREATOR : SWAPNIL A JADHAV (saj1919) EMAIL ID : dadajibudhau@gmail.com WEBSITE : http://geekdadaji.com LICENSE : CC BY-NC 4.0

Downloads: 0 This Week

Last Update: 2014-04-18
See Project
20

FREJ

FREJ stands for "Fuzzy Regular Expressions for Java" - it is a command-line tool and library which allow you easily compare strings with patterns disregarding nasty typos and considering several variants (like "Barack Obama", "B.H.Obama" etc.) Project sources are moved to github: https://github.com/RodionGork/FREJ

Downloads: 0 This Week

Last Update: 2015-10-17
See Project
21

HAWK - PDF Text Search Java Project

No more support for this project - TAKE A LOOK AT FALCONSEARCH

No more support for this project - TAKE A LOOK AT FALCONSEARCH "https://sourceforge.net/projects/falcontextsearch/"

Downloads: 0 This Week

Last Update: 2014-04-19
See Project
22

KH Coder

Quantitative Content Analysis or Text Mining

************************************************************ THIS PROJECT IS MOVED. See http://khcoder.net/en for the latest & greatest. You can download this tool from the new home. See you there! ************************************************************

9 Reviews

Downloads: 0 This Week

Last Update: 2018-12-30
See Project
23

LanguageTool

Proofreading Software for 20+ Languages

LanguageTool is an Open Source language/grammar checker. *** THIS REPOSITORY IS OUT OF DATE, see https://github.com/languagetool-org INSTEAD ***

2 Reviews

Downloads: 0 This Week

Last Update: 2014-05-09
See Project
24

RDRPOSTagger

A Rule-based Part-of-Speech and Morphological Tagging Toolkit

RDRPOSTagger is a robust, easy-to-use and language-independent rule-based toolkit for Part-of-Speech (POS) and morphological tagging. RDRPOSTagger obtains fast performance in both learning and tagging process. RDRPOSTagger also achieves a very competitive accuracy in comparison to the state-of-the-art results. RDRPOSTagger now supports pre-trained POS and morphological tagging models for Bulgarian, Czech, Dutch, English, French, German, Hindi, Italian, Portuguese, Spanish, Swedish, Thai and Vietnamese. Additionally, RDRPOSTagger supports the pre-trained Universal POS tagging models for 40 languages. See the full usage of RDRPOSTagger at: http://rdrpostagger.sourceforge.net/

2 Reviews

Downloads: 0 This Week

Last Update: 2017-05-24
See Project
25

Simple Semantic Classifier

The Simple Semantic Classifier classifies short chunks of natural language text into broad semantic classes that correspond to the OBO ontologies provided as input.

Downloads: 0 This Week

Last Update: 2013-04-11
See Project