Open Source Natural Language Processing (NLP) Tools

Browse free open source Natural Language Processing (NLP) tools and projects below. Use the toggles on the left to filter open source Natural Language Processing (NLP) tools by OS, license, language, programming language, and project status.

  • Outgrown Windows Task Scheduler? Icon
    Outgrown Windows Task Scheduler?

    Free diagnostic identifies where your workflow is breaking down—with instant analysis of your scheduling environment.

    Windows Task Scheduler wasn't built for complex, cross-platform automation. Get a free diagnostic that shows exactly where things are failing and provides remediation recommendations. Interactive HTML report delivered in minutes.
    Download Free Tool
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 1
    MeCab is a fast and customizable Japanese morphological analyzer. MeCab is designed for generic purpose and applied to variety of NLP tasks, such as Kana-Kanji conversion. MeCab provides parameter estimation functionalities based on CRFs and HMM
    Leader badge
    Downloads: 1,930 This Week
    Last Update:
    See Project
  • 2
    Virastyar

    Virastyar

    Virastyar is an spell checker for low-resource languages

    Virastyar is a free and open-source (FOSS) spell checker. It stands upon the shoulders of many free/libre/open-source (FLOSS) libraries developed for processing low-resource languages, especially Persian and RTL languages Publications: Kashefi, O., Nasri, M., & Kanani, K. (2010). Towards Automatic Persian Spell Checking. SCICT. Kashefi, O., Sharifi, M., & Minaie, B. (2013). A novel string distance metric for ranking Persian respelling suggestions. Natural Language Engineering, 19(2), 259-284. Rasooli, M. S., Kahefi, O., & Minaei-Bidgoli, B. (2011). Effect of adaptive spell checking in Persian. In NLP-KE Contributors: Omid Kashefi Azadeh Zamanifar Masoumeh Mashaiekhi Meisam Pourafzal Reza Refaei Mohammad Hedayati Kamiar Kanani Mehrdad Senobari Sina Iravanin Mohammad Sadegh Rasooli Mohsen Hoseinalizadeh Mitra Nasri Alireza Dehlaghi Fatemeh Ahmadi Neda PourMorteza
    Leader badge
    Downloads: 284 This Week
    Last Update:
    See Project
  • 3
    OpenNLP provides the organizational structure for coordinating several different projects which approach some aspect of Natural Language Processing. OpenNLP also defines a set of Java interfaces and implements some basic infrastructure for NLP compon
    Leader badge
    Downloads: 27 This Week
    Last Update:
    See Project
  • 4
    libpostal

    libpostal

    A C library for parsing/normalizing street addresses around the world

    A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data. libpostal is a C library for parsing/normalizing street addresses around the world using statistical NLP and open data. The goal of this project is to understand location-based strings in every language, everywhere. Addresses and the locations they represent are essential for any application dealing with maps (place search, transportation, on-demand/delivery services, check-ins, reviews). Yet even the simplest addresses are packed with local conventions, abbreviations and context, making them difficult to index/query effectively with traditional full-text search engines. This library helps convert the free-form addresses that humans use into clean normalized forms suitable for machine comparison and full-text indexing. Though libpostal is not itself a full geocoder, it can be used as a preprocessing step to make any geocoding application smarter, and simpler.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    JWNL is a Java API for accessing the WordNet relational dictionary. WordNet is widely used for developing NLP applications, and a Java API such as JWNL will allow developers to more easily use Java for building NLP applications.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 6
    MARF is a general cross-platform framework with a collection of algorithms for audio (voice, speech, and sound) and natural language text analysis and recognition along with sample applications (identification, NLP, etc.) of its use, implemented in Java.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 7
    This ohnlp project has released "pipelines" that were contributed by members of the OHNLP Consortium. The pipelines are based on the Apache UIMA framework. medKAT/P, MedCoref, MedTagger, MedXN, and cTAKES are licensed under Apache License V2.0. MedTime is licensed under GNU General Public License version 3.0 (GPLv3). cTAKES development has moved to apache.org. See http://ctakes.apache.org/
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    JInsect
    The JINSECT toolkit is a Java-based toolkit and library that supports and demonstrates the use of n-gram graphs within Natural Language Processing applications, ranging from summarization and summary evaluation to text classification and indexing.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9

    BioC

    We describe a simple XML format to share text documents and annotation

    A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented. Project files contain: - simple code to hold/read/write data and perform sample processing. - BioC-formatted corpora - BioC tools that work with BioC corpora BioC goals - simplicity - interoperability - broad use - reuse There should be little investment required to learn to use a format or a software module to process that format. We are interested in reuse, and we focus on common NLP tasks that are broadly useful for textmining.
    Leader badge
    Downloads: 6 This Week
    Last Update:
    See Project
  • BoldTrail Real Estate CRM Icon
    BoldTrail Real Estate CRM

    A first-of-its-kind homeownership solution that puts YOU at the center of the coveted lifetime consumer relationship.

    BoldTrail, the #1 rated real estate platform, is built to power your entire brokerage with next-generation technology your agents will use and love. Showcase your unique brand with customizable websites for your company, offices, and every agent. Maximize lead capture with a modern, portal-like consumer search experience and intelligent behavior tracking. Hyper-local area pages, home valuation pages and options for rich lifestyle data keep customers searching with your brokerage as the local experts. The most robust lead gen tools on the market help your brokerage, teams & agents effectively drive new business - no matter their budget. Empower your agents to generate free leads instantly with our simple to use landing pages & IDX squeeze pages. Drive more leads with higher quality and lower cost through in-house tools built within the platform. Diversify lead sources with our automated social media posting, integrated Google and Facebook advertising, custom text codes and more.
    Learn More
  • 10
    CRF++ is a simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data. CRF++ is designed for generic purpose and will be applied to a variety of NLP tasks.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    TXM

    TXM

    Unicode XML TEI text analysis platform

    TXM is a free and open-source cross-platform Unicode & XML based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. DOWNLOAD LATEST VERSION OF TXM : http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerfull CQP full text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, cooccurrency analysis, etc.) based on R packages (http://www.r-project.org). Read the scientific background at the Textométrie project web site http://textometrie.ens-lyon.fr/?lang=en. Read a full description at the TEI Tools wiki http://wiki.tei-c.org/index.php/TXM.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 12
    OpenNN - Open Neural Networks Library

    OpenNN - Open Neural Networks Library

    Machine learning algorithms for advanced analytics

    OpenNN is a software library written in C++ for advanced analytics. It implements neural networks, the most successful machine learning method. Some typical applications of OpenNN are business intelligence (customer segmentation, churn prevention…), health care (early diagnosis, microarray analysis…) and engineering (performance optimization, predictive maitenance…). OpenNN does not deal with computer vision or natural language processing. The main advantage of OpenNN is its high performance. This library outstands in terms of execution speed and memory allocation. It is constantly optimized and parallelized in order to maximize its efficiency. The documentation is composed by tutorials and examples to offer a complete overview about the library. OpenNN is developed by Artelnics, a company specialized in artificial intelligence.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13

    KSUCCA Corpus

    A 50 million tokens corpus of Classical Arabic.

    King Saud University Corpus of Classical Arabic (KSUCCA) is a pioneering 50 million tokens annotated corpus of Classical Arabic texts from the period of pre-Islamic era until the fourth Hijri century (equivalent to the period from the seventh until early eleventh century CE), which is the period of pure classical Arabic. The main aim of this corpus is to be used for studying the distributional lexical semantics of The Quran words. However, it can be used for other research purposes, such as: • Arabic linguistics, which includes: lexical, morphological, syntactic, semantic and pragmatic research. • Arabic computational linguistics, which includes: lexical, morphological, syntactic, semantic and pragmatic research including their various applications. • Arabic language teaching for both Arabs and non Arabs. • Artificial intelligence. • Natural language processing. • Information retrieval. • Question answering. • Machine translation.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    MutationFinder is a biomedical natural language processing (NLP) system for extracting mentions of point mutations from free text. MutationFinder achieves high performance (99% precision, 81% recall on blind test data) as an information extraction system
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15

    OPTIMA cidoc-crm Semantic Annotation

    Semantic annotation of archaeology reports with respect to CIDOC-CRM

    The semantic annotation system OPTIMA is the result of Andreas Vlachidis PhD work, (supervised by Prof. Douglas Tudhope, University of Glamorgan, UK). OPTIMA performs the NLP tasks of Named Entity Recognition, Relation Extraction, Negation Detection and Word Sense Disambiguation using hand-crafted rules and SKOS terminological resources (English Heritage Thesauri and Glossaries). The resulted semantic annotations are associated with classes of the (ISO 21127:2006) CIDOC Conceptual Reference Model (CRM) and its archaeological extension, CRM-EH. OPTIMA is also targeted at the detection and recognition of contextual relations between CRM entities. Such relations are modeled with respect to the CRM-EH archaeology extension. The pipeline targets the CIDOC-CRM entities; E19.Physical_Object, E53.Place, E49.Time_Appellation and E57.Material and the CRM-EH entities; EHE1001.Context_Event, EHE1002.Production_Event, EHE1004.Deposition_Event and P45.consists_of material property
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    Grok is a library of natural language processing components, including support for parsing with categorial grammars and various preprocessing tasks such as part-of-speech tagging, sentence detection, and tokenization.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 17
    IceNLP is an open source Natural Language Processing (NLP) toolkit for analyzing and processing Icelandic text. The toolkit is implemented in Java.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18

    Arabic Corpus

    Text categorization, arabic language processing, language modeling

    The Arabic Corpus {compiled by Dr. Mourad Abbas ( http://sites.google.com/site/mouradabbas9/corpora ) The corpus Khaleej-2004 contains 5690 documents. It is divided to 4 topics (categories). The corpus Watan-2004 contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora would mention the two main references: (1) For Watan-2004 corpus ---------------------- M. Abbas, K. Smaili, D. Berkani, (2011) Evaluation of Topic Identification Methods on Arabic Corpora,JOURNAL OF DIGITAL INFORMATION MANAGEMENT,vol. 9, N. 5, pp.185-192. 2) For Khaleej-2004 corpus --------------------------------- M. Abbas, K. Smaili (2005) Comparison of Topic Identification Methods for Arabic Language, RANLP05 : Recent Advances in Natural Language Processing ,pp. 14-17, 21-23 september 2005, Borovets, Bulgary. More useful references to check: ------------------------------------------- https://sites.google.com/site/mouradabbas9/corpora
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19

    CRP - Chemical Reaction Prediction

    Predicting Organic Reactions using Neural Networks.

    The intend is to solve the forward-reaction prediction problem, where the reactants are known and the interest is in generating the reaction products using Deep learning. This Graphical User Interface takes simplified molecular-input line-entry system (SMILES) as an input and generates the product SMILE & molecule. Beam search is used in Version 2, to generate top 5 predictions. Maximum input length for the model is 15 (excluding spaces).
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    This package contains different tools to add NLP capabilities for Lucene 4.x (it has been tested using Lucene version from 4.6.x to 4.8.1). Although it was originally developed for German, it is, mostly, language independent. It allows the user to lemmatize words to be indexed, to weight termy ba their parts of speech (e.g. weighting nouns mor hevaily than pronouns), and to add synonyms taken from GermaNet or a list you provide to the search index and thereby increase recall of lucene.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    This project is devoted to the development of natural language processing tools and resources for the Lingala language, which is spoken by tens of millions of people in central Africa.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22

    MITRE Annotation Toolkit

    A toolkit for managing and manipulating text annotations

    The MITRE Annotation Toolkit (MAT) is a suite of tools which can be used for automated and human tagging of annotations. Annotation is a process, used mostly by researchers in natural language processing, of enhancing documents with information about the various phrase types the documents contain. MAT supports both UI interaction and command-line interaction, and provides various levels of control over the overall annotation process. It can be customized for specific tasks (e.g., named entity identification, de-identification of medical records). The goal of MAT is not to help you configure your training engine (in the default case, the Carafe CRF system) to achieve the best possible performance on your data. MAT is for "everything else": all the tools you end up wishing you had.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    OpenCCG, the OpenNLP CCG Library, is a collection of natural language processing components and tools which provide support for parsing and realization with Combinatory Categorial Grammar (CCG).
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    Sanchay
    Sanchay is a collection of tools and APIs for language researchers. It has some implementations of NLP algorithms, some flexible APIs, several user friendly annotation interfaces and Sanchay Query Language for language resources.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    Sylli
    Sylli is a universal syllabifier. Developed for Italian, it can easily be adapted to any language that is claimed to respect the SSP. Sylli divides timit, strings, files and directories into syllables.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • Next