Showing 124 open source projects for "linguistic"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Outgrown Windows Task Scheduler? Icon
    Outgrown Windows Task Scheduler?

    Free diagnostic identifies where your workflow is breaking down—with instant analysis of your scheduling environment.

    Windows Task Scheduler wasn't built for complex, cross-platform automation. Get a free diagnostic that shows exactly where things are failing and provides remediation recommendations. Interactive HTML report delivered in minutes.
    Download Free Tool
  • 1
    Stanford CoreNLP

    Stanford CoreNLP

    Stanford CoreNLP, a Java suite of core NLP tools

    ...Pipelines produce CoreDocuments, data objects that contain all of the annotation information, accessible with a simple API, and serializable to a Google Protocol Buffer. CoreNLP generates a variety of linguistic annotations, including parts of speech, named entities, dependency parses, and coreference.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Linguistic Tree Constructor

    Linguistic Tree Constructor

    Syntax tree editor for rapid annotation of existing text

    Linguistic Tree Constructor (LTC) is a tool for drawing lingusitic syntax trees of already-existing text. It is a syntax editor, not a text editor, so the text has to exist already. It is best suited for large-scale, rapid creation of hand-annotated treebanks. The user can define their own node categories, and can label each node with labels, also definable by the user.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3
    compromise

    compromise

    Modest natural-language processing

    Language is complicated and there's a gazillion words. Compromise is a javascript library that interprets and pre-parses text and makes some reasonable decisions so things are way easier. Compromise tries its best to parse text. it is small, quick, and often good-enough. It is not as smart as you'd think. Conjugate and negate verbs in any tense. Play between plural, singular and possessive forms. Interpret plain-text numbers. Handle implicit terms. Use it on the client-side or as an...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    Classical Language Toolkit (CLTK)

    Classical Language Toolkit (CLTK)

    The Classical Language Toolkit

    The Classical Language Toolkit (CLTK) is a Python library offering natural language processing support for classical languages, including Latin, Greek, and others.
    Downloads: 2 This Week
    Last Update:
    See Project
  • Atera - The depth of a full-stack IT platform, with the power of AI. Icon
    Atera - The depth of a full-stack IT platform, with the power of AI.

    Atera introduces your autonomous AI agent - Ensure operational efficiency at any scale with 24/7 autonomous IT support.

    Atera prioritizes security and compliance through robust protections that align with industry standards. Our AI-driven features were built on responsible AI principles and empower IT teams to work efficiently while maintaining trust and compliance.
    Learn More
  • 5
    Stanza

    Stanza

    Stanford NLP Python library for many human languages

    Stanza is a collection of accurate and efficient tools for the linguistic analysis of many human languages. Starting from raw text to syntactic analysis and entity recognition, Stanza brings state-of-the-art NLP models to languages of your choosing. Stanza is a Python natural language analysis package. It contains tools, which can be used in a pipeline, to convert a string containing human language text into lists of sentences and words, to generate base forms of those words, their parts of speech and morphological features, to give a syntactic structure dependency parse, and to recognize named entities. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Qwen3-TTS

    Qwen3-TTS

    Qwen3-TTS is an open-source series of TTS models

    ...The project includes pre-trained models and inference scripts that let users synthesize speech locally or integrate TTS into larger pipelines such as voice assistants, accessibility tools, or multimedia generation workflows. Because it’s part of the broader Qwen ecosystem, it benefits from the model’s understanding of linguistic nuances, enabling more accurate pronunciation, prosody, and contextual delivery than many traditional TTS systems. Developers can customize voice output parameters like speed, pitch, and volume, and combine the TTS stack with other AI components.
    Downloads: 32 This Week
    Last Update:
    See Project
  • 7
    Unredact

    Unredact

    A simple tool for reading in poorly redacted documents

    ...Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and linguistic patterns to produce candidate reconstructions. It accepts a variety of input formats, automatically identifies redacted regions, and then generates text suggestions that are presented alongside visual overlays so users can choose or refine outputs.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 8
    Step-Audio-EditX

    Step-Audio-EditX

    LLM-based Reinforcement Learning audio edit model

    ...Rather than treating audio editing as low-level waveform manipulation, this model converts speech into a sequence of discrete “audio tokens” (via a dual-codebook tokenizer) — combining a linguistic token stream and a semantic (prosody/emotion/style) token stream — thereby abstracting audio editing into high-level token operations. This allows users to modify not only what is said (the text) but also how it's said: emotion, tone, speaking style, prosody, accent, even paralinguistic cues. Because the model is trained with a “large-margin learning” objective over many synthesized and natural speech samples, it gains robust control over expressive attributes, and can perform iterative editing: e.g. you could record a line, then ask the model to “make it sadder,” “speak slower,” or “change accent to X.”
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    IndexTTS2

    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    IndexTTS is a modern, zero-shot text-to-speech (TTS) system engineered to deliver high-quality, natural-sounding speech synthesis with few requirements and strong voice-cloning capabilities. It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. The system supports zero-shot voice...
    Downloads: 10 This Week
    Last Update:
    See Project
  • Grafana: The open and composable observability platform Icon
    Grafana: The open and composable observability platform

    Faster answers, predictable costs, and no lock-in built by the team helping to make observability accessible to anyone.

    Grafana is the open source analytics & monitoring solution for every database.
    Learn More
  • 10
    Lingua-RS

    Lingua-RS

    The most accurate natural language detection library for Rust

    Lingua-RS is a language detection library implemented in Rust, designed to accurately identify the language of given text samples. It tells you which language some text is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    GLM-Image

    GLM-Image

    GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image

    ...It excels at generating images that include complex layouts and detailed text content, making it especially useful for posters, diagrams, info-graphics, social media graphics, and visual content that requires precise text placement and semantic alignment. Because it blends linguistic reasoning with image synthesis, GLM-Image produces visual outputs where semantic relationships and textual accuracy are prioritized alongside artistic style and realism, and its model structure enables it to handle dense visual knowledge tasks that challenge many pure diffusion models. The model’s design and weights are available under an open-source license that encourages experimentation, integration, and deployment across a range of creative workflows.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    Lingua-Py

    Lingua-Py

    The most accurate natural language detection library for Python

    Its task is simple: It tells you which language some text is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages. Language detection is often done as part of large machine learning frameworks or natural language processing applications. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Qwen-VL

    Qwen-VL

    Chat & pretrained large vision language model

    Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 14
    Chinese-XLNet

    Chinese-XLNet

    Chinese XLNet pre-trained model

    ...Unlike traditional masked language modeling, XLNet uses a permutation language modeling objective that captures bidirectional context more effectively by training over all possible token orderings, yielding richer contextual representations. This model is trained on large-scale Chinese text datasets to learn linguistic patterns, long-range dependencies, and semantic nuance typical of Chinese writing, making it useful for tasks like text classification, question answering, named entity recognition, and language generation. Chinese-XLNet offers an alternative to models like BERT by emphasizing autoregressive and permutation-based learning, which can lead to performance improvements on certain benchmarks and tasks.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    PaddleSpeech

    PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model

    ...We provide production ready streaming asr and streaming tts system. Our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt Chinese context.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    StyleTTS 2

    StyleTTS 2

    Towards Human-Level Text-to-Speech through Style Diffusion

    ...StyleTTS2 supports both single-speaker and multi-speaker configurations, with the ability to sample or transfer styles from reference audio, making it powerful for expressive TTS and character voices. The repository includes training scripts, configuration files, and pre-trained auxiliary modules such as a text aligner, pitch extractor, and PL-BERT-based linguistic encoder.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    ML Ferret

    ML Ferret

    Refer and Ground Anything Anywhere at Any Granularity

    ...The repo presents the vision-language pipeline, model assets, and paper resources that show how Ferret answers questions, follows instructions, and returns grounded outputs rather than just text. In practice, this enables tasks like “find that small red icon next to the chart and describe it” where both the linguistic reference and the visual region are ambiguous without fine spatial reasoning.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Fuzzy sets for Ada

    Fuzzy sets for Ada

    Fuzzy sets, logic, numbers; intuitionistic fuzzy sets, fuzzy linguis

    Fuzzy sets for Ada is a library providing implementations of confidence factors with the operations not, and, or, xor, +, and *, classical fuzzy sets with the set-theoretic operations and the operations of the possibility theory, intuitionistic fuzzy sets with the operations on them, fuzzy logic based on the intuitionistic fuzzy sets and the possibility theory; fuzzy numbers, both integer and floating-point with conventional arithmetical operations, and linguistic variables and sets of linguistic variables with operations on them. String-oriented I/O is supported. A rich set of GTK+ GUI widgets is provided.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19

    WhisperJAV

    A subtitle generator for Japanese Adult Videos.

    A subtitle generator for Japanese Adult Videos. Transformer-based ASR architectures like Whisper suffer significant performance degradation when applied to the spontaneous and noisy domain of JAV. This degradation is driven by specific acoustic and temporal characteristics that defy the statistical distributions of standard training data.
    Leader badge
    Downloads: 57 This Week
    Last Update:
    See Project
  • 20
    IMS Open Corpus Workbench

    IMS Open Corpus Workbench

    Indexing and query tools for very large text corpora

    The IMS Open Corpus Workbench is a collection of tools for managing and querying large text corpora (100 M words and more) with linguistic annotations. Its central component is the flexible and efficient query processor CQP, which can be used interactively in a terminal session, as a backend e.g. from a Perl script, or through the Web-based GUI CQPweb.
    Leader badge
    Downloads: 7 This Week
    Last Update:
    See Project
  • 21
    OmegaT - multiplatform CAT tool

    OmegaT - multiplatform CAT tool

    The free computer aided translation (CAT) tool for professionals

    OmegaT is a free and open source multiplatform Computer Assisted Translation tool with fuzzy matching, translation memory, keyword search, glossaries, and translation leveraging into updated projects.
    Leader badge
    Downloads: 1,529 This Week
    Last Update:
    See Project
  • 22
    ...Share your open-source data sets and MWE extraction tools, exchange ideas on evaluation strategies and further development of the tools, and discuss theoretical definitions and linguistic properties of MWEs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Fuzzy machine learning framework

    Fuzzy machine learning framework

    A library and a GUI front-end for fuzzy machine learning

    ...The approach is based on the intuitionistic fuzzy sets and the possibility theory. Further characteristics are fuzzy features and classes; numeric, enumeration features and features based on linguistic variables; user-defined features; derived and evaluated features; classifiers as features for building hierarchical systems; automatic refinement in case of dependent features; incremental learning; fuzzy control language support; object-oriented software design with extensible objects and automatic garbage collection; generic data base support through ODBC or SQLite; text I/O and HTML output; an advanced graphical user interface based on GTK+; and examples of use.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    LaBB-CAT

    LaBB-CAT

    A linguistic annotation store

    LABB-CAT is a browser-based linguistics research tool that stores recordings and regular-expression searchable text transcripts of interviews. The search results, entire transcripts, and media, can be viewed or exported in a variety of format
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Text Encoding Initiative

    Text Encoding Initiative

    TEI produces the TEI Guidelines and associated software

    The TEI is an international and interdisciplinary standard used by libraries, museums, publishers, and academics to represent all kinds of literary and linguistic texts, using an encoding scheme that is maximally expressive and minimally obsolescent.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next