text parsing free download

Showing 214 open source projects for "text parsing"

1

TextFSM

Python module for parsing semi-structured text into Python tables

TextFSM is a Python library created by Google that provides a template-based state machine engine for parsing semi-structured text. It is particularly useful for extracting structured data from command-line interface (CLI) outputs, such as those from network devices, routers, and switches. By defining parsing logic through reusable template files, TextFSM transforms unstructured text into structured data like lists or tables without requiring complex regular expression code. ...

Downloads: 0 This Week

Last Update: 2025-10-11
See Project
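
The template-driven state machine idea in the TextFSM description can be illustrated with a rough standard-library sketch. This is not TextFSM's API or template syntax: the `TEMPLATE` dictionary, the `"record"` action, the `parse` function, and the sample CLI output are all hypothetical and exist only to show how per-state rules can turn semi-structured lines into table rows.

```python
import re

# Hypothetical mini "template": per-state rules of (regex, action, next_state).
# This only mimics the general idea; it is NOT TextFSM's template syntax.
TEMPLATE = {
    "Start": [
        (re.compile(r"^Interface\s+(?P<name>\S+)"), None, "Interface"),
    ],
    "Interface": [
        (re.compile(r"^\s+IP address:\s+(?P<ip>\S+)"), None, "Interface"),
        (re.compile(r"^\s+Status:\s+(?P<status>\S+)"), "record", "Start"),
    ],
}
COLUMNS = ["name", "ip", "status"]


def parse(text, template=TEMPLATE, columns=COLUMNS):
    """Run each line through the rules of the current state and collect rows."""
    state = "Start"
    current = {}
    rows = []
    for line in text.splitlines():
        for pattern, action, next_state in template[state]:
            match = pattern.search(line)
            if not match:
                continue
            current.update(match.groupdict())
            if action == "record":
                # Emit one table row and start collecting a fresh one.
                rows.append([current.get(col, "") for col in columns])
                current = {}
            state = next_state
            break  # first matching rule wins for this line
    return rows


# Made-up CLI output, used only for this illustration.
SAMPLE = """\
Interface eth0
  IP address: 10.0.0.1
  Status: up
Interface eth1
  IP address: 10.0.0.2
  Status: down
"""

rows = parse(SAMPLE)
for row in rows:
    print(row)
```

Keeping the rules in data rather than in code is the point of the design: supporting a new command output means writing a new template, not a new parser.
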
2

YAML

JavaScript parser and stringifier for YAML

yaml is a definitive library for YAML, the human-friendly data serialization standard. It supports both YAML 1.1 and YAML 1.2 and all common data schemas, and passes all of the yaml-test-suite tests. It can accept any string as input without throwing, parsing as much YAML out of it as it can, and supports parsing, modifying, and writing YAML comments and blank lines. The library is released under the ISC open source license, and the code is available on GitHub. It has no external...

Downloads: 11 This Week

Last Update: 2025-11-30
See Project
3

npm-pdfreader

Parse text and tables from PDF files.

npm-pdfreader is a Node.js library for reading text and parsing tables from PDF files. It supports tabular data with automatic column detection and rule-based parsing, making it useful for extracting structured data from PDFs.

Downloads: 3 This Week

Last Update: 2025-11-01
See Project
4

RAG Anything

RAG-Anything: All-in-One RAG Framework

...The system uses a multi-stage pipeline (e.g., document parsing, content analysis, knowledge graph construction, intelligent retrieval) so queries can navigate across modalities with deeper understanding and relevance.

Downloads: 4 This Week

Last Update: 3 days ago
See Project
5

GROBID

Machine learning software for extracting information

GROBID is a machine learning library for extracting, parsing, and restructuring raw documents such as PDFs into structured XML/TEI-encoded documents, with a particular focus on technical and scientific publications. Development started in 2008 as a hobby, and in 2011 the tool was released as open source. Work on GROBID has been steady as a side project since the beginning and is expected to continue as such. Header extraction and parsing from articles in PDF format. The...

Downloads: 6 This Week

Last Update: 2025-05-11
See Project
6

ChordSheetJS

A JavaScript library for parsing and formatting chords and chord sheets

ChordSheetJS is a JavaScript library for parsing, formatting, and transposing chord sheets. It supports various chord sheet formats and provides tools for rendering and manipulating chord and lyric data.

Downloads: 2 This Week

Last Update: 2026-01-30
See Project
7

Ksoup

Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML

Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML, extracting HTML tags, attributes, and text, and encoding and decoding HTML entities.

Downloads: 0 This Week

Last Update: 2025-06-08
See Project
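
The tasks named in the Ksoup description (extracting tags, attributes, and text, and encoding and decoding entities) can be illustrated by analogy. The sketch below uses Python's standard library, not Ksoup's Kotlin API; the `Collector` class and the sample HTML string are hypothetical.

```python
from html import escape, unescape
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Collects tag names, attributes, and text from an HTML string."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.attrs = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        self.attrs.extend(attrs)

    def handle_data(self, data):
        # Skip whitespace-only runs between tags.
        if data.strip():
            self.text.append(data.strip())


collector = Collector()
collector.feed('<p class="note">Fish &amp; chips <a href="/menu">menu</a></p>')
collector.close()

print(collector.tags)   # tag names in document order
print(collector.attrs)  # (name, value) pairs
print(collector.text)   # text with entities already decoded

encoded = escape("a < b & c")           # entity encoding
decoded = unescape("a &lt; b &amp; c")  # entity decoding
```
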
8

zpdf

Zero-copy PDF text extraction library written in Zig

zpdf is a high-performance PDF text extraction library written in Zig that focuses on speed, low overhead, and modern parsing techniques. It leans heavily on memory-mapped file reading and zero-copy patterns where possible, so it can scan large PDFs without repeatedly copying data around in memory. The library supports streaming extraction using efficient arena allocation, making it well suited for workloads that need to process big documents quickly or in batches.

Downloads: 1 This Week

Last Update: 7 days ago
See Project
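
The memory-mapped, low-copy scanning that the zpdf description mentions can be sketched in general terms. The example below uses Python's `mmap` module, not Zig and not zpdf's API; the helper names `count_marker` and `peek` are hypothetical, and the demo bytes only imitate PDF-like markers rather than forming a real PDF.

```python
import mmap
import os
import tempfile


def count_marker(path, marker):
    """Count occurrences of `marker` without reading the whole file into a bytes object."""
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; note that empty files cannot be mapped.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            count = 0
            position = mapped.find(marker)
            while position != -1:
                count += 1
                position = mapped.find(marker, position + len(marker))
            return count


def peek(path, start, length):
    """Return a small slice of the file; only the requested slice is copied."""
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[start:start + length]


# Demo on a throwaway file with made-up content.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"header\nstream A endstream\nstream B endstream\n")

found = count_marker(demo_path, b"endstream")
first = peek(demo_path, 0, 6)
os.remove(demo_path)

print(found, first)
```

The operating system pages the file in on demand, so the search touches the mapped bytes directly and only the final slice is materialized as a new object.
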
9

LlamaParse

Parse files for optimal RAG

LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL database providers.

Downloads: 1 This Week

Last Update: 5 days ago
See Project
10

Markdig

A fast, powerful, CommonMark compliant, extensible Markdown processor

A fast, powerful, CommonMark compliant, extensible Markdown processor for .NET. Very fast parser and HTML renderer (no regular expressions), very lightweight in terms of GC pressure. Abstract syntax tree with precise source code locations, useful when building a Markdown editor. Check out MarkdownEditor for Visual Studio powered by Markdig! Even the core Markdown/CommonMark parsing is pluggable, so you can disable built-in Markdown/CommonMark parsing (e.g. disable HTML parsing) or...

Downloads: 1 This Week

Last Update: 2025-11-25
See Project
11

ANTLR

Parser generator to read, process, or translate structured text

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It is widely used in academia and industry to build all sorts of languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees. Twitter search uses ANTLR for query parsing, with over 2 billion queries a day. ...

Downloads: 10 This Week

Last Update: 2024-08-03
See Project
12

CommandLineUtils

Command line parsing and utilities for .NET

CommandLineUtils is a library that helps developers implement command line applications in .NET. Its primary goal is to assist with parsing command line arguments and executing the correct commands for those arguments. It simplifies parsing arguments provided on the command line, validating user inputs, and generating help text, and it also provides other utilities such as input helpers. ...

Downloads: 16 This Week

Last Update: 2026-01-21
See Project
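
The three jobs the CommandLineUtils description lists (parsing arguments, validating input, and generating help text) can be illustrated by analogy. This sketch uses Python's standard `argparse` module, not the .NET library's API; the `greet` program, its options, and the `run` helper are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="greet", description="Print a greeting.")
    parser.add_argument("name", help="who to greet")
    # `type` and `choices` give basic validation: bad values exit with an error.
    parser.add_argument("--count", type=int, default=1, choices=range(1, 6),
                        help="how many times to greet (1-5)")
    parser.add_argument("--shout", action="store_true", help="use upper case")
    return parser


def run(argv):
    """Parse an argument list and return the lines the command would print."""
    args = build_parser().parse_args(argv)
    message = f"Hello, {args.name}!"
    if args.shout:
        message = message.upper()
    return [message] * args.count


lines = run(["Ada", "--count", "2", "--shout"])
help_text = build_parser().format_help()  # generated from the declarations above

print("\n".join(lines))
print(help_text)
```

Declaring arguments once and deriving both validation and help text from that declaration is the pattern the description points at, whichever language the library targets.
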
13

commonmark-java

Java library for parsing and rendering CommonMark (Markdown)

Java library for parsing and rendering Markdown text according to the CommonMark specification (and some extensions). Provides classes for parsing input to an abstract syntax tree of nodes (AST), visiting and manipulating nodes, and rendering to HTML. It started out as a port of commonmark.js, but has since evolved into a full library with a nice API.

Downloads: 0 This Week

Last Update: 2026-01-14
See Project
14

tree-sitter

An incremental parsing system for programming tools

Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. General enough to parse any programming language. Fast enough to parse on every keystroke in a text editor. Robust enough to provide useful results even in the presence of syntax errors. Dependency-free so that the runtime library (which is written in pure C) can be embedded in any application. ...

Downloads: 4 This Week

Last Update: 6 days ago
See Project
15

Underthesea

Underthesea - Vietnamese NLP Toolkit

Underthesea is a Vietnamese NLP toolkit providing various text processing capabilities, including word segmentation, part-of-speech tagging, and named entity recognition.

Downloads: 9 This Week

Last Update: 24 hours ago
See Project
16

Asciidoctor

A fast, open source text processor and publishing toolchain

Asciidoctor is a fast, open source, Ruby-based text processor and publishing toolchain for parsing AsciiDoc® into a document model and converting it to output formats such as HTML5, DocBook 5, manual pages, PDF, and EPUB 3. Asciidoctor also has an ecosystem of extensions, converters, build plugins, and tools to help you author and publish content written in AsciiDoc.

Downloads: 13 This Week

Last Update: 2025-10-24
See Project
17

ELisp Tree-sitter

Tree-sitter bindings for Emacs Lisp

...The minor mode tree-sitter-mode provides a buffer-local syntax tree, which is kept up-to-date with changes to the buffer’s text. Run M-x tree-sitter-hl-mode to replace the regex-based highlighting provided by font-lock-mode with tree-based syntax highlighting.

Downloads: 0 This Week

Last Update: 2026-01-16
See Project
18

Hazm

Persian NLP Toolkit

Hazm is a natural language processing (NLP) library for Persian text, offering various tools for text preprocessing, tokenization, part-of-speech tagging, and more.

Downloads: 0 This Week

Last Update: 2025-12-20
See Project
19

Notion-to-MD

Convert Notion pages, blocks, and lists of blocks to Markdown

Notion-to-MD is a Node.js package that converts Notion pages, blocks, and lists of blocks to Markdown (with support for nesting) using notion-sdk-js.

Downloads: 0 This Week

Last Update: 2025-07-19
See Project
20

mavonEditor

A markdown editor based on Vue

A Markdown editor based on Vue that supports a variety of personalized features. All toolbar properties default to true; you can pass a custom object to override them. The language parsing files and code highlighting from highlight.js are loaded on demand. github-markdown-css and KaTeX are loaded only when the editor is mounted.

Downloads: 0 This Week

Last Update: 2025-03-05
See Project
21

Umi-OCR

OCR software, free and offline

Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. ...

Downloads: 45 This Week

Last Update: 2026-01-15
See Project
22

Perlite

A web-based markdown viewer optimized for Obsidian

A web-based markdown viewer optimized for Obsidian Notes. Just put your whole Obsidian vault or markdown folder/file structure in your web directory. The page builds itself. It's an open source alternative to Obsidian Publish.

Downloads: 0 This Week

Last Update: 2026-01-21
See Project
23

Helix

A post-modern modal text editor

Helix is a modal (Kakoune/Vim‑inspired) terminal-based text editor written in Rust. It features modern modal editing, multiple selections, smart syntax highlighting, and built-in language server (LSP) integration leveraging tree‑sitter for fast, incremental parsing and code intelligence.

Downloads: 0 This Week

Last Update: 2025-07-31
See Project
24

Vis

A vi-like editor based on Plan 9's structural regular expressions

...There is also a Lua API for in-process extensions. Vis strives to be simple and focuses on its core task: efficient text management.

Downloads: 1 This Week

Last Update: 2024-05-01
See Project
25

py-pdf-parser

A Python tool to help extract information from structured PDFs

py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.

Downloads: 0 This Week

Last Update: 2025-04-28
See Project