Showing 124 open source projects for ".pdf"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Ship AI Apps Faster with Vertex AI Icon
    Ship AI Apps Faster with Vertex AI

    Go from idea to deployed AI app without managing infrastructure. Vertex AI offers one platform for the entire AI development lifecycle.

    Ship AI apps and features faster with Vertex AI—your end-to-end AI platform. Access Gemini 3 and 200+ foundation models, fine-tune for your needs, and deploy with enterprise-grade MLOps. Build chatbots, agents, or custom models. New customers get $300 in free credit.
    Try Vertex AI Free
  • 1
    PDF.js

    PDF.js

    A PDF Reader in JavaScript

    PDF.js is a web standards-based platform for parsing and rendering Portable Document Formats (PDFs). Open source and built with HTML5, this PDF viewer is supported by a great community and Mozilla Labs. PDF.js can be used on both modern and older browsers, and is built into version 19+ of Firefox.
    Downloads: 87 This Week
    Last Update:
    See Project
  • 2
    pdfcpu

    pdfcpu

    A PDF processor written in Go

    pdfcpu is a PDF processing library written in Go supporting encryption. It provides both an API and a CLI. Supported are all versions up to PDF 1.7 (ISO-32000). This is an effort to build a comprehensive PDF processing library from the ground up written in Go. Over time pdfcpu aims to support the standard range of PDF processing features and also any interesting use cases that may present themselves along the way.
    Downloads: 28 This Week
    Last Update:
    See Project
  • 3
    Vanilla.PDF

    Vanilla.PDF

    Cross-platform SDK for creating and modifying PDF documents

    ...Vanilla.PDF supports advanced PDF features such as adding CMS (PKCS#7) digital signatures, modifying content streams and metadata, and working with encryption and permissions based on standard PDF security models. It includes tools for parsing PDF internals like cross-reference tables and objects, providing fine-grained document analysis capabilities. The project is unit-tested with continuous integration pipelines, supporting sanitizers for enhanced code quality and stability.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 4
    PdfPig

    PdfPig

    Read and extract text and other content from PDFs in C#

    This project allows users to read and extract text and other content from PDF files. In addition the library can be used to create simple PDF documents containing text and geometrical shapes.
    Downloads: 5 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    GROBID

    GROBID

    A machine learning software for extracting information

    ...The extraction here covers the usual bibliographical information (e.g. title, abstract, authors, affiliations, keywords, etc.). References extraction and parsing from articles in PDF format, around .87 F1-score against on an independent PubMed Central set of 1943 PDF containing 90,125 references, and around .89 on a similar bioRxiv set of 2000 PDF (using the Deep Learning citation model). All the usual publication metadata are covered (including DOI, PMID, etc.).
    Downloads: 5 This Week
    Last Update:
    See Project
  • 6
    xhtml2pdf

    xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

    xhtml2pdf enables users to generate PDF documents from HTML content easily and with automated flow control such as pagination and keeping text together. The Python module can be used in any Python environment, including Django. The Command line tool is a stand-alone program that can be executed from the command line.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    HummusJS

    HummusJS

    Node.js module for high performance creation and modification of PDFs

    ...Notable examples for Emoji fonts are Windows Segoe UI emoji and Google Noto font. This means that writing text that include emojis will result in lovely colorful emojis, rather than black and white representations. PDFHummus is a fast and free PDF Writing, Parsing and Modification library.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 8

    Tesseract OCR

    Open Source OCR Engine

    ...Tesseract can recognize over 100 languages out-of-the-box, and can be trained to recognize other languages. It supports various output formats, including plain text, HTML, PDF and more. It also has unicode (UTF-8) support.
    Downloads: 1,964 This Week
    Last Update:
    See Project
  • 9
    Asciidoc Editor based on JavaFX 20

    Asciidoc Editor based on JavaFX 20

    Asciidoc Editor and Toolchain written with JavaFX 19

    Asciidoc FX is a WYSIWYG editor for the Asciidoc markup language. You can build PDF, Epub, and HTML books, documents, and slides. Supported Operating Systems and Builds shows the list of available builds with links for reference. If you are looking for the very latest version, visit the link in the note above to be guaranteed of downloading the latest and greatest version of AsciidocFX. AsciidocFX converts documents via the AsciidoctorJ library.
    Downloads: 3 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 10
    changedetection.io

    changedetection.io

    The best free open source website change detection and restock service

    ...Monitor and track PDF file changes, and know when a PDF file has text changes. Know when your favourite product is on sale, or other special deals are announced before anyone else. Detect and monitor changes in JSON API responses.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 11
    Papermerge

    Papermerge

    Open Source Document Management System for Digital Archives

    ...Each user can be assigned different permissions to perform only a specific kind of action e.g. view only documents from a specific folder. OCR technology is vital part of Papermerge. It extracts text information from scanned documents, PDF, JPEG, TIFF files.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 12
    Resume-Matcher

    Resume-Matcher

    Improve your resumes with Resume Matcher

    Resume-Matcher is a command-line application that compares resumes against job descriptions using natural language processing. It provides a compatibility score based on keyword relevance and highlights areas where the resume aligns—or doesn't—with the target role. Designed for job seekers and HR professionals, it helps improve resume tailoring and streamlines candidate screening.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    deepdoctection

    deepdoctection

    A Repo For Document AI

    DeepDoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. deepdoctection is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models but enables you to build pipelines using highly acknowledged libraries for object detection, OCR and selected NLP tasks and provides an integrated frameworks for...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 14
    DeepSeek-OCR 2

    DeepSeek-OCR 2

    Visual Causal Flow

    DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 15
    PRDownloader

    PRDownloader

    A file downloader library for Android with pause and resume support

    A file downloader library for Android with pause and resume support. PRDownloader can be used to download any type of files like image, video, pdf, apk and etc. This file downloader library supports pause and resume while downloading a file. Supports large file download. This downloader library has a simple interface to make download request. We can check if the status of downloading with the given download Id. PRDownloader gives callbacks for everything like onProgress, onCancel, onStart, onError and etc while downloading a file. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    PaperQA2

    PaperQA2

    High accuracy RAG for answering questions from scientific documents

    PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper to see examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata - including citation counts and a retraction check, then parse and cache PDFs into a...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 17
    docker-maven-plugin

    docker-maven-plugin

    Maven plugin for running and creating Docker images

    This is a Maven plugin for building Docker images and managing containers for integration tests. It works with Maven 3.0.5 and Docker 1.6.0 or later.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Visual Regression Tracker

    Visual Regression Tracker

    Backend and Frontend application for tracking differences via image

    Open source, self-hosted solution for visual testing and managing results of visual testing. Service receives images, performs pixel-by-pixel comparisons with its previously accepted baseline, and provides immediate results in order to catch unexpected changes. Use implemented libraries to integrate with existing automated suites by adding assertions based on image comparison. We provide native integration with automation libraries, core SDK and Rest API interfaces that allow the system to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    QPDF

    QPDF

    PDF transformation/manipulation program + library

    QPDF is a C++ library and set of programs that inspect and manipulate the structure of PDF files. It can encrypt and linearize files, expose the internals of a PDF file, and do many other operations useful to end users and PDF developers.
    Leader badge
    Downloads: 1,057 This Week
    Last Update:
    See Project
  • 20
    ArXiv MCP Server

    ArXiv MCP Server

    A Model Context Protocol server for searching and analyzing arXiv

    arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Controllable-RAG-Agent

    Controllable-RAG-Agent

    This repository provides an advanced RAG

    Controllable-RAG-Agent is an advanced Retrieval-Augmented Generation (RAG) system designed specifically for complex, multi-step question answering over your own documents. Instead of relying solely on simple semantic search, it builds a deterministic control graph that acts as the “brain” of the agent, orchestrating planning, retrieval, reasoning, and verification across many steps. The pipeline ingests PDFs, splits them into chapters, cleans and preprocesses text, then constructs vector...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Jina

    Jina

    Build cross-modal and multimodal applications on the cloud

    ...Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP, GraphQL protocols with TLS. Intuitive design pattern for high-performance microservices. Seamless Docker container integration: sharing, exploring, sandboxing, versioning and dependency control via Jina Hub. Fast deployment to Kubernetes, Docker Compose and Jina Cloud. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Apache OpenOffice

    Apache OpenOffice

    The free and Open Source productivity suite

    ...OpenOffice is available in many languages, works on all common computers, stores data in ODF - the international open standard format - and is able to read and write files in other formats, included the format used by the most common office suite packages. OpenOffice is also able to export files in PDF format. OpenOffice has supported extensions, in a similar manner to Mozilla Firefox, making easy to add new functionality to an existing OpenOffice installation.
    Leader badge
    Downloads: 281,362 This Week
    Last Update:
    See Project
  • 24
    Hypernomicon

    Hypernomicon

    Hypertext-infused philosophy personal database software

    Hypernomicon is a personal productivity/database application for researchers that combines structured note-taking, mind-mapping, management of files (e.g., PDFs) and folders, and reference management into an integrated environment that organizes all of the above into semantic networks or hierarchies in terms of debates, positions, arguments, labels, terminology/concepts, and user-defined keywords by means of database relations and automatically generated hyperlinks (hence ‘Hyper’ in the...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 25
    Rolemaster Office
    PC and NPC character generator for Rolemaster RMFRP roleplaying system (from Iron Crown Enterprises). The program calculates all bonus and generates a nice PDF character sheet that contains additionally pages. The programm does not provide during-game support.
    Downloads: 11 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB