.pdf free download - SourceForge

Showing 15 open source projects for ".pdf"


Filters applied: Artificial Intelligence, Windows, Apache License V2.0

1

GROBID

A machine learning software for extracting information

...The extraction here covers the usual bibliographical information (e.g. title, abstract, authors, affiliations, keywords). Reference extraction and parsing from articles in PDF format reaches around 0.87 F1-score against an independent PubMed Central set of 1,943 PDFs containing 90,125 references, and around 0.89 on a similar bioRxiv set of 2,000 PDFs (using the Deep Learning citation model). All the usual publication metadata are covered (including DOI, PMID, etc.).

Downloads: 5 This Week

Last Update: 2025-05-11
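For readers unfamiliar with the metric quoted in the GROBID entry, the F1-score is the harmonic mean of precision and recall. The sketch below shows how it is computed from raw counts. The function name and the counts in the usage note are hypothetical and are not taken from GROBID's evaluation.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score from true positives, false positives, and false negatives."""
    if tp == 0:
        # No correct extractions: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # share of extracted references that are correct
    recall = tp / (tp + fn)     # share of expected references that were extracted
    return 2 * precision * recall / (precision + recall)
```

With hypothetical counts of 87 correct, 13 spurious, and 13 missed references, precision and recall are both 0.87, so `f1_score(87, 13, 13)` is also 0.87 (up to floating-point rounding).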
2

Tesseract OCR

Open Source OCR Engine

...Tesseract can recognize over 100 languages out of the box and can be trained to recognize others. It supports various output formats, including plain text, HTML, PDF, and more. It also has Unicode (UTF-8) support.

Downloads: 1,964 This Week

Last Update: 2025-12-26
3

Papermerge

Open Source Document Management System for Digital Archives

...Each user can be assigned different permissions to perform only a specific kind of action, e.g. viewing only documents from a specific folder. OCR technology is a vital part of Papermerge: it extracts text from scanned documents and from PDF, JPEG, and TIFF files.

Downloads: 7 This Week

Last Update: 2025-07-24
4

Resume-Matcher

Improve your resumes with Resume Matcher

Resume-Matcher is a command-line application that compares resumes against job descriptions using natural language processing. It provides a compatibility score based on keyword relevance and highlights areas where the resume aligns—or doesn't—with the target role. Designed for job seekers and HR professionals, it helps improve resume tailoring and streamlines candidate screening.

Downloads: 2 This Week

Last Update: 2026-02-10
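The Resume-Matcher entry describes a compatibility score based on keyword relevance. As a rough illustration of that idea only, the sketch below scores a resume by the share of job-description keywords it contains. This is an assumed simplification and not Resume-Matcher's actual algorithm; the function names and the stopword list are hypothetical.

```python
import re

# Small hypothetical stopword list; a real tool would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "with", "for", "or", "on"}

def keywords(text: str) -> set[str]:
    """Lowercase the text and return its distinct non-stopword tokens."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}

def match_report(resume: str, job: str) -> tuple[float, list[str], list[str]]:
    """Return (score, matched keywords, missing keywords) for a resume vs. a job text."""
    job_kw = keywords(job)
    resume_kw = keywords(resume)
    matched = sorted(job_kw & resume_kw)   # where the resume aligns with the role
    missing = sorted(job_kw - resume_kw)   # where it does not
    score = len(matched) / len(job_kw) if job_kw else 0.0
    return score, matched, missing
```

For example, `match_report("Python developer with SQL experience", "Python and SQL and Docker")` matches `python` and `sql`, reports `docker` as missing, and gives a score of two thirds.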
5

deepdoctection

A Repo For Document AI

deepdoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. It is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models itself, but lets you build pipelines using well-established libraries for object detection, OCR, and selected NLP tasks, and provides an integrated framework for...

Downloads: 5 This Week

Last Update: 2026-02-17
6

DeepSeek-OCR 2

Visual Causal Flow

DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...

Downloads: 6 This Week

Last Update: 2026-02-03
7

ArXiv MCP Server

A Model Context Protocol server for searching and analyzing arXiv

arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...

Downloads: 0 This Week

Last Update: 2026-01-26
8

Controllable-RAG-Agent

This repository provides an advanced RAG

Controllable-RAG-Agent is an advanced Retrieval-Augmented Generation (RAG) system designed specifically for complex, multi-step question answering over your own documents. Instead of relying solely on simple semantic search, it builds a deterministic control graph that acts as the “brain” of the agent, orchestrating planning, retrieval, reasoning, and verification across many steps. The pipeline ingests PDFs, splits them into chapters, cleans and preprocesses text, then constructs vector...

Downloads: 0 This Week

Last Update: 2025-11-13
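The Controllable-RAG-Agent entry mentions splitting ingested text into chapters and cleaning it before further processing. A minimal sketch of those two steps might look like the following. It assumes chapters begin on lines of the form "Chapter N ...", which is an assumption for illustration and not necessarily how the project detects chapter boundaries; the function names are hypothetical.

```python
import re

# Assumed heading pattern: a line starting with "Chapter" followed by a number.
CHAPTER_RE = re.compile(r"^chapter\s+\d+\b.*$", re.IGNORECASE | re.MULTILINE)

def clean(chunk: str) -> str:
    """Collapse all runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", chunk).strip()

def split_into_chapters(text: str) -> list[str]:
    """Split text at chapter headings and return one cleaned string per chapter."""
    starts = [m.start() for m in CHAPTER_RE.finditer(text)]
    if not starts:
        # No headings found: treat the whole text as a single chunk.
        return [clean(text)]
    # Text before the first heading (e.g. a preface) is dropped in this sketch.
    bounds = starts + [len(text)]
    return [clean(text[a:b]) for a, b in zip(bounds, bounds[1:])]
```

Each returned string starts with its chapter heading, so later steps can still tell the chapters apart.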
9

Jina

Build cross-modal and multimodal applications on the cloud

...Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, and PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, WebSockets, HTTP, and GraphQL protocols with TLS. Intuitive design pattern for high-performance microservices. Seamless Docker container integration: sharing, exploring, sandboxing, versioning, and dependency control via Jina Hub. Fast deployment to Kubernetes, Docker Compose, and Jina Cloud. ...

Downloads: 0 This Week

Last Update: 2024-11-12
10

VietOCR

Provides optical character recognition (OCR) solutions for Vietnamese language.

24 Reviews

Downloads: 157 This Week

Last Update: 2026-01-17
11

MyBox

Easy Tools for PDF, Image, File, Network, Data, and Media

Tags: javafx-desktop-apps, pdf, image, ocr, icc, barcode, color-palette, text, bytes, markdown, html, archive, compress, digest, video, audio, editor, converter, media. Source: https://github.com/Mararsh/MyBox. Self-contained packages require neither a Java environment nor installation. JAR packages require Java 16 or higher.

Downloads: 0 This Week

Last Update: 2026-02-10
12

LangChain Apps on Production with Jina

LangChain Apps in Production with Jina & FastAPI

Jina is an open-source framework for building scalable multimodal AI apps in production. LangChain is another open-source framework, for building applications powered by LLMs. langchain-serve helps you deploy your LangChain apps on Jina AI Cloud in a matter of seconds. You can benefit from the scalability and serverless architecture of the cloud without sacrificing the ease and convenience of local development. And if you prefer, you can also deploy your LangChain apps on your own...

Downloads: 0 This Week

Last Update: 2023-08-25
13

BingGPT

Desktop application of Bing's AI-powered chat (Windows, Mac, Linux)

Desktop application for the new Bing's AI-powered chat.
1. Get access to the early preview of the new Bing.
2. Sign in to your Microsoft account.
3. Start chatting.

1 Review

Downloads: 3 This Week

Last Update: 2023-07-02
14

Parsr

Transforms PDF, Documents and Images into Enriched Structured Data

Parsr is an open-source document parsing tool that converts PDFs, scanned images, and other structured documents into structured, machine-readable data formats.

Downloads: 0 This Week

Last Update: 2025-01-21
15

mbFXWords

Analyze text. Skim for subject, predicate, and object. Search other PDFs.

...
- Divide plain text into subject, predicate, and object.
- Count words, with stemming.
- Search for similar content in PDFs.
Outputs the subject, predicate, and object of sentences in PDF and plain-text files. Provides a comfortable GUI. Automatic language detection.

Downloads: 1 This Week

Last Update: 2021-12-08