Data import and export framework in Java. Data can be exported to and imported from XML, Excel, PDF, delimited files (CSV, TAB, user-defined delimiter), and database tables.
ElateXam is a complete tool suite for electronic exams. It includes several task types (multiple choice, cloze texts, free texts, mapping, drawing, autotool), correction tools, and analysis and export features. It is used at the University of Leipzig.
Lire is a pluggable log analyzer supporting HTTP, email, DNS, FTP, firewall, and print services. Output can be generated as TXT, (X)HTML, PDF, RTF, and DocBook. The latter four support graphics. For news and support, visit the project homepage.
LAMP web application that currently provides basic facilities for managing documentation on a company's human resources, with an eye on software lifecycles. It produces ODF documents and reports. PDF output is supported through OpenOffice.org.
SplitPDF (SplitPDF.jar) is a command-line Java program that splits a PDF file by its bookmarks into separate PDFs. Each bookmark is used as the title of the newly created PDF. Extremely useful and fast in a batch-processing environment.
Note: this project has moved to GitHub: https://github.com/sglass68/paperman
This is a document management program similar to PaperPort in aim. It allows scanning of paper into directories quickly and easily with a thumbnail/desktop view. Maxview decodes .max files and can convert them to PDF.
A tool that helps customize and generate PDF reports for open source web-based systems. It includes a visual editor, developed in PHP, for designing reports interactively according to the developer's needs.
ANts P2P implements a third-generation P2P network. It protects your privacy while you are connected and makes you untraceable by hiding your identity (IP address) and encrypting everything you send to or receive from others.
TFTgallery is a PHP-based web image gallery that doesn't need a database. It uses the directory structure for data storage. The main features are on-the-fly thumbnail creation, PDF and ZIP creation, image calendars, and EXIF support.
A Java library for rendering forms on PDF (may be extended for other formats), based on a Template File (PDF or other type), and an XML description of contents. This library uses the iText package (http://www.lowagie.com/iText/) for PDF manipulation.
CNV Workshop is a web-enabled platform for analyzing genome variation such as copy number variation (CNV). Learn about CNV Workshop in our associated BMC Bioinformatics manuscript: http://www.biomedcentral.com/1471-2105/11/74
Autshumato PTE (PDF Text Extractor) is a utility application which extracts the text from PDF documents with the aim of making it translatable. It is also able to extract the pages of the PDF document as PNG images.
Educational desktop application whose main goal is to let users carry out any kind of mathematical operation quickly and easily. It is aimed at users who have no programming knowledge.
Whyteboard is a painting whiteboard application for Linux and Windows that allows the annotation of PDF and PostScript documents and image files with common drawing tools.
Reporting engine library written in C. Create one XML file and generate PDF, HTML, TXT, and CSV reports based on queries. Has support for MySQL, PostgreSQL, ODBC. Bindings for PHP, Java, Python.
PHP-, Perl-, and MySQL-based web interface for the Nessus security scanner and the Nmap port scanner. The system presents scan results via an email notification, an HTML interface, or an exported PDF file.
Toolkit e-formulieren is an open source toolkit for creating and maintaining e-forms in a user-friendly way.
The toolkit uses Orbeon and supports XForms-compliant e-forms, optionally with pre-filled fields.
Note as of 2013-09-13: I'm moving this project over to GitHub due to this:
http://www.gluster.org/2013/08/how-far-the-once-mighty-sourceforge-has-fallen/
Feel free to follow the more up-to-date versions at
https://github.com/mnott/PDFOCRWrapper
Thanks.
Matthias
--
This is a wrapper written in Java that recursively iterates over a directory structure and calls an OCR engine on each PDF it finds, provided the engine has not already been called for that PDF. It works well with the ABBYY OCR Engine for Linux.