Browse free open source Multimedia software and projects for Windows and Linux below. Use the toggles on the left to filter open source Multimedia software by OS, license, language, programming language, and project status.

  • Outgrown Windows Task Scheduler? Icon
    Outgrown Windows Task Scheduler?

    Free diagnostic identifies where your workflow is breaking down—with instant analysis of your scheduling environment.

    Windows Task Scheduler wasn't built for complex, cross-platform automation. Get a free diagnostic that shows exactly where things are failing and provides remediation recommendations. Interactive HTML report delivered in minutes.
    Download Free Tool
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 1
    OpenCV

    OpenCV

    Open Source Computer Vision Library

    The Open Source Computer Vision Library has >2500 algorithms, extensive documentation and sample code for real-time computer vision. It works on Windows, Linux, Mac OS X, Android, iOS in your browser through JavaScript. Languages: C++, Python, Julia, Javascript Homepage: https://opencv.org Q&A forum: https://forum.opencv.org/ Documentation: https://docs.opencv.org Source code: https://github.com/opencv Please pay special attention to our tutorials! https://docs.opencv.org/master Books about the OpenCV are described here: https://opencv.org/books.html
    Leader badge
    Downloads: 2,604 This Week
    Last Update:
    See Project
  • 2
    eSpeak: speech synthesis
    Text to Speech engine for English and many other languages. Compact size with clear but artificial pronunciation. Available as a command-line program with many options, a shared library for Linux, and a Windows SAPI5 version.
    Leader badge
    Downloads: 2,276 This Week
    Last Update:
    See Project
  • 3
    DeepFaceLab

    DeepFaceLab

    The leading software for creating deepfakes

    DeepFaceLab is currently the world's leading software for creating deepfakes, with over 95% of deepfake videos created with DeepFaceLab. DeepFaceLab is an open-source deepfake system that enables users to swap the faces on images and on video. It offers an imperative and easy-to-use pipeline that even those without a comprehensive understanding of the deep learning framework or model implementation can use; and yet also provides a flexible and loose coupling structure for those who want to strengthen their own pipeline with other features without having to write complicated boilerplate code. DeepFaceLab can achieve results with high fidelity that are indiscernible by mainstream forgery detection approaches. Apart from seamlessly swapping faces, it can also de-age faces, replace the entire head, and even manipulate speech (though this will require some skill in video editing).
    Downloads: 227 This Week
    Last Update:
    See Project
  • 4
    NAPS2 - Not Another PDF Scanner

    NAPS2 - Not Another PDF Scanner

    Scan documents to PDF and other file types, as simply as possible.

    Visit NAPS2's home page at www.naps2.com. NAPS2 is a document scanning application with a focus on simplicity and ease of use. Scan your documents from WIA- and TWAIN-compatible scanners, organize the pages as you like, and save them as PDF, TIFF, JPEG, PNG, and other file formats. Available on Windows, Mac, and Linux. NAPS2 is currently available in over 40 different languages. Want to see NAPS2 in your preferred language? Help translate! See the wiki for more details.
    Leader badge
    Downloads: 877 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    Buzz

    Buzz

    Transcribe and translate audio offline on your personal computer

    Buzz transcribes and translates audio to text offline using OpenAI's Whisper. Import audio and video files into Buzz and export them as TXT, SRT, or VTT files. Buzz supports Whisper, Whisper.cpp, Faster Whisper, Whisper-compatible models from the Hugging Face repository, and the OpenAI Whisper API. Get linux versions from: - https://flathub.org/apps/io.github.chidiwilliams.Buzz - https://snapcraft.io/buzz Home page of Buzz https://github.com/chidiwilliams/buzz Note for Windows: App is not signed, you will get a warning when you install it. Select More info -> Run anyway.
    Leader badge
    Downloads: 4,214 This Week
    Last Update:
    See Project
  • 6
    Upscayl

    Upscayl

    Free and Open Source AI Image Upscaler for Linux, MacOS and Windows

    Free and Open Source AI Image Upscaler for Linux, MacOS and Windows built with Linux-First philosophy. Upscayl is a cross-platform application built with the Linux-first philosophy. This means that we prioritize Linux builds over others but that doesn't mean we'll break things for other OSes. Upscayl does not work without a GPU, sorry. You'll need a Vulkan-compatible GPU to upscale images. CPU or iGPU won't work. You can also download the flatpak version and double-click the flatpak file to install via Store but wait for the full release, we'll be pushing it to Flathub for easy access. Upscayl uses AI models to enhance your images by guessing what the details could be. It uses Real-ESRGAN (and more in the future) model to achieve this. The CLI tool is called real-esrgan-ncnn-vulkan and it's available on the Real-ESRGAN repository.
    Downloads: 152 This Week
    Last Update:
    See Project
  • 7
    Roop

    Roop

    One-click face swap

    Take a video and replace the face with a face of your choice. You only need one image of the desired face. No dataset, and no training.
    Downloads: 131 This Week
    Last Update:
    See Project
  • 8
    VideoSubFinder
    The main purpose of this program is to provide functionality for extract hardcoded subtitles (hardsub) from video. It provides two main features: 1) Autodetection of frames with hardcoded text (hardsub) on video with saving info about timing positions. 2) Generation of cleared from background text images, which allows with usage of OCR programs (like FineReader, Subtitle Edit, Google Drive) to generate complete subtitles with original text and timing. For working of this program on Windows will be required "Microsoft Visual C++ Redistributable runtime libraries 2022": https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads Latest versions were built and tested on: Windows 10 x64, Ubuntu 20.04.5 LTS, openSUSE Leap 15.4, Arch Linux (EndeavourOS Cassini Nova 03-2023) For faster support in case of bug fixes please contact me in: https://vk.com/skosnits For donate: https://sourceforge.net/projects/videosubfinder/donate
    Leader badge
    Downloads: 622 This Week
    Last Update:
    See Project
  • 9
    Darknet YOLO

    Darknet YOLO

    Real-Time Object Detection for Windows and Linux

    This is YOLO-v3 and v2 for Windows and Linux. YOLO (You only look once) is a state-of-the-art, real-time object detection system of Darknet, an open source neural network framework in C. YOLO is extremely fast and accurate. It uses a single neural network to divide a full image into regions, and then predicts bounding boxes and probabilities for each region. This project is a fork of the original Darknet project.
    Downloads: 79 This Week
    Last Update:
    See Project
  • AestheticsPro Medical Spa Software Icon
    AestheticsPro Medical Spa Software

    Our new software release will dramatically improve your medspa business performance while enhancing the customer experience

    AestheticsPro is the most complete Aesthetics Software on the market today. HIPAA Cloud Compliant with electronic charting, integrated POS, targeted marketing and results driven reporting; AestheticsPro delivers the tools you need to manage your medical spa business. It is our mission To Provide an All-in-One Cutting Edge Software to the Aesthetics Industry.
    Learn More
  • 10
    MESHROOM

    MESHROOM

    3D reconstruction software

    Photogrammetry is the science of making measurements from photographs. It infers the geometry of a scene from a set of unordered photographies or videos. Photography is the projection of a 3D scene onto a 2D plane, losing depth information. The goal of photogrammetry is to reverse this process. The dense modeling of the scene is the result yielded by chaining two computer vision-based pipelines, “Structure-from-Motion” (SfM) and “Multi View Stereo” (MVS). Fusion of Multi-bracketing LDR images into HDR. Alignment of panorama images. Support for fisheye optics. Automatically estimate fisheye circle or manually edit it. Take advantage of motorized-head file. Easy to integrate in your Renderfarm System. Add specific rules to select the most suitable machines regarding CPU, RAM, GPU requirements of each Node.
    Downloads: 79 This Week
    Last Update:
    See Project
  • 11
    tesseract-ocr alternative download

    tesseract-ocr alternative download

    Alternative download for tesseract-ocr project

    Alternative download for tesseract-ocr project
    Leader badge
    Downloads: 1,248 This Week
    Last Update:
    See Project
  • 12
    Open JTalk is a Japanese text-to-speech synthesis system. This software is released under the Modified BSD license.
    Leader badge
    Downloads: 1,046 This Week
    Last Update:
    See Project
  • 13
    eGuideDog free software for the blind
    eGuideDog project develops free software for the blind. Currently, we focus on WebSpeech, Ekho TTS and WebAnywhere.
    Leader badge
    Downloads: 161 This Week
    Last Update:
    See Project
  • 14
    Provides optical character recognition (OCR) solutions for Vietnamese language.
    Leader badge
    Downloads: 194 This Week
    Last Update:
    See Project
  • 15
    gImageReader

    gImageReader

    A graphical frontend to tesseract-ocr

    gImageReader is a simple Gtk/Qt front-end to tesseract. Features include: - Import PDF documents and images from disk, scanning devices, clipboard and screenshots - Process multiple images and documents in one go - Manual or automatic recognition area definition - Recognize to plain text or to hOCR documents - Recognized text displayed directly next to the image - Post-process the recognized text, including spellchecking - Generate PDF documents from hOCR documents **Note**: This page is only a mirror for the downloads. Development is happening on github at https://github.com/manisandro/gImageReader, release binaries are also posted there.
    Leader badge
    Downloads: 151 This Week
    Last Update:
    See Project
  • 16

    OpenFace

    A state-of-the-art facial behavior analysis toolkit

    OpenFace is an advanced facial behavior analysis toolkit intended for computer vision and machine learning researchers, those in the affective computing community, and those who are simply interested in creating interactive applications based on facial behavior analysis. The OpenFace toolkit is capable of performing several complex facial analysis tasks, including facial landmark detection, eye-gaze estimation, head pose estimation and facial action unit recognition. OpenFace is able to deliver state-of-the-art results in all of these mentioned tasks. OpenFace is available for Windows, Ubuntu and macOS installations. It is capable of real-time performance and does not need to run on any specialist hardware, a simple webcam will suffice.
    Downloads: 26 This Week
    Last Update:
    See Project
  • 17
    GIMP ML

    GIMP ML

    AI for GNU Image Manipulation Program

    This repository introduces GIMP3-ML, a set of Python plugins for the widely popular GNU Image Manipulation Program (GIMP). It enables the use of recent advances in computer vision to the conventional image editing pipeline. Applications from deep learning such as monocular depth estimation, semantic segmentation, mask generative adversarial networks, image super-resolution, de-noising and coloring have been incorporated with GIMP through Python-based plugins. Additionally, operations on images such as edge detection and color clustering have also been added. GIMP-ML relies on standard Python packages such as numpy, scikit-image, pillow, pytorch, open-cv, scipy. In addition, GIMP-ML also aims to bring the benefits of using deep learning networks used for computer vision tasks to routine image processing workflows.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 18
    DeepSpeech

    DeepSpeech

    Open source embedded speech-to-text engine

    DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers. DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier. A pre-trained English model is available for use and can be downloaded following the instructions in the usage docs. If you want to use the pre-trained English model for performing speech-to-text, you can download it (along with other important inference material) from the DeepSpeech releases page.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 19
    Google2SRT

    Google2SRT

    Download, save and convert multiple subtitles from YouTube videos

    Google2SRT allows you to download, save and convert multiple subtitles and translations from YouTube and Google Video to SubRip (.srt) format, which is recognized by most video players. You can download XML subtitles or simply type video's URL, Google2SRT will do the rest.
    Downloads: 69 This Week
    Last Update:
    See Project
  • 20
    Translate-Subtitle-File

    Translate-Subtitle-File

    Subtitle Creation Assistant

    Subtitle group machine translation assistant - [Function 1: Translate subtitle file] .srt .ass .vtt [Function 2: Voice to text] (Drag in video or audio to recognize subtitles) (The latest version v4.1.0 Update time 2021 2 May 23) 12 translation service providers can be configured, such as Google, Baidu, Tencent, Caiyun, IBM, Azure, Amazon, etc. (6 voice service providers can be configured: Alibaba Cloud, Xunfei, Tencent Cloud, IBM, Azure, Amazon ) Advantages: 1. You can use multiple service providers, 2. You can configure your own API Key to use your own account's free quota, such as Tencent's free translation quota of 5 million characters per month, IBM's 500-minute speech-to-text free quota (tern. best The domain name has expired and I don't want to renew it.) Azure speech-to-text and DeepL free version have problems, it is normal to not use it, please wait for the next version to fix. Machine translation of subtitle files, use machine translation to process files.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 21
    html2canvas

    html2canvas

    A JavaScript HTML screenshot renderer

    html2canvas is a JavaScript HTML renderer. The script provides you with the tools to take screenshots of webpages directly on the browser. The screenshot is based on the DOM and therefore, it may not be 100% accurate to the real representation, given that it is not an actual screenshot, but a type of screenshot built based on the available data and information of the page. The script renders such page as a canvas image, by reading the DOM and the different styles of the featured elements. It doesn't require rendering from the server, given that the image is created on the user's browser. However, as it is heavily dependent on the browser, the library is not to be used in nodejs. It can't circumvent any browser content policy restrictions and to render cross-origin content a proxy will be needed to get the content to the same origin.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 22
    A Java JNA wrapper for Tesseract OCR API
    Leader badge
    Downloads: 59 This Week
    Last Update:
    See Project
  • 23
    PhotoPrism

    PhotoPrism

    AI-Powered Photos App for the Decentralized Web 🌈💎✨

    PhotoPrism® is an AI-Powered Photos App for the Decentralized Web. It makes use of the latest technologies to tag and find pictures automatically without getting in your way. You can run it at home, on a private server, or in the cloud. Our mission is to provide the most user- and privacy-friendly solution to keep your pictures organized and accessible. That's why PhotoPrism was built from the ground up to run wherever you need it, without compromising freedom, privacy, or functionality.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 24
    Stable Diffusion v 2.1 web UI

    Stable Diffusion v 2.1 web UI

    Lightweight Stable Diffusion v 2.1 web UI: txt2img, img2img, depth2img

    Lightweight Stable Diffusion v 2.1 web UI: txt2img, img2img, depth2img, in paint and upscale4x. Gradio app for Stable Diffusion 2 by Stability AI. It uses Hugging Face Diffusers implementation. Currently supported pipelines are text-to-image, image-to-image, inpainting, upscaling and depth-to-image.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 25
    TTS

    TTS

    Deep learning for text to speech

    TTS is a library for advanced Text-to-Speech generation. It's built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed, and quality. TTS comes with pre-trained models, tools for measuring dataset quality, and is already used in 20+ languages for products and research projects. Released models in PyTorch, Tensorflow and TFLite. Tools to curate Text2Speech datasets underdataset_analysis. Demo server for model testing. Notebooks for extensive model benchmarking. Modular (but not too much) code base enabling easy testing for new ideas. Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech). Speaker Encoder to compute speaker embeddings efficiently. Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN). If you are only interested in synthesizing speech with the released TTS models, installing from PyPI is the easiest option.
    Downloads: 7 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next