Showing 193 open source projects for "synthetic"

  • 1
    YData Synthetic

    Synthetic data generators for tabular and time-series data

    A package to generate synthetic tabular and time-series data leveraging state-of-the-art generative models. Synthetic data is artificially generated data that is not collected from real-world events. It replicates the statistical components of real data without containing any identifiable information, ensuring individuals' privacy. This repository contains material related to Generative Adversarial Networks for synthetic data generation, in particular regular tabular data and time-series. ...
    Downloads: 0 This Week
  • 2
    Synthetic Data Generator

    SDG is a specialized framework

    Synthetic Data Generator is an open-source framework designed to generate high-quality synthetic tabular datasets that replicate the statistical characteristics of real data while avoiding privacy risks. The platform enables developers and data scientists to create artificial datasets that preserve important relationships between variables without containing sensitive personal information.
    Downloads: 0 This Week
  • 3
    Synthetic Data Kit

    Tool for generating high quality Synthetic datasets

    Synthetic Data Kit is a CLI-centric toolkit for generating high-quality synthetic datasets to fine-tune Llama models, with an emphasis on producing reasoning traces and QA pairs that line up with modern instruction-tuning formats. It ships an opinionated, modular workflow that covers ingesting heterogeneous sources (documents, transcripts), prompting models to create labeled examples, and exporting to fine-tuning schemas with minimal glue code.
    Downloads: 0 This Week
  • 4
    Synthetic Data Vault (SDV)

    Synthetic Data Generation for tabular, relational and time series data

    The Synthetic Data Vault (SDV) is a Synthetic Data Generation ecosystem of libraries that allows users to easily learn single-table, multi-table and timeseries datasets to later on generate new Synthetic Data that has the same format and statistical properties as the original dataset. Synthetic data can then be used to supplement, augment and in some cases replace real data when training Machine Learning models.
    Downloads: 0 This Week
  • 5
    Synthea Patient Generator

    Synthetic Patient Population Simulator

    Synthea™ is an open-source, synthetic patient generator that models the medical history of synthetic patients. Our mission is to provide high-quality, synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare. The resulting data is free from cost, privacy, and security restrictions, enabling research with Health IT data that is otherwise legally or practically unavailable.
    Downloads: 1 This Week
  • 6
    SDGym

    Benchmarking synthetic data generation methods

    The Synthetic Data Gym (SDGym) is a benchmarking framework for modeling and generating synthetic data. Measure performance and memory usage across different synthetic data modeling techniques – classical statistics, deep learning and more! The SDGym library integrates with the Synthetic Data Vault ecosystem. You can use any of its synthesizers, datasets or metrics for benchmarking.
    Downloads: 2 This Week
  • 7
    Copulas

    A library to model multivariate data using copulas

    Copulas is a Python library for modeling multivariate distributions and sampling from them using copula functions. Given a table of numerical data, use Copulas to learn the distribution and generate new synthetic data following the same statistical properties. Choose from a variety of univariate distributions and copulas, including Archimedean Copulas, Gaussian Copulas and Vine Copulas. Compare real and synthetic data visually after building your model. Visualizations are available as 1D histograms, 2D scatterplots and 3D scatterplots. ...
    Downloads: 0 This Week
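
The idea behind sampling from a Gaussian copula can be sketched in a few lines of standard-library Python. This is a minimal illustration of the general technique, not the Copulas library's API: the function names (`sample_gaussian_copula`, `exponential_inv_cdf`, `uniform_inv_cdf`) and the chosen marginals are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(n, rho, inv_cdf_x, inv_cdf_y, seed=0):
    """Draw n (x, y) pairs whose dependence follows a Gaussian copula.

    rho is the correlation of the latent normals; inv_cdf_x and inv_cdf_y
    are the inverse CDFs of the desired marginal distributions.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        # Step 1: two standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Step 2: map each normal to a uniform value with the normal CDF.
        # The clamp keeps the value strictly below 1 for the inverse CDFs.
        u1 = min(std_normal.cdf(z1), 1.0 - 1e-12)
        u2 = min(std_normal.cdf(z2), 1.0 - 1e-12)
        # Step 3: push the uniforms through the marginal inverse CDFs.
        pairs.append((inv_cdf_x(u1), inv_cdf_y(u2)))
    return pairs


def exponential_inv_cdf(rate):
    """Inverse CDF of an exponential distribution (hypothetical marginal)."""
    return lambda u: -math.log(1.0 - u) / rate


def uniform_inv_cdf(low, high):
    """Inverse CDF of a uniform distribution on [low, high]."""
    return lambda u: low + u * (high - low)


samples = sample_gaussian_copula(
    2000, 0.8, exponential_inv_cdf(1.5), uniform_inv_cdf(10.0, 20.0)
)
```

The two columns keep their own marginal shapes (exponential and uniform) while still moving together, which is the property a copula separates out from the marginals.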
  • 8
    CTGAN

    Conditional GAN for generating synthetic tabular data

    CTGAN is a collection of Deep Learning based synthetic data generators for single table data, which are able to learn from real data and generate synthetic data with high fidelity. If you're just getting started with synthetic data, we recommend installing the SDV library which provides user-friendly APIs for accessing CTGAN. The SDV library provides wrappers for preprocessing your data as well as additional usability features like constraints.
    Downloads: 0 This Week
  • 9
    DataDreamer

    DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models

    DataDreamer is a tool designed to assist in the generation and manipulation of synthetic data for various applications, including testing and machine learning.
    Downloads: 0 This Week
  • 10
    Gretel Synthetics

    Synthetic data generators for structured and unstructured text

    Unlock unlimited possibilities with synthetic data. Share, create, and augment data with cutting-edge generative AI. Generate unlimited data in minutes with synthetic data delivered as a service. Synthesize data that is as good as or better than your original dataset, and maintain relationships and statistical insights. Customize privacy settings so that data is always safe while remaining useful for downstream workflows.
    Downloads: 0 This Week
  • 11
    Bespoke Curator

    Synthetic data curation for post-training and data extraction

    Curator is an open-source Python library designed to build synthetic data pipelines for training and evaluating machine learning models, particularly large language models. The system helps developers generate, transform, and curate high-quality datasets by combining automated generation with structured validation and filtering. It supports workflows where models are used to produce synthetic examples that can later be refined into reliable training datasets for reasoning, question answering, or structured information extraction tasks. ...
    Downloads: 0 This Week
  • 12
    Inferno

    React-like JavaScript library for building modern user interfaces

    ...One of the fastest front-end frameworks for rendering UI in the DOM, making 60 FPS on mobile possible. Isomorphic rendering on both client and server, along with fast-booting from server-side renders. Inferno doesn't have a fully synthetic event system like React does. Inferno has a partially synthetic event system, instead opting to only delegate certain events (such as `onClick`). Inferno doesn't support React Native. Inferno was only designed for the browser/server with the DOM in mind. Inferno doesn't support legacy string refs, use `createRef` or callback `ref` API. ...
    Downloads: 7 This Week
  • 13
    Neosync

    Open Source Data Security Platform for Developers to Monitor

    Neosync is a secure, open-source platform to generate, mask, and sync realistic test data across environments. It helps engineering teams create privacy-compliant datasets using synthetic data, transformations, and pseudonymization techniques. Designed with extensibility and data governance in mind, Neosync integrates with common databases and cloud services, enabling safe test environments for development and QA.
    Downloads: 0 This Week
  • 14
    Synthetix

    Synthetix Solidity smart contracts

    Synthetic assets, or Synths, are assets voted into existence by the community and can come in the form of fiat currencies, cryptocurrencies, stocks, commodities and anything else with a price. Many platforms already leverage the deep liquidity and composability of Synthetix to deliver better trades with lower slippage, hedging, and other unique use cases.
    Downloads: 0 This Week
  • 15
    lsfg-vk

    Lossless Scaling Frame Generation on Linux

    ...Instead of relying on driver-specific or hardware-accelerated upscaling, this layer intercepts Vulkan API calls and injects frame interpolation on the fly, effectively producing smoother motion in supported games and applications by creating synthetic intermediate frames. It is structured as a modular codebase with separate components for the backend, configuration UI, CLI tooling, shared logic, and the core layer implementation. Although it targets Linux environments and handhelds like Steam Deck, it depends on an existing installation of Lossless Scaling on the system to provide its frame generation logic.
    Downloads: 10 This Week
  • 16
    NetworkX

    Network analysis in Python

    ...Data structures for graphs, digraphs, and multigraphs. Many standard graph algorithms. Network structure and analysis measures. Generators for classic graphs, random graphs, and synthetic networks. Nodes can be "anything" (e.g., text, images, XML records). Edges can hold arbitrary data (e.g., weights, time-series). Open source 3-clause BSD license. Well tested with over 90% code coverage. Additional benefits from Python include fast prototyping, easy to teach, and multi-platform. Find the shortest path between two nodes in an undirected graph. ...
    Downloads: 7 This Week
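
Finding the shortest path between two nodes in an unweighted, undirected graph can be sketched with a breadth-first search. The snippet below deliberately uses only the Python standard library rather than NetworkX itself, so the helper names (`build_undirected`, `shortest_path`) and the example edges are illustrative assumptions, not part of the NetworkX API.

```python
from collections import deque


def build_undirected(edges):
    """Build an adjacency list in which every edge is stored in both directions."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def shortest_path(adjacency, source, target):
    """Return one shortest path from source to target, or None if unreachable."""
    if source not in adjacency or target not in adjacency:
        return None
    parents = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


graph = build_undirected(
    [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("e", "d"), ("x", "y")]
)
path = shortest_path(graph, "a", "d")
print(path)  # ['a', 'e', 'd']
```

Breadth-first search visits nodes in order of hop count, so the first time the target is dequeued the recorded parent chain is a path with the fewest edges. Weighted graphs would need a different algorithm, such as Dijkstra's.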
  • 17
    OpenStatus

    Status page with uptime monitoring & API monitoring as code

    OpenStatus is an open-source synthetic monitoring and status page platform designed to help teams track the availability and performance of websites, APIs, and services from multiple global locations. It continuously probes configured endpoints and alerts users when latency thresholds are exceeded or outages occur, enabling proactive incident response. The platform also generates customizable public or private status pages that automatically reflect real-time service health, improving transparency with customers and stakeholders. ...
    Downloads: 3 This Week
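
The probe-and-alert loop described above can be sketched generically as follows. This is not OpenStatus code or its configuration format: `CheckResult`, `run_check`, the alert strings, and the injected `probe` and `clock` callables are all hypothetical names chosen for the illustration, and the probe is a plain function so that the sketch runs without network access.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    name: str
    ok: bool
    latency_ms: float
    alert: Optional[str]  # None means no alert was raised


def run_check(
    name: str,
    probe: Callable[[], bool],
    latency_threshold_ms: float,
    clock: Callable[[], float] = time.perf_counter,
) -> CheckResult:
    """Run one synthetic check: time the probe, then decide whether to alert."""
    start = clock()
    try:
        healthy = bool(probe())
    except Exception:
        # Any failure of the probe itself is treated as an outage.
        healthy = False
    latency_ms = (clock() - start) * 1000.0

    if not healthy:
        alert = "outage"
    elif latency_ms > latency_threshold_ms:
        alert = "latency threshold exceeded"
    else:
        alert = None
    return CheckResult(name, healthy, latency_ms, alert)


# A stand-in probe; a real one would issue an HTTP request to the endpoint.
result = run_check("example-endpoint", lambda: True, latency_threshold_ms=200.0)
```

Passing the clock in as a parameter keeps the timing logic testable with fixed timestamps, and a real deployment would run such checks on a schedule from several locations.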
  • 18
    Mimesis

    High-performance fake data generator for Python

    Mimesis is an open source high-performance fake data generator for Python, able to provide data for various purposes in various languages. It's currently the fastest fake data generator for Python, and supports many different data providers that can produce data related to people, food, transportation, internet and many more. Mimesis is really easy to use, with everything you need just an import away. Simply import an object, called a Provider, which represents the type of data you...
    Downloads: 1 This Week
  • 19
    The Hypersim Dataset

    Photorealistic Synthetic Dataset for Holistic Indoor Scene

    Hypersim is a large-scale, photorealistic synthetic dataset and tooling suite for indoor scene understanding research. It provides richly annotated renderings—RGB, depth, surface normals, instance and semantic segmentations, and material/lighting metadata—produced from high-fidelity virtual environments. The dataset spans diverse furniture layouts, room types, and camera trajectories, enabling robust training for geometry, segmentation, and SLAM-adjacent tasks.
    Downloads: 0 This Week
  • 20
    Tongyi DeepResearch

    Tongyi Deep Research, the Leading Open-source Deep Research Agent

    ...It’s built to act like a research agent: synthesizing, reasoning, retrieving information via the web and documents, and backing its outputs with evidence. The model is about 30.5 billion parameters in size, though at any given token only ~3.3B parameters are active. It uses a mix of synthetic data generation, fine-tuning and reinforcement learning; supports benchmarks like web search, document understanding, question answering, “agentic” tasks; provides inference tools, evaluation scripts, and “web agent” style interfaces. The aim is to enable more autonomous, agentic models that can perform sustained knowledge gathering, reasoning, and synthesis across multiple modalities (web, files, etc.).
    Downloads: 3 This Week
  • 21
    NVIDIA Isaac GR00T

    NVIDIA Isaac GR00T N1.5 is the world's first open foundation model

    ...It accepts multimodal inputs—such as language and images—and uses a diffusion transformer architecture built upon vision-language encoders, enabling adaptive robot behaviors across diverse environments. It is designed to be customizable via post-training with real or synthetic data. The vision-language model remains frozen during both pretraining and finetuning, preserving language understanding and improving generalization. Streamlined MLP connection between vision encoder and LLM with added layer normalization.
    Downloads: 0 This Week
  • 22
    Bogus

    A simple and sane fake data generator for C#, F#, and VB.NET

    Bogus is a simple and sane fake data generator for .NET languages like C#, F# and VB.NET. Bogus is fundamentally a C# port of faker.js and inspired by FluentValidation's syntax sugar. Bogus will help you load databases, UI and apps with fake data for your testing needs. When Bogus updates locales from faker.js or issues bug fixes, sometimes deterministic sequences can change. Changes to deterministic outputs are usually highlighted in the release notes. Changes to deterministic outputs is...
    Downloads: 0 This Week
  • 23
    Intel RealSense

    Intel® RealSense SDK

    Intel® RealSense™ SDK 2.0 is a cross-platform library for Intel® RealSense™ depth cameras. The SDK allows depth and color streaming and provides intrinsic and extrinsic calibration information. The library also offers synthetic streams (point cloud, depth aligned to color and vice versa), and built-in support for recording and playback of streaming sessions. Intel has EOLed the LiDAR, Facial Authentication, and Tracking product lines. These products have been discontinued and will no longer be available for new orders. Intel WILL continue to sell and support stereo products including the following: D410, D415, D430, D401, D450 modules and D415, D435, D435i, D435f, D405, D455, D457 depth cameras. ...
    Downloads: 95 This Week
  • 24
    verl

    Volcano Engine Reinforcement Learning for LLMs

    ...It ships with reference implementations of popular alignment algorithms and clear examples that make it straightforward to reproduce baselines before customizing. Data pipelines treat human feedback, simulated environments, and synthetic preferences as interchangeable sources, which helps with rapid experimentation. VERL is meant for both research and production hardening: logging, checkpointing, and evaluation suites are built in so you can track learning dynamics and regressions over time.
    Downloads: 0 This Week
  • 25
    BlenderProc

    Blender pipeline for photorealistic training image generation

    A procedural Blender pipeline for photorealistic training image generation. BlenderProc has to be run inside the Blender Python environment, as the Blender API is only accessible there. Therefore, instead of running your script with the usual Python interpreter, the command line interface of BlenderProc has to be used. In general, one run of your script first loads or constructs a 3D scene, then sets some camera poses inside this scene and renders different types of images (RGB, distance,...
    Downloads: 0 This Week