FoundationVision

ByteDance's open-source FoundationVision models

Welcome to FoundationVision @ ByteDance!

Introduction 👋

Hello! This is the GitHub space for FoundationVision @ ByteDance.

We are dedicated to exploring the frontiers of multimodal intelligence, with the ultimate goal of building Artificial General Intelligence (AGI) systems.

Our research focuses on deep learning and multimodal intelligence. We are particularly interested in:

  • Visual Foundation Models, Generative Pretrained Models, and Large Language Models.
  • Multimodal Foundation Models and Representation Learning.
  • Open-World Interaction via Unified Multimodal Generation and Understanding.
  • Large-Scale Multimodal Generative Pretraining and Alignment.

Our group strives to push the boundaries of multimodal intelligence and has produced highly influential works in the field, including:

Popular repositories

  1. VAR Public

    [NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…

    Jupyter Notebook, 8.5k stars, 547 forks

  2. ByteTrack Public

    [ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box

    Python, 5.8k stars, 1.1k forks

  3. LlamaGen Public

    Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

    Python, 1.9k stars, 90 forks

  4. Infinity Public

    [CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

    Python, 1.5k stars, 82 forks

  5. GLEE Public

    [CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

    Python, 1.2k stars, 74 forks

  6. Waver Public

    Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.

    745 stars, 80 forks

Repositories

Showing 10 of 20 repositories
  • InfinityStar Public

    [NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation

    Python, 572 stars, MIT license, 18 forks, 7 issues, 0 pull requests; updated Nov 25, 2025
  • .github Public
    0 stars, 0 forks, 0 issues, 0 pull requests; updated Nov 20, 2025
  • UniTok Public

    [NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding

    Python, 462 stars, MIT license, 10 forks, 12 issues, 0 pull requests; updated Nov 15, 2025
  • VAR Public

    [NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

    Jupyter Notebook, 8,500 stars, MIT license, 547 forks, 51 issues (1 issue needs help), 3 pull requests; updated Nov 11, 2025
  • Liquid Public

    (Accepted by IJCV) Liquid: Language Models are Scalable and Unified Multi-modal Generators

    Python, 629 stars, MIT license, 33 forks, 12 issues, 0 pull requests; updated Nov 11, 2025
  • Infinity Public

    [CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

    Python, 1,511 stars, MIT license, 82 forks, 53 issues, 4 pull requests; updated Nov 10, 2025
  • Waver Public

    Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.

    745 stars, 80 forks, 7 issues, 2 pull requests; updated Aug 28, 2025
  • BitVAE Public

    Official training and inference code for the bitwise tokenizer

    Python, 52 stars, MIT license, 2 forks, 2 issues, 0 pull requests; updated May 19, 2025
  • GenerateU Public

    [CVPR 2024] Generative Region-Language Pretraining for Open-Ended Object Detection

    Python, 186 stars, MIT license, 8 forks, 15 issues, 0 pull requests; updated Mar 29, 2025
  • FlashVideo Public

    [AAAI 2026] FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

    Python, 453 stars, Apache-2.0 license, 24 forks, 13 issues (2 issues need help), 1 pull request; updated Mar 5, 2025