Hello! This is the GitHub space for FoundationVision @ ByteDance.
We are dedicated to exploring the frontiers of multimodal intelligence, with the ultimate goal of building Artificial General Intelligence (AGI) systems.
Our research focuses on deep learning and multimodal intelligence. We are particularly interested in:
- Visual Foundation Models, Generative Pretrained Models, and Large Language Models.
- Multimodal Foundation Models and Representation Learning.
- Open-World Interaction via Unified Multimodal Generation and Understanding.
- Large-Scale Multimodal Generative Pretraining and Alignment.
Our group strives to push the boundaries of multimodal intelligence and has produced highly influential work in the field, including:
- Generative Models: VAR, Waver, Infinity, LLamaGen, InfinityStar, OmniTokenizer
- Object Recognition and Representation Learning: Sparse-RCNN, ByteTrack, SparK, UNINEXT, Unicorn, DanceTrack, VNext, Referformer
- Multimodal Foundation Models: Groma, Liquid, UniTok, GLEE