Lavida-O: Efficient Scaling of Unified Masked Diffusion Model for Multi-modal Understanding and Generation.
[Paper] [Project Site] [Huggingface]

conda create --name lavida python=3.13
conda activate lavida
pip install -e .[lavida]
pip install wheel
MAX_JOBS=32 pip install flash-attn==2.7.4.post1 --no-build-isolation
pip install jupyter notebook
pip install -U huggingface_hub[hf_xet] --force-reinstall
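Optionally, you can sanity-check the environment after the install steps above. This is a minimal sketch that only probes whether the key packages are importable; note the import names differ from the pip names (e.g. `flash-attn` installs as `flash_attn`):

```python
import importlib.util

def missing_packages(names):
    """Return the subset of import names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Empty list means the core dependencies resolved.
print(missing_packages(["torch", "flash_attn", "huggingface_hub"]))
```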
Please download checkpoints from [Huggingface].
You can use the following script to download the checkpoints:
python download_checkpoint.py
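Alternatively, checkpoints can be fetched directly with `huggingface_hub`'s `snapshot_download`. This is a sketch only: the repo id `ORG/lavida-o` is a placeholder (use the actual repository name from the Huggingface link above), and the local layout that `download_checkpoint.py` expects may differ:

```python
from pathlib import Path

def checkpoint_dir(root: str, repo_id: str) -> Path:
    # Mirror the repo id into a filesystem-safe folder name.
    return Path(root) / repo_id.replace("/", "--")

def download(repo_id: str, root: str = "checkpoints") -> Path:
    # Imported lazily so checkpoint_dir stays usable without the package.
    from huggingface_hub import snapshot_download
    target = checkpoint_dir(root, repo_id)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target

if __name__ == "__main__":
    download("ORG/lavida-o")  # placeholder repo id
```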
Please see this notebook for inference examples of image understanding, generation, and editing tasks.
We provide evaluation code for the GenEval and DPG benchmarks in the folder eval_img. To reproduce results, run the following commands:
bash eval_img/geneval.sh
bash eval_img/dpg.sh
For GenEval, please install mmdetection following the official instructions. For DPG, please install modelscope.
To evaluate image editing, run:
bash eval_img/imgedit-eval.sh
To run the grounding evaluation, first install the evaluation library:
cd eval
pip install -e .
pip install levenshtein
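The `levenshtein` package installed above is presumably used by the evaluation library for string matching against reference answers. As a reference for what it computes, the edit distance can be sketched in pure Python:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

In practice the compiled `levenshtein` package is much faster; this sketch is only for clarity.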
Then run the grounding evaluation:
bash eval/run_grounding.sh
Both the model and the code are licensed under the Adobe Research License, which is included here: [License.pdf].