Lavida-O: Efficient Scaling of Unified Masked Diffusion Model for Multi-modal Understanding and Generation.
[Paper] [Project Site] [Huggingface]

conda create --name lavida python=3.13
conda activate lavida
pip install -e .[lavida]
pip install wheel
MAX_JOBS=32 pip install flash-attn==2.7.4.post1 --no-build-isolation
pip install jupyter notebook
pip install -U huggingface_hub[hf_xet] --force-reinstall
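Optionally, you can sanity-check the environment after the install steps above. This is a minimal sketch that only probes whether the key packages are importable; note the import names differ from the pip names (e.g. `flash-attn` installs as `flash_attn`):

```python
import importlib.util

def missing_packages(names):
    """Return the subset of import names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Empty list means the core dependencies resolved.
print(missing_packages(["torch", "flash_attn", "huggingface_hub"]))
```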
Please download checkpoints from [Huggingface].
You can use the following script to download the checkpoints:
python download_checkpoint.py
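Alternatively, checkpoints can be fetched directly with `huggingface_hub`'s `snapshot_download`. This is a sketch only: the repo id `ORG/lavida-o` is a placeholder (use the actual repository name from the Huggingface link above), and the local layout that `download_checkpoint.py` expects may differ:

```python
from pathlib import Path

def checkpoint_dir(root: str, repo_id: str) -> Path:
    # Mirror the repo id into a filesystem-safe folder name.
    return Path(root) / repo_id.replace("/", "--")

def download(repo_id: str, root: str = "checkpoints") -> Path:
    # Imported lazily so checkpoint_dir stays usable without the package.
    from huggingface_hub import snapshot_download
    target = checkpoint_dir(root, repo_id)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target

if __name__ == "__main__":
    download("ORG/lavida-o")  # placeholder repo id
```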
Please see this notebook for inference examples of image understanding, generation, and editing tasks.
We provide evaluation code for the GenEval and DPG benchmarks in the folder eval_img. To reproduce results, run the following commands:
bash eval_img/geneval.sh
bash eval_img/dpg.sh
For GenEval, please install mmdetection following the official instructions. For DPG, please install modelscope.
To evaluate image editing, run:
bash eval_img/imgedit-eval.sh
To run the grounding evaluation, first install the evaluation library:
cd eval
pip install -e .
pip install levenshtein
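The `levenshtein` package installed above is presumably used by the evaluation library for string matching against reference answers. As a reference for what it computes, the edit distance can be sketched in pure Python:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

In practice the compiled `levenshtein` package is much faster; this sketch is only for clarity.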
Then run the grounding evaluation:
bash eval/run_grounding.sh
Both the model and the code are licensed under the Adobe Research License, which is included here: [License.pdf].