Project Page | Paper | Arxiv | Video
InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes
Zesong Yang, Bangbang Yang, Wenqi Dong, Chenxuan Cao, Liyuan Cui, Yuewen Ma, Zhaopeng Cui, Hujun Bao
ICCV 2025
- Installation of Scene Decomposition.

```shell
conda create -n instascene python=3.9 -y
conda activate instascene
pip install torch==2.1.0+cu118 torchvision==0.16.0+cu118 torchaudio==2.1.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
pip install --extra-index-url=https://pypi.nvidia.com "cudf-cu11==24.2.*" "cuml-cu11==24.2.*"
pip install -r requirements.txt
```

Install CropFormer for instance-level segmentation.
```shell
cd semantic_modules/CropFormer
cd mask2former/modeling/pixel_decoder/ops
sh make.sh
cd ../../../../
git clone git@github.com:facebookresearch/detectron2.git
cd detectron2
pip install -e .
pip install git+https://github.com/cocodataset/panopticapi.git
pip install git+https://github.com/mcordts/cityscapesScripts.git
cd ..
pip install -r requirements.txt
pip install -U openmim
mim install mmcv
mkdir ckpts
```

Manually download the CropFormer checkpoint into `semantic_modules/CropFormer/ckpts`.
- Installation of In-situ Generation.
 
Please follow the steps below to process your custom dataset, or directly download our preprocessed datasets.
- It is fine to use other 2D segmentation models, but make sure the input masks do not exhibit overly complex hierarchical relationships; otherwise, our method falls back to the finest level.
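The finest-level fallback mentioned above can be pictured with a small sketch. This is an illustration, not the actual pipeline code; masks are represented as sets of pixel coordinates for simplicity:

```python
def finest_masks(masks):
    """Keep only the finest-level masks.

    `masks` is a list of sets of (row, col) pixel coordinates.  When one
    mask strictly contains another (a coarser "parent" in a hierarchy),
    the parent is dropped, mimicking a fallback to the finest level.
    """
    return [
        m for i, m in enumerate(masks)
        if not any(j != i and m > other for j, other in enumerate(masks))
    ]

parent = {(0, 0), (0, 1), (1, 0), (1, 1)}  # coarse mask containing `child`
child = {(0, 0), (0, 1)}
lone = {(2, 2)}
print(finest_masks([parent, child, lone]))  # `parent` is dropped
```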
 
```shell
cd semantic_modules/CropFormer
bash run_segmentation.sh "$DATA_DIR"
cd ../..
```

Follow the original 2DGS repository to train the 2DGS model.

```shell
python train.py -s data/3dovs/bed -m output/3dovs/bed/train_2dgs
```

An optional monocular normal prior (StableNormal) is available to enhance the reconstruction quality.
```shell
## Prepare Normal Priors
cd semantic_modules
git clone https://github.com/Stable-X/StableNormal && cd StableNormal
pip install -r requirements.txt
mv ../inference_stablenormal.py ./
python inference_stablenormal.py "$DATA_DIR"
cd ../..
```
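The predicted normal maps are stored as 8-bit PNGs. A minimal decoding sketch, assuming the common per-channel mapping `n = c / 255 * 2 - 1` (StableNormal's exact channel convention may differ, so treat this as an illustration of the format rather than a drop-in decoder):

```python
def decode_normal(r, g, b):
    """Decode an 8-bit RGB normal-map pixel into a normal vector.

    Assumes the common linear mapping n = c / 255 * 2 - 1 per channel;
    check the StableNormal repository for its actual convention.
    """
    return tuple(c / 255.0 * 2.0 - 1.0 for c in (r, g, b))

print(decode_normal(255, 0, 255))  # (1.0, -1.0, 1.0)
```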
```shell
## Training 2DGS with Normal Priors
python train.py -s data/3dovs/bed --w_normal_prior stablenormal_normals -m output/3dovs/bed/train_2dgs
```

Put the trained `point_cloud.ply` file into the `$DATA_DIR` directory. After successfully executing the above steps, the data directory should be structured as follows:
```
data
   |——————3D_OVS
   |   |——————bed
   |      |——————point_cloud.ply
   |      |——————images
   |         |——————00.jpg
   |         ...
   |      |——————sam
   |         |——————mask
   |            |——————00.png
   |            ...
   |      |——————sparse
   |         |——————0
   |            |——————cameras.bin
   |            ...
   |      |——————(optional) stablenormal_normals
   |         |——————00.png
   |         ...
   |     ...
```
Note that for simple scenes such as 3D-OVS (simple, object-centered, without overlap), there is no need to use spatial relationships to obtain robust semantic priors as shown in our supplementary material; single-view contrastive learning is sufficient to achieve strong performance.
We train the model on an NVIDIA Tesla A100 GPU (40 GB) for 10,000 iterations, which takes about 20 minutes and uses less than 8 GB of GPU memory.
- Reduce GPU memory usage and speed up training with `--sample_batchsize 8 * 1024` or `-r 2`.
- Use `--gram_feat_3d` for a more robust feature field in complex scenes.
- It is normal for training to pause at the DBScan Filter Stage, since the background Gaussian points may be divided into multiple regions.
- Use `--consider_negative_labels` to suppress floaters during background segmentation.
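The DBScan filtering tip can be pictured with a toy density-clustering sketch: group nearby points, then keep only the largest group. This single-linkage, O(n²) pure-Python version is a conceptual stand-in for the actual DBSCAN stage, not InstaScene's implementation:

```python
def largest_cluster(points, eps=0.5):
    """Group points whose pairwise distance is <= eps (union-find),
    then keep the largest group - a rough stand-in for DBSCAN-style
    floater filtering.  O(n^2), toy scale only."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for i in range(n):
        for j in range(i + 1, n):
            if dist2(points[i], points[j]) <= eps * eps:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(points[i])
    return max(groups.values(), key=len)

# Three close points survive; the isolated "floater" is dropped.
print(largest_cluster([(0, 0, 0), (0.1, 0, 0), (0.2, 0, 0), (5, 5, 5)]))
```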
```shell
python train_semantic.py -s data/lerf/waldo_kitchen \
                         -m train_semanticgs \
                         --use_seg_feature --iterations 10000 \
                         --load_filter_segmap --consider_negative_labels
```

After completing the training, we provide a GUI modified from Omniseg3D for real-time interactive segmentation.
The point_cloud.ply in our preprocessed datasets already has pretrained semantic features.
```shell
python semantic_gui.py \
  --ply_path data/lerf/waldo_kitchen/point_cloud.ply \
  --interactive_note lerf_waldo_kitchen \
  --use_colmap_camera \
  --source_path data/lerf/waldo_kitchen --resolution 1
```

GUI controls:
- `Left Mouse`: change the rendering view
- `Click Mode` + `0.9 Threshold` + `Right Mouse`: segment the clicked instance
- `Clear Edit`: clear the segmentation cache
- `Delete 3D`: remove the chosen Gaussians
- `Segment 3D`: keep only the chosen Gaussians
- `Reload Data`: reload the Gaussian model
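Conceptually, the threshold-based click selection resembles keeping the Gaussians whose learned feature is similar enough to the clicked one. A hedged sketch of that idea (the function name and cosine metric are illustrative assumptions, not the GUI's actual code):

```python
def select_by_similarity(features, query, threshold=0.9):
    """Return indices of Gaussians whose cosine similarity to the
    clicked query feature exceeds `threshold` - how a threshold-based
    click selection might work conceptually."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)
    return [i for i, f in enumerate(features) if cos(f, query) > threshold]

features = [(1.0, 0.0), (0.0, 1.0), (0.99, 0.1)]  # toy 2-D features
print(select_by_similarity(features, (1.0, 0.0)))  # [0, 2]
```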
🔥 Feel free to raise any requests, including support for additional datasets or broader applications of segmentation~
- Release project page and paper.
- Release scene decomposition code.
- Release in-situ generation code.
 
Some code is modified from Omniseg3D, MaskClustering, and 2DGS++; thanks to the authors for their valuable works.
If you find this code useful for your research, please use the following BibTeX entry.
```bibtex
@inproceedings{yang2025instascene,
    title={InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes},
    author={Yang, Zesong and Yang, Bangbang and Dong, Wenqi and Cao, Chenxuan and Cui, Liyuan and Ma, Yuewen and Cui, Zhaopeng and Bao, Hujun},
    booktitle={ICCV},
    year={2025}
}
```
