One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering

Paper: arXiv

Abstract: Vision-Language Models (VLMs) have shown significant promise in Visual Question Answering (VQA) tasks by leveraging web-scale multimodal datasets. However, these models often struggle with continual learning due to catastrophic forgetting when adapting to new tasks. As an effective remedy, rehearsal strategies reuse the data of past tasks when learning a new task, but they require storing past data, which may be infeasible due to hardware constraints or privacy concerns. In this work, we propose the first data-free method that leverages the language generation capability of a VLM, instead of relying on external models, to produce pseudo-rehearsal data for addressing continual VQA. Our proposal, named GaB, generates pseudo-rehearsal data by posing previous-task questions on new-task data. Despite being effective, the distribution of generated questions skews towards the most frequently posed questions due to the limited and task-specific training data. To mitigate this issue, we introduce a pseudo-rehearsal balancing module that aligns the generated data with the ground-truth data distribution using either question meta-statistics or an unsupervised clustering method. We evaluate our method on two recent benchmarks, i.e., the VQACL-VQAv2 and CLOVE-function benchmarks. GaB outperforms all data-free baselines by a substantial margin in maintaining VQA performance across evolving tasks, while being on par with methods that have access to past data.

Setup

# Create python environment (optional)
conda create -n vqacl python=3.11
conda activate vqacl

# Install python dependencies
pip install -r requirements.txt

Code structure

# Store images, features, and annotations
./datasets
    COCO/
        images/
        features/
    vqa/
        Partition_Q/
    gvqa/
    npy/
        function/
    ...


# Training and testing in the VQACL setting
./VL-T5/
    src/
        blip2/
            modeling_blip.py                              <= our BLIP-2 model classes
        analysis/                                         <= question generation and sampling
        vqacl.py vqa_data_blip.py vqa_model_blip.py ...   <= training/testing in the VQACL setting
        param.py                                          <= (argparse) configuration
        utils.py                                          <= utility functions
    snap/                                                 <= stores weight checkpoints
    scripts/                                              <= bash scripts for evaluation

Dataset Preparation / Model checkpoint

  • Download the VQACL partition of VQA v2 from Google Drive and put it into datasets/vqa/Partition_Q.
  • Download datasets/COCO from Google Drive
  • Download model checkpoints from Google Drive

Usage

# Training with 1 GPU for VQA v2
cd VL-T5/
bash scripts/train_blip.sh path/to/ckpt_dir balance_strategy # cluster, classifier or unbalanced
# Testing with 1 GPU for VQA v2
cd VL-T5/
bash scripts/test_blip.sh path/to/ckpt_dir

Note:

  • Download the checkpoint for the first task (recognition) from recognition_ckpt, place it in the snap folder, and start training or evaluating by passing the checkpoint's path as a command-line argument.
  • Before training the model on any subsequent task (beyond the first), you must generate QA pairs as part of the data-free rehearsal strategy.
  • While the next section details how to generate and balance data, pre-generated questions and their balanced versions are already available for download at data link. Unzip the files and place the directory inside datasets/vqa/.

Question Generation and Balancing Strategy

VQAv2

  • To train the question generator model, execute bash scripts/train_blip_qg.sh.

  • We also provide the trained question-generation heads at link. Unzip the folder and place it inside snap.

Note: It is feasible to train the question generation and answering heads simultaneously, but doing so requires reducing the batch size from 80 to 32 to prevent CUDA out-of-memory errors, which significantly slows training.
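
For intuition, here is a minimal sketch of the self-generation idea: the VLM poses a previous-task-style question on a new-task image, then answers its own question to form a pseudo-rehearsal QA pair. It uses an off-the-shelf BLIP-2 checkpoint from HuggingFace transformers; the model name, prompt format, and image path are illustrative assumptions, not the repository's fine-tuned question-generation head.

# Minimal sketch of VLM-based pseudo-QA generation (illustrative only; the repo
# uses its fine-tuned BLIP-2 question-generation head instead).
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
name = "Salesforce/blip2-flan-t5-xl"  # assumed off-the-shelf checkpoint
processor = Blip2Processor.from_pretrained(name)
model = Blip2ForConditionalGeneration.from_pretrained(name).to(device)

image = Image.open("datasets/COCO/images/example.jpg")  # hypothetical path
# Pose a previous-task-style question on new-task data.
prompt = "Generate a question about the objects in the image. Question:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
out = model.generate(**inputs, max_new_tokens=30)
question = processor.batch_decode(out, skip_special_tokens=True)[0]

# Answer the generated question with the same model to complete the QA pair.
inputs = processor(images=image, text=f"Question: {question} Answer:", return_tensors="pt").to(device)
out = model.generate(**inputs, max_new_tokens=10)
answer = processor.batch_decode(out, skip_special_tokens=True)[0]
print(question, "->", answer)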

  • Question Generation:

    • Execute: python -m src.analysis.vqacl_gen_ques
  • Storing Question Category Statistics:

    • Obtain the classifier and sentence representations here.
    • Download and place these files in the ../ckpt_vqacl directory.
    • Run: python -m src.analysis.vqacl_question_distribution
    • This creates a ../metrics folder that stores all distributions.
    • Note: Classifier training and clustering are conducted exclusively on the training-set questions.
  • Balanced Data Generation:

    • After acquiring the question category statistics, generate balanced data with: python -m src.analysis.vqacl_create_balanced_rehearsal
    • The default balancing strategy is cluster (see the sketch after this list).
  • We provide the balanced data files in the links below the Usage section.
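
To make the cluster strategy concrete, below is a minimal sketch of cluster-based balancing under assumed libraries (sentence-transformers, scikit-learn); all names and defaults here are hypothetical, and the actual logic lives in the scripts above.

# Sketch of cluster-based rehearsal balancing (assumed libraries and names;
# see the scripts above for the actual implementation).
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed sentence encoder

def balance_by_cluster(train_questions, generated_questions,
                       n_clusters=20, n_samples=5000, seed=0):
    """Resample generated questions to match the training-set cluster distribution."""
    rng = np.random.default_rng(seed)
    # Cluster the training-set questions only (as noted above).
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    km.fit(encoder.encode(train_questions))
    target = np.bincount(km.labels_, minlength=n_clusters) / len(train_questions)

    # Assign each generated question to its nearest training cluster.
    gen_labels = km.predict(encoder.encode(generated_questions))
    balanced = []
    for c in range(n_clusters):
        pool = [q for q, l in zip(generated_questions, gen_labels) if l == c]
        k = int(round(target[c] * n_samples))
        if pool and k:
            # Oversample with replacement when the pool is too small.
            balanced.extend(rng.choice(pool, size=k, replace=len(pool) < k).tolist())
    return balanced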

CLOVE

Usage

# Training with 1 GPU for CLOVE
cd VL-T5/
bash scripts/train_blip_sgvqa.sh path/to/ckpt_dir balance_strategy # cluster, classifier or unbalanced
# Testing with 1 GPU for CLOVE
cd VL-T5/
bash scripts/test_blip_sgvqa.sh path/to/ckpt_dir

  • To train the question generator model, execute bash scripts/train_blip_qg.sh.

    • We also provide the trained question-generation heads at link. Unzip the folder and place it inside snap.
  • Question Generation:

    • Execute: python -m src.analysis.gen_ques
  • Storing Question Category Statistics:

    • Obtain the classifier and sentence representations here and place the files in ../ckpt_sgvqa.
    • Run python -m src.analysis.question_distribution to store the question category statistics.
  • Balanced Data Generation:

    • To generate balanced replay data, run python -m src.analysis.create_balanced_rehearsal
    • The default strategy is cluster. For the classification strategy, change balancing_strategy to classifier (see the sketch below).
  • We provide the balanced data files here. Place the folder in ../datasets.
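
For comparison, here is an equally hypothetical sketch of the classifier strategy: instead of cluster proportions, match the ground-truth frequencies of predicted question categories (a logistic-regression classifier over sentence embeddings stands in for the provided classifier).

# Sketch of classifier-based balancing (all names and libraries are assumptions).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed sentence encoder

def balance_by_classifier(train_qs, train_cats, generated_qs, n_samples=5000, seed=0):
    """Resample generated questions to match ground-truth question-category frequencies."""
    rng = np.random.default_rng(seed)
    # Train a question-category classifier on training-set questions only.
    clf = LogisticRegression(max_iter=1000).fit(encoder.encode(train_qs), train_cats)
    cats, counts = np.unique(train_cats, return_counts=True)
    pred = clf.predict(encoder.encode(generated_qs))
    balanced = []
    for cat, count in zip(cats, counts):
        pool = [q for q, p in zip(generated_qs, pred) if p == cat]
        k = int(round(count / len(train_cats) * n_samples))
        if pool and k:
            balanced.extend(rng.choice(pool, size=k, replace=len(pool) < k).tolist())
    return balanced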

About

Official codebase for the paper "One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering", accepted at WACV 2025.
