One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering
Abstract: Vision-Language Models (VLMs) have shown significant promise in Visual Question Answering (VQA) tasks by leveraging web-scale multimodal datasets. However, these models often struggle with continual learning due to catastrophic forgetting when adapting to new tasks. As an effective remedy, the rehearsal strategy reuses data from past tasks when learning a new task. However, this requires storing past data, which may be infeasible due to hardware constraints or privacy concerns. In this work, we propose the first data-free method that leverages the language generation capability of a VLM itself, instead of relying on external models, to produce pseudo-rehearsal data for continual VQA. Our proposal, named GaB, generates pseudo-rehearsal data by posing previous-task questions on new-task data. Despite being effective, the distribution of generated questions skews towards the most frequently posed questions because the available training data is limited and task-specific. To mitigate this issue, we introduce a pseudo-rehearsal balancing module that aligns the generated data with the ground-truth data distribution using either question meta-statistics or an unsupervised clustering method. We evaluate our proposed method on two recent benchmarks, i.e., VQACL-VQAv2 and CLOVE-function. GaB outperforms all data-free baselines with substantial improvements in maintaining VQA performance across evolving tasks, while being on par with methods that have access to past data.
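At a glance, GaB first generates pseudo question-answer pairs on new-task images and then rebalances them before rehearsal. The snippet below is a minimal sketch of that data flow in plain Python; the three callables (question generator, answerer, balancer) are placeholders standing in for the repository's components, not the actual implementation.

```python
from typing import Callable, Iterable, List, Tuple

def build_pseudo_rehearsal(
    images: Iterable,                     # new-task images
    generate_question: Callable,          # placeholder: VLM question-generation head
    answer_question: Callable,            # placeholder: VLM answering head
    balance: Callable,                    # placeholder: pseudo-rehearsal balancing module
) -> List[Tuple]:
    """Sketch of the GaB data flow: generate, answer, then balance."""
    rehearsal = []
    for image in images:
        question = generate_question(image)         # pose a previous-task-style question
        answer = answer_question(image, question)   # answer it with the same VLM
        rehearsal.append((image, question, answer))
    # Align the generated pairs with the ground-truth question distribution
    # (via question meta-statistics or unsupervised clustering).
    return balance(rehearsal)
```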
```bash
# Create python environment (optional)
conda create -n vqacl python=3.11
source activate vqacl

# Install python dependencies
pip install -r requirements.txt
```
```
# Store images, features, and annotations
./datasets
    COCO/
        images/
        features/
    vqa/
        Partition_Q/
    gvqa/
        npy/
            function/
    ...

# Training and testing in the VQACL setting
./VL-T5/
    src/
        blip2/
            modeling_blip.py        <= Our BLIP-2 model classes
        analysis/                   <= question generation and sampling
        vqacl.py vqa_data_blip.py vqa_model_blip.py ...  <= Testing in the VQACL setting
        param.py                    <= (argparse) configuration
        utils.py                    <= utility functions
    snap/                           <= store weight checkpoints
    scripts/                        <= bash scripts for evaluation
```

- Download the VQACL partition of VQA v2 from Google Drive and put it into `datasets/nextqa/Partition_Q`.
- Download `datasets/COCO` from Google Drive.
- Download model checkpoints from Google Drive.
```bash
# Training with 1 GPU for VQA v2
cd VL-T5/
bash scripts/train_blip.sh path/to/ckpt_dir balance_strategy  # cluster, classifier or unbalanced

# Testing with 1 GPU for VQA v2
cd VL-T5/
bash scripts/test_blip.sh path/to/ckpt_dir
```

Note:
- Download the checkpoint for the first task (recognition) from recognition_ckpt, place it in the `snap` folder, and start training or evaluating by specifying the checkpoint's path as a command-line argument.
- Before training the model on any subsequent task (beyond the first), QA pairs must be generated as part of our data-free rehearsal strategy.
- While the next section details how to generate and balance data, pre-generated questions and their balanced versions are already available for download at data link. Unzip the files and place the directory inside `datasets/vqa/`.
- To train the question generator model, execute `bash scripts/train_blip_qg.sh`.
- We also provide the trained question generation heads in link. Unzip the folder and place it inside `snap`.
- Note: It is feasible to train the question generation and answering heads simultaneously, but this demands reducing the batch size from 80 to 32 to prevent CUDA out-of-memory errors, significantly slowing down the training process.
- Question Generation:
  - Execute the command: `python -m src.analysis.vqacl_gen_ques` (an illustrative generation sketch follows this item).
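For intuition only, the snippet below shows how one could prompt an off-the-shelf BLIP-2 checkpoint from Hugging Face to pose a question about an image. It uses the public `transformers` API with an assumed prompt and checkpoint; it is not a substitute for `src.analysis.vqacl_gen_ques`, which relies on the fine-tuned question-generation heads shipped with this repository.

```python
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Public BLIP-2 checkpoint (assumption); the repository uses its own fine-tuned heads.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xl")

# Any image works; here we fetch a standard COCO sample for illustration.
image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)

# Assumed prompt: ask the language head to generate a question about the image.
prompt = "Ask a question about this image:"
inputs = processor(images=image, text=prompt, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=30)
question = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(question)
```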
- Storing Question Category Statistics:
  - Obtain the classifier and sentence representations here.
  - Make sure to download and place these files in the `../ckpt_vqacl` directory.
  - Run: `python -m src.analysis.vqacl_question_distribution`
  - This will generate a `../metrics` folder that stores all distributions.
  - Note: Classifier training and clustering are conducted exclusively on the training-set questions (an illustrative clustering sketch follows this item).
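As a rough illustration of the clustering variant, the snippet below embeds training-set questions with a sentence encoder and derives per-cluster frequencies, which can then serve as the target distribution for balancing. The encoder name, cluster count, and file layout are assumptions for the sketch, not the repository's configuration.

```python
import json

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Assumed input: a JSON list of training-set question strings for the current task.
with open("train_questions.json") as f:
    questions = json.load(f)

# Assumed encoder; the repository ships its own sentence representations.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(questions, show_progress_bar=False)

# Cluster the question embeddings; the number of clusters is an assumption.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(embeddings)

# Per-cluster frequencies act as the question category statistics used for balancing.
counts = np.bincount(kmeans.labels_, minlength=kmeans.n_clusters)
target_distribution = counts / counts.sum()
print({f"cluster_{i}": round(float(p), 3) for i, p in enumerate(target_distribution)})
```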
- Balanced Data Generation:
  - After acquiring the question category statistics, generate balanced data using: `python -m src.analysis.vqacl_create_balanced_rehearsal`
  - The default balancing strategy is `cluster` (a simplified sketch of the balancing idea appears after this list).
- We provide the balanced data files in the links below the usage section.
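The balancing step itself can be pictured as stratified sampling: given a category (cluster) label for each generated QA pair and the target category frequencies, draw pseudo-rehearsal samples so that the rehearsal set mirrors the ground-truth distribution. The function below is a simplified, self-contained sketch of that idea, not the actual logic of `vqacl_create_balanced_rehearsal`.

```python
import random
from collections import defaultdict

def balance_rehearsal(pairs, labels, target_distribution, budget, seed=0):
    """Sample up to `budget` generated QA pairs so category frequencies match the target.

    pairs:  list of generated (image, question, answer) tuples
    labels: category/cluster id for each pair (same length as `pairs`)
    target_distribution: dict mapping category id -> target probability
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for pair, label in zip(pairs, labels):
        by_category[label].append(pair)

    balanced = []
    for category, probability in target_distribution.items():
        pool = by_category.get(category, [])
        quota = min(len(pool), round(probability * budget))
        balanced.extend(rng.sample(pool, quota))   # sample without replacement
    rng.shuffle(balanced)
    return balanced
```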
```bash
# Training with 1 GPU for SGVQA (CLOVE-function)
cd VL-T5/
bash scripts/train_blip_sgvqa.sh path/to/ckpt_dir balance_strategy  # cluster, classifier or unbalanced

# Testing with 1 GPU for SGVQA (CLOVE-function)
cd VL-T5/
bash scripts/test_blip_sgvqa.sh path/to/ckpt_dir
```
- To train the question generator model, execute `bash scripts/train_blip_qg.sh`.
- We also provide the trained question generation heads in link. Unzip the folder and place it inside `snap`.
- Question Generation:
  - Execute the command: `python -m src.analysis.gen_ques`
- Storing Question Category Statistics:
  - Obtain the classifier and sentence representations here and place the files in `../ckpt_sgvqa`.
  - Run: `python -m src.analysis.question_distribution` to store the question category statistics.
- Balanced Data Generation:
  - To generate balanced replay data, run: `python -m src.analysis.create_balanced_rehearsal`
  - The default strategy is `cluster`. For the classification strategy, change `balancing_strategy` to `classifier`.
- We provide the balanced data files here. Place the folder in `../datasets`.
