One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering
Abstract: Vision-Language Models (VLMs) have shown significant promise in Visual Question Answering (VQA) tasks by leveraging web-scale multimodal datasets. However, these models often struggle with continual learning due to catastrophic forgetting when adapting to new tasks. As an effective remedy, the rehearsal strategy reuses data from past tasks when learning a new task. However, this requires storing past data, which may be infeasible due to hardware constraints or privacy concerns. In this work, we propose the first data-free method that leverages the language generation capability of a VLM itself, instead of relying on external models, to produce pseudo-rehearsal data for continual VQA. Our proposal, named GaB, generates pseudo-rehearsal data by posing previous-task questions on new-task data. Despite being effective, the distribution of generated questions skews towards the most frequently posed questions because the available training data is limited and task-specific. To mitigate this issue, we introduce a pseudo-rehearsal balancing module that aligns the generated data with the ground-truth data distribution using either question meta-statistics or an unsupervised clustering method. We evaluate our proposed method on two recent benchmarks, i.e., VQACL-VQAv2 and CLOVE-function. GaB outperforms all data-free baselines with substantial improvements in maintaining VQA performance across evolving tasks, while being on par with methods that have access to past data.
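At a glance, GaB first generates pseudo question-answer pairs on new-task images and then rebalances them before rehearsal. The snippet below is a minimal sketch of that data flow in plain Python; the three callables (question generator, answerer, balancer) are placeholders standing in for the repository's components, not the actual implementation.

```python
from typing import Callable, Iterable, List, Tuple

def build_pseudo_rehearsal(
    images: Iterable,                     # new-task images
    generate_question: Callable,          # placeholder: VLM question-generation head
    answer_question: Callable,            # placeholder: VLM answering head
    balance: Callable,                    # placeholder: pseudo-rehearsal balancing module
) -> List[Tuple]:
    """Sketch of the GaB data flow: generate, answer, then balance."""
    rehearsal = []
    for image in images:
        question = generate_question(image)         # pose a previous-task-style question
        answer = answer_question(image, question)   # answer it with the same VLM
        rehearsal.append((image, question, answer))
    # Align the generated pairs with the ground-truth question distribution
    # (via question meta-statistics or unsupervised clustering).
    return balance(rehearsal)
```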
```bash
# Create python environment (optional)
conda create -n vqacl python=3.11
source activate vqacl

# Install python dependencies
pip install -r requirements.txt
```
```
# Store images, features, and annotations
./datasets
    COCO/
        images/
        features/
    vqa/
        Partition_Q/
    gvqa/
        npy/
            function/
    ...

# Training and testing in the VQACL setting
./VL-T5/
    src/
        blip2/
            modeling_blip.py        <= Our BLIP-2 model classes
        analysis/                   <= question generation and sampling
        vqacl.py vqa_data_blip.py vqa_model_blip.py ...  <= Testing in the VQACL setting
        param.py                    <= (argparse) configuration
        utils.py                    <= utility functions
    snap/                           <= store weight checkpoints
    scripts/                        <= bash scripts for evaluation
```

- Download the VQACL partition of VQA v2 from Google Drive and put it into `datasets/nextqa/Partition_Q`.
- Download `datasets/COCO` from Google Drive.
- Download model checkpoints from Google Drive.
```bash
# Training with 1 GPU for VQA v2
cd VL-T5/
bash scripts/train_blip.sh path/to/ckpt_dir balance_strategy  # cluster, classifier or unbalanced

# Testing with 1 GPU for VQA v2
cd VL-T5/
bash scripts/test_blip.sh path/to/ckpt_dir
```

Note:
- Download the checkpoint for the first task (recognition) from recognition_ckpt, place it in the `snap` folder, and start training or evaluating by specifying the checkpoint's path as a command-line argument.
- Before training the model on any subsequent task (beyond the first), QA pairs must be generated as part of our data-free rehearsal strategy.
- While the next section details how to generate and balance data, pre-generated questions and their balanced versions are already available for download at data link. Unzip the files and place the directory inside `datasets/vqa/`.
- To train the question generator model, execute `bash scripts/train_blip_qg.sh`.
- We also provide the trained question generation heads in link. Unzip the folder and place it inside `snap`.
- Note: It is feasible to train the question generation and answering heads simultaneously, but this demands reducing the batch size from 80 to 32 to prevent CUDA out-of-memory errors, significantly slowing down the training process.
- Question Generation:
  - Execute the command: `python -m src.analysis.vqacl_gen_ques` (an illustrative generation sketch follows this item).
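For intuition only, the snippet below shows how one could prompt an off-the-shelf BLIP-2 checkpoint from Hugging Face to pose a question about an image. It uses the public `transformers` API with an assumed prompt and checkpoint; it is not a substitute for `src.analysis.vqacl_gen_ques`, which relies on the fine-tuned question-generation heads shipped with this repository.

```python
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Public BLIP-2 checkpoint (assumption); the repository uses its own fine-tuned heads.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xl")

# Any image works; here we fetch a standard COCO sample for illustration.
image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)

# Assumed prompt: ask the language head to generate a question about the image.
prompt = "Ask a question about this image:"
inputs = processor(images=image, text=prompt, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=30)
question = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(question)
```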
- Storing Question Category Statistics:
  - Obtain the classifier and sentence representations here.
  - Make sure to download and place these files in the `../ckpt_vqacl` directory.
  - Run: `python -m src.analysis.vqacl_question_distribution`
  - This will generate a `../metrics` folder that stores all distributions.
  - Note: Classifier training and clustering are conducted exclusively on the training-set questions (an illustrative clustering sketch follows this item).
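As a rough illustration of the clustering variant, the snippet below embeds training-set questions with a sentence encoder and derives per-cluster frequencies, which can then serve as the target distribution for balancing. The encoder name, cluster count, and file layout are assumptions for the sketch, not the repository's configuration.

```python
import json

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Assumed input: a JSON list of training-set question strings for the current task.
with open("train_questions.json") as f:
    questions = json.load(f)

# Assumed encoder; the repository ships its own sentence representations.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(questions, show_progress_bar=False)

# Cluster the question embeddings; the number of clusters is an assumption.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(embeddings)

# Per-cluster frequencies act as the question category statistics used for balancing.
counts = np.bincount(kmeans.labels_, minlength=kmeans.n_clusters)
target_distribution = counts / counts.sum()
print({f"cluster_{i}": round(float(p), 3) for i, p in enumerate(target_distribution)})
```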
- Balanced Data Generation:
  - After acquiring the question category statistics, generate balanced data using: `python -m src.analysis.vqacl_create_balanced_rehearsal`
  - The default balancing strategy is `cluster` (a simplified sketch of the balancing idea appears after this list).
- We provide the balanced data files in the links below the usage section.
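The balancing step itself can be pictured as stratified sampling: given a category (cluster) label for each generated QA pair and the target category frequencies, draw pseudo-rehearsal samples so that the rehearsal set mirrors the ground-truth distribution. The function below is a simplified, self-contained sketch of that idea, not the actual logic of `vqacl_create_balanced_rehearsal`.

```python
import random
from collections import defaultdict

def balance_rehearsal(pairs, labels, target_distribution, budget, seed=0):
    """Sample up to `budget` generated QA pairs so category frequencies match the target.

    pairs:  list of generated (image, question, answer) tuples
    labels: category/cluster id for each pair (same length as `pairs`)
    target_distribution: dict mapping category id -> target probability
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for pair, label in zip(pairs, labels):
        by_category[label].append(pair)

    balanced = []
    for category, probability in target_distribution.items():
        pool = by_category.get(category, [])
        quota = min(len(pool), round(probability * budget))
        balanced.extend(rng.sample(pool, quota))   # sample without replacement
    rng.shuffle(balanced)
    return balanced
```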
```bash
# Training with 1 GPU for SGVQA (CLOVE-function)
cd VL-T5/
bash scripts/train_blip_sgvqa.sh path/to/ckpt_dir balance_strategy  # cluster, classifier or unbalanced

# Testing with 1 GPU for SGVQA (CLOVE-function)
cd VL-T5/
bash scripts/test_blip_sgvqa.sh path/to/ckpt_dir
```
- To train the question generator model, execute `bash scripts/train_blip_qg.sh`.
- We also provide the trained question generation heads in link. Unzip the folder and place it inside `snap`.
- Question Generation:
  - Execute the command: `python -m src.analysis.gen_ques`
- Storing Question Category Statistics:
  - Obtain the classifier and sentence representations here and place the files in `../ckpt_sgvqa`.
  - Run: `python -m src.analysis.question_distribution` to store the question category statistics.
- Balanced Data Generation:
  - To generate balanced replay data, run: `python -m src.analysis.create_balanced_rehearsal`
  - The default strategy is `cluster`. For the classification strategy, change `balancing_strategy` to `classifier`.
- We provide the balanced data files here. Place the folder in `../datasets`.
