Implementation of our unlearning method "Partial Model Collapse" introduced in the paper: "Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs" (Preprint).

Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs

Reference implementation of our partial model collapse unlearning method proposed in the preprint:

Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs
Yan Scholten, Sophie Xhonneux, Leo Schwinn*, Stephan Günnemann*


TL;DR: Partial Model Collapse (PMC) enables effective LLM unlearning. By inducing model collapse for specific questions, PMC selectively erases information in a targeted way while preserving overall model utility.

[ Project page | PDF | Blogpost ]

Overview

Existing unlearning methods for large language models (LLMs) incorporate the private information they aim to remove into their unlearning objectives. We contend that this not only risks further exposure of sensitive data but also fundamentally contradicts the principle of minimizing its use.

We introduce a novel perspective inspired by recent findings that training generative models on their own outputs can induce distribution collapse, effectively erasing information from the model. Our central insight is that we can leverage model collapse for machine unlearning: Rather than optimizing the model against answers we aim to unlearn, we finetune it on answers generated by the model itself. Since these answers are already likely under the model’s own distribution, this approach allows the model to diverge naturally from its original generations, facilitating targeted unlearning without compromising model utility.
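To build intuition for this collapse dynamic, the following toy sketch (an illustration of the general phenomenon under simplified assumptions, not the repository's implementation) repeatedly refits a small categorical "model" to samples drawn from itself; low-probability answers tend to die out across generations:

```python
import random

def refit(dist, num_samples=200, rng=random):
    """Draw samples from the current 'model' and refit it to its own output."""
    answers = list(dist)
    weights = [dist[a] for a in answers]
    samples = rng.choices(answers, weights=weights, k=num_samples)
    return {a: samples.count(a) / num_samples for a in answers}

# A distribution over answers to one question; "secret" is the low-probability
# information we would like the model to forget.
dist = {"secret": 0.05, "other": 0.95}
rng = random.Random(0)
for _ in range(20):
    dist = refit(dist, rng=rng)
# Over repeated self-training generations, probability mass tends to
# concentrate on the dominant answer and "secret" tends to die out.
```

In the same spirit, PMC finetunes the LLM only on its own generations for the forget questions, so the private ground-truth answer never enters the objective.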

This repository provides code to perform PMC-unlearning for LLMs as described in our recent preprint.

Disclaimer: This repository is part of ongoing research; the code, hyperparameters, and empirical results are preliminary and subject to revision. We will provide additional supplemental material for reproducing the results from our preprint at a later time. Feedback is greatly appreciated.

Empirical results

The following table presents preliminary empirical results for the models obtained using the configurations available in this repository. Currently supported models are Phi-1.5 and Llama-3.2-*-Instruct.

| Models | Method | Unlearn quality ($\uparrow$) | Utility ($\uparrow$) | Runtime (H100) |
| --- | --- | --- | --- | --- |
| Phi-1.5 | Vanilla model | 58.23% | 64.0% | |
| Phi-1.5 | Finetuned model | 38.3% $\pm$ 0.12 | 70.0% $\pm$ 0.98 | |
| Phi-1.5 | PMC-unlearning | 95.6% $\pm$ 0.57 | 69.0% $\pm$ 1.34 | 40 min $\pm$ 2 |
| Llama-3.2-3B-Instruct | Vanilla model | 74.68% | 71.0% | |
| Llama-3.2-3B-Instruct | Finetuned model | 35.42% $\pm$ 0.23 | 91% $\pm$ 0.73 | |
| Llama-3.2-3B-Instruct | PMC-unlearning | 99.15% $\pm$ 0.3 | 84.0% $\pm$ 2.72 | 30 min $\pm$ 1 |

You can find more results in our preprint.

Hyperparameters

The following hyperparameters are central for optimizing the trade-off between unlearning quality, model utility, and computational efficiency:

  • num_epochs: Number of unlearning epochs.
  • num_samples: Number of candidate responses sampled for each forget question.
  • lambda_unlearning: Trade-off parameter balancing the retain loss and the collapse loss (the loss on the sampled synthetic responses).
  • min_len: Minimum response length; synthetic responses shorter than this are penalized in the reward function.
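As a minimal sketch of how two of these parameters could interact (function and argument names here are illustrative, not the repository's actual API), lambda_unlearning weights the collapse term against the retain term, while min_len penalizes too-short synthetic responses:

```python
def pmc_objective(retain_loss, collapse_loss, lambda_unlearning):
    """Trade off retaining utility against collapsing onto sampled responses."""
    return retain_loss + lambda_unlearning * collapse_loss

def length_reward_penalty(response_tokens, min_len, penalty=1.0):
    """Penalize synthetic responses shorter than min_len, as described above."""
    return -penalty if len(response_tokens) < min_len else 0.0
```

For example, with lambda_unlearning = 0.5, a retain loss of 1.0 and a collapse loss of 2.0 combine to an objective of 2.0; raising lambda_unlearning shifts the optimization toward stronger unlearning at some cost in utility.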

Usage instructions

First, finetune the models on the ground-truth data; then run PMC-unlearning on the resulting checkpoints.

1. Finetuning on full dataset

cd finetuning
python3 main.py -m -cd=configs -cn=phi
python3 main.py -m -cd=configs -cn=llama3

This will finetune vanilla models on the full dataset and store resulting models in models/finetuned/.

2.1 PMC-unlearning

cd unlearning
python3 main.py -m -cd=configs -cn=PMC-unlearn-phi
python3 main.py -m -cd=configs -cn=PMC-unlearn-llama3

This will apply PMC-unlearning to finetuned models and store resulting models in models/unlearned/.

Installation

Install the dependencies and configure the environment before running the code:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Additionally, set HUGGINGFACE_LOGIN_TOKEN in each environment.env file.

This code was tested with Python 3.11.9, pip 24.0, PyTorch 2.3.1+cu118, and CUDA 11.8 on an NVIDIA H100 GPU.

Cite

Please cite our paper if you use this code in your own work:

@misc{scholten2025modelcollapse,
      title={Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs}, 
      author={Yan Scholten and Sophie Xhonneux and Leo Schwinn and Stephan Günnemann},
      year={2025},
      eprint={2507.04219},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.04219}, 
}

Acknowledgements

This codebase builds upon the TOFU unlearning repository, adapted to demonstrate the effectiveness of our approach. The core principles proposed in our paper are implemented in unlearning/unlearning_trainer.py and unlearning/pmc.py. Note that we consider unlearning as the problem of removing private information from model outputs and therefore follow a different evaluation approach. We believe our evaluation represents an important first step toward evaluating collapse-based machine unlearning and invite the community to assess our approach from additional perspectives.

Contact

For questions and feedback please contact:

Yan Scholten, Technical University of Munich
Sophie Xhonneux, Mila, Université de Montréal
Leo Schwinn, Technical University of Munich
Stephan Günnemann, Technical University of Munich

License

The code by Yan Scholten, Sophie Xhonneux, Leo Schwinn, and Stephan Günnemann is licensed under the MIT license.
