| | --- |
| | license: mit |
| | datasets: |
| | - cais/wmdp |
| | language: |
| | - en |
| | base_model: |
| | - HuggingFaceH4/zephyr-7b-beta |
| | pipeline_tag: text-generation |
| | library_name: transformers |
| | tags: |
| | - unlearn |
| | - machine-unlearning |
| | - llm-unlearning |
| | - data-privacy |
| | - large-language-models |
| | - trustworthy-ai |
| | - trustworthy-machine-learning |
| | - language-model |
| | --- |
| | |
| | # GradDiff-Unlearned w/ SAM Model on Task "WMDP" |
| |
|
| | ## Model Details |
| |
|
| | - **Unlearning**: |
| | - **Task**: [🤗datasets/cais/wmdp wmdp-bio](https://huggingface.co/datasets/cais/wmdp) |
| | - **Method**: GradDiff |
| | - **Smoothness Optimization**: Sharpness-aware Minimization (SAM) |
| | - **Origin Model**: [🤗HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) |
| | - **Code Base**: [github.com/OPTML-Group/Unlearn-Smooth](https://github.com/OPTML-Group/Unlearn-Smooth) |
| | - **Research Paper**: ["Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond"](https://arxiv.org/abs/2502.05374) |
| |
|
| | ## Loading the Model |
| |
|
| | ```python |
| | import torch |
| | from transformers import AutoModelForCausalLM |
| | |
| | model = AutoModelForCausalLM.from_pretrained("OPTML-Group/GradDiff-SAM-WMDP", torch_dtype=torch.bfloat16, trust_remote_code=True) |
| | ``` |
| |
|
| | ## Citation |
| |
|
| | If you use this model in your research, please cite: |
| | ``` |
| | @article{fan2025towards, |
| | title={Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond}, |
| | author={Fan, Chongyu and Jia, Jinghan and Zhang, Yihua and Ramakrishna, Anil and Hong, Mingyi and Liu, Sijia}, |
| | journal={arXiv preprint arXiv:2502.05374}, |
| | year={2025} |
| | } |
| | ``` |
| |
|
| | ## Reporting Issues |
| |
|
| | Reporting issues with the model: [github.com/OPTML-Group/Unlearn-Smooth](https://github.com/OPTML-Group/Unlearn-Smooth) |