AAVGen: Precision Engineering of Adeno-associated Viral Capsids for Renal Selective Targeting
Abstract
AAVGen is a generative AI framework that designs AAV capsids with improved traits through protein language models, supervised fine-tuning, and reinforcement learning techniques.
Adeno-associated viruses (AAVs) are promising vectors for gene therapy, but their native serotypes face limitations in tissue tropism, immune evasion, and production efficiency. Engineering capsids to overcome these hurdles is challenging due to the vast sequence space and the difficulty of simultaneously optimizing multiple functional properties. The complexity also adds when it comes to the kidney, which presents unique anatomical barriers and cellular targets that require precise and efficient vector engineering. Here, we present AAVGen, a generative artificial intelligence framework for de novo design of AAV capsids with enhanced multi-trait profiles. AAVGen integrates a protein language model (PLM) with supervised fine-tuning (SFT) and a reinforcement learning technique termed Group Sequence Policy Optimization (GSPO). The model is guided by a composite reward signal derived from three ESM-2-based regression predictors, each trained to predict a key property: production fitness, kidney tropism, and thermostability. Our results demonstrate that AAVGen produces a diverse library of novel VP1 protein sequences. In silico validations revealed that the majority of the generated variants have superior performance across all three employed indices, indicating successful multi-objective optimization. Furthermore, structural analysis via AlphaFold3 confirms that the generated sequences preserve the canonical capsid folding despite sequence diversification. AAVGen establishes a foundation for data-driven viral vector engineering, accelerating the development of next-generation AAV vectors with tailored functional characteristics.
Community
AAVGen is a generative AI framework for designing adeno-associated virus (AAV) capsids with enhanced multi-trait profiles. By integrating a protein language model (ProtGPT2) with supervised fine-tuning and group sequence policy optimization (GSPO), the authors successfully generate diverse VP1 variants that simultaneously optimize for production fitness, kidney tropism, and thermostability.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Exploring the limits of pre-trained embeddings in machine-guided protein design: a case study on predicting AAV vector viability (2026)
- RIDER: 3D RNA Inverse Design with Reinforcement Learning-Guided Diffusion (2026)
- Structure-based RNA Design by Step-wise Optimization of Latent Diffusion Model (2026)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
Models citing this paper 4
Datasets citing this paper 1
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper