---
license: mit
extra_gated_fields:
  Name: text
  Company: text
  Country: country
  Specific date: date_picker
  I want to use this model for:
    type: select
    options:
      - Research
      - Education
      - label: Other
        value: other
  I agree to include the authors of the code (Tianlai Chen and Pranam Chatterjee) as authors on manuscripts with data from designed peptides: checkbox
  I agree to share generated sequences and associated data with authors before publishing: checkbox
  I agree not to file patents on any sequences generated by this model: checkbox
  I agree to use this model for non-commercial use ONLY: checkbox
---
**PepMLM: Target Sequence-Conditioned Generation of Peptide Binders via Masked Language Modeling**
In this work, we introduce **PepMLM**, a purely target sequence-conditioned *de novo* generator of linear peptide binders. By employing a novel masking strategy that uniquely positions cognate peptide sequences at the terminus of target protein sequences, PepMLM tasks the state-of-the-art ESM-2 pLM to fully reconstruct the binder region, achieving low perplexities matching or improving upon previously validated peptide-protein sequence pairs. After successful *in silico* benchmarking with AlphaFold-Multimer, we experimentally verify PepMLM's efficacy via fusion of model-derived peptides to E3 ubiquitin ligase domains, demonstrating endogenous degradation of target substrates in cellular models. In total, PepMLM enables the generative design of candidate binders to any target protein, without the requirement of target structure, empowering downstream programmable proteome editing applications.

- Demo: [HuggingFace Space](https://huggingface.co/spaces/TianlaiChen/PepMLM)
- Colab Notebook: [Link](https://colab.research.google.com/drive/1u0i-LBog_lvQ5YRKs7QLKh_RtI-tV8qM?usp=sharing)
- Preprint: [Link](https://arxiv.org/abs/2310.03842)
- Nature Biotechnology: [Link](https://www.nature.com/articles/s41587-025-02761-2)

```python
# Load the PepMLM model and tokenizer directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("TianlaiChen/PepMLM-650M")
model = AutoModelForMaskedLM.from_pretrained("TianlaiChen/PepMLM-650M")
```
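
The masking strategy described above can be sketched in code: append mask tokens at the terminus of the target protein sequence and have the MLM reconstruct the binder region. This is an illustrative simplification, not the official PepMLM pipeline — the helper names `build_masked_input` and `generate_binder` are our own, and a single greedy fill is used here whereas the paper samples multiple candidates and ranks them by perplexity.

```python
def build_masked_input(target_seq: str, binder_length: int,
                       mask_token: str = "<mask>") -> str:
    """Position `binder_length` mask tokens at the terminus of the target
    sequence, forming the input the MLM is asked to reconstruct."""
    return target_seq + mask_token * binder_length


def generate_binder(target_seq: str, binder_length: int = 15) -> str:
    """Greedy one-shot fill of the masked binder region (a simplification
    of the paper's sample-and-rank-by-perplexity procedure)."""
    # Heavy dependencies are imported lazily so the helper above stays
    # usable without torch/transformers installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("TianlaiChen/PepMLM-650M")
    model = AutoModelForMaskedLM.from_pretrained("TianlaiChen/PepMLM-650M")

    masked = build_masked_input(target_seq, binder_length, tokenizer.mask_token)
    inputs = tokenizer(masked, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Take the highest-probability residue at each masked position.
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    binder_ids = logits[0, mask_pos].argmax(dim=-1)
    return tokenizer.decode(binder_ids).replace(" ", "")
```

For example, `generate_binder(target_seq, binder_length=12)` returns a 12-residue candidate binder for the given target; in practice one would generate many candidates and filter by perplexity and structural benchmarking, as in the paper.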