---
extra_gated_fields:
  Name: text
  Company: text
  Country: country
  Specific date: date_picker
  I want to use this model for:
    type: select
    options:
      - Research
      - Education
      - label: Other
        value: other
extra_gated_prompt: "MOG-DFM License: https://drive.google.com/file/d/1LJuGrsRZMoqsrZa1gSfsCihiih5MPVRA/view?usp=sharing"
extra_gated_heading: Acknowledge license to access the repository
extra_gated_button_content: Acknowledge license
---
# Multi-Objective-Guided Discrete Flow Matching for Controllable Biological Sequence Design
|
arXiv Paper: <https://arxiv.org/abs/2505.07086>
|
Designing biological sequences that satisfy multiple, often conflicting, functional and biophysical criteria remains a central challenge in biomolecule engineering. While discrete flow matching models have recently shown promise for efficient sampling in high-dimensional sequence spaces, existing approaches address only single objectives or require continuous embeddings that can distort discrete distributions. We present Multi-Objective-Guided Discrete Flow Matching (MOG-DFM), a general framework to steer any pretrained discrete-time flow matching generator toward Pareto-efficient trade-offs across multiple scalar objectives. At each sampling step, MOG-DFM computes a hybrid rank-directional score for candidate transitions and applies an adaptive hypercone filter to enforce consistent multi-objective progression. We also trained two unconditional discrete flow matching models to serve as base generators for MOG-DFM: PepDFM for diverse peptide generation and EnhancerDFM for functional enhancer DNA generation. We demonstrate MOG-DFM's effectiveness in generating peptide binders optimized across five properties (hemolysis, non-fouling, solubility, half-life, and binding affinity), and in designing DNA sequences with specific enhancer classes and DNA shapes. Overall, MOG-DFM proves to be a powerful tool for multi-property-guided biomolecule sequence design.
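The two per-step operations named above, a hybrid rank-directional score and an adaptive hypercone filter, can be illustrated with a small pure-Python sketch. This README does not give the exact formulas, so every choice below is an assumption for illustration only: each candidate transition is summarized by a hypothetical vector `deltas[i]` of per-objective score improvements; the directional part is the cosine similarity between that vector and the importance-weight vector; the rank part is the weight-averaged normalized rank of the candidate on each objective; `alpha` is a hypothetical mixing coefficient; and the cone is "adaptive" only in the simple sense that it widens until at least one candidate passes. See the paper and the generation scripts for the actual method.

```python
import math
from typing import List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity clamped to [-1, 1]; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 0.0
    return max(-1.0, min(1.0, dot / norm))


def hypercone_filter(deltas: List[List[float]], weights: Sequence[float], angle_deg: float) -> List[int]:
    """Indices of candidates whose improvement vector lies within angle_deg of the weight direction."""
    cos_limit = math.cos(math.radians(angle_deg))
    return [i for i, d in enumerate(deltas) if cosine(d, weights) >= cos_limit]


def rank_directional_scores(deltas: List[List[float]], weights: Sequence[float],
                            alpha: float = 0.5) -> List[float]:
    """Illustrative hybrid score: alpha * weighted normalized rank + (1 - alpha) * directional cosine."""
    n, total_w = len(deltas), sum(weights)
    rank_part = [0.0] * n
    for j, w_j in enumerate(weights):
        # Rank candidates on objective j from worst (0) to best (n - 1), scaled to [0, 1].
        order = sorted(range(n), key=lambda i: deltas[i][j])
        for r, i in enumerate(order):
            rank_part[i] += (w_j / total_w) * (r / (n - 1) if n > 1 else 1.0)
    return [alpha * rank_part[i] + (1.0 - alpha) * cosine(deltas[i], weights) for i in range(n)]


def select_transition(deltas: List[List[float]], weights: Sequence[float],
                      angle_deg: float = 45.0, widen_by: float = 15.0) -> Optional[int]:
    """Widen the cone until some candidate passes, then return the best-scoring survivor."""
    if not deltas:
        return None
    angle = angle_deg
    kept = hypercone_filter(deltas, weights, angle)
    while not kept and angle < 180.0:
        # At 180 degrees every candidate passes, so this loop always terminates with survivors.
        angle = min(angle + widen_by, 180.0)
        kept = hypercone_filter(deltas, weights, angle)
    scores = rank_directional_scores(deltas, weights)
    return max(kept, key=lambda i: scores[i])


if __name__ == "__main__":
    deltas = [[0.2, 0.1], [0.3, -0.4], [-0.1, -0.2]]  # toy per-objective improvements
    weights = [1.0, 1.0]
    print(hypercone_filter(deltas, weights, 45.0), select_transition(deltas, weights))
```

In the toy example, only the first candidate improves both equally weighted objectives, so it is the only one inside the 45° cone and is the one selected; the second candidate has the largest gain on the first objective but moves away from the weight direction overall.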
|
## Usage
|
### 0. Conda Environment
|
```
conda create -n mog-dfm python=3.9
conda activate mog-dfm
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
pip install fair-esm transformers xgboost datasets torchdiffeq
```
|
To use Deep DNAshape, create a separate conda environment named `deepDNAshape` by following the [installation instructions in its repository](https://github.com/JinsenLi/deepDNAshape?tab=readme-ov-file#installation).
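After creating the `mog-dfm` environment, a quick way to confirm that the packages installed above are importable is to query `importlib.util.find_spec` for each one. This is an illustrative helper, not part of the repository; note that the `fair-esm` package is imported under the name `esm`.

```python
import importlib.util

# Import names for the packages installed above (`fair-esm` is imported as `esm`).
REQUIRED_MODULES = ["torch", "torchvision", "torchaudio", "esm",
                    "transformers", "xgboost", "datasets", "torchdiffeq"]


def check_environment(modules=REQUIRED_MODULES):
    """Map each module name to whether it can be found in the active environment (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    status = check_environment()
    missing = [name for name, ok in status.items() if not ok]
    print("missing:", missing if missing else "none")
    if status["torch"]:
        import torch
        # The conda install above targets CUDA 12.4; this reports whether a GPU is usable.
        print("CUDA available:", torch.cuda.is_available())
```

Run it with `python` from inside the activated `mog-dfm` environment; the Deep DNAshape environment is separate and is not covered by this check.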
|
|
### 1. PepDFM and EnhancerDFM Training and Evaluation

The pretrained weights for PepDFM and EnhancerDFM are available in the `ckpt` directory.
|
The training data for PepDFM and EnhancerDFM are available in the `dataset` directory.
|
We also provide the complete training and evaluation code for both models.
|
### 2. Multi-Objective Guided Generation
|
#### 2.0 Score Models
|
The pretrained weights for the score models (hemolysis, non-fouling, solubility, half-life, binding affinity, and enhancer class) are available in the `classifier_ckpt` directory.
|
Prediction scripts for each score model are provided in the `classifier_code` directory.
|
#### 2.1 Peptide Generation Task
|
Example command for peptide generation guided by multiple objectives (hemolysis, non-fouling, solubility, half-life, and binding affinity):
```
python PepDFM_multi_objective_generation.py --is_peptide True --T 100 --n_samples 5 --n_batches 10 --length 10 \
    --target_protein GSHMIEPNVISVRLFKRKVGGLGFLVKERVSKPPVIISDLIRGGAAEQSGLIQAGDIILAVNDRPLVDLSYDSALEVLRGIASETHVVLILRGPEGFTTHLETTFTGDGTPKTIRVTQPLGPPTKAV
```
|
Note that the hemolysis model outputs one minus the actual hemolysis score (so a higher output means less hemolysis), and the half-life model outputs the base-10 logarithm of the half-life in hours.
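When reading these two scores or choosing thresholds for them, the conversions follow directly from the note above. The helper names below are hypothetical and are not part of the repository's `classifier_code`.

```python
import math


def hemolysis_from_model_output(output: float) -> float:
    """Hypothetical helper: the hemolysis model reports 1 - hemolysis, so invert it."""
    return 1.0 - output


def half_life_hours_from_model_output(output: float) -> float:
    """Hypothetical helper: the half-life model reports log10(hours), so raise 10 to that power."""
    return 10.0 ** output


def half_life_hours_to_model_output(hours: float) -> float:
    """Hypothetical helper: map a half-life in hours onto the model's log10 scale (e.g. for thresholds)."""
    if hours <= 0:
        raise ValueError("half-life must be positive")
    return math.log10(hours)
```

For example, a hemolysis-model output of 0.9 corresponds to an actual hemolysis score of 0.1, a half-life-model output of 1.0 corresponds to 10 hours, and a 24-hour target corresponds to an output of about 1.38.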
|
The guidance settings and their importance weights are defined in `PepDFM_multi_objective_generation.py` and can be modified there.
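If you edit the importance weights, keeping them in a single mapping and rescaling them to sum to one makes only their relative sizes matter. The sketch below is a hypothetical illustration: the dictionary layout, key names, and values are not taken from `PepDFM_multi_objective_generation.py`.

```python
from typing import Dict, List, Tuple

# Hypothetical weights for illustration only; they are NOT the values used in the script.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "hemolysis": 1.0,
    "non_fouling": 1.0,
    "solubility": 1.0,
    "half_life": 1.0,
    "binding_affinity": 2.0,
}


def normalize_weights(weights: Dict[str, float]) -> Tuple[List[str], List[float]]:
    """Return objective names (insertion order) and their weights rescaled to sum to one."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("importance weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one importance weight must be positive")
    names = list(weights)
    return names, [weights[name] / total for name in names]


if __name__ == "__main__":
    names, vector = normalize_weights(EXAMPLE_WEIGHTS)
    for name, value in zip(names, vector):
        print(f"{name}: {value:.3f}")
```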
|
#### 2.2 Enhancer DNA Generation Task
|
Example command for enhancer DNA generation guided by the enhancer class and DNA shape:
```
python EnhancerDFM_multi_objective_generation.py --is_peptide False --T 800 --n_samples 5 --n_batches 10 --length 100 \
    --target_enhancer_class 0 --target_DNA_shape HelT
```
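Both example commands pass `--is_peptide` as the literal strings `True` and `False`. If you write your own wrapper or modify the scripts, note that `argparse` with `type=bool` converts any non-empty string, including `"False"`, to `True`. The sketch below shows one common workaround, a string-to-boolean converter; it is a hypothetical parser, not the repository's actual argument handling, and its defaults are placeholders.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'True'/'False'-style strings; unlike type=bool, 'False' really becomes False."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a few flags from the example commands."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--is_peptide", type=str2bool, default=True)
    parser.add_argument("--T", type=int, default=100)
    parser.add_argument("--n_samples", type=int, default=5)
    parser.add_argument("--n_batches", type=int, default=10)
    parser.add_argument("--length", type=int, default=10)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--is_peptide", "False", "--T", "800", "--length", "100"])
    print(args.is_peptide, args.T, args.length)
```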
|
The guidance settings and their importance weights are defined in `EnhancerDFM_multi_objective_generation.py` and can be modified there.
|
To use this repository, you agree to abide by the [MOG-DFM License](https://drive.google.com/file/d/1LJuGrsRZMoqsrZa1gSfsCihiih5MPVRA/view?usp=sharing).