Model - GvEM (Genomic Variant Embedding Model)

GvEM is a PyTorch-based deep learning model designed to embed and model genomic mutation data from VCF (Variant Call Format) files using a biologically informed hierarchy: Pathway → Chromosome → Gene → Mutations.

Hierarchy of input data

example_data = {
    'sample1': {
        'pathway1': {
            'chr1': {
                'gene1': [
                    {'impact': 'HIGH', 'reference': 'A', 'alternate': 'T'}
                ]
            }
        }
    }
}
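As a rough illustration, the nested structure above could be built from annotated VCF lines along the following lines. This is a minimal sketch, not GvEM's actual parser: it assumes SnpEff-style `ANN=` entries in the INFO column (allele|effect|impact|gene|...), and the function name `parse_annotated_vcf`, the `GENE_TO_PATHWAY` lookup, and the sample records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical gene-to-pathway lookup; a real one would come from a pathway database.
GENE_TO_PATHWAY = {"gene1": "pathway1", "gene2": "pathway2"}


def parse_annotated_vcf(lines, gene_to_pathway, sample="sample1"):
    """Build {sample: {pathway: {chrom: {gene: [mutation, ...]}}}} from VCF text lines.

    Assumes a SnpEff-style ANN= entry in INFO; only the first annotation is used,
    and multi-allelic ALT values are kept as-is (a simplification).
    """
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        # Standard VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
        chrom, ref, alt, info = fields[0], fields[3], fields[4], fields[7]
        ann = next((kv[4:] for kv in info.split(";") if kv.startswith("ANN=")), None)
        if ann is None:
            continue  # unannotated variants cannot be placed in the hierarchy
        parts = ann.split(",")[0].split("|")
        if len(parts) < 4:
            continue  # malformed annotation
        impact, gene = parts[2], parts[3]
        pathway = gene_to_pathway.get(gene, "unknown")
        tree[pathway][chrom][gene].append(
            {"impact": impact, "reference": ref, "alternate": alt}
        )
    # Convert nested defaultdicts to plain dicts
    return {
        sample: {
            p: {c: dict(genes) for c, genes in chroms.items()}
            for p, chroms in tree.items()
        }
    }


# Made-up records for illustration only (not real variant calls).
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tT\t.\tPASS\tANN=T|stop_gained|HIGH|gene1",
    "chr2\t200\t.\tG\tC\t.\tPASS\tANN=C|missense_variant|MODERATE|gene2",
    "chr3\t300\t.\tC\tG\t.\tPASS\tDP=10",  # no annotation, so it is skipped
]
example_data = parse_annotated_vcf(vcf_lines, GENE_TO_PATHWAY)
```

Genes missing from the lookup are grouped under an `unknown` pathway here; whether to drop or keep such variants is a design choice this sketch does not settle.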

Features

VCF Parser: Converts standard VCF files into a hierarchical JSON-like structure.
MutationEmbedder: Learns embeddings for categorical mutation features (scalable).
GeneEncoder: Processes lists of mutations using a Transformer and hierarchical attention to get gene-level representations.
ChromosomeEncoder: Aggregates gene encodings.
PathwayEncoder: Aggregates chromosome encodings to yield final sample representation.
Scalable: Easily extensible to new fields or biological groupings.
HuggingFace Compatible: Designed for sharing and experimentation on the 🤗 Hub.
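The component list above can be pictured with a small PyTorch sketch. It is an illustration under stated assumptions, not the released GvEM code: the class internals, the vocabularies (`IMPACTS`, `BASES`), the attention-pooling layer, the embedding size, and the final mean over pathways are all assumptions chosen to keep the example short.

```python
import torch
import torch.nn as nn

# Assumed vocabularies; index 0 is reserved for unknown values.
IMPACTS = ["<unk>", "HIGH", "MODERATE", "LOW", "MODIFIER"]
BASES = ["<unk>", "A", "C", "G", "T"]


def _index(vocab, value):
    return vocab.index(value) if value in vocab else 0


class MutationEmbedder(nn.Module):
    """Sums one learned embedding per categorical mutation field."""

    def __init__(self, dim):
        super().__init__()
        self.impact = nn.Embedding(len(IMPACTS), dim)
        self.ref = nn.Embedding(len(BASES), dim)
        self.alt = nn.Embedding(len(BASES), dim)

    def forward(self, mutations):
        i = torch.tensor([_index(IMPACTS, m["impact"]) for m in mutations])
        r = torch.tensor([_index(BASES, m["reference"]) for m in mutations])
        a = torch.tensor([_index(BASES, m["alternate"]) for m in mutations])
        return self.impact(i) + self.ref(r) + self.alt(a)  # (n_mutations, dim)


class AttentionPool(nn.Module):
    """Softmax-weighted sum over a set of vectors of shape (n, dim)."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):
        weights = torch.softmax(self.score(x), dim=0)  # (n, 1)
        return (weights * x).sum(dim=0)  # (dim,)


class HierarchicalEncoder(nn.Module):
    """Mutations -> gene -> chromosome -> pathway -> sample vector."""

    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.embed = MutationEmbedder(dim)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )
        self.mutation_transformer = nn.TransformerEncoder(layer, num_layers=1)
        self.gene_pool = AttentionPool(dim)     # mutations -> gene
        self.chrom_pool = AttentionPool(dim)    # genes -> chromosome
        self.pathway_pool = AttentionPool(dim)  # chromosomes -> pathway

    def encode_gene(self, mutations):
        x = self.embed(mutations).unsqueeze(0)       # (1, n_mutations, dim)
        x = self.mutation_transformer(x).squeeze(0)  # (n_mutations, dim)
        return self.gene_pool(x)

    def forward(self, sample):
        pathway_vecs = []
        for chroms in sample.values():
            chrom_vecs = []
            for genes in chroms.values():
                gene_vecs = torch.stack([self.encode_gene(m) for m in genes.values()])
                chrom_vecs.append(self.chrom_pool(gene_vecs))
            pathway_vecs.append(self.pathway_pool(torch.stack(chrom_vecs)))
        # Assumption: average pathway vectors to get one sample-level embedding.
        return torch.stack(pathway_vecs).mean(dim=0)


sample = {"pathway1": {"chr1": {"gene1": [
    {"impact": "HIGH", "reference": "A", "alternate": "T"},
    {"impact": "LOW", "reference": "G", "alternate": "C"},
]}}}
model = HierarchicalEncoder(dim=32).eval()
with torch.no_grad():
    sample_embedding = model(sample)  # 1-D tensor of length 32
```

The sketch walks the nested dictionary one sample at a time and assumes every gene has at least one mutation; a batched implementation would need padding and masks.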

Uses

Direct Use:

Obtain sample-level embeddings
Mutation pattern learning
Transfer learning across genomic datasets

Downstream Use:

Variant-based disease prediction (e.g., cancer, rare diseases, ASD)
Multi-omics fusion models (tabular + image + VCF)
Cohort-level mutation analysis
Fine-tuning for prognosis, drug response prediction, or variant effect interpretation.

Limitations

Not intended for clinical decision-making without expert oversight.
Input variants must already be annotated.
Not intended for non-human genomes unless explicitly fine-tuned for those organisms.
High-resolution functional variant prediction is not yet supported; it is planned for future development.

MODEL STILL UNDER DEVELOPMENT
