Model - GvEM (Genomic Variant Embedding Model)

GvEM is a PyTorch-based deep learning model designed to embed and model genomic mutation data from VCF (Variant Call Format) files using a biologically-informed hierarchy: Pathway → Chromosome → Gene → Mutations.


Hierarchy of input data

example_data = {
    'sample1': {
        'pathway1': {
            'chr1': {
                'gene1': [
                    {'impact': 'HIGH', 'reference': 'A', 'alternate': 'T'}
                ]
            }
        }
    }
}
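The nesting above could be produced from an annotated VCF roughly as follows. This is a sketch, not GvEM's actual parser: it assumes SnpEff-style `ANN` annotations in the INFO column (`allele|effect|impact|gene|...`), keeps only the first annotation per variant, and relies on a hypothetical `GENE_TO_PATHWAY` lookup. The function name `parse_annotated_vcf` and the toy record are illustrative.

```python
from collections import defaultdict

# Hypothetical gene-to-pathway lookup; a real run would load this from
# whichever pathway resource the project uses.
GENE_TO_PATHWAY = {"gene1": "pathway1"}


def parse_annotated_vcf(lines, sample_name, gene_to_pathway):
    """Build {sample: {pathway: {chrom: {gene: [mutation, ...]}}}} from VCF lines."""
    nested = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for line in lines:
        # Skip blank lines, meta lines ("##...") and the column header ("#CHROM...").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, ref, alt, info = fields[0], fields[3], fields[4], fields[7]
        # Assumption: annotations live in a SnpEff-style ANN entry of INFO.
        ann = next((kv[4:] for kv in info.split(";") if kv.startswith("ANN=")), None)
        if ann is None:
            continue  # unannotated variants cannot be placed in the hierarchy
        parts = ann.split(",")[0].split("|")  # first annotation only
        if len(parts) < 4:
            continue
        impact, gene = parts[2], parts[3]
        pathway = gene_to_pathway.get(gene, "unassigned")
        # Multi-allelic ALT values (e.g. "T,G") are stored unsplit in this sketch.
        nested[pathway][chrom][gene].append(
            {"impact": impact, "reference": ref, "alternate": alt}
        )
    # Convert the defaultdicts into plain dicts for a JSON-like result.
    return {
        sample_name: {
            pathway: {chrom: dict(genes) for chrom, genes in chroms.items()}
            for pathway, chroms in nested.items()
        }
    }


# Toy input, made up to mirror the example structure.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\t.\tA\tT\t.\tPASS\tANN=T|stop_gained|HIGH|gene1|gene1_id",
]
result = parse_annotated_vcf(vcf_lines, "sample1", GENE_TO_PATHWAY)
```

Genes missing from the lookup fall into an "unassigned" pathway here; whether to keep or drop such variants is a design choice this sketch leaves open.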


Features

  • VCF Parser: Converts standard VCF files into a hierarchical JSON-like structure.
  • MutationEmbedder: Learns embeddings for categorical mutation features (scalable).
  • GeneEncoder: Processes lists of mutations using a Transformer and hierarchical attention to get gene-level representations.
  • ChromosomeEncoder: Aggregates gene encodings.
  • PathwayEncoder: Aggregates chromosome encodings to yield final sample representation.
  • Scalable: Easily extensible to new fields or biological groupings.
  • HuggingFace Compatible: Designed for sharing and experimentation on the 🤗 Hub.
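To make the encoder stack concrete, here is a minimal PyTorch sketch of the mutation-to-sample flow. It is not the released GvEM code: the class names `AttentionPool` and `HierarchicalEncoder`, the vocabularies, the embedding size, the use of summed feature embeddings, and the final pooling over pathways are all assumptions made for illustration. It also handles only single-nucleotide reference and alternate alleles.

```python
import torch
import torch.nn as nn

# Assumed vocabularies for the categorical mutation features.
IMPACT = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}
BASE = {"A": 0, "C": 1, "G": 2, "T": 3}


class AttentionPool(nn.Module):
    """Softmax-weighted sum over a set of vectors: (n, dim) -> (dim,)."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):
        weights = torch.softmax(self.score(x), dim=0)  # (n, 1)
        return (weights * x).sum(dim=0)


class MutationEmbedder(nn.Module):
    """Sums one learned embedding per categorical feature."""

    def __init__(self, dim):
        super().__init__()
        self.impact = nn.Embedding(len(IMPACT), dim)
        self.ref = nn.Embedding(len(BASE), dim)
        self.alt = nn.Embedding(len(BASE), dim)

    def forward(self, mutations):
        impact = torch.tensor([IMPACT[m["impact"]] for m in mutations])
        ref = torch.tensor([BASE[m["reference"]] for m in mutations])
        alt = torch.tensor([BASE[m["alternate"]] for m in mutations])
        return self.impact(impact) + self.ref(ref) + self.alt(alt)  # (n, dim)


class HierarchicalEncoder(nn.Module):
    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.embed = MutationEmbedder(dim)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )
        self.gene_transformer = nn.TransformerEncoder(layer, num_layers=1)
        self.gene_pool = AttentionPool(dim)     # mutations   -> gene
        self.chrom_pool = AttentionPool(dim)    # genes       -> chromosome
        self.pathway_pool = AttentionPool(dim)  # chromosomes -> pathway
        self.sample_pool = AttentionPool(dim)   # pathways    -> sample (assumed)

    def encode_gene(self, mutations):
        tokens = self.embed(mutations).unsqueeze(0)  # (1, n, dim)
        return self.gene_pool(self.gene_transformer(tokens).squeeze(0))

    def forward(self, sample):
        pathway_vecs = []
        for chroms in sample.values():
            chrom_vecs = []
            for genes in chroms.values():
                gene_vecs = torch.stack([self.encode_gene(m) for m in genes.values()])
                chrom_vecs.append(self.chrom_pool(gene_vecs))
            pathway_vecs.append(self.pathway_pool(torch.stack(chrom_vecs)))
        return self.sample_pool(torch.stack(pathway_vecs))  # (dim,)


sample = {"pathway1": {"chr1": {"gene1": [
    {"impact": "HIGH", "reference": "A", "alternate": "T"},
    {"impact": "LOW", "reference": "G", "alternate": "C"},
]}}}
torch.manual_seed(0)
model = HierarchicalEncoder().eval()
with torch.no_grad():
    embedding = model(sample)  # one vector per sample
```

Looping over Python dicts keeps the sketch readable but processes one sample at a time; a batched implementation would need padding and masks at every level.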

Uses

Direct Use:

  • Obtain sample-level embeddings
  • Mutation pattern learning
  • Transfer learning across genomic datasets

Downstream Use:

  • Variant-based disease prediction (e.g., cancer, rare diseases, ASD)
  • Multi-omics fusion models (tabular + image + VCF)
  • Cohort-level mutation analysis
  • Fine-tuning for prognosis, drug response prediction, or variant effect interpretation.
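As a small illustration of cohort-level analysis, the hierarchical structure can be summarized directly, without the model. The sketch below counts, for each gene, how many samples carry at least one mutation in it. The helper name `samples_per_mutated_gene` and the toy cohort are hypothetical.

```python
from collections import Counter


def samples_per_mutated_gene(cohort):
    """Count samples with at least one mutation in each gene.

    `cohort` follows the nesting sample -> pathway -> chromosome -> gene -> [mutations].
    """
    counts = Counter()
    for pathways in cohort.values():
        # A set per sample, so a gene listed under several pathways counts once.
        genes_in_sample = {
            gene
            for chroms in pathways.values()
            for genes in chroms.values()
            for gene, mutations in genes.items()
            if mutations
        }
        counts.update(genes_in_sample)
    return counts


# Toy cohort, made up for illustration.
mutation = {"impact": "HIGH", "reference": "A", "alternate": "T"}
cohort = {
    "sample1": {
        "pathway1": {"chr1": {"gene1": [mutation], "gene2": [mutation]}},
        "pathway2": {"chr1": {"gene1": [mutation]}},
    },
    "sample2": {"pathway1": {"chr1": {"gene1": [mutation]}}},
}
gene_counts = samples_per_mutated_gene(cohort)
```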

Limitations

  • Not intended for clinical decision-making without expert oversight.
  • Input variants must already be annotated.
  • Not intended for non-human genomes unless explicitly fine-tuned for those organisms.
  • High-resolution functional variant prediction is not yet supported; it is planned for future development.

MODEL STILL UNDER DEVELOPMENT
