# PawanEmbd-68M

A 68M-parameter sentence embedding model distilled from IBM Granite-278M.

## Model Details
- Model Type: Sentence Embedding Model
- Architecture: Transformer-based encoder with projection layer
- Parameters: ~68 million
- Teacher Model: IBM Granite-278M Multilingual Embedding
- Training Method: Knowledge Distillation
- Output Dimensions: 768
- Max Sequence Length: 512 tokens
## Training Details

This model was trained using knowledge distillation from the IBM Granite-278M teacher model on the All-NLI dataset (SNLI + MultiNLI).

### Training Hyperparameters
- Dataset: sentence-transformers/all-nli (100K samples)
- Epochs: 20
- Batch Size: 32
- Learning Rate: 5e-4 with OneCycleLR scheduler
- Loss Function: Combined MSE + Cosine Similarity (α=0.5, β=0.5); see the sketch below the list
- Mixed Precision: FP16 (AMP)
- Hardware: NVIDIA T4 GPU
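The combined objective above can be sketched as follows. This is a minimal illustration rather than the actual training code: the function name `distillation_loss` and the reduction choices are assumptions; only the α-weighted MSE term and β-weighted cosine term come from the hyperparameter list.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_emb: torch.Tensor,
                      teacher_emb: torch.Tensor,
                      alpha: float = 0.5,
                      beta: float = 0.5) -> torch.Tensor:
    """Hypothetical combined distillation loss: alpha * MSE + beta * (1 - cosine)."""
    # MSE term pulls each student embedding toward the teacher's vector
    mse_term = F.mse_loss(student_emb, teacher_emb)
    # Cosine term rewards directional agreement with the teacher
    cosine_term = 1.0 - F.cosine_similarity(student_emb, teacher_emb, dim=-1).mean()
    return alpha * mse_term + beta * cosine_term

# Example with random stand-in embeddings (both models output 768-dim vectors)
student = torch.randn(4, 768)
teacher = torch.randn(4, 768)
print(distillation_loss(student, teacher))
```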
## Usage

### Using Transformers

```python
from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F
# Load model and tokenizer
model = AutoModel.from_pretrained("dmedhi/PawanEmbd-68M")
tokenizer = AutoTokenizer.from_pretrained("dmedhi/PawanEmbd-68M")
# Encode sentences
sentences = ["This is an example sentence", "Each sentence is converted to a vector"]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Get embeddings
with torch.no_grad():
    outputs = model(**encoded)
    embeddings = outputs.pooler_output  # Already normalized
# Compute similarity
similarity = F.cosine_similarity(embeddings[0:1], embeddings[1:2])
print(f"Similarity: {similarity.item():.4f}")
### Using Sentence-Transformers

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
# Load the model
model = SentenceTransformer("dmedhi/PawanEmbd-68M")
# Test encoding
sentences = ["This is an example sentence", "Each sentence is converted to a vector"]
embeddings = model.encode(sentences)
print(f"β
Embeddings shape: {embeddings.shape}")
# Compute similarity
similarity = cos_sim(embeddings[0], embeddings[1])
print(f"β
Similarity: {similarity.item():.4f}")
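As a follow-up, the same model can back a small semantic-search setup. The sketch below uses `sentence_transformers.util.semantic_search`; the corpus and query strings are made up for illustration.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("dmedhi/PawanEmbd-68M")

corpus = [
    "The cat sits on the mat.",
    "A man is playing a guitar.",
    "The weather is sunny today.",
]
query = "Someone is making music."

# Encode the corpus once, queries on demand
corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Retrieve the two most similar corpus entries by cosine similarity
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], f"{hit['score']:.4f}")
```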
## Performance

### Comparison with Teacher Model
| Metric | Teacher (Granite-278M) | Student (PawanEmbd-68M) |
|---|---|---|
| Parameters | 278M | 68M (4.1x smaller) |
| Model Size | ~1.1 GB | ~258.7 MB |
| Inference Speed (CPU) | 269.57 ms | 11.57 ms (23.3x faster) |
| Inference Speed (GPU) | 17.94 ms | 2.75 ms (6.5x faster) |
| Cosine Similarity (to teacher embeddings) | 1.000 | 0.943 |
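The latency figures above depend on batch size, sequence length, and hardware, so treat them as indicative. A rough way to measure a CPU timing on your own machine is sketched below; the batch contents and iteration counts are arbitrary and will not reproduce the table exactly.

```python
import time
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("dmedhi/PawanEmbd-68M")
tokenizer = AutoTokenizer.from_pretrained("dmedhi/PawanEmbd-68M")
model.eval()

batch = tokenizer(["A short benchmark sentence."] * 8,
                  padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    # Warm-up passes so first-call overhead is not measured
    for _ in range(3):
        model(**batch)
    start = time.perf_counter()
    for _ in range(20):
        model(**batch)

elapsed_ms = (time.perf_counter() - start) / 20 * 1000
print(f"Average forward pass: {elapsed_ms:.2f} ms")
```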
## Intended Uses

This model is suitable for:

- Semantic Search: Find similar documents or passages
- Clustering: Group similar texts together (see the sketch after this list)
- Duplicate Detection: Identify near-duplicate content
- Recommendation Systems: Find similar items
- Question Answering: Retrieve relevant passages
- Sentence Similarity: Measure semantic similarity between texts
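For the clustering use case, one possible setup is sketched below: embed the texts with this model and group them with scikit-learn's KMeans. The example texts, the choice of KMeans, and the number of clusters are illustrative assumptions, not part of the model card.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("dmedhi/PawanEmbd-68M")

texts = [
    "How do I reset my password?",
    "I forgot my login credentials.",
    "What is the shipping cost to Canada?",
    "How long does delivery take?",
]

# Embed the texts and group them into two clusters
embeddings = model.encode(texts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

for text, label in zip(texts, labels):
    print(label, text)
```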
## Training Code

The model was trained with PyTorch using knowledge distillation. Training code is available at: TODO
## Citation

```bibtex
@misc{pawanembdmodel2025,
  author       = {Dipankar Medhi},
  title        = {PawanEmbd: A Lightweight Embedding Model via Knowledge Distillation},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/dmedhi/PawanEmbd-68M}}
}
```
## Acknowledgments
- Teacher model: IBM Granite-278M
- Training data: Sentence-Transformers All-NLI
- Framework: Hugging Face Transformers & PyTorch
## License
Apache 2.0
## Contact

For questions or feedback, please open an issue on GitHub.