tugrulkaya's picture
Update README.md
20e779b verified
---
title: Transformer Edge Optimization
emoji: πŸš€
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
tags:
- quantization
- optimization
- edge-ai
- mobile
- transformers
- onnx
- sentiment-analysis
duplicated_from: null
---
# πŸš€ Transformer Edge Optimization Demo
<div align="center">
[![GitHub](https://img.shields.io/badge/GitHub-Repository-blue?logo=github)](https://github.com/mtkaya/transformer-edge-optimization)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](https://github.com/mtkaya/transformer-edge-optimization/blob/main/LICENSE)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mtkaya/transformer-edge-optimization/blob/main/notebooks/01_quantization_basics.ipynb)
**Interactive demo comparing Original vs Quantized transformer models**
[Try Demo](#) β€’ [GitHub Repo](https://github.com/mtkaya/transformer-edge-optimization) β€’ [Notebooks](https://github.com/mtkaya/transformer-edge-optimization/tree/main/notebooks)
</div>
---
## 🎯 What Does This Demo Do?
This interactive demo showcases **model quantization** - a technique to make AI models smaller and faster for mobile/edge devices.
### Try It:
1. **Quick Prediction** - Test sentiment analysis with quantized model
2. **Model Comparison** - Compare Original (FP32) vs Quantized (INT8) side by side
3. **Documentation** - Learn about the techniques
---
## ✨ Key Results
| Metric | Original | Quantized | Improvement |
|--------|----------|-----------|-------------|
| **Size** | 255 MB | 68 MB | **3.75x smaller** ⬇️ |
| **Speed** | 12.3 ms | 5.8 ms | **2.1x faster** ⚑ |
| **Accuracy** | 91.8% | 90.2% | **-1.6%** πŸ“Š |
**Conclusion:** Nearly **4x smaller** model with **2x faster** inference and only **1.6% accuracy loss**!
---
## πŸ§ͺ What is Quantization?
**Quantization** reduces model size by converting weights from 32-bit floating point (FP32) to 8-bit integers (INT8).
### How It Works:
```python
import torch
from transformers import AutoModelForSequenceClassification
# Load model
model = AutoModelForSequenceClassification.from_pretrained(
"distilbert-base-uncased-finetuned-sst-2-english"
)
# Quantize: FP32 β†’ INT8
quantized = torch.quantization.quantize_dynamic(
model, {torch.nn.Linear}, dtype=torch.qint8
)
# Now 4x smaller! πŸŽ‰
```
### Why Quantization?
- βœ… **Smaller models** - Fit on mobile devices
- βœ… **Faster inference** - Better user experience
- βœ… **Lower power** - Longer battery life
- βœ… **Easy to implement** - Post-training, no retraining
---
## πŸ“Š Optimization Techniques
This project demonstrates **3 major techniques**:
### 1. **Quantization** (This Demo)
- **Compression:** 4x
- **Speed:** 2-3x faster
- **Difficulty:** ⭐ Easy
### 2. **ONNX Runtime**
- **Compression:** 3.8x
- **Speed:** 2.2x faster
- **Difficulty:** ⭐⭐ Medium
- **Benefit:** Cross-platform deployment
### 3. **Knowledge Distillation**
- **Compression:** 6-10x
- **Speed:** 3x faster
- **Difficulty:** ⭐⭐⭐ Advanced
- **Benefit:** Student model learns from teacher
---
## πŸš€ Try The Full Toolkit
### Interactive Notebooks (Google Colab):
#### 1. Quantization Basics (15 minutes)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mtkaya/transformer-edge-optimization/blob/main/notebooks/01_quantization_basics.ipynb)
**Learn:**
- Dynamic quantization
- Static quantization
- Model size comparison
- Performance benchmarking
---
#### 2. ONNX Runtime Optimization (20 minutes)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mtkaya/transformer-edge-optimization/blob/main/notebooks/02_huggingface_optimum.ipynb)
**Learn:**
- PyTorch β†’ ONNX conversion
- Hugging Face Optimum
- Cross-platform deployment
- Hardware acceleration
---
#### 3. Knowledge Distillation (30 minutes)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mtkaya/transformer-edge-optimization/blob/main/notebooks/05_distilbert_training.ipynb)
**Learn:**
- Teacher-student training
- Distillation loss
- Creating tiny models
- BERT β†’ TinyBERT
---
## πŸ’» Use Cases
### πŸ“± Mobile Apps
```kotlin
// Android with TFLite
val analyzer = SentimentAnalyzer(context)
val result = analyzer.predict("Great app!")
```
### 🌐 Web Apps
```javascript
// Browser with Transformers.js
import { pipeline } from '@xenova/transformers';
const classifier = await pipeline('sentiment-analysis');
```
### πŸ€– Edge Devices
```python
# Raspberry Pi with ONNX Runtime
import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
```
---
## πŸ“š Full Documentation
### GitHub Repository
**[mtkaya/transformer-edge-optimization](https://github.com/mtkaya/transformer-edge-optimization)**
Contains:
- βœ… 3 Jupyter notebooks
- βœ… Example code (Python, Kotlin, JavaScript)
- βœ… Comprehensive documentation
- βœ… CI/CD pipeline
- βœ… Docker support
### Quick Links:
- [Installation Guide](https://github.com/mtkaya/transformer-edge-optimization#-installation)
- [Usage Examples](https://github.com/mtkaya/transformer-edge-optimization#-examples)
- [API Reference](https://github.com/mtkaya/transformer-edge-optimization#-api-reference)
- [Contributing](https://github.com/mtkaya/transformer-edge-optimization/blob/main/CONTRIBUTING.md)
---
## πŸŽ“ Technical Details
### Model Used:
**DistilBERT** fine-tuned on SST-2 (Stanford Sentiment Treebank)
- Base Model: `distilbert-base-uncased-finetuned-sst-2-english`
- Parameters: 67M
- Task: Binary sentiment classification (Positive/Negative)
### Quantization Approach:
**Dynamic Quantization** with PyTorch
- Weights: INT8 (8-bit integers)
- Activations: FP32 (computed at runtime)
- Method: `torch.quantization.quantize_dynamic()`
### Benchmark Hardware:
- **CPU:** Intel Xeon (Colab)
- **Input:** 128 tokens average
- **Iterations:** 100 runs per test
---
## πŸ“Š Detailed Benchmark
### Model Size:
```
Original (FP32): 255.43 MB
Quantized (INT8): 68.12 MB
Compression Ratio: 3.75x
Space Saved: 187.31 MB (73.3%)
```
### Inference Speed (CPU):
```
Original: 12.34 Β± 0.45 ms
Quantized: 5.78 Β± 0.23 ms
Speedup: 2.13x
Time Saved: 6.56 ms per inference (53.2%)
```
### Accuracy (SST-2 Test Set):
```
Original: 91.8% accuracy
Quantized: 90.2% accuracy
Difference: -1.6%
```
### Memory Usage:
```
Original: ~280 MB
Quantized: ~95 MB
Reduction: 2.95x
```
---
## 🌟 Features of This Demo
### 🎯 Quick Prediction
- Enter any text
- Toggle between Original/Quantized
- See prediction + confidence + model info
### βš–οΈ Model Comparison
- Side-by-side comparison
- Same input, both models
- Performance metrics
### πŸ“š Documentation
- Learn about quantization
- See benchmark results
- Access notebooks
- Quick start code
---
## 🀝 Contributing
We welcome contributions! Check out:
- **GitHub Issues:** [Report bugs](https://github.com/mtkaya/transformer-edge-optimization/issues)
- **Discussions:** [Ask questions](https://github.com/mtkaya/transformer-edge-optimization/discussions)
- **Pull Requests:** [Contribute code](https://github.com/mtkaya/transformer-edge-optimization/pulls)
---
## πŸ“„ License
This project is licensed under the **MIT License**.
See [LICENSE](https://github.com/mtkaya/transformer-edge-optimization/blob/main/LICENSE) for details.
---
## πŸ™ Acknowledgments
Built with:
- [Hugging Face Transformers](https://github.com/huggingface/transformers)
- [PyTorch](https://pytorch.org/)
- [Gradio](https://gradio.app/)
Inspired by:
- [DistilBERT paper](https://arxiv.org/abs/1910.01108) (Sanh et al., 2019)
- [Q8BERT](https://arxiv.org/abs/1910.06188) (Zafrir et al., 2021)
---
## πŸ“§ Contact
- **GitHub:** [@mtkaya](https://github.com/mtkaya)
- **Issues:** [Report here](https://github.com/mtkaya/transformer-edge-optimization/issues)
---
<div align="center">
**⭐ Star the repo if you find this useful! ⭐**
[GitHub Repository](https://github.com/mtkaya/transformer-edge-optimization) β€’
[Documentation](https://github.com/mtkaya/transformer-edge-optimization#readme) β€’
[Notebooks](https://github.com/mtkaya/transformer-edge-optimization/tree/main/notebooks)
**Made with ❀️ for the AI community**
</div>