---
language: multilingual
license: mit
tags:
- zero-shot-classification
- nli
- onnx
- optimized
- deberta-v3
base_model: MoritzLaurer/deberta-v3-large-zeroshot-v2.0
---
# DeBERTa-v3-large Zero-Shot Classification - ONNX
This is an ONNX-optimized version of [`MoritzLaurer/deberta-v3-large-zeroshot-v2.0`](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0) for efficient inference.
## Model Description
This repository contains:
- **model.onnx**: the standard (non-quantized) ONNX export
- **model_quantized.onnx**: an INT8 dynamically quantized variant for faster inference with minimal accuracy loss
The model is optimized for zero-shot classification tasks across multiple languages.
## Usage
### Zero-Shot Classification Pipeline (Recommended)
```python
from transformers import pipeline, AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the quantized model
model = ORTModelForSequenceClassification.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX",
    file_name="model_quantized.onnx",
)
tokenizer = AutoTokenizer.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX"
)

# Patch the model's forward method to drop token_type_ids,
# which the exported ONNX graph may not accept as an input
original_forward = model.forward

def patched_forward(input_ids=None, attention_mask=None, token_type_ids=None, **kwargs):
    return original_forward(input_ids=input_ids, attention_mask=attention_mask, **kwargs)

model.forward = patched_forward

# Create the zero-shot classification pipeline
classifier = pipeline(
    "zero-shot-classification",
    model=model,
    tokenizer=tokenizer,
    device=-1,  # CPU inference
)

# Define your labels
labels = ["politics", "technology", "sports", "entertainment", "business"]

# Classify text
text = "Apple announced their new AI chip with impressive performance gains."
result = classifier(
    text,
    candidate_labels=labels,
    hypothesis_template="This text is about {}",
    multi_label=True,  # enable multi-label classification
)

print(f"Text: {text}")
for label, score in zip(result["labels"], result["scores"]):
    print(f"  {label}: {score:.2%}")
```
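With `multi_label=True`, each candidate label is scored independently, so the scores do not need to sum to 1 and a text can match several labels at once. Set `multi_label=False` to normalize the scores across all candidate labels and pick a single best one.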
### Using Regular ONNX Model
For the non-quantized model (larger but potentially slightly more accurate):
```python
model = ORTModelForSequenceClassification.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX",
    file_name="model.onnx",
)
# ... rest of the code is the same
```
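### Using onnxruntime Directly
If you prefer to skip the pipeline abstraction, the same NLI-based scoring can be done with `onnxruntime` alone. The sketch below shows what the pipeline does under the hood for a single candidate label; the feed is filtered against the input names the exported graph actually declares, since the export may omit `token_type_ids`:
```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo = "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX"
onnx_path = hf_hub_download(repo_id=repo, filename="model_quantized.onnx")
tokenizer = AutoTokenizer.from_pretrained(repo)
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# Zero-shot classification is NLI under the hood: the text is the premise,
# and each candidate label is turned into a hypothesis
premise = "Apple announced their new AI chip with impressive performance gains."
hypothesis = "This text is about technology"
encoded = tokenizer(premise, hypothesis, return_tensors="np")

# Feed only the inputs the graph declares (the export may omit token_type_ids)
graph_inputs = {i.name for i in session.get_inputs()}
feed = {k: v for k, v in encoded.items() if k in graph_inputs}

logits = session.run(None, feed)[0]
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(probs)  # the entailment probability drives the zero-shot score for this label
```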
## Performance
The quantized model provides:
- **Faster inference**: roughly 2-3x speedup over the PyTorch model
- **Smaller size**: INT8 weights substantially reduce the on-disk footprint
- **Maintained accuracy**: minimal accuracy loss (<1%) compared to the original model
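Actual numbers depend on hardware, sequence length, and thread settings, so treat the figures above as indicative. A minimal sketch for timing both variants on your own machine (reusing the `token_type_ids` workaround from the usage example):
```python
import time
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

repo = "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX"
tokenizer = AutoTokenizer.from_pretrained(repo)
inputs = tokenizer(
    "Apple announced their new AI chip.", "This text is about technology",
    return_tensors="pt",
)
inputs.pop("token_type_ids", None)  # the exported graph may not declare this input

for file_name in ("model.onnx", "model_quantized.onnx"):
    model = ORTModelForSequenceClassification.from_pretrained(repo, file_name=file_name)
    model(**inputs)  # warm-up run
    start = time.perf_counter()
    for _ in range(20):
        model(**inputs)
    ms = (time.perf_counter() - start) / 20 * 1000
    print(f"{file_name}: {ms:.1f} ms per forward pass")
```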
## Original Model
This is an optimized version of the original model:
- **Base Model**: [MoritzLaurer/deberta-v3-large-zeroshot-v2.0](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0)
- **Architecture**: DeBERTa-v3-large
- **Task**: Zero-shot classification / NLI
## Optimization Details
- **Export**: Converted from PyTorch to ONNX format
- **Quantization**: Dynamic quantization with INT8 weights
- **Framework**: ONNX Runtime with Optimum
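For reference, an export-and-quantize flow like this one can be reproduced with Optimum roughly as follows. This is a sketch rather than the exact script used for this repository; in particular, the `avx512_vnni` quantization config is an assumption about the target CPU:
```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the PyTorch checkpoint to ONNX
model = ORTModelForSequenceClassification.from_pretrained(
    "MoritzLaurer/deberta-v3-large-zeroshot-v2.0", export=True
)
model.save_pretrained("onnx-out")  # writes model.onnx plus config files

# Apply dynamic quantization: INT8 weights, activations quantized at runtime
quantizer = ORTQuantizer.from_pretrained("onnx-out")
dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="onnx-out", quantization_config=dqconfig)  # -> model_quantized.onnx
```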
## License
MIT, the same license as the base model.
## Citation
If you use this model, please cite the original model:
```bibtex
@misc{laurer2022deberta,
  author    = {Laurer, Moritz and Atteveldt, Wouter van and Casas, Andreu Salleras and Welbers, Kasper},
  title     = {DeBERTa-v3-large Zero-Shot Classification},
  year      = {2022},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0}
}
```
## Acknowledgments
This ONNX optimization was created for efficient deployment in production environments. Special thanks to the original model authors and the Hugging Face Optimum team.