|
|
--- |
|
|
language: multilingual |
|
|
license: mit |
|
|
tags: |
|
|
- zero-shot-classification |
|
|
- nli |
|
|
- onnx |
|
|
- optimized |
|
|
- deberta-v3 |
|
|
base_model: MoritzLaurer/deberta-v3-large-zeroshot-v2.0 |
|
|
--- |
|
|
|
|
|
# DeBERTa-v3-large Zero-Shot Classification - ONNX |
|
|
|
|
|
This is an ONNX-optimized version of [`MoritzLaurer/deberta-v3-large-zeroshot-v2.0`](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0) for efficient inference. |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This repository contains: |
|
|
- **model.onnx**: Regular ONNX exported model |
|
|
- **model_quantized.onnx**: INT8 dynamically quantized model for faster inference with minimal accuracy loss |
|
|
|
|
|
The model is optimized for zero-shot classification tasks across multiple languages. |
|
|
|
|
|
## Usage |
|
|
|
|
|
### Zero-Shot Classification Pipeline (Recommended) |
|
|
|
|
|
```python |
|
|
from transformers import pipeline, AutoTokenizer |
|
|
from optimum.onnxruntime import ORTModelForSequenceClassification |
|
|
|
|
|
# Load the quantized model |
|
|
model = ORTModelForSequenceClassification.from_pretrained( |
|
|
"richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX", |
|
|
file_name="model_quantized.onnx" |
|
|
) |
|
|
|
|
|
tokenizer = AutoTokenizer.from_pretrained( |
|
|
"richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX" |
|
|
) |
|
|
|
|
|
# Patch the model's forward method to handle token_type_ids |
|
|
original_forward = model.forward |
|
|
def patched_forward(input_ids=None, attention_mask=None, token_type_ids=None, **kwargs): |
|
|
return original_forward(input_ids=input_ids, attention_mask=attention_mask, **kwargs) |
|
|
model.forward = patched_forward |
|
|
|
|
|
# Create zero-shot classification pipeline |
|
|
classifier = pipeline( |
|
|
"zero-shot-classification", |
|
|
model=model, |
|
|
tokenizer=tokenizer, |
|
|
device=-1 # CPU inference |
|
|
) |
|
|
|
|
|
# Define your labels |
|
|
labels = ["politics", "technology", "sports", "entertainment", "business"] |
|
|
|
|
|
# Classify text |
|
|
text = "Apple announced their new AI chip with impressive performance gains." |
|
|
result = classifier( |
|
|
text, |
|
|
candidate_labels=labels, |
|
|
hypothesis_template="This text is about {}", |
|
|
multi_label=True # Enable multi-label classification |
|
|
) |
|
|
|
|
|
print(f"Text: {text}") |
|
|
for label, score in zip(result['labels'], result['scores']): |
|
|
print(f" {label}: {score:.2%}") |
|
|
``` |
|
|
|
|
|
### Using Regular ONNX Model |
|
|
|
|
|
For the non-quantized model (larger but potentially slightly more accurate): |
|
|
|
|
|
```python |
|
|
model = ORTModelForSequenceClassification.from_pretrained( |
|
|
"richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX", |
|
|
file_name="model.onnx" |
|
|
) |
|
|
# ... rest of the code is the same |
|
|
``` |
|
|
|
|
|
## Performance |
|
|
|
|
|
The quantized model provides: |
|
|
- **Faster inference**: ~2-3x speedup compared to PyTorch |
|
|
- **Smaller size**: Reduced model size due to INT8 quantization |
|
|
- **Maintained accuracy**: Minimal accuracy loss (<1%) compared to the original model |
|
|
|
|
|
## Original Model |
|
|
|
|
|
This is an optimized version of the original model: |
|
|
- **Base Model**: [MoritzLaurer/deberta-v3-large-zeroshot-v2.0](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0) |
|
|
- **Architecture**: DeBERTa-v3-large |
|
|
- **Task**: Zero-shot classification / NLI |
|
|
|
|
|
## Optimization Details |
|
|
|
|
|
- **Export**: Converted from PyTorch to ONNX format |
|
|
- **Quantization**: Dynamic quantization with INT8 weights |
|
|
- **Framework**: ONNX Runtime with Optimum |
|
|
|
|
|
## License |
|
|
|
|
|
Same as the base model - MIT License |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite the original model: |
|
|
|
|
|
```bibtex |
|
|
@misc{laurer2022deberta, |
|
|
author = {Laurer, Moritz and Atteveldt, Wouter van and Casas, Andreu Salleras and Welbers, Kasper}, |
|
|
title = {DeBERTa-v3-large Zero-Shot Classification}, |
|
|
year = {2022}, |
|
|
publisher = {Hugging Face}, |
|
|
url = {https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
This ONNX optimization was created for efficient deployment in production environments. Special thanks to the original model authors and the Hugging Face Optimum team. |
|
|
|