---
language: multilingual
license: mit
tags:
- zero-shot-classification
- nli
- onnx
- optimized
- deberta-v3
base_model: MoritzLaurer/deberta-v3-large-zeroshot-v2.0
---

# DeBERTa-v3-large Zero-Shot Classification - ONNX

This is an ONNX-optimized version of [`MoritzLaurer/deberta-v3-large-zeroshot-v2.0`](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0) for efficient inference.

## Model Description

This repository contains:

- **model.onnx**: The model exported to ONNX at full (FP32) precision
- **model_quantized.onnx**: An INT8 dynamically quantized variant for faster CPU inference with minimal accuracy loss

The model is optimized for zero-shot classification tasks across multiple languages.

## Usage

### Zero-Shot Classification Pipeline (Recommended)

```python
from transformers import pipeline, AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the quantized model
model = ORTModelForSequenceClassification.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX",
    file_name="model_quantized.onnx"
)
tokenizer = AutoTokenizer.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX"
)

# Patch the model's forward method to drop token_type_ids:
# the tokenizer produces them, but the ONNX export does not accept
# them as an input, so they must be filtered out before inference.
original_forward = model.forward

def patched_forward(input_ids=None, attention_mask=None, token_type_ids=None, **kwargs):
    return original_forward(input_ids=input_ids, attention_mask=attention_mask, **kwargs)

model.forward = patched_forward

# Create the zero-shot classification pipeline
classifier = pipeline(
    "zero-shot-classification",
    model=model,
    tokenizer=tokenizer,
    device=-1  # CPU inference
)

# Define your labels
labels = ["politics", "technology", "sports", "entertainment", "business"]

# Classify text
text = "Apple announced their new AI chip with impressive performance gains."
result = classifier(
    text,
    candidate_labels=labels,
    hypothesis_template="This text is about {}",
    multi_label=True  # Enable multi-label classification
)

print(f"Text: {text}")
for label, score in zip(result["labels"], result["scores"]):
    print(f"  {label}: {score:.2%}")
```

### Using the Regular ONNX Model

For the non-quantized model (larger, but potentially slightly more accurate):

```python
model = ORTModelForSequenceClassification.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX",
    file_name="model.onnx"
)
# ... rest of the code is the same
```

## Performance

The quantized model provides:

- **Faster inference**: ~2-3x speedup over the PyTorch model on CPU (hardware-dependent; a timing sketch for verifying this on your own machine follows below)
- **Smaller size**: INT8 weights make the file substantially smaller on disk than the FP32 export
- **Maintained accuracy**: minimal accuracy loss (<1%) compared to the original model
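The exact speedup varies with hardware, ONNX Runtime version, and sequence length. Below is a minimal timing sketch for comparing the two ONNX variants in this repo; the text/hypothesis pair and the run count are arbitrary choices, and a PyTorch baseline could be added analogously with `AutoModelForSequenceClassification`.

```python
import time

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

repo = "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX"
tokenizer = AutoTokenizer.from_pretrained(repo)

text = "Apple announced their new AI chip with impressive performance gains."
hypothesis = "This text is about technology"
# NLI-style premise/hypothesis pair, matching what the zero-shot pipeline builds internally
inputs = tokenizer(text, hypothesis, return_tensors="pt")

def time_model(file_name, runs=20):
    model = ORTModelForSequenceClassification.from_pretrained(repo, file_name=file_name)
    # Warm-up run so session initialization is not measured
    model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
    start = time.perf_counter()
    for _ in range(runs):
        model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
    return (time.perf_counter() - start) / runs

for name in ("model.onnx", "model_quantized.onnx"):
    print(f"{name}: {time_model(name) * 1000:.1f} ms/inference")
```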
## Original Model

This is an optimized version of the original model:

- **Base Model**: [MoritzLaurer/deberta-v3-large-zeroshot-v2.0](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0)
- **Architecture**: DeBERTa-v3-large
- **Task**: Zero-shot classification / NLI

## Optimization Details

- **Export**: Converted from PyTorch to ONNX format
- **Quantization**: Dynamic quantization with INT8 weights
- **Framework**: ONNX Runtime with Optimum

A sketch of how an equivalent export and quantization can be reproduced is included in the appendix at the end of this card.

## License

Same as the base model: MIT License.

## Citation

If you use this model, please cite the original model:

```bibtex
@misc{laurer2022deberta,
  author = {Laurer, Moritz and Atteveldt, Wouter van and Casas, Andreu Salleras and Welbers, Kasper},
  title = {DeBERTa-v3-large Zero-Shot Classification},
  year = {2022},
  publisher = {Hugging Face},
  url = {https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0}
}
```

## Acknowledgments

This ONNX optimization was created for efficient deployment in production environments. Special thanks to the original model authors and the Hugging Face Optimum team.
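## Appendix: Reproducing the Optimization

The exact export and quantization settings used for the files in this repo are not recorded here. The following is a minimal sketch of how an equivalent ONNX export and INT8 dynamic quantization can be produced with Optimum (with the `onnxruntime` extra installed); the `avx512_vnni` quantization config and the output directory name are illustrative choices, not necessarily what was used for this repo.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the PyTorch checkpoint to ONNX (writes model.onnx plus configs)
model = ORTModelForSequenceClassification.from_pretrained(
    "MoritzLaurer/deberta-v3-large-zeroshot-v2.0", export=True
)
model.save_pretrained("onnx-output")

# Dynamically quantize the exported graph to INT8 weights;
# by default Optimum saves the result as model_quantized.onnx
quantizer = ORTQuantizer.from_pretrained(model)
dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="onnx-output", quantization_config=dqconfig)
```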