metadata

language: multilingual
license: mit
tags:
  - zero-shot-classification
  - nli
  - onnx
  - optimized
  - deberta-v3
base_model: MoritzLaurer/deberta-v3-large-zeroshot-v2.0

DeBERTa-v3-large Zero-Shot Classification - ONNX

This is an ONNX-optimized version of MoritzLaurer/deberta-v3-large-zeroshot-v2.0 for efficient inference.

Model Description

This repository contains:

model.onnx: Unquantized ONNX export of the model
model_quantized.onnx: INT8 dynamically quantized model, for faster inference with minimal accuracy loss

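The model targets zero-shot classification. The metadata tags it as multilingual, but the underlying DeBERTa-v3-large checkpoint was pretrained on English text, so quality on other languages should be verified before use.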
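
To make "INT8 dynamically quantized" more concrete, here is a minimal sketch of symmetric 8-bit weight quantization in plain Python. It is illustrative only: the helper names `quantize_int8` and `dequantize` are hypothetical, the weight values are made up, and ONNX Runtime's actual scheme (zero points, per-channel scales, activation quantization at run time) may differ in its details.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the INT8 range
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Made-up weights, for illustration only
weights = [0.82, -0.31, 0.05, -1.27, 0.64]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is at most half a quantization step
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each quantized weight is stored in 1 byte instead of the 4 bytes of a 32-bit float, which is where the size reduction comes from. The per-weight error stays within half a quantization step.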

Usage

Zero-Shot Classification Pipeline (Recommended)

from transformers import pipeline, AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the quantized model
model = ORTModelForSequenceClassification.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX",
    file_name="model_quantized.onnx"
)

tokenizer = AutoTokenizer.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX"
)

# Patch forward() to drop token_type_ids: the tokenizer returns them,
# but this wrapper does not pass them on to the ONNX model
original_forward = model.forward
def patched_forward(input_ids=None, attention_mask=None, token_type_ids=None, **kwargs):
    return original_forward(input_ids=input_ids, attention_mask=attention_mask, **kwargs)
model.forward = patched_forward

# Create zero-shot classification pipeline
classifier = pipeline(
    "zero-shot-classification",
    model=model,
    tokenizer=tokenizer,
    device=-1  # CPU inference
)

# Define your labels
labels = ["politics", "technology", "sports", "entertainment", "business"]

# Classify text
text = "Apple announced their new AI chip with impressive performance gains."
result = classifier(
    text,
    candidate_labels=labels,
    hypothesis_template="This text is about {}",
    multi_label=True  # Score each label independently (scores need not sum to 1)
)

print(f"Text: {text}")
for label, score in zip(result['labels'], result['scores']):
    print(f"  {label}: {score:.2%}")
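
For intuition on what `multi_label` changes, the scoring step can be sketched in plain Python. This is a simplified illustration, not the `transformers` implementation. It assumes a two-class NLI head whose logits are ordered `[entailment, not_entailment]` (check `id2label` in the model config). The function names are hypothetical and the logit values are made up.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def score_labels(logits_per_label, multi_label):
    """logits_per_label maps label -> (entailment_logit, not_entailment_logit)."""
    labels = list(logits_per_label)
    if multi_label:
        # Each label is judged on its own: entailment vs. not_entailment
        scores = [softmax(list(logits_per_label[l]))[0] for l in labels]
    else:
        # Labels compete: softmax over the entailment logits of all labels
        scores = softmax([logits_per_label[l][0] for l in labels])
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return dict(ranked)


# Made-up logits, one pair per candidate label (assumption for illustration)
example_logits = {
    "technology": (3.0, -2.0),
    "business": (1.0, 0.5),
    "sports": (-3.0, 2.5),
}
independent = score_labels(example_logits, multi_label=True)
exclusive = score_labels(example_logits, multi_label=False)
print(independent)
print(exclusive)
```

With `multi_label=True` several labels can score highly at once. With `multi_label=False` the scores form a distribution over the candidate labels and sum to 1.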

Using Regular ONNX Model

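For the non-quantized model (larger, but potentially slightly more accurate), change only the file name:

from optimum.onnxruntime import ORTModelForSequenceClassification

model = ORTModelForSequenceClassification.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX",
    file_name="model.onnx"
)
# The tokenizer, forward patch, and pipeline setup are the same as above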

Performance

The quantized model provides:

Faster inference: roughly 2-3x speedup over the PyTorch model (the gain varies with hardware, batch size, and sequence length)
Smaller size: INT8 weights are stored in 1 byte each instead of 4 bytes for FP32
Maintained accuracy: less than 1% accuracy loss relative to the original model
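
Speed figures like these depend on the machine and the inputs, so it is worth measuring on your own setup. Below is a minimal timing harness. The helper name `mean_latency_ms` is hypothetical, and `dummy_classifier` is a stand-in so the snippet runs without downloading a model.

```python
import time


def mean_latency_ms(fn, text, warmup=2, runs=10):
    """Average wall-clock time per call of fn(text), in milliseconds."""
    # Warm-up calls are excluded so one-time setup cost does not skew the mean
    for _ in range(warmup):
        fn(text)
    start = time.perf_counter()
    for _ in range(runs):
        fn(text)
    elapsed = time.perf_counter() - start
    return elapsed / runs * 1000.0


def dummy_classifier(text):
    """Stand-in for a real pipeline call (assumption for illustration)."""
    return {"labels": ["technology"], "scores": [1.0]}


latency = mean_latency_ms(dummy_classifier, "Apple announced a new AI chip.")
print(f"{latency:.3f} ms per call")
```

To compare backends, pass something like `lambda t: classifier(t, candidate_labels=labels)` for the ONNX pipeline and for an equivalent PyTorch pipeline, using the same texts for both.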

Original Model

This is an optimized version of the original model:

Base Model: MoritzLaurer/deberta-v3-large-zeroshot-v2.0
Architecture: DeBERTa-v3-large
Task: Zero-shot classification / NLI

Optimization Details

Export: Converted from PyTorch to ONNX format
Quantization: Dynamic quantization with INT8 weights
Framework: ONNX Runtime with Optimum
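
The exact export script is not part of this card. One way files like these could be produced with Optimum is sketched below. The function name `export_and_quantize`, the output directory, and the choice of the `avx512_vnni` quantization preset are assumptions, not the author's confirmed settings.

```python
def export_and_quantize(base_model_id, output_dir):
    """Export a sequence-classification model to ONNX, then quantize it to INT8."""
    # Imported here so that defining the function does not require the libraries
    from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
    from optimum.onnxruntime.configuration import AutoQuantizationConfig
    from transformers import AutoTokenizer

    # Step 1: convert the PyTorch checkpoint to ONNX (writes model.onnx)
    model = ORTModelForSequenceClassification.from_pretrained(base_model_id, export=True)
    model.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(base_model_id).save_pretrained(output_dir)

    # Step 2: dynamic quantization (is_static=False), no calibration data needed.
    # The avx512_vnni preset is an assumption; other presets target other CPUs.
    quantizer = ORTQuantizer.from_pretrained(output_dir, file_name="model.onnx")
    qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
    quantizer.quantize(save_dir=output_dir, quantization_config=qconfig)
    return output_dir
```

Calling `export_and_quantize("MoritzLaurer/deberta-v3-large-zeroshot-v2.0", "deberta-onnx")` downloads the full base model, so expect it to take a while and to need several GB of disk space. With the default file suffix, the quantized file is written as `model_quantized.onnx` next to `model.onnx`.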

License

Same as the base model - MIT License

Citation

If you use this model, please cite the original model:

@misc{laurer2022deberta,
  author = {Laurer, Moritz and Atteveldt, Wouter van and Casas, Andreu Salleras and Welbers, Kasper},
  title = {DeBERTa-v3-large Zero-Shot Classification},
  year = {2022},
  publisher = {Hugging Face},
  url = {https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0}
}

Acknowledgments

This ONNX optimization was created for efficient deployment in production environments. Special thanks to the original model authors and the Hugging Face Optimum team.