---
language: multilingual
license: mit
tags:
- zero-shot-classification
- nli
- onnx
- optimized
- deberta-v3
base_model: MoritzLaurer/deberta-v3-large-zeroshot-v2.0
---
# DeBERTa-v3-large Zero-Shot Classification - ONNX
This is an ONNX-optimized version of [`MoritzLaurer/deberta-v3-large-zeroshot-v2.0`](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0) for efficient inference.
## Model Description
This repository contains:
- **model.onnx**: the standard (non-quantized) ONNX export
- **model_quantized.onnx**: an INT8 dynamically quantized variant for faster inference with minimal accuracy loss
The model is optimized for zero-shot classification tasks across multiple languages.
## Usage
### Zero-Shot Classification Pipeline (Recommended)
```python
from transformers import pipeline, AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the quantized model
model = ORTModelForSequenceClassification.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX",
    file_name="model_quantized.onnx",
)
tokenizer = AutoTokenizer.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX"
)

# Patch the model's forward method to drop token_type_ids,
# which the exported ONNX graph may not accept as an input
original_forward = model.forward

def patched_forward(input_ids=None, attention_mask=None, token_type_ids=None, **kwargs):
    return original_forward(input_ids=input_ids, attention_mask=attention_mask, **kwargs)

model.forward = patched_forward

# Create the zero-shot classification pipeline
classifier = pipeline(
    "zero-shot-classification",
    model=model,
    tokenizer=tokenizer,
    device=-1,  # CPU inference
)

# Define your labels
labels = ["politics", "technology", "sports", "entertainment", "business"]

# Classify text
text = "Apple announced their new AI chip with impressive performance gains."
result = classifier(
    text,
    candidate_labels=labels,
    hypothesis_template="This text is about {}",
    multi_label=True,  # enable multi-label classification
)

print(f"Text: {text}")
for label, score in zip(result["labels"], result["scores"]):
    print(f"  {label}: {score:.2%}")
```
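With `multi_label=True`, each candidate label is scored independently, so the scores do not need to sum to 1 and a text can match several labels at once. Set `multi_label=False` to normalize the scores across all candidate labels and pick a single best one.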
### Using Regular ONNX Model
For the non-quantized model (larger but potentially slightly more accurate):
```python
model = ORTModelForSequenceClassification.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX",
    file_name="model.onnx",
)
# ... rest of the code is the same
```
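### Using onnxruntime Directly
If you prefer to skip the pipeline abstraction, the same NLI-based scoring can be done with `onnxruntime` alone. The sketch below shows what the pipeline does under the hood for a single candidate label; the feed is filtered against the input names the exported graph actually declares, since the export may omit `token_type_ids`:
```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo = "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX"
onnx_path = hf_hub_download(repo_id=repo, filename="model_quantized.onnx")
tokenizer = AutoTokenizer.from_pretrained(repo)
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# Zero-shot classification is NLI under the hood: the text is the premise,
# and each candidate label is turned into a hypothesis
premise = "Apple announced their new AI chip with impressive performance gains."
hypothesis = "This text is about technology"
encoded = tokenizer(premise, hypothesis, return_tensors="np")

# Feed only the inputs the graph declares (the export may omit token_type_ids)
graph_inputs = {i.name for i in session.get_inputs()}
feed = {k: v for k, v in encoded.items() if k in graph_inputs}

logits = session.run(None, feed)[0]
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(probs)  # the entailment probability drives the zero-shot score for this label
```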
## Performance
The quantized model provides:
- **Faster inference**: roughly 2-3x speedup over the PyTorch model
- **Smaller size**: INT8 weights substantially reduce the on-disk footprint
- **Maintained accuracy**: minimal accuracy loss (<1%) compared to the original model
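Actual numbers depend on hardware, sequence length, and thread settings, so treat the figures above as indicative. A minimal sketch for timing both variants on your own machine (reusing the `token_type_ids` workaround from the usage example):
```python
import time
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

repo = "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX"
tokenizer = AutoTokenizer.from_pretrained(repo)
inputs = tokenizer(
    "Apple announced their new AI chip.", "This text is about technology",
    return_tensors="pt",
)
inputs.pop("token_type_ids", None)  # the exported graph may not declare this input

for file_name in ("model.onnx", "model_quantized.onnx"):
    model = ORTModelForSequenceClassification.from_pretrained(repo, file_name=file_name)
    model(**inputs)  # warm-up run
    start = time.perf_counter()
    for _ in range(20):
        model(**inputs)
    ms = (time.perf_counter() - start) / 20 * 1000
    print(f"{file_name}: {ms:.1f} ms per forward pass")
```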
## Original Model
This is an optimized version of the original model:
- **Base Model**: [MoritzLaurer/deberta-v3-large-zeroshot-v2.0](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0)
- **Architecture**: DeBERTa-v3-large
- **Task**: Zero-shot classification / NLI
## Optimization Details
- **Export**: Converted from PyTorch to ONNX format
- **Quantization**: Dynamic quantization with INT8 weights
- **Framework**: ONNX Runtime with Optimum
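For reference, an export-and-quantize flow like this one can be reproduced with Optimum roughly as follows. This is a sketch rather than the exact script used for this repository; in particular, the `avx512_vnni` quantization config is an assumption about the target CPU:
```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the PyTorch checkpoint to ONNX
model = ORTModelForSequenceClassification.from_pretrained(
    "MoritzLaurer/deberta-v3-large-zeroshot-v2.0", export=True
)
model.save_pretrained("onnx-out")  # writes model.onnx plus config files

# Apply dynamic quantization: INT8 weights, activations quantized at runtime
quantizer = ORTQuantizer.from_pretrained("onnx-out")
dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="onnx-out", quantization_config=dqconfig)  # -> model_quantized.onnx
```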
## License
MIT, the same license as the base model.
## Citation
If you use this model, please cite the original model:
```bibtex
@misc{laurer2022deberta,
  author    = {Laurer, Moritz and Atteveldt, Wouter van and Casas, Andreu Salleras and Welbers, Kasper},
  title     = {DeBERTa-v3-large Zero-Shot Classification},
  year      = {2022},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0}
}
```
## Acknowledgments
This ONNX optimization was created for efficient deployment in production environments. Special thanks to the original model authors and the Hugging Face Optimum team.