---
language: multilingual
license: mit
tags:
- zero-shot-classification
- nli
- onnx
- optimized
- deberta-v3
base_model: MoritzLaurer/deberta-v3-large-zeroshot-v2.0
---

# DeBERTa-v3-large Zero-Shot Classification - ONNX

This is an ONNX-optimized version of [`MoritzLaurer/deberta-v3-large-zeroshot-v2.0`](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0) for efficient inference.

## Model Description

This repository contains:

- **model.onnx**: The model exported to ONNX at full (FP32) precision
- **model_quantized.onnx**: An INT8 dynamically quantized variant for faster CPU inference with minimal accuracy loss

The model is optimized for zero-shot classification tasks across multiple languages.

## Usage

### Zero-Shot Classification Pipeline (Recommended)

```python
from transformers import pipeline, AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the quantized model
model = ORTModelForSequenceClassification.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX",
    file_name="model_quantized.onnx"
)
tokenizer = AutoTokenizer.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX"
)

# Patch the model's forward method to drop token_type_ids:
# the tokenizer produces them, but the ONNX export does not accept
# them as an input, so they must be filtered out before inference.
original_forward = model.forward

def patched_forward(input_ids=None, attention_mask=None, token_type_ids=None, **kwargs):
    return original_forward(input_ids=input_ids, attention_mask=attention_mask, **kwargs)

model.forward = patched_forward

# Create the zero-shot classification pipeline
classifier = pipeline(
    "zero-shot-classification",
    model=model,
    tokenizer=tokenizer,
    device=-1  # CPU inference
)

# Define your labels
labels = ["politics", "technology", "sports", "entertainment", "business"]

# Classify text
text = "Apple announced their new AI chip with impressive performance gains."
result = classifier(
    text,
    candidate_labels=labels,
    hypothesis_template="This text is about {}",
    multi_label=True  # Enable multi-label classification
)

print(f"Text: {text}")
for label, score in zip(result["labels"], result["scores"]):
    print(f"  {label}: {score:.2%}")
```

### Using the Regular ONNX Model

For the non-quantized model (larger, but potentially slightly more accurate):

```python
model = ORTModelForSequenceClassification.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX",
    file_name="model.onnx"
)
# ... rest of the code is the same
```

## Performance

The quantized model provides:

- **Faster inference**: ~2-3x speedup over the PyTorch model on CPU (hardware-dependent; a timing sketch for verifying this on your own machine follows below)
- **Smaller size**: INT8 weights make the file substantially smaller on disk than the FP32 export
- **Maintained accuracy**: minimal accuracy loss (<1%) compared to the original model
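The exact speedup varies with hardware, ONNX Runtime version, and sequence length. Below is a minimal timing sketch for comparing the two ONNX variants in this repo; the text/hypothesis pair and the run count are arbitrary choices, and a PyTorch baseline could be added analogously with `AutoModelForSequenceClassification`.

```python
import time

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

repo = "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX"
tokenizer = AutoTokenizer.from_pretrained(repo)

text = "Apple announced their new AI chip with impressive performance gains."
hypothesis = "This text is about technology"
# NLI-style premise/hypothesis pair, matching what the zero-shot pipeline builds internally
inputs = tokenizer(text, hypothesis, return_tensors="pt")

def time_model(file_name, runs=20):
    model = ORTModelForSequenceClassification.from_pretrained(repo, file_name=file_name)
    # Warm-up run so session initialization is not measured
    model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
    start = time.perf_counter()
    for _ in range(runs):
        model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
    return (time.perf_counter() - start) / runs

for name in ("model.onnx", "model_quantized.onnx"):
    print(f"{name}: {time_model(name) * 1000:.1f} ms/inference")
```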
## Original Model

This is an optimized version of the original model:

- **Base Model**: [MoritzLaurer/deberta-v3-large-zeroshot-v2.0](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0)
- **Architecture**: DeBERTa-v3-large
- **Task**: Zero-shot classification / NLI

## Optimization Details

- **Export**: Converted from PyTorch to ONNX format
- **Quantization**: Dynamic quantization with INT8 weights
- **Framework**: ONNX Runtime with Optimum

A sketch of how an equivalent export and quantization can be reproduced is included in the appendix at the end of this card.

## License

Same as the base model: MIT License.

## Citation

If you use this model, please cite the original model:

```bibtex
@misc{laurer2022deberta,
  author = {Laurer, Moritz and Atteveldt, Wouter van and Casas, Andreu Salleras and Welbers, Kasper},
  title = {DeBERTa-v3-large Zero-Shot Classification},
  year = {2022},
  publisher = {Hugging Face},
  url = {https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0}
}
```

## Acknowledgments

This ONNX optimization was created for efficient deployment in production environments. Special thanks to the original model authors and the Hugging Face Optimum team.
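## Appendix: Reproducing the Optimization

The exact export and quantization settings used for the files in this repo are not recorded here. The following is a minimal sketch of how an equivalent ONNX export and INT8 dynamic quantization can be produced with Optimum (with the `onnxruntime` extra installed); the `avx512_vnni` quantization config and the output directory name are illustrative choices, not necessarily what was used for this repo.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the PyTorch checkpoint to ONNX (writes model.onnx plus configs)
model = ORTModelForSequenceClassification.from_pretrained(
    "MoritzLaurer/deberta-v3-large-zeroshot-v2.0", export=True
)
model.save_pretrained("onnx-output")

# Dynamically quantize the exported graph to INT8 weights;
# by default Optimum saves the result as model_quantized.onnx
quantizer = ORTQuantizer.from_pretrained(model)
dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="onnx-output", quantization_config=dqconfig)
```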