---
library_name: optimum
tags:
- onnx
- quantized
- int8
- intent-classification
base_model: rbojja/intent-classification-small
---
# Intent Classification ONNX Quantized

INT8-quantized ONNX export of [rbojja/intent-classification-small](https://huggingface.co/rbojja/intent-classification-small) for fast inference with ONNX Runtime.
## Usage

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

# Load the quantized ONNX model and its tokenizer
model = ORTModelForFeatureExtraction.from_pretrained("pythn/intent-classification-onnx-quantized")
tokenizer = AutoTokenizer.from_pretrained("pythn/intent-classification-onnx-quantized")

# Run inference on a single utterance
text = "I want to book a flight"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
```
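If the exported head returns classification logits, a predicted intent can be obtained with a softmax over the last dimension. The sketch below is illustrative only: the label names are hypothetical placeholders, and the real label set comes from the model's `id2label` config.

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-intent head; real labels come from the model config
labels = ["book_flight", "cancel_booking", "check_status"]
logits = [3.2, 0.1, -1.0]  # placeholder logits, e.g. outputs[0][0]
probs = softmax(logits)
print(labels[probs.index(max(probs))])  # book_flight
```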
## Performance

Compared to the FP32 base model:

- ~4x smaller on disk (INT8 weights instead of FP32)
- 2-4x faster inference
- Minimal accuracy loss