---
library_name: optimum
tags:
- onnx
- quantized
- int8
- intent-classification
base_model: rbojja/intent-classification-small
---
# Intent Classification ONNX Quantized

INT8-quantized ONNX export of [rbojja/intent-classification-small](https://huggingface.co/rbojja/intent-classification-small) for fast inference with ONNX Runtime.
## Usage

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

# Load the quantized ONNX model and its tokenizer
model = ORTModelForFeatureExtraction.from_pretrained("pythn/intent-classification-onnx-quantized")
tokenizer = AutoTokenizer.from_pretrained("pythn/intent-classification-onnx-quantized")

# Run inference on a single utterance
text = "I want to book a flight"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
```
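If the exported head returns classification logits, a predicted intent can be obtained with a softmax over the last dimension. The sketch below is illustrative only: the label names are hypothetical placeholders, and the real label set comes from the model's `id2label` config.

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-intent head; real labels come from the model config
labels = ["book_flight", "cancel_booking", "check_status"]
logits = [3.2, 0.1, -1.0]  # placeholder logits, e.g. outputs[0][0]
probs = softmax(logits)
print(labels[probs.index(max(probs))])  # book_flight
```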
## Performance

Compared to the FP32 base model:

- ~4x smaller on disk (INT8 weights instead of FP32)
- 2-4x faster inference
- Minimal accuracy loss