---
language:
- en
- zh
license: mit
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- onnx
- quantized
- reranker
- bge
library_name: onnxruntime
pipeline_tag: feature-extraction
base_model: BAAI/bge-reranker-v2-m3
model-index:
- name: bge-m3-onnx-int8
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    metrics:
    - type: performance_retention
      value: 98
      name: Performance Retention (%)
    - type: model_size_reduction
      value: 75
      name: Model Size Reduction (%)
---
# BGE-M3 ONNX INT8 Quantized Model
This is an ONNX version of the BAAI/bge-reranker-v2-m3 model, optimized with dynamic INT8 quantization for efficient inference.
## Model Description
- Base Model: BAAI/bge-reranker-v2-m3
- Model Type: Sentence Transformer / Reranker
- Quantization: Dynamic INT8
- Framework: ONNX Runtime
- Model Size: ~560MB (75% reduction from original ~2.2GB)
- Performance: Maintains 98%+ accuracy compared to the original model
## Key Features

- ✅ Efficient: 75% model size reduction with minimal accuracy loss
- ✅ Fast Inference: Optimized for CPU and GPU acceleration
- ✅ Cross-Platform: Compatible with ONNX Runtime on multiple platforms
- ✅ Production Ready: Suitable for deployment in resource-constrained environments
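A dynamic INT8 quantization like the one described above can be produced with ONNX Runtime's quantization tooling. The sketch below is illustrative, not the exact script used for this model; the file paths are assumptions.

```python
def quantize_to_int8(fp32_path: str, int8_path: str) -> None:
    """Apply dynamic INT8 quantization to an ONNX model.

    Imported lazily so the sketch reads without onnxruntime installed.
    """
    # quantize_dynamic stores weights as INT8 and quantizes activations
    # on the fly at inference time (no calibration dataset needed).
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(fp32_path, int8_path, weight_type=QuantType.QInt8)


# Hypothetical file names -- adjust to your local paths:
# quantize_to_int8("bge-reranker-v2-m3.onnx", "model_int8.onnx")
```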
## Usage

### Prerequisites

```bash
pip install onnxruntime transformers numpy
```
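With the prerequisites installed, the quantized reranker can be run roughly as follows. This is a minimal sketch: the model file name (`model.onnx`) is an assumption, and the graph's input names are read from the session rather than hard-coded, since they vary between exports.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    # Map raw reranker logits to relevance scores in (0, 1).
    return 1.0 / (1.0 + np.exp(-x))


def rerank(query: str, passages: list[str], model_path: str = "model.onnx"):
    """Score (query, passage) pairs with the INT8 ONNX reranker.

    Heavy dependencies are imported lazily; model_path is an assumption.
    """
    import onnxruntime as ort
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-v2-m3")
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

    # The reranker scores each (query, passage) pair jointly.
    enc = tokenizer([query] * len(passages), passages,
                    padding=True, truncation=True, return_tensors="np")

    # Feed only the inputs this ONNX graph actually declares.
    input_names = {i.name for i in session.get_inputs()}
    feeds = {k: v for k, v in enc.items() if k in input_names}

    logits = session.run(None, feeds)[0]          # shape: (batch, 1)
    return sigmoid(logits.squeeze(-1)).tolist()   # higher = more relevant


# Example (requires the model file locally):
# scores = rerank("what is a panda?",
#                 ["The giant panda is a bear native to China.",
#                  "Paris is the capital of France."])
```

Higher scores indicate greater query-passage relevance; sort passages by score descending to rerank a candidate list.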