---
language:
- en
- zh
license: mit
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- onnx
- quantized
- reranker
- bge
library_name: onnxruntime
pipeline_tag: feature-extraction
base_model: BAAI/bge-reranker-v2-m3
model-index:
- name: bge-m3-onnx-int8
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    metrics:
    - type: performance_retention
      value: 98
      name: Performance Retention (%)
    - type: model_size_reduction
      value: 75
      name: Model Size Reduction (%)
---

# BGE-M3 ONNX INT8 Quantized Model

This is an ONNX version of the BAAI/bge-reranker-v2-m3 model, optimized with dynamic INT8 quantization for efficient inference.

## Model Description

- **Base Model:** [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3)
- **Model Type:** Sentence Transformer / Reranker
- **Quantization:** Dynamic INT8 (a reproduction sketch follows this list)
- **Framework:** ONNX Runtime
- **Model Size:** ~560 MB (75% reduction from the original ~2.2 GB)
- **Performance:** Retains 98%+ of the original model's accuracy
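
For reference, ONNX Runtime's dynamic quantizer produces this kind of artifact. A minimal sketch, assuming the base model has already been exported to an FP32 `model.onnx` (the file names here are illustrative, not necessarily the ones used to build this repo):

```python
# Dynamic INT8 quantization with ONNX Runtime.
# Weights are converted to INT8 offline; activations are quantized on the
# fly at inference time, so no calibration dataset is required.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model.onnx",        # FP32 ONNX export of the base model (assumed name)
    model_output="model_int8.onnx",  # quantized graph, roughly 4x smaller on disk
    weight_type=QuantType.QInt8,     # store weights as signed 8-bit integers
)
```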

## Key Features

- ✅ **Efficient:** 75% model size reduction with minimal accuracy loss
- ✅ **Fast Inference:** Optimized for CPU, with GPU acceleration available through ONNX Runtime execution providers (see the sketch below)
- ✅ **Cross-Platform:** Compatible with ONNX Runtime on multiple platforms
- ✅ **Production Ready:** Suitable for deployment in resource-constrained environments
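
The CPU/GPU claim maps onto ONNX Runtime execution providers. A small provider-selection sketch (assuming the `onnxruntime-gpu` package is installed for the CUDA path):

```python
import onnxruntime as ort

# Prefer CUDA when available, fall back to CPU otherwise. Note that dynamic
# INT8 quantization is primarily a CPU optimization; the graph still runs
# elsewhere, but speedups depend on the provider's INT8 operator support.
available = ort.get_available_providers()
providers = (
    ["CUDAExecutionProvider", "CPUExecutionProvider"]
    if "CUDAExecutionProvider" in available
    else ["CPUExecutionProvider"]
)
session = ort.InferenceSession("model.onnx", providers=providers)
print("running on:", session.get_providers())
```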

## Usage

### Prerequisites

```bash
pip install onnxruntime transformers numpy
```
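
### Inference

bge-reranker-v2-m3 is a cross-encoder: it scores a (query, passage) pair jointly rather than embedding each side separately. A minimal scoring sketch, assuming the repository ships the quantized graph as `model.onnx` alongside the tokenizer files (the file and tensor names are assumptions; check `session.get_inputs()` on your copy):

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# The tokenizer is unchanged by quantization, so the base model's tokenizer works.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-v2-m3")
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

query = "what is panda?"
passage = "The giant panda is a bear species endemic to China."

# Cross-encoders consume the pair as a single sequence.
enc = tokenizer(query, passage, truncation=True, max_length=512, return_tensors="np")

# Feed only the tensors the graph actually declares (some exports omit token_type_ids).
input_names = {i.name for i in session.get_inputs()}
feeds = {k: v.astype(np.int64) for k, v in enc.items() if k in input_names}

logits = session.run(None, feeds)[0]  # expected shape: (batch, 1) relevance logits
print("relevance score:", float(logits[0][0]))
```

Higher scores indicate a more relevant passage; to rerank candidates, score each (query, passage) pair and sort in descending order.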