Inference Optimization
FP8-dynamic, FP8-block, NVFP4, INT4, INT8 versions of Qwen3-Next-80B-A3B-Instruct
Collection on FP8 Quantization of Weights, Activations and KV Cache
- inference-optimization/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Head (8B • Updated • 25)
- inference-optimization/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Tensor (8B • Updated • 9)
- inference-optimization/Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Head (8B • Updated • 10)
- inference-optimization/Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Tensor (8B • Updated • 4)
Collection of Mixed Precision LLaMA and Qwen Models
- inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-out_proj-all (5B • Updated • 15)
- inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-qkv_proj-all (5B • Updated • 14)
- inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-down_proj-all (6B • Updated • 15)
- inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-gate_up_proj-all (7B • Updated • 16)