SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs • Paper • arXiv:2512.04746 • Published 3 days ago
Introducing AutoRound: Intel's Advanced Quantization for LLMs and VLMs • Article • Published Apr 29
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models • Paper • arXiv:2501.11873 • Published Jan 21
A dynamic parallel method for performance optimization on hybrid CPUs • Paper • arXiv:2411.19542 • Published Nov 29, 2024
Building Cost-Efficient Enterprise RAG Applications with Intel Gaudi 2 and Intel Xeon • Article • Published May 9, 2024
Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding • Article • Published Jan 30, 2024
Intel Neural Chat • Collection • Fine-tuned 7B-parameter LLM models, one of which reached the top of the Hugging Face 7B LLM Leaderboard • 15 items • Updated Aug 23, 2024
TEQ: Trainable Equivalent Transformation for Quantization of LLMs • Paper • arXiv:2310.10944 • Published Oct 17, 2023
Efficient Post-training Quantization with FP8 Formats • Paper • arXiv:2309.14592 • Published Sep 26, 2023
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs • Paper • arXiv:2309.05516 • Published Sep 11, 2023
An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs • Paper • arXiv:2306.16601 • Published Jun 28, 2023
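The signed-gradient-descent weight-rounding idea named in the SignRound paper above (arXiv:2309.05516) can be sketched in toy form. This is a minimal illustrative sketch under my own assumptions, not the paper's implementation: it learns a per-weight rounding offset `v` in [-0.5, 0.5] with a straight-through gradient estimate and sign-SGD updates, minimizing the layer's output error on calibration data. All variable names and the toy objective are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))          # toy weight matrix
X = rng.normal(size=(8, 32))         # toy calibration activations
ref = W @ X                          # full-precision layer output

scale = np.abs(W).max() / 7.0        # symmetric 4-bit scale (levels -8..7)

def quantize(v):
    """Fake-quantize W with a learnable rounding offset v in [-0.5, 0.5]."""
    q = np.clip(np.floor(W / scale + 0.5 + v), -8, 7)
    return q * scale

v = np.zeros_like(W)                 # start from plain round-to-nearest
lr = 5e-3
losses = []
for _ in range(300):
    err = quantize(v) @ X - ref      # output error on calibration data
    losses.append(float((err ** 2).mean()))
    # Straight-through estimate of the gradient w.r.t. v (the round is
    # treated as identity), followed by a signed-SGD step: only the sign
    # of the gradient is used, as the paper's title suggests.
    grad = err @ X.T
    v = np.clip(v - lr * np.sign(grad), -0.5, 0.5)
```

The offsets can push individual weights to round up or down, flipping them to the neighboring 4-bit level whenever that reduces the block output error, while round-to-nearest (`v = 0`) remains the starting point.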