SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs • Paper • arXiv:2512.04746 • Published 3 days ago
Introducing AutoRound: Intel's Advanced Quantization for LLMs and VLMs • Article • Published Apr 29
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models • Paper • arXiv:2501.11873 • Published Jan 21
A dynamic parallel method for performance optimization on hybrid CPUs • Paper • arXiv:2411.19542 • Published Nov 29, 2024
Building Cost-Efficient Enterprise RAG Applications with Intel Gaudi 2 and Intel Xeon • Article • Published May 9, 2024
Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding • Article • Published Jan 30, 2024
Intel Neural Chat • Collection • Fine-tuned 7B-parameter LLM models, one of which reached the top of the Hugging Face 7B LLM Leaderboard • 15 items • Updated Aug 23, 2024
TEQ: Trainable Equivalent Transformation for Quantization of LLMs • Paper • arXiv:2310.10944 • Published Oct 17, 2023
Efficient Post-training Quantization with FP8 Formats • Paper • arXiv:2309.14592 • Published Sep 26, 2023
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs • Paper • arXiv:2309.05516 • Published Sep 11, 2023
An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs • Paper • arXiv:2306.16601 • Published Jun 28, 2023
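The signed-gradient-descent weight-rounding idea named in the SignRound paper above (arXiv:2309.05516) can be sketched in toy form. This is a minimal illustrative sketch under my own assumptions, not the paper's implementation: it learns a per-weight rounding offset `v` in [-0.5, 0.5] with a straight-through gradient estimate and sign-SGD updates, minimizing the layer's output error on calibration data. All variable names and the toy objective are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))          # toy weight matrix
X = rng.normal(size=(8, 32))         # toy calibration activations
ref = W @ X                          # full-precision layer output

scale = np.abs(W).max() / 7.0        # symmetric 4-bit scale (levels -8..7)

def quantize(v):
    """Fake-quantize W with a learnable rounding offset v in [-0.5, 0.5]."""
    q = np.clip(np.floor(W / scale + 0.5 + v), -8, 7)
    return q * scale

v = np.zeros_like(W)                 # start from plain round-to-nearest
lr = 5e-3
losses = []
for _ in range(300):
    err = quantize(v) @ X - ref      # output error on calibration data
    losses.append(float((err ** 2).mean()))
    # Straight-through estimate of the gradient w.r.t. v (the round is
    # treated as identity), followed by a signed-SGD step: only the sign
    # of the gradient is used, as the paper's title suggests.
    grad = err @ X.T
    v = np.clip(v - lr * np.sign(grad), -0.5, 0.5)
```

The offsets can push individual weights to round up or down, flipping them to the neighboring 4-bit level whenever that reduces the block output error, while round-to-nearest (`v = 0`) remains the starting point.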