FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration Paper • 2502.01068 • Published Feb 3, 2025
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models Paper • 2406.12311 • Published Jun 18, 2024