Request: GGUF / quantized weights for Intern-S1-Pro
Hi InternLM team & community,
Are there any plans to release GGUF or other quantized weights (INT4/INT8, AWQ, GPTQ, MLX) for Intern-S1-Pro?
Given that Intern-S1-Pro is a MoE model (~1T total params, ~22B active), a GGUF quant that preserves correct expert routing would make it much more accessible for local inference and research, especially on large-memory systems (e.g. a Mac Studio with 512 GB of unified memory).
Even an experimental / research-grade GGUF (or guidance on recommended quantization settings for MoE experts) would be extremely valuable for the community.
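For context, a common heuristic for quantizing MoE models is to quantize the bulk of the expert FFN weights aggressively while keeping the router/gating tensors (and usually embeddings and the output head) at higher precision, since routing errors change which experts fire at all and the router tensors are tiny anyway. Below is a minimal, self-contained PyTorch sketch of how one might measure per-tensor int4 sensitivity to inform such a mixed recipe; the shapes and names are invented for illustration and are not taken from the actual Intern-S1-Pro checkpoint:

```python
# Hedged sketch: estimate the reconstruction error that symmetric int4
# group quantization introduces per tensor, to decide which MoE tensors
# tolerate aggressive bit widths. Shapes below are toy values.
import torch

def quantize_int4_groupwise(w: torch.Tensor, group_size: int = 128) -> torch.Tensor:
    """Symmetric 4-bit group quantization; returns the dequantized tensor."""
    orig_shape = w.shape
    groups = w.reshape(-1, group_size)
    scale = groups.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(groups / scale), -8, 7)  # int4 range [-8, 7]
    return (q * scale).reshape(orig_shape)

def relative_error(w: torch.Tensor, w_hat: torch.Tensor) -> float:
    return ((w - w_hat).norm() / w.norm()).item()

# Toy stand-ins for MoE tensors (hypothetical shapes, not Intern-S1-Pro's).
expert_ffn = torch.randn(4096, 1024)  # one expert's FFN weight: the bulk of the params
router = torch.randn(4096, 64)        # router/gate: tiny, so fp16 here is nearly free

for name, w in [("expert_ffn", expert_ffn), ("router", router)]:
    print(f"{name}: int4 relative error = {relative_error(w, quantize_int4_groupwise(w)):.4f}")
```

A per-tensor report like this is one cheap way to justify a mixed-precision recipe before committing to a full conversion run on a ~1T-param checkpoint.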
Thanks a lot for the great work on InternLM models!
Agree on quants in general, especially at 4-8 bits. Another angle I would love to see explored, though, is using many narrow corpora to REAP (Router-weighted Expert Activation Pruning) this model, paring it down to the experts relevant to a given domain.
Mathematics. Physics. Chemistry. Civil engineering. Different medical subspecialties. I hypothesize that a model like this would work amazingly well as a base for an entire family of subject-matter-expert agents on demand.
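To make that concrete, here is a toy PyTorch sketch of domain-driven expert pruning: collect router statistics over a narrow corpus, score each expert by its accumulated gate weight, and keep only the top scorers in each MoE layer. This uses a deliberately simplified saliency criterion for illustration (the published REAP method, as I understand it, also factors in expert output magnitudes), and every shape and count here is invented:

```python
# Hedged sketch of domain-driven expert pruning (not the official REAP code):
# score experts by router-weighted activation frequency on a narrow corpus,
# then keep only the top-scoring experts in each MoE layer.
import torch

def score_experts(router_logits: torch.Tensor, top_k: int = 8) -> torch.Tensor:
    """router_logits: [num_tokens, num_experts] collected from one MoE layer.
    Returns per-expert saliency: the summed gate weight over all tokens
    that actually routed to that expert."""
    gates = torch.softmax(router_logits, dim=-1)
    top_vals, top_idx = gates.topk(top_k, dim=-1)  # experts chosen per token
    scores = torch.zeros(router_logits.shape[-1])
    scores.scatter_add_(0, top_idx.reshape(-1), top_vals.reshape(-1))
    return scores

# Toy example: 10k tokens of a "mathematics corpus" routed over 128 experts.
torch.manual_seed(0)
logits = torch.randn(10_000, 128)
scores = score_experts(logits)

keep = 32  # pare 128 experts down to the 32 most used for this domain
kept_experts = scores.topk(keep).indices.sort().values
print("experts to keep:", kept_experts.tolist())
```

Repeating this per corpus would yield one pruned, much smaller checkpoint per domain, which is essentially the family of subject-matter experts described above.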