Request: GGUF / quantized weights for Intern-S1-Pro
Hi InternLM team & community,
Are there any plans to release GGUF or other quantized weights (INT4/INT8, AWQ, GPTQ, MLX) for Intern-S1-Pro?
Given that Intern-S1-Pro is a MoE model (~1T total params, ~22B active), a GGUF quant that preserves correct expert routing would make it much more accessible for local inference and research, especially on large-memory systems (e.g. a Mac Studio with 512 GB of unified memory).
Even an experimental / research-grade GGUF (or guidance on recommended quantization settings for MoE experts) would be extremely valuable for the community.
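For context, a common heuristic for quantizing MoE models is to quantize the bulk of the expert FFN weights aggressively while keeping the router/gating tensors (and usually embeddings and the output head) at higher precision, since routing errors change which experts fire at all and the router tensors are tiny anyway. Below is a minimal, self-contained PyTorch sketch of how one might measure per-tensor int4 sensitivity to inform such a mixed recipe; the shapes and names are invented for illustration and are not taken from the actual Intern-S1-Pro checkpoint:

```python
# Hedged sketch: estimate the reconstruction error that symmetric int4
# group quantization introduces per tensor, to decide which MoE tensors
# tolerate aggressive bit widths. Shapes below are toy values.
import torch

def quantize_int4_groupwise(w: torch.Tensor, group_size: int = 128) -> torch.Tensor:
    """Symmetric 4-bit group quantization; returns the dequantized tensor."""
    orig_shape = w.shape
    groups = w.reshape(-1, group_size)
    scale = groups.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(groups / scale), -8, 7)  # int4 range [-8, 7]
    return (q * scale).reshape(orig_shape)

def relative_error(w: torch.Tensor, w_hat: torch.Tensor) -> float:
    return ((w - w_hat).norm() / w.norm()).item()

# Toy stand-ins for MoE tensors (hypothetical shapes, not Intern-S1-Pro's).
expert_ffn = torch.randn(4096, 1024)  # one expert's FFN weight: the bulk of the params
router = torch.randn(4096, 64)        # router/gate: tiny, so fp16 here is nearly free

for name, w in [("expert_ffn", expert_ffn), ("router", router)]:
    print(f"{name}: int4 relative error = {relative_error(w, quantize_int4_groupwise(w)):.4f}")
```

A per-tensor report like this is one cheap way to justify a mixed-precision recipe before committing to a full conversion run on a ~1T-param checkpoint.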
Thanks a lot for the great work on InternLM models!
Agree on quants in general, especially at 4-8 bits. Another angle I would love to see explored, though, is using many narrow corpora to REAP (Router-weighted Expert Activation Pruning) this model, paring it down to the experts relevant to a given domain.
Mathematics. Physics. Chemistry. Civil engineering. Different medical subspecialties. I hypothesize that a model like this would work amazingly well as a base for an entire family of subject-matter-expert agents on demand.
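To make that concrete, here is a toy PyTorch sketch of domain-driven expert pruning: collect router statistics over a narrow corpus, score each expert by its accumulated gate weight, and keep only the top scorers in each MoE layer. This uses a deliberately simplified saliency criterion for illustration (the published REAP method, as I understand it, also factors in expert output magnitudes), and every shape and count here is invented:

```python
# Hedged sketch of domain-driven expert pruning (not the official REAP code):
# score experts by router-weighted activation frequency on a narrow corpus,
# then keep only the top-scoring experts in each MoE layer.
import torch

def score_experts(router_logits: torch.Tensor, top_k: int = 8) -> torch.Tensor:
    """router_logits: [num_tokens, num_experts] collected from one MoE layer.
    Returns per-expert saliency: the summed gate weight over all tokens
    that actually routed to that expert."""
    gates = torch.softmax(router_logits, dim=-1)
    top_vals, top_idx = gates.topk(top_k, dim=-1)  # experts chosen per token
    scores = torch.zeros(router_logits.shape[-1])
    scores.scatter_add_(0, top_idx.reshape(-1), top_vals.reshape(-1))
    return scores

# Toy example: 10k tokens of a "mathematics corpus" routed over 128 experts.
torch.manual_seed(0)
logits = torch.randn(10_000, 128)
scores = score_experts(logits)

keep = 32  # pare 128 experts down to the 32 most used for this domain
kept_experts = scores.topk(keep).indices.sort().values
print("experts to keep:", kept_experts.tolist())
```

Repeating this per corpus would yield one pruned, much smaller checkpoint per domain, which is essentially the family of subject-matter experts described above.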