No default RoPE scaling for long context?
#4
by
TomLucidor
- opened
Sorry for asking, but the config seems to only support a 32K context: https://huggingface.co/inclusionAI/Ring-mini-sparse-2.0-exp/blob/main/config.json
When exceeding a sequence length of 32K, the model needs to enable YaRN. You can refer to SGLang's YaRN configuration and add the following to config.json:
"rope_scaling": {
"factor": 4.0,
"original_max_position_embeddings": 32768,
"rope_type": "yarn"
}
Any setup recommendations for vLLM instead of SGLang?
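For vLLM, a sketch of an equivalent setup: instead of editing config.json, vLLM lets you pass the same YaRN parameters at launch via the `--rope-scaling` engine argument (a JSON string), together with `--max-model-len` extended to factor × original length (4.0 × 32768 = 131072). The exact flags below are assumptions based on vLLM's serve CLI; check your vLLM version, and the model may additionally need `--trust-remote-code`:

```shell
# Hypothetical vLLM launch mirroring the SGLang YaRN config above;
# verify flag names against your installed vLLM version.
vllm serve inclusionAI/Ring-mini-sparse-2.0-exp \
  --trust-remote-code \
  --max-model-len 131072 \
  --rope-scaling '{"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 32768}'
```

Alternatively, editing config.json as shown above works for both frameworks, since both read `rope_scaling` from the Hugging Face config.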