SGLang deploy commands

#2 opened by vvekthkr

Could you please share recommended SGLang deploy commands? I'm currently using an RTX 5090 and an RTX Pro 6000. If all goes well, I might jump from a 4B model to an 8B model with data parallelism of 2.

Octen-Team org

I’m not sure whether SGLang supports deployment for this model yet, but we’ve used vLLM and it does work.

You can refer to this example for details: https://huggingface.co/Qwen/Qwen3-Embedding-8B#vllm-usage
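Adapting that example to this model, a minimal serving sketch could look like the following (assuming a recent vLLM build whose OpenAI-compatible server supports the embed task; flag names may differ across versions):

vllm serve Octen/Octen-Embedding-0.6B \
  --task embed \
  --host 0.0.0.0 \
  --port 5000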

Here’s what ran well for me:

python -m sglang.launch_server \
  --model-path Octen/Octen-Embedding-0.6B \
  --host 0.0.0.0 \
  --port 5000 \
  --is-embedding \
  --enable-trace \
  --enable-metrics \
  --otlp-traces-endpoint 0.0.0.0:4317 \
  --mem-fraction-static 0.20 \
  --log-requests \
  --show-time-cost \
  --data-parallel-size 2 \
  --load-balance-method auto \
  --max-running-requests 64
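As a quick sanity check after launch, you can query the OpenAI-compatible embeddings route (a sketch; this assumes SGLang exposes /v1/embeddings when --is-embedding is set and accepts the model path as the model name):

curl http://localhost:5000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "Octen/Octen-Embedding-0.6B", "input": "hello world"}'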

vvekthkr changed discussion status to closed
