SGLang deploy commands
Could you please share recommended SGLang deploy commands? I currently use an RTX 5090 and an RTX PRO 6000. If all goes well, I might jump from the 4B model to the 8B model with a data-parallel size of 2.
I'm not sure whether SGLang supports this deployment yet, but we've used vLLM and it works.
You can refer to this example for details: https://huggingface.co/Qwen/Qwen3-Embedding-8B#vllm-usage
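If you go the vLLM route for the 8B model, something along these lines should work as a starting point. This is only a minimal sketch: the `--task embed` flag and the port choice are assumptions based on recent vLLM versions, so double-check against the linked model card before relying on it.

# hypothetical vLLM serve command for the 8B embedding model (verify flags for your vLLM version)
vllm serve Qwen/Qwen3-Embedding-8B \
  --task embed \
  --host 0.0.0.0 \
  --port 5000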
Here is what ran well for me:
python -m sglang.launch_server \
  --model-path Octen/Octen-Embedding-0.6B \
  --host 0.0.0.0 \
  --port 5000 \
  --is-embedding \
  --enable-trace \
  --enable-metrics \
  --otlp-traces-endpoint 0.0.0.0:4317 \
  --mem-fraction-static 0.20 \
  --log-requests \
  --show-time-cost \
  --data-parallel-size 2 \
  --load-balance-method auto \
  --max-running-requests 64
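Once the server is up, a quick sanity check against the OpenAI-compatible embeddings endpoint looks like the following. This assumes the default /v1/embeddings route on the port above; the input string is just a placeholder.

# hypothetical smoke test against the embeddings endpoint
curl http://localhost:5000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "Octen/Octen-Embedding-0.6B", "input": "What is the capital of France?"}'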