SGLang deploy commands
Could you please share recommended SGLang deploy commands? I currently use an RTX 5090 and an RTX PRO 6000. If all goes well, I might jump from the 4B model to the 8B model with a data-parallel size of 2.
I'm not sure whether SGLang supports this deployment yet, but we've used vLLM and it works.
You can refer to this example for details: https://huggingface.co/Qwen/Qwen3-Embedding-8B#vllm-usage
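If you go the vLLM route for the 8B model, something along these lines should work as a starting point. This is only a minimal sketch: the `--task embed` flag and the port choice are assumptions based on recent vLLM versions, so double-check against the linked model card before relying on it.

# hypothetical vLLM serve command for the 8B embedding model (verify flags for your vLLM version)
vllm serve Qwen/Qwen3-Embedding-8B \
  --task embed \
  --host 0.0.0.0 \
  --port 5000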
Here is what ran well for me:
python -m sglang.launch_server \
  --model-path Octen/Octen-Embedding-0.6B \
  --host 0.0.0.0 \
  --port 5000 \
  --is-embedding \
  --enable-trace \
  --enable-metrics \
  --otlp-traces-endpoint 0.0.0.0:4317 \
  --mem-fraction-static 0.20 \
  --log-requests \
  --show-time-cost \
  --data-parallel-size 2 \
  --load-balance-method auto \
  --max-running-requests 64
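Once the server is up, a quick sanity check against the OpenAI-compatible embeddings endpoint looks like the following. This assumes the default /v1/embeddings route on the port above; the input string is just a placeholder.

# hypothetical smoke test against the embeddings endpoint
curl http://localhost:5000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "Octen/Octen-Embedding-0.6B", "input": "What is the capital of France?"}'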