8 tps on NVIDIA H200

#17

by svilen333 - opened 2 days ago

2 days ago

Hi, I am testing the model on 1 x NVIDIA H200 with the latest vLLM. Is it normal to get 8 tps with a 128K context, or am I doing something wrong?

malithh

2 days ago

Hi
That is definitely not normal. How many concurrent requests are you sending?

svilen333

2 days ago

Only one request. Using the BF16 version.

malithh

2 days ago

edited 2 days ago

Yeah, then something is wrong. The auto calibrator might not have picked up the top_k and top_p parameters. What are the input and output lengths in your test?

svilen333

2 days ago

Input length is 15 tokens; output is over 1000 tokens. I just gave it a simple task: code something in HTML + JS.
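
For anyone comparing numbers, here is a minimal sketch of how a single-request tps figure like the one above can be computed from the output length and wall-clock time. The function name `decode_tps` and the timings are hypothetical and not taken from this thread; only the 1000-token output and the 8 tps figure come from the posts above.

```python
def decode_tps(output_tokens: int, total_seconds: float,
               first_token_seconds: float = 0.0) -> float:
    """Tokens per second for the generation phase of one request.

    first_token_seconds is the time spent before the first output token
    (prompt processing); it is subtracted so that a long prefill does not
    drag down the reported decode speed.
    """
    decode_seconds = total_seconds - first_token_seconds
    if output_tokens <= 0 or decode_seconds <= 0:
        raise ValueError("need a positive token count and decode time")
    return output_tokens / decode_seconds


# Hypothetical timing: 1000 output tokens in 125 s of wall-clock time.
print(decode_tps(1000, 125.0))  # 8.0
```

With a 15-token prompt the time to first token should be small, so the total time and the decode time will be nearly the same here.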
