Model "thinks" for too long

#12
by Moisha1985 - opened

Serving the model via vllm. Gen speed is at around 70 tps
Even simple "Hi" takes around 10-20 seconds to for thinking.
Am I doing something wrong?

image

they acknowledged its overthinking issue and they said they are working to fix it without the loss of performance . i hope its not going to take long cause it seems like the overthinking issue makes it sometimes unusable , specially in some riddles or problems it took over 25 minutes thinking which is overkill .

Hope they fix it soon, the performance is actually good for 3b model and I really like to use it but this issue needs to be resolved!

Let’s join first; winning is just a matter of time.

Did you try these parameters --temp 0.6 --top-p 0.95 --top-k 40 --min-p 0.01
From my observations, these might be best, and may help with thinking time

Did you try these parameters --temp 0.6 --top-p 0.95 --top-k 40 --min-p 0.01
From my observations, these might be best, and may help with thinking time

I’ve tried first two but not the last two. I’ll give it a try. Thanks

Sign up or log in to comment