Disabling/Reducing model reasoning
I have important CoT prompts that guide the LLM on how to think. Using them leads to high latency and large token output, so I'd like to reduce the model's internal reasoning for those reasons.
We hear the ask, and you are not alone. We will add it in the next version.
Ideally there would also be a non-thinking version, or a non-thinking switch, to keep the model responsive for local usage on consumer hardware or when latency is key to the application (such as using text-to-speech to hold a conversation, etc.).
I have a non-thinking version of it here:
https://www.neuroengine.ai/Neuroengine-Large
Just add </think> at the beginning of the assistant section, like this:
<|im_start|>assistant\n</think>
And it will stop reasoning pretty much every time.
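If you are calling the model over an API rather than editing a template, here is a minimal sketch of the same prefill trick using a raw text-completion request. The base URL, API key, model id, and use of the OpenAI Python client are my assumptions, not confirmed by this thread; the only part taken from the post above is the <|im_start|>assistant\n</think> prefix at the end of the prompt.

# Sketch only: base URL, api_key, and model id below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-none")

prompt = (
    "<|im_start|>user\nSummarize llamas in one sentence.<|im_end|>\n"
    # Prefill the assistant turn so the think block is treated as already closed.
    "<|im_start|>assistant\n</think>"
)

resp = client.completions.create(
    model="step-3.5-flash",   # placeholder model id
    prompt=prompt,
    max_tokens=128,
    stop=["<|im_end|>"],      # stop at the end of the assistant turn
)
print(resp.choices[0].text)

This only works against an endpoint that accepts raw completions (so you control the chat markup yourself); a chat-completions endpoint will apply its own template and may ignore the prefill.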
- Make a custom Jinja template, starting from:
https://huggingface.co/stepfun-ai/Step-3.5-Flash/blob/main/chat_template.jinja
- Replace the last lines with:
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n<think>\n</think>\n' }}
{%- endif %}
- Load it in llama.cpp with --chat-template-file jinja.tmpl (a quick sanity check is sketched below)
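As a rough sanity check that the edited template is actually in use, you can hit the llama.cpp server's OpenAI-compatible chat endpoint and look at the start of the reply. The port, prompt, and file names here are assumptions on my part, not something confirmed in this thread.

# Assumes llama-server was started with --chat-template-file jinja.tmpl
# and is listening on port 8080 (both placeholders).
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Say hi in one sentence."}],
        "max_tokens": 64,
    },
    timeout=60,
)
reply = resp.json()["choices"][0]["message"]["content"]
# With the edited template, the reply should start with the answer itself
# rather than a <think> block.
print(reply)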
@ortegaalfredo
I have not been able to reproduce this using https://api.stepfun.ai/v1
I tried what you said, but it doesn't seem to work for me.