thinking tag is not working in llama.cpp
#3 · opened by gopi87
Hi guys, does anyone know why the thinking tag isn't working anymore?
For the record, I'm running it like this:
```sh
CUDA_VISIBLE_DEVICES="0" ./bin/llama-server \
  --model "/home/gopi/deepresearch-ui/model/PrimeIntellect_INTELLECT-3-IQ1_M.gguf" \
  --override-tensor ".*_exps.*=CPU" \
  --override-tensor ".*ffn_.*_exps.*=CPU" \
  --ctx-size 10000 \
  -ngl 99 \
  --host 127.0.0.1 \
  --jinja \
  --temp 0.8 \
  --top-p 1.0 \
  --port 8080
```
My model's response is also missing the `<think>` tag.
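One way to narrow this down: with `--jinja`, recent llama.cpp builds can extract the model's thinking block into a separate `reasoning_content` field on the message (controlled by `--reasoning-format`), so a "missing" `<think>` tag may just have been moved out of `content`. A quick check against the server above, assuming it's running and you have `jq` installed:

```sh
# Hit the OpenAI-compatible endpoint and inspect the raw message object.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What is 17*23? Think it through."}]}' \
  | jq '.choices[0].message'

# If reasoning_content is populated, thinking works and the tag was
# parsed out server-side; if it is null and content has no <think>
# block either, the chat template is likely not emitting one at all.
```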
The GLM-4.6 chat template works with llama.cpp's thinking support, while the Qwen-Coder chat template uses the correct tool-calling format.
You can replace the tool-calling section of the GLM-4.6 template with Qwen-Coder's to get a blend of both; see the sketch below.
Edit: This approach isn't quite right, but it may be a start.
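If you try that, here's a minimal sketch of pointing llama-server at a hand-edited template via `--chat-template-file` (the filename `glm-qwen-blend.jinja` is hypothetical; it would hold the GLM-4.6 template with the Qwen-Coder tool-calling section spliced in):

```sh
# Serve with a custom Jinja chat template instead of the one embedded
# in the GGUF. glm-qwen-blend.jinja is a hypothetical hand-edited file.
CUDA_VISIBLE_DEVICES="0" ./bin/llama-server \
  --model "/home/gopi/deepresearch-ui/model/PrimeIntellect_INTELLECT-3-IQ1_M.gguf" \
  --jinja \
  --chat-template-file glm-qwen-blend.jinja \
  --host 127.0.0.1 --port 8080
```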