thinking tag is not working in llama.cpp
#3 · opened by gopi87
Hi guys, does anyone know why the thinking tag isn't working anymore?
For the record, I'm running it like this:
```sh
CUDA_VISIBLE_DEVICES="0" ./bin/llama-server \
  --model "/home/gopi/deepresearch-ui/model/PrimeIntellect_INTELLECT-3-IQ1_M.gguf" \
  --override-tensor ".*_exps.*=CPU" \
  --override-tensor ".*ffn_.*_exps.*=CPU" \
  --ctx-size 10000 \
  -ngl 99 \
  --host 127.0.0.1 \
  --jinja \
  --temp 0.8 \
  --top-p 1.0 \
  --port 8080
```
My model's response is also missing the `<think>` tag.
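One way to narrow this down: with `--jinja`, recent llama.cpp builds can extract the model's thinking block into a separate `reasoning_content` field on the message (controlled by `--reasoning-format`), so a "missing" `<think>` tag may just have been moved out of `content`. A quick check against the server above, assuming it's running and you have `jq` installed:

```sh
# Hit the OpenAI-compatible endpoint and inspect the raw message object.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What is 17*23? Think it through."}]}' \
  | jq '.choices[0].message'

# If reasoning_content is populated, thinking works and the tag was
# parsed out server-side; if it is null and content has no <think>
# block either, the chat template is likely not emitting one at all.
```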
The GLM-4.6 chat template works with llama.cpp's thinking support, while the Qwen-Coder chat template uses the correct tool-calling format.
You can replace the tool-calling section of the GLM-4.6 template with Qwen-Coder's to get a blend of both; see the sketch below.
Edit: This approach isn't quite right, but it may be a start.
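If you try that, here's a minimal sketch of pointing llama-server at a hand-edited template via `--chat-template-file` (the filename `glm-qwen-blend.jinja` is hypothetical; it would hold the GLM-4.6 template with the Qwen-Coder tool-calling section spliced in):

```sh
# Serve with a custom Jinja chat template instead of the one embedded
# in the GGUF. glm-qwen-blend.jinja is a hypothetical hand-edited file.
CUDA_VISIBLE_DEVICES="0" ./bin/llama-server \
  --model "/home/gopi/deepresearch-ui/model/PrimeIntellect_INTELLECT-3-IQ1_M.gguf" \
  --jinja \
  --chat-template-file glm-qwen-blend.jinja \
  --host 127.0.0.1 --port 8080
```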