model with thinking?
Models without thinking can only handle simple tasks.
I have never seen any Qwen Coder Thinking. What you need is an agentic tool like sequential thinking instead of reasoning.
Models without thinking can only handle simple tasks.
Thinking is just a way for a model to explore options via random paths. You already get that in an agentic tool, because it tries things, compiles, and if it fails, it can try something else, AND it gets feedback from the compiler, your tests, etc. More than that: it's been trained on the thinking of real developers, who fixed real bugs. Thinking would still add something, but mostly wastes context and adds latency before the real response/action, if the model is trained well and running in a good framework.
Models without thinking can only handle simple tasks.
Thinking is just a way for a model to explore options via random paths. You already get that in an agentic tool, because it tries things, compiles, and if it fails, it can try something else, AND it gets feedback from the compiler, your tests, etc. More than that: it's been trained on the thinking of real developers, who fixed real bugs. Thinking would still add something, but mostly wastes context and adds latency before the real response/action, if the model is trained well and running in a good framework.
Your point is understandable, suggesting that in well-trained models and efficient frameworks, chain-of-thought (thinking) may increase latency and consume resources. However, I would like to argue against this from the following perspectives:
The essence of thinking is not random exploration
Human developers’ thinking is not a purely random search of paths but a directed convergence process based on logical reasoning, abstract induction, and causal judgment. If properly designed, a model's "chain-of-thought" can simulate this structured reasoning rather than merely trial and error. For example, when solving complex bugs, developers first hypothesize the root cause and then systematically verify it, rather than blindly trying every compilation path.
Chain-of-thought reduces trial-and-error costs
Even though agent tools can iterate through compilation feedback, in real-world scenarios (such as production environment deployments or hardware testing), each trial may come with high costs. An internal thinking process allows the model to conduct virtual reasoning before "taking action," thereby eliminating invalid paths in advance, which can actually lower overall costs.
Training data cannot cover all long-tail problems
While models have indeed learned from developers' historical experiences, novel problems often require combining known knowledge, making cross-domain analogies, or creatively deconstructing issues. Chain-of-thought can explicitly construct reasoning steps, helping models break free from the limitations of data distribution rather than merely repeating historical patterns.
The value of interpretability and controllability
The thinking process provides opportunities for human supervision and intervention. If a model directly outputs actions, errors may be harder to trace. Explicit reasoning steps allow developers to make corrections at critical junctures, which is essential for high-reliability scenarios (such as medical or financial code).
Trade-off between latency and effectiveness
Although chain-of-thought increases single-response latency, it may significantly reduce the number of action iterations in complex tasks. For instance, generating the correct solution through multi-step reasoning in one go can save more time than multiple fast but erroneous attempts.
Yes, I was being sloppy by using the term "random". Obviously it's within constraints, like "what might work here; let's try this patch and recompile, and see what the compiler says." Even non-thinking LLMs basically work by searching the space of human knowledge constrained by human language's grammar rules, I think.
I should clarify: I do want a reasoning version of this too (and I'm sure it's on the way), but I don't think it will be better for agentic development tools that provide an automated chat history of coding attempts, because that amounts to the same thing, only with more targetted compiler/tool interactions.
It looks like the model can coherently reason to some degree if you instruct it to do so in the prompt. Lots of room for exploring here, but at first glance it does seem fully steerable in the sense that you could instruct the model to run tools, and tests inside the <thinking> tags. It's probably different for whichever UI you use. For open-webui I basically told it to use thinking tags.
Joe might be right that it may not be better. Cool, yes, but code quality so far seems to be better without it. Might have more benefit for creative work though. (actually not sure about this, code quality actually isn't bad at all after some more testing and prompt adjustments. I also dropped the temp to 0.6)
EDIT: it kinda seems like a normal reasoning model?
When giving the model tricky riddles, it behaves exactly like a reasoning model would without any further prompting or changes to the template:
Example snippet from reasoning:
The user is asking: "what is the nineteenth letter in this sentence"
The sentence in question is: "what is the nineteenth letter in this sentence"
Let me count the letters (ignoring spaces and punctuation, just counting alphabetic characters):
w-h-a-t-i-s-t-h-e-n-i-n-e-t-e-e-n-t-h-l-e-t-t-e-r-i-n-t-h-i-s-s-e-n-t-e-n-c-e
Let me count carefully:
w
h
a
t
i
s
t
h
e
n
i
n
e
t
e
e
n
t
h
l
e
t
t
e
r
i
n
t
h
i
s
s
e
n
t
e
n
c
e
So the 19th letter is "h".
Wait, I need to verify this more carefully by listing each character position:
"what is the nineteenth letter in this sentence"
Let me write it out and count each character (including spaces? Usually when asking for "letter", we mean alphabetic characters only, but sometimes people count all characters. Let me check both interpretations):
... (a bunch more counting and spelling)
Characters with positions (including spaces and punctuation):
... (a bunch of counting and spelling)
So the 19th letter is the first letter of "letter", which is "l".
Wait, I'm getting confused. Let me just write out the entire sentence letter by letter and number them:
"what is the nineteenth letter in this sentence"
Here's what I added to the system message:
### IMPORTANT: Response Format
Always respond in this specific format. Always wrap your reasoning steps inside <thinking>tags</thinking>
not sure if/how it impacts performance yet; but it for sure is a thing, that seems to be working.
