Very Impressive!

#7 by cob05

I honestly forgot that this was a 3B parameter model. It answers like a much, much larger model. It seems very competent and has a broad range of knowledge. It has no issues with outputting long answers and keeps everything in context. It even has a little bit of a sense of humor, which is fun. The output is formatted in an excellent fashion (e.g., lists when needed, paragraphs with topics broken out, headers, titles, emojis, etc.).

Downsides are minimal so far. It will switch to Chinese for some text (very, very infrequently, though). It also thinks, A LOT!! It seems to keep good track of its thoughts, though, and doesn't devolve into the usual LLM random ravings like some small models do. Some repetition in the thinking, but not bad. It also seems to get confused sometimes with follow-up responses if they are not phrased as questions. I used the recommended settings and ran it in LM Studio. I'm sure I could play around with the sliders or system prompt and make it more concise, but for initial testing I just went with defaults.

I'm keeping this model around as it is small, loads fast, has good response speed (after thinking), and is genuinely fun to talk to. It really lives up to the model page claims. Would love to see more development on this model and maybe some larger versions in the future!

Is it good at tool calling?

I haven't tried any tool calling or agentic tasks myself, just some chats on various topics, but the creator claims it is excellent at tool calling. Might test it out in the coming days.

It doesn't take more than a few minutes if you already have it set up. Please go ahead and try it whenever you like though :)
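For anyone who wants a quick check, a minimal tool-calling smoke test could look something like the sketch below. It assumes LM Studio's local OpenAI-compatible server on its default port; the model id and the `get_weather` tool are made-up placeholders, so adjust them to whatever your own setup reports.

```python
# Minimal sketch of a tool-calling smoke test against a local
# OpenAI-compatible server (LM Studio exposes one, typically at
# http://localhost:1234/v1). Model id, port, and the tool schema
# below are assumptions for illustration only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool used only for this test
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="nanbeige4-3b-thinking",  # placeholder id; check your local model list
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
)

# A model that handles tool calling well should return a structured
# tool_calls entry (function name + JSON arguments) rather than a
# plain-text guess at the weather.
print(response.choices[0].message.tool_calls)
```

If the model is doing tool calling properly, you should see a `tool_calls` entry with `get_weather` and well-formed JSON arguments instead of a free-text answer.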

I've been testing models locally for coding for the past few days with Cline, and this model is beyond excellent at tool calling. I was making a list of the best ones and this model immediately took spot #1, while its competitors were in the 24B-30B parameter range... So I can vouch that it's great at tool calling. Thank you Nanbeige for releasing this model for the VRAM-poor like me.

This model is a 3B-param beast! It consumes a lot of tokens reasoning, but it's worth it!

Yes, this model is really, really good for a 3B model. I absolutely love small but really good LLMs.

a) It is a lobotomised version of Qwen3-4B-Thinking-2507 (they even erased the identity; try asking it what model it is, or "what group/company made you").
b) It is a 4B model using a 3B name (lies even at the name level...).
c) Independent benchmark tests are proving to be VERY far from what they claim (it is just like Qwen3-4B-Thinking-2507).

Marketing: 10
Innovation: 0

Nanbeige LLM Lab org

@Nerdsking

We would like to clarify a few points:

a) Nanbeige4 is trained completely from scratch and has no relation to Qwen. The architecture, number of layers, MLP width, and tokenizer are all different.

b) Our “3B” refers to the non-embedding parameter count (~3.1B). For comparison, Qwen3-4B has ~3.6B non-embedding parameters.
Using non-embedding parameters for naming is not unusual. For example (a quick way to check these counts yourself is sketched after this list):
- Qwen3-30B → ~31B total parameters
- Qwen3-0.6B → ~0.8B total parameters
- Qwen3-32B → ~33B total parameters
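To make the counts above easy to verify, here is a rough sketch (not an official script) of how one might compare total vs. non-embedding parameters for any Hugging Face checkpoint. The repo id is just an example, and matching parameter names on `embed_tokens` / `lm_head` is an assumption about the typical layout of these models.

```python
# Rough sketch: count total vs. non-embedding parameters of a checkpoint.
# The repo id below is only an example; substitute the model you want to inspect.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Thinking-2507",  # example repo; swap in the model under discussion
    torch_dtype=torch.bfloat16,
)

total = sum(p.numel() for p in model.parameters())

# Treat the input embedding and the output (lm_head) projection as
# "embedding" parameters; everything else counts as non-embedding.
# Note: if the embedding and output head are weight-tied, only one copy
# shows up in named_parameters(), so the split reflects that tying.
embedding = sum(
    p.numel()
    for name, p in model.named_parameters()
    if "embed_tokens" in name or "lm_head" in name
)

print(f"total:         {total / 1e9:.2f}B")
print(f"non-embedding: {(total - embedding) / 1e9:.2f}B")
```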

c) Regarding benchmark claims — please share your evaluation datasets, inference hyperparameters, and full results. We are not aware of any publicly reproducible evaluation supporting this claim.

We welcome constructive discussion and transparent comparisons, and are happy to run side-by-side evaluations under shared settings.
