DeepSeek architecture?
I don't think this model uses the DeepSeek architecture; as far as I can tell it's the same as GLM 4.7, which is still based on the architecture used since GLM 4.5:
https://huggingface.co/zai-org/GLM-4.7-Flash?inference_provider=zai-org
The link to the technical document points to GLM 4.5.
Does the model work after quantizing it as DeepSeek? Is it producing coherent output?
I've had to do the same thing before quants came out (workaround sketched below). It produces coherent output, but I had one single output that was incoherent (it kept rambling without stopping).
Someone else posted and is having issues; it might need native llama.cpp support.
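For reference, a minimal sketch of that workaround, assuming the trick is re-tagging the Hugging Face config so llama.cpp's converter treats the checkpoint as DeepSeek-V2. The target field values and paths here are assumptions, not confirmed for GLM-4.7-Flash; check convert_hf_to_gguf.py for the architectures it actually recognizes.

```python
# Hedged sketch: re-tag the HF config so llama.cpp's converter treats the
# checkpoint as DeepSeek-V2. The values below are assumptions; verify them
# against what convert_hf_to_gguf.py actually expects.
import json
from pathlib import Path

cfg_path = Path("GLM-4.7-Flash/config.json")  # local checkout of the model
cfg = json.loads(cfg_path.read_text())

cfg["architectures"] = ["DeepseekV2ForCausalLM"]  # assumed target tag
cfg["model_type"] = "deepseek_v2"                 # assumed target tag

cfg_path.write_text(json.dumps(cfg, indent=2))

# Then convert as usual, e.g.:
#   python convert_hf_to_gguf.py GLM-4.7-Flash --outfile glm-4.7-flash.gguf
```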
Agree this conversion is broken (non-stop rambling).
Interestingly, it notices that something is wrong, producing output like:
Correction: The input is extremely messy.
Not DeepSeek; the tag is used to borrow the MLA (multi-head latent attention) implementation. It needs more support on the llama.cpp side.
Unsloth are also using the deepseek2 architecture...
Not DeepSeek; the tag is used to borrow the MLA (multi-head latent attention) implementation. It needs more support on the llama.cpp side.
In that case, shouldn't we wait for llama.cpp to add support and then make the GGUFs correctly? Using the deepseek architecture might get them working, but will it get them working at full functionality and quality?
From all we know, FA (flash attention) is broken at this time.
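If you're hitting that, one way to sidestep it is to force flash attention off at load time. A minimal sketch with llama-cpp-python, assuming a recent release that exposes the flash_attn keyword; the quant filename is a placeholder:

```python
# Hedged sketch: load the converted GGUF with flash attention explicitly
# disabled, since FA appears broken for this conversion. The model
# filename is a placeholder for whatever quant you produced.
from llama_cpp import Llama

llm = Llama(
    model_path="glm-4.7-flash-Q4_K_M.gguf",  # placeholder quant filename
    n_ctx=8192,
    flash_attn=False,  # avoid the broken FA path
)

out = llm("Hello, world.", max_tokens=32)
print(out["choices"][0]["text"])
```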