Fixed system prompt + updates
Hey guys, we resolved Devstral’s missing system prompt, which Mistral had left out due to their different use cases, and results should be significantly better.
The 24B model is updated; the 123B model is also fixed!
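If you want to sanity-check the updated quant locally, here's a minimal sketch using llama-cpp-python (the GGUF file name is a placeholder; the point is just that the fixed chat template, including the new default system prompt, gets applied automatically on the chat-completion path):

```python
from llama_cpp import Llama

# Placeholder file name -- point this at the updated UD-Q4_K_XL GGUF.
llm = Llama(model_path="Devstral-Small-UD-Q4_K_XL.gguf", n_ctx=8192)

# create_chat_completion applies the chat template embedded in the GGUF,
# so the repaired system prompt is injected without any manual formatting.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write hello world in Python."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```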
Is llama.cpp supposed to crash before finishing prompt processing with this model? :) (I tried a few llama.cpp versions, using your UD-Q4_K_XL quant from ~15 hours before this post.) Just curious whether the previous version of the quant is known to crash, so I know if I should download this version even though I don't use tools or anything. (I tried at least 3 times, and it crashed after processing roughly 6k tokens of my ~10k prompt.)
Edit: Not sure about the crashes, but after I posted this comment, a llama.cpp pull request that improves long-context handling was merged: https://github.com/ggml-org/llama.cpp/pull/17945#pullrequestreview-3571544856
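For anyone trying to reproduce this, here's a rough sketch of the long-prompt case via llama-cpp-python (file name, context size, and prompt file are all illustrative; on a llama.cpp build that includes the PR above, prompt processing should run to completion):

```python
from llama_cpp import Llama

# Placeholder file name and context size -- large enough for a ~10k-token prompt.
llm = Llama(model_path="Devstral-Small-UD-Q4_K_XL.gguf", n_ctx=16384)

# Feed a long prompt (roughly 10k tokens in the case described above) and
# generate a few tokens; the crash reportedly happened during prompt processing.
long_prompt = open("long_prompt.txt").read()
out = llm(long_prompt, max_tokens=64)
print(out["choices"][0]["text"])
```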