
llama.cpp

#8
by PotatoSniffer - opened

Please open a pull request in the official llama.cpp repo.

Note that the current llama.cpp fork doesn't seem to work, at least not on apple silicon: https://huggingface.co/stepfun-ai/Step-3.5-Flash-Int4/discussions/2 (nevermind, my download was corrupted)

Mine's corrupted too.

For anyone who finds this in the future: llama.cpp has been updated to support Step 3.5 Flash, and the implementation works on the latest version of llama.cpp as of this writing. Tested on Apple Silicon at q8 quantization, and it seems to be working well so far!

https://github.com/ggml-org/llama.cpp/pull/19283
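For anyone trying this out, a minimal invocation sketch (the GGUF filename below is hypothetical; substitute your own conversion or download):

```shell
# Build llama.cpp at or after the PR linked above, then run the chat CLI.
# The model path is an assumption -- point it at your own GGUF file.
./llama-cli -m ./step-3.5-flash-q8_0.gguf -p "Hello" -n 64
```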
