MLX optimization
Hey, thanks a lot for this dev effort - awesome to see models with new architectures performing this well!
I saw that you used a Mac in the benchmarks. Are you by chance interested in optimizing the model for MLX?
There is https://huggingface.co/mlx-community, but the Jamba architecture is not yet supported. Would you be interested in contributing to https://github.com/ml-explore/mlx-lm?
I guess there could be significant speed-ups.
Personally, I am currently processing large text dumps for a research project and looking for a model that is:
- small & fast
- with high quality outputs
- supporting long contexts
- high batch throughput
- optimized for MLX
So far your model ticks most of these boxes for me. If it were optimized for MLX, I could easily use batch generation (https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/examples/batch_generate_response.py).
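Just to illustrate what I have in mind, here is a rough sketch using the standard mlx_lm load/generate API. The mlx-community repo name below is only a placeholder, and the simple per-prompt loop would be replaced by the batched call from the linked batch_generate_response.py example for real throughput:

```python
from mlx_lm import load, generate

# Placeholder repo name -- assumes the model eventually gets published under mlx-community
model, tokenizer = load("mlx-community/<jamba-model>-4bit")

# A few documents from the text dump (illustrative only)
raw_prompts = [
    "Summarize the following text: ...",
    "Extract the named entities from: ...",
]

# Simple per-prompt loop; the batch_generate_response.py example in mlx-lm
# shows how to run prompts as a single batched call instead.
for raw_prompt in raw_prompts:
    messages = [{"role": "user", "content": raw_prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
    print(text)
```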
Best
Dominik
Update: I opened an issue in mlx-lm: https://github.com/ml-explore/mlx-lm/issues/551
Awesome, thanks a lot!
The PR has been merged; the next MLX-LM release will support Jamba.