MLX optimization

#8
by do-me - opened

Hey thanks a lot for this dev effort - awesome to see models with new architectures performing this well!
I saw that you used a Mac in the benchmarks. Are you by chance interested in optimizing the model for MLX?
There is https://huggingface.co/mlx-community but jamba architecture is not yet supported. Would you be interested in contributing to https://github.com/ml-explore/mlx-lm?
I guess there could be significant speed-ups.

Personally I am currently processing large text dumps for a research project and looking for a model that is:

  • small & fast
  • high quality in its outputs
  • able to handle long contexts
  • capable of high batch throughput
  • optimized for MLX

So far your model ticks most of those boxes for me. If you optimized it for MLX, I could use batch generation easily (https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/examples/batch_generate_response.py).
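For reference, this is roughly the workflow I have in mind: just a rough sketch using the standard mlx_lm load/generate API with a plain loop over prompts (the linked example shows the proper batched call), and a placeholder model ID that is not a real repo name.

```python
# Rough sketch of the intended workflow: load an MLX checkpoint and run
# several prompts. A plain loop is used here; the linked example shows
# mlx-lm's actual batched generation.
from mlx_lm import load, generate

# Hypothetical MLX-converted checkpoint; placeholder, not a real repo name.
model, tokenizer = load("mlx-community/some-jamba-model")

prompts = [
    "Summarize the following text: ...",
    "Extract the key entities from: ...",
]

for prompt in prompts:
    messages = [{"role": "user", "content": prompt}]
    # Format the prompt with the model's chat template before generating.
    templated = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    text = generate(model, tokenizer, prompt=templated, max_tokens=256)
    print(text)
```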

Best
Dominik

Update: opened this issue in mlx-lm https://github.com/ml-explore/mlx-lm/issues/551

Hey Dominik,

I created a PR last week and just finished it, so it should be merged soon.

Cheers
Gökdeniz

Awesome, thanks a lot!

The PR has been merged; the next MLX-LM release will support Jamba.
