MLX optimization
Hey, thanks a lot for this dev effort - awesome to see models with new architectures performing this well!
I saw that you used a Mac in the benchmarks. Are you by chance interested in optimizing the model for MLX?
There is https://huggingface.co/mlx-community, but the Jamba architecture is not yet supported. Would you be interested in contributing to https://github.com/ml-explore/mlx-lm?
I guess there could be significant speed-ups.
Personally, I am currently processing large text dumps for a research project and looking for a model that is:
- small & fast
- with high quality outputs
- supporting long contexts
- high batch throughput
- optimized for MLX
So far your model ticks most of these boxes for me. If it were optimized for MLX, I could easily use batch generation (https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/examples/batch_generate_response.py).
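Just to illustrate what I have in mind, here is a rough sketch using the standard mlx_lm load/generate API. The mlx-community repo name below is only a placeholder, and the simple per-prompt loop would be replaced by the batched call from the linked batch_generate_response.py example for real throughput:

```python
from mlx_lm import load, generate

# Placeholder repo name -- assumes the model eventually gets published under mlx-community
model, tokenizer = load("mlx-community/<jamba-model>-4bit")

# A few documents from the text dump (illustrative only)
raw_prompts = [
    "Summarize the following text: ...",
    "Extract the named entities from: ...",
]

# Simple per-prompt loop; the batch_generate_response.py example in mlx-lm
# shows how to run prompts as a single batched call instead.
for raw_prompt in raw_prompts:
    messages = [{"role": "user", "content": raw_prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
    print(text)
```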
Best
Dominik
Update: I opened an issue in mlx-lm: https://github.com/ml-explore/mlx-lm/issues/551
Awesome, thanks a lot!
The PR has been merged; the next MLX-LM release will support Jamba.