will there be a smaller version?
A huge, powerful model is great, but what about smaller models for local use?
I'm sure folks will create REAP/REAP versions of this model if they haven't already uploaded them here somewhere. That said, I doubt you'll get REAP + quants that fit in something like 32 GB of VRAM without a terrible loss in competency. TBH, I expect consumer-market GPUs to eventually catch up to the "need" of something like this; high-end workstation cards are already close to running a REAP + quant of this model.
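To put some rough numbers on the "won't fit in 32 GB" claim, here's a back-of-envelope sketch. Every figure in it is an illustrative assumption (the parameter count, the pruning ratio, the ~4.25 bits/weight typical of a Q4_K_M-style quant, the flat runtime overhead), not a measured spec of this model:

```python
# Back-of-envelope VRAM estimate for a pruned + quantized model.
# All numbers below are illustrative assumptions, not measured figures.

def vram_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 2.0) -> float:
    """Weight-only footprint (params * bits / 8) plus a flat allowance
    for KV cache, activations, and runtime buffers."""
    weight_gb = params_b * bits_per_weight / 8  # params in billions -> GB
    return weight_gb + overhead_gb

# Hypothetical example: a 235B-param model pruned to ~60% of its experts,
# then quantized to ~4.25 bits/weight (roughly Q4_K_M territory).
pruned_params_b = 235 * 0.6  # ~141B params left after pruning
print(round(vram_gb(pruned_params_b, 4.25), 1))  # ~76.9 GB, far over 32 GB
```

Even under generous assumptions, the pruned + quantized weights alone land well above 32 GB, which is why a workstation card with 96 GB is a much more realistic target than a consumer 32 GB one.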
I know that doesn't directly answer your question. I also expect Qwen might release smaller 3.5 versions of this architecture, but I doubt they'll have the same competency as this model. As with many previous releases, they put out the large model first, then distill it to create smaller ones. That isn't cheap; I'm just thankful Qwen does it. If they DON'T create smaller versions/distillations of this model, I'm sure someone else will.
@RecViking I just bought an RTX PRO 6000. At least given current memory pricing and the market, I'm not holding my breath that even a new GPU generation from team green will mean more VRAM... one can hope, however.