vincentg64 posted an update Apr 30
Doing Better with Less: LLM 2.0 for Enterprise https://mltblog.com/4jMU9KB

Standard LLMs are trained to predict the next token or missing tokens. Training them requires deep neural networks (DNNs) fed with billions or even trillions of tokens, as highlighted by Jensen Huang, CEO of Nvidia, in his keynote talk at the GTC conference earlier this year. Yet 10 trillion tokens cover essentially all possible string combinations; the vast majority of them are noise. After all, most people have a vocabulary of about 30k words. But this massive training is necessary to prevent DNNs from getting stuck in sub-optimal configurations due to vanishing gradients and other issues.
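To make the contrast concrete, here is a minimal, hypothetical sketch of that standard objective: next-token prediction trained with cross-entropy loss on a toy model. The model, sizes, and data below are made up for illustration; this shows the conventional approach the post argues against, not xLLM:

```python
# Toy illustration of next-token prediction, the standard LLM training
# objective described above. Model and data are invented; this is not xLLM.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32  # tiny toy values

model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),   # token ids -> vectors
    nn.Linear(embed_dim, vocab_size),      # vectors -> logits over the vocab
)

tokens = torch.randint(0, vocab_size, (1, 16))   # one random toy sequence
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from token t

logits = model(inputs)                           # shape (1, 15, vocab_size)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()  # descending this loss is what consumes trillions of tokens
```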

What if you could do the same with a million times less? With mere millions of tokens rather than trillions? After all, predicting the next token is only remotely related to what modern LLMs are used for. Its history is tied to text auto-filling, guessing missing words, autocorrect, and so on, developed initially for tools such as BERT. It is no different from training a plane to operate efficiently on the runway, but never to fly. It also entices LLM vendors to charge clients by token usage, with little regard for ROI.

Our approach is radically different. We use no DNNs and no GPUs. It is as different from standard AI as it is from classical NLP and machine learning. Its origins are similar to other tools we built, including NoGAN, our alternative to GANs for tabular data synthesis. NoGAN, a fast technology with no DNN, runs much faster and delivers much better results, even in real time. Output quality is assessed with our ground-breaking evaluation metric, which captures important defects missed by all other benchmarking tools.

In this article, I highlight the unique components of xLLM, our new architecture for enterprise.

Read the full article at https://mltblog.com/4jMU9KB

Post a working model. Less talk, more facts!


There's plenty of open-source code on GitHub. Can't post a model here as it has a radically different architecture that HF does not support. No deep neural network, no transformers. So, my "model" is essentially this: "".

Here we go. I posted it! There is a reason I call it zero-parameter.