diff --git "a/dist/index.html" "b/dist/index.html" --- "a/dist/index.html" +++ "b/dist/index.html" @@ -48,17 +48,17 @@
The transformers library, built with PyTorch, supports all state-of-the-art LLMs, many VLMs, task-specific vision language models, video models, audio models, table models, and classical encoders, for a global count of almost 400 models.
The name of the library itself is mostly majority-driven, as many models are not even transformer architectures, like Mamba, Zamba, RWKV, and convolution-based models.
Regardless, each of these is wrought by the research and engineering team that created it, then harmonized into a now famous interface, and callable with a simple .from_pretrained command.
Inference works for all models, and training is functional for most. The library is a foundation for many machine learning courses and cookbooks, and several thousand other open-source libraries depend on it. All models are tested as part of a daily CI that ensures their preservation and reproducibility. Most importantly, it is open source, and a large part of it has been written by the community.
This isn't really to brag but to set the stakes: what does it take to keep such a ship afloat, made of so many moving, unrelated parts?
The ML wave has not stopped; more and more models are being added, at a steadily growing rate. Transformers is widely used, and we read the feedback that users post online, whether it is about a function that had 300+ keyword arguments, duplicated code and helpers, mentions of Copied from ... everywhere, or optimisation concerns. Text-only models are relatively tame by now, but multimodal models remain to be harmonized.
Here we will dissect the new design philosophy of transformers, as a continuation of the existing, older philosophy page and the accompanying blog post from 2022.
More recently, a blog post about recent upgrades to transformers was written (I recommend the read if you haven't done it yet), explaining in particular what makes the library faster today.
Some time ago (I dare not say how long), we discussed with transformers maintainers the state of features in transformers. A lot of recent developments were satisfactory, but if we were only talking about these, self-congratulation would be the only goalpost.
Reflecting on this philosophy now, as models pile up, is essential and will drive new developments.
Every reader, whether an OSS maintainer, power user, or casual fine-tuner, will walk away knowing how to reason about the transformers code base, how to use it better, and how to meaningfully contribute to it.
This will also showcase new features you might have missed, so you'll be up to date.
So, what are the principles of transformers? We will try to summarize the foundations on which we’ve built everything, and write the “tenets” of the library. They behave like software interfaces, hence it is crucial that they are explicitly written down. However opinionated they are, they have evolved over time.
You can use a simple regex to look at all methods of a given name across your codebase and compare their differences and similarities; that's what I did (plus a hash to avoid quadratic pairwise comparisons).
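For illustration, here is a minimal sketch of that kind of scan. The approach (Python's ast module plus a hash of each method body) and the example function name are assumptions for the sake of the example, not the exact script that was used.

# Illustrative sketch: collect every function with a given name across a
# codebase and hash its body, so identical copies can be grouped without
# comparing every pair against every other (avoiding the quadratic blow-up).
import ast
import hashlib
from collections import defaultdict
from pathlib import Path

def collect_methods(root: str, name: str) -> dict[str, list[str]]:
    groups = defaultdict(list)  # body hash -> files defining that exact body
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef) and node.name == name:
                body = ast.unparse(node)  # normalized source of the function
                digest = hashlib.sha1(body.encode()).hexdigest()
                groups[digest].append(str(path))
    return groups

# e.g. how many distinct bodies of rotate_half exist across modeling files
variants = collect_methods("src/transformers/models", "rotate_half")
print(len(variants), "distinct implementations")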
So… why keep it in all modeling files? Because if we were to remove it, the model would not work anymore. Think of the modeling files as a car (I know, what a novel metaphor! But, it works out.). All manual transmission cars have a clutch, but we want each view of one of our cars to be able to function. Remove the clutch, you can’t drive. Remove the doors, might be uncomfortable but you’ll get there. So doors can go, but you have to keep the clutch, even though you know perfectly how it works.
As I was looking for things to improve, this is one of the iterations I attempted: a given function is almost everywhere the same, so why not import it from some common file? But no! That goes against One model, One file.
It is opinionated, and it can be frustrating to encounter an opinionated library. Our previous philosophy page and the 2022 blog post were already pointing at some drawbacks, which have been iteratively addressed. Transformers has gone modular, allowing a form of inheritance without breaking One model, One file. If you're familiar with this, you can skip this section and go to the next one.
We amended the principle of DRY* by progressively removing all pieces of code that were "copied from" another file.
It is explained in detail in the documentation above, but overall it works like this: you define a modular_ file that can inherit from any function across all other modeling, configuration, and processor files:
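For instance, here is a minimal sketch of such a file; the model name MyModel is hypothetical, and only the Llama classes it inherits from are real.

# modular_mymodel.py -- a hypothetical modular file, shown as a sketch.
# Only the differences with the parent model need to be written; everything
# else is inherited, and the converter unrolls it into modeling_mymodel.py.
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaAttention, LlamaForCausalLM


class MyModelConfig(LlamaConfig):
    # e.g. change a default or add a field; nothing else has to be repeated
    pass


class MyModelAttention(LlamaAttention):
    # override only what actually differs from the Llama attention
    pass


class MyModelForCausalLM(LlamaForCausalLM):
    pass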
As you can see, we can now define any model as a modular of another. This isn’t strictly groundbreaking if you’ve done any programming, you might even think “well that’s just how inheritance works”. The crucial difference is that we do visibly what is essentially the compiler’s job: by unrolling the inheritances, we make visible all of the modeling code, keeping it all in one piece.
A later iteration on top of modular, and a big improvement in terms of readability, was to remove the various backend-specific attention classes across the repository. Before, we were adding specific torch operations for each backend (SDPA, the flash-attention iterations, flex attention), but it wasn't a minimal user API.
What will forever stay in the modeling code is eager_attention_forward, because it is a core part of the modeling.
We often read that kwargs are criticized, and we understand the criticism. We type them wherever we can, but we cannot enforce them everywhere, because other libraries such as vLLM don't use the same kwargs.
This is a strength of the new attention interface: it can be plugged into various backends precisely because most of the signature is not enforced. We INFORM but do not ENFORCE. That way, the current system stays a minimal user API.
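To make the pattern concrete, here is a simplified, torch-free sketch of the dispatch idea; it paraphrases the shape of the code rather than quoting transformers verbatim. Eager attention is the in-file default, other backends are looked up by name, and extra kwargs flow through untouched so each backend reads only what it understands.

# Simplified sketch of the attention-interface dispatch (not verbatim code).
from typing import Callable

def eager_attention_forward(query, key, value, attention_mask=None, scaling=1.0, **kwargs):
    # the reference implementation always lives in the modeling file itself
    return f"eager(scaling={scaling})"

def sdpa_attention_forward(query, key, value, attention_mask=None, is_causal=True, **kwargs):
    # a backend is free to read kwargs the eager path ignores (here, is_causal)
    return f"sdpa(is_causal={is_causal})"

ALL_ATTENTION_FUNCTIONS: dict[str, Callable] = {"sdpa": sdpa_attention_forward}

def run_attention(attn_implementation: str, query, key, value, **kwargs):
    attention_interface: Callable = eager_attention_forward
    if attn_implementation != "eager":
        attention_interface = ALL_ATTENTION_FUNCTIONS[attn_implementation]
    # kwargs are informed by typing, not enforced: they are simply passed along
    return attention_interface(query, key, value, **kwargs)

print(run_attention("eager", None, None, None, scaling=0.125))
print(run_attention("sdpa", None, None, None, is_causal=False))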
For better information, we plan to use Python features such as Annotated to inform users of what we typically expect in an argument. That way, higher-level information could be included directly in the type annotations, like so (tentative design):
from typing import Annotated

# MyModelOutput stands in for a model's output class; the string carries
# human-readable shape information alongside the type itself.
MyModelOutputAnnotated = Annotated[MyModelOutput, "shape: (B, C, H, W)"]
We want to touch the modeling code minimally, and only modify it when architectural changes are involved. For tensor parallelism, for instance, we instead now specify a simple tp_plan.
It is written once in the config and passed to .from_pretrained().
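For illustration, here is a sketch of what this looks like; the weight patterns and the checkpoint name below are placeholders, and the general shape follows the tensor-parallelism documentation rather than any specific model.

# Sketch of a tp_plan: each weight pattern is mapped once, in the config,
# to a partitioning strategy (the patterns below are illustrative).
base_model_tp_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.k_proj": "colwise",
    "layers.*.self_attn.v_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",
    "layers.*.mlp.up_proj": "colwise",
    "layers.*.mlp.down_proj": "rowwise",
}

# At load time (run under torchrun with several GPUs), the plan declared in
# the config is applied without touching the modeling code:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # example checkpoint
    tp_plan="auto",
)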
Plus, this opened another angle of contribution for the community. People who are GPU whisperers can now contribute optimized kernels. You can check out the kernel community blog post to learn more about it!
Even more resources have been added, like the formidable kernel builder and its connected resources, to help you build kernels with it and with nix.
Now, we have a form of inheritance in our codebase. Some models become standards, and model contributors are given the opportunity to define standards. Pushing the boundaries of scientific knowledge can translate into pushing the boundaries of engineering if this effort is made, and we're striving for it.
It's hard to conceptualize very large libraries and how their components interact with each other, regardless of your cognitive abilities for abstraction.
So I wanted to take a look at the current state of modularity across the repository. How many models are defined using components of others?
To get this graph, I used modular inheritance as the heuristic: a model depends on another if its modular file imports from that other model.
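A minimal sketch of that heuristic, assuming a local checkout of the repository; the path and the parsing details are illustrative, not the exact script used.

# Sketch of the modular-inheritance heuristic: a model depends on another
# if its modular_*.py imports from that other model's package.
import ast
from pathlib import Path

MODELS_DIR = Path("src/transformers/models")  # assumes a local checkout
model_names = {p.name for p in MODELS_DIR.iterdir() if p.is_dir()}

edges = set()  # (child_model, parent_model) pairs
for modular in MODELS_DIR.glob("*/modular_*.py"):
    child = modular.parent.name
    tree = ast.parse(modular.read_text())
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module:
            # works for absolute and relative imports alike, e.g.
            # "transformers.models.llama.modeling_llama" or "..llama.modeling_llama"
            for part in node.module.split("."):
                if part in model_names and part != child:
                    edges.add((child, part))

print(f"{len(edges)} modular inheritance edges")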
So what do we see? Llama is a basis for many models, and it shows. Radically different architectures such as Mamba have spawned their own dependency subgraphs.
But there is no similar miracle for VLMs across the board. As you can see, there is a small DETR island, a little llava pocket, and so on, but it's not comparable to the centrality observed around Llama.
One problem is that this only covers modular models. Several models do NOT have a modular file. In other words, we have a big "hidden space" here.
So I looked into Jaccard similarity, which we use to measure set differences. I know that code is more than a set of characters strung together, but it is a decent proxy for now. You can check out [[find_dependencies.py]].
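As a reminder of the metric, here is a sketch that compares two modeling files as sets of identifiers; the paths assume a local checkout, and this is a simplification of the actual measurement rather than a reproduction of the script.

# Sketch: Jaccard similarity between two modeling files, each reduced to a
# set of identifiers. A crude but useful proxy for shared code.
import re
from pathlib import Path

def identifier_set(path: str) -> set[str]:
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", Path(path).read_text()))

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

llama = identifier_set("src/transformers/models/llama/modeling_llama.py")
mistral = identifier_set("src/transformers/models/mistral/modeling_mistral.py")
print(f"Jaccard(llama, mistral) = {jaccard(llama, mistral):.2f}")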