We can run the quantized model on a mobile phone using [ExecuTorch](https://github.com/pytorch/executorch).
Once ExecuTorch is [set up](https://pytorch.org/executorch/main/getting-started.html), exporting and running the model on device is a breeze.

ExecuTorch's LLM export scripts require the checkpoint keys and parameters to have certain names, which differ from those used in Hugging Face.
So we first use a script that converts the Hugging Face checkpoint key names to the ones that ExecuTorch expects:
```Shell
python -m executorch.examples.models.qwen3.convert_weights $(hf download pytorch/Qwen3-4B-INT8-INT4) pytorch_model_converted.bin
```
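To give a sense of what the conversion step does, here is a minimal sketch of Llama-style key renaming in Python. The key names and mapping below are illustrative assumptions, not the actual table used by `convert_weights`; the authoritative mapping lives in the ExecuTorch script itself.

```python
import re

# Hypothetical mapping from Hugging Face-style checkpoint keys to
# Llama-style keys of the kind ExecuTorch's export script expects.
# These patterns are illustrative; convert_weights holds the real table.
_RULES = [
    (r"^model\.embed_tokens\.weight$", "tok_embeddings.weight"),
    (r"^model\.layers\.(\d+)\.self_attn\.q_proj\.weight$", r"layers.\1.attention.wq.weight"),
    (r"^model\.layers\.(\d+)\.self_attn\.k_proj\.weight$", r"layers.\1.attention.wk.weight"),
    (r"^model\.norm\.weight$", "norm.weight"),
]

def convert_key(hf_key: str) -> str:
    """Rename one checkpoint key; return it unchanged if no rule matches."""
    for pattern, repl in _RULES:
        new_key, n_subs = re.subn(pattern, repl, hf_key)
        if n_subs:
            return new_key
    return hf_key
```

Applied over every entry in a `state_dict`, this kind of renaming produces a checkpoint whose keys the export script can consume.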

Once we have the checkpoint, we export it to ExecuTorch with a max_seq_length/max_context_length of 1024 to the XNNPACK backend as follows.

(Note: the ExecuTorch LLM export script requires config.json to have certain key names. The correct config to use for the LLM export script is located at examples/models/qwen3/config/4b_config.json within the ExecuTorch repo.)

```Shell
python -m executorch.examples.models.llama.export_llama \
    ...
```
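For context on the note above: the export script reads a Llama-style parameter file rather than a Hugging Face `config.json`. The fragment below is a hypothetical illustration of that shape only; all field names and values here are assumptions, so use the `4b_config.json` shipped in the ExecuTorch repo rather than this sketch.

```json
{
  "dim": 2560,
  "n_layers": 36,
  "n_heads": 32,
  "n_kv_heads": 8,
  "vocab_size": 151936,
  "norm_eps": 1e-06,
  "rope_theta": 1000000.0
}
```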

After that you can run the model in a mobile app (see [Running in a mobile app](#running-in-a-mobile-app)).

(We try to keep these instructions up to date, but if you find they do not work, check out our [CI test in ExecuTorch](https://github.com/pytorch/executorch/blob/main/.ci/scripts/test_torchao_huggingface_checkpoints.sh) for the latest source of truth, and let us know we need to update our model card.)

# Paper: TorchAO: PyTorch-Native Training-to-Serving Model Optimization

The model's quantization is powered by **TorchAO**, a framework presented in the paper [TorchAO: PyTorch-Native Training-to-Serving Model Optimization](https://huggingface.co/papers/2507.16099).