---
base_model: skt/A.X-4.0
quantized_by: tmfi-us
license: other
license_name: qwen
license_link: https://huggingface.co/skt/A.X-4.0/blob/main/LICENSE
language:
- en
- ko
pipeline_tag: text-generation
---

## About

FP8 w8a8 dynamic quants of [skt/A.X-4.0](https://huggingface.co/skt/A.X-4.0): weights are quantized to FP8 ahead of time, while activations are quantized to FP8 per token dynamically at inference.

Used the following Python script with [llmcompressor](https://github.com/vllm-project/llm-compressor) to generate the quants:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "skt/A.X-4.0"

# Load the base model and tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Configure the simple PTQ recipe: FP8 dynamic quantization of all
# Linear layers, keeping lm_head in the original precision.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

# Apply the quantization algorithm.
oneshot(model=model, recipe=recipe)

# Save the quantized model and tokenizer.
SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```

The quantization recipe can be found in [`recipe.yaml`](recipe.yaml).
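
A minimal sketch of serving the result with [vLLM](https://github.com/vllm-project/vllm), assuming a vLLM build with FP8 support on your hardware; `A.X-4.0-FP8-Dynamic` here is the local `SAVE_DIR` produced by the script above:

```python
from vllm import LLM, SamplingParams

# Point vLLM at the locally saved quantized checkpoint. vLLM reads the
# compressed-tensors quantization config and applies per-token dynamic
# activation quantization at runtime.
llm = LLM(model="A.X-4.0-FP8-Dynamic")

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["What is FP8 quantization?"], params)
print(outputs[0].outputs[0].text)
```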