qwen1.5-0.5b-chat

Model Description

This is a quantized version of Alibaba Cloud's Qwen 1.5 0.5B chat model, exported to ONNX for efficient inference on devices with limited memory. Quantization shrinks the model on disk and speeds up inference by storing weights as 8-bit integers instead of higher-precision floating-point values.

Files

  • config.json
  • tokenizer.json
  • tokenizer_config.json
  • onnx/decoder_model_merged_quantized.onnx

Usage in Transformers.js

import { pipeline } from '@xenova/transformers';

async function runTextGeneration() {
    // Load the quantized ONNX model from the Hugging Face Hub.
    const generator = await pipeline(
        'text-generation',
        'jestevesv/qwen1.5-0.5b-chat',
        { quantized: true }
    );

    const prompt = 'Hello, how are you today?';

    const output = await generator(prompt, {
        max_new_tokens: 100,  // cap on newly generated tokens
        do_sample: true,      // sample instead of greedy decoding
        temperature: 0.7,     // lower values give more deterministic output
    });

    console.log(output);
}

runTextGeneration().catch(err => {
    console.error('Error:', err);
});