# qwen1.5-0.5b-chat

## Model Description
This is a quantized version of Alibaba Cloud's Qwen 1.5 0.5B chat model, optimized for efficient inference on devices with limited memory. Quantization shrinks the model and speeds up inference by storing weights as 8-bit integers instead of 32-bit floating-point numbers.
## Files

- config.json
- tokenizer.json
- tokenizer_config.json
- onnx/decoder_model_merged_quantized.onnx
## Usage in Transformers.js
```js
import { pipeline } from '@xenova/transformers';

async function runTextGeneration() {
  // Load the text-generation pipeline with the quantized ONNX weights
  const generator = await pipeline(
    'text-generation',
    'jestevesv/qwen1.5-0.5b-chat',
    { quantized: true }
  );

  const prompt = 'Hello, how are you today?';
  const output = await generator(prompt, {
    max_length: 100,
    do_sample: true,
    temperature: 0.7,
  });

  console.log(output);
}

runTextGeneration().catch(err => {
  console.error('Error:', err);
});
```