---
license: mit
pipeline_tag: text-generation
library_name: transformers.js
tags:
- ONNX
- DML
- ONNXRuntime
- nlp
- conversational
---

# Phi-3 Mini-4K-Instruct ONNX model for onnxruntime-web
This is the same model as the [official Phi-3 ONNX model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx), with a few changes to make it work with onnxruntime-web (see the loading sketch after the list):

1. The model is fp16 with int4 block quantization for the weights.
2. The `logits` output is fp32.
3. The model uses MHA (multi-head attention) instead of GQA (grouped-query attention).
4. The ONNX file and the external data file each need to stay below 2GB to be cacheable in Chromium.
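For context, the sketch below shows what loading this model directly with [onnxruntime-web](https://www.npmjs.com/package/onnxruntime-web) might look like. The URL and file names are assumptions for illustration (check the repository's file listing), and the exact shape of the `externalData` session option may vary between onnxruntime-web versions:

```js
import * as ort from "onnxruntime-web";

// Hypothetical file layout; check the repository's file listing for the
// actual names and paths before using this.
const base =
  "https://huggingface.co/Xenova/Phi-3-mini-4k-instruct/resolve/main/onnx/";

const session = await ort.InferenceSession.create(`${base}model.onnx`, {
  // Point the runtime at the external weight file. Keeping each file
  // below 2GB (point 4 above) is what makes them cacheable.
  externalData: [{ path: "model.onnx.data", data: `${base}model.onnx.data` }],
  // Try WebGPU first, fall back to WASM.
  executionProviders: ["webgpu", "wasm"],
});
```

Running generation at this level means building the decode loop yourself (feeding `input_ids`, `attention_mask`, and past key/value tensors on every step), so the Transformers.js pipeline shown below is the more practical route.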
## Usage (Transformers.js)
If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
```bash
npm i @huggingface/transformers
```
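Alternatively, if you are not using a bundler, the library can be loaded straight from a CDN as an ES module. This follows the jsDelivr pattern from the Transformers.js docs; pinning a specific version is recommended in production:

```js
// Load Transformers.js directly in the browser via a CDN (ES module).
// Consider pinning a version, e.g. .../npm/@huggingface/transformers@<version>.
import { pipeline } from "https://cdn.jsdelivr.net/npm/@huggingface/transformers";
```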
You can then use the model to generate text like this:
```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "Xenova/Phi-3-mini-4k-instruct",
);

// Define the list of messages
const messages = [
  { role: "user", content: "Solve the equation: x^2 - 3x + 2 = 0" },
];

// Create a text streamer so tokens are printed as they are generated
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,
  // callback_function: (text) => { }, // Optional callback function
});

// Generate a response
const output = await generator(messages, { max_new_tokens: 512, do_sample: false, streamer });
console.log(output[0].generated_text.at(-1).content);
```
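Since this build targets onnxruntime-web, you may want to run it on WebGPU. A minimal sketch, assuming a Transformers.js version that supports the `device` and `dtype` pipeline options (check the library docs for the exact values available):

```js
import { pipeline } from "@huggingface/transformers";

// Request the WebGPU execution provider and 4-bit/fp16 weights.
// Both option values are assumptions; consult the Transformers.js docs.
const generator = await pipeline(
  "text-generation",
  "Xenova/Phi-3-mini-4k-instruct",
  { device: "webgpu", dtype: "q4f16" },
);
```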