# Qwen3-0.6B for Apple NPU
## Quickstart

1. Install NexaSDK.
2. Run the model with one line of code:

   `nexa infer NexaAI/qwen3-0.6b-ane`
## Model Description

Qwen3-0.6B is a compact 600-million-parameter language model from the Qwen team at Alibaba Cloud. Designed for ultra-efficient inference, it delivers strong multilingual understanding and basic reasoning in a tiny footprint. With low memory requirements and low latency, Qwen3-0.6B is well suited to mobile, embedded, and resource-constrained environments.
## Features
- Ultra-lightweight: Runs well on CPUs, mobile devices, and edge hardware.
- Multilingual: Supports a broad set of languages.
- Fast inference: Low-latency generation suited for real-time applications.
- Efficient reasoning: Performs core reasoning and analysis tasks at small scale.
- Fine-tunable: Adaptable for domain-specific use cases.
## Use Cases
- Mobile and embedded assistants
- Lightweight chat and document apps
- On-device summarization and Q&A
- IoT and robotics agents
- CPU-only or small-GPU deployments
## Inputs and Outputs

### Input
- Text prompts or conversation history (tokenized sequences for API or SDK workflows)
### Output
- Generated text (answers, summaries, short reasoning)
- Optional logits/probabilities
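Since the model can optionally expose logits alongside generated text, a common post-processing step is converting those logits into a probability distribution over next tokens with a softmax. A minimal, framework-free sketch (the logit values below are illustrative, not actual model output):

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities via a numerically stable softmax."""
    m = max(logits)                          # subtract the max to avoid overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for three candidate tokens
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print([round(p, 3) for p in probs])  # values sum to 1; higher logit -> higher probability
```

The same transformation applies regardless of vocabulary size; sampling strategies such as top-k or temperature scaling operate on these probabilities.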
## License

This repository is licensed under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license, which permits use, sharing, and modification for non-commercial purposes only, with proper attribution. All NPU-related models, runtimes, and code in this project are covered by this non-commercial license and may not be used in commercial or revenue-generating applications. Commercial licensing or enterprise usage requires a separate agreement; for inquiries, contact dev@nexa.ai.