Nanbeige4.1-3B for Apple Neural Engine (ctx4096)

CoreML conversion of Nanbeige/Nanbeige4.1-3B for on-device inference on Apple Neural Engine using ANEMLL.

Available Context Lengths

Variant	Context	Use Case
ctx512	512	Quick Q&A, simple tasks
ctx1024	1024	Tool calls, short conversations
ctx2048	2048	Multi-turn chat
ctx4096	4096	Long conversations, complex tool workflows
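
As a rough guide for choosing between the variants above, the sketch below picks the smallest context that fits a prompt plus a generation budget. The helper name `pick_variant` and the idea of budgeting by token counts are illustrative assumptions, not part of ANEMLL.

```python
# Context sizes of the published variants, taken from the table above.
VARIANTS = {"ctx512": 512, "ctx1024": 1024, "ctx2048": 2048, "ctx4096": 4096}


def pick_variant(prompt_tokens: int, max_new_tokens: int) -> str:
    """Return the smallest variant whose context fits prompt plus generation.

    Hypothetical helper for illustration; token counts come from the caller.
    """
    needed = prompt_tokens + max_new_tokens
    # Walk the variants from smallest to largest context.
    for name, ctx in sorted(VARIANTS.items(), key=lambda kv: kv[1]):
        if needed <= ctx:
            return name
    raise ValueError(f"{needed} tokens exceed the largest context (4096)")


print(pick_variant(300, 150))    # ctx512
print(pick_variant(1500, 1000))  # ctx4096
```

Tool schemas and `<think>` reasoning both count against the budget, so tool-calling prompts tend to need a larger variant than their visible text suggests.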

Model Details

Parameter	Value
Source Model	Nanbeige/Nanbeige4.1-3B
Architecture	LLaMA
Context Length	4096
Batch Size	16
Chunks	3
LUT (FFN)	6-bit, per_channel=4
LUT (LM Head)	6-bit, per_channel=4

Recommended Sampling

Parameter	Value
Temperature	0.6
Top-p	0.95
Repeat penalty	1.0
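
To make these settings concrete, here is a minimal pure-Python sketch of temperature scaling followed by top-p (nucleus) filtering. It is not the sampler used by chat.py; the function name `sample_top_p` is an assumption for illustration. Under the common convention a repeat penalty of 1.0 leaves logits unchanged, so the sketch omits it.

```python
import math
import random


def sample_top_p(logits, temperature=0.6, top_p=0.95, rng=random):
    """Sample a token index with temperature scaling and top-p filtering."""
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most probable tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw from the kept tokens; random.choices renormalizes the weights.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


# Toy logits over a 4-token vocabulary.
token_id = sample_top_p([2.0, 1.0, 0.5, -1.0])
print(token_id)
```

A temperature below 1.0 sharpens the distribution before the top-p cut, so fewer low-probability tokens survive the filter.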

Requirements

macOS 15 (Sequoia) or later with Apple Neural Engine
8GB RAM or more
Python 3.9+ with coremltools>=9.0 and transformers

Quick Start

# Install dependencies
pip install "coremltools>=9.0" transformers huggingface_hub

# Clone the model
git lfs install
git clone https://huggingface.co/anemll/anemll-Nanbeige-Nanbeige4.1-3B-ctx4096_0.3.5
cd anemll-Nanbeige-Nanbeige4.1-3B-ctx4096_0.3.5

# Extract CoreML model files
find . -type f -name "*.zip" -exec unzip {} \;

# Run chat
python chat.py --meta ./meta.yaml

For conversation mode with history:

python chat_full.py --meta ./meta.yaml

Note: The first load takes a while because macOS has to place the model on the ANE. Subsequent loads are near-instant. Press Ctrl-D to exit or Ctrl-C to interrupt.
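
If git lfs is not available, the download and extraction steps can also be done from Python. This is a sketch, assuming `huggingface_hub` is installed as in the pip command above; `download_model` and `extract_archives` are hypothetical helper names. Each archive is unpacked into the folder that contains it, which matches the `find ... unzip` command when the archives sit in the repository root.

```python
import zipfile
from pathlib import Path

REPO_ID = "anemll/anemll-Nanbeige-Nanbeige4.1-3B-ctx4096_0.3.5"


def download_model(local_dir: str) -> Path:
    """Download the repository snapshot from the Hugging Face Hub."""
    # Imported lazily so extract_archives works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))


def extract_archives(model_dir: Path) -> list:
    """Unzip every *.zip under model_dir into the archive's own folder."""
    extracted = []
    for archive in sorted(model_dir.rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(archive.name)
    return extracted


# Usage (downloads several GB, so it is left commented out):
# model_dir = download_model("./anemll-Nanbeige-Nanbeige4.1-3B-ctx4096_0.3.5")
# print(extract_archives(model_dir))
```

After extraction, run `python chat.py --meta ./meta.yaml` from the model directory as shown above.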

Tool Calling

Nanbeige4.1-3B natively supports tool/function calling. The model uses <tool_call> tags to invoke functions.

ctx4096 provides ample context for complex tool calling with multiple tools and long <think> reasoning.

Live Demo

The included test_tool_call.py runs a full tool-calling loop on ANE with live HTTP calls:

Model decides which tool to call
Script executes the tool (real API request)
Model summarizes the result

# Weather in San Francisco (default)
python test_tool_call.py --meta ./meta.yaml

# Custom city
python test_tool_call.py --meta ./meta.yaml --question "Weather in Paris?"
python test_tool_call.py --meta ./meta.yaml --question "Weather in Tokyo?"

The get_weather tool fetches live data from open-meteo.com (free, no API key).
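
For orientation, a tool like this could look roughly as follows. This is a sketch, not the code in test_tool_call.py: the endpoint parameters follow the public Open-Meteo forecast and geocoding APIs as I understand them, and the helper names and the abbreviated weather-code table are assumptions.

```python
import json
import urllib.parse
import urllib.request

# Subset of the WMO weather codes reported by Open-Meteo (assumed mapping).
WMO_CODES = {0: "Clear sky", 1: "Mainly clear", 2: "Partly cloudy",
             3: "Overcast", 45: "Fog", 61: "Slight rain",
             63: "Moderate rain", 65: "Heavy rain"}


def forecast_url(lat: float, lon: float) -> str:
    """Build a current-conditions request in Fahrenheit."""
    query = urllib.parse.urlencode({
        "latitude": lat,
        "longitude": lon,
        "current": "temperature_2m,relative_humidity_2m,weather_code",
        "temperature_unit": "fahrenheit",
    })
    return f"https://api.open-meteo.com/v1/forecast?{query}"


def to_tool_result(location: str, payload: dict) -> dict:
    """Reduce the API payload to the fields shown in the example output."""
    current = payload["current"]
    return {
        "location": location,
        "temp_f": current["temperature_2m"],
        "humidity": current["relative_humidity_2m"],
        "condition": WMO_CODES.get(current["weather_code"], "Unknown"),
    }


def get_weather(location: str) -> dict:
    """Geocode the city name, then fetch its current weather (live HTTP)."""
    geo_query = urllib.parse.urlencode({"name": location, "count": 1})
    geo_url = f"https://geocoding-api.open-meteo.com/v1/search?{geo_query}"
    with urllib.request.urlopen(geo_url, timeout=10) as resp:
        place = json.load(resp)["results"][0]
    url = forecast_url(place["latitude"], place["longitude"])
    with urllib.request.urlopen(url, timeout=10) as resp:
        return to_tool_result(location, json.load(resp))
```

Keeping `to_tool_result` separate from the HTTP calls makes the result shape easy to check offline with a canned payload.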

Example output (excerpt from a run asking about San Diego):

[Step 3] Parsed: get_weather({"location": "San Diego"})

[Step 4] Executing 'get_weather' (live HTTP call)...
  Result: {"location": "San Diego", "temp_f": 57.3, "humidity": 78, "condition": "Partly cloudy"}

[Step 5] Second inference (model summarizes result)...

  Final answer:
  ──────────────────────────────────────────────────
  The current weather in San Diego is:
  - **Temperature**: 57.3°F
  - **Conditions**: Partly cloudy
  - **Humidity**: 78%

How It Works

The model receives tools as JSON schemas in the system prompt using ChatML format:

<|im_start|>system
You are a helpful assistant with access to tools.
<tools>
{"type": "function", "function": {"name": "get_weather", ...}}
</tools>
<|im_end|>
<|im_start|>user
What is the weather in San Francisco?<|im_end|>
<|im_start|>assistant
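
A prompt in this layout can be assembled with plain string formatting. The sketch below is illustrative only: `build_prompt` is a hypothetical helper, and the `get_weather` schema is a made-up example because the original schema is elided above. In practice the tokenizer's chat template is the authoritative source for the exact format.

```python
import json

# Hypothetical schema for illustration; the real one is not shown in this README.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}


def build_prompt(tools: list, user_message: str,
                 system_text: str = "You are a helpful assistant with access to tools.") -> str:
    """Render a ChatML prompt with one JSON tool schema per line."""
    tool_lines = "\n".join(json.dumps(tool) for tool in tools)
    return (
        "<|im_start|>system\n"
        f"{system_text}\n"
        "<tools>\n"
        f"{tool_lines}\n"
        "</tools>\n"
        "<|im_end|>\n"
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        # Leave the assistant turn open so the model continues from here.
        "<|im_start|>assistant\n"
    )


print(build_prompt([weather_tool], "What is the weather in San Francisco?"))
```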

The model responds with a tool call:

<think>
The user wants weather info. I should call get_weather.
</think>

<tool_call>
{"name": "get_weather", "arguments": {"location": "San Francisco"}}
</tool_call>
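
Extracting the call from output like this takes a small parser. The following is a minimal sketch, assuming the JSON payload always sits between `<tool_call>` and `</tool_call>` as shown; `parse_tool_calls` is a hypothetical name and not necessarily how test_tool_call.py does it.

```python
import json
import re

# Capture whatever sits between the tags; DOTALL lets the JSON span lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list:
    """Return (name, arguments) pairs for every well-formed tool call."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # Skip malformed JSON instead of crashing the loop.
        if isinstance(payload, dict) and "name" in payload:
            calls.append((payload["name"], payload.get("arguments", {})))
    return calls


output = """<think>
The user wants weather info. I should call get_weather.
</think>

<tool_call>
{"name": "get_weather", "arguments": {"location": "San Francisco"}}
</tool_call>"""

print(parse_tool_calls(output))
# [('get_weather', {'location': 'San Francisco'})]
```

Text inside `<think>` is ignored because only the tagged span is parsed.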

The tool result is fed back to the model inside <tool_response> tags, and the model then generates a natural-language summary.
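
The second inference needs the first prompt, the model's tool call, and the tool result in one string. The sketch below shows one way to build it. Placing the `<tool_response>` inside a user turn is an assumption borrowed from similar ChatML-style templates, so check the model's chat template before relying on it; `append_tool_response` is a hypothetical helper.

```python
import json


def append_tool_response(prompt: str, assistant_output: str, result: dict) -> str:
    """Close the assistant turn, add the tool result, and reopen the assistant turn."""
    return (
        f"{prompt}{assistant_output}<|im_end|>\n"
        # Assumption: the tool result travels in a user turn.
        "<|im_start|>user\n"
        "<tool_response>\n"
        f"{json.dumps(result)}\n"
        "</tool_response><|im_end|>\n"
        "<|im_start|>assistant\n"
    )


first_prompt = "<|im_start|>user\nWhat is the weather in San Francisco?<|im_end|>\n<|im_start|>assistant\n"
tool_call = '<tool_call>\n{"name": "get_weather", "arguments": {"location": "San Francisco"}}\n</tool_call>'
result = {"location": "San Francisco", "temp_f": 60.0}  # made-up values

print(append_tool_response(first_prompt, tool_call, result))
```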

iOS/macOS App

Try the sample Chat-Bot app on TestFlight:

Install TestFlight from App Store
Join beta: TestFlight Link
Add custom models via HuggingFace URLs

License

ANEMLL conversion pipeline: MIT License
Source model (Nanbeige4.1-3B): Apache 2.0
