---
license: gemma
language:
- en
base_model:
- google/functiongemma-270m-it
pipeline_tag: text-generation
tags:
- function-calling
- infrastructure
- devops
- litertlm
---

# FunctionGemma Infrastructure Tools v8

A fine-tuned [FunctionGemma 270M](https://huggingface.co/google/functiongemma-270m-it) model for infrastructure error diagnosis and remediation. It achieves **100% accuracy** on its 7 infrastructure tools when given the exact tool definitions it was trained with (see "Required Tool Definitions").

## Model Details

- **Base Model**: google/functiongemma-270m-it
- **Format**: LiteRT-LM (.litertlm), optimized for on-device inference
- **Quantization**: INT8 (Q8)
- **Size**: ~271MB
- **Training**: 50 epochs on 10,500 examples (1,500 per tool)

## Supported Tools

| Tool | Description | Use Case |
|------|-------------|----------|
| `enableCors` | Enable CORS for a specific origin | CORS policy errors, blocked cross-origin requests |
| `updateConnectionUrl` | Update service connection URL | ECONNREFUSED errors, localhost connection issues in containers |
| `setEnvVar` | Set environment variable | Missing configuration, undefined env vars |
| `addHostMapping` | Add hostname to IP mapping | DNS resolution (ENOTFOUND) errors |
| `increaseMemory` | Increase memory limit | OOMKilled errors, out of memory crashes |
| `increaseTimeout` | Increase timeout value | 504 Gateway Timeout, connection timeout errors |
| `restartService` | Restart a service | Stuck processes, stale data after deployment |

## Usage with LiteRT-LM

### Download the Model

```bash
# Using huggingface-cli
huggingface-cli download macmacmacmac/functiongemma-nextjs functiongemma-infra-v8_q8_ekv1024.litertlm
```

```python
# Or using Python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="macmacmacmac/functiongemma-nextjs",
    filename="functiongemma-infra-v8_q8_ekv1024.litertlm",
)
```

### Required Tool Definitions

**Important**: You must use these exact tool definitions for optimal accuracy. The model was trained with these specific descriptions.

```javascript
const tools = [
  {
    type: "function",
    function: {
      name: "enableCors",
      description: "Enable CORS for a specific origin to fix blocked cross-origin requests.",
      parameters: {
        type: "object",
        properties: {
          origin: { type: "string", description: "The origin to allow (e.g., http://localhost:3000)" },
          methods: { type: "string", description: "Allowed HTTP methods (e.g., GET,POST,PUT,DELETE)" }
        },
        required: ["origin"]
      }
    }
  },
  {
    type: "function",
    function: {
      name: "updateConnectionUrl",
      description: "Update a service connection URL to fix ECONNREFUSED errors, typically changing localhost to the correct service hostname.",
      parameters: {
        type: "object",
        properties: {
          service: { type: "string", description: "The service to update (e.g., database, redis, api)" },
          hostname: { type: "string", description: "The correct hostname to connect to" },
          port: { type: "integer", description: "The port number to connect to" }
        },
        required: ["service", "hostname", "port"]
      }
    }
  },
  {
    type: "function",
    function: {
      name: "setEnvVar",
      description: "Set an environment variable to fix missing configuration errors.",
      parameters: {
        type: "object",
        properties: {
          name: { type: "string", description: "Environment variable name (e.g., DATABASE_URL, API_KEY)" },
          value: { type: "string", description: "The value to set" }
        },
        required: ["name", "value"]
      }
    }
  },
  {
    type: "function",
    function: {
      name: "addHostMapping",
      description: "Add a hostname to IP mapping to fix DNS resolution (ENOTFOUND) errors.",
      parameters: {
        type: "object",
        properties: {
          hostname: { type: "string", description: "The hostname to map" },
          ip: { type: "string", description: "The IP address to map to" }
        },
        required: ["hostname", "ip"]
      }
    }
  },
  {
    type: "function",
    function: {
      name: "increaseMemory",
      description: "Increase memory limit for a service to fix OOMKilled errors.",
      parameters: {
        type: "object",
        properties: {
          service: { type: "string", description: "The service/container/pod name" },
          memoryMb: { type: "integer", description: "Memory limit in megabytes" }
        },
        required: ["service", "memoryMb"]
      }
    }
  },
  {
    type: "function",
    function: {
      name: "increaseTimeout",
      description: "Increase timeout value to fix 504 Gateway Timeout or connection timeout errors.",
      parameters: {
        type: "object",
        properties: {
          service: { type: "string", description: "The service to configure" },
          timeoutMs: { type: "integer", description: "Timeout value in milliseconds" }
        },
        required: ["service", "timeoutMs"]
      }
    }
  },
  {
    type: "function",
    function: {
      name: "restartService",
      description: "Restart a service to apply configuration changes or fix a stuck process.",
      parameters: {
        type: "object",
        properties: {
          service: { type: "string", description: "The service/container/pod name to restart" }
        },
        required: ["service"]
      }
    }
  }
];
```

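Because a small model can emit a call with a missing or mistyped argument, it may be worth checking each call against its definition before acting on it. The following is a minimal sketch, not part of dad-express: `validateToolCall`, `ToolDefinition`, and `ToolCall` are hypothetical names, and the check only covers the `string` and `integer` parameter types used by the definitions above.

```typescript
// Hypothetical helper (not a dad-express API): validates a model-emitted call
// against tool definitions shaped like the entries of the `tools` array.
type ParamType = "string" | "integer";

interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, { type: ParamType; description?: string }>;
      required: string[];
    };
  };
}

interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Returns a list of human-readable problems; an empty list means the call is valid.
function validateToolCall(call: ToolCall, tools: ToolDefinition[]): string[] {
  const tool = tools.find((t) => t.function.name === call.name);
  if (!tool) return [`unknown tool: ${call.name}`];

  const problems: string[] = [];
  const { properties, required } = tool.function.parameters;

  for (const key of required) {
    if (!(key in call.arguments)) problems.push(`missing required argument: ${key}`);
  }
  for (const [key, value] of Object.entries(call.arguments)) {
    const spec = properties[key];
    if (!spec) {
      problems.push(`unexpected argument: ${key}`);
    } else if (spec.type === "string" && typeof value !== "string") {
      problems.push(`argument ${key} must be a string`);
    } else if (spec.type === "integer" && !Number.isInteger(value)) {
      problems.push(`argument ${key} must be an integer`);
    }
  }
  return problems;
}

// One definition copied from the list above, used as sample input.
const increaseMemoryTool: ToolDefinition = {
  type: "function",
  function: {
    name: "increaseMemory",
    description: "Increase memory limit for a service to fix OOMKilled errors.",
    parameters: {
      type: "object",
      properties: {
        service: { type: "string", description: "The service/container/pod name" },
        memoryMb: { type: "integer", description: "Memory limit in megabytes" },
      },
      required: ["service", "memoryMb"],
    },
  },
};

// A string where an integer is expected is reported as a problem.
console.log(
  validateToolCall(
    { name: "increaseMemory", arguments: { service: "api", memoryMb: "1024" } },
    [increaseMemoryTool],
  ),
);
```

Returning a list of problems rather than throwing lets the caller decide whether to retry the model, fall back to notifying a developer, or drop the call.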
### Example Usage with dad-express

```javascript
const { FunctionGemmaEngine } = require('dad-express');

// `tools` is the array from "Required Tool Definitions" above
const engine = new FunctionGemmaEngine({
  modelPath: './functiongemma-infra-v8_q8_ekv1024.litertlm',
  tools: JSON.stringify(tools)
});

// CommonJS has no top-level await, so the call runs inside an async function
async function main() {
  // Diagnose an error
  const result = await engine.call('Container api was OOMKilled - out of memory');
  console.log(result.tool_calls[0].function);
  // { name: 'increaseMemory', arguments: { service: 'api', memoryMb: 1024 } }
}

main().catch(console.error);
```

## Training Data

The model was trained on 10,500 synthetic examples covering common infrastructure errors:

| Error Pattern | Tool | Examples |
|--------------|------|----------|
| CORS policy errors | enableCors | 1,500 |
| ECONNREFUSED errors | updateConnectionUrl | 1,500 |
| Missing env vars | setEnvVar | 1,500 |
| DNS/ENOTFOUND errors | addHostMapping | 1,500 |
| OOMKilled errors | increaseMemory | 1,500 |
| Timeout errors | increaseTimeout | 1,500 |
| Stuck services | restartService | 1,500 |

### Sample Training Examples

```
"CORS error: No 'Access-Control-Allow-Origin' header from http://localhost:3000" → enableCors
"Error: connect ECONNREFUSED 127.0.0.1:5432 - database connection failed" → updateConnectionUrl
"Missing required environment variable: DATABASE_URL" → setEnvVar
"getaddrinfo ENOTFOUND db" → addHostMapping
"Container api was OOMKilled" → increaseMemory
"504 Gateway Timeout from backend" → increaseTimeout
"nginx container is not responding" → restartService
```

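Pairs like these can double as a small regression set when checking tool selection on your own logs. The sketch below computes per-tool accuracy for any predictor function. It is an illustration only: `perToolAccuracy` and `stubPredict` are hypothetical names, and the keyword-based stub stands in for a real call to the model so that the example runs without it.

```typescript
interface LabeledExample {
  input: string;
  expectedTool: string;
}

// A predictor maps a log line to a tool name, or null when no tool is chosen.
type Predictor = (input: string) => string | null;

// Computes the fraction of correctly predicted examples for each expected tool.
function perToolAccuracy(examples: LabeledExample[], predict: Predictor): Map<string, number> {
  const totals = new Map<string, { correct: number; total: number }>();
  for (const example of examples) {
    const entry = totals.get(example.expectedTool) ?? { correct: 0, total: 0 };
    entry.total += 1;
    if (predict(example.input) === example.expectedTool) entry.correct += 1;
    totals.set(example.expectedTool, entry);
  }

  const accuracy = new Map<string, number>();
  totals.forEach((entry, tool) => accuracy.set(tool, entry.correct / entry.total));
  return accuracy;
}

// Three of the sample pairs listed above.
const sampleExamples: LabeledExample[] = [
  { input: "CORS error: No 'Access-Control-Allow-Origin' header from http://localhost:3000", expectedTool: "enableCors" },
  { input: "Container api was OOMKilled", expectedTool: "increaseMemory" },
  { input: "504 Gateway Timeout from backend", expectedTool: "increaseTimeout" },
];

// Hypothetical stand-in for the model: it only recognizes two keywords,
// so the timeout example is deliberately missed.
const stubPredict: Predictor = (input) => {
  if (/CORS/i.test(input)) return "enableCors";
  if (/OOMKilled/i.test(input)) return "increaseMemory";
  return null;
};

const sampleAccuracy = perToolAccuracy(sampleExamples, stubPredict);
sampleAccuracy.forEach((value, tool) => console.log(`${tool}: ${(value * 100).toFixed(0)}%`));
```

Reporting accuracy per tool rather than as one aggregate makes it easier to spot a single tool that regresses when the definitions or prompts change.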
## Fully Loaded Serving

**Fully Loaded Serving** is an end-to-end intelligent error remediation pipeline that runs entirely on-device. It combines:

1. **Low-latency vector embeddings** (EmbeddingGemma) for streaming log classification
2. **Semantic clustering** to group similar errors and flag outliers
3. **Function calling** (FunctionGemma) to automatically diagnose and fix infrastructure issues
4. **Prompt optimization** via [Ax](https://github.com/ax-llm/ax) with MiPRO for continuous improvement

### Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                           Next.js Application                           │
├─────────────────────────────────────────────────────────────────────────┤
│  stdout/stderr ──▶ Log Stream ──▶ dad-express middleware                │
│                                     │                                   │
│              ┌──────────────────────┼─────────────────────┐             │
│              │                      ▼                     │             │
│              │    ┌──────────────────────────────────┐    │             │
│              │    │  EmbeddingGemma (~5ms)           │    │             │
│              │    │  768-dim vector per log line     │    │             │
│              │    └─────────────────┬────────────────┘    │             │
│              │                      │                     │             │
│              │                      ▼                     │             │
│              │    ┌──────────────────────────────────┐    │             │
│              │    │  Semantic Clustering (cosine)    │    │             │
│              │    │  • Group similar errors          │    │             │
│              │    │  • Detect outliers               │    │             │
│              │    │  • Identify recurring patterns   │    │             │
│              │    └─────────────────┬────────────────┘    │             │
│              │                      │                     │             │
│              │                      ▼                     │             │
│              │    ┌──────────────────────────────────┐    │             │
│              │    │  FunctionGemma (~50ms/call)      │    │             │
│              │    │  → enableCors, setEnvVar, etc.   │    │             │
│              │    └─────────────────┬────────────────┘    │             │
│              │                      │                     │             │
│              │                      ▼                     │             │
│              │    ┌──────────────────────────────────┐    │             │
│              │    │  Auto-Remediation Layer          │    │             │
│              │    │  Execute fix or notify developer │    │             │
│              │    └──────────────────────────────────┘    │             │
│              │                                            │             │
│              │     LiteRT-LM (on-device, ~300MB RAM)      │             │
│              └────────────────────────────────────────────┘             │
└─────────────────────────────────────────────────────────────────────────┘
```

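The clustering stage relies on cosine similarity between embedding vectors: a log line joins the most similar existing cluster if the similarity clears a threshold, and is treated as an outlier otherwise. A minimal sketch of that idea follows. The function names and the 0.85 default are illustrative assumptions, and the 2-dimensional vectors stand in for real 768-dimensional embeddings.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]; 0 for a zero vector.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the index of the most similar centroid above `threshold`,
// or -1 when nothing is similar enough (an outlier / new cluster).
function nearestCluster(embedding: Float32Array, centroids: Float32Array[], threshold = 0.85): number {
  let bestIndex = -1;
  let bestSimilarity = threshold;
  centroids.forEach((centroid, index) => {
    const similarity = cosineSimilarity(embedding, centroid);
    if (similarity > bestSimilarity) {
      bestSimilarity = similarity;
      bestIndex = index;
    }
  });
  return bestIndex;
}

// Toy 2-dimensional centroids for illustration.
const toyCentroids = [new Float32Array([1, 0]), new Float32Array([0, 1])];

console.log(nearestCluster(new Float32Array([0.9, 0.1]), toyCentroids)); // close to the first centroid
console.log(nearestCluster(new Float32Array([1, 1]), toyCentroids)); // equally far from both: outlier
```

A higher threshold yields more, tighter clusters; a lower one merges loosely related errors, so the value is worth tuning against real logs.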
### Ax Integration with MiPRO

[Ax](https://github.com/ax-llm/ax) is a TypeScript DSPy-style framework for declarative AI programming. dad-express provides `AxLiteRTProvider` to run Ax signatures entirely on-device:

```typescript
import { AxGen } from "@ax-llm/ax";
import { AxLiteRTProvider, EmbeddingEngine, FunctionGemmaEngine } from "dad-express";

// Create on-device provider with both embedding and chat models
const provider = new AxLiteRTProvider({
  chat: {
    modelPath: "./models/functiongemma-infra-v8_q8_ekv1024.litertlm",
    tools: infrastructureTools, // The 7 tools from this repo
  },
  embed: {
    modelPath: "./models/embedding_gemma.tflite",
    tokenizerPath: "./models/tokenizer.model",
  },
});

// Define Ax signature for error diagnosis (MiPRO-optimizable)
const diagnoseError = new AxGen(`
  errorMessage:string "The error log line",
  errorCluster:string? "Similar errors seen recently"
  ->
  diagnosis:string "Root cause analysis",
  toolName:string "Which infrastructure tool to call",
  confidence:class "high, medium, low"
`);

// Run inference on-device
const result = await diagnoseError.forward(provider, {
  errorMessage: "CORS error from http://localhost:3000",
  errorCluster: "3 similar CORS errors in last 5 minutes",
});

console.log(result);
// { diagnosis: "Frontend origin not in allowed list",
//   toolName: "enableCors",
//   confidence: "high" }
```

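The `confidence` field and the cluster count give the remediation layer something to decide on when choosing between executing a fix and notifying a developer. One possible policy is sketched below. Everything in it is an assumption for illustration: the `decideAction` name, the set of tools treated as safe to run unattended, and the minimum of 3 occurrences are not taken from dad-express or from the model's training.

```typescript
type Confidence = "high" | "medium" | "low";
type Action = "execute" | "notify";

interface Diagnosis {
  toolName: string;
  confidence: Confidence;
}

// Assumption: only low-risk tools are eligible for unattended execution.
// Which tools belong here is a deployment decision, not something the model defines.
const AUTO_EXECUTABLE = new Set<string>(["enableCors", "increaseTimeout"]);

// Executes only when the diagnosis is high-confidence, the tool is allow-listed,
// and the error has recurred at least `minOccurrences` times; otherwise notifies.
function decideAction(diagnosis: Diagnosis, clusterCount: number, minOccurrences = 3): Action {
  if (diagnosis.confidence !== "high") return "notify";
  if (!AUTO_EXECUTABLE.has(diagnosis.toolName)) return "notify";
  return clusterCount >= minOccurrences ? "execute" : "notify";
}

console.log(decideAction({ toolName: "enableCors", confidence: "high" }, 3)); // "execute"
console.log(decideAction({ toolName: "restartService", confidence: "high" }, 10)); // "notify"
```

Defaulting to `notify` keeps a wrong diagnosis from turning into an unwanted restart or configuration change.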
### Example: Hosting Next.js with Fully Loaded Serving

```typescript
// server.ts - Next.js with intelligent error remediation
import { createApp, FunctionGemmaEngine, EmbeddingEngine } from "dad-express";
import { spawn } from "child_process";

// Infrastructure tools, abbreviated for this example; for best accuracy use
// the exact definitions from "Required Tool Definitions"
const tools = [
  { type: "function", function: { name: "enableCors", description: "Enable CORS for a specific origin to fix blocked cross-origin requests.", parameters: { type: "object", properties: { origin: { type: "string", description: "The origin to allow" } }, required: ["origin"] } } },
  { type: "function", function: { name: "updateConnectionUrl", description: "Update a service connection URL to fix ECONNREFUSED errors.", parameters: { type: "object", properties: { service: { type: "string" }, hostname: { type: "string" }, port: { type: "integer" } }, required: ["service", "hostname", "port"] } } },
  { type: "function", function: { name: "setEnvVar", description: "Set an environment variable to fix missing configuration errors.", parameters: { type: "object", properties: { name: { type: "string" }, value: { type: "string" } }, required: ["name", "value"] } } },
  { type: "function", function: { name: "addHostMapping", description: "Add a hostname to IP mapping to fix DNS resolution errors.", parameters: { type: "object", properties: { hostname: { type: "string" }, ip: { type: "string" } }, required: ["hostname", "ip"] } } },
  { type: "function", function: { name: "increaseMemory", description: "Increase memory limit for a service to fix OOMKilled errors.", parameters: { type: "object", properties: { service: { type: "string" }, memoryMb: { type: "integer" } }, required: ["service", "memoryMb"] } } },
  { type: "function", function: { name: "increaseTimeout", description: "Increase timeout value to fix 504 Gateway Timeout errors.", parameters: { type: "object", properties: { service: { type: "string" }, timeoutMs: { type: "integer" } }, required: ["service", "timeoutMs"] } } },
  { type: "function", function: { name: "restartService", description: "Restart a service to apply changes or fix stuck processes.", parameters: { type: "object", properties: { service: { type: "string" } }, required: ["service"] } } },
];

// Initialize on-device models
const embedEngine = new EmbeddingEngine({
  modelPath: "./models/embedding_gemma.tflite",
  tokenizerPath: "./models/tokenizer.model",
});

const functionGemma = new FunctionGemmaEngine({
  modelPath: "./models/functiongemma-infra-v8_q8_ekv1024.litertlm",
  tools: JSON.stringify(tools),
});

// Error clustering state
const errorClusters = new Map<string, { embedding: Float32Array; count: number; lastSeen: Date }>();

async function classifyAndCluster(logLine: string): Promise<string | null> {
  // Skip non-error lines
  if (!logLine.match(/error|fail|exception|timeout|refused|denied/i)) {
    return null;
  }

  // Generate embedding (~5ms on CPU)
  const embedding = await embedEngine.encodeAsync(logLine);

  // Find similar errors via cosine similarity
  let bestMatch: string | null = null;
  let bestSimilarity = 0.85; // Threshold for clustering

  for (const [clusterId, cluster] of errorClusters) {
    const similarity = EmbeddingEngine.cosineSimilarity(embedding, cluster.embedding);
    if (similarity > bestSimilarity) {
      bestSimilarity = similarity;
      bestMatch = clusterId;
    }
  }

  if (bestMatch) {
    // Update existing cluster
    const cluster = errorClusters.get(bestMatch)!;
    cluster.count++;
    cluster.lastSeen = new Date();
    return bestMatch;
  }

  // Create new cluster
  const clusterId = `cluster_${Date.now()}`;
  errorClusters.set(clusterId, { embedding, count: 1, lastSeen: new Date() });
  return clusterId;
}

async function diagnoseAndFix(errorLog: string, clusterId: string): Promise<void> {
  const cluster = errorClusters.get(clusterId);

  // Call FunctionGemma for diagnosis (~50ms)
  const result = await functionGemma.sendMessage(errorLog);

  if (result.functionCalls && result.functionCalls.length > 0) {
    const call = result.functionCalls[0];
    console.log(`[AutoFix] Detected ${cluster?.count || 1}x: ${call.name}`);
    console.log(`[AutoFix] Args: ${JSON.stringify(call.arguments)}`);

    // Execute remediation (in production, this would call actual infrastructure APIs)
    switch (call.name) {
      case "enableCors":
        console.log(`[AutoFix] Would enable CORS for: ${call.arguments.origin}`);
        break;
      case "restartService":
        console.log(`[AutoFix] Would restart: ${call.arguments.service}`);
        break;
      case "increaseMemory":
        console.log(`[AutoFix] Would increase memory for ${call.arguments.service} to ${call.arguments.memoryMb}MB`);
        break;
      // ... handle other tools
    }
  }
}

// Create dad-express app
const app = createApp();

// API routes
app.get("/health", () => ({ status: "ok", models: { embed: true, functionGemma: true } }));

app.get("/clusters", () => {
  const clusters = [];
  for (const [id, cluster] of errorClusters) {
    clusters.push({ id, count: cluster.count, lastSeen: cluster.lastSeen });
  }
  return clusters;
});

// Start Next.js as child process with log monitoring
const nextProcess = spawn("npx", ["next", "start"], {
  stdio: ["inherit", "pipe", "pipe"],
  env: { ...process.env, PORT: "3001" },
});

// Stream stdout
nextProcess.stdout.on("data", (data) => {
  const line = data.toString().trim();
  console.log(`[next] ${line}`);
});

// Stream stderr with intelligent processing
nextProcess.stderr.on("data", async (data) => {
  const line = data.toString().trim();
  console.log(`[next:err] ${line}`);

  // Classify and cluster error
  const clusterId = await classifyAndCluster(line);

  if (clusterId) {
    // Diagnose and auto-fix
    await diagnoseAndFix(line, clusterId);
  }
});

// Start dad-express on separate port for monitoring
app.listen(4000, () => {
  console.log("dad-express monitoring on http://localhost:4000");
  console.log("Next.js app on http://localhost:3001");
});
```

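Node.js stream `data` events deliver arbitrary chunks rather than whole lines, so a single event can carry several log lines or only part of one. A log monitor that classifies line by line usually buffers input until a newline arrives. The sketch below shows one way to do that; `LineBuffer` is a hypothetical helper, not part of dad-express.

```typescript
// Accumulates stream chunks and emits only complete, non-empty lines.
class LineBuffer {
  private pending = "";

  // Appends a chunk and returns every complete line it finished.
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split(/\r?\n/);
    // The last element is an unfinished line (or "" if the chunk ended on a newline).
    this.pending = parts.pop() ?? "";
    return parts.filter((line) => line.trim().length > 0);
  }

  // Returns whatever is left when the stream ends.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest.length > 0 ? [rest] : [];
  }
}

const stderrLines = new LineBuffer();

// A line split across two chunks is only emitted once it is complete.
const firstBatch = stderrLines.push("Error: conn");
const secondBatch = stderrLines.push("ect ECONNREFUSED\nwarn: x\npart");
console.log(firstBatch, secondBatch);
```

In a stderr handler, each string returned by `push(data.toString())` could then be passed to the classification step individually, with `flush()` called on the stream's `end` event.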
### Key Benefits

| Feature | Latency | Memory | Cloud Calls |
|---------|---------|--------|-------------|
| EmbeddingGemma | ~5ms/embed | ~50MB | 0 |
| FunctionGemma | ~50ms/call | ~271MB | 0 |
| Semantic clustering | <1ms | Varies | 0 |
| **Total pipeline** | **~60ms** | **~350MB** | **0** |

- **Zero cloud dependency**: All inference runs locally via LiteRT-LM
- **Sub-100ms latency**: Fast enough for real-time log processing
- **Privacy-preserving**: Error logs never leave the device
- **Continuous improvement**: Use Ax MiPRO to optimize prompts over time

## Limitations

- Optimized for the 7 specific infrastructure tools listed above
- Requires exact tool definitions for best accuracy
- May not generalize well to error patterns not seen in training

## License

This model inherits the [Gemma license](https://ai.google.dev/gemma/terms) from the base model.