macmacmacmac committed (verified) commit e1be71a · parent 2079dd4

Add Fully Loaded Serving section with Ax MiPRO integration

Files changed (1): README.md (+248 −0)
"nginx container is not responding" → restartService
```

## Fully Loaded Serving

**Fully Loaded Serving** is an end-to-end intelligent error remediation pipeline that runs entirely on-device. It combines:

1. **Low-latency vector embeddings** (EmbeddingGemma) for streaming log classification
2. **Semantic clustering** to group similar errors, issues, and outliers
3. **Function calling** (FunctionGemma) to automatically diagnose and fix infrastructure issues
4. **Prompt optimization** via [Ax](https://github.com/ax-llm/ax) with MiPRO for continuous improvement
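The clustering step (item 2) can be sketched without any model in the loop: embeddings are just vectors, and cluster membership is a cosine-similarity threshold test. The vectors, cluster names, and threshold below are illustrative placeholders, not EmbeddingGemma output:

```typescript
// Cosine similarity between two dense vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assign a vector to the most similar cluster centroid above the threshold,
// or report no match (the caller would then open a new cluster).
function nearestCluster(
  embedding: number[],
  centroids: Map<string, number[]>,
  threshold = 0.85
): string | null {
  let best: string | null = null;
  let bestSim = threshold;
  for (const [id, centroid] of centroids) {
    const sim = cosineSimilarity(embedding, centroid);
    if (sim > bestSim) {
      bestSim = sim;
      best = id;
    }
  }
  return best;
}

// Toy 3-dimensional "embeddings" standing in for 768-dim model output.
const centroids = new Map<string, number[]>([
  ["cors", [0.9, 0.1, 0.0]],
  ["oom", [0.0, 0.2, 0.95]],
]);

console.log(nearestCluster([0.88, 0.15, 0.05], centroids)); // → "cors"
console.log(nearestCluster([0.5, 0.5, 0.5], centroids));    // → null (new cluster)
```

The same threshold test drives the `classifyAndCluster` function in the full example below, with the model supplying real embeddings.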
### Architecture

```
┌────────────────────────────────────────────────────────────┐
│                    Next.js Application                     │
├────────────────────────────────────────────────────────────┤
│  stdout/stderr ──▶ Log Stream ──▶ dad-express middleware   │
│                        │                                   │
│   ┌────────────────────┼────────────────────┐              │
│   │                    ▼                    │              │
│   │  ┌──────────────────────────────────┐   │              │
│   │  │ EmbeddingGemma (~5ms)            │   │              │
│   │  │ 768-dim vector per log line      │   │              │
│   │  └──────────────┬───────────────────┘   │              │
│   │                 │                       │              │
│   │                 ▼                       │              │
│   │  ┌──────────────────────────────────┐   │              │
│   │  │ Semantic Clustering (cosine)     │   │              │
│   │  │ • Group similar errors           │   │              │
│   │  │ • Detect outliers                │   │              │
│   │  │ • Identify recurring patterns    │   │              │
│   │  └──────────────┬───────────────────┘   │              │
│   │                 │                       │              │
│   │                 ▼                       │              │
│   │  ┌──────────────────────────────────┐   │              │
│   │  │ FunctionGemma (~50ms/call)       │   │              │
│   │  │ → enableCors, setEnvVar, etc.    │   │              │
│   │  └──────────────┬───────────────────┘   │              │
│   │                 │                       │              │
│   │                 ▼                       │              │
│   │  ┌──────────────────────────────────┐   │              │
│   │  │ Auto-Remediation Layer           │   │              │
│   │  │ Execute fix or notify developer  │   │              │
│   │  └──────────────────────────────────┘   │              │
│   │                                         │              │
│   │  LiteRT-LM (on-device, ~300MB RAM)      │              │
│   └─────────────────────────────────────────┘              │
└────────────────────────────────────────────────────────────┘
```

### Ax Integration with MiPRO

[Ax](https://github.com/ax-llm/ax) is a TypeScript DSPy-style framework for declarative AI programming. dad-express provides `AxLiteRTProvider` to run Ax signatures entirely on-device:

```typescript
import { AxGen } from "@ax-llm/ax";
import { AxLiteRTProvider, EmbeddingEngine, FunctionGemmaEngine } from "dad-express";

// Create on-device provider with both embedding and chat models
const provider = new AxLiteRTProvider({
  chat: {
    modelPath: "./models/functiongemma-infra-v8_q8_ekv1024.litertlm",
    tools: infrastructureTools, // The 7 tools from this repo
  },
  embed: {
    modelPath: "./models/embedding_gemma.tflite",
    tokenizerPath: "./models/tokenizer.model",
  },
});

// Define Ax signature for error diagnosis (MiPRO-optimizable)
const diagnoseError = new AxGen(`
  errorMessage:string "The error log line",
  errorCluster:string? "Similar errors seen recently"
  ->
  diagnosis:string "Root cause analysis",
  toolName:string "Which infrastructure tool to call",
  confidence:class "high, medium, low"
`);

// Run inference on-device
const result = await diagnoseError.forward(provider, {
  errorMessage: "CORS error from http://localhost:3000",
  errorCluster: "3 similar CORS errors in last 5 minutes",
});

console.log(result);
// { diagnosis: "Frontend origin not in allowed list",
//   toolName: "enableCors",
//   confidence: "high" }
```
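The signature above is marked MiPRO-optimizable, but the snippet never runs the optimizer. As a rough, library-free illustration of what MiPRO-style prompt optimization does (the real Ax optimizer proposes candidate instructions with a model and searches over them; the dev set, candidates, and scoring stub below are invented for this sketch), consider scoring candidate instructions against a small labeled set and keeping the best:

```typescript
type Example = { error: string; expectedTool: string };

// Small labeled dev set: error log line → tool we expect to be called.
const devSet: Example[] = [
  { error: "CORS error from http://localhost:3000", expectedTool: "enableCors" },
  { error: "connect ECONNREFUSED 127.0.0.1:5432", expectedTool: "updateConnectionUrl" },
  { error: "container OOMKilled", expectedTool: "increaseMemory" },
];

// Stand-in for running the model under a given instruction. Artificially,
// only the instruction mentioning "all error kinds" unlocks every mapping.
function predictTool(instruction: string, error: string): string {
  const hints: Array<[RegExp, string]> = [
    [/CORS/i, "enableCors"],
    [/ECONNREFUSED/i, "updateConnectionUrl"],
    [/OOM/i, "increaseMemory"],
  ];
  const usable = instruction.includes("all error kinds") ? hints : hints.slice(0, 1);
  for (const [pattern, tool] of usable) {
    if (pattern.test(error)) return tool;
  }
  return "restartService"; // fallback
}

// Metric: fraction of dev examples where the predicted tool is correct.
function score(instruction: string): number {
  let correct = 0;
  for (const ex of devSet) {
    if (predictTool(instruction, ex.error) === ex.expectedTool) correct++;
  }
  return correct / devSet.length;
}

const candidates = [
  "Diagnose the error.",
  "Diagnose the error and map all error kinds to the matching tool.",
];

// Keep the highest-scoring candidate instruction.
const best = candidates.reduce((a, b) => (score(b) > score(a) ? b : a));
console.log(best, score(best)); // the second, more specific candidate scores 1
```

The real optimizer replaces the scoring stub with actual on-device inference and the hand-written candidates with model-proposed ones, but the select-by-metric loop is the core idea.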

### Example: Hosting Next.js with Fully Loaded Serving

```typescript
// server.ts - Next.js with intelligent error remediation
import { createApp, FunctionGemmaEngine, EmbeddingEngine } from "dad-express";
import { spawn } from "child_process";

// Infrastructure tools (exact definitions for 100% accuracy)
const tools = [
  { type: "function", function: { name: "enableCors", description: "Enable CORS for a specific origin to fix blocked cross-origin requests.", parameters: { type: "object", properties: { origin: { type: "string", description: "The origin to allow" } }, required: ["origin"] } } },
  { type: "function", function: { name: "updateConnectionUrl", description: "Update a service connection URL to fix ECONNREFUSED errors.", parameters: { type: "object", properties: { service: { type: "string" }, hostname: { type: "string" }, port: { type: "integer" } }, required: ["service", "hostname", "port"] } } },
  { type: "function", function: { name: "setEnvVar", description: "Set an environment variable to fix missing configuration errors.", parameters: { type: "object", properties: { name: { type: "string" }, value: { type: "string" } }, required: ["name", "value"] } } },
  { type: "function", function: { name: "addHostMapping", description: "Add a hostname to IP mapping to fix DNS resolution errors.", parameters: { type: "object", properties: { hostname: { type: "string" }, ip: { type: "string" } }, required: ["hostname", "ip"] } } },
  { type: "function", function: { name: "increaseMemory", description: "Increase memory limit for a service to fix OOMKilled errors.", parameters: { type: "object", properties: { service: { type: "string" }, memoryMb: { type: "integer" } }, required: ["service", "memoryMb"] } } },
  { type: "function", function: { name: "increaseTimeout", description: "Increase timeout value to fix 504 Gateway Timeout errors.", parameters: { type: "object", properties: { service: { type: "string" }, timeoutMs: { type: "integer" } }, required: ["service", "timeoutMs"] } } },
  { type: "function", function: { name: "restartService", description: "Restart a service to apply changes or fix stuck processes.", parameters: { type: "object", properties: { service: { type: "string" } }, required: ["service"] } } },
];

// Initialize on-device models
const embedEngine = new EmbeddingEngine({
  modelPath: "./models/embedding_gemma.tflite",
  tokenizerPath: "./models/tokenizer.model",
});

const functionGemma = new FunctionGemmaEngine({
  modelPath: "./models/functiongemma-infra-v8_q8_ekv1024.litertlm",
  tools: JSON.stringify(tools),
});

// Error clustering state
const errorClusters = new Map<string, { embedding: Float32Array; count: number; lastSeen: Date }>();

async function classifyAndCluster(logLine: string): Promise<string | null> {
  // Skip non-error lines
  if (!logLine.match(/error|fail|exception|timeout|refused|denied/i)) {
    return null;
  }

  // Generate embedding (~5ms on CPU)
  const embedding = await embedEngine.encodeAsync(logLine);

  // Find similar errors via cosine similarity
  let bestMatch: string | null = null;
  let bestSimilarity = 0.85; // Threshold for clustering

  for (const [clusterId, cluster] of errorClusters) {
    const similarity = EmbeddingEngine.cosineSimilarity(embedding, cluster.embedding);
    if (similarity > bestSimilarity) {
      bestSimilarity = similarity;
      bestMatch = clusterId;
    }
  }

  if (bestMatch) {
    // Update existing cluster
    const cluster = errorClusters.get(bestMatch)!;
    cluster.count++;
    cluster.lastSeen = new Date();
    return bestMatch;
  }

  // Create new cluster
  const clusterId = `cluster_${Date.now()}`;
  errorClusters.set(clusterId, { embedding, count: 1, lastSeen: new Date() });
  return clusterId;
}

async function diagnoseAndFix(errorLog: string, clusterId: string): Promise<void> {
  const cluster = errorClusters.get(clusterId);

  // Call FunctionGemma for diagnosis (~50ms)
  const result = await functionGemma.sendMessage(errorLog);

  if (result.functionCalls && result.functionCalls.length > 0) {
    const call = result.functionCalls[0];
    console.log(`[AutoFix] Detected ${cluster?.count || 1}x: ${call.name}`);
    console.log(`[AutoFix] Args: ${JSON.stringify(call.arguments)}`);

    // Execute remediation (in production, this would call actual infrastructure APIs)
    switch (call.name) {
      case "enableCors":
        console.log(`[AutoFix] Would enable CORS for: ${call.arguments.origin}`);
        break;
      case "restartService":
        console.log(`[AutoFix] Would restart: ${call.arguments.service}`);
        break;
      case "increaseMemory":
        console.log(`[AutoFix] Would increase memory for ${call.arguments.service} to ${call.arguments.memoryMb}MB`);
        break;
      // ... handle other tools
    }
  }
}

// Create dad-express app
const app = createApp();

// API routes
app.get("/health", () => ({ status: "ok", models: { embed: true, functionGemma: true } }));

app.get("/clusters", () => {
  const clusters: Array<{ id: string; count: number; lastSeen: Date }> = [];
  for (const [id, cluster] of errorClusters) {
    clusters.push({ id, count: cluster.count, lastSeen: cluster.lastSeen });
  }
  return clusters;
});

// Start Next.js as child process with log monitoring
const nextProcess = spawn("npx", ["next", "start"], {
  stdio: ["inherit", "pipe", "pipe"],
  env: { ...process.env, PORT: "3001" },
});

// Stream stdout
nextProcess.stdout?.on("data", (data: Buffer) => {
  const line = data.toString().trim();
  console.log(`[next] ${line}`);
});

// Stream stderr with intelligent processing
nextProcess.stderr?.on("data", async (data: Buffer) => {
  const line = data.toString().trim();
  console.log(`[next:err] ${line}`);

  // Classify and cluster error
  const clusterId = await classifyAndCluster(line);

  if (clusterId) {
    // Diagnose and auto-fix
    await diagnoseAndFix(line, clusterId);
  }
});

// Start dad-express on separate port for monitoring
app.listen(4000, () => {
  console.log("dad-express monitoring on http://localhost:4000");
  console.log("Next.js app on http://localhost:3001");
});
```
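Before executing a remediation the model suggests, it is prudent to validate the call against the tool declarations. Whether dad-express does this internally is not specified here; the `validateCall` helper below is a hypothetical sketch (with a two-tool list trimmed from the seven above for brevity):

```typescript
// Shape of the OpenAI-style tool declarations used above (simplified).
type ToolDecl = {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: { type: "object"; properties: Record<string, unknown>; required: string[] };
  };
};

const tools: ToolDecl[] = [
  { type: "function", function: { name: "enableCors", description: "Enable CORS for a specific origin.", parameters: { type: "object", properties: { origin: { type: "string" } }, required: ["origin"] } } },
  { type: "function", function: { name: "restartService", description: "Restart a service.", parameters: { type: "object", properties: { service: { type: "string" } }, required: ["service"] } } },
];

// Returns null when the call is safe to dispatch, otherwise a reason string.
function validateCall(name: string, args: Record<string, unknown>): string | null {
  const tool = tools.find((t) => t.function.name === name);
  if (!tool) return `unknown tool: ${name}`;
  for (const param of tool.function.parameters.required) {
    if (!(param in args)) return `missing required parameter: ${param}`;
  }
  return null;
}

console.log(validateCall("enableCors", { origin: "http://localhost:3000" })); // → null
console.log(validateCall("enableCors", {}));       // → "missing required parameter: origin"
console.log(validateCall("deleteDatabase", {}));   // → "unknown tool: deleteDatabase"
```

Gating the `switch` in `diagnoseAndFix` on a null result from such a check keeps a hallucinated tool name or malformed arguments from reaching real infrastructure.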

### Key Benefits

| Feature | Latency | Memory | Cloud Calls |
|---------|---------|--------|-------------|
| EmbeddingGemma | ~5ms/embed | ~50MB | 0 |
| FunctionGemma | ~50ms/call | ~271MB | 0 |
| Semantic clustering | <1ms | Varies | 0 |
| **Total pipeline** | **~60ms** | **~350MB** | **0** |

- **Zero cloud dependency**: All inference runs locally via LiteRT-LM
- **Sub-100ms latency**: Fast enough for real-time log processing
- **Privacy-preserving**: Error logs never leave the device
- **Continuous improvement**: Use Ax MiPRO to optimize prompts over time

## Limitations

- Optimized for the 7 specific infrastructure tools listed above