metadata
title: Warbler CDA FractalStat RAG
emoji: 🦜
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.0.2
app_file: app.py
pinned: false
license: mit
short_description: RAG system with 8D FractalStat and 100k documents
tags:
  - rag
  - semantic-search
  - retrieval
  - fastapi
  - fractalstat
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/68c705b6fc90bcc7a4f56721/8G2TJJT8enAFaBLJGTXka.png

Warbler CDA - Cognitive Development Architecture RAG System

License: MIT | Python 3.11+ | FastAPI | Docker

A production-ready RAG (Retrieval-Augmented Generation) system with FractalStat multi-dimensional addressing for intelligent document retrieval, semantic memory, and automatic data ingestion.

🌟 Features

Core RAG System

  • Semantic Anchors: Persistent memory with provenance tracking
  • Hierarchical Summarization: Micro/macro distillation for efficient compression
  • Conflict Detection: Automatic detection and resolution of contradictory information
  • Memory Pooling: Performance-optimized object pooling for high-throughput scenarios

FractalStat Multi-Dimensional Addressing

  • 8-Dimensional Coordinates: Realm, Lineage, Adjacency, Horizon, Luminosity, Polarity, Dimensionality, Alignment
  • Hybrid Scoring: Combines semantic similarity with FractalStat resonance to improve retrieval, particularly for technical queries
  • Entanglement Detection: Identifies relationships across dimensional space
  • Validated System: Comprehensive experiments (EXP-01 through EXP-10) validate uniqueness, efficiency, and narrative preservation

Production-Ready API

  • FastAPI Service: High-performance async API with concurrent query support
  • CLI Tools: Command-line interface for queries, ingestion, and management
  • HuggingFace Integration: Direct ingestion from HF datasets
  • Docker Support: Containerized deployment ready

📚 Data Sources

The Warbler system draws its documents from carefully curated, MIT-licensed datasets on HuggingFace:

Original Warbler Packs

  • warbler-pack-core - Core narrative and reasoning patterns
  • warbler-pack-wisdom-scrolls - Philosophical and wisdom-based content
  • warbler-pack-faction-politics - Political and faction dynamics

HuggingFace Datasets

  • arXiv Papers (nick007x/arxiv-papers) - 2.5M+ scholarly papers covering scientific domains
    • Due to storage limits on HuggingFace Spaces, we ingest only 100k of these documents.
  • Prompt Engineering Report (PromptSystematicReview/ThePromptReport) - 83 comprehensive prompt documentation entries
    • Currently unavailable for the same reason.
  • Generated Novels (GOAT-AI/generated-novels) - 20 narrative-rich novels for storytelling patterns
    • Currently unavailable for the same reason.
  • Technical Manuals (nlasso/anac-manuals-23) - 52 procedural and operational documents
    • Currently unavailable for the same reason.
  • ChatEnv Enterprise (SustcZhangYX/ChatEnv) - 112K+ software development conversations
    • Currently unavailable for the same reason.
  • Portuguese Education (Solshine/Portuguese_Language_Education_Texts) - 21 multilingual educational texts
    • Currently unavailable for the same reason.
  • Educational Stories (MU-NLPC/Edustories-en) - 1.5K+ case studies and learning narratives

All datasets are provided under MIT or compatible licenses. For complete attribution, see the HuggingFace Hub pages listed above.

📦 Installation

From Source (Current Method)

git clone https://github.com/tiny-walnut-games/the-seed.git
cd the-seed/warbler-cda-package
pip install -e .

Optional Dependencies

# OpenAI embeddings integration
pip install openai

# Development tools
pip install pytest pytest-cov

🚀 Quick Start

Option 1: Direct Python (Easiest)

cd warbler-cda-package

# Start the API with automatic pack loading
# On Windows (PowerShell):
./run_api.ps1

# Or on Linux/Mac:
python start_server.py

The API automatically loads all Warbler packs on startup and serves them at http://localhost:8000.

Option 2: Docker Compose

cd warbler-cda-package
docker-compose up --build

Option 3: Kubernetes

cd warbler-cda-package/k8s
./demo-docker-k8s.sh  # Full auto-deploy

📑 API Usage Examples

Using the REST API

# Start the API first: ./run_api.ps1 (Windows) or python start_server.py (Linux/Mac)
# Then test with:

# Health check
curl http://localhost:8000/health

# Semantic search (plain English queries)
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query_id": "semantic1",
    "semantic_query": "dancing under the moon",
    "max_results": 5
  }'

# FractalStat hybrid search (technical/science with dimensional awareness)
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query_id": "hybrid1",
    "semantic_query": "interplanetary approach maneuvers",
    "fractalstat_hybrid": true,
    "max_results": 5
  }'

# Get metrics
curl http://localhost:8000/metrics

Understanding Search Modes

The system provides two search approaches with automatic fallback:

Semantic Search (Default)

  • Use for: Plain English queries, casual search, general questions
  • Behavior: Pure semantic similarity matching
  • Examples: "How does gravity work?", "tell me about dancing", "operating a spaceship"
  • Results: Always returns matches when available, best for natural language

FractalStat Hybrid Search

  • Use for: Technical/scientific queries, specific terminology, multi-dimensional search
  • Behavior: Combines semantic similarity with 8D FractalStat resonance
  • Examples: "rotation dynamics of Saturn's moons", "quantum chromodynamics", "interplanetary approach maneuvers"
  • Results: Superior for technical content, may filter out general results
  • Fallback: Automatically switches to semantic search if hybrid returns no results

Pro Tip: When no hybrid result clears the 0.3 relevance threshold, the system automatically falls back to semantic search, so you still get results whenever semantically similar documents exist.
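The fallback behaviour described above can be sketched client-side as follows. This is an illustration, not the service's actual code: `search_with_fallback`, `Result`, and the stand-in search functions are hypothetical, and the sketch assumes the 0.3 threshold is applied to each hybrid result's relevance score.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical result type; the real API returns JSON objects with fields
# such as "relevance_score" and "content".
@dataclass
class Result:
    content: str
    relevance_score: float


HYBRID_THRESHOLD = 0.3  # threshold value quoted in this README


def search_with_fallback(
    query: str,
    hybrid_search: Callable[[str], List[Result]],
    semantic_search: Callable[[str], List[Result]],
    threshold: float = HYBRID_THRESHOLD,
) -> List[Result]:
    """Try hybrid search first; fall back to semantic search if nothing clears the threshold."""
    hybrid = [r for r in hybrid_search(query) if r.relevance_score >= threshold]
    if hybrid:
        return hybrid
    # No hybrid result cleared the threshold: use plain semantic similarity instead.
    return semantic_search(query)


# Stand-in search functions for demonstration only.
def fake_hybrid(q: str) -> List[Result]:
    return [Result("weak technical match", 0.12)]


def fake_semantic(q: str) -> List[Result]:
    return [Result("moonlit dance scene", 0.81)]


results = search_with_fallback("dancing under the moon", fake_hybrid, fake_semantic)
print(results[0].content)  # moonlit dance scene
```

Passing the two search strategies in as callables keeps the fallback rule independent of how each search is actually performed.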

Using Python Programmatically

import requests

BASE_URL = "http://localhost:8000"

# Health check
response = requests.get(f"{BASE_URL}/health", timeout=10)
response.raise_for_status()
print(f"API Status: {response.json()['status']}")

# Query with FractalStat hybrid scoring enabled
query_data = {
    "query_id": "python_test",
    "semantic_query": "rotation dynamics of Saturn's moons",
    "max_results": 5,
    "fractalstat_hybrid": True
}

response = requests.post(f"{BASE_URL}/query", json=query_data, timeout=30)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
results = response.json()
print(f"Found {len(results['results'])} results")

# Show top result
if results['results']:
    top_result = results['results'][0]
    print(f"Top score: {top_result['relevance_score']:.3f}")
    print(f"Content: {top_result['content'][:100]}...")

FractalStat Hybrid Scoring

from warbler_cda import FractalStatRAGBridge, RetrievalMode, RetrievalQuery
from warbler_cda.retrieval_api import RetrievalAPI  # module path taken from the Architecture tree

# semantic_anchors and embedding_provider are assumed to be created beforehand
# (see semantic_anchors.py and the embeddings/ package)

# Enable FractalStat hybrid scoring
fractalstat_bridge = FractalStatRAGBridge()
api = RetrievalAPI(
    semantic_anchors=semantic_anchors,
    embedding_provider=embedding_provider,
    fractalstat_bridge=fractalstat_bridge,
    config={"enable_fractalstat_hybrid": True}
)

# Query with hybrid scoring (60% semantic, 40% FractalStat)
query = RetrievalQuery(
    query_id="hybrid_query_1",
    mode=RetrievalMode.SEMANTIC_SIMILARITY,
    semantic_query="Find wisdom about resilience",
    fractalstat_hybrid=True,
    weight_semantic=0.6,
    weight_fractalstat=0.4
)

assembly = api.retrieve_context(query)
print(f"Found {len(assembly.results)} results with quality {assembly.assembly_quality:.3f}")

Running the API Service

# Start the FastAPI service
uvicorn warbler_cda.api.service:app --host 0.0.0.0 --port 8000

# Or use the CLI
warbler-api --port 8000

Using the CLI

# Query the API
warbler-cli query --query-id q1 --semantic "wisdom about courage" --max-results 10

# Enable hybrid scoring
warbler-cli query --query-id q2 --semantic "narrative patterns" --hybrid

# Bulk concurrent queries
warbler-cli bulk --num-queries 10 --concurrency 5 --hybrid

# Check metrics
warbler-cli metrics

📊 FractalStat Experiments

The system includes validated experiments demonstrating:

  • EXP-01: Address uniqueness (0% collision rate across 10K+ entities)
  • EXP-02: Retrieval efficiency (sub-millisecond at 100K scale)
  • EXP-03: Dimension necessity (all 7 dimensions required)
  • EXP-10: Narrative preservation under concurrent load

from warbler_cda import run_all_experiments

# Run validation experiments
results = run_all_experiments(
    exp01_samples=1000,
    exp01_iterations=10,
    exp02_queries=1000,
    exp03_samples=1000
)

print(f"EXP-01 Success: {results['EXP-01']['success']}")
print(f"EXP-02 Success: {results['EXP-02']['success']}")
print(f"EXP-03 Success: {results['EXP-03']['success']}")
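The README reports collision rates without showing how they are measured. As an illustration only, the sketch below computes an EXP-01-style collision rate under an assumed scheme: each entity's coordinates are serialized to canonical JSON and hashed with SHA-256. `address_of`, `random_coords`, and `collision_rate` are hypothetical names, and the real experiment may derive addresses differently.

```python
import hashlib
import json
import random


def address_of(coords: dict) -> str:
    """Assumed addressing scheme: SHA-256 of a canonical JSON encoding of the coordinates."""
    canonical = json.dumps(coords, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def random_coords(rng: random.Random) -> dict:
    # Hypothetical generator loosely following the dimensions under Technical Details.
    return {
        "realm": rng.choice(["knowledge", "narrative", "system"]),
        "lineage": rng.randint(0, 100),
        "adjacency": round(rng.random(), 3),
        "horizon": rng.choice(["logline", "outline", "scene", "panel"]),
        "luminosity": round(rng.random(), 3),
        "polarity": round(rng.random(), 3),
        "dimensionality": rng.randint(1, 7),
    }


def collision_rate(n: int, seed: int = 0) -> float:
    """Fraction of distinct coordinate sets that share an address with another one."""
    rng = random.Random(seed)
    coords = [random_coords(rng) for _ in range(n)]
    # Identical coordinates legitimately share an address, so count distinct inputs.
    unique_coords = {json.dumps(c, sort_keys=True) for c in coords}
    addresses = {address_of(c) for c in coords}
    return (len(unique_coords) - len(addresses)) / n


print(f"collision rate: {collision_rate(10_000):.4%}")
```

Canonical JSON (sorted keys, fixed separators) ensures that the same coordinates always hash to the same address regardless of dictionary ordering.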

🎯 Use Cases

1. Intelligent Document Retrieval

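# `api` is the RetrievalAPI instance from the hybrid scoring example above;
# `documents` is your own list of dicts with "id" and "text" keys
# Add documents from various sources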
for doc in documents:
    api.add_document(
        doc_id=doc["id"],
        content=doc["text"],
        metadata={
            "realm_type": "knowledge",
            "realm_label": "technical_docs",
            "lifecycle_stage": "emergence"
        }
    )

# Retrieve with context awareness
results = api.query_semantic_anchors("How to optimize performance?")

2. Narrative Coherence Analysis

from warbler_cda import ConflictDetector

conflict_detector = ConflictDetector(embedding_provider=embedding_provider)

# Process statements
statements = [
    {"id": "s1", "text": "The system is fast"},
    {"id": "s2", "text": "The system is slow"}
]

report = conflict_detector.process_statements(statements)
print(f"Conflicts detected: {report['conflict_summary']}")

3. HuggingFace Dataset Ingestion

from warbler_cda.utils import HFWarblerIngestor

ingestor = HFWarblerIngestor()

# Transform HF dataset to Warbler format
docs = ingestor.transform_npc_dialogue("amaydle/npc-dialogue")

# Create pack
pack_path = ingestor.create_warbler_pack(docs, "warbler-pack-npc-dialogue")

πŸ—οΈ Architecture

warbler_cda/
β”œβ”€β”€ retrieval_api.py          # Main RAG API
β”œβ”€β”€ semantic_anchors.py        # Semantic memory system
β”œβ”€β”€ anchor_data_classes.py     # Core data structures
β”œβ”€β”€ anchor_memory_pool.py      # Performance optimization
β”œβ”€β”€ summarization_ladder.py    # Hierarchical compression
β”œβ”€β”€ conflict_detector.py       # Conflict detection
β”œβ”€β”€ castle_graph.py            # Concept extraction
β”œβ”€β”€ melt_layer.py              # Memory consolidation
β”œβ”€β”€ evaporation.py             # Content distillation
β”œβ”€β”€ fractalstat_rag_bridge.py        # FractalStat hybrid scoring
β”œβ”€β”€ fractalstat_entity.py            # FractalStat entity system
β”œβ”€β”€ fractalstat_experiments.py       # Validation experiments
β”œβ”€β”€ embeddings/                # Embedding providers
β”‚   β”œβ”€β”€ base_provider.py
β”‚   β”œβ”€β”€ local_provider.py
β”‚   β”œβ”€β”€ openai_provider.py
β”‚   └── factory.py
β”œβ”€β”€ api/                       # Production API
β”‚   β”œβ”€β”€ service.py             # FastAPI service
β”‚   └── cli.py                 # CLI interface
└── utils/                     # Utilities
    β”œβ”€β”€ load_warbler_packs.py
    └── hf_warbler_ingest.py

🔬 Technical Details

FractalStat Dimensions

  1. Realm: Domain classification (type + label)
  2. Lineage: Generation/version number
  3. Adjacency: Graph connectivity (0.0-1.0)
  4. Horizon: Lifecycle stage (logline, outline, scene, panel)
  5. Luminosity: Clarity/activity level (0.0-1.0)
  6. Polarity: Resonance/tension (0.0-1.0)
  7. Dimensionality: Complexity/thread count (1-7)
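The dimension list above can be expressed as a small validated data structure. The `FractalStatCoordinates` class below is a hypothetical sketch, not the package's `fractalstat_entity.py` API. It enforces only the ranges stated in the list, and it omits Alignment (named under Features) because no range or type is given for it here.

```python
from dataclasses import dataclass

HORIZON_STAGES = ("logline", "outline", "scene", "panel")


@dataclass(frozen=True)
class FractalStatCoordinates:
    """Hypothetical container for the seven dimensions listed above."""
    realm_type: str       # Realm: type half of the domain classification
    realm_label: str      # Realm: label half of the domain classification
    lineage: int          # generation/version number
    adjacency: float      # graph connectivity, 0.0-1.0
    horizon: str          # lifecycle stage
    luminosity: float     # clarity/activity level, 0.0-1.0
    polarity: float       # resonance/tension, 0.0-1.0
    dimensionality: int   # complexity/thread count, 1-7

    def __post_init__(self) -> None:
        # Enforce the value ranges stated in the dimension list.
        for name in ("adjacency", "luminosity", "polarity"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0.0, 1.0], got {value}")
        if self.horizon not in HORIZON_STAGES:
            raise ValueError(f"horizon must be one of {HORIZON_STAGES}, got {self.horizon!r}")
        if not 1 <= self.dimensionality <= 7:
            raise ValueError(f"dimensionality must be between 1 and 7, got {self.dimensionality}")


coords = FractalStatCoordinates(
    realm_type="knowledge",
    realm_label="technical_docs",
    lineage=1,
    adjacency=0.5,
    horizon="scene",
    luminosity=0.8,
    polarity=0.3,
    dimensionality=3,
)
print(coords)
```

Making the dataclass frozen means a coordinate cannot drift out of range after construction, because validation runs only once in `__post_init__`.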

Hybrid Scoring Formula

hybrid_score = (weight_semantic × semantic_similarity) + (weight_fractalstat × fractalstat_resonance)

Where:

  • semantic_similarity: Cosine similarity of embeddings
  • fractalstat_resonance: Multi-dimensional alignment score
  • Default weights: 60% semantic, 40% FractalStat
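A minimal sketch of the formula above follows. It assumes embeddings are plain lists of floats and that the FractalStat resonance has already been computed as a single number. The names `cosine_similarity` and `hybrid_score` are illustrative, not the package API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(
    query_embedding: Sequence[float],
    doc_embedding: Sequence[float],
    fractalstat_resonance: float,
    weight_semantic: float = 0.6,     # README default: 60% semantic
    weight_fractalstat: float = 0.4,  # README default: 40% FractalStat
) -> float:
    """Weighted sum of semantic similarity and FractalStat resonance."""
    semantic_similarity = cosine_similarity(query_embedding, doc_embedding)
    return weight_semantic * semantic_similarity + weight_fractalstat * fractalstat_resonance


score = hybrid_score([1.0, 0.0], [1.0, 0.0], fractalstat_resonance=0.5)
print(round(score, 2))  # 0.8
```

Because the default weights sum to 1, the hybrid score is a weighted average of the two signals, so shifting weight toward one signal never inflates the overall scale.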

📚 Documentation

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

📄 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

  • Built on research from The Seed project
  • FractalStat addressing system inspired by multi-dimensional data structures
  • Semantic anchoring based on cognitive architecture principles

📞 Contact


Made with ❤️ by Tiny Walnut Games