---
title: Warbler CDA FractalStat RAG
emoji: π¦
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.0.2
app_file: app.py
pinned: false
license: mit
short_description: RAG system with 8D FractalStat and 100k documents
tags:
- rag
- semantic-search
- retrieval
- fastapi
- fractalstat
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/68c705b6fc90bcc7a4f56721/8G2TJJT8enAFaBLJGTXka.png
---
Warbler CDA - Cognitive Development Architecture RAG System
A production-ready RAG (Retrieval-Augmented Generation) system with FractalStat multi-dimensional addressing for intelligent document retrieval, semantic memory, and automatic data ingestion.
Features
Core RAG System
- Semantic Anchors: Persistent memory with provenance tracking
- Hierarchical Summarization: Micro/macro distillation for efficient compression
- Conflict Detection: Automatic detection and resolution of contradictory information
- Memory Pooling: Performance-optimized object pooling for high-throughput scenarios
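Object pooling, in general, avoids repeated allocation by handing out previously released objects instead of building new ones. The snippet below is a generic illustration of that technique only; it is not the code in `anchor_memory_pool.py`, and the `ObjectPool` class, its `factory`/`reset`/`max_size` parameters, and the dict-based records are all hypothetical.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class ObjectPool(Generic[T]):
    """Hypothetical minimal pool: reuse released objects instead of allocating new ones."""

    def __init__(self, factory: Callable[[], T], reset: Callable[[T], None], max_size: int = 64):
        self._factory = factory      # builds a fresh object when no idle one is available
        self._reset = reset          # clears an object's state before it is reused
        self._max_size = max_size    # cap on how many idle objects are retained
        self._free: List[T] = []
        self.created = 0             # number of objects allocated via the factory

    def acquire(self) -> T:
        if self._free:
            return self._free.pop()  # hand back a recycled object
        self.created += 1
        return self._factory()

    def release(self, obj: T) -> None:
        self._reset(obj)
        if len(self._free) < self._max_size:
            self._free.append(obj)   # keep for reuse; otherwise let it be garbage-collected


# Usage: pool plain dicts standing in for anchor records (hypothetical shape)
pool = ObjectPool(factory=dict, reset=dict.clear, max_size=8)
a = pool.acquire()
a["text"] = "example"
pool.release(a)
b = pool.acquire()  # reuses the same dict, now cleared
```

The `max_size` cap keeps a burst of releases from pinning unbounded memory, which is the usual trade-off in pool design.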
FractalStat Multi-Dimensional Addressing
- 8-Dimensional Coordinates: Realm, Lineage, Adjacency, Horizon, Luminosity, Polarity, Dimensionality, Alignment
- Hybrid Scoring: Combines embedding similarity with FractalStat resonance in a weighted score (60% semantic, 40% FractalStat by default)
- Entanglement Detection: Identifies relationships across dimensional space
- Validated System: Comprehensive experiments (EXP-01 through EXP-10) validate uniqueness, efficiency, and narrative preservation
Production-Ready API
- FastAPI Service: High-performance async API with concurrent query support
- CLI Tools: Command-line interface for queries, ingestion, and management
- HuggingFace Integration: Direct ingestion from HF datasets
- Docker Support: Containerized deployment ready
Data Sources
The Warbler system retrieves from curated datasets hosted on HuggingFace, licensed under MIT or compatible licenses:
Original Warbler Packs
- warbler-pack-core - Core narrative and reasoning patterns
- warbler-pack-wisdom-scrolls - Philosophical and wisdom-based content
- warbler-pack-faction-politics - Political and faction dynamics
HuggingFace Datasets
- arXiv Papers (nick007x/arxiv-papers) - 2.5M+ scholarly papers covering scientific domains. Due to space limits, only 100k of these documents are ingested for use on HuggingFace Spaces.
- Prompt Engineering Report (PromptSystematicReview/ThePromptReport) - 83 comprehensive prompt documentation entries. Currently unavailable for the same reason.
- Generated Novels (GOAT-AI/generated-novels) - 20 narrative-rich novels for storytelling patterns. Currently unavailable for the same reason.
- Technical Manuals (nlasso/anac-manuals-23) - 52 procedural and operational documents. Currently unavailable for the same reason.
- ChatEnv Enterprise (SustcZhangYX/ChatEnv) - 112K+ software development conversations. Currently unavailable for the same reason.
- Portuguese Education (Solshine/Portuguese_Language_Education_Texts) - 21 multilingual educational texts. Currently unavailable for the same reason.
- Educational Stories (MU-NLPC/Edustories-en) - 1.5K+ case studies and learning narratives
All datasets are provided under MIT or compatible licenses. For complete attribution, see the HuggingFace Hub pages listed above.
Installation
From Source (Current Method)
```bash
git clone https://github.com/tiny-walnut-games/the-seed.git
cd the-seed/warbler-cda-package
pip install -e .
```
Optional Dependencies
```bash
# OpenAI embeddings integration
pip install openai
# Development tools
pip install pytest pytest-cov
```
Quick Start
Option 1: Direct Python (Easiest)
```bash
cd warbler-cda-package
# Windows (PowerShell): start the API with automatic pack loading
./run_api.ps1
# Or on Linux/Mac:
python start_server.py
```
The API automatically loads all Warbler packs on startup and serves them at http://localhost:8000
Option 2: Docker Compose
```bash
cd warbler-cda-package
docker-compose up --build
```
Option 3: Kubernetes
```bash
cd warbler-cda-package/k8s
./demo-docker-k8s.sh  # Full auto-deploy
```
API Usage Examples
Using the REST API
```bash
# Start the API first: ./run_api.ps1
# Then test with:

# Health check
curl http://localhost:8000/health

# Semantic search (plain English queries)
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query_id": "semantic1",
    "semantic_query": "dancing under the moon",
    "max_results": 5
  }'

# FractalStat hybrid search (technical/science with dimensional awareness)
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query_id": "hybrid1",
    "semantic_query": "interplanetary approach maneuvers",
    "fractalstat_hybrid": true,
    "max_results": 5
  }'

# Get metrics
curl http://localhost:8000/metrics
```
Understanding Search Modes
The system provides two search approaches with intelligent fallback:
Semantic Search (Default)
- Use for: Plain English queries, casual search, general questions
- Behavior: Pure semantic similarity matching
- Examples: "How does gravity work?", "tell me about dancing", "operating a spaceship"
- Results: Always returns matches when available, best for natural language
FractalStat Hybrid Search
- Use for: Technical/scientific queries, specific terminology, multi-dimensional search
- Behavior: Combines semantic similarity with 8D FractalStat resonance
- Examples: "rotation dynamics of Saturn's moons", "quantum chromodynamics", "interplanetary approach maneuvers"
- Results: Best suited to technical content; may filter out general-purpose results
- Fallback: Automatically switches to semantic search if hybrid returns no results
Pro Tip: When hybrid search fails (results score below the 0.3 threshold), the system automatically falls back to semantic search, so a query still returns semantic matches whenever any exist.
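The fallback rule in the Pro Tip can be sketched as follows. This is an illustration under assumptions, not the service's code: it assumes the 0.3 threshold is applied to hybrid scores, and `search_with_fallback`, `fake_hybrid`, and `fake_semantic` are hypothetical stand-ins.

```python
from typing import Callable, List, Tuple

Result = Tuple[str, float]  # (document id, score); hypothetical result shape
HYBRID_THRESHOLD = 0.3       # threshold value mentioned in the Pro Tip


def search_with_fallback(query: str,
                         hybrid_search: Callable[[str], List[Result]],
                         semantic_search: Callable[[str], List[Result]],
                         threshold: float = HYBRID_THRESHOLD) -> Tuple[str, List[Result]]:
    """Try hybrid search first; fall back to semantic search if nothing clears the threshold."""
    hits = [r for r in hybrid_search(query) if r[1] >= threshold]
    if hits:
        return "hybrid", hits
    return "semantic", semantic_search(query)


# Stand-in search functions for illustration only.
def fake_hybrid(q: str) -> List[Result]:
    return [("doc-1", 0.12)]  # below the threshold, so it is discarded


def fake_semantic(q: str) -> List[Result]:
    return [("doc-7", 0.81), ("doc-3", 0.64)]


mode, results = search_with_fallback("tell me about dancing", fake_hybrid, fake_semantic)
```

Returning the mode alongside the results lets a caller report which path actually answered the query.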
Using Python Programmatically
```python
import requests

# Health check
response = requests.get("http://localhost:8000/health")
print(f"API Status: {response.json()['status']}")

# Query
query_data = {
    "query_id": "python_test",
    "semantic_query": "rotation dynamics of Saturn's moons",
    "max_results": 5,
    "fractalstat_hybrid": True,
}
results = requests.post("http://localhost:8000/query", json=query_data).json()
print(f"Found {len(results['results'])} results")

# Show top result
if results["results"]:
    top_result = results["results"][0]
    print(f"Top score: {top_result['relevance_score']:.3f}")
    print(f"Content: {top_result['content'][:100]}...")
```
FractalStat Hybrid Scoring
```python
from warbler_cda import FractalStatRAGBridge, RetrievalMode, RetrievalQuery
from warbler_cda.retrieval_api import RetrievalAPI

# `semantic_anchors` and `embedding_provider` are assumed to be constructed beforehand.

# Enable FractalStat hybrid scoring
fractalstat_bridge = FractalStatRAGBridge()
api = RetrievalAPI(
    semantic_anchors=semantic_anchors,
    embedding_provider=embedding_provider,
    fractalstat_bridge=fractalstat_bridge,
    config={"enable_fractalstat_hybrid": True},
)

# Query with hybrid scoring (60% semantic, 40% FractalStat)
query = RetrievalQuery(
    query_id="hybrid_query_1",
    mode=RetrievalMode.SEMANTIC_SIMILARITY,
    semantic_query="Find wisdom about resilience",
    fractalstat_hybrid=True,
    weight_semantic=0.6,
    weight_fractalstat=0.4,
)
assembly = api.retrieve_context(query)
print(f"Found {len(assembly.results)} results with quality {assembly.assembly_quality:.3f}")
```
Running the API Service
```bash
# Start the FastAPI service
uvicorn warbler_cda.api.service:app --host 0.0.0.0 --port 8000
# Or use the CLI
warbler-api --port 8000
```
Using the CLI
```bash
# Query the API
warbler-cli query --query-id q1 --semantic "wisdom about courage" --max-results 10
# Enable hybrid scoring
warbler-cli query --query-id q2 --semantic "narrative patterns" --hybrid
# Bulk concurrent queries
warbler-cli bulk --num-queries 10 --concurrency 5 --hybrid
# Check metrics
warbler-cli metrics
```
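The `bulk` command's `--num-queries` / `--concurrency` pair describes bounded-concurrency querying. A sketch of that general pattern with an `asyncio.Semaphore` follows; it assumes nothing about how `warbler-cli` actually implements it, and `run_bulk` and `fake_query` are hypothetical (a real query function would POST to `/query`).

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_bulk(query_fn: Callable[[int], Awaitable[dict]], num_queries: int = 10,
                   concurrency: int = 5) -> List[dict]:
    """Run num_queries calls to query_fn with at most `concurrency` in flight at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def one(i: int) -> dict:
        async with semaphore:  # waits while `concurrency` queries are already running
            return await query_fn(i)

    # gather returns results in submission order, regardless of completion order
    return await asyncio.gather(*(one(i) for i in range(num_queries)))


# Stand-in query function that records how many calls overlap.
in_flight = 0
peak = 0


async def fake_query(i: int) -> dict:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulate network latency
    in_flight -= 1
    return {"query_id": f"bulk_{i}", "results": []}


results = asyncio.run(run_bulk(fake_query, num_queries=10, concurrency=5))
```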
FractalStat Experiments
The system includes validated experiments demonstrating:
- EXP-01: Address uniqueness (0% collision rate across 10K+ entities)
- EXP-02: Retrieval efficiency (sub-millisecond at 100K scale)
- EXP-03: Dimension necessity (all 7 dimensions required)
- EXP-10: Narrative preservation under concurrent load
```python
from warbler_cda import run_all_experiments

# Run validation experiments
results = run_all_experiments(
    exp01_samples=1000,
    exp01_iterations=10,
    exp02_queries=1000,
    exp03_samples=1000,
)
print(f"EXP-01 Success: {results['EXP-01']['success']}")
print(f"EXP-02 Success: {results['EXP-02']['success']}")
print(f"EXP-03 Success: {results['EXP-03']['success']}")
```
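EXP-01 reports a collision rate over generated addresses. One simple way to measure such a rate is sketched below, assuming addresses are derived by hashing canonical JSON of the coordinates; FractalStat's actual address derivation may differ, and `address_of`, `collision_rate`, and the three-field coordinate sample are hypothetical.

```python
import hashlib
import json
import random
from typing import Dict, List


def address_of(coords: Dict[str, object]) -> str:
    """Derive a deterministic address by hashing canonical (key-sorted) JSON."""
    canonical = json.dumps(coords, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def collision_rate(entities: List[Dict[str, object]]) -> float:
    """Fraction of entities whose address was already produced by an earlier entity."""
    seen = set()
    collisions = 0
    for coords in entities:
        addr = address_of(coords)
        if addr in seen:
            collisions += 1
        seen.add(addr)
    return collisions / len(entities) if entities else 0.0


rng = random.Random(42)  # fixed seed so the sample is reproducible
sample = [
    {"realm": "knowledge", "lineage": i, "luminosity": round(rng.random(), 3)}
    for i in range(10_000)
]
print(collision_rate(sample))  # distinct lineage values, so this prints 0.0
```

Sorting keys before hashing matters: without it, two dicts with the same coordinates inserted in different orders could produce different addresses.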
Use Cases
1. Intelligent Document Retrieval
```python
# Assumes `api` is the RetrievalAPI instance created earlier and `documents`
# is an iterable of dicts with "id" and "text" keys.

# Add documents from various sources
for doc in documents:
    api.add_document(
        doc_id=doc["id"],
        content=doc["text"],
        metadata={
            "realm_type": "knowledge",
            "realm_label": "technical_docs",
            "lifecycle_stage": "emergence",
        },
    )

# Retrieve with context awareness
results = api.query_semantic_anchors("How to optimize performance?")
```
2. Narrative Coherence Analysis
```python
from warbler_cda import ConflictDetector

# `embedding_provider` is assumed to be constructed beforehand.
conflict_detector = ConflictDetector(embedding_provider=embedding_provider)

# Process two contradictory statements
statements = [
    {"id": "s1", "text": "The system is fast"},
    {"id": "s2", "text": "The system is slow"},
]
report = conflict_detector.process_statements(statements)
print(f"Conflicts detected: {report['conflict_summary']}")
```
3. HuggingFace Dataset Ingestion
```python
from warbler_cda.utils import HFWarblerIngestor

ingestor = HFWarblerIngestor()
# Transform HF dataset to Warbler format
docs = ingestor.transform_npc_dialogue("amaydle/npc-dialogue")
# Create pack
pack_path = ingestor.create_warbler_pack(docs, "warbler-pack-npc-dialogue")
```
Architecture
```text
warbler_cda/
├── retrieval_api.py             # Main RAG API
├── semantic_anchors.py          # Semantic memory system
├── anchor_data_classes.py       # Core data structures
├── anchor_memory_pool.py        # Performance optimization
├── summarization_ladder.py      # Hierarchical compression
├── conflict_detector.py         # Conflict detection
├── castle_graph.py              # Concept extraction
├── melt_layer.py                # Memory consolidation
├── evaporation.py               # Content distillation
├── fractalstat_rag_bridge.py    # FractalStat hybrid scoring
├── fractalstat_entity.py        # FractalStat entity system
├── fractalstat_experiments.py   # Validation experiments
├── embeddings/                  # Embedding providers
│   ├── base_provider.py
│   ├── local_provider.py
│   ├── openai_provider.py
│   └── factory.py
├── api/                         # Production API
│   ├── service.py               # FastAPI service
│   └── cli.py                   # CLI interface
└── utils/                       # Utilities
    ├── load_warbler_packs.py
    └── hf_warbler_ingest.py
```
Technical Details
FractalStat Dimensions
- Realm: Domain classification (type + label)
- Lineage: Generation/version number
- Adjacency: Graph connectivity (0.0-1.0)
- Horizon: Lifecycle stage (logline, outline, scene, panel)
- Luminosity: Clarity/activity level (0.0-1.0)
- Polarity: Resonance/tension (0.0-1.0)
- Dimensionality: Complexity/thread count (1-7)
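The ranges listed above lend themselves to validation at construction time. A sketch of a container for the seven listed dimensions follows; the class and field names are hypothetical (not the package's `fractalstat_entity.py` API), Alignment is left out because its range is not documented here, and the non-negative lineage check is an assumption.

```python
from dataclasses import dataclass

HORIZON_STAGES = ("logline", "outline", "scene", "panel")


@dataclass(frozen=True)
class FractalStatCoordinates:
    """Hypothetical container for the seven dimensions listed above (Alignment omitted)."""
    realm_type: str
    realm_label: str
    lineage: int
    adjacency: float
    horizon: str
    luminosity: float
    polarity: float
    dimensionality: int

    def __post_init__(self) -> None:
        # Documented 0.0-1.0 ranges
        for name in ("adjacency", "luminosity", "polarity"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0.0, 1.0], got {value}")
        if self.horizon not in HORIZON_STAGES:
            raise ValueError(f"horizon must be one of {HORIZON_STAGES}, got {self.horizon!r}")
        if not 1 <= self.dimensionality <= 7:
            raise ValueError(f"dimensionality must be in 1..7, got {self.dimensionality}")
        if self.lineage < 0:  # assumption: generation/version numbers are non-negative
            raise ValueError("lineage must be non-negative")


coords = FractalStatCoordinates("knowledge", "technical_docs", 1, 0.5, "scene", 0.8, 0.3, 3)
```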
Hybrid Scoring Formula
```text
hybrid_score = (weight_semantic × semantic_similarity) + (weight_fractalstat × fractalstat_resonance)
```
Where:
- semantic_similarity: Cosine similarity of embeddings
- fractalstat_resonance: Multi-dimensional alignment score
- Default weights: 60% semantic, 40% FractalStat
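The scoring formula translates directly into code. A minimal sketch follows, assuming plain Python lists as embeddings and a precomputed resonance value; `cosine_similarity` and `hybrid_score` are illustrative helpers rather than the `FractalStatRAGBridge` API, and the resonance value of 0.75 is made up.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(semantic_similarity: float, fractalstat_resonance: float,
                 weight_semantic: float = 0.6, weight_fractalstat: float = 0.4) -> float:
    """Weighted sum of the two signals; defaults are the documented 60/40 split."""
    return weight_semantic * semantic_similarity + weight_fractalstat * fractalstat_resonance


sim = cosine_similarity([3.0, 4.0], [4.0, 3.0])       # 24 / (5 * 5), about 0.96
score = hybrid_score(sim, fractalstat_resonance=0.75)  # 0.6 * 0.96 + 0.4 * 0.75, about 0.876
```

Because the default weights sum to 1.0, the hybrid score stays in the same range as its inputs when both signals are normalized to [0, 1].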
Documentation
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE for details.
Acknowledgments
- Built on research from The Seed project
- FractalStat addressing system inspired by multi-dimensional data structures
- Semantic anchoring based on cognitive architecture principles
Contact
- Project: The Seed
- Issues: GitHub Issues
- Discussions: GitHub Discussions