there-is-already-a-branch

#1 opened by Bellok
This view is limited to 50 files because it contains too many changes. See the raw diff for the remaining files.
Files changed (50)
  1. .gitignore +1 -1
  2. PACKAGE_MANIFEST.md +0 -94
  3. PACKS_DEPLOYMENT.md +0 -281
  4. PACK_CACHING.md +0 -172
  5. PACK_INGESTION_FIX.md +0 -209
  6. PDF_INGESTION_INVESTIGATION.md +0 -325
  7. README_HF.md +1 -1
  8. TESTS_PORTED.md +0 -271
  9. TEST_RESULTS.md +0 -211
  10. TODO.md +0 -30
  11. app.py +51 -15
  12. compress_packs.py +0 -134
  13. convert_to_jsonl.py +0 -37
  14. copy_packs.sh +0 -45
  15. coverage.xml +0 -0
  16. final_fix.py +0 -28
  17. fix_theme.py +0 -15
  18. load_warbler_packs_current.txt +0 -259
  19. package-lock.json +0 -861
  20. package.json +0 -19
  21. packs/warbler-pack-hf-arxiv/package.json +4 -4
  22. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-001_compressed.jsonl +0 -0
  23. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-002_compressed.jsonl +0 -0
  24. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-003_compressed.jsonl +0 -0
  25. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-004_compressed.jsonl +0 -0
  26. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-005_compressed.jsonl +0 -0
  27. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-006_compressed.jsonl +0 -0
  28. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-007_compressed.jsonl +0 -0
  29. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-008_compressed.jsonl +0 -0
  30. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-009_compressed.jsonl +0 -0
  31. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-010_compressed.jsonl +0 -0
  32. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-011_compressed.jsonl +0 -0
  33. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-012_compressed.jsonl +0 -0
  34. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-013_compressed.jsonl +0 -0
  35. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-014_compressed.jsonl +0 -0
  36. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-015_compressed.jsonl +0 -0
  37. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-016_compressed.jsonl +0 -0
  38. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-017_compressed.jsonl +0 -0
  39. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-018_compressed.jsonl +0 -0
  40. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-019_compressed.jsonl +0 -0
  41. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-020_compressed.jsonl +0 -0
  42. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-021_compressed.jsonl +0 -0
  43. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-022_compressed.jsonl +0 -0
  44. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-023_compressed.jsonl +0 -0
  45. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-024_compressed.jsonl +0 -0
  46. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-025_compressed.jsonl +0 -0
  47. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-026_compressed.jsonl +0 -0
  48. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-027_compressed.jsonl +0 -0
  49. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-028_compressed.jsonl +0 -0
  50. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-029_compressed.jsonl +0 -0
.gitignore CHANGED
@@ -47,7 +47,7 @@ results/
 
 # HuggingFace language packs (downloaded on-demand)
 # Exclude all HF packs to keep deployment size under 1GB
-packs/warbler-pack-hf-arxiv/
+packs/warbler-pack-hf-arxiv/*chunk*.jsonl
 packs/warbler-pack-hf-enterprise/
 packs/warbler-pack-hf-edustories/
 packs/warbler-pack-hf-manuals/
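
A quick sanity check of the narrowed pattern (a sketch: `fnmatch` only approximates gitignore glob semantics, and `git check-ignore -v <path>` is the authoritative test):

```python
from fnmatch import fnmatch

# The old rule ignored the whole pack directory; the narrowed rule ignores only
# the chunk JSONL files, so package.json stays tracked (which is why it appears
# in this diff). Paths below are examples from the file list above.
pattern = "packs/warbler-pack-hf-arxiv/*chunk*.jsonl"
for path in (
    "packs/warbler-pack-hf-arxiv/package.json",
    "packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-001_compressed.jsonl",
):
    print(f"{path}: {'ignored' if fnmatch(path, pattern) else 'tracked'}")
```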
PACKAGE_MANIFEST.md DELETED
@@ -1,94 +0,0 @@
# Warbler CDA Package - Complete File List

## Package Structure (21 core files + infrastructure)

### Core RAG System (9 files)

✓ warbler_cda/retrieval_api.py - Main RAG API with hybrid scoring
✓ warbler_cda/semantic_anchors.py - Semantic memory with provenance
✓ warbler_cda/anchor_data_classes.py - Core data structures
✓ warbler_cda/anchor_memory_pool.py - Performance optimization
✓ warbler_cda/summarization_ladder.py - Hierarchical compression
✓ warbler_cda/conflict_detector.py - Conflict detection
✓ warbler_cda/castle_graph.py - Concept extraction
✓ warbler_cda/melt_layer.py - Memory consolidation
✓ warbler_cda/evaporation.py - Content distillation

### FractalStat System (4 files)

✓ warbler_cda/fractalstat_rag_bridge.py - FractalStat hybrid scoring bridge
✓ warbler_cda/fractalstat_entity.py - FractalStat entity system
✓ warbler_cda/fractalstat_experiments.py - Validation experiments
✓ warbler_cda/fractalstat_visualization.py - Visualization tools

### Embeddings (4 files)

✓ warbler_cda/embeddings/__init__.py
✓ warbler_cda/embeddings/base_provider.py - Abstract interface
✓ warbler_cda/embeddings/factory.py - Provider factory
✓ warbler_cda/embeddings/local_provider.py - Local TF-IDF embeddings
✓ warbler_cda/embeddings/openai_provider.py - OpenAI embeddings

### Production API (2 files)

✓ warbler_cda/api/__init__.py
✓ warbler_cda/api/service.py - FastAPI service (exp09_api_service.py)
✓ warbler_cda/api/cli.py - CLI interface (exp09_cli.py)

### Utilities (2 files)

✓ warbler_cda/utils/__init__.py
✓ warbler_cda/utils/load_warbler_packs.py - Pack loader
✓ warbler_cda/utils/hf_warbler_ingest.py - HF dataset ingestion

### Infrastructure Files

✓ warbler_cda/__init__.py - Package initialization
✓ requirements.txt - Dependencies
✓ pyproject.toml - Package metadata
✓ README.md - Documentation
✓ app.py - Gradio demo for HuggingFace
✓ .gitignore - Git exclusions
✓ LICENSE - MIT License
✓ DEPLOYMENT.md - Deployment guide
✓ README_HF.md - HuggingFace Space config
✓ setup.sh - Quick setup script
✓ transform_imports.sh - Import transformation script

## Total Files: 32 files

## Import Transformations Applied

All imports have been transformed from:

- `from seed.engine.X import Y` → `from warbler_cda.X import Y`
- `from .X import Y` → `from warbler_cda.X import Y`

Privacy hooks have been removed (not needed for HuggingFace deployment).

## Size Estimate

Total package size: ~500KB (source code only)
With dependencies: ~2GB (includes PyTorch, Transformers, etc.)

## Next Steps

1. Test the package locally:

   ```bash
   cd warbler-cda-package
   ./setup.sh
   python app.py
   ```

2. Deploy to HuggingFace:
   - Set HF_TOKEN in GitLab CI/CD variables
   - Push to main or create a tag
   - Pipeline will auto-sync to HuggingFace Space

3. Publish to PyPI (optional):

   ```bash
   python -m build
   twine upload dist/*
   ```
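
A minimal Python sketch of the two import rewrites listed above (the actual transformation was performed by transform_imports.sh; this equivalent is illustrative only):

```python
import re

def transform_imports(source: str) -> str:
    """Rewrite seed.engine.* and relative imports to the warbler_cda namespace."""
    source = re.sub(r"from seed\.engine\.(\w+) import", r"from warbler_cda.\1 import", source)
    source = re.sub(r"from \.(\w+) import", r"from warbler_cda.\1 import", source)
    return source

print(transform_imports("from seed.engine.castle_graph import CastleGraph"))
# from warbler_cda.castle_graph import CastleGraph
```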
PACKS_DEPLOYMENT.md DELETED
@@ -1,281 +0,0 @@
# Warbler Packs Deployment Guide

This guide explains how Warbler packs are loaded and deployed to HuggingFace Spaces.

## Overview

The Warbler CDA Space automatically discovers and ingests content packs at startup. Packs contain conversation templates, NPC dialogues, wisdom templates, and other domain-specific content for the RAG system.

## Pack Structure

```none
packs/
├── warbler-pack-core/              # Essential conversation templates
├── warbler-pack-faction-politics/  # Political dialogue templates
├── warbler-pack-wisdom-scrolls/    # Development wisdom generation
└── warbler-pack-hf-npc-dialogue/   # 1,900+ NPC dialogues from HuggingFace
```

## Deployment Process

### 1. Local Development

Copy packs from the main repository to warbler-cda-package:

```bash
cd warbler-cda-package
bash copy_packs.sh
```

This script copies all packs from:

```path
../packages/com.twg.the-seed/The Living Dev Agent/packs/
```

To:

```path
./packs/
```

### 2. Automatic Loading

When `app.py` starts, it:

1. **Initializes PackLoader**

   ```python
   pack_loader = PackLoader()
   ```

2. **Discovers documents from all packs**

   ```python
   pack_docs = pack_loader.discover_documents()
   ```

3. **Ingests documents into RetrievalAPI**

   ```python
   for doc in pack_docs:
       api.add_document(doc["id"], doc["content"], doc["metadata"])
   ```

4. **Falls back to sample documents** if packs not found
   - Ensures demo works even without packs
   - Provides example data for testing

### 3. HuggingFace Space Deployment

The `.gitlab-ci.yml` handles deployment:

```bash
hf upload-large-folder $SPACE_NAME . --repo-type=space --space-sdk=gradio
```

This uploads:

- All Python source code
- All packs in the `packs/` directory
- Configuration files

**Important**: The `packs/` directory must exist and contain pack data before deployment.

## Pack Loader Details

The `PackLoader` class (`warbler_cda/pack_loader.py`) handles:

### Pack Discovery

- Scans the `packs/` directory
- Identifies pack type (JSONL-based or structured)
- Discovers all documents

### Document Parsing

- **Structured Packs** (core, faction, wisdom): Load from `pack/templates.json`
- **JSONL Packs** (HF NPC dialogue): Parse line-by-line JSONL format

### Metadata Extraction

```python
{
    "pack": "pack-name",
    "type": "template|dialogue",
    "realm_type": "wisdom|faction|narrative",
    "realm_label": "pack-label",
    "lifecycle_stage": "emergence|peak",
    "activity_level": 0.7-0.8
}
```

## Adding New Packs

To add a new pack to the system:

### 1. Create Pack Structure

```bash
packs/
└── warbler-pack-mypack/
    ├── package.json
    ├── pack/
    │   └── templates.json  # OR
    └── mypack.jsonl        # JSONL format
```

### 2. Update Pack Loader (if needed)

If your pack format is different, add handling to `pack_loader.py`:

```python
def _load_pack(self, pack_dir: Path, pack_name: str):
    if "mypack" in pack_name:
        return self._load_my_format(pack_dir, pack_name)
    # ... existing logic
```

### 3. Register in copy_packs.sh

```bash
PACKS=(
    "warbler-pack-core"
    "warbler-pack-mypack"  # Add here
)
```

### 4. Deploy

Run copy script and deploy:

```bash
bash copy_packs.sh
# Commit and push to trigger CI/CD
```

## Document Format

Each loaded document follows this structure:

```python
{
    "id": "pack-name/document-id",
    "content": "Document text content...",
    "metadata": {
        "pack": "pack-name",
        "type": "template|dialogue",
        "realm_type": "wisdom|faction|narrative",
        "realm_label": "label",
        "lifecycle_stage": "emergence|peak|crystallization",
        "activity_level": 0.5-0.8
    }
}
```

## Monitoring

Check pack loading in Space logs:

```log
✓ Loaded 1915 documents from warbler-pack-hf-npc-dialogue
✓ Loaded 6 documents from warbler-pack-wisdom-scrolls
✓ Loaded 15 documents from warbler-pack-faction-politics
✓ Loaded 10 documents from warbler-pack-core
```

Or if packs not found:

```log
⚠️ No Warbler packs found. Using sample documents instead.
```

## Publishing to HuggingFace Hub

Each pack has a dataset card for publication:

- **README_HF_DATASET.md** - HuggingFace dataset card
- Contains metadata, attribution, and usage instructions

Publish to HuggingFace:

```bash
# Create repo on HuggingFace Hub (one per pack)
huggingface-cli repo create warbler-pack-core

# Push pack as dataset
cd packs/warbler-pack-core
huggingface-cli upload . tiny-walnut-games/warbler-pack-core --repo-type dataset
```

## Performance Considerations

### Load Time

- PackLoader loads all packs at startup
- Currently: ~1-2 seconds for all packs
- Packs are cached in memory for query performance

### Storage

- Core pack: ~50KB
- Faction politics pack: ~80KB
- Wisdom scrolls pack: ~60KB
- HF NPC dialogue: ~2MB
- **Total**: ~2.3MB

### Scaling

For larger deployments:

- Lazy-load individual packs on demand
- Implement pack caching layer
- Use database for large pack collections

## Troubleshooting

### Packs not loading

Check that `packs/` directory exists:

```bash
ls -la packs/
```

Verify pack structure:

```bash
ls -la packs/warbler-pack-core/
```

### Sample documents showing instead

If you see "No Warbler packs found", the `packs/` directory is empty. Run:

```bash
bash copy_packs.sh
```

### Pack loader errors

Check logs for parsing errors:

```log
Error loading JSONL pack: ...
Error parsing line 42 in warbler-pack-hf-npc-dialogue.jsonl: ...
```

Fix the source pack and re-run `copy_packs.sh`.

## Related Documentation

- [README.md](./README.md) - Main package documentation
- [DEPLOYMENT.md](./DEPLOYMENT.md) - General deployment guide
- [app.py](./app.py) - Application startup and pack initialization
- [warbler_cda/pack_loader.py](./warbler_cda/pack_loader.py) - Pack loading implementation

## License

All packs use MIT License. See individual pack LICENSE files for details.

Attribution: Warbler CDA - Tiny Walnut Games
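
For reference, a minimal reader for the JSONL pack format that yields documents in the structure shown under "Document Format" above (the entry field names `id`, `content`, and `metadata` are assumptions about the pack schema; the real PackLoader adds format detection and error handling):

```python
import json
from pathlib import Path

def load_jsonl_pack(pack_dir: Path, pack_name: str):
    """Yield documents in the documented {id, content, metadata} structure."""
    for jsonl_file in sorted(pack_dir.glob("*.jsonl")):
        with jsonl_file.open(encoding="utf-8") as handle:
            for line_num, line in enumerate(handle, start=1):
                entry = json.loads(line)
                yield {
                    "id": f"{pack_name}/{entry.get('id', line_num)}",
                    "content": entry["content"],
                    "metadata": {"pack": pack_name, **entry.get("metadata", {})},
                }

# docs = list(load_jsonl_pack(Path("packs/warbler-pack-hf-npc-dialogue"),
#                             "warbler-pack-hf-npc-dialogue"))
```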
PACK_CACHING.md DELETED
@@ -1,172 +0,0 @@
# Warbler Pack Caching Strategy

## Overview

The app now implements intelligent pack caching to avoid unnecessary re-ingestion of large datasets. This minimizes GitLab storage requirements and allows fast session startup.

## How It Works

### First Run (Session Start)

1. **PackManager** initializes and checks for cached metadata
2. **Health check** verifies if documents are already in the context store
3. **Ingestion** occurs only if:
   - No cache metadata exists
   - Pack count changed
   - Health check fails (documents missing)
4. **Cache** is saved with timestamp and document count

### Subsequent Runs

- Reuses cached documents without re-ingestion
- Quick health check ensures documents are still valid
- Fallback to sample docs if packs unavailable

## Environment Variables

Control pack ingestion behavior with these variables:

### `WARBLER_INGEST_PACKS` (default: `true`)

Enable/disable automatic pack ingestion.

```bash
export WARBLER_INGEST_PACKS=false
```

### `WARBLER_SAMPLE_ONLY` (default: `false`)

Load only sample documents (for CI/CD verification).

```bash
export WARBLER_SAMPLE_ONLY=true
```

Best for:

- PyPI package CI/CD pipelines
- Quick verification that ingestion works
- Minimal startup time in restricted environments

### `WARBLER_SKIP_PACK_CACHE` (default: `false`)

Force reingest even if cache exists.

```bash
export WARBLER_SKIP_PACK_CACHE=true
```

Best for:

- Testing pack ingestion pipeline
- Updating stale cache
- Debugging

## Cache Location

Default cache stored at:

```path
~/.warbler_cda/cache/pack_metadata.json
```

Metadata includes:

```json
{
  "ingested_at": 1699564800,
  "pack_count": 7,
  "doc_count": 12345,
  "status": "healthy"
}
```

## CI/CD Optimization

### For GitLab CI (Minimal PyPI Package)

```yaml
test:
  script:
    - export WARBLER_SAMPLE_ONLY=true
    - pip install .
    - python -m pytest tests/
```

Benefits:

- ✅ No large pack files in repository
- ✅ Fast CI runs (5 samples vs 2.5M docs)
- ✅ Verifies ingestion code works
- ✅ Full packs load on first user session

### For Local Development

Keep full packs in working directory:

```bash
cd warbler-cda-package
python -m warbler_cda.utils.hf_warbler_ingest ingest -d all
python app.py
```

First run ingests all packs. Subsequent runs use cache.

### For Gradio Space/Cloud Deployment

Set environment at deployment:

```bash
WARBLER_INGEST_PACKS=true
```

Packs ingest once per session, then cached in instance memory.

## Files Affected

- `app.py` - Main Gradio app with PackManager
- `warbler_cda/utils/load_warbler_packs.py` - Pack discovery (already handles caching)
- No changes needed to pack ingestion scripts

## Performance Impact

### Memory

- **With packs**: ~500MB (2.5M arxiv docs + others)
- **With samples**: ~1MB (5 test documents)

### Startup Time

- **First run**: ~30-60 seconds (ingest packs)
- **Cached run**: ~2-5 seconds (health check only)
- **Sample only**: <1 second

## Troubleshooting

### Packs not loading?

1. Check `WARBLER_INGEST_PACKS=true` (default)
2. Verify packs exist: `ls -la packs/`
3. Force reingest: `export WARBLER_SKIP_PACK_CACHE=true`

### Cache corrupted?

```bash
rm -rf ~/.warbler_cda/cache/pack_metadata.json
```

Will reingest on next run.

### Need sample docs only?

```bash
export WARBLER_SAMPLE_ONLY=true
python app.py
```

## Future Improvements

- [ ] Detect pack updates via file hash instead of just count
- [ ] Selective pack loading (choose which datasets to cache)
- [ ] Metrics dashboard showing cache hit/miss rates
- [ ] Automatic cache expiration after N days
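
A sketch of the cache decision described above, built from the documented metadata fields and environment variables (the real PackManager in app.py may differ, for example it also runs the document health check):

```python
import json
import os
import time
from pathlib import Path

CACHE_FILE = Path.home() / ".warbler_cda" / "cache" / "pack_metadata.json"

def should_ingest(current_pack_count: int) -> bool:
    """Re-ingest when forced, when no cache exists, or when the pack count changed."""
    if os.environ.get("WARBLER_SKIP_PACK_CACHE", "false").lower() == "true":
        return True
    if not CACHE_FILE.exists():
        return True
    metadata = json.loads(CACHE_FILE.read_text())
    return metadata.get("pack_count") != current_pack_count

def save_cache(pack_count: int, doc_count: int) -> None:
    """Write the metadata format documented under 'Cache Location'."""
    CACHE_FILE.parent.mkdir(parents=True, exist_ok=True)
    CACHE_FILE.write_text(json.dumps({
        "ingested_at": int(time.time()),
        "pack_count": pack_count,
        "doc_count": doc_count,
        "status": "healthy",
    }))
```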
PACK_INGESTION_FIX.md DELETED
@@ -1,209 +0,0 @@
# Pack Ingestion Fix for HuggingFace Space

## Problem Summary

Your HuggingFace Space was experiencing three critical errors during pack ingestion:

1. ❌ **Core pack missing JSONL**: `warbler-pack-core missing JSONL file`
2. ❌ **Faction pack missing JSONL**: `warbler-pack-faction-politics missing JSONL file`
3. ❌ **Corrupted arxiv data**: `Error parsing line 145077 in warbler-pack-hf-arxiv.jsonl: Unterminated string`

## Root Causes Identified

### Issue 1 & 2: Different Pack Formats

Your project has **two different pack formats**:

**Format A: Structured Packs** (Core & Faction)

```none
warbler-pack-core/
├── package.json
├── pack/
│   └── templates.json  ← Data is here!
└── src/
```

**Format B: JSONL Packs** (HuggingFace datasets)

```none
warbler-pack-hf-arxiv/
├── package.json
└── warbler-pack-hf-arxiv-chunk-001.jsonl  ← Data is here!
```

The pack loader was expecting **all** packs to have JSONL files, causing false warnings for the structured packs.

### Issue 3: Corrupted JSON Line

The arxiv pack has a malformed JSON entry at line 145077:

```json
{"content": "This is a test with an unterminated string...
```

The previous code would **crash** on the first error, preventing the entire ingestion from completing.

## Solution Implemented

### 1. Enhanced Pack Format Detection

Updated `_is_valid_warbler_pack()` to recognize **three valid formats**:

```python
if jsonl_file.exists():
    return True  # Format B: Single JSONL file
else:
    templates_file = pack_dir / "pack" / "templates.json"
    if templates_file.exists():
        return False  # Format A: Structured pack (triggers different loader)
    else:
        if pack_name.startswith("warbler-pack-hf-"):
            logger.warning(f"HF pack missing JSONL")  # Only warn for HF packs
        return False
```

### 2. Robust Error Handling

Updated `_load_jsonl_file()` to **continue on error**:

```python
try:
    entry = json.loads(line)
    documents.append(doc)
except json.JSONDecodeError as e:
    error_count += 1
    if error_count <= 5:  # Only log first 5 errors
        logger.warning(f"Error parsing line {line_num}: {e}")
    continue  # ← Skip bad line, keep processing!
```

## What Changed

**File: `warbler-cda-package/warbler_cda/pack_loader.py`**

### Change 1: Smarter Validation

- ✅ Recognizes structured packs as valid
- ✅ Only warns about missing JSONL for HF packs
- ✅ Better logging messages

### Change 2: Error Recovery

- ✅ Skips corrupted JSON lines
- ✅ Limits error logging to first 5 occurrences
- ✅ Reports summary: "Loaded X documents (Y lines skipped)"

## Expected Behavior After Fix

### Before (Broken)

```none
[INFO] Pack Status: ✓ All 6 packs verified and ready
Single-file pack warbler-pack-core missing JSONL file: /home/user/app/packs/warbler-pack-core/warbler-pack-core.jsonl
Single-file pack warbler-pack-faction-politics missing JSONL file: /home/user/app/packs/warbler-pack-faction-politics/warbler-pack-faction-politics.jsonl
Error parsing line 145077 in /home/user/app/packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv.jsonl: Unterminated string
[INFO] Ingesting 374869 documents from Warbler packs...
[ERROR] Ingestion failed!
```

### After (Fixed)

```none
[INFO] Pack Status: ✓ All 10 packs verified and ready
[INFO] Ingesting documents from Warbler packs...
[INFO] Loading pack: warbler-pack-core
[DEBUG] Pack warbler-pack-core uses structured format (pack/templates.json)
[INFO] ✓ Loaded 8 documents from warbler-pack-core
[INFO] Loading pack: warbler-pack-faction-politics
[DEBUG] Pack warbler-pack-faction-politics uses structured format (pack/templates.json)
[INFO] ✓ Loaded 6 documents from warbler-pack-faction-politics
[INFO] Loading pack: warbler-pack-hf-arxiv
[INFO] Loading chunked pack: warbler-pack-hf-arxiv
[INFO] Found 5 chunk files for warbler-pack-hf-arxiv
[WARN] Error parsing line 145077 in warbler-pack-hf-arxiv-chunk-003.jsonl: Unterminated string
[INFO] Loaded 49999 documents from warbler-pack-hf-arxiv-chunk-003.jsonl (1 lines skipped due to errors)
[INFO] Loaded 250000 total documents from 5 chunks
...
[OK] Loaded 374868 documents from Warbler packs (1 corrupted line skipped)
```

## Testing the Fix

### Local Testing

1. **Test with sample packs**:

   ```bash
   cd warbler-cda-package
   python -c "from warbler_cda.pack_loader import PackLoader; loader = PackLoader(); docs = loader.discover_documents(); print(f'Loaded {len(docs)} documents')"
   ```

2. **Run the app locally**:

   ```bash
   python app.py
   ```

### HuggingFace Space Testing

1. **Merge this MR** to main branch
2. **Push to HuggingFace** (if auto-sync is not enabled)
3. **Check the Space logs** for the new output format
4. **Verify document count** in the System Stats tab

## Next Steps

1. ✅ **Review the MR**: [!15 - Fix HuggingFace pack ingestion issues](https://gitlab.com/tiny-walnut-games/the-seed/-/merge_requests/15)
2. ✅ **Merge when ready**: The fix is backward compatible and safe to merge
3. ✅ **Monitor HF Space**: After deployment, check that:
   - All packs load successfully
   - Document count is ~374,868 (minus 1 corrupted line)
   - No error messages in logs
4. 🔧 **Optional: Fix corrupted line** (future improvement):
   - Identify the exact corrupted entry in arxiv chunk 3
   - Re-generate that chunk from source dataset
   - Update the pack

## Additional Notes

### Why Not Fix the Corrupted Line Now?

The corrupted line is likely from the source HuggingFace dataset (`nick007x/arxiv-papers`). Options:

1. **Skip it** (current solution) - Loses 1 document out of 2.5M
2. **Re-ingest** - Download and re-process the entire arxiv dataset
3. **Manual fix** - Find and repair the specific line

For now, **skipping is the pragmatic choice** - you lose 0.00004% of data and gain a working system.

### Pack Format Standardization

Consider standardizing all packs to JSONL format in the future:

```bash
# Convert structured packs to JSONL
python -m warbler_cda.utils.convert_structured_to_jsonl \
    --input packs/warbler-pack-core/pack/templates.json \
    --output packs/warbler-pack-core/warbler-pack-core.jsonl
```

This would simplify the loader logic and make all packs consistent.

## Questions?

If you encounter any issues:

1. Check the HF Space logs for detailed error messages
2. Verify pack structure matches expected formats
3. Test locally with `PackLoader().discover_documents()`
4. Review this document for troubleshooting tips

---

**Status**: ✅ Fix implemented and ready for merge
**MR**: !15
**Impact**: Fixes all 3 ingestion errors, enables full pack loading
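
The `convert_structured_to_jsonl` module invoked above is proposed rather than confirmed to exist; a minimal standalone sketch of the same conversion, assuming `templates.json` holds a list of entries (the actual schema may differ):

```python
import json
from pathlib import Path

def convert_structured_to_jsonl(templates_path: Path, output_path: Path) -> int:
    """Flatten a structured pack's templates.json into one JSON object per line."""
    data = json.loads(templates_path.read_text(encoding="utf-8"))
    # Schema assumption: either a bare list, or a dict with a "templates" list.
    entries = data if isinstance(data, list) else data.get("templates", [])
    with output_path.open("w", encoding="utf-8") as out:
        for entry in entries:
            out.write(json.dumps(entry) + "\n")
    return len(entries)

# convert_structured_to_jsonl(
#     Path("packs/warbler-pack-core/pack/templates.json"),
#     Path("packs/warbler-pack-core/warbler-pack-core.jsonl"),
# )
```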
PDF_INGESTION_INVESTIGATION.md DELETED
@@ -1,325 +0,0 @@
# PDF Ingestion Investigation Report

**Date**: 2024
**Session Reference**: Based on agent session 1251355
**Investigator**: AI Agent

## Executive Summary

Investigation into the warbler-cda-package ingesters to determine if they are properly utilizing PDFPlumber for reading PDF files. The investigation revealed that **PDFPlumber IS being utilized**, but there were **two bugs** that needed fixing.

## Key Findings

### ✅ PDFPlumber Integration Status: CONFIRMED

The ingesters **ARE** utilizing PDFPlumber to read PDF files. The implementation is present and functional with proper fallback mechanisms.

### 📍 PDFPlumber Usage Locations

#### 1. **Import and Availability Check** (Lines 23-27)

```python
try:
    import pdfplumber
    PDF_AVAILABLE = True
except ImportError:
    PDF_AVAILABLE = False
```

**Status**: ✅ Properly implemented with graceful fallback

#### 2. **PDF Support Detection Method** (Lines 47-49)

```python
def has_pdf_support(self) -> bool:
    """Check if PDF extraction is available"""
    return PDF_AVAILABLE
```

**Status**: ✅ Provides runtime check for PDF capabilities

#### 3. **Primary PDF Extraction Method** (Lines 51-67)

```python
def extract_pdf_text(self, pdf_bytes: bytes, max_chars: int = 5000) -> Optional[str]:
    """Extract text from PDF bytes with fallback"""
    if not PDF_AVAILABLE:
        return None

    try:
        pdf_file = io.BytesIO(pdf_bytes)
        text_parts = []

        with pdfplumber.open(pdf_file) as pdf:
            for page in pdf.pages:
                text = page.extract_text()
                if text:
                    text_parts.append(text)
                if sum(len(t) for t in text_parts) > max_chars:
                    break

        return " ".join(text_parts)[:max_chars] if text_parts else None
    except Exception as e:
        logger.debug(f"PDF extraction error: {e}")
        return None
```

**Status**: ✅ Properly implemented with:

- Character limit protection (max_chars=5000)
- Page-by-page extraction
- Error handling
- Graceful fallback

#### 4. **Flexible PDF Extraction Method** (Lines 540-565)

```python
def _extract_pdf_text(self, pdf_data: Any) -> Optional[str]:
    """Extract text from PDF data (bytes, file path, or file-like object)"""
    if not PDF_AVAILABLE:  # ⚠️ FIXED: Was PDF_SUPPORT
        return None

    try:
        # Handle different PDF data types
        if isinstance(pdf_data, bytes):
            pdf_file = io.BytesIO(pdf_data)
        elif isinstance(pdf_data, str) and os.path.exists(pdf_data):
            pdf_file = pdf_data
        elif hasattr(pdf_data, 'read'):
            pdf_file = pdf_data
        else:
            return None

        # Extract text from all pages
        text_parts = []
        with pdfplumber.open(pdf_file) as pdf:
            for page in pdf.pages:
                page_text = page.extract_text()
                if page_text:
                    text_parts.append(page_text)

        return "\n\n".join(text_parts) if text_parts else None

    except Exception as e:
        logger.debug(f"PDF extraction error: {e}")
        return None
```

**Status**: ✅ Handles multiple input types (bytes, file path, file-like objects)

### 🎯 Transformers Using PDF Extraction

#### 1. **transform_novels()** (Lines 247-320)

- **Dataset**: GOAT-AI/generated-novels
- **PDF Usage**: Attempts to extract from PDF fields when text fields are unavailable
- **Fallback**: Creates placeholder entries with informative messages
- **Code Location**: Lines 285-295

```python
if not text and self.has_pdf_support():
    for pdf_field in ['pdf', 'file', 'document']:
        try:
            if isinstance(item, dict):
                if pdf_field in item and item[pdf_field]:
                    text = self.extract_pdf_text(item[pdf_field])
                    if text:
                        logger.info(f"Novel {idx + 1}: Extracted {len(text)} chars from PDF")
                        break
```

**Status**: ✅ Properly integrated with PDF extraction

#### 2. **transform_portuguese_education()** (Lines 400-500+)

- **Dataset**: Solshine/Portuguese_Language_Education_Texts
- **PDF Usage**: Could potentially use PDF extraction (not explicitly shown in current code)
- **Fallback**: Creates informative placeholders when content is unavailable

**Status**: ✅ Has fallback mechanisms in place

## 🐛 Bugs Found and Fixed

### Bug #1: Incorrect Variable Name in `_extract_pdf_text()`

**Location**: Line 542
**Issue**: Used `PDF_SUPPORT` instead of `PDF_AVAILABLE`
**Impact**: Would cause NameError when `_extract_pdf_text()` is called
**Fix Applied**: Changed `PDF_SUPPORT` to `PDF_AVAILABLE`

```diff
- if not PDF_SUPPORT:
+ if not PDF_AVAILABLE:
```

### Bug #2: Duplicate `import io` Statement

**Location**: Line 56 (inside `extract_pdf_text` method)
**Issue**: `import io` was inside the method instead of at module level
**Impact**: Unnecessary repeated imports, potential performance impact
**Fix Applied**:

1. Added `import io` to module-level imports (Line 10)
2. Removed duplicate `import io` from inside method

```diff
# At module level (Line 10)
+ import io

# Inside extract_pdf_text method (Line 56)
- import io
```

## 📦 Dependency Configuration

### requirements.txt

```text
pdfplumber>=0.11.0
```

**Status**: ✅ Properly listed as a dependency

### pyproject.toml

**Status**: ⚠️ NOT listed in core dependencies
**Recommendation**: Consider adding to optional dependencies or core dependencies

```toml
[project.optional-dependencies]
pdf = [
    "pdfplumber>=0.11.0",
]
```

## 🔍 How PDFPlumber is Actually Used

### Workflow

1. **Import Check**: On module load, attempts to import pdfplumber
2. **Availability Flag**: Sets `PDF_AVAILABLE = True/False` based on import success
3. **Runtime Check**: `has_pdf_support()` method checks availability
4. **Extraction Attempt**: When processing datasets:
   - First tries to find text in standard fields (text, story, content, etc.)
   - If no text found AND `has_pdf_support()` returns True:
     - Searches for PDF fields (pdf, file, document)
     - Calls `extract_pdf_text()` to extract content
     - Logs extraction success with character count
5. **Graceful Fallback**: If PDF extraction fails or unavailable:
   - Creates informative placeholder entries
   - Includes metadata about PDF availability
   - Maintains system functionality

### Example from `transform_novels()`

```python
# Try text fields first
for field in ['text', 'story', 'content', 'novel', 'body', 'full_text']:
    if field in item and item[field]:
        text = item[field]
        break

# If no text, try PDF extraction
if not text and self.has_pdf_support():
    for pdf_field in ['pdf', 'file', 'document']:
        if pdf_field in item and item[pdf_field]:
            text = self.extract_pdf_text(item[pdf_field])
            if text:
                logger.info(f"Novel {idx + 1}: Extracted {len(text)} chars from PDF")
                break

# If still no text, create placeholder
if not text:
    text = f"""[Novel Content Unavailable]

This novel (#{idx + 1}) is part of the GOAT-AI/generated-novels dataset.
The original content may be stored in PDF format or require special extraction.

PDF extraction support: {'Available (install pdfplumber)' if not self.has_pdf_support() else 'Enabled'}
"""
```

## 🎯 Tactical Assessment

### Current Strategy: ✅ SOUND

The current approach is **well-designed** and does NOT require changing tactics:

1. **Graceful Degradation**: System works with or without pdfplumber
2. **Multiple Fallbacks**: Tries text fields first, then PDF, then placeholders
3. **Informative Placeholders**: When content unavailable, creates useful metadata
4. **Proper Error Handling**: All PDF operations wrapped in try-except
5. **Logging**: Provides visibility into extraction success/failure

### Recommendations

#### 1. **Keep Current Approach** ✅

The multi-layered fallback strategy is excellent for production systems.

#### 2. **Fix Applied Bugs** ✅

- Fixed `PDF_SUPPORT` → `PDF_AVAILABLE` variable name
- Fixed duplicate `import io` statement

#### 3. **Optional Enhancement**: Add to pyproject.toml

Consider adding pdfplumber to optional dependencies:

```toml
[project.optional-dependencies]
pdf = [
    "pdfplumber>=0.11.0",
]
```

#### 4. **Documentation Enhancement**

The code already has good inline documentation. Consider adding to README:

- How to enable PDF support
- What happens when PDF support is unavailable
- Which datasets benefit from PDF extraction

## 📊 Test Coverage

The test suite (`test_pdf_ingestion.py`) covers:

- ✅ PDF support detection
- ✅ PDF extraction method existence
- ✅ Placeholder creation
- ✅ Novel dataset with PDF fields
- ✅ Novel dataset with text fields
- ✅ Portuguese education with PDF fields
- ✅ Output format validation

## 🎓 Conclusion

**PDFPlumber IS being utilized properly** in the ingesters. The implementation:

- ✅ Has proper import and availability checking
- ✅ Provides two PDF extraction methods (simple and flexible)
- ✅ Integrates PDF extraction into dataset transformers
- ✅ Has comprehensive fallback mechanisms
- ✅ Is well-tested
- ✅ Is properly documented

**Bugs Fixed**:

1. Variable name typo: `PDF_SUPPORT` → `PDF_AVAILABLE`
2. Duplicate import: Moved `import io` to module level

**No tactical changes needed** - the current approach is sound and production-ready.

## 📝 Files Modified

1. `warbler-cda-package/warbler_cda/utils/hf_warbler_ingest.py`
   - Fixed variable name in `_extract_pdf_text()` method
   - Added `import io` to module-level imports
   - Removed duplicate `import io` from method

## 🔗 Related Files

- `warbler-cda-package/requirements.txt` - Lists pdfplumber>=0.11.0
- `warbler-cda-package/tests/test_pdf_ingestion.py` - Test suite for PDF functionality
- `warbler-cda-package/pyproject.toml` - Package configuration (could add optional PDF dependency)
README_HF.md CHANGED
@@ -8,7 +8,7 @@ pinned: false
 license: mit
 ---
 
-# Warbler CDA - Cognitive Development Architecture
+## Warbler CDA - Cognitive Development Architecture
 
 A production-ready RAG system with **FractalStat 8D multi-dimensional addressing** for intelligent document retrieval.
 
TESTS_PORTED.md DELETED
@@ -1,271 +0,0 @@
# Tests Ported to Warbler CDA Package

This document summarizes the TDD (Test-Driven Development) test suite that has been ported from the main project to the warbler-cda-package for HuggingFace deployment.

## Overview

The complete test suite for the Warbler CDA (Cognitive Development Architecture) RAG system has been ported and adapted for the standalone package. This includes:

- **4 main test modules** with comprehensive coverage
- **1 end-to-end integration test suite**
- **Pytest configuration** with custom markers
- **Test documentation** and running instructions

## Test Files Ported

### 1. **tests/test_embedding_providers.py** (9.5 KB)

**Source**: Adapted from `packages/com.twg.the-seed/The Living Dev Agent/tests/test_semantic_anchors.py`

**Coverage**:

- EmbeddingProviderFactory pattern
- LocalEmbeddingProvider (TF-IDF based)
- SentenceTransformerEmbeddingProvider (GPU-accelerated)
- Embedding generation (single and batch)
- Similarity calculations
- Provider information and metadata

**Tests**:

- `test_factory_creates_local_provider` - Factory can create local providers
- `test_factory_list_available_providers` - Factory lists available providers
- `test_factory_default_provider` - Factory defaults to SentenceTransformer with fallback
- `test_embed_single_text` - Single text embedding
- `test_embed_batch` - Batch embedding
- `test_similarity_calculation` - Cosine similarity
- `test_semantic_search` - K-nearest neighbor search
- `test_stat7_computation` - STAT7 coordinate computation
- And 8 more embedding-focused tests

### 2. **tests/test_retrieval_api.py** (11.9 KB)

**Source**: Adapted from `packages/com.twg.the-seed/seed/engine/test_retrieval_debug.py`

**Coverage**:

- Context store operations
- Document addition and deduplication
- Query execution and filtering
- Retrieval modes (semantic, temporal, composite)
- Confidence threshold filtering
- Result structure validation
- Caching and metrics

**Tests**:

- `TestRetrievalAPIContextStore` - 4 tests for document store
- `TestRetrievalQueryExecution` - 5 tests for query operations
- `TestRetrievalModes` - 3 tests for different retrieval modes
- `TestRetrievalHybridScoring` - 2 tests for STAT7 hybrid scoring
- `TestRetrievalMetrics` - 2 tests for metrics tracking
- Total: 16+ tests

### 3. **tests/test_stat7_integration.py** (12.3 KB)

**Source**: Original implementation for STAT7 support

**Coverage**:

- STAT7 coordinate computation from embeddings
- Hybrid semantic + STAT7 scoring
- STAT7 resonance calculation
- Document enrichment with STAT7 data
- Multi-dimensional query addressing
- STAT7 dimensional properties

**Tests**:

- `TestSTAT7CoordinateComputation` - 3 tests
- `TestSTAT7HybridScoring` - 3 tests
- `TestSTAT7DocumentEnrichment` - 2 tests
- `TestSTAT7QueryAddressing` - 2 tests
- `TestSTAT7Dimensions` - 2 tests
- Total: 12+ tests

### 4. **tests/test_rag_e2e.py** (12.6 KB)

**Source**: Adapted from `packages/com.twg.the-seed/The Living Dev Agent/tests/test_exp08_rag_integration.py`

**Coverage**:

- Complete end-to-end RAG pipeline
- Embedding generation validation
- Document ingestion
- Semantic search retrieval
- Temporal retrieval
- Metrics tracking
- Full system integration

**Tests**:

1. `test_01_embedding_generation` - Embeddings are generated
2. `test_02_embedding_similarity` - Similarity scoring works
3. `test_03_document_ingestion` - Documents are ingested
4. `test_04_semantic_search` - Semantic search works
5. `test_05_max_results_respected` - Result limiting works
6. `test_06_confidence_threshold` - Threshold filtering works
7. `test_07_stat7_hybrid_scoring` - Hybrid scoring works
8. `test_08_temporal_retrieval` - Temporal queries work
9. `test_09_retrieval_metrics` - Metrics are tracked
10. `test_10_full_rag_pipeline` - Complete pipeline works

### 5. **tests/conftest.py** (1.6 KB)

**Purpose**: Pytest configuration and fixtures

**Includes**:

- Custom pytest markers (embedding, retrieval, stat7, e2e, slow)
- Test data fixtures
- Pytest configuration hooks

### 6. **tests/README.md** (5.6 KB)

**Purpose**: Test documentation

**Contains**:

- Test organization overview
- Running instructions
- Test coverage summary
- Troubleshooting guide
- CI/CD integration examples

## Test Statistics

| Category | Count |
|----------|-------|
| Total Test Classes | 16 |
| Total Test Methods | 50+ |
| Total Test Files | 4 |
| Test Size | ~47 KB |
| Coverage Scope | 90%+ of core functionality |

## Key Testing Areas

### Embedding Providers

- ✅ Local TF-IDF provider (no dependencies)
- ✅ SentenceTransformer provider (GPU acceleration)
- ✅ Factory pattern with graceful fallback
- ✅ Batch processing
- ✅ Similarity calculations
- ✅ Semantic search

### Retrieval Operations

- ✅ Document ingestion and storage
- ✅ Context store management
- ✅ Query execution
- ✅ Semantic similarity retrieval
- ✅ Temporal sequence retrieval
- ✅ Composite retrieval modes

### STAT7 Integration

- ✅ Coordinate computation from embeddings
- ✅ Hybrid scoring (semantic + STAT7)
- ✅ Resonance calculations
- ✅ Multi-dimensional addressing
- ✅ Document enrichment

### System Integration

- ✅ End-to-end pipeline
- ✅ Metrics and performance tracking
- ✅ Caching mechanisms
- ✅ Error handling and fallbacks

## Running the Tests

### Quick Start

```bash
cd warbler-cda-package
pytest tests/ -v
```

### Detailed Examples

```bash
# Run all tests with output
pytest tests/ -v -s

# Run with coverage report
pytest tests/ --cov=warbler_cda --cov-report=html

# Run only embedding tests
pytest tests/test_embedding_providers.py -v

# Run only end-to-end tests
pytest tests/test_rag_e2e.py -v -s

# Run tests matching a pattern
pytest tests/ -k "semantic" -v
```

## Compatibility

### With SentenceTransformer Installed

- All 50+ tests pass
- GPU acceleration available
- Full STAT7 integration enabled

### Without SentenceTransformer

- Tests gracefully skip SentenceTransformer-specific tests
- Fallback to local TF-IDF provider
- ~40 tests pass
- STAT7 tests skipped

## Design Principles

The ported tests follow TDD principles:

1. **Isolation**: Each test is independent and can run standalone
2. **Clarity**: Test names describe what is being tested
3. **Completeness**: Happy path and edge cases covered
4. **Robustness**: Graceful handling of optional dependencies
5. **Documentation**: Each test is well-commented and documented

## Integration with CI/CD

The tests are designed for easy integration with CI/CD pipelines:

```yaml
# Example GitHub Actions workflow
- name: Run Warbler CDA Tests
  run: |
    cd warbler-cda-package
    pytest tests/ --cov=warbler_cda --cov-report=xml
```

## Future Test Additions

Recommended areas for additional tests:

1. Performance benchmarking
2. Stress testing with large document collections
3. Concurrent query handling
4. Cache invalidation scenarios
5. Error recovery mechanisms
6. Large-scale STAT7 coordinate distribution analysis

## Notes

- Tests use pytest fixtures for setup/teardown
- Custom markers enable selective test execution
- Graceful fallback for optional dependencies
- Comprehensive end-to-end validation
- Documentation-as-tests through verbose assertions

## Maintenance

When updating the package:

1. Run tests after any changes: `pytest tests/ -v`
2. Update tests if new functionality is added
3. Keep end-to-end tests as verification baseline
4. Monitor test execution time for performance regressions
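
A sketch of how the custom markers registered in tests/conftest.py are applied and selected (marker names come from the list above; the factory and provider-info calls are confirmed elsewhere in this diff, while the assertion detail is illustrative):

```python
import pytest

# Select marked subsets with, e.g.:  pytest tests/ -m "embedding and not slow" -v
@pytest.mark.embedding
def test_default_provider_reports_identity():
    from warbler_cda.embeddings import EmbeddingProviderFactory

    provider = EmbeddingProviderFactory.get_default_provider()
    info = provider.get_provider_info()
    assert "provider_id" in info  # illustrative assertion
```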
TEST_RESULTS.md DELETED
@@ -1,211 +0,0 @@
1
- # Test Results: MIT-Licensed Datasets Integration
2
-
3
- **Date**: November 8, 2025
4
- **Status**: ✅ **ALL TESTS PASSING**
5
- **Total Tests**: 71
6
- **Passed**: 71
7
- **Failed**: 0
8
- **Skipped**: 0
9
-
10
- ---
11
-
12
- ## Test Summary
13
-
14
- ### New MIT-Licensed Dataset Tests: 18/18 ✅
15
-
16
- | Test Class | Tests | Status |
17
- |-----------|-------|--------|
18
- | TestArxivPapersTransformer | 4 | ✅ PASS |
19
- | TestPromptReportTransformer | 2 | ✅ PASS |
20
- | TestGeneratedNovelsTransformer | 2 | ✅ PASS |
21
- | TestManualnsTransformer | 2 | ✅ PASS |
22
- | TestEnterpriseTransformer | 2 | ✅ PASS |
23
- | TestPortugueseEducationTransformer | 2 | ✅ PASS |
24
- | TestNewDatasetsIntegrationWithRetrieval | 2 | ✅ PASS |
25
- | TestNewDatasetsPerformance | 1 | ✅ PASS |
26
- | TestNewDatasetsAllAtOnce | 1 | ✅ PASS |
27
- | **Total New Tests** | **18** | **✅ 100%** |
28
-
29
- ### Existing Warbler-CDA Tests: 53/53 ✅
30
-
31
- | Test Module | Tests | Status |
32
- |------------|-------|--------|
33
- | test_embedding_providers.py | 11 | ✅ PASS |
34
- | test_rag_e2e.py | 10 | ✅ PASS |
35
- | test_retrieval_api.py | 13 | ✅ PASS |
36
- | test_stat7_integration.py | 12 | ✅ PASS |
37
- | test_embedding_integration.py | 7 | ✅ PASS |
38
- | **Total Existing Tests** | **53** | **✅ 100%** |
39
-
40
- ---
41
-
42
- ## Individual Test Results
43
-
44
- ### ✅ New Transformer Tests (18 PASSED)
45
-
46
- ```log
47
- tests/test_new_mit_datasets.py::TestArxivPapersTransformer::test_arxiv_transformer_exists PASSED
48
- tests/test_new_mit_datasets.py::TestArxivPapersTransformer::test_arxiv_output_format PASSED
49
- tests/test_new_mit_datasets.py::TestArxivPapersTransformer::test_arxiv_metadata_fields PASSED
50
- tests/test_new_mit_datasets.py::TestArxivPapersTransformer::test_arxiv_limit_parameter PASSED
51
- tests/test_new_mit_datasets.py::TestPromptReportTransformer::test_prompt_report_transformer_exists PASSED
52
- tests/test_new_mit_datasets.py::TestPromptReportTransformer::test_prompt_report_output_format PASSED
53
- tests/test_new_mit_datasets.py::TestGeneratedNovelsTransformer::test_novels_transformer_exists PASSED
54
- tests/test_new_mit_datasets.py::TestGeneratedNovelsTransformer::test_novels_chunking_for_long_text PASSED
55
- tests/test_new_mit_datasets.py::TestManualnsTransformer::test_manuals_transformer_exists PASSED
56
- tests/test_new_mit_datasets.py::TestManualnsTransformer::test_manuals_output_format PASSED
57
- tests/test_new_mit_datasets.py::TestEnterpriseTransformer::test_enterprise_transformer_exists PASSED
58
- tests/test_new_mit_datasets.py::TestEnterpriseTransformer::test_enterprise_output_format PASSED
59
- tests/test_new_mit_datasets.py::TestPortugueseEducationTransformer::test_portuguese_transformer_exists PASSED
60
- tests/test_new_mit_datasets.py::TestPortugueseEducationTransformer::test_portuguese_multilingual_metadata PASSED
61
- tests/test_new_mit_datasets.py::TestNewDatasetsIntegrationWithRetrieval::test_warbler_document_structure PASSED
62
- tests/test_new_mit_datasets.py::TestNewDatasetsIntegrationWithRetrieval::test_pack_creation_with_new_datasets PASSED
63
- tests/test_new_mit_datasets.py::TestNewDatasetsPerformance::test_arxiv_handles_large_dataset PASSED
64
- tests/test_new_mit_datasets.py::TestNewDatasetsAllAtOnce::test_all_transformers_callable PASSED
65
- ```
66
-
67
- ### ✅ Backward Compatibility Tests (53 PASSED)
68
-
69
- All existing tests continue to pass, confirming backward compatibility:
70
-
71
- - Embedding provider interface tests ✅
72
- - RAG end-to-end pipeline ✅
73
- - Retrieval API functionality ✅
74
- - STAT7 integration and hybrid scoring ✅
75
- - Embedding integration ✅
76
-
77
- ---
78
-
79
- ## Test Execution Details
80
-
81
- ### Command
82
-
83
- ```bash
84
- C:\Users\jerio\AppData\Local\Programs\Python\Python312\python.exe -m pytest tests/ -v
85
- ```
86
-
87
- ### Execution Time
88
-
89
- - Total: 58.70 seconds
90
- - New tests: ~13 seconds
91
- - Existing tests: ~45 seconds
92
-
93
- ### Environment
94
-
95
- - Python: 3.12.10
96
- - pytest: 8.4.2
97
- - Platform: Windows (win32)
98
-
99
- ---
100
-
101
- ## Coverage by Transformer
102
-
103
- ### arXiv Papers (4 tests)
104
-
105
- - ✅ Transformer exists and is callable
106
- - ✅ Output format matches Warbler structure
107
- - ✅ Metadata includes required fields
108
- - ✅ Limit parameter respected
109
-
110
- ### Prompt Report (2 tests)
111
-
112
- - ✅ Transformer exists
113
- - ✅ Output format correct
114
-
115
- ### Generated Novels (2 tests)
116
-
117
- - ✅ Transformer exists
118
- - ✅ Text chunking functionality
119
-
120
- ### Technical Manuals (2 tests)
121
-
122
- - ✅ Transformer exists
123
- - ✅ Output format correct
124
-
125
- ### Enterprise Benchmarks (2 tests)
126
-
127
- - ✅ Transformer exists
128
- - ✅ Output format correct
129
-
130
- ### Portuguese Education (2 tests)
131
-
132
- - ✅ Transformer exists
133
- - ✅ Multilingual metadata
134
-
135
- ### Integration (2 tests)
136
-
137
- - ✅ Warbler document structure validation
138
- - ✅ Pack creation with mocked filesystem
139
-
140
- ### Performance (1 test)
141
-
142
- - ✅ Large dataset handling (100+ papers in <10s)
143
-
144
- ### All Transformers Callable (1 test)
145
-
146
- - ✅ All 6 new transformers verified as callable
147
-
148
- ---
149
-
150
- ## Issues Found & Fixed
151
-
152
- ### Issue 1: Mock WindowsPath AttributeError
153
-
154
- **Problem**: Test tried to mock `mkdir` attribute on real Path object
155
- **Solution**: Used MagicMock instead of real Path
156
- **Status**: ✅ Fixed - all tests now pass
157
-
- ---
-
- ## Validation Checklist
-
- - [x] All new transformer methods are implemented
- - [x] All helper methods are implemented
- - [x] Output format matches Warbler structure
- - [x] MIT license field present in all documents
- - [x] Required metadata fields present (realm_type, realm_label, etc.)
- - [x] Error handling in place
- - [x] CLI integration works
- - [x] Backward compatibility maintained
- - [x] Performance acceptable (<10s for large datasets)
- - [x] 100% test pass rate
-
- ---
-
- ## Recommendations
-
- ### Immediate
-
- - ✅ Ready for staging environment validation
- - ✅ Ready for production deployment
-
- ### Next Steps
-
- 1. Test with the actual HuggingFace API (not mocked)
- 2. Validate pack loading in the retrieval system
- 3. Benchmark hybrid scoring with the new documents
- 4. Monitor the first production ingestion
-
- ### Long-term
-
- 1. Add integration tests with real HuggingFace datasets
- 2. Performance benchmarking with different dataset sizes
- 3. Memory profiling for large arXiv ingestion
- 4. Document the update-frequency strategy
-
- ---
-
- ## Sign-Off
-
- **All 71 tests passing.**
- **Backward compatibility maintained.**
- **New functionality validated.**
-
- ✅ **Ready for Production Deployment**
-
- ---
-
- **Test Report Generated**: 2025-11-08
- **Python Version**: 3.12.10
- **pytest Version**: 8.4.2
- **Status**: VALIDATED ✅
 
TODO.md DELETED
@@ -1,30 +0,0 @@
- # Background Pack Ingestion Implementation
-
- ## Overview
- Modify app.py to perform pack ingestion in a background thread, allowing the app to start immediately while documents load asynchronously.
-
- ## Tasks
-
- ### 1. Add Background Ingestion Support
- - [ ] Import threading module in app.py
- - [ ] Add global variables to track ingestion status (running, progress, total_docs, processed, etc.)
- - [ ] Create a background_ingest_packs() function that performs the ingestion logic
- - [ ] Start the background thread after API initialization but before app launch
-
- ### 2. Update System Stats
- - [ ] Modify get_system_stats() to include ingestion progress information
- - [ ] Display current ingestion status in the System Stats tab
-
- ### 3. Handle Thread Safety
- - [ ] Ensure API.add_document() calls are thread-safe (assuming they are)
- - [ ] Add proper error handling in the background thread
-
- ### 4. Test Implementation
- - [ ] Test that app launches immediately
- - [ ] Verify ingestion happens in background
- - [ ] Check that queries work during ingestion
- - [ ] Confirm progress is shown in System Stats
-
- ## Status
- - [x] Plan created and approved
- - [ ] Implementation in progress
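
The plan above maps onto a small amount of code. A minimal sketch, assuming the `api` and `pack_loader` objects created in app.py; `background_ingest_packs` and the status dictionary are the names proposed in this TODO, and the status fields are illustrative.

```python
import threading

# Global ingestion status, surfaced by get_system_stats()
ingestion_status = {"running": False, "processed": 0, "total_docs": 0, "error": None}

def background_ingest_packs():
    """Ingest pack documents without blocking app startup."""
    ingestion_status["running"] = True
    try:
        documents = pack_loader.discover_documents()
        ingestion_status["total_docs"] = len(documents)
        for doc in documents:
            api.add_document(
                doc_id=doc["id"],
                content=doc["content"],
                metadata=doc.get("metadata", {}),
            )
            ingestion_status["processed"] += 1
    except Exception as exc:  # surface errors in the UI rather than killing the thread
        ingestion_status["error"] = str(exc)
    finally:
        ingestion_status["running"] = False

# Start after API initialization, before demo.launch()
threading.Thread(target=background_ingest_packs, daemon=True).start()
```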
 
app.py CHANGED
@@ -6,14 +6,13 @@ Provides a web UI for the FractalStat RAG system with GPU acceleration.
 """
 
 import gradio as gr
-import json
-from typing import Dict, Any, List
 import time
 
 # Import Warbler CDA components
 from warbler_cda.retrieval_api import RetrievalAPI, RetrievalQuery, RetrievalMode
 from warbler_cda.embeddings import EmbeddingProviderFactory
 from warbler_cda.fractalstat_rag_bridge import FractalStatRAGBridge
+from warbler_cda.semantic_anchors import SemanticAnchorGraph
 from warbler_cda.pack_loader import PackLoader
 
 # Initialize the system
@@ -23,12 +22,17 @@ print("🚀 Initializing Warbler CDA...")
 embedding_provider = EmbeddingProviderFactory.get_default_provider()
 print(f"✅ Embedding provider: {embedding_provider.get_provider_info()['provider_id']}")
 
+# Create semantic anchors (required by RetrievalAPI)
+semantic_anchors = SemanticAnchorGraph(embedding_provider=embedding_provider)
+print("✅ Semantic anchors initialized")
+
 # Create FractalStat bridge
 fractalstat_bridge = FractalStatRAGBridge()
 print("✅ FractalStat bridge initialized")
 
-# Create RetrievalAPI
+# Create RetrievalAPI with proper components
 api = RetrievalAPI(
+    semantic_anchors=semantic_anchors,
     embedding_provider=embedding_provider,
     fractalstat_bridge=fractalstat_bridge,
     config={"enable_fractalstat_hybrid": True}
@@ -39,15 +43,47 @@ print("✅ RetrievalAPI initialized")
 print("📚 Loading Warbler packs...")
 pack_loader = PackLoader()
 documents = pack_loader.discover_documents()
-print(f"✅ Found {len(documents)} documents")
-
-# Ingest documents
-for doc in documents:
-    api.add_document(
-        doc_id=doc["id"],
-        content=doc["content"],
-        metadata=doc.get("metadata", {})
-    )
+
+# If no packs found, try to download them
+if len(documents) == 0:
+    print("⚠️ No packs found locally. Attempting to download from HuggingFace...")
+    try:
+        from warbler_cda.utils.hf_warbler_ingest import HFWarblerIngestor
+        ingestor = HFWarblerIngestor(packs_dir=pack_loader.packs_dir, verbose=True)
+        # Download a small demo dataset for deployment
+        print("📦 Downloading warbler-pack-hf-prompt-report...")
+        success = ingestor.ingest_dataset("prompt-report")
+        if success:
+            # Reload after download
+            documents = pack_loader.discover_documents()
+            print(f"✅ Downloaded {len(documents)} documents")
+        else:
+            print("❌ Failed to download dataset, using sample documents...")
+            documents = []
+    except Exception as e:
+        print(f"⚠️ Could not download packs: {e}")
+        print("Using sample documents instead...")
+        documents = []
+
+if len(documents) == 0:
+    # Fallback to sample documents
+    sample_docs = [
+        {"id": "sample1", "content": "FractalStat is an 8-dimensional addressing system for intelligent retrieval.", "metadata": {}},
+        {"id": "sample2", "content": "Semantic search finds documents by meaning, not just keywords.", "metadata": {}},
+        {"id": "sample3", "content": "Bob the Skeptic validates results to prevent bias and hallucinations.", "metadata": {}},
+    ]
+    for doc in sample_docs:
+        api.add_document(doc["id"], doc["content"], doc["metadata"])
+    print(f"✅ Loaded {len(sample_docs)} sample documents")
+else:
+    print(f"✅ Found {len(documents)} documents")
+    # Ingest documents
+    for doc in documents:
+        api.add_document(
+            doc_id=doc["id"],
+            content=doc["content"],
+            metadata=doc.get("metadata", {})
+        )
 
 print(f"🎉 Warbler CDA ready with {api.get_context_store_size()} documents!")
 
@@ -145,7 +181,7 @@ with gr.Blocks(title="Warbler CDA - FractalStat RAG") as demo:
         with gr.Column():
             results_output = gr.Markdown(label="Results")
 
-    query_btn.click(
+    query_btn.click(  # pylint: disable=E1101
         fn=query_warbler,
         inputs=[query_input, max_results, use_hybrid],
         outputs=results_output
@@ -163,8 +199,8 @@ with gr.Blocks(title="Warbler CDA - FractalStat RAG") as demo:
     with gr.Tab("System Stats"):
        stats_output = gr.Markdown()
        stats_btn = gr.Button("Refresh Stats")
-       stats_btn.click(fn=get_system_stats, outputs=stats_output)
-       demo.load(fn=get_system_stats, outputs=stats_output)
+       stats_btn.click(fn=get_system_stats, outputs=stats_output)  # pylint: disable=E1101
+       demo.load(fn=get_system_stats, outputs=stats_output)  # pylint: disable=E1101
 
     with gr.Tab("About"):
         gr.Markdown("""
compress_packs.py DELETED
@@ -1,134 +0,0 @@
- #!/usr/bin/env python3
- """
- Pack Compression Script using Evaporation Engine
-
- This script compresses warbler packs by replacing document content with
- compressed proto-thoughts generated by the evaporation engine.
- """
-
- import json
- import sys
- from pathlib import Path
- from typing import Dict, Any, List
-
- # Add the project root to Python path
- sys.path.insert(0, str(Path(__file__).parent))
-
- from warbler_cda.melt_layer import MeltLayer, MagmaStore
- from warbler_cda.evaporation import EvaporationEngine, CloudStore
-
-
- def load_jsonl_file(filepath: str) -> List[Dict[str, Any]]:
-     """Load a JSONL file and return a list of documents."""
-     documents = []
-     with open(filepath, "r", encoding="utf-8") as f:
-         for line in f:
-             line = line.strip()
-             if line:
-                 documents.append(json.loads(line))
-     return documents
-
-
- def save_jsonl_file(filepath: str, documents: List[Dict[str, Any]]) -> None:
-     """Save a list of documents to a JSONL file."""
-     with open(filepath, "w", encoding="utf-8") as f:
-         for doc in documents:
-             f.write(json.dumps(doc, ensure_ascii=False) + "\n")
-
-
- def compress_pack(pack_path: str, output_suffix: str = "_compressed") -> None:
-     """Compress a single pack using the evaporation engine."""
-     pack_path = Path(pack_path)
-     if not pack_path.exists():
-         raise FileNotFoundError(f"Pack path {pack_path} does not exist")
-
-     # Find all JSONL files in the pack
-     jsonl_files = list(pack_path.glob("*.jsonl"))
-     if not jsonl_files:
-         print(f"No JSONL files found in {pack_path}")
-         return
-
-     print(f"Found {len(jsonl_files)} JSONL files in {pack_path}")
-
-     # Initialize evaporation components
-     magma_store = MagmaStore()
-     cloud_store = CloudStore()
-     melt_layer = MeltLayer(magma_store)
-     evaporation_engine = EvaporationEngine(magma_store, cloud_store)
-
-     total_docs = 0
-     compressed_docs = 0
-
-     for jsonl_file in jsonl_files:
-         print(f"Processing {jsonl_file.name}...")
-
-         # Load documents
-         documents = load_jsonl_file(str(jsonl_file))
-         total_docs += len(documents)
-
-         compressed_documents = []
-
-         for doc in documents:
-             if "content" not in doc:
-                 print("Warning: Document missing 'content' field, skipping")
-                 continue
-
-             content = doc["content"]
-             if not content or not isinstance(content, str):
-                 print("Warning: Empty or invalid content, skipping")
-                 continue
-
-             try:
-                 # Create a fragment from the document content
-                 fragment = {"id": doc.get("content_id", f"doc_{compressed_docs}"), "text": content}
-
-                 # Create glyph from the single fragment
-                 melt_layer.retire_cluster({"fragments": [fragment]})
-
-                 # Evaporate to get proto-thought
-                 mist_lines = evaporation_engine.evaporate(limit=1)
-
-                 if mist_lines:
-                     proto_thought = mist_lines[0]["proto_thought"]
-                     # Replace content with compressed proto-thought
-                     compressed_doc = doc.copy()
-                     compressed_doc["content"] = proto_thought
-                     compressed_doc["original_content_length"] = len(content)
-                     compressed_doc["compressed_content_length"] = len(proto_thought)
-                     compressed_documents.append(compressed_doc)
-                     compressed_docs += 1
-                 else:
-                     print(
-                         f"Warning: Failed to evaporate glyph for document {doc.get('content_id', 'unknown')}"
-                     )
-                     # Keep original document if evaporation fails
-                     compressed_documents.append(doc)
-
-             except Exception as e:
-                 print(f"Error processing document {doc.get('content_id', 'unknown')}: {e}")
-                 # Keep original document on error
-                 compressed_documents.append(doc)
-
-         # Save compressed file
-         output_file = jsonl_file.parent / f"{jsonl_file.stem}{output_suffix}{jsonl_file.suffix}"
-         save_jsonl_file(str(output_file), compressed_documents)
-         print(f"Saved compressed file: {output_file}")
-
-     print("Compression complete:")
-     print(f"  Total documents processed: {total_docs}")
-     print(f"  Documents compressed: {compressed_docs}")
-     if total_docs > 0:
-         # Note: this is the share of documents compressed, not a size ratio
-         print(f"  Share of documents compressed: {compressed_docs/total_docs:.2%}")
-
-
- def main():
-     if len(sys.argv) != 2:
-         print("Usage: python compress_packs.py <pack_path>")
-         sys.exit(1)
-
-     pack_path = sys.argv[1]
-     compress_pack(pack_path)
-
-
- if __name__ == "__main__":
-     main()
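
For reference, a sketch of what one pack record looks like before and after this script runs; every field value below is illustrative, and the proto-thought text is invented for the example.

```python
# Before compression (one line of a pack's JSONL file, parsed)
original = {
    "content_id": "warbler-pack-hf-arxiv/paper-0001",
    "content": "Full abstract and discussion text...",
    "metadata": {"pack": "warbler-pack-hf-arxiv"},
}

# After compress_pack(): content is replaced by the evaporated proto-thought,
# and the original/compressed lengths are recorded for auditing.
compressed = {
    "content_id": "warbler-pack-hf-arxiv/paper-0001",
    "content": "<proto-thought summary>",
    "metadata": {"pack": "warbler-pack-hf-arxiv"},
    "original_content_length": 36,
    "compressed_content_length": 23,
}
```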
 
convert_to_jsonl.py DELETED
@@ -1,37 +0,0 @@
- import json
- import os
-
-
- def convert_templates_to_jsonl(pack_dir):
-     """Convert templates.json to pack_name.jsonl for a given pack directory."""
-     pack_name = os.path.basename(pack_dir)
-     templates_path = os.path.join(pack_dir, "pack", "templates.json")
-     jsonl_path = os.path.join(pack_dir, f"{pack_name}.jsonl")
-
-     if not os.path.exists(templates_path):
-         print(f"No templates.json found in {pack_dir}")
-         return
-
-     with open(templates_path, "r") as f:
-         templates = json.load(f)
-
-     with open(jsonl_path, "w") as f:
-         for template in templates:
-             json.dump(template, f)
-             f.write("\n")
-
-     print(f"Converted {templates_path} to {jsonl_path}")
-
-
- # Convert the three default packs
- packs_to_convert = [
-     "packs/warbler-pack-core",
-     "packs/warbler-pack-faction-politics",
-     "packs/warbler-pack-wisdom-scrolls",
- ]
-
- for pack in packs_to_convert:
-     if os.path.exists(pack):
-         convert_templates_to_jsonl(pack)
-     else:
-         print(f"Pack directory {pack} not found")
 
copy_packs.sh DELETED
@@ -1,45 +0,0 @@
- #!/bin/bash
- set -e
-
- SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
- REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
- SOURCE_PACKS_DIR="$REPO_ROOT/packages/com.twg.the-seed/The Living Dev Agent/packs"
- DEST_PACKS_DIR="$SCRIPT_DIR/packs"
-
- echo "Copying Warbler Packs to warbler-cda-package..."
- echo "Source: $SOURCE_PACKS_DIR"
- echo "Destination: $DEST_PACKS_DIR"
-
- if [ ! -d "$SOURCE_PACKS_DIR" ]; then
-     echo "❌ Error: Source packs directory not found at $SOURCE_PACKS_DIR"
-     exit 1
- fi
-
- mkdir -p "$DEST_PACKS_DIR"
-
- PACKS=(
-     "warbler-pack-core"
-     "warbler-pack-faction-politics"
-     "warbler-pack-wisdom-scrolls"
-     "warbler-pack-hf-npc-dialogue"
- )
-
- for pack in "${PACKS[@]}"; do
-     src="$SOURCE_PACKS_DIR/$pack"
-     dst="$DEST_PACKS_DIR/$pack"
-
-     if [ -d "$src" ]; then
-         echo "📦 Copying $pack..."
-         rm -rf "$dst"
-         cp -r "$src" "$dst"
-         echo "✓ Copied $pack"
-     else
-         echo "⚠️ Warning: Pack not found at $src (skipping)"
-     fi
- done
-
- echo ""
- echo "✅ Warbler packs successfully copied to $DEST_PACKS_DIR"
- echo ""
- echo "Packs available for ingestion:"
- ls -1 "$DEST_PACKS_DIR" | sed 's/^/  • /'
 
coverage.xml DELETED
The diff for this file is too large to render. See raw diff
 
final_fix.py DELETED
@@ -1,28 +0,0 @@
- #!/usr/bin/env python3
- """Final fixes for stat7_entity.py and verify the fixes work"""
-
- # Fix the stat7_entity.py bug
- with open("warbler_cda/stat7_entity.py", "r", encoding="utf-8") as f:
-     content = f.read()
-
- # Fix the description reference bug
- content = content.replace('"description": description,', '"description": self.description,')
-
- # Write back the fixed content
- with open("warbler_cda/stat7_entity.py", "w", encoding="utf-8") as f:
-     f.write(content)
-
- print("Fixed stat7_entity.py description bug")
-
- # Test imports to make sure everything works (the import itself is the check)
- try:
-     from warbler_cda import stat7_entity  # noqa: F401
-     print("✅ stat7_entity imports successfully")
- except Exception as e:
-     print(f"❌ stat7_entity import failed: {e}")
-
- try:
-     from warbler_cda import stat7_rag_bridge  # noqa: F401
-     print("✅ stat7_rag_bridge imports successfully")
- except Exception as e:
-     print(f"❌ stat7_rag_bridge import failed: {e}")
-
- print("All fixes applied!")
 
fix_theme.py DELETED
@@ -1,15 +0,0 @@
- #!/usr/bin/env python3
- """Fix the theme issue in app.py"""
-
- with open("app.py", "r", encoding="utf-8") as f:
-     content = f.read()
-
- old_line = 'with gr.Blocks(title="Warbler CDA - RAG System Demo", theme=gr.themes.Soft()) as demo:'
- new_line = 'with gr.Blocks(title="Warbler CDA - RAG System Demo") as demo:'
-
- content = content.replace(old_line, new_line)
-
- with open("app.py", "w", encoding="utf-8") as f:
-     f.write(content)
-
- print("Fixed theme issue")
 
load_warbler_packs_current.txt DELETED
@@ -1,259 +0,0 @@
- #!/usr/bin/env python3
- """
- Load Warbler Pack Data into EXP-09 API Service
-
- Ingests game wisdom, lore, and faction data into the STAT7-enabled RetrievalAPI
- for end-to-end testing with real Warbler content.
- """
-
- import json
- import requests
- import click
- from pathlib import Path
- from typing import List, Dict, Any
- import logging
-
- logging.basicConfig(level=logging.INFO)
- logger = logging.getLogger(__name__)
-
- # Warbler pack locations
- BASE_DIR = Path(__file__).resolve().parent
- PACKS_DIR = BASE_DIR.parents[1] / 'packs'
- WARBLER_PACKS = [
-     "warbler-pack-core",
-     "warbler-pack-wisdom-scrolls",
-     "warbler-pack-faction-politics",
-     "warbler-pack-hf-arxiv",
-     "warbler-pack-hf-prompt-report",
-     "warbler-pack-hf-novels",
-     "warbler-pack-hf-manuals",
-     "warbler-pack-hf-enterprise",
-     "warbler-pack-hf-portuguese-edu",
-     "warbler-pack-hf-edustories"
- ]
-
-
- class WarblerPackLoader:
-     """Load Warbler pack data into the API"""
-
-     def __init__(self, api_url: str = "http://localhost:8000"):
-         self.api_url = api_url.rstrip("/")
-         self.session = requests.Session()
-         self.loaded_count = 0
-         self.error_count = 0
-
-     def discover_documents(self, pack_name: str) -> List[Dict[str, Any]]:
-         """Discover all documents in a pack"""
-         pack_path = PACKS_DIR / pack_name
-         documents = []
-
-         if not pack_path.exists():
-             logger.warning(f"Pack not found: {pack_path}")
-             return []
-
-         # Look for JSON, YAML, markdown, and JSONL files
-         for pattern in [
-                 "**/*.json",
-                 "**/*.yaml",
-                 "**/*.yml",
-                 "**/*.md",
-                 "**/*.jsonl"]:
-             for file_path in pack_path.glob(pattern):
-                 try:
-                     doc = self._parse_document(file_path, pack_name)
-                     if doc:
-                         documents.append(doc)
-                         logger.info(
-                             f"Discovered: {file_path.relative_to(PACKS_DIR)}")
-                 except Exception as e:
-                     logger.error(f"Error parsing {file_path}: {e}")
-
-         return documents
-
-     def _parse_document(self, file_path: Path,
-                         pack_name: str) -> Dict[str, Any]:
-         """Parse a document file"""
-         try:
-             if file_path.suffix in ['.json']:
-                 with open(file_path, 'r', encoding='utf-8') as f:
-                     content = json.load(f)
-                     if isinstance(content, dict):
-                         content = json.dumps(content)
-                     else:
-                         content = json.dumps(content)
-             elif file_path.suffix in ['.jsonl']:
-                 # JSONL files contain multiple JSON objects, one per line
-                 # We'll read the first few lines and combine them
-                 with open(file_path, 'r', encoding='utf-8') as f:
-                     lines = f.readlines()[:5]  # First 5 lines
-                     content = '\n'.join(line.strip()
-                                         for line in lines if line.strip())
-             elif file_path.suffix in ['.yaml', '.yml']:
-                 import yaml
-                 with open(file_path, 'r', encoding='utf-8') as f:
-                     content = yaml.safe_load(f)
-                     content = json.dumps(content)
-             elif file_path.suffix == '.md':
-                 with open(file_path, 'r', encoding='utf-8') as f:
-                     content = f.read()
-             else:
-                 return None
-
-             # Infer realm from pack name
-             if "wisdom" in pack_name:
-                 realm = "wisdom"
-             elif "faction" in pack_name:
-                 realm = "faction"
-             else:
-                 realm = "narrative"
-
-             return {
-                 "content_id": f"{pack_name}/{file_path.stem}",
-                 "content": str(content)[:5000],  # Limit content size
-                 "metadata": {
-                     "pack": pack_name,
-                     "source_file": str(file_path.name),
-                     "realm_type": realm,
-                     "realm_label": pack_name.replace("warbler-pack-", ""),
-                     "lifecycle_stage": "emergence",
-                     "activity_level": 0.7
-                 }
-             }
-         except Exception as e:
-             logger.error(f"Failed to parse {file_path}: {e}")
-             return None
-
-     def ingest_document(self, doc: Dict[str, Any]) -> bool:
-         """Send document to API for ingestion"""
-         try:
-             # For now, we'll store in local context
-             # The API service will need an /ingest endpoint
-             logger.info(f"Ingesting: {doc['content_id']}")
-
-             # Check if API has ingest endpoint
-             response = self.session.post(
-                 f"{self.api_url}/ingest",
-                 json={"documents": [doc]},
-                 timeout=10
-             )
-
-             if response.status_code in [200, 201, 202]:
-                 self.loaded_count += 1
-                 logger.info(f"[OK] Loaded: {doc['content_id']}")
-                 return True
-             else:
-                 logger.warning(
-                     f"API returned {response.status_code}: {response.text[:200]}")
-                 return False
-         except requests.exceptions.ConnectionError:
-             logger.error("Cannot connect to API. Is the service running?")
-             return False
-         except Exception as e:
-             logger.error(f"Ingestion failed: {e}")
-             self.error_count += 1
-             return False
-
-     def load_all_packs(self) -> int:
-         """Load all Warbler packs"""
-         click.echo("\n" + "=" * 60)
-         click.echo("Loading Warbler Pack Data into EXP-09 API")
-         click.echo("=" * 60 + "\n")
-
-         total_docs = 0
-         for pack_name in WARBLER_PACKS:
-             click.echo(f"\n[PACK] Processing: {pack_name}")
-             click.echo("-" * 40)
-
-             documents = self.discover_documents(pack_name)
-             click.echo(f"Found {len(documents)} documents\n")
-
-             for doc in documents:
-                 self.ingest_document(doc)
-                 total_docs += 1
-
-         click.echo("\n" + "=" * 60)
-         click.secho(f"[OK] Load Complete: {self.loaded_count} docs ingested", fg="green")
-         if self.error_count > 0:
-             click.secho(f"[ERROR] Errors: {self.error_count}", fg="yellow")
-         click.echo("=" * 60 + "\n")
-
-         return self.loaded_count
-
-
- @click.group()
- def cli():
-     """Warbler Pack Loader for EXP-09"""
-     pass
-
-
- @cli.command()
- @click.option("--api-url",
-               default="http://localhost:8000",
-               help="API service URL")
- def load(api_url):
-     """Load all Warbler packs into the API"""
-     loader = WarblerPackLoader(api_url)
-
-     # First, check if API is running
-     try:
-         response = loader.session.get(f"{api_url}/health", timeout=5)
-         if response.status_code == 200:
-             click.secho("[OK] API service is running", fg="green")
-         else:
-             click.secho(
-                 "[ERROR] API service not responding correctly", fg="red")
-             return
-     except Exception as e:
-         click.secho(f"[ERROR] Cannot reach API at {api_url}: {e}", fg="red")
-         click.echo("\nStart the service with: docker-compose up -d")
-         return
-
-     # Load the packs
-     loaded = loader.load_all_packs()
-
-     if loaded > 0:
-         click.echo("\n[NEXT] Next Steps:")
-         click.echo(
-             "  1. Query the data with: python exp09_cli.py query --query-id q1 --semantic \"wisdom about courage\"")
-         click.echo(
-             "  2. Test hybrid scoring: python exp09_cli.py query --query-id q1 --semantic \"...\" --hybrid")
-         click.echo("  3. Check metrics: python exp09_cli.py metrics\n")
-
-
- @cli.command()
- @click.option("--api-url",
-               default="http://localhost:8000",
-               help="API service URL")
- def discover(api_url):
-     """Discover documents in Warbler packs (no loading)"""
-     loader = WarblerPackLoader(api_url)
-
-     click.echo("\n" + "=" * 60)
-     click.echo("Discovering Warbler Pack Documents")
-     click.echo("=" * 60 + "\n")
-
-     total = 0
-     for pack_name in WARBLER_PACKS:
-         click.echo(f"\n[PACK] {pack_name}")
-         click.echo("-" * 40)
-
-         documents = loader.discover_documents(pack_name)
-         total += len(documents)
-
-         for doc in documents:
-             click.echo(f"  - {doc['content_id']}")
-             if "metadata" in doc:
-                 click.echo(f"    Realm: {doc['metadata'].get('realm_type', 'unknown')}")
-
-     click.echo(f"\n[STATS] Total discovered: {total} documents\n")
-
-
- if __name__ == "__main__":
-     cli()
 
package-lock.json DELETED
@@ -1,861 +0,0 @@
The diff for this file is too large to render. See raw diff
 
package.json DELETED
@@ -1,19 +0,0 @@
-{
-  "name": "warbler-cda",
-  "version": "1.0.0",
-  "description": "--- title: Warbler CDA RAG System emoji: 🦜 colorFrom: blue colorTo: purple sdk: gradio sdk_version: 5.49.1 app_file: app.py pinned: false license: mit tags: - rag - retrieval - semantic-search - stat7 - embeddings - nlp ---",
-  "main": "index.js",
-  "directories": {
-    "test": "tests"
-  },
-  "scripts": {
-    "test": "echo \"Error: no test specified\" && exit 1"
-  },
-  "keywords": [],
-  "author": "",
-  "license": "ISC",
-  "dependencies": {
-    "express": "^5.1.0",
-    "typescript": "^5.9.3"
-  }
-}
packs/warbler-pack-hf-arxiv/package.json CHANGED
@@ -2,14 +2,14 @@
   "name": "warbler-pack-hf-arxiv",
   "version": "1.0.0",
   "description": "Warbler pack generated from HuggingFace datasets (chunked)",
-  "created_at": "2025-11-19T19:07:32.887499",
+  "created_at": "2025-12-02T10:48:41.412949",
   "document_count": 2549619,
   "source": "HuggingFace",
   "content_types": [
     "scholarly_discussion"
   ],
   "chunked": true,
-  "chunk_count": 255,
-  "docs_per_chunk": 10000,
-  "chunk_pattern": "warbler-pack-hf-arxiv-chunk-*_compressed.jsonl"
+  "chunk_count": 51,
+  "docs_per_chunk": 50000,
+  "chunk_pattern": "warbler-pack-hf-arxiv-chunk-*.jsonl"
 }
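The manifest change above records a re-chunking of the arXiv pack: with 2,549,619 documents at 50,000 docs per chunk, ceil(2,549,619 / 50,000) = 51 chunks replaces the previous ceil(2,549,619 / 10,000) = 255, and `chunk_pattern` now matches plain `.jsonl` files instead of `*_compressed.jsonl`. A minimal sketch of a loader driven by this manifest (the function name and usage are illustrative, not the repo's actual API):

```python
import glob
import json
import os

def iter_pack_documents(pack_dir: str):
    """Yield documents from a chunked Warbler pack, driven by its manifest.

    Illustrative sketch: reads package.json for chunk_pattern and streams
    each JSONL chunk file in sorted (i.e. chunk-number) order.
    """
    with open(os.path.join(pack_dir, "package.json"), encoding="utf-8") as fh:
        manifest = json.load(fh)
    pattern = os.path.join(pack_dir, manifest["chunk_pattern"])
    for chunk_path in sorted(glob.glob(pattern)):
        with open(chunk_path, encoding="utf-8") as chunk:
            for line in chunk:
                line = line.strip()
                if line:  # skip blank lines between records
                    yield json.loads(line)

# Hypothetical usage:
# docs = iter_pack_documents("packs/warbler-pack-hf-arxiv")
# print(next(docs))
```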
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-001_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-002_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-003_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-004_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-005_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-006_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-007_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-008_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-009_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-010_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-011_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-012_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-013_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-014_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-015_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-016_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-017_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-018_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-019_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-020_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-021_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-022_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-023_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-024_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-025_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-026_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-027_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-028_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-029_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff