Spaces: Running on Zero

there-is-already-a-branch #1
by Bellok - opened
This view is limited to 50 files because it contains too many changes.
See the raw diff here.
- .gitignore +1 -1
- PACKAGE_MANIFEST.md +0 -94
- PACKS_DEPLOYMENT.md +0 -281
- PACK_CACHING.md +0 -172
- PACK_INGESTION_FIX.md +0 -209
- PDF_INGESTION_INVESTIGATION.md +0 -325
- README_HF.md +1 -1
- TESTS_PORTED.md +0 -271
- TEST_RESULTS.md +0 -211
- TODO.md +0 -30
- app.py +51 -15
- compress_packs.py +0 -134
- convert_to_jsonl.py +0 -37
- copy_packs.sh +0 -45
- coverage.xml +0 -0
- final_fix.py +0 -28
- fix_theme.py +0 -15
- load_warbler_packs_current.txt +0 -259
- package-lock.json +0 -861
- package.json +0 -19
- packs/warbler-pack-hf-arxiv/package.json +4 -4
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-001_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-002_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-003_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-004_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-005_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-006_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-007_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-008_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-009_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-010_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-011_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-012_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-013_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-014_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-015_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-016_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-017_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-018_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-019_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-020_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-021_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-022_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-023_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-024_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-025_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-026_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-027_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-028_compressed.jsonl +0 -0
- packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-029_compressed.jsonl +0 -0
.gitignore
CHANGED

```diff
@@ -47,7 +47,7 @@ results/
 
 # HuggingFace language packs (downloaded on-demand)
 # Exclude all HF packs to keep deployment size under 1GB
-packs/warbler-pack-hf-arxiv
+packs/warbler-pack-hf-arxiv/*chunk*.jsonl
 packs/warbler-pack-hf-enterprise/
 packs/warbler-pack-hf-edustories/
 packs/warbler-pack-hf-manuals/
```
PACKAGE_MANIFEST.md
DELETED

@@ -1,94 +0,0 @@

# Warbler CDA Package - Complete File List

## Package Structure (21 core files + infrastructure)

### Core RAG System (9 files)

✓ warbler_cda/retrieval_api.py - Main RAG API with hybrid scoring
✓ warbler_cda/semantic_anchors.py - Semantic memory with provenance
✓ warbler_cda/anchor_data_classes.py - Core data structures
✓ warbler_cda/anchor_memory_pool.py - Performance optimization
✓ warbler_cda/summarization_ladder.py - Hierarchical compression
✓ warbler_cda/conflict_detector.py - Conflict detection
✓ warbler_cda/castle_graph.py - Concept extraction
✓ warbler_cda/melt_layer.py - Memory consolidation
✓ warbler_cda/evaporation.py - Content distillation

### FractalStat System (4 files)

✓ warbler_cda/fractalstat_rag_bridge.py - FractalStat hybrid scoring bridge
✓ warbler_cda/fractalstat_entity.py - FractalStat entity system
✓ warbler_cda/fractalstat_experiments.py - Validation experiments
✓ warbler_cda/fractalstat_visualization.py - Visualization tools

### Embeddings (4 files)

✓ warbler_cda/embeddings/__init__.py
✓ warbler_cda/embeddings/base_provider.py - Abstract interface
✓ warbler_cda/embeddings/factory.py - Provider factory
✓ warbler_cda/embeddings/local_provider.py - Local TF-IDF embeddings
✓ warbler_cda/embeddings/openai_provider.py - OpenAI embeddings

### Production API (2 files)

✓ warbler_cda/api/__init__.py
✓ warbler_cda/api/service.py - FastAPI service (exp09_api_service.py)
✓ warbler_cda/api/cli.py - CLI interface (exp09_cli.py)

### Utilities (2 files)

✓ warbler_cda/utils/__init__.py
✓ warbler_cda/utils/load_warbler_packs.py - Pack loader
✓ warbler_cda/utils/hf_warbler_ingest.py - HF dataset ingestion

### Infrastructure Files

✓ warbler_cda/__init__.py - Package initialization
✓ requirements.txt - Dependencies
✓ pyproject.toml - Package metadata
✓ README.md - Documentation
✓ app.py - Gradio demo for HuggingFace
✓ .gitignore - Git exclusions
✓ LICENSE - MIT License
✓ DEPLOYMENT.md - Deployment guide
✓ README_HF.md - HuggingFace Space config
✓ setup.sh - Quick setup script
✓ transform_imports.sh - Import transformation script

## Total Files: 32

## Import Transformations Applied

All imports have been transformed from:

- `from seed.engine.X import Y` → `from warbler_cda.X import Y`
- `from .X import Y` → `from warbler_cda.X import Y`
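The same mapping, sketched in Python (hypothetical; `transform_imports.sh`'s actual commands are not shown here, so the real transformation may handle more cases):

```python
# Rewrite seed.engine.* and relative imports to the warbler_cda namespace.
# Illustrative sketch only; transform_imports.sh may differ.
import re

def transform_imports(source: str) -> str:
    source = re.sub(r"\bfrom seed\.engine\.", "from warbler_cda.", source)
    source = re.sub(r"\bfrom \.", "from warbler_cda.", source)
    return source

print(transform_imports("from seed.engine.castle_graph import CastleGraph"))
# prints: from warbler_cda.castle_graph import CastleGraph
print(transform_imports("from .melt_layer import MeltLayer"))
# prints: from warbler_cda.melt_layer import MeltLayer
```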
Privacy hooks have been removed (not needed for HuggingFace deployment).

## Size Estimate

Total package size: ~500KB (source code only)
With dependencies: ~2GB (includes PyTorch, Transformers, etc.)

## Next Steps

1. Test the package locally:

   ```bash
   cd warbler-cda-package
   ./setup.sh
   python app.py
   ```

2. Deploy to HuggingFace:
   - Set HF_TOKEN in GitLab CI/CD variables
   - Push to main or create a tag
   - Pipeline will auto-sync to HuggingFace Space

3. Publish to PyPI (optional):

   ```bash
   python -m build
   twine upload dist/*
   ```
PACKS_DEPLOYMENT.md
DELETED

@@ -1,281 +0,0 @@

# Warbler Packs Deployment Guide

This guide explains how Warbler packs are loaded and deployed to HuggingFace Spaces.

## Overview

The Warbler CDA Space automatically discovers and ingests content packs at startup. Packs contain conversation templates, NPC dialogues, wisdom templates, and other domain-specific content for the RAG system.

## Pack Structure

```none
packs/
├── warbler-pack-core/             # Essential conversation templates
├── warbler-pack-faction-politics/ # Political dialogue templates
├── warbler-pack-wisdom-scrolls/   # Development wisdom generation
└── warbler-pack-hf-npc-dialogue/  # 1,900+ NPC dialogues from HuggingFace
```

## Deployment Process

### 1. Local Development

Copy packs from the main repository to warbler-cda-package:

```bash
cd warbler-cda-package
bash copy_packs.sh
```

This script copies all packs from:

```path
../packages/com.twg.the-seed/The Living Dev Agent/packs/
```

To:

```path
./packs/
```

### 2. Automatic Loading

When `app.py` starts, it:

1. **Initializes PackLoader**

   ```python
   pack_loader = PackLoader()
   ```

2. **Discovers documents from all packs**

   ```python
   pack_docs = pack_loader.discover_documents()
   ```

3. **Ingests documents into RetrievalAPI**

   ```python
   for doc in pack_docs:
       api.add_document(doc["id"], doc["content"], doc["metadata"])
   ```

4. **Falls back to sample documents** if packs not found
   - Ensures demo works even without packs
   - Provides example data for testing
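Taken together, steps 1-4 amount to a load-or-fallback loop. A minimal self-contained sketch of that flow (the `Stub*` classes here stand in for the real `PackLoader` and `RetrievalAPI`, whose internals are not shown in this document):

```python
# Stub sketch of app.py startup: ingest pack documents, or fall back to samples.
SAMPLE_DOCS = [
    {"id": "sample/hello", "content": "Hello Warbler", "metadata": {"pack": "sample"}},
]

class StubPackLoader:
    """Stands in for PackLoader; pretends no packs were found."""
    def discover_documents(self):
        return []

class StubRetrievalAPI:
    """Stands in for RetrievalAPI; just records ingested documents."""
    def __init__(self):
        self.docs = {}
    def add_document(self, doc_id, content, metadata):
        self.docs[doc_id] = (content, metadata)

api = StubRetrievalAPI()
pack_docs = StubPackLoader().discover_documents()
if not pack_docs:
    print("⚠️ No Warbler packs found. Using sample documents instead.")
    pack_docs = SAMPLE_DOCS
for doc in pack_docs:
    api.add_document(doc["id"], doc["content"], doc["metadata"])
print(f"Ingested {len(api.docs)} documents")
```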
### 3. HuggingFace Space Deployment

The `.gitlab-ci.yml` handles deployment:

```bash
hf upload-large-folder $SPACE_NAME . --repo-type=space --space-sdk=gradio
```

This uploads:

- All Python source code
- All packs in the `packs/` directory
- Configuration files

**Important**: The `packs/` directory must exist and contain pack data before deployment.

## Pack Loader Details

The `PackLoader` class (`warbler_cda/pack_loader.py`) handles:

### Pack Discovery

- Scans the `packs/` directory
- Identifies pack type (JSONL-based or structured)
- Discovers all documents

### Document Parsing

- **Structured Packs** (core, faction, wisdom): Load from `pack/templates.json`
- **JSONL Packs** (HF NPC dialogue): Parse line-by-line JSONL format

### Metadata Extraction

```python
{
    "pack": "pack-name",
    "type": "template|dialogue",
    "realm_type": "wisdom|faction|narrative",
    "realm_label": "pack-label",
    "lifecycle_stage": "emergence|peak",
    "activity_level": 0.7-0.8
}
```

## Adding New Packs

To add a new pack to the system:

### 1. Create Pack Structure

```bash
packs/
└── warbler-pack-mypack/
    ├── package.json
    ├── pack/
    │   └── templates.json   # OR
    └── mypack.jsonl         # JSONL format
```

### 2. Update Pack Loader (if needed)

If your pack format is different, add handling to `pack_loader.py`:

```python
def _load_pack(self, pack_dir: Path, pack_name: str):
    if "mypack" in pack_name:
        return self._load_my_format(pack_dir, pack_name)
    # ... existing logic
```

### 3. Register in copy_packs.sh

```bash
PACKS=(
    "warbler-pack-core"
    "warbler-pack-mypack"  # Add here
)
```

### 4. Deploy

Run the copy script and deploy:

```bash
bash copy_packs.sh
# Commit and push to trigger CI/CD
```

## Document Format

Each loaded document follows this structure:

```python
{
    "id": "pack-name/document-id",
    "content": "Document text content...",
    "metadata": {
        "pack": "pack-name",
        "type": "template|dialogue",
        "realm_type": "wisdom|faction|narrative",
        "realm_label": "label",
        "lifecycle_stage": "emergence|peak|crystallization",
        "activity_level": 0.5-0.8
    }
}
```

## Monitoring

Check pack loading in the Space logs:

```log
✓ Loaded 1915 documents from warbler-pack-hf-npc-dialogue
✓ Loaded 6 documents from warbler-pack-wisdom-scrolls
✓ Loaded 15 documents from warbler-pack-faction-politics
✓ Loaded 10 documents from warbler-pack-core
```

Or, if packs are not found:

```log
⚠️ No Warbler packs found. Using sample documents instead.
```

## Publishing to HuggingFace Hub

Each pack has a dataset card for publication:

- **README_HF_DATASET.md** - HuggingFace dataset card
- Contains metadata, attribution, and usage instructions

Publish to HuggingFace:

```bash
# Create repo on HuggingFace Hub (one per pack)
huggingface-cli repo create warbler-pack-core

# Push pack as dataset
cd packs/warbler-pack-core
huggingface-cli upload . tiny-walnut-games/warbler-pack-core --repo-type dataset
```

## Performance Considerations

### Load Time

- PackLoader loads all packs at startup
- Currently: ~1-2 seconds for all packs
- Packs are cached in memory for query performance

### Storage

- Core pack: ~50KB
- Faction politics pack: ~80KB
- Wisdom scrolls pack: ~60KB
- HF NPC dialogue: ~2MB
- **Total**: ~2.3MB

### Scaling

For larger deployments:

- Lazy-load individual packs on demand
- Implement a pack caching layer
- Use a database for large pack collections

## Troubleshooting

### Packs not loading

Check that the `packs/` directory exists:

```bash
ls -la packs/
```

Verify the pack structure:

```bash
ls -la packs/warbler-pack-core/
```

### Sample documents showing instead

If you see "No Warbler packs found", the `packs/` directory is empty. Run:

```bash
bash copy_packs.sh
```

### Pack loader errors

Check the logs for parsing errors:

```log
Error loading JSONL pack: ...
Error parsing line 42 in warbler-pack-hf-npc-dialogue.jsonl: ...
```

Fix the source pack and re-run `copy_packs.sh`.

## Related Documentation

- [README.md](./README.md) - Main package documentation
- [DEPLOYMENT.md](./DEPLOYMENT.md) - General deployment guide
- [app.py](./app.py) - Application startup and pack initialization
- [warbler_cda/pack_loader.py](./warbler_cda/pack_loader.py) - Pack loading implementation

## License

All packs use the MIT License. See individual pack LICENSE files for details.

Attribution: Warbler CDA - Tiny Walnut Games
PACK_CACHING.md
DELETED

@@ -1,172 +0,0 @@

# Warbler Pack Caching Strategy

## Overview

The app now implements intelligent pack caching to avoid unnecessary re-ingestion of large datasets. This minimizes GitLab storage requirements and allows fast session startup.

## How It Works

### First Run (Session Start)

1. **PackManager** initializes and checks for cached metadata
2. **Health check** verifies whether documents are already in the context store
3. **Ingestion** occurs only if:
   - No cache metadata exists
   - Pack count changed
   - Health check fails (documents missing)
4. **Cache** is saved with timestamp and document count

### Subsequent Runs

- Reuses cached documents without re-ingestion
- Quick health check ensures documents are still valid
- Falls back to sample docs if packs are unavailable

## Environment Variables

Control pack ingestion behavior with these variables:

### `WARBLER_INGEST_PACKS` (default: `true`)

Enable/disable automatic pack ingestion.

```bash
export WARBLER_INGEST_PACKS=false
```

### `WARBLER_SAMPLE_ONLY` (default: `false`)

Load only sample documents (for CI/CD verification).

```bash
export WARBLER_SAMPLE_ONLY=true
```

Best for:

- PyPI package CI/CD pipelines
- Quick verification that ingestion works
- Minimal startup time in restricted environments

### `WARBLER_SKIP_PACK_CACHE` (default: `false`)

Force re-ingestion even if a cache exists.

```bash
export WARBLER_SKIP_PACK_CACHE=true
```

Best for:

- Testing the pack ingestion pipeline
- Updating a stale cache
- Debugging
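These flags are plain environment strings, so a small helper of roughly this shape is the usual way to parse them (a hypothetical sketch, not the actual parsing code in `app.py`):

```python
# Parse a boolean feature flag like WARBLER_SAMPLE_ONLY from the environment.
# Hypothetical helper; the real parsing in app.py may differ.
import os

def env_flag(name: str, default: bool) -> bool:
    raw = os.environ.get(name)
    if raw is None:
        return default  # unset variable falls back to the documented default
    return raw.strip().lower() in ("1", "true", "yes", "on")

os.environ["WARBLER_SAMPLE_ONLY"] = "true"
print(env_flag("WARBLER_SAMPLE_ONLY", False))  # prints: True
```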
## Cache Location

Default cache stored at:

```path
~/.warbler_cda/cache/pack_metadata.json
```

Metadata includes:

```json
{
    "ingested_at": 1699564800,
    "pack_count": 7,
    "doc_count": 12345,
    "status": "healthy"
}
```
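The decision rules from "How It Works" can be sketched against this metadata file as follows (an assumed sketch; the real PackManager logic is not shown in this document):

```python
# Sketch of the cache check: re-ingest when no metadata exists, the pack
# count changed, or the last run was not marked healthy. Assumed logic.
import json
import time
from pathlib import Path

def needs_reingest(cache_file: Path, pack_count: int) -> bool:
    if not cache_file.exists():
        return True  # no cache metadata yet
    meta = json.loads(cache_file.read_text())
    return meta.get("pack_count") != pack_count or meta.get("status") != "healthy"

def save_cache(cache_file: Path, pack_count: int, doc_count: int) -> None:
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps({
        "ingested_at": int(time.time()),
        "pack_count": pack_count,
        "doc_count": doc_count,
        "status": "healthy",
    }))
```

Under this reading, `WARBLER_SKIP_PACK_CACHE=true` would simply bypass `needs_reingest` and always take the ingestion branch.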
## CI/CD Optimization

### For GitLab CI (Minimal PyPI Package)

```yaml
test:
  script:
    - export WARBLER_SAMPLE_ONLY=true
    - pip install .
    - python -m pytest tests/
```

Benefits:

- ✅ No large pack files in repository
- ✅ Fast CI runs (5 samples vs 2.5M docs)
- ✅ Verifies ingestion code works
- ✅ Full packs load on first user session

### For Local Development

Keep full packs in the working directory:

```bash
cd warbler-cda-package
python -m warbler_cda.utils.hf_warbler_ingest ingest -d all
python app.py
```

The first run ingests all packs. Subsequent runs use the cache.

### For Gradio Space/Cloud Deployment

Set environment at deployment:

```bash
WARBLER_INGEST_PACKS=true
```

Packs ingest once per session, then are cached in instance memory.

## Files Affected

- `app.py` - Main Gradio app with PackManager
- `warbler_cda/utils/load_warbler_packs.py` - Pack discovery (already handles caching)
- No changes needed to pack ingestion scripts

## Performance Impact

### Memory

- **With packs**: ~500MB (2.5M arxiv docs + others)
- **With samples**: ~1MB (5 test documents)

### Startup Time

- **First run**: ~30-60 seconds (ingest packs)
- **Cached run**: ~2-5 seconds (health check only)
- **Sample only**: <1 second

## Troubleshooting

### Packs not loading?

1. Check `WARBLER_INGEST_PACKS=true` (default)
2. Verify packs exist: `ls -la packs/`
3. Force re-ingestion: `export WARBLER_SKIP_PACK_CACHE=true`

### Cache corrupted?

```bash
rm -rf ~/.warbler_cda/cache/pack_metadata.json
```

Packs will be re-ingested on the next run.

### Need sample docs only?

```bash
export WARBLER_SAMPLE_ONLY=true
python app.py
```

## Future Improvements

- [ ] Detect pack updates via file hash instead of just count
- [ ] Selective pack loading (choose which datasets to cache)
- [ ] Metrics dashboard showing cache hit/miss rates
- [ ] Automatic cache expiration after N days
PACK_INGESTION_FIX.md
DELETED

@@ -1,209 +0,0 @@

# Pack Ingestion Fix for HuggingFace Space

## Problem Summary

Your HuggingFace Space was experiencing three critical errors during pack ingestion:

1. ❌ **Core pack missing JSONL**: `warbler-pack-core missing JSONL file`
2. ❌ **Faction pack missing JSONL**: `warbler-pack-faction-politics missing JSONL file`
3. ❌ **Corrupted arxiv data**: `Error parsing line 145077 in warbler-pack-hf-arxiv.jsonl: Unterminated string`

## Root Causes Identified

### Issues 1 & 2: Different Pack Formats

Your project has **two different pack formats**:

**Format A: Structured Packs** (Core & Faction)

```none
warbler-pack-core/
├── package.json
├── pack/
│   └── templates.json   ← Data is here!
└── src/
```

**Format B: JSONL Packs** (HuggingFace datasets)

```none
warbler-pack-hf-arxiv/
├── package.json
└── warbler-pack-hf-arxiv-chunk-001.jsonl   ← Data is here!
```

The pack loader was expecting **all** packs to have JSONL files, causing false warnings for the structured packs.

### Issue 3: Corrupted JSON Line

The arxiv pack has a malformed JSON entry at line 145077:

```json
{"content": "This is a test with an unterminated string...
```

The previous code would **crash** on the first error, preventing the entire ingestion from completing.

## Solution Implemented

### 1. Enhanced Pack Format Detection

Updated `_is_valid_warbler_pack()` to recognize **three valid formats**:

```python
if jsonl_file.exists():
    return True  # Format B: Single JSONL file
else:
    templates_file = pack_dir / "pack" / "templates.json"
    if templates_file.exists():
        return False  # Format A: Structured pack (triggers different loader)
    else:
        if pack_name.startswith("warbler-pack-hf-"):
            logger.warning("HF pack missing JSONL")  # Only warn for HF packs
        return False
```

### 2. Robust Error Handling

Updated `_load_jsonl_file()` to **continue on error**:

```python
try:
    entry = json.loads(line)
    documents.append(doc)
except json.JSONDecodeError as e:
    error_count += 1
    if error_count <= 5:  # Only log first 5 errors
        logger.warning(f"Error parsing line {line_num}: {e}")
    continue  # ← Skip bad line, keep processing!
```
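A self-contained version of that skip-and-continue behavior looks like this (simplified; the real `_load_jsonl_file` also builds document metadata and logs through `logger`):

```python
# Load JSONL, skipping corrupted lines instead of crashing on the first one.
import io
import json

def load_jsonl(stream):
    docs, skipped = [], 0
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            docs.append(json.loads(line))
        except json.JSONDecodeError:
            skipped += 1  # skip the bad line, keep processing
    return docs, skipped

sample = io.StringIO('{"id": 1}\n{"content": "unterminated...\n{"id": 2}\n')
docs, skipped = load_jsonl(sample)
print(f"Loaded {len(docs)} documents ({skipped} lines skipped)")
# prints: Loaded 2 documents (1 lines skipped)
```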
|
| 80 |
-
|
| 81 |
-
## What Changed
|
| 82 |
-
|
| 83 |
-
**File: `warbler-cda-package/warbler_cda/pack_loader.py`**
|
| 84 |
-
|
| 85 |
-
### Change 1: Smarter Validation
|
| 86 |
-
|
| 87 |
-
- ✅ Recognizes structured packs as valid
|
| 88 |
-
- ✅ Only warns about missing JSONL for HF packs
|
| 89 |
-
- ✅ Better logging messages
|
| 90 |
-
|
| 91 |
-
### Change 2: Error Recovery
|
| 92 |
-
|
| 93 |
-
- ✅ Skips corrupted JSON lines
|
| 94 |
-
- ✅ Limits error logging to first 5 occurrences
|
| 95 |
-
- ✅ Reports summary: "Loaded X documents (Y lines skipped)"
|
| 96 |
-
|
| 97 |
-
## Expected Behavior After Fix
|
| 98 |
-
|
| 99 |
-
### Before (Broken)
|
| 100 |
-
|
| 101 |
-
```none
|
| 102 |
-
[INFO] Pack Status: ✓ All 6 packs verified and ready
|
| 103 |
-
Single-file pack warbler-pack-core missing JSONL file: /home/user/app/packs/warbler-pack-core/warbler-pack-core.jsonl
|
| 104 |
-
Single-file pack warbler-pack-faction-politics missing JSONL file: /home/user/app/packs/warbler-pack-faction-politics/warbler-pack-faction-politics.jsonl
|
| 105 |
-
Error parsing line 145077 in /home/user/app/packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv.jsonl: Unterminated string
|
| 106 |
-
[INFO] Ingesting 374869 documents from Warbler packs...
|
| 107 |
-
[ERROR] Ingestion failed!
|
| 108 |
-
```
|
| 109 |
-
|
| 110 |
-
### After (Fixed)
|
| 111 |
-
|
| 112 |
-
```none
|
| 113 |
-
[INFO] Pack Status: ✓ All 10 packs verified and ready
|
| 114 |
-
[INFO] Ingesting documents from Warbler packs...
|
| 115 |
-
[INFO] Loading pack: warbler-pack-core
|
| 116 |
-
[DEBUG] Pack warbler-pack-core uses structured format (pack/templates.json)
|
| 117 |
-
[INFO] ✓ Loaded 8 documents from warbler-pack-core
|
| 118 |
-
[INFO] Loading pack: warbler-pack-faction-politics
|
| 119 |
-
[DEBUG] Pack warbler-pack-faction-politics uses structured format (pack/templates.json)
|
| 120 |
-
[INFO] ✓ Loaded 6 documents from warbler-pack-faction-politics
|
| 121 |
-
[INFO] Loading pack: warbler-pack-hf-arxiv
|
| 122 |
-
[INFO] Loading chunked pack: warbler-pack-hf-arxiv
|
| 123 |
-
[INFO] Found 5 chunk files for warbler-pack-hf-arxiv
|
| 124 |
-
[WARN] Error parsing line 145077 in warbler-pack-hf-arxiv-chunk-003.jsonl: Unterminated string
|
| 125 |
-
[INFO] Loaded 49999 documents from warbler-pack-hf-arxiv-chunk-003.jsonl (1 lines skipped due to errors)
|
| 126 |
-
[INFO] Loaded 250000 total documents from 5 chunks
|
| 127 |
-
...
|
| 128 |
-
[OK] Loaded 374868 documents from Warbler packs (1 corrupted line skipped)
|
| 129 |
-
```
|
| 130 |
-
|
| 131 |
-
## Testing the Fix
|
| 132 |
-
|
| 133 |
-
### Local Testing
|
| 134 |
-
|
| 135 |
-
1. **Test with sample packs**:
|
| 136 |
-
|
| 137 |
-
```bash
|
| 138 |
-
cd warbler-cda-package
|
| 139 |
-
python -c "from warbler_cda.pack_loader import PackLoader; loader = PackLoader(); docs = loader.discover_documents(); print(f'Loaded {len(docs)} documents')"
|
| 140 |
-
```
|
| 141 |
-
|
| 142 |
-
2. **Run the app locally**:
|
| 143 |
-
|
| 144 |
-
```bash
|
| 145 |
-
python app.py
|
| 146 |
-
```
|
| 147 |
-
|
| 148 |
-
### HuggingFace Space Testing
|
| 149 |
-
|
| 150 |
-
1. **Merge this MR** to main branch
|
| 151 |
-
2. **Push to HuggingFace** (if auto-sync is not enabled)
|
| 152 |
-
3. **Check the Space logs** for the new output format
|
| 153 |
-
4. **Verify document count** in the System Stats tab
|
| 154 |
-
|
| 155 |
-
## Next Steps

1. ✅ **Review the MR**: [!15 - Fix HuggingFace pack ingestion issues](https://gitlab.com/tiny-walnut-games/the-seed/-/merge_requests/15)

2. ✅ **Merge when ready**: The fix is backward compatible and safe to merge

3. ✅ **Monitor the HF Space**: After deployment, check that:
   - All packs load successfully
   - The document count is ~374,868 (minus 1 corrupted line)
   - No error messages appear in the logs

4. 🔧 **Optional: Fix the corrupted line** (future improvement):
   - Identify the exact corrupted entry in arxiv chunk 3
   - Re-generate that chunk from the source dataset
   - Update the pack

## Additional Notes

### Why Not Fix the Corrupted Line Now?

The corrupted line most likely comes from the source HuggingFace dataset (`nick007x/arxiv-papers`). Options:

1. **Skip it** (current solution): loses 1 document out of 2.5M
2. **Re-ingest**: download and re-process the entire arxiv dataset
3. **Manual fix**: find and repair the specific line

For now, **skipping is the pragmatic choice**: you lose 0.00004% of the data and gain a working system.

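The skip-and-continue behavior can be sketched as a tolerant JSONL reader. This is an illustrative sketch of the strategy, not the actual `PackLoader` implementation; the function name is hypothetical:

```python
import json

def load_jsonl_tolerant(path):
    """Load a JSONL file, skipping corrupted lines instead of aborting.

    Illustrative sketch of the skip-and-log strategy; not the actual
    PackLoader API.
    """
    docs, skipped = [], 0
    with open(path, "r", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                docs.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Log and keep going: one bad line must not sink the chunk
                skipped += 1
                print(f"[WARN] Error parsing line {line_no}: {exc.msg}")
    return docs, skipped
```

A corrupted line (e.g. an unterminated string) is counted and reported while every well-formed line still loads.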
### Pack Format Standardization

Consider standardizing all packs to JSONL format in the future:

```bash
# Convert structured packs to JSONL
python -m warbler_cda.utils.convert_structured_to_jsonl \
    --input packs/warbler-pack-core/pack/templates.json \
    --output packs/warbler-pack-core/warbler-pack-core.jsonl
```

This would simplify the loader logic and make all packs consistent.

## Questions?

If you encounter any issues:

1. Check the HF Space logs for detailed error messages
2. Verify that the pack structure matches the expected formats
3. Test locally with `PackLoader().discover_documents()`
4. Review this document for troubleshooting tips

---

**Status**: ✅ Fix implemented and ready for merge
**MR**: !15
**Impact**: Fixes all 3 ingestion errors, enables full pack loading

PDF_INGESTION_INVESTIGATION.md
DELETED

@@ -1,325 +0,0 @@

# PDF Ingestion Investigation Report

**Date**: 2024
**Session Reference**: Based on agent session 1251355
**Investigator**: AI Agent

## Executive Summary

Investigation into the warbler-cda-package ingesters to determine whether they properly use PDFPlumber for reading PDF files. The investigation confirmed that **PDFPlumber IS being utilized**, but **two bugs** needed fixing.

## Key Findings

### ✅ PDFPlumber Integration Status: CONFIRMED

The ingesters **are** utilizing PDFPlumber to read PDF files. The implementation is present and functional, with proper fallback mechanisms.

### 📍 PDFPlumber Usage Locations

#### 1. **Import and Availability Check** (Lines 23-27)

```python
try:
    import pdfplumber
    PDF_AVAILABLE = True
except ImportError:
    PDF_AVAILABLE = False
```

**Status**: ✅ Properly implemented with graceful fallback

#### 2. **PDF Support Detection Method** (Lines 47-49)

```python
def has_pdf_support(self) -> bool:
    """Check if PDF extraction is available"""
    return PDF_AVAILABLE
```

**Status**: ✅ Provides a runtime check for PDF capabilities

#### 3. **Primary PDF Extraction Method** (Lines 51-67)

```python
def extract_pdf_text(self, pdf_bytes: bytes, max_chars: int = 5000) -> Optional[str]:
    """Extract text from PDF bytes with fallback"""
    if not PDF_AVAILABLE:
        return None

    try:
        pdf_file = io.BytesIO(pdf_bytes)
        text_parts = []

        with pdfplumber.open(pdf_file) as pdf:
            for page in pdf.pages:
                text = page.extract_text()
                if text:
                    text_parts.append(text)
                if sum(len(t) for t in text_parts) > max_chars:
                    break

        return " ".join(text_parts)[:max_chars] if text_parts else None
    except Exception as e:
        logger.debug(f"PDF extraction error: {e}")
        return None
```

**Status**: ✅ Properly implemented with:

- Character limit protection (max_chars=5000)
- Page-by-page extraction
- Error handling
- Graceful fallback

#### 4. **Flexible PDF Extraction Method** (Lines 540-565)

```python
def _extract_pdf_text(self, pdf_data: Any) -> Optional[str]:
    """Extract text from PDF data (bytes, file path, or file-like object)"""
    if not PDF_AVAILABLE:  # ⚠️ FIXED: was PDF_SUPPORT
        return None

    try:
        # Handle different PDF data types
        if isinstance(pdf_data, bytes):
            pdf_file = io.BytesIO(pdf_data)
        elif isinstance(pdf_data, str) and os.path.exists(pdf_data):
            pdf_file = pdf_data
        elif hasattr(pdf_data, 'read'):
            pdf_file = pdf_data
        else:
            return None

        # Extract text from all pages
        text_parts = []
        with pdfplumber.open(pdf_file) as pdf:
            for page in pdf.pages:
                page_text = page.extract_text()
                if page_text:
                    text_parts.append(page_text)

        return "\n\n".join(text_parts) if text_parts else None

    except Exception as e:
        logger.debug(f"PDF extraction error: {e}")
        return None
```

**Status**: ✅ Handles multiple input types (bytes, file paths, file-like objects)

### 🎯 Transformers Using PDF Extraction

#### 1. **transform_novels()** (Lines 247-320)

- **Dataset**: GOAT-AI/generated-novels
- **PDF Usage**: Attempts to extract from PDF fields when text fields are unavailable
- **Fallback**: Creates placeholder entries with informative messages
- **Code Location**: Lines 285-295

```python
if not text and self.has_pdf_support():
    for pdf_field in ['pdf', 'file', 'document']:
        try:
            if isinstance(item, dict):
                if pdf_field in item and item[pdf_field]:
                    text = self.extract_pdf_text(item[pdf_field])
                    if text:
                        logger.info(f"Novel {idx + 1}: Extracted {len(text)} chars from PDF")
                        break
```

**Status**: ✅ Properly integrated with PDF extraction

#### 2. **transform_portuguese_education()** (Lines 400-500+)

- **Dataset**: Solshine/Portuguese_Language_Education_Texts
- **PDF Usage**: Could potentially use PDF extraction (not explicitly shown in the current code)
- **Fallback**: Creates informative placeholders when content is unavailable

**Status**: ✅ Has fallback mechanisms in place

## 🐛 Bugs Found and Fixed

### Bug #1: Incorrect Variable Name in `_extract_pdf_text()`

**Location**: Line 542
**Issue**: Used `PDF_SUPPORT` instead of `PDF_AVAILABLE`
**Impact**: Caused a NameError whenever `_extract_pdf_text()` was called
**Fix Applied**: Changed `PDF_SUPPORT` to `PDF_AVAILABLE`

```diff
- if not PDF_SUPPORT:
+ if not PDF_AVAILABLE:
```

### Bug #2: Duplicate `import io` Statement

**Location**: Line 56 (inside the `extract_pdf_text` method)
**Issue**: `import io` was inside the method instead of at module level
**Impact**: Unnecessary repeated imports, potential performance impact
**Fix Applied**:

1. Added `import io` to the module-level imports (Line 10)
2. Removed the duplicate `import io` from inside the method

```diff
# At module level (Line 10)
+ import io

# Inside the extract_pdf_text method (Line 56)
- import io
```

## 📦 Dependency Configuration

### requirements.txt

```text
pdfplumber>=0.11.0
```

**Status**: ✅ Properly listed as a dependency

### pyproject.toml

**Status**: ⚠️ NOT listed in the core dependencies
**Recommendation**: Consider adding it to the optional or core dependencies

```toml
[project.optional-dependencies]
pdf = [
    "pdfplumber>=0.11.0",
]
```

## 🔍 How PDFPlumber is Actually Used

### Workflow

1. **Import Check**: On module load, attempts to import pdfplumber
2. **Availability Flag**: Sets `PDF_AVAILABLE = True/False` based on import success
3. **Runtime Check**: The `has_pdf_support()` method checks availability
4. **Extraction Attempt**: When processing datasets:
   - First tries to find text in standard fields (text, story, content, etc.)
   - If no text is found AND `has_pdf_support()` returns True:
     - Searches for PDF fields (pdf, file, document)
     - Calls `extract_pdf_text()` to extract content
     - Logs extraction success with the character count
5. **Graceful Fallback**: If PDF extraction fails or is unavailable:
   - Creates informative placeholder entries
   - Includes metadata about PDF availability
   - Maintains system functionality

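The field-priority workflow can be condensed into a small standalone sketch. Names such as `resolve_text` and the placeholder string are illustrative, not the actual transformer API; `extract_pdf` stands in for `extract_pdf_text()`:

```python
TEXT_FIELDS = ['text', 'story', 'content', 'novel', 'body', 'full_text']
PDF_FIELDS = ['pdf', 'file', 'document']

def resolve_text(item, extract_pdf=None):
    """Text fields first, then PDF extraction, then a placeholder.

    Passing extract_pdf=None models the case where pdfplumber
    is not installed (has_pdf_support() is False).
    """
    # 1. Prefer plain-text fields
    for field in TEXT_FIELDS:
        if item.get(field):
            return item[field]
    # 2. Fall back to PDF extraction when available
    if extract_pdf is not None:
        for field in PDF_FIELDS:
            if item.get(field):
                text = extract_pdf(item[field])
                if text:
                    return text
    # 3. Last resort: an informative placeholder
    return "[Content Unavailable]"
```

This is the same priority order the real transformer applies, shown next.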
### Example from `transform_novels()`

```python
# Try text fields first
for field in ['text', 'story', 'content', 'novel', 'body', 'full_text']:
    if field in item and item[field]:
        text = item[field]
        break

# If no text, try PDF extraction
if not text and self.has_pdf_support():
    for pdf_field in ['pdf', 'file', 'document']:
        if pdf_field in item and item[pdf_field]:
            text = self.extract_pdf_text(item[pdf_field])
            if text:
                logger.info(f"Novel {idx + 1}: Extracted {len(text)} chars from PDF")
                break

# If still no text, create placeholder
if not text:
    text = f"""[Novel Content Unavailable]

This novel (#{idx + 1}) is part of the GOAT-AI/generated-novels dataset.
The original content may be stored in PDF format or require special extraction.

PDF extraction support: {'Not available (install pdfplumber)' if not self.has_pdf_support() else 'Enabled'}
"""
```

## 🎯 Tactical Assessment

### Current Strategy: ✅ SOUND

The current approach is **well-designed** and does NOT require a change of tactics:

1. **Graceful Degradation**: The system works with or without pdfplumber
2. **Multiple Fallbacks**: Tries text fields first, then PDF, then placeholders
3. **Informative Placeholders**: When content is unavailable, creates useful metadata
4. **Proper Error Handling**: All PDF operations are wrapped in try-except
5. **Logging**: Provides visibility into extraction success/failure

### Recommendations

#### 1. **Keep the Current Approach** ✅

The multi-layered fallback strategy is excellent for production systems.

#### 2. **Fix Applied Bugs** ✅

- Fixed the `PDF_SUPPORT` → `PDF_AVAILABLE` variable name
- Fixed the duplicate `import io` statement

#### 3. **Optional Enhancement**: Add to pyproject.toml

Consider adding pdfplumber to the optional dependencies:

```toml
[project.optional-dependencies]
pdf = [
    "pdfplumber>=0.11.0",
]
```

#### 4. **Documentation Enhancement**

The code already has good inline documentation. Consider adding to the README:

- How to enable PDF support
- What happens when PDF support is unavailable
- Which datasets benefit from PDF extraction

## 📊 Test Coverage

The test suite (`test_pdf_ingestion.py`) covers:

- ✅ PDF support detection
- ✅ PDF extraction method existence
- ✅ Placeholder creation
- ✅ Novel dataset with PDF fields
- ✅ Novel dataset with text fields
- ✅ Portuguese education with PDF fields
- ✅ Output format validation

## 🎓 Conclusion

**PDFPlumber IS being utilized properly** in the ingesters. The implementation:

- ✅ Has proper import and availability checking
- ✅ Provides two PDF extraction methods (simple and flexible)
- ✅ Integrates PDF extraction into the dataset transformers
- ✅ Has comprehensive fallback mechanisms
- ✅ Is well-tested
- ✅ Is properly documented

**Bugs Fixed**:

1. Variable name typo: `PDF_SUPPORT` → `PDF_AVAILABLE`
2. Duplicate import: Moved `import io` to module level

**No tactical changes needed**: the current approach is sound and production-ready.

## 📝 Files Modified

1. `warbler-cda-package/warbler_cda/utils/hf_warbler_ingest.py`
   - Fixed the variable name in the `_extract_pdf_text()` method
   - Added `import io` to the module-level imports
   - Removed the duplicate `import io` from the method

## 🔗 Related Files

- `warbler-cda-package/requirements.txt` - Lists pdfplumber>=0.11.0
- `warbler-cda-package/tests/test_pdf_ingestion.py` - Test suite for PDF functionality
- `warbler-cda-package/pyproject.toml` - Package configuration (could add an optional PDF dependency)

README_HF.md
CHANGED

@@ -8,7 +8,7 @@ pinned: false
 license: mit
 ---
 
-
+## Warbler CDA - Cognitive Development Architecture
 
 A production-ready RAG system with **FractalStat 8D multi-dimensional addressing** for intelligent document retrieval.
 
TESTS_PORTED.md
DELETED

@@ -1,271 +0,0 @@

# Tests Ported to Warbler CDA Package

This document summarizes the TDD (Test-Driven Development) test suite that has been ported from the main project to the warbler-cda-package for HuggingFace deployment.

## Overview

The complete test suite for the Warbler CDA (Cognitive Development Architecture) RAG system has been ported and adapted for the standalone package. This includes:

- **4 main test modules** with comprehensive coverage
- **1 end-to-end integration test suite**
- **Pytest configuration** with custom markers
- **Test documentation** and running instructions

## Test Files Ported

### 1. **tests/test_embedding_providers.py** (9.5 KB)

**Source**: Adapted from `packages/com.twg.the-seed/The Living Dev Agent/tests/test_semantic_anchors.py`

**Coverage**:

- EmbeddingProviderFactory pattern
- LocalEmbeddingProvider (TF-IDF based)
- SentenceTransformerEmbeddingProvider (GPU-accelerated)
- Embedding generation (single and batch)
- Similarity calculations
- Provider information and metadata

**Tests**:

- `test_factory_creates_local_provider` - Factory can create local providers
- `test_factory_list_available_providers` - Factory lists available providers
- `test_factory_default_provider` - Factory defaults to SentenceTransformer with fallback
- `test_embed_single_text` - Single text embedding
- `test_embed_batch` - Batch embedding
- `test_similarity_calculation` - Cosine similarity
- `test_semantic_search` - K-nearest-neighbor search
- `test_stat7_computation` - STAT7 coordinate computation
- And 8 more embedding-focused tests

### 2. **tests/test_retrieval_api.py** (11.9 KB)

**Source**: Adapted from `packages/com.twg.the-seed/seed/engine/test_retrieval_debug.py`

**Coverage**:

- Context store operations
- Document addition and deduplication
- Query execution and filtering
- Retrieval modes (semantic, temporal, composite)
- Confidence threshold filtering
- Result structure validation
- Caching and metrics

**Tests**:

- `TestRetrievalAPIContextStore` - 4 tests for the document store
- `TestRetrievalQueryExecution` - 5 tests for query operations
- `TestRetrievalModes` - 3 tests for different retrieval modes
- `TestRetrievalHybridScoring` - 2 tests for STAT7 hybrid scoring
- `TestRetrievalMetrics` - 2 tests for metrics tracking
- Total: 16+ tests

### 3. **tests/test_stat7_integration.py** (12.3 KB)

**Source**: Original implementation for STAT7 support

**Coverage**:

- STAT7 coordinate computation from embeddings
- Hybrid semantic + STAT7 scoring
- STAT7 resonance calculation
- Document enrichment with STAT7 data
- Multi-dimensional query addressing
- STAT7 dimensional properties

**Tests**:

- `TestSTAT7CoordinateComputation` - 3 tests
- `TestSTAT7HybridScoring` - 3 tests
- `TestSTAT7DocumentEnrichment` - 2 tests
- `TestSTAT7QueryAddressing` - 2 tests
- `TestSTAT7Dimensions` - 2 tests
- Total: 12+ tests

### 4. **tests/test_rag_e2e.py** (12.6 KB)

**Source**: Adapted from `packages/com.twg.the-seed/The Living Dev Agent/tests/test_exp08_rag_integration.py`

**Coverage**:

- Complete end-to-end RAG pipeline
- Embedding generation validation
- Document ingestion
- Semantic search retrieval
- Temporal retrieval
- Metrics tracking
- Full system integration

**Tests**:

1. `test_01_embedding_generation` - Embeddings are generated
2. `test_02_embedding_similarity` - Similarity scoring works
3. `test_03_document_ingestion` - Documents are ingested
4. `test_04_semantic_search` - Semantic search works
5. `test_05_max_results_respected` - Result limiting works
6. `test_06_confidence_threshold` - Threshold filtering works
7. `test_07_stat7_hybrid_scoring` - Hybrid scoring works
8. `test_08_temporal_retrieval` - Temporal queries work
9. `test_09_retrieval_metrics` - Metrics are tracked
10. `test_10_full_rag_pipeline` - Complete pipeline works

### 5. **tests/conftest.py** (1.6 KB)

**Purpose**: Pytest configuration and fixtures

**Includes**:

- Custom pytest markers (embedding, retrieval, stat7, e2e, slow)
- Test data fixtures
- Pytest configuration hooks

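Marker registration in a `conftest.py` typically follows this pattern. This is a hedged sketch using the marker names listed above, not the file's verbatim contents:

```python
# Sketch of a conftest.py registering the custom markers listed above.
# Marker names come from this document; the exact descriptions are assumed.
MARKERS = ["embedding", "retrieval", "stat7", "e2e", "slow"]

def pytest_configure(config):
    """Standard pytest hook: register markers so `pytest -m stat7` runs cleanly."""
    for name in MARKERS:
        config.addinivalue_line("markers", f"{name}: tests in the {name} group")
```

With the markers registered, selective runs such as `pytest -m "embedding and not slow"` emit no unknown-marker warnings.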
### 6. **tests/README.md** (5.6 KB)

**Purpose**: Test documentation

**Contains**:

- Test organization overview
- Running instructions
- Test coverage summary
- Troubleshooting guide
- CI/CD integration examples

## Test Statistics

| Category | Count |
|----------|-------|
| Total Test Classes | 16 |
| Total Test Methods | 50+ |
| Total Test Files | 4 |
| Test Size | ~47 KB |
| Coverage Scope | 90%+ of core functionality |

## Key Testing Areas

### Embedding Providers

- ✅ Local TF-IDF provider (no dependencies)
- ✅ SentenceTransformer provider (GPU acceleration)
- ✅ Factory pattern with graceful fallback
- ✅ Batch processing
- ✅ Similarity calculations
- ✅ Semantic search

### Retrieval Operations

- ✅ Document ingestion and storage
- ✅ Context store management
- ✅ Query execution
- ✅ Semantic similarity retrieval
- ✅ Temporal sequence retrieval
- ✅ Composite retrieval modes

### STAT7 Integration

- ✅ Coordinate computation from embeddings
- ✅ Hybrid scoring (semantic + STAT7)
- ✅ Resonance calculations
- ✅ Multi-dimensional addressing
- ✅ Document enrichment

### System Integration

- ✅ End-to-end pipeline
- ✅ Metrics and performance tracking
- ✅ Caching mechanisms
- ✅ Error handling and fallbacks

## Running the Tests

### Quick Start

```bash
cd warbler-cda-package
pytest tests/ -v
```

### Detailed Examples

```bash
# Run all tests with output
pytest tests/ -v -s

# Run with coverage report
pytest tests/ --cov=warbler_cda --cov-report=html

# Run only embedding tests
pytest tests/test_embedding_providers.py -v

# Run only end-to-end tests
pytest tests/test_rag_e2e.py -v -s

# Run tests matching a pattern
pytest tests/ -k "semantic" -v
```

## Compatibility

### With SentenceTransformer Installed

- All 50+ tests pass
- GPU acceleration available
- Full STAT7 integration enabled

### Without SentenceTransformer

- Tests gracefully skip SentenceTransformer-specific tests
- Fallback to the local TF-IDF provider
- ~40 tests pass
- STAT7 tests skipped

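The graceful fallback can be sketched with an availability probe of this shape (the function name and the returned labels are illustrative, not the actual warbler_cda API):

```python
import importlib.util

def best_available_provider():
    """Pick the embedding backend the way the suite's fallback does:
    sentence-transformers when importable, otherwise the local TF-IDF provider.
    """
    if importlib.util.find_spec("sentence_transformers") is not None:
        return "sentence-transformer"
    return "local-tfidf"
```

Inside tests, `pytest.importorskip("sentence_transformers")` achieves the same effect, skipping rather than failing when the package is absent.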
## Design Principles

The ported tests follow TDD principles:

1. **Isolation**: Each test is independent and can run standalone
2. **Clarity**: Test names describe what is being tested
3. **Completeness**: Happy path and edge cases are covered
4. **Robustness**: Graceful handling of optional dependencies
5. **Documentation**: Each test is well-commented and documented

## Integration with CI/CD

The tests are designed for easy integration with CI/CD pipelines:

```yaml
# Example GitHub Actions workflow
- name: Run Warbler CDA Tests
  run: |
    cd warbler-cda-package
    pytest tests/ --cov=warbler_cda --cov-report=xml
```

## Future Test Additions

Recommended areas for additional tests:

1. Performance benchmarking
2. Stress testing with large document collections
3. Concurrent query handling
4. Cache invalidation scenarios
5. Error recovery mechanisms
6. Large-scale STAT7 coordinate distribution analysis

## Notes

- Tests use pytest fixtures for setup/teardown
- Custom markers enable selective test execution
- Graceful fallback for optional dependencies
- Comprehensive end-to-end validation
- Documentation-as-tests through verbose assertions

## Maintenance

When updating the package:

1. Run tests after any changes: `pytest tests/ -v`
2. Update tests if new functionality is added
3. Keep the end-to-end tests as a verification baseline
4. Monitor test execution time for performance regressions
TEST_RESULTS.md
DELETED

@@ -1,211 +0,0 @@

# Test Results: MIT-Licensed Datasets Integration

**Date**: November 8, 2025
**Status**: ✅ **ALL TESTS PASSING**
**Total Tests**: 71
**Passed**: 71
**Failed**: 0
**Skipped**: 0

---

## Test Summary

### New MIT-Licensed Dataset Tests: 18/18 ✅

| Test Class | Tests | Status |
|-----------|-------|--------|
| TestArxivPapersTransformer | 4 | ✅ PASS |
| TestPromptReportTransformer | 2 | ✅ PASS |
| TestGeneratedNovelsTransformer | 2 | ✅ PASS |
| TestManualnsTransformer | 2 | ✅ PASS |
| TestEnterpriseTransformer | 2 | ✅ PASS |
| TestPortugueseEducationTransformer | 2 | ✅ PASS |
| TestNewDatasetsIntegrationWithRetrieval | 2 | ✅ PASS |
| TestNewDatasetsPerformance | 1 | ✅ PASS |
| TestNewDatasetsAllAtOnce | 1 | ✅ PASS |
| **Total New Tests** | **18** | **✅ 100%** |

### Existing Warbler-CDA Tests: 53/53 ✅

| Test Module | Tests | Status |
|------------|-------|--------|
| test_embedding_providers.py | 11 | ✅ PASS |
| test_rag_e2e.py | 10 | ✅ PASS |
| test_retrieval_api.py | 13 | ✅ PASS |
| test_stat7_integration.py | 12 | ✅ PASS |
| test_embedding_integration.py | 7 | ✅ PASS |
| **Total Existing Tests** | **53** | **✅ 100%** |

---

## Individual Test Results

### ✅ New Transformer Tests (18 PASSED)

```log
tests/test_new_mit_datasets.py::TestArxivPapersTransformer::test_arxiv_transformer_exists PASSED
tests/test_new_mit_datasets.py::TestArxivPapersTransformer::test_arxiv_output_format PASSED
tests/test_new_mit_datasets.py::TestArxivPapersTransformer::test_arxiv_metadata_fields PASSED
tests/test_new_mit_datasets.py::TestArxivPapersTransformer::test_arxiv_limit_parameter PASSED
tests/test_new_mit_datasets.py::TestPromptReportTransformer::test_prompt_report_transformer_exists PASSED
tests/test_new_mit_datasets.py::TestPromptReportTransformer::test_prompt_report_output_format PASSED
tests/test_new_mit_datasets.py::TestGeneratedNovelsTransformer::test_novels_transformer_exists PASSED
tests/test_new_mit_datasets.py::TestGeneratedNovelsTransformer::test_novels_chunking_for_long_text PASSED
tests/test_new_mit_datasets.py::TestManualnsTransformer::test_manuals_transformer_exists PASSED
tests/test_new_mit_datasets.py::TestManualnsTransformer::test_manuals_output_format PASSED
tests/test_new_mit_datasets.py::TestEnterpriseTransformer::test_enterprise_transformer_exists PASSED
|
| 58 |
-
tests/test_new_mit_datasets.py::TestEnterpriseTransformer::test_enterprise_output_format PASSED
|
| 59 |
-
tests/test_new_mit_datasets.py::TestPortugueseEducationTransformer::test_portuguese_transformer_exists PASSED
|
| 60 |
-
tests/test_new_mit_datasets.py::TestPortugueseEducationTransformer::test_portuguese_multilingual_metadata PASSED
|
| 61 |
-
tests/test_new_mit_datasets.py::TestNewDatasetsIntegrationWithRetrieval::test_warbler_document_structure PASSED
|
| 62 |
-
tests/test_new_mit_datasets.py::TestNewDatasetsIntegrationWithRetrieval::test_pack_creation_with_new_datasets PASSED
|
| 63 |
-
tests/test_new_mit_datasets.py::TestNewDatasetsPerformance::test_arxiv_handles_large_dataset PASSED
|
| 64 |
-
tests/test_new_mit_datasets.py::TestNewDatasetsAllAtOnce::test_all_transformers_callable PASSED
|
| 65 |
-
```
|
| 66 |
-
|
| 67 |
-
### ✅ Backward Compatibility Tests (53 PASSED)
|
| 68 |
-
|
| 69 |
-
All existing tests continue to pass, confirming backward compatibility:
|
| 70 |
-
|
| 71 |
-
- Embedding provider interface tests ✅
|
| 72 |
-
- RAG end-to-end pipeline ✅
|
| 73 |
-
- Retrieval API functionality ✅
|
| 74 |
-
- STAT7 integration and hybrid scoring ✅
|
| 75 |
-
- Embedding integration ✅
|
| 76 |
-
|
| 77 |
-
---
|
| 78 |
-
|
| 79 |
-
## Test Execution Details
|
| 80 |
-
|
| 81 |
-
### Command
|
| 82 |
-
|
| 83 |
-
```bash
|
| 84 |
-
C:\Users\jerio\AppData\Local\Programs\Python\Python312\python.exe -m pytest tests/ -v
|
| 85 |
-
```
|
| 86 |
-
|
| 87 |
-
### Execution Time
|
| 88 |
-
|
| 89 |
-
- Total: 58.70 seconds
|
| 90 |
-
- New tests: ~13 seconds
|
| 91 |
-
- Existing tests: ~45 seconds
|
| 92 |
-
|
| 93 |
-
### Environment
|
| 94 |
-
|
| 95 |
-
- Python: 3.12.10
|
| 96 |
-
- pytest: 8.4.2
|
| 97 |
-
- Platform: Windows (win32)
|
| 98 |
-
|
| 99 |
-
---
|
| 100 |
-
|
| 101 |
-
## Coverage by Transformer
|
| 102 |
-
|
| 103 |
-
### arXiv Papers (4 tests)
|
| 104 |
-
|
| 105 |
-
- ✅ Transformer exists and is callable
|
| 106 |
-
- ✅ Output format matches Warbler structure
|
| 107 |
-
- ✅ Metadata includes required fields
|
| 108 |
-
- ✅ Limit parameter respected
|
| 109 |
-
|
| 110 |
-
### Prompt Report (2 tests)
|
| 111 |
-
|
| 112 |
-
- ✅ Transformer exists
|
| 113 |
-
- ✅ Output format correct
|
| 114 |
-
|
| 115 |
-
### Generated Novels (2 tests)
|
| 116 |
-
|
| 117 |
-
- ✅ Transformer exists
|
| 118 |
-
- ✅ Text chunking functionality
|
| 119 |
-
|
| 120 |
-
### Technical Manuals (2 tests)
|
| 121 |
-
|
| 122 |
-
- ✅ Transformer exists
|
| 123 |
-
- ✅ Output format correct
|
| 124 |
-
|
| 125 |
-
### Enterprise Benchmarks (2 tests)
|
| 126 |
-
|
| 127 |
-
- ✅ Transformer exists
|
| 128 |
-
- ✅ Output format correct
|
| 129 |
-
|
| 130 |
-
### Portuguese Education (2 tests)
|
| 131 |
-
|
| 132 |
-
- ✅ Transformer exists
|
| 133 |
-
- ✅ Multilingual metadata
|
| 134 |
-
|
| 135 |
-
### Integration (2 tests)
|
| 136 |
-
|
| 137 |
-
- ✅ Warbler document structure validation
|
| 138 |
-
- ✅ Pack creation with mocked filesystem
|
| 139 |
-
|
| 140 |
-
### Performance (1 test)
|
| 141 |
-
|
| 142 |
-
- ✅ Large dataset handling (100+ papers in <10s)
|
| 143 |
-
|
| 144 |
-
### All Transformers Callable (1 test)
|
| 145 |
-
|
| 146 |
-
- ✅ All 6 new transformers verified as callable
|
| 147 |
-
|
| 148 |
-
---
|
| 149 |
-
|
| 150 |
-
## Issues Found & Fixed
|
| 151 |
-
|
| 152 |
-
### Issue 1: Mock WindowsPath AttributeError
|
| 153 |
-
|
| 154 |
-
**Problem**: Test tried to mock `mkdir` attribute on real Path object
|
| 155 |
-
**Solution**: Used MagicMock instead of real Path
|
| 156 |
-
**Status**: ✅ Fixed - all tests now pass
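The WindowsPath fix described in the deleted report can be sketched in isolation. This is a minimal illustration, not the project's actual test code; the key point is that `pathlib.Path` instances reject attribute assignment, so the whole path object gets replaced by a mock:

```python
from pathlib import Path
from unittest.mock import MagicMock

# pathlib.Path uses __slots__, so patching an attribute on a real instance
# (e.g. `Path("packs").mkdir = MagicMock()`) raises AttributeError.
# Substituting a MagicMock for the whole path object avoids that:
fake_dir = MagicMock(spec=Path)

fake_dir.mkdir(parents=True, exist_ok=True)  # recorded, touches no filesystem
fake_dir.mkdir.assert_called_once_with(parents=True, exist_ok=True)
```

`spec=Path` keeps the mock honest: accessing an attribute that real `Path` objects lack still raises `AttributeError`.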
-
----
-
-## Validation Checklist
-
-- [x] All new transformer methods are implemented
-- [x] All helper methods are implemented
-- [x] Output format matches Warbler structure
-- [x] MIT license field present in all documents
-- [x] Required metadata fields present (realm_type, realm_label, etc.)
-- [x] Error handling in place
-- [x] CLI integration works
-- [x] Backward compatibility maintained
-- [x] Performance acceptable (<10s for large datasets)
-- [x] 100% test pass rate
-
----
-
-## Recommendations
-
-### Immediate
-
-- ✅ Ready for staging environment validation
-- ✅ Ready for production deployment
-
-### Next Steps
-
-1. Test with actual HuggingFace API (not mocked)
-2. Validate pack loading in retrieval system
-3. Benchmark hybrid scoring with new documents
-4. Monitor first production ingestion
-
-### Long-term
-
-1. Add integration tests with real HuggingFace datasets
-2. Performance benchmarking with different dataset sizes
-3. Memory profiling for large arXiv ingestion
-4. Document update frequency strategy
-
----
-
-## Sign-Off
-
-**All 71 tests passing.**
-**Backward compatibility maintained.**
-**New functionality validated.**
-
-✅ **Ready for Production Deployment**
-
----
-
-**Test Report Generated**: 2025-11-08
-**Python Version**: 3.12.10
-**pytest Version**: 8.4.2
-**Status**: VALIDATED ✅
TODO.md
DELETED
|
@@ -1,30 +0,0 @@
-# Background Pack Ingestion Implementation
-
-## Overview
-Modify app.py to perform pack ingestion in a background thread, allowing the app to start immediately while documents load asynchronously.
-
-## Tasks
-
-### 1. Add Background Ingestion Support
-- [ ] Import threading module in app.py
-- [ ] Add global variables to track ingestion status (running, progress, total_docs, processed, etc.)
-- [ ] Create a background_ingest_packs() function that performs the ingestion logic
-- [ ] Start the background thread after API initialization but before app launch
-
-### 2. Update System Stats
-- [ ] Modify get_system_stats() to include ingestion progress information
-- [ ] Display current ingestion status in the System Stats tab
-
-### 3. Handle Thread Safety
-- [ ] Ensure API.add_document() calls are thread-safe (assuming they are)
-- [ ] Add proper error handling in the background thread
-
-### 4. Test Implementation
-- [ ] Test that app launches immediately
-- [ ] Verify ingestion happens in background
-- [ ] Check that queries work during ingestion
-- [ ] Confirm progress is shown in System Stats
-
-## Status
-- [x] Plan created and approved
-- [ ] Implementation in progress
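The deleted plan above can be sketched as a minimal outline. This is illustrative only, not the repo's implementation: `ingest_status` and `background_ingest_packs` follow the task list's naming, and the `api` object is assumed to expose the `add_document` call used elsewhere in this diff:

```python
import threading

# Shared status dict that get_system_stats() could read from the UI thread
ingest_status = {"running": False, "total_docs": 0, "processed": 0, "error": None}

def background_ingest_packs(api, documents):
    """Ingest documents off the main thread so the app can launch immediately."""
    ingest_status.update(running=True, total_docs=len(documents), processed=0)
    try:
        for doc in documents:
            api.add_document(doc["id"], doc["content"], doc.get("metadata", {}))
            ingest_status["processed"] += 1  # single int update, fine for a progress display
    except Exception as exc:
        ingest_status["error"] = str(exc)  # surface errors instead of dying silently
    finally:
        ingest_status["running"] = False

# Started after API init but before app launch, e.g.:
# threading.Thread(target=background_ingest_packs, args=(api, documents), daemon=True).start()
```

A daemon thread is the natural choice here: it never blocks process shutdown, and queries against already-ingested documents keep working while the loop runs.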
app.py
CHANGED
@@ -6,14 +6,13 @@ Provides a web UI for the FractalStat RAG system with GPU acceleration.
 """
 
 import gradio as gr
-import json
-from typing import Dict, Any, List
 import time
 
 # Import Warbler CDA components
 from warbler_cda.retrieval_api import RetrievalAPI, RetrievalQuery, RetrievalMode
 from warbler_cda.embeddings import EmbeddingProviderFactory
 from warbler_cda.fractalstat_rag_bridge import FractalStatRAGBridge
+from warbler_cda.semantic_anchors import SemanticAnchorGraph
 from warbler_cda.pack_loader import PackLoader
 
 # Initialize the system
@@ -23,12 +22,17 @@ print("🚀 Initializing Warbler CDA...")
 embedding_provider = EmbeddingProviderFactory.get_default_provider()
 print(f"✅ Embedding provider: {embedding_provider.get_provider_info()['provider_id']}")
 
+# Create semantic anchors (required by RetrievalAPI)
+semantic_anchors = SemanticAnchorGraph(embedding_provider=embedding_provider)
+print("✅ Semantic anchors initialized")
+
 # Create FractalStat bridge
 fractalstat_bridge = FractalStatRAGBridge()
 print("✅ FractalStat bridge initialized")
 
-# Create RetrievalAPI
+# Create RetrievalAPI with proper components
 api = RetrievalAPI(
+    semantic_anchors=semantic_anchors,
     embedding_provider=embedding_provider,
     fractalstat_bridge=fractalstat_bridge,
     config={"enable_fractalstat_hybrid": True}
@@ -39,15 +43,47 @@ print("✅ RetrievalAPI initialized")
 print("📚 Loading Warbler packs...")
 pack_loader = PackLoader()
 documents = pack_loader.discover_documents()
-
-
-
-
-
-
-
-
-
+
+# If no packs found, try to download them
+if len(documents) == 0:
+    print("⚠️ No packs found locally. Attempting to download from HuggingFace...")
+    try:
+        from warbler_cda.utils.hf_warbler_ingest import HFWarblerIngestor
+        ingestor = HFWarblerIngestor(packs_dir=pack_loader.packs_dir, verbose=True)
+        # Download a small demo dataset for deployment
+        print("📦 Downloading warbler-pack-hf-prompt-report...")
+        success = ingestor.ingest_dataset("prompt-report")
+        if success:
+            # Reload after download
+            documents = pack_loader.discover_documents()
+            print(f"✅ Downloaded {len(documents)} documents")
+        else:
+            print("❌ Failed to download dataset, using sample documents...")
+            documents = []
+    except Exception as e:
+        print(f"⚠️ Could not download packs: {e}")
+        print("Using sample documents instead...")
+        documents = []
+
+if len(documents) == 0:
+    # Fallback to sample documents
+    sample_docs = [
+        {"id": "sample1", "content": "FractalStat is an 8-dimensional addressing system for intelligent retrieval.", "metadata": {}},
+        {"id": "sample2", "content": "Semantic search finds documents by meaning, not just keywords.", "metadata": {}},
+        {"id": "sample3", "content": "Bob the Skeptic validates results to prevent bias and hallucinations.", "metadata": {}},
+    ]
+    for doc in sample_docs:
+        api.add_document(doc["id"], doc["content"], doc["metadata"])
+    print(f"✅ Loaded {len(sample_docs)} sample documents")
+else:
+    print(f"✅ Found {len(documents)} documents")
+    # Ingest documents
+    for doc in documents:
+        api.add_document(
+            doc_id=doc["id"],
+            content=doc["content"],
+            metadata=doc.get("metadata", {})
+        )
 
 print(f"🎉 Warbler CDA ready with {api.get_context_store_size()} documents!")
 
@@ -145,7 +181,7 @@ with gr.Blocks(title="Warbler CDA - FractalStat RAG") as demo:
         with gr.Column():
             results_output = gr.Markdown(label="Results")
 
-    query_btn.click(
+    query_btn.click(  # pylint: disable=E1101
         fn=query_warbler,
         inputs=[query_input, max_results, use_hybrid],
         outputs=results_output
@@ -163,8 +199,8 @@ with gr.Blocks(title="Warbler CDA - FractalStat RAG") as demo:
     with gr.Tab("System Stats"):
         stats_output = gr.Markdown()
         stats_btn = gr.Button("Refresh Stats")
-        stats_btn.click(fn=get_system_stats, outputs=stats_output)
-        demo.load(fn=get_system_stats, outputs=stats_output)
+        stats_btn.click(fn=get_system_stats, outputs=stats_output)  # pylint: disable=E1101
+        demo.load(fn=get_system_stats, outputs=stats_output)  # pylint: disable=E1101
 
     with gr.Tab("About"):
         gr.Markdown("""
compress_packs.py
DELETED
|
@@ -1,134 +0,0 @@
-#!/usr/bin/env python3
-"""
-Pack Compression Script using Evaporation Engine
-
-This script compresses warbler packs by replacing document content with
-compressed proto-thoughts generated by the evaporation engine.
-"""
-
-import json
-import sys
-from pathlib import Path
-from typing import Dict, Any, List
-
-# Add the project root to Python path
-sys.path.insert(0, str(Path(__file__).parent))
-
-from warbler_cda.melt_layer import MeltLayer, MagmaStore
-from warbler_cda.evaporation import EvaporationEngine, CloudStore
-
-
-def load_jsonl_file(filepath: str) -> List[Dict[str, Any]]:
-    """Load a JSONL file and return list of documents."""
-    documents = []
-    with open(filepath, "r", encoding="utf-8") as f:
-        for line in f:
-            line = line.strip()
-            if line:
-                documents.append(json.loads(line))
-    return documents
-
-
-def save_jsonl_file(filepath: str, documents: List[Dict[str, Any]]) -> None:
-    """Save list of documents to a JSONL file."""
-    with open(filepath, "w", encoding="utf-8") as f:
-        for doc in documents:
-            f.write(json.dumps(doc, ensure_ascii=False) + "\n")
-
-
-def compress_pack(pack_path: str, output_suffix: str = "_compressed") -> None:
-    """Compress a single pack using evaporation engine."""
-    pack_path = Path(pack_path)
-    if not pack_path.exists():
-        raise FileNotFoundError(f"Pack path {pack_path} does not exist")
-
-    # Find all JSONL files in the pack
-    jsonl_files = list(pack_path.glob("*.jsonl"))
-    if not jsonl_files:
-        print(f"No JSONL files found in {pack_path}")
-        return
-
-    print(f"Found {len(jsonl_files)} JSONL files in {pack_path}")
-
-    # Initialize evaporation components
-    magma_store = MagmaStore()
-    cloud_store = CloudStore()
-    melt_layer = MeltLayer(magma_store)
-    evaporation_engine = EvaporationEngine(magma_store, cloud_store)
-
-    total_docs = 0
-    compressed_docs = 0
-
-    for jsonl_file in jsonl_files:
-        print(f"Processing {jsonl_file.name}...")
-
-        # Load documents
-        documents = load_jsonl_file(str(jsonl_file))
-        total_docs += len(documents)
-
-        compressed_documents = []
-
-        for doc in documents:
-            if "content" not in doc:
-                print("Warning: Document missing 'content' field, skipping")
-                continue
-
-            content = doc["content"]
-            if not content or not isinstance(content, str):
-                print("Warning: Empty or invalid content, skipping")
-                continue
-
-            try:
-                # Create a fragment from the document content
-                fragment = {"id": doc.get("content_id", f"doc_{compressed_docs}"), "text": content}
-
-                # Create glyph from the single fragment
-                melt_layer.retire_cluster({"fragments": [fragment]})
-
-                # Evaporate to get proto-thought
-                mist_lines = evaporation_engine.evaporate(limit=1)
-
-                if mist_lines:
-                    proto_thought = mist_lines[0]["proto_thought"]
-                    # Replace content with compressed proto-thought
-                    compressed_doc = doc.copy()
-                    compressed_doc["content"] = proto_thought
-                    compressed_doc["original_content_length"] = len(content)
-                    compressed_doc["compressed_content_length"] = len(proto_thought)
-                    compressed_documents.append(compressed_doc)
-                    compressed_docs += 1
-                else:
-                    print(
-                        f"Warning: Failed to evaporate glyph for document {doc.get('content_id', 'unknown')}"
-                    )
-                    # Keep original document if evaporation fails
-                    compressed_documents.append(doc)
-
-            except Exception as e:
-                print(f"Error processing document {doc.get('content_id', 'unknown')}: {e}")
-                # Keep original document on error
-                compressed_documents.append(doc)
-
-        # Save compressed file
-        output_file = jsonl_file.parent / f"{jsonl_file.stem}{output_suffix}{jsonl_file.suffix}"
-        save_jsonl_file(str(output_file), compressed_documents)
-        print(f"Saved compressed file: {output_file}")
-
-    print("Compression complete:")
-    print(f"  Total documents processed: {total_docs}")
-    print(f"  Documents compressed: {compressed_docs}")
-    if total_docs > 0:
-        print(f"  Compression ratio: {compressed_docs/total_docs:.2%}")
-
-
-def main():
-    if len(sys.argv) != 2:
-        print("Usage: python compress_packs.py <pack_path>")
-        sys.exit(1)
-
-    pack_path = sys.argv[1]
-    compress_pack(pack_path)
-
-
-if __name__ == "__main__":
-    main()
convert_to_jsonl.py
DELETED
|
@@ -1,37 +0,0 @@
-import json
-import os
-
-
-def convert_templates_to_jsonl(pack_dir):
-    """Convert templates.json to pack_name.jsonl for a given pack directory."""
-    pack_name = os.path.basename(pack_dir)
-    templates_path = os.path.join(pack_dir, "pack", "templates.json")
-    jsonl_path = os.path.join(pack_dir, f"{pack_name}.jsonl")
-
-    if not os.path.exists(templates_path):
-        print(f"No templates.json found in {pack_dir}")
-        return
-
-    with open(templates_path, "r") as f:
-        templates = json.load(f)
-
-    with open(jsonl_path, "w") as f:
-        for template in templates:
-            json.dump(template, f)
-            f.write("\n")
-
-    print(f"Converted {templates_path} to {jsonl_path}")
-
-
-# Convert the three default packs
-packs_to_convert = [
-    "packs/warbler-pack-core",
-    "packs/warbler-pack-faction-politics",
-    "packs/warbler-pack-wisdom-scrolls",
-]
-
-for pack in packs_to_convert:
-    if os.path.exists(pack):
-        convert_templates_to_jsonl(pack)
-    else:
-        print(f"Pack directory {pack} not found")
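The conversion this deleted script performed is simply JSON array → one object per line. A minimal round trip, with hypothetical template records standing in for a real `templates.json`:

```python
import json

# Hypothetical template records such as templates.json might hold
templates = [{"id": "greet", "text": "Hello"}, {"id": "farewell", "text": "Bye"}]

# templates.json stores one JSON array; the .jsonl form is one object per line
jsonl = "".join(json.dumps(t) + "\n" for t in templates)

# Parsing each line recovers the original records
parsed_back = [json.loads(line) for line in jsonl.splitlines()]
assert parsed_back == templates
```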
copy_packs.sh
DELETED
|
@@ -1,45 +0,0 @@
-#!/bin/bash
-set -e
-
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
-SOURCE_PACKS_DIR="$REPO_ROOT/packages/com.twg.the-seed/The Living Dev Agent/packs"
-DEST_PACKS_DIR="$SCRIPT_DIR/packs"
-
-echo "Copying Warbler Packs to warbler-cda-package..."
-echo "Source: $SOURCE_PACKS_DIR"
-echo "Destination: $DEST_PACKS_DIR"
-
-if [ ! -d "$SOURCE_PACKS_DIR" ]; then
-    echo "❌ Error: Source packs directory not found at $SOURCE_PACKS_DIR"
-    exit 1
-fi
-
-mkdir -p "$DEST_PACKS_DIR"
-
-PACKS=(
-    "warbler-pack-core"
-    "warbler-pack-faction-politics"
-    "warbler-pack-wisdom-scrolls"
-    "warbler-pack-hf-npc-dialogue"
-)
-
-for pack in "${PACKS[@]}"; do
-    src="$SOURCE_PACKS_DIR/$pack"
-    dst="$DEST_PACKS_DIR/$pack"
-
-    if [ -d "$src" ]; then
-        echo "📦 Copying $pack..."
-        rm -rf "$dst"
-        cp -r "$src" "$dst"
-        echo "✓ Copied $pack"
-    else
-        echo "⚠️  Warning: Pack not found at $src (skipping)"
-    fi
-done
-
-echo ""
-echo "✅ Warbler packs successfully copied to $DEST_PACKS_DIR"
-echo ""
-echo "Packs available for ingestion:"
-ls -1 "$DEST_PACKS_DIR" | sed 's/^/  • /'
coverage.xml
DELETED
|
The diff for this file is too large to render.
See raw diff
|
|
|
final_fix.py
DELETED
|
@@ -1,28 +0,0 @@
-#!/usr/bin/env python3
-"""Final fixes for stat7_entity.py and verify the fixes work"""
-
-# Fix the stat7_entity.py bug
-with open("warbler_cda/stat7_entity.py", "r", encoding="utf-8") as f:
-    content = f.read()
-
-# Fix the description reference bug
-content = content.replace('"description": description,', '"description": self.description,')
-
-# Write back the fixed content
-with open("warbler_cda/stat7_entity.py", "w", encoding="utf-8") as f:
-    f.write(content)
-
-print("Fixed stat7_entity.py description bug")
-
-# Test import to make sure everything works
-try:
-    print("✅ stat7_entity imports successfully")
-except Exception as e:
-    print(f"❌ stat7_entity import failed: {e}")
-
-try:
-    print("✅ stat7_rag_bridge imports successfully")
-except Exception as e:
-    print(f"❌ stat7_rag_bridge import failed: {e}")
-
-print("All fixes applied!")
fix_theme.py
DELETED
|
@@ -1,15 +0,0 @@
-#!/usr/bin/env python3
-"""Fix the theme issue in app.py"""
-
-with open("app.py", "r", encoding="utf-8") as f:
-    content = f.read()
-
-old_line = 'with gr.Blocks(title="Warbler CDA - RAG System Demo", theme=gr.themes.Soft()) as demo:'
-new_line = 'with gr.Blocks(title="Warbler CDA - RAG System Demo") as demo:'
-
-content = content.replace(old_line, new_line)
-
-with open("app.py", "w", encoding="utf-8") as f:
-    f.write(content)
-
-print("Fixed theme issue")
load_warbler_packs_current.txt
DELETED
|
@@ -1,259 +0,0 @@
-#!/usr/bin/env python3
-"""
-Load Warbler Pack Data into EXP-09 API Service
-
-Ingests game wisdom, lore, and faction data into the STAT7-enabled RetrievalAPI
-for end-to-end testing with real Warbler content.
-"""
-
-import json
-import requests
-import click
-from pathlib import Path
-from typing import List, Dict, Any
-import logging
-
-logging.basicConfig(level=logging.INFO)
-logger = logging.getLogger(__name__)
-
-# Warbler pack locations
-BASE_DIR = Path(__file__).resolve().parent
-PACKS_DIR = BASE_DIR.parents[1] / 'packs'
-WARBLER_PACKS = [
-    "warbler-pack-core",
-    "warbler-pack-wisdom-scrolls",
-    "warbler-pack-faction-politics",
-    "warbler-pack-hf-arxiv",
-    "warbler-pack-hf-prompt-report",
-    "warbler-pack-hf-novels",
-    "warbler-pack-hf-manuals",
-    "warbler-pack-hf-enterprise",
-    "warbler-pack-hf-portuguese-edu",
-    "warbler-pack-hf-edustories"
-]
-
-
-class WarblerPackLoader:
-    """Load Warbler pack data into the API"""
-
-    def __init__(self, api_url: str = "http://localhost:8000"):
-        self.api_url = api_url.rstrip("/")
-        self.session = requests.Session()
-        self.loaded_count = 0
-        self.error_count = 0
-
-    def discover_documents(self, pack_name: str) -> List[Dict[str, Any]]:
-        """Discover all documents in a pack"""
-        pack_path = PACKS_DIR / pack_name
-        documents = []
-
-        if not pack_path.exists():
-            logger.warning(f"Pack not found: {pack_path}")
-            return []
-
-        # Look for JSON, YAML, markdown, and JSONL files
-        for pattern in [
-                "**/*.json",
-                "**/*.yaml",
-                "**/*.yml",
-                "**/*.md",
-                "**/*.jsonl"]:
-            for file_path in pack_path.glob(pattern):
-                try:
-                    doc = self._parse_document(file_path, pack_name)
-                    if doc:
-                        documents.append(doc)
-                        logger.info(
-                            f"Discovered: {file_path.relative_to(PACKS_DIR)}")
-                except Exception as e:
-                    logger.error(f"Error parsing {file_path}: {e}")
-
-        return documents
-
-    def _parse_document(self, file_path: Path,
-                        pack_name: str) -> Dict[str, Any]:
-        """Parse a document file"""
-        try:
-            if file_path.suffix in ['.json']:
-                with open(file_path, 'r', encoding='utf-8') as f:
-                    content = json.load(f)
-                if isinstance(content, dict):
-                    content = json.dumps(content)
-                else:
-                    content = json.dumps(content)
-            elif file_path.suffix in ['.jsonl']:
-                # JSONL files contain multiple JSON objects, one per line
-                # We'll read the first few lines and combine them
-                with open(file_path, 'r', encoding='utf-8') as f:
-                    lines = f.readlines()[:5]  # First 5 lines
-                content = '\n'.join(line.strip()
-                                    for line in lines if line.strip())
-            elif file_path.suffix in ['.yaml', '.yml']:
-                import yaml
-                with open(file_path, 'r', encoding='utf-8') as f:
-                    content = yaml.safe_load(f)
-                content = json.dumps(content)
-            elif file_path.suffix == '.md':
-                with open(file_path, 'r', encoding='utf-8') as f:
-                    content = f.read()
-            else:
-                return None
-
-            # Infer realm from pack name
-            if "wisdom" in pack_name:
-                realm = "wisdom"
-            elif "faction" in pack_name:
-                realm = "faction"
-            else:
-                realm = "narrative"
-
-            return {
-                "content_id": f"{pack_name}/{file_path.stem}",
-                "content": str(content)[:5000],  # Limit content size
-                "metadata": {
-                    "pack": pack_name,
-                    "source_file": str(file_path.name),
-                    "realm_type": realm,
-                    "realm_label": pack_name.replace("warbler-pack-", ""),
-                    "lifecycle_stage": "emergence",
-                    "activity_level": 0.7
-                }
-            }
-        except Exception as e:
-            logger.error(f"Failed to parse {file_path}: {e}")
-            return None
-
-    def ingest_document(self, doc: Dict[str, Any]) -> bool:
-        """Send document to API for ingestion"""
-        try:
-            # For now, we'll store in local context
-            # The API service will need an /ingest endpoint
|
| 131 |
-
logger.info(f"Ingesting: {doc['content_id']}")
|
| 132 |
-
|
| 133 |
-
# Check if API has ingest endpoint
|
| 134 |
-
response = self.session.post(
|
| 135 |
-
f"{self.api_url}/ingest",
|
| 136 |
-
json={"documents": [doc]},
|
| 137 |
-
timeout=10
|
| 138 |
-
)
|
| 139 |
-
|
| 140 |
-
if response.status_code in [200, 201, 202]:
|
| 141 |
-
self.loaded_count += 1
|
| 142 |
-
logger.info(f"[OK] Loaded: {doc['content_id']}")
|
| 143 |
-
return True
|
| 144 |
-
else:
|
| 145 |
-
logger.warning(
|
| 146 |
-
f"API returned {response.status_code}: {response.text[:200]}")
|
| 147 |
-
return False
|
| 148 |
-
except requests.exceptions.ConnectionError:
|
| 149 |
-
logger.error("Cannot connect to API. Is the service running?")
|
| 150 |
-
return False
|
| 151 |
-
except Exception as e:
|
| 152 |
-
logger.error(f"Ingestion failed: {e}")
|
| 153 |
-
self.error_count += 1
|
| 154 |
-
return False
|
| 155 |
-
|
| 156 |
-
def load_all_packs(self) -> int:
|
| 157 |
-
"""Load all Warbler packs"""
|
| 158 |
-
click.echo("\n" + "=" * 60)
|
| 159 |
-
click.echo("Loading Warbler Pack Data into EXP-09 API")
|
| 160 |
-
click.echo("=" * 60 + "\n")
|
| 161 |
-
|
| 162 |
-
total_docs = 0
|
| 163 |
-
for pack_name in WARBLER_PACKS:
|
| 164 |
-
click.echo(f"\n[PACK] Processing: {pack_name}")
|
| 165 |
-
click.echo("-" * 40)
|
| 166 |
-
|
| 167 |
-
documents = self.discover_documents(pack_name)
|
| 168 |
-
click.echo(f"Found {len(documents)} documents\n")
|
| 169 |
-
|
| 170 |
-
for doc in documents:
|
| 171 |
-
self.ingest_document(doc)
|
| 172 |
-
total_docs += 1
|
| 173 |
-
|
| 174 |
-
click.echo("\n" + "=" * 60)
|
| 175 |
-
click.secho(
|
| 176 |
-
f"[OK] Load Complete: {
|
| 177 |
-
self.loaded_count} docs ingested",
|
| 178 |
-
fg="green")
|
| 179 |
-
if self.error_count > 0:
|
| 180 |
-
click.secho(f"[ERROR] Errors: {self.error_count}", fg="yellow")
|
| 181 |
-
click.echo("=" * 60 + "\n")
|
| 182 |
-
|
| 183 |
-
return self.loaded_count
|
| 184 |
-
|
| 185 |
-
|
| 186 |
-
@click.group()
|
| 187 |
-
def cli():
|
| 188 |
-
"""Warbler Pack Loader for EXP-09"""
|
| 189 |
-
pass
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
@cli.command()
|
| 193 |
-
@click.option("--api-url",
|
| 194 |
-
default="http://localhost:8000",
|
| 195 |
-
help="API service URL")
|
| 196 |
-
def load(api_url):
|
| 197 |
-
"""Load all Warbler packs into the API"""
|
| 198 |
-
loader = WarblerPackLoader(api_url)
|
| 199 |
-
|
| 200 |
-
# First, check if API is running
|
| 201 |
-
try:
|
| 202 |
-
response = loader.session.get(f"{api_url}/health", timeout=5)
|
| 203 |
-
if response.status_code == 200:
|
| 204 |
-
click.secho("[OK] API service is running", fg="green")
|
| 205 |
-
else:
|
| 206 |
-
click.secho(
|
| 207 |
-
"[ERROR] API service not responding correctly", fg="red")
|
| 208 |
-
return
|
| 209 |
-
except Exception as e:
|
| 210 |
-
click.secho(f"[ERROR] Cannot reach API at {api_url}: {e}", fg="red")
|
| 211 |
-
click.echo("\nStart the service with: docker-compose up -d")
|
| 212 |
-
return
|
| 213 |
-
|
| 214 |
-
# Load the packs
|
| 215 |
-
loaded = loader.load_all_packs()
|
| 216 |
-
|
| 217 |
-
if loaded > 0:
|
| 218 |
-
click.echo("\n[NEXT] Next Steps:")
|
| 219 |
-
click.echo(
|
| 220 |
-
" 1. Query the data with: python exp09_cli.py query --query-id q1 --semantic \"wisdom about courage\"")
|
| 221 |
-
click.echo(
|
| 222 |
-
" 2. Test hybrid scoring: python exp09_cli.py query --query-id q1 --semantic \"...\" --hybrid")
|
| 223 |
-
click.echo(" 3. Check metrics: python exp09_cli.py metrics\n")
|
| 224 |
-
|
| 225 |
-
|
| 226 |
-
@cli.command()
|
| 227 |
-
@click.option("--api-url",
|
| 228 |
-
default="http://localhost:8000",
|
| 229 |
-
help="API service URL")
|
| 230 |
-
def discover(api_url):
|
| 231 |
-
"""Discover documents in Warbler packs (no loading)"""
|
| 232 |
-
loader = WarblerPackLoader(api_url)
|
| 233 |
-
|
| 234 |
-
click.echo("\n" + "=" * 60)
|
| 235 |
-
click.echo("Discovering Warbler Pack Documents")
|
| 236 |
-
click.echo("=" * 60 + "\n")
|
| 237 |
-
|
| 238 |
-
total = 0
|
| 239 |
-
for pack_name in WARBLER_PACKS:
|
| 240 |
-
click.echo(f"\n[PACK] {pack_name}")
|
| 241 |
-
click.echo("-" * 40)
|
| 242 |
-
|
| 243 |
-
documents = loader.discover_documents(pack_name)
|
| 244 |
-
total += len(documents)
|
| 245 |
-
|
| 246 |
-
for doc in documents:
|
| 247 |
-
click.echo(f" - {doc['content_id']}")
|
| 248 |
-
if "metadata" in doc:
|
| 249 |
-
click.echo(
|
| 250 |
-
f" Realm: {
|
| 251 |
-
doc['metadata'].get(
|
| 252 |
-
'realm_type',
|
| 253 |
-
'unknown')}")
|
| 254 |
-
|
| 255 |
-
click.echo(f"\n[STATS] Total discovered: {total} documents\n")
|
| 256 |
-
|
| 257 |
-
|
| 258 |
-
if __name__ == "__main__":
|
| 259 |
-
cli()
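The JSONL branch of `_parse_document` above deliberately previews only the first five records of a chunk file instead of loading the whole thing, which keeps ingestion cheap for the large `*_compressed.jsonl` pack chunks. A minimal standalone sketch of that behavior (the `preview_jsonl` name and `max_lines` parameter are ours, not part of the deleted loader):

```python
def preview_jsonl(path, max_lines=5):
    """Return the first few records of a JSONL file as a newline-joined preview.

    Mirrors the loader's JSONL handling: blank lines are dropped and only the
    first `max_lines` non-empty lines are kept, so a multi-megabyte pack chunk
    contributes only a small sample to the ingested document content.
    """
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()[:max_lines]
    return "\n".join(line.strip() for line in lines if line.strip())
```

Note the subtle consequence: `readlines()[:max_lines]` slices before filtering blanks, so a file whose first lines are empty can yield fewer than `max_lines` records in the preview.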
package-lock.json
DELETED
@@ -1,861 +0,0 @@
{
  "name": "warbler-cda",
  "version": "1.0.0",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "": {
      "name": "warbler-cda",
      "version": "1.0.0",
      "license": "ISC",
      "dependencies": {
        "express": "^5.1.0",
        "typescript": "^5.9.3"
      }
    },
    "node_modules/accepts": {
      "version": "2.0.0",
      "resolved": "https://registry.npmjs.org/accepts/-/accepts-2.0.0.tgz",
      "integrity": "sha512-5cvg6CtKwfgdmVqY1WIiXKc3Q1bkRqGLi+2W/6ao+6Y7gu/RCwRuAhGEzh5B4KlszSuTLgZYuqFqo5bImjNKng==",
      "license": "MIT",
      "dependencies": {
        "mime-types": "^3.0.0",
        "negotiator": "^1.0.0"
      },
      "engines": {
        "node": ">= 0.6"
      }
    },
    "node_modules/body-parser": {
      "version": "2.2.0",
      "resolved": "https://registry.npmjs.org/body-parser/-/body-parser-2.2.0.tgz",
      "integrity": "sha512-02qvAaxv8tp7fBa/mw1ga98OGm+eCbqzJOKoRt70sLmfEEi+jyBYVTDGfCL/k06/4EMk/z01gCe7HoCH/f2LTg==",
      "license": "MIT",
      "dependencies": {
        "bytes": "^3.1.2",
        "content-type": "^1.0.5",
        "debug": "^4.4.0",
        "http-errors": "^2.0.0",
        "iconv-lite": "^0.6.3",
        "on-finished": "^2.4.1",
        "qs": "^6.14.0",
        "raw-body": "^3.0.0",
        "type-is": "^2.0.0"
      },
      "engines": {
        "node": ">=18"
      }
    },
    "node_modules/bytes": {
      "version": "3.1.2",
      "resolved": "https://registry.npmjs.org/bytes/-/bytes-3.1.2.tgz",
      "integrity": "sha512-/Nf7TyzTx6S3yRJObOAV7956r8cr2+Oj8AC5dt8wSP3BQAoeX58NoHyCU8P8zGkNXStjTSi6fzO6F0pBdcYbEg==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.8"
      }
    },
    "node_modules/call-bind-apply-helpers": {
      "version": "1.0.2",
      "resolved": "https://registry.npmjs.org/call-bind-apply-helpers/-/call-bind-apply-helpers-1.0.2.tgz",
      "integrity": "sha512-Sp1ablJ0ivDkSzjcaJdxEunN5/XvksFJ2sMBFfq6x0ryhQV/2b/KwFe21cMpmHtPOSij8K99/wSfoEuTObmuMQ==",
      "license": "MIT",
      "dependencies": {
        "es-errors": "^1.3.0",
        "function-bind": "^1.1.2"
      },
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/call-bound": {
      "version": "1.0.4",
      "resolved": "https://registry.npmjs.org/call-bound/-/call-bound-1.0.4.tgz",
      "integrity": "sha512-+ys997U96po4Kx/ABpBCqhA9EuxJaQWDQg7295H4hBphv3IZg0boBKuwYpt4YXp6MZ5AmZQnU/tyMTlRpaSejg==",
      "license": "MIT",
      "dependencies": {
        "call-bind-apply-helpers": "^1.0.2",
        "get-intrinsic": "^1.3.0"
      },
      "engines": {
        "node": ">= 0.4"
      },
      "funding": {
        "url": "https://github.com/sponsors/ljharb"
      }
    },
    "node_modules/content-disposition": {
      "version": "1.0.1",
      "resolved": "https://registry.npmjs.org/content-disposition/-/content-disposition-1.0.1.tgz",
      "integrity": "sha512-oIXISMynqSqm241k6kcQ5UwttDILMK4BiurCfGEREw6+X9jkkpEe5T9FZaApyLGGOnFuyMWZpdolTXMtvEJ08Q==",
      "license": "MIT",
      "engines": {
        "node": ">=18"
      },
      "funding": {
        "type": "opencollective",
        "url": "https://opencollective.com/express"
      }
    },
    "node_modules/content-type": {
      "version": "1.0.5",
      "resolved": "https://registry.npmjs.org/content-type/-/content-type-1.0.5.tgz",
      "integrity": "sha512-nTjqfcBFEipKdXCv4YDQWCfmcLZKm81ldF0pAopTvyrFGVbcR6P/VAAd5G7N+0tTr8QqiU0tFadD6FK4NtJwOA==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.6"
      }
    },
    "node_modules/cookie": {
      "version": "0.7.2",
      "resolved": "https://registry.npmjs.org/cookie/-/cookie-0.7.2.tgz",
      "integrity": "sha512-yki5XnKuf750l50uGTllt6kKILY4nQ1eNIQatoXEByZ5dWgnKqbnqmTrBE5B4N7lrMJKQ2ytWMiTO2o0v6Ew/w==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.6"
      }
    },
    "node_modules/cookie-signature": {
      "version": "1.2.2",
      "resolved": "https://registry.npmjs.org/cookie-signature/-/cookie-signature-1.2.2.tgz",
      "integrity": "sha512-D76uU73ulSXrD1UXF4KE2TMxVVwhsnCgfAyTg9k8P6KGZjlXKrOLe4dJQKI3Bxi5wjesZoFXJWElNWBjPZMbhg==",
      "license": "MIT",
      "engines": {
        "node": ">=6.6.0"
      }
    },
    "node_modules/debug": {
      "version": "4.4.3",
      "resolved": "https://registry.npmjs.org/debug/-/debug-4.4.3.tgz",
      "integrity": "sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA==",
      "license": "MIT",
      "dependencies": {
        "ms": "^2.1.3"
      },
      "engines": {
        "node": ">=6.0"
      },
      "peerDependenciesMeta": {
        "supports-color": {
          "optional": true
        }
      }
    },
    "node_modules/depd": {
      "version": "2.0.0",
      "resolved": "https://registry.npmjs.org/depd/-/depd-2.0.0.tgz",
      "integrity": "sha512-g7nH6P6dyDioJogAAGprGpCtVImJhpPk/roCzdb3fIh61/s/nPsfR6onyMwkCAR/OlC3yBC0lESvUoQEAssIrw==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.8"
      }
    },
    "node_modules/dunder-proto": {
      "version": "1.0.1",
      "resolved": "https://registry.npmjs.org/dunder-proto/-/dunder-proto-1.0.1.tgz",
      "integrity": "sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A==",
      "license": "MIT",
      "dependencies": {
        "call-bind-apply-helpers": "^1.0.1",
        "es-errors": "^1.3.0",
        "gopd": "^1.2.0"
      },
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/ee-first": {
      "version": "1.1.1",
      "resolved": "https://registry.npmjs.org/ee-first/-/ee-first-1.1.1.tgz",
      "integrity": "sha512-WMwm9LhRUo+WUaRN+vRuETqG89IgZphVSNkdFgeb6sS/E4OrDIN7t48CAewSHXc6C8lefD8KKfr5vY61brQlow==",
      "license": "MIT"
    },
    "node_modules/encodeurl": {
      "version": "2.0.0",
      "resolved": "https://registry.npmjs.org/encodeurl/-/encodeurl-2.0.0.tgz",
      "integrity": "sha512-Q0n9HRi4m6JuGIV1eFlmvJB7ZEVxu93IrMyiMsGC0lrMJMWzRgx6WGquyfQgZVb31vhGgXnfmPNNXmxnOkRBrg==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.8"
      }
    },
    "node_modules/es-define-property": {
      "version": "1.0.1",
      "resolved": "https://registry.npmjs.org/es-define-property/-/es-define-property-1.0.1.tgz",
      "integrity": "sha512-e3nRfgfUZ4rNGL232gUgX06QNyyez04KdjFrF+LTRoOXmrOgFKDg4BCdsjW8EnT69eqdYGmRpJwiPVYNrCaW3g==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/es-errors": {
      "version": "1.3.0",
      "resolved": "https://registry.npmjs.org/es-errors/-/es-errors-1.3.0.tgz",
      "integrity": "sha512-Zf5H2Kxt2xjTvbJvP2ZWLEICxA6j+hAmMzIlypy4xcBg1vKVnx89Wy0GbS+kf5cwCVFFzdCFh2XSCFNULS6csw==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/es-object-atoms": {
      "version": "1.1.1",
      "resolved": "https://registry.npmjs.org/es-object-atoms/-/es-object-atoms-1.1.1.tgz",
      "integrity": "sha512-FGgH2h8zKNim9ljj7dankFPcICIK9Cp5bm+c2gQSYePhpaG5+esrLODihIorn+Pe6FGJzWhXQotPv73jTaldXA==",
      "license": "MIT",
      "dependencies": {
        "es-errors": "^1.3.0"
      },
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/escape-html": {
      "version": "1.0.3",
      "resolved": "https://registry.npmjs.org/escape-html/-/escape-html-1.0.3.tgz",
      "integrity": "sha512-NiSupZ4OeuGwr68lGIeym/ksIZMJodUGOSCZ/FSnTxcrekbvqrgdUxlJOMpijaKZVjAJrWrGs/6Jy8OMuyj9ow==",
      "license": "MIT"
    },
    "node_modules/etag": {
      "version": "1.8.1",
      "resolved": "https://registry.npmjs.org/etag/-/etag-1.8.1.tgz",
      "integrity": "sha512-aIL5Fx7mawVa300al2BnEE4iNvo1qETxLrPI/o05L7z6go7fCw1J6EQmbK4FmJ2AS7kgVF/KEZWufBfdClMcPg==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.6"
      }
    },
    "node_modules/express": {
      "version": "5.1.0",
      "resolved": "https://registry.npmjs.org/express/-/express-5.1.0.tgz",
      "integrity": "sha512-DT9ck5YIRU+8GYzzU5kT3eHGA5iL+1Zd0EutOmTE9Dtk+Tvuzd23VBU+ec7HPNSTxXYO55gPV/hq4pSBJDjFpA==",
      "license": "MIT",
      "dependencies": {
        "accepts": "^2.0.0",
        "body-parser": "^2.2.0",
        "content-disposition": "^1.0.0",
        "content-type": "^1.0.5",
        "cookie": "^0.7.1",
        "cookie-signature": "^1.2.1",
        "debug": "^4.4.0",
        "encodeurl": "^2.0.0",
        "escape-html": "^1.0.3",
        "etag": "^1.8.1",
        "finalhandler": "^2.1.0",
        "fresh": "^2.0.0",
        "http-errors": "^2.0.0",
        "merge-descriptors": "^2.0.0",
        "mime-types": "^3.0.0",
        "on-finished": "^2.4.1",
        "once": "^1.4.0",
        "parseurl": "^1.3.3",
        "proxy-addr": "^2.0.7",
        "qs": "^6.14.0",
        "range-parser": "^1.2.1",
        "router": "^2.2.0",
        "send": "^1.1.0",
        "serve-static": "^2.2.0",
        "statuses": "^2.0.1",
        "type-is": "^2.0.1",
        "vary": "^1.1.2"
      },
      "engines": {
        "node": ">= 18"
      },
      "funding": {
        "type": "opencollective",
        "url": "https://opencollective.com/express"
      }
    },
    "node_modules/finalhandler": {
      "version": "2.1.0",
      "resolved": "https://registry.npmjs.org/finalhandler/-/finalhandler-2.1.0.tgz",
      "integrity": "sha512-/t88Ty3d5JWQbWYgaOGCCYfXRwV1+be02WqYYlL6h0lEiUAMPM8o8qKGO01YIkOHzka2up08wvgYD0mDiI+q3Q==",
      "license": "MIT",
      "dependencies": {
        "debug": "^4.4.0",
        "encodeurl": "^2.0.0",
        "escape-html": "^1.0.3",
        "on-finished": "^2.4.1",
        "parseurl": "^1.3.3",
        "statuses": "^2.0.1"
      },
      "engines": {
        "node": ">= 0.8"
      }
    },
    "node_modules/forwarded": {
      "version": "0.2.0",
      "resolved": "https://registry.npmjs.org/forwarded/-/forwarded-0.2.0.tgz",
      "integrity": "sha512-buRG0fpBtRHSTCOASe6hD258tEubFoRLb4ZNA6NxMVHNw2gOcwHo9wyablzMzOA5z9xA9L1KNjk/Nt6MT9aYow==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.6"
      }
    },
    "node_modules/fresh": {
      "version": "2.0.0",
      "resolved": "https://registry.npmjs.org/fresh/-/fresh-2.0.0.tgz",
      "integrity": "sha512-Rx/WycZ60HOaqLKAi6cHRKKI7zxWbJ31MhntmtwMoaTeF7XFH9hhBp8vITaMidfljRQ6eYWCKkaTK+ykVJHP2A==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.8"
      }
    },
    "node_modules/function-bind": {
      "version": "1.1.2",
      "resolved": "https://registry.npmjs.org/function-bind/-/function-bind-1.1.2.tgz",
      "integrity": "sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA==",
      "license": "MIT",
      "funding": {
        "url": "https://github.com/sponsors/ljharb"
      }
    },
    "node_modules/get-intrinsic": {
      "version": "1.3.0",
      "resolved": "https://registry.npmjs.org/get-intrinsic/-/get-intrinsic-1.3.0.tgz",
      "integrity": "sha512-9fSjSaos/fRIVIp+xSJlE6lfwhES7LNtKaCBIamHsjr2na1BiABJPo0mOjjz8GJDURarmCPGqaiVg5mfjb98CQ==",
      "license": "MIT",
      "dependencies": {
        "call-bind-apply-helpers": "^1.0.2",
        "es-define-property": "^1.0.1",
        "es-errors": "^1.3.0",
        "es-object-atoms": "^1.1.1",
        "function-bind": "^1.1.2",
        "get-proto": "^1.0.1",
        "gopd": "^1.2.0",
        "has-symbols": "^1.1.0",
        "hasown": "^2.0.2",
        "math-intrinsics": "^1.1.0"
      },
      "engines": {
        "node": ">= 0.4"
      },
      "funding": {
        "url": "https://github.com/sponsors/ljharb"
      }
    },
    "node_modules/get-proto": {
      "version": "1.0.1",
      "resolved": "https://registry.npmjs.org/get-proto/-/get-proto-1.0.1.tgz",
      "integrity": "sha512-sTSfBjoXBp89JvIKIefqw7U2CCebsc74kiY6awiGogKtoSGbgjYE/G/+l9sF3MWFPNc9IcoOC4ODfKHfxFmp0g==",
      "license": "MIT",
      "dependencies": {
        "dunder-proto": "^1.0.1",
        "es-object-atoms": "^1.0.0"
      },
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/gopd": {
      "version": "1.2.0",
      "resolved": "https://registry.npmjs.org/gopd/-/gopd-1.2.0.tgz",
      "integrity": "sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.4"
      },
      "funding": {
        "url": "https://github.com/sponsors/ljharb"
      }
    },
    "node_modules/has-symbols": {
      "version": "1.1.0",
      "resolved": "https://registry.npmjs.org/has-symbols/-/has-symbols-1.1.0.tgz",
      "integrity": "sha512-1cDNdwJ2Jaohmb3sg4OmKaMBwuC48sYni5HUw2DvsC8LjGTLK9h+eb1X6RyuOHe4hT0ULCW68iomhjUoKUqlPQ==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.4"
      },
      "funding": {
        "url": "https://github.com/sponsors/ljharb"
      }
    },
    "node_modules/hasown": {
      "version": "2.0.2",
      "resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.2.tgz",
      "integrity": "sha512-0hJU9SCPvmMzIBdZFqNPXWa6dqh7WdH0cII9y+CyS8rG3nL48Bclra9HmKhVVUHyPWNH5Y7xDwAB7bfgSjkUMQ==",
      "license": "MIT",
      "dependencies": {
        "function-bind": "^1.1.2"
      },
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/http-errors": {
      "version": "2.0.1",
      "resolved": "https://registry.npmjs.org/http-errors/-/http-errors-2.0.1.tgz",
      "integrity": "sha512-4FbRdAX+bSdmo4AUFuS0WNiPz8NgFt+r8ThgNWmlrjQjt1Q7ZR9+zTlce2859x4KSXrwIsaeTqDoKQmtP8pLmQ==",
      "license": "MIT",
      "dependencies": {
        "depd": "~2.0.0",
        "inherits": "~2.0.4",
        "setprototypeof": "~1.2.0",
        "statuses": "~2.0.2",
        "toidentifier": "~1.0.1"
      },
      "engines": {
        "node": ">= 0.8"
      },
      "funding": {
        "type": "opencollective",
        "url": "https://opencollective.com/express"
      }
    },
    "node_modules/iconv-lite": {
      "version": "0.6.3",
      "resolved": "https://registry.npmjs.org/iconv-lite/-/iconv-lite-0.6.3.tgz",
      "integrity": "sha512-4fCk79wshMdzMp2rH06qWrJE4iolqLhCUH+OiuIgU++RB0+94NlDL81atO7GX55uUKueo0txHNtvEyI6D7WdMw==",
      "license": "MIT",
      "dependencies": {
        "safer-buffer": ">= 2.1.2 < 3.0.0"
      },
      "engines": {
        "node": ">=0.10.0"
      }
    },
    "node_modules/inherits": {
      "version": "2.0.4",
      "resolved": "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz",
      "integrity": "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ==",
      "license": "ISC"
    },
    "node_modules/ipaddr.js": {
      "version": "1.9.1",
      "resolved": "https://registry.npmjs.org/ipaddr.js/-/ipaddr.js-1.9.1.tgz",
      "integrity": "sha512-0KI/607xoxSToH7GjN1FfSbLoU0+btTicjsQSWQlh/hZykN8KpmMf7uYwPW3R+akZ6R/w18ZlXSHBYXiYUPO3g==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.10"
      }
    },
    "node_modules/is-promise": {
      "version": "4.0.0",
      "resolved": "https://registry.npmjs.org/is-promise/-/is-promise-4.0.0.tgz",
      "integrity": "sha512-hvpoI6korhJMnej285dSg6nu1+e6uxs7zG3BYAm5byqDsgJNWwxzM6z6iZiAgQR4TJ30JmBTOwqZUw3WlyH3AQ==",
      "license": "MIT"
    },
    "node_modules/math-intrinsics": {
      "version": "1.1.0",
      "resolved": "https://registry.npmjs.org/math-intrinsics/-/math-intrinsics-1.1.0.tgz",
      "integrity": "sha512-/IXtbwEk5HTPyEwyKX6hGkYXxM9nbj64B+ilVJnC/R6B0pH5G4V3b0pVbL7DBj4tkhBAppbQUlf6F6Xl9LHu1g==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/media-typer": {
      "version": "1.1.0",
      "resolved": "https://registry.npmjs.org/media-typer/-/media-typer-1.1.0.tgz",
      "integrity": "sha512-aisnrDP4GNe06UcKFnV5bfMNPBUw4jsLGaWwWfnH3v02GnBuXX2MCVn5RbrWo0j3pczUilYblq7fQ7Nw2t5XKw==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.8"
      }
    },
    "node_modules/merge-descriptors": {
      "version": "2.0.0",
      "resolved": "https://registry.npmjs.org/merge-descriptors/-/merge-descriptors-2.0.0.tgz",
      "integrity": "sha512-Snk314V5ayFLhp3fkUREub6WtjBfPdCPY1Ln8/8munuLuiYhsABgBVWsozAG+MWMbVEvcdcpbi9R7ww22l9Q3g==",
      "license": "MIT",
      "engines": {
        "node": ">=18"
      },
      "funding": {
        "url": "https://github.com/sponsors/sindresorhus"
      }
    },
    "node_modules/mime-db": {
      "version": "1.54.0",
      "resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.54.0.tgz",
      "integrity": "sha512-aU5EJuIN2WDemCcAp2vFBfp/m4EAhWJnUNSSw0ixs7/kXbd6Pg64EmwJkNdFhB8aWt1sH2CTXrLxo/iAGV3oPQ==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.6"
      }
    },
    "node_modules/mime-types": {
      "version": "3.0.2",
      "resolved": "https://registry.npmjs.org/mime-types/-/mime-types-3.0.2.tgz",
      "integrity": "sha512-Lbgzdk0h4juoQ9fCKXW4by0UJqj+nOOrI9MJ1sSj4nI8aI2eo1qmvQEie4VD1glsS250n15LsWsYtCugiStS5A==",
      "license": "MIT",
      "dependencies": {
        "mime-db": "^1.54.0"
      },
      "engines": {
        "node": ">=18"
      },
      "funding": {
        "type": "opencollective",
        "url": "https://opencollective.com/express"
      }
    },
    "node_modules/ms": {
      "version": "2.1.3",
      "resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz",
      "integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA==",
      "license": "MIT"
    },
    "node_modules/negotiator": {
      "version": "1.0.0",
      "resolved": "https://registry.npmjs.org/negotiator/-/negotiator-1.0.0.tgz",
      "integrity": "sha512-8Ofs/AUQh8MaEcrlq5xOX0CQ9ypTF5dl78mjlMNfOK08fzpgTHQRQPBxcPlEtIw0yRpws+Zo/3r+5WRby7u3Gg==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.6"
      }
    },
    "node_modules/object-inspect": {
      "version": "1.13.4",
      "resolved": "https://registry.npmjs.org/object-inspect/-/object-inspect-1.13.4.tgz",
      "integrity": "sha512-W67iLl4J2EXEGTbfeHCffrjDfitvLANg0UlX3wFUUSTx92KXRFegMHUVgSqE+wvhAbi4WqjGg9czysTV2Epbew==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.4"
      },
      "funding": {
        "url": "https://github.com/sponsors/ljharb"
      }
    },
    "node_modules/on-finished": {
      "version": "2.4.1",
      "resolved": "https://registry.npmjs.org/on-finished/-/on-finished-2.4.1.tgz",
      "integrity": "sha512-oVlzkg3ENAhCk2zdv7IJwd/QUD4z2RxRwpkcGY8psCVcCYZNq4wYnVWALHM+brtuJjePWiYF/ClmuDr8Ch5+kg==",
      "license": "MIT",
      "dependencies": {
        "ee-first": "1.1.1"
      },
      "engines": {
        "node": ">= 0.8"
      }
    },
    "node_modules/once": {
      "version": "1.4.0",
      "resolved": "https://registry.npmjs.org/once/-/once-1.4.0.tgz",
      "integrity": "sha512-lNaJgI+2Q5URQBkccEKHTQOPaXdUxnZZElQTZY0MFUAuaEqe1E+Nyvgdz/aIyNi6Z9MzO5dv1H8n58/GELp3+w==",
      "license": "ISC",
      "dependencies": {
        "wrappy": "1"
      }
    },
    "node_modules/parseurl": {
      "version": "1.3.3",
      "resolved": "https://registry.npmjs.org/parseurl/-/parseurl-1.3.3.tgz",
      "integrity": "sha512-CiyeOxFT/JZyN5m0z9PfXw4SCBJ6Sygz1Dpl0wqjlhDEGGBP1GnsUVEL0p63hoG1fcj3fHynXi9NYO4nWOL+qQ==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.8"
      }
    },
    "node_modules/path-to-regexp": {
      "version": "8.3.0",
      "resolved": "https://registry.npmjs.org/path-to-regexp/-/path-to-regexp-8.3.0.tgz",
      "integrity": "sha512-7jdwVIRtsP8MYpdXSwOS0YdD0Du+qOoF/AEPIt88PcCFrZCzx41oxku1jD88hZBwbNUIEfpqvuhjFaMAqMTWnA==",
      "license": "MIT",
      "funding": {
        "type": "opencollective",
        "url": "https://opencollective.com/express"
      }
    },
    "node_modules/proxy-addr": {
      "version": "2.0.7",
      "resolved": "https://registry.npmjs.org/proxy-addr/-/proxy-addr-2.0.7.tgz",
      "integrity": "sha512-llQsMLSUDUPT44jdrU/O37qlnifitDP+ZwrmmZcoSKyLKvtZxpyV0n2/bD/N4tBAAZ/gJEdZU7KMraoK1+XYAg==",
      "license": "MIT",
      "dependencies": {
        "forwarded": "0.2.0",
        "ipaddr.js": "1.9.1"
      },
      "engines": {
        "node": ">= 0.10"
      }
    },
    "node_modules/qs": {
      "version": "6.14.0",
      "resolved": "https://registry.npmjs.org/qs/-/qs-6.14.0.tgz",
      "integrity": "sha512-YWWTjgABSKcvs/nWBi9PycY/JiPJqOD4JA6o9Sej2AtvSGarXxKC3OQSk4pAarbdQlKAh5D4FCQkJNkW+GAn3w==",
      "license": "BSD-3-Clause",
      "dependencies": {
        "side-channel": "^1.1.0"
      },
      "engines": {
        "node": ">=0.6"
      },
      "funding": {
        "url": "https://github.com/sponsors/ljharb"
      }
    },
    "node_modules/range-parser": {
      "version": "1.2.1",
      "resolved": "https://registry.npmjs.org/range-parser/-/range-parser-1.2.1.tgz",
      "integrity": "sha512-Hrgsx+orqoygnmhFbKaHE6c296J+HTAQXoxEF6gNupROmmGJRoyzfG3ccAveqCBrwr/2yxQ5BVd/GTl5agOwSg==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.6"
      }
    },
    "node_modules/raw-body": {
      "version": "3.0.1",
      "resolved": "https://registry.npmjs.org/raw-body/-/raw-body-3.0.1.tgz",
      "integrity": "sha512-9G8cA+tuMS75+6G/TzW8OtLzmBDMo8p1JRxN5AZ+LAp8uxGA8V8GZm4GQ4/N5QNQEnLmg6SS7wyuSmbKepiKqA==",
      "license": "MIT",
      "dependencies": {
        "bytes": "3.1.2",
        "http-errors": "2.0.0",
        "iconv-lite": "0.7.0",
        "unpipe": "1.0.0"
      },
      "engines": {
        "node": ">= 0.10"
      }
    },
    "node_modules/raw-body/node_modules/http-errors": {
      "version": "2.0.0",
      "resolved": "https://registry.npmjs.org/http-errors/-/http-errors-2.0.0.tgz",
      "integrity": "sha512-FtwrG/euBzaEjYeRqOgly7G0qviiXoJWnvEH2Z1plBdXgbyjv34pHTSb9zoeHMyDy33+DWy5Wt9Wo+TURtOYSQ==",
      "license": "MIT",
      "dependencies": {
        "depd": "2.0.0",
        "inherits": "2.0.4",
        "setprototypeof": "1.2.0",
        "statuses": "2.0.1",
        "toidentifier": "1.0.1"
      },
      "engines": {
        "node": ">= 0.8"
      }
    },
    "node_modules/raw-body/node_modules/iconv-lite": {
      "version": "0.7.0",
|
| 631 |
-
"resolved": "https://registry.npmjs.org/iconv-lite/-/iconv-lite-0.7.0.tgz",
|
| 632 |
-
"integrity": "sha512-cf6L2Ds3h57VVmkZe+Pn+5APsT7FpqJtEhhieDCvrE2MK5Qk9MyffgQyuxQTm6BChfeZNtcOLHp9IcWRVcIcBQ==",
|
| 633 |
-
"license": "MIT",
|
| 634 |
-
"dependencies": {
|
| 635 |
-
"safer-buffer": ">= 2.1.2 < 3.0.0"
|
| 636 |
-
},
|
| 637 |
-
"engines": {
|
| 638 |
-
"node": ">=0.10.0"
|
| 639 |
-
},
|
| 640 |
-
"funding": {
|
| 641 |
-
"type": "opencollective",
|
| 642 |
-
"url": "https://opencollective.com/express"
|
| 643 |
-
}
|
| 644 |
-
},
|
| 645 |
-
"node_modules/raw-body/node_modules/statuses": {
|
| 646 |
-
"version": "2.0.1",
|
| 647 |
-
"resolved": "https://registry.npmjs.org/statuses/-/statuses-2.0.1.tgz",
|
| 648 |
-
"integrity": "sha512-RwNA9Z/7PrK06rYLIzFMlaF+l73iwpzsqRIFgbMLbTcLD6cOao82TaWefPXQvB2fOC4AjuYSEndS7N/mTCbkdQ==",
|
| 649 |
-
"license": "MIT",
|
| 650 |
-
"engines": {
|
| 651 |
-
"node": ">= 0.8"
|
| 652 |
-
}
|
| 653 |
-
},
|
| 654 |
-
"node_modules/router": {
|
| 655 |
-
"version": "2.2.0",
|
| 656 |
-
"resolved": "https://registry.npmjs.org/router/-/router-2.2.0.tgz",
|
| 657 |
-
"integrity": "sha512-nLTrUKm2UyiL7rlhapu/Zl45FwNgkZGaCpZbIHajDYgwlJCOzLSk+cIPAnsEqV955GjILJnKbdQC1nVPz+gAYQ==",
|
| 658 |
-
"license": "MIT",
|
| 659 |
-
"dependencies": {
|
| 660 |
-
"debug": "^4.4.0",
|
| 661 |
-
"depd": "^2.0.0",
|
| 662 |
-
"is-promise": "^4.0.0",
|
| 663 |
-
"parseurl": "^1.3.3",
|
| 664 |
-
"path-to-regexp": "^8.0.0"
|
| 665 |
-
},
|
| 666 |
-
"engines": {
|
| 667 |
-
"node": ">= 18"
|
| 668 |
-
}
|
| 669 |
-
},
|
| 670 |
-
"node_modules/safer-buffer": {
|
| 671 |
-
"version": "2.1.2",
|
| 672 |
-
"resolved": "https://registry.npmjs.org/safer-buffer/-/safer-buffer-2.1.2.tgz",
|
| 673 |
-
"integrity": "sha512-YZo3K82SD7Riyi0E1EQPojLz7kpepnSQI9IyPbHHg1XXXevb5dJI7tpyN2ADxGcQbHG7vcyRHk0cbwqcQriUtg==",
|
| 674 |
-
"license": "MIT"
|
| 675 |
-
},
|
| 676 |
-
"node_modules/send": {
|
| 677 |
-
"version": "1.2.0",
|
| 678 |
-
"resolved": "https://registry.npmjs.org/send/-/send-1.2.0.tgz",
|
| 679 |
-
"integrity": "sha512-uaW0WwXKpL9blXE2o0bRhoL2EGXIrZxQ2ZQ4mgcfoBxdFmQold+qWsD2jLrfZ0trjKL6vOw0j//eAwcALFjKSw==",
|
| 680 |
-
"license": "MIT",
|
| 681 |
-
"dependencies": {
|
| 682 |
-
"debug": "^4.3.5",
|
| 683 |
-
"encodeurl": "^2.0.0",
|
| 684 |
-
"escape-html": "^1.0.3",
|
| 685 |
-
"etag": "^1.8.1",
|
| 686 |
-
"fresh": "^2.0.0",
|
| 687 |
-
"http-errors": "^2.0.0",
|
| 688 |
-
"mime-types": "^3.0.1",
|
| 689 |
-
"ms": "^2.1.3",
|
| 690 |
-
"on-finished": "^2.4.1",
|
| 691 |
-
"range-parser": "^1.2.1",
|
| 692 |
-
"statuses": "^2.0.1"
|
| 693 |
-
},
|
| 694 |
-
"engines": {
|
| 695 |
-
"node": ">= 18"
|
| 696 |
-
}
|
| 697 |
-
},
|
| 698 |
-
"node_modules/serve-static": {
|
| 699 |
-
"version": "2.2.0",
|
| 700 |
-
"resolved": "https://registry.npmjs.org/serve-static/-/serve-static-2.2.0.tgz",
|
| 701 |
-
"integrity": "sha512-61g9pCh0Vnh7IutZjtLGGpTA355+OPn2TyDv/6ivP2h/AdAVX9azsoxmg2/M6nZeQZNYBEwIcsne1mJd9oQItQ==",
|
| 702 |
-
"license": "MIT",
|
| 703 |
-
"dependencies": {
|
| 704 |
-
"encodeurl": "^2.0.0",
|
| 705 |
-
"escape-html": "^1.0.3",
|
| 706 |
-
"parseurl": "^1.3.3",
|
| 707 |
-
"send": "^1.2.0"
|
| 708 |
-
},
|
| 709 |
-
"engines": {
|
| 710 |
-
"node": ">= 18"
|
| 711 |
-
}
|
| 712 |
-
},
|
| 713 |
-
"node_modules/setprototypeof": {
|
| 714 |
-
"version": "1.2.0",
|
| 715 |
-
"resolved": "https://registry.npmjs.org/setprototypeof/-/setprototypeof-1.2.0.tgz",
|
| 716 |
-
"integrity": "sha512-E5LDX7Wrp85Kil5bhZv46j8jOeboKq5JMmYM3gVGdGH8xFpPWXUMsNrlODCrkoxMEeNi/XZIwuRvY4XNwYMJpw==",
|
| 717 |
-
"license": "ISC"
|
| 718 |
-
},
|
| 719 |
-
"node_modules/side-channel": {
|
| 720 |
-
"version": "1.1.0",
|
| 721 |
-
"resolved": "https://registry.npmjs.org/side-channel/-/side-channel-1.1.0.tgz",
|
| 722 |
-
"integrity": "sha512-ZX99e6tRweoUXqR+VBrslhda51Nh5MTQwou5tnUDgbtyM0dBgmhEDtWGP/xbKn6hqfPRHujUNwz5fy/wbbhnpw==",
|
| 723 |
-
"license": "MIT",
|
| 724 |
-
"dependencies": {
|
| 725 |
-
"es-errors": "^1.3.0",
|
| 726 |
-
"object-inspect": "^1.13.3",
|
| 727 |
-
"side-channel-list": "^1.0.0",
|
| 728 |
-
"side-channel-map": "^1.0.1",
|
| 729 |
-
"side-channel-weakmap": "^1.0.2"
|
| 730 |
-
},
|
| 731 |
-
"engines": {
|
| 732 |
-
"node": ">= 0.4"
|
| 733 |
-
},
|
| 734 |
-
"funding": {
|
| 735 |
-
"url": "https://github.com/sponsors/ljharb"
|
| 736 |
-
}
|
| 737 |
-
},
|
| 738 |
-
"node_modules/side-channel-list": {
|
| 739 |
-
"version": "1.0.0",
|
| 740 |
-
"resolved": "https://registry.npmjs.org/side-channel-list/-/side-channel-list-1.0.0.tgz",
|
| 741 |
-
"integrity": "sha512-FCLHtRD/gnpCiCHEiJLOwdmFP+wzCmDEkc9y7NsYxeF4u7Btsn1ZuwgwJGxImImHicJArLP4R0yX4c2KCrMrTA==",
|
| 742 |
-
"license": "MIT",
|
| 743 |
-
"dependencies": {
|
| 744 |
-
"es-errors": "^1.3.0",
|
| 745 |
-
"object-inspect": "^1.13.3"
|
| 746 |
-
},
|
| 747 |
-
"engines": {
|
| 748 |
-
"node": ">= 0.4"
|
| 749 |
-
},
|
| 750 |
-
"funding": {
|
| 751 |
-
"url": "https://github.com/sponsors/ljharb"
|
| 752 |
-
}
|
| 753 |
-
},
|
| 754 |
-
"node_modules/side-channel-map": {
|
| 755 |
-
"version": "1.0.1",
|
| 756 |
-
"resolved": "https://registry.npmjs.org/side-channel-map/-/side-channel-map-1.0.1.tgz",
|
| 757 |
-
"integrity": "sha512-VCjCNfgMsby3tTdo02nbjtM/ewra6jPHmpThenkTYh8pG9ucZ/1P8So4u4FGBek/BjpOVsDCMoLA/iuBKIFXRA==",
|
| 758 |
-
"license": "MIT",
|
| 759 |
-
"dependencies": {
|
| 760 |
-
"call-bound": "^1.0.2",
|
| 761 |
-
"es-errors": "^1.3.0",
|
| 762 |
-
"get-intrinsic": "^1.2.5",
|
| 763 |
-
"object-inspect": "^1.13.3"
|
| 764 |
-
},
|
| 765 |
-
"engines": {
|
| 766 |
-
"node": ">= 0.4"
|
| 767 |
-
},
|
| 768 |
-
"funding": {
|
| 769 |
-
"url": "https://github.com/sponsors/ljharb"
|
| 770 |
-
}
|
| 771 |
-
},
|
| 772 |
-
"node_modules/side-channel-weakmap": {
|
| 773 |
-
"version": "1.0.2",
|
| 774 |
-
"resolved": "https://registry.npmjs.org/side-channel-weakmap/-/side-channel-weakmap-1.0.2.tgz",
|
| 775 |
-
"integrity": "sha512-WPS/HvHQTYnHisLo9McqBHOJk2FkHO/tlpvldyrnem4aeQp4hai3gythswg6p01oSoTl58rcpiFAjF2br2Ak2A==",
|
| 776 |
-
"license": "MIT",
|
| 777 |
-
"dependencies": {
|
| 778 |
-
"call-bound": "^1.0.2",
|
| 779 |
-
"es-errors": "^1.3.0",
|
| 780 |
-
"get-intrinsic": "^1.2.5",
|
| 781 |
-
"object-inspect": "^1.13.3",
|
| 782 |
-
"side-channel-map": "^1.0.1"
|
| 783 |
-
},
|
| 784 |
-
"engines": {
|
| 785 |
-
"node": ">= 0.4"
|
| 786 |
-
},
|
| 787 |
-
"funding": {
|
| 788 |
-
"url": "https://github.com/sponsors/ljharb"
|
| 789 |
-
}
|
| 790 |
-
},
|
| 791 |
-
"node_modules/statuses": {
|
| 792 |
-
"version": "2.0.2",
|
| 793 |
-
"resolved": "https://registry.npmjs.org/statuses/-/statuses-2.0.2.tgz",
|
| 794 |
-
"integrity": "sha512-DvEy55V3DB7uknRo+4iOGT5fP1slR8wQohVdknigZPMpMstaKJQWhwiYBACJE3Ul2pTnATihhBYnRhZQHGBiRw==",
|
| 795 |
-
"license": "MIT",
|
| 796 |
-
"engines": {
|
| 797 |
-
"node": ">= 0.8"
|
| 798 |
-
}
|
| 799 |
-
},
|
| 800 |
-
"node_modules/toidentifier": {
|
| 801 |
-
"version": "1.0.1",
|
| 802 |
-
"resolved": "https://registry.npmjs.org/toidentifier/-/toidentifier-1.0.1.tgz",
|
| 803 |
-
"integrity": "sha512-o5sSPKEkg/DIQNmH43V0/uerLrpzVedkUh8tGNvaeXpfpuwjKenlSox/2O/BTlZUtEe+JG7s5YhEz608PlAHRA==",
|
| 804 |
-
"license": "MIT",
|
| 805 |
-
"engines": {
|
| 806 |
-
"node": ">=0.6"
|
| 807 |
-
}
|
| 808 |
-
},
|
| 809 |
-
"node_modules/type-is": {
|
| 810 |
-
"version": "2.0.1",
|
| 811 |
-
"resolved": "https://registry.npmjs.org/type-is/-/type-is-2.0.1.tgz",
|
| 812 |
-
"integrity": "sha512-OZs6gsjF4vMp32qrCbiVSkrFmXtG/AZhY3t0iAMrMBiAZyV9oALtXO8hsrHbMXF9x6L3grlFuwW2oAz7cav+Gw==",
|
| 813 |
-
"license": "MIT",
|
| 814 |
-
"dependencies": {
|
| 815 |
-
"content-type": "^1.0.5",
|
| 816 |
-
"media-typer": "^1.1.0",
|
| 817 |
-
"mime-types": "^3.0.0"
|
| 818 |
-
},
|
| 819 |
-
"engines": {
|
| 820 |
-
"node": ">= 0.6"
|
| 821 |
-
}
|
| 822 |
-
},
|
| 823 |
-
"node_modules/typescript": {
|
| 824 |
-
"version": "5.9.3",
|
| 825 |
-
"resolved": "https://registry.npmjs.org/typescript/-/typescript-5.9.3.tgz",
|
| 826 |
-
"integrity": "sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==",
|
| 827 |
-
"license": "Apache-2.0",
|
| 828 |
-
"bin": {
|
| 829 |
-
"tsc": "bin/tsc",
|
| 830 |
-
"tsserver": "bin/tsserver"
|
| 831 |
-
},
|
| 832 |
-
"engines": {
|
| 833 |
-
"node": ">=14.17"
|
| 834 |
-
}
|
| 835 |
-
},
|
| 836 |
-
"node_modules/unpipe": {
|
| 837 |
-
"version": "1.0.0",
|
| 838 |
-
"resolved": "https://registry.npmjs.org/unpipe/-/unpipe-1.0.0.tgz",
|
| 839 |
-
"integrity": "sha512-pjy2bYhSsufwWlKwPc+l3cN7+wuJlK6uz0YdJEOlQDbl6jo/YlPi4mb8agUkVC8BF7V8NuzeyPNqRksA3hztKQ==",
|
| 840 |
-
"license": "MIT",
|
| 841 |
-
"engines": {
|
| 842 |
-
"node": ">= 0.8"
|
| 843 |
-
}
|
| 844 |
-
},
|
| 845 |
-
"node_modules/vary": {
|
| 846 |
-
"version": "1.1.2",
|
| 847 |
-
"resolved": "https://registry.npmjs.org/vary/-/vary-1.1.2.tgz",
|
| 848 |
-
"integrity": "sha512-BNGbWLfd0eUPabhkXUVm0j8uuvREyTh5ovRa/dyow/BqAbZJyC+5fU+IzQOzmAKzYqYRAISoRhdQr3eIZ/PXqg==",
|
| 849 |
-
"license": "MIT",
|
| 850 |
-
"engines": {
|
| 851 |
-
"node": ">= 0.8"
|
| 852 |
-
}
|
| 853 |
-
},
|
| 854 |
-
"node_modules/wrappy": {
|
| 855 |
-
"version": "1.0.2",
|
| 856 |
-
"resolved": "https://registry.npmjs.org/wrappy/-/wrappy-1.0.2.tgz",
|
| 857 |
-
"integrity": "sha512-l4Sp/DRseor9wL6EvV2+TuQn63dMkPjZ/sp9XkghTEbV9KlPS1xUsZ3u7/IQO4wxtcFB4bgpQPRcR3QCvezPcQ==",
|
| 858 |
-
"license": "ISC"
|
| 859 |
-
}
|
| 860 |
-
}
|
| 861 |
-
}
|
package.json
DELETED

@@ -1,19 +0,0 @@
-{
-  "name": "warbler-cda",
-  "version": "1.0.0",
-  "description": "--- title: Warbler CDA RAG System emoji: 🦜 colorFrom: blue colorTo: purple sdk: gradio sdk_version: 5.49.1 app_file: app.py pinned: false license: mit tags: - rag - retrieval - semantic-search - stat7 - embeddings - nlp ---",
-  "main": "index.js",
-  "directories": {
-    "test": "tests"
-  },
-  "scripts": {
-    "test": "echo \"Error: no test specified\" && exit 1"
-  },
-  "keywords": [],
-  "author": "",
-  "license": "ISC",
-  "dependencies": {
-    "express": "^5.1.0",
-    "typescript": "^5.9.3"
-  }
-}
packs/warbler-pack-hf-arxiv/package.json
CHANGED

@@ -2,14 +2,14 @@
   "name": "warbler-pack-hf-arxiv",
   "version": "1.0.0",
   "description": "Warbler pack generated from HuggingFace datasets (chunked)",
-  "created_at": "2025-
+  "created_at": "2025-12-02T10:48:41.412949",
   "document_count": 2549619,
   "source": "HuggingFace",
   "content_types": [
     "scholarly_discussion"
   ],
   "chunked": true,
-  "chunk_count":
-  "docs_per_chunk":
-  "chunk_pattern": "warbler-pack-hf-arxiv-chunk
+  "chunk_count": 51,
+  "docs_per_chunk": 50000,
+  "chunk_pattern": "warbler-pack-hf-arxiv-chunk-*.jsonl"
 }
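The manifest change above describes a chunked pack: 2,549,619 documents split across 51 JSONL chunk files, discoverable via the `chunk_pattern` glob. As a minimal sketch of how a loader might consume such a manifest, here is a hypothetical `iter_pack_documents` helper (not part of this repo); it assumes plain-text JSONL chunks, whereas the actual `*_compressed.jsonl` files may need a decompression step first.

```python
import glob
import json
import os


def iter_pack_documents(pack_dir):
    """Yield documents from a chunked Warbler-style pack.

    Reads the pack's package.json manifest, resolves the chunk_pattern
    glob (e.g. "warbler-pack-hf-arxiv-chunk-*.jsonl"), and streams the
    JSONL records chunk by chunk in sorted filename order.
    """
    with open(os.path.join(pack_dir, "package.json"), encoding="utf-8") as f:
        manifest = json.load(f)
    pattern = manifest.get("chunk_pattern", "*.jsonl")
    for chunk_path in sorted(glob.glob(os.path.join(pack_dir, pattern))):
        with open(chunk_path, encoding="utf-8") as chunk:
            for line in chunk:
                line = line.strip()
                if line:  # skip blank lines between records
                    yield json.loads(line)
```

Sorting the glob results keeps documents in chunk order (001, 002, ...), which matters if downstream indexing assumes a stable document ordering across runs.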
The following compressed chunk files were deleted (each diff is too large to render; see the raw diff):

packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-001_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-002_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-003_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-004_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-005_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-006_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-007_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-008_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-009_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-010_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-011_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-012_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-013_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-014_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-015_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-016_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-017_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-018_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-019_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-020_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-021_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-022_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-023_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-024_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-025_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-026_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-027_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-028_compressed.jsonl
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-029_compressed.jsonl