there-is-already-a-branch

#1 opened by Bellok
This view is limited to 50 files because it contains too many changes. See the raw diff for the remaining files.
Files changed (50)
  1. .gitignore +1 -1
  2. PACKAGE_MANIFEST.md +0 -94
  3. PACKS_DEPLOYMENT.md +0 -281
  4. PACK_CACHING.md +0 -172
  5. PACK_INGESTION_FIX.md +0 -209
  6. PDF_INGESTION_INVESTIGATION.md +0 -325
  7. README_HF.md +1 -1
  8. TESTS_PORTED.md +0 -271
  9. TEST_RESULTS.md +0 -211
  10. TODO.md +0 -30
  11. app.py +51 -15
  12. compress_packs.py +0 -134
  13. convert_to_jsonl.py +0 -37
  14. copy_packs.sh +0 -45
  15. coverage.xml +0 -0
  16. final_fix.py +0 -28
  17. fix_theme.py +0 -15
  18. load_warbler_packs_current.txt +0 -259
  19. package-lock.json +0 -861
  20. package.json +0 -19
  21. packs/warbler-pack-hf-arxiv/package.json +4 -4
  22. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-001_compressed.jsonl +0 -0
  23. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-002_compressed.jsonl +0 -0
  24. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-003_compressed.jsonl +0 -0
  25. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-004_compressed.jsonl +0 -0
  26. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-005_compressed.jsonl +0 -0
  27. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-006_compressed.jsonl +0 -0
  28. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-007_compressed.jsonl +0 -0
  29. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-008_compressed.jsonl +0 -0
  30. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-009_compressed.jsonl +0 -0
  31. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-010_compressed.jsonl +0 -0
  32. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-011_compressed.jsonl +0 -0
  33. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-012_compressed.jsonl +0 -0
  34. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-013_compressed.jsonl +0 -0
  35. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-014_compressed.jsonl +0 -0
  36. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-015_compressed.jsonl +0 -0
  37. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-016_compressed.jsonl +0 -0
  38. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-017_compressed.jsonl +0 -0
  39. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-018_compressed.jsonl +0 -0
  40. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-019_compressed.jsonl +0 -0
  41. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-020_compressed.jsonl +0 -0
  42. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-021_compressed.jsonl +0 -0
  43. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-022_compressed.jsonl +0 -0
  44. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-023_compressed.jsonl +0 -0
  45. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-024_compressed.jsonl +0 -0
  46. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-025_compressed.jsonl +0 -0
  47. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-026_compressed.jsonl +0 -0
  48. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-027_compressed.jsonl +0 -0
  49. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-028_compressed.jsonl +0 -0
  50. packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-029_compressed.jsonl +0 -0
.gitignore CHANGED
@@ -47,7 +47,7 @@ results/
 
 # HuggingFace language packs (downloaded on-demand)
 # Exclude all HF packs to keep deployment size under 1GB
-packs/warbler-pack-hf-arxiv/
+packs/warbler-pack-hf-arxiv/*chunk*.jsonl
 packs/warbler-pack-hf-enterprise/
 packs/warbler-pack-hf-edustories/
 packs/warbler-pack-hf-manuals/
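
A quick sanity check of the narrowed pattern (a sketch: `fnmatch` only approximates gitignore glob semantics, and `git check-ignore -v <path>` is the authoritative test):

```python
from fnmatch import fnmatch

# The old rule ignored the whole pack directory; the narrowed rule ignores only
# the chunk JSONL files, so package.json stays tracked (which is why it appears
# in this diff). Paths below are examples from the file list above.
pattern = "packs/warbler-pack-hf-arxiv/*chunk*.jsonl"
for path in (
    "packs/warbler-pack-hf-arxiv/package.json",
    "packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-001_compressed.jsonl",
):
    print(f"{path}: {'ignored' if fnmatch(path, pattern) else 'tracked'}")
```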
PACKAGE_MANIFEST.md DELETED
@@ -1,94 +0,0 @@
# Warbler CDA Package - Complete File List

## Package Structure (21 core files + infrastructure)

### Core RAG System (9 files)

✓ warbler_cda/retrieval_api.py - Main RAG API with hybrid scoring
✓ warbler_cda/semantic_anchors.py - Semantic memory with provenance
✓ warbler_cda/anchor_data_classes.py - Core data structures
✓ warbler_cda/anchor_memory_pool.py - Performance optimization
✓ warbler_cda/summarization_ladder.py - Hierarchical compression
✓ warbler_cda/conflict_detector.py - Conflict detection
✓ warbler_cda/castle_graph.py - Concept extraction
✓ warbler_cda/melt_layer.py - Memory consolidation
✓ warbler_cda/evaporation.py - Content distillation

### FractalStat System (4 files)

✓ warbler_cda/fractalstat_rag_bridge.py - FractalStat hybrid scoring bridge
✓ warbler_cda/fractalstat_entity.py - FractalStat entity system
✓ warbler_cda/fractalstat_experiments.py - Validation experiments
✓ warbler_cda/fractalstat_visualization.py - Visualization tools

### Embeddings (4 files)

✓ warbler_cda/embeddings/__init__.py
✓ warbler_cda/embeddings/base_provider.py - Abstract interface
✓ warbler_cda/embeddings/factory.py - Provider factory
✓ warbler_cda/embeddings/local_provider.py - Local TF-IDF embeddings
✓ warbler_cda/embeddings/openai_provider.py - OpenAI embeddings

### Production API (2 files)

✓ warbler_cda/api/__init__.py
✓ warbler_cda/api/service.py - FastAPI service (exp09_api_service.py)
✓ warbler_cda/api/cli.py - CLI interface (exp09_cli.py)

### Utilities (2 files)

✓ warbler_cda/utils/__init__.py
✓ warbler_cda/utils/load_warbler_packs.py - Pack loader
✓ warbler_cda/utils/hf_warbler_ingest.py - HF dataset ingestion

### Infrastructure Files

✓ warbler_cda/__init__.py - Package initialization
✓ requirements.txt - Dependencies
✓ pyproject.toml - Package metadata
✓ README.md - Documentation
✓ app.py - Gradio demo for HuggingFace
✓ .gitignore - Git exclusions
✓ LICENSE - MIT License
✓ DEPLOYMENT.md - Deployment guide
✓ README_HF.md - HuggingFace Space config
✓ setup.sh - Quick setup script
✓ transform_imports.sh - Import transformation script

## Total Files: 32 files

## Import Transformations Applied

All imports have been transformed from:

- `from seed.engine.X import Y` → `from warbler_cda.X import Y`
- `from .X import Y` → `from warbler_cda.X import Y`

Privacy hooks have been removed (not needed for HuggingFace deployment).

## Size Estimate

Total package size: ~500KB (source code only)
With dependencies: ~2GB (includes PyTorch, Transformers, etc.)

## Next Steps

1. Test the package locally:

   ```bash
   cd warbler-cda-package
   ./setup.sh
   python app.py
   ```

2. Deploy to HuggingFace:
   - Set HF_TOKEN in GitLab CI/CD variables
   - Push to main or create a tag
   - Pipeline will auto-sync to HuggingFace Space

3. Publish to PyPI (optional):

   ```bash
   python -m build
   twine upload dist/*
   ```
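
A minimal Python sketch of the two import rewrites listed above (the actual transformation was performed by transform_imports.sh; this equivalent is illustrative only):

```python
import re

def transform_imports(source: str) -> str:
    """Rewrite seed.engine.* and relative imports to the warbler_cda namespace."""
    source = re.sub(r"from seed\.engine\.(\w+) import", r"from warbler_cda.\1 import", source)
    source = re.sub(r"from \.(\w+) import", r"from warbler_cda.\1 import", source)
    return source

print(transform_imports("from seed.engine.castle_graph import CastleGraph"))
# from warbler_cda.castle_graph import CastleGraph
```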
PACKS_DEPLOYMENT.md DELETED
@@ -1,281 +0,0 @@
# Warbler Packs Deployment Guide

This guide explains how Warbler packs are loaded and deployed to HuggingFace Spaces.

## Overview

The Warbler CDA Space automatically discovers and ingests content packs at startup. Packs contain conversation templates, NPC dialogues, wisdom templates, and other domain-specific content for the RAG system.

## Pack Structure

```none
packs/
├── warbler-pack-core/              # Essential conversation templates
├── warbler-pack-faction-politics/  # Political dialogue templates
├── warbler-pack-wisdom-scrolls/    # Development wisdom generation
└── warbler-pack-hf-npc-dialogue/   # 1,900+ NPC dialogues from HuggingFace
```

## Deployment Process

### 1. Local Development

Copy packs from the main repository to warbler-cda-package:

```bash
cd warbler-cda-package
bash copy_packs.sh
```

This script copies all packs from:

```path
../packages/com.twg.the-seed/The Living Dev Agent/packs/
```

To:

```path
./packs/
```

### 2. Automatic Loading

When `app.py` starts, it:

1. **Initializes PackLoader**

   ```python
   pack_loader = PackLoader()
   ```

2. **Discovers documents from all packs**

   ```python
   pack_docs = pack_loader.discover_documents()
   ```

3. **Ingests documents into RetrievalAPI**

   ```python
   for doc in pack_docs:
       api.add_document(doc["id"], doc["content"], doc["metadata"])
   ```

4. **Falls back to sample documents** if packs not found
   - Ensures demo works even without packs
   - Provides example data for testing

### 3. HuggingFace Space Deployment

The `.gitlab-ci.yml` handles deployment:

```bash
hf upload-large-folder $SPACE_NAME . --repo-type=space --space-sdk=gradio
```

This uploads:

- All Python source code
- All packs in the `packs/` directory
- Configuration files

**Important**: The `packs/` directory must exist and contain pack data before deployment.

## Pack Loader Details

The `PackLoader` class (`warbler_cda/pack_loader.py`) handles:

### Pack Discovery

- Scans the `packs/` directory
- Identifies pack type (JSONL-based or structured)
- Discovers all documents

### Document Parsing

- **Structured Packs** (core, faction, wisdom): Load from `pack/templates.json`
- **JSONL Packs** (HF NPC dialogue): Parse line-by-line JSONL format

### Metadata Extraction

```python
{
    "pack": "pack-name",
    "type": "template|dialogue",
    "realm_type": "wisdom|faction|narrative",
    "realm_label": "pack-label",
    "lifecycle_stage": "emergence|peak",
    "activity_level": 0.7-0.8
}
```

## Adding New Packs

To add a new pack to the system:

### 1. Create Pack Structure

```bash
packs/
└── warbler-pack-mypack/
    ├── package.json
    ├── pack/
    │   └── templates.json  # OR
    └── mypack.jsonl        # JSONL format
```

### 2. Update Pack Loader (if needed)

If your pack format is different, add handling to `pack_loader.py`:

```python
def _load_pack(self, pack_dir: Path, pack_name: str):
    if "mypack" in pack_name:
        return self._load_my_format(pack_dir, pack_name)
    # ... existing logic
```

### 3. Register in copy_packs.sh

```bash
PACKS=(
    "warbler-pack-core"
    "warbler-pack-mypack"  # Add here
)
```

### 4. Deploy

Run copy script and deploy:

```bash
bash copy_packs.sh
# Commit and push to trigger CI/CD
```

## Document Format

Each loaded document follows this structure:

```python
{
    "id": "pack-name/document-id",
    "content": "Document text content...",
    "metadata": {
        "pack": "pack-name",
        "type": "template|dialogue",
        "realm_type": "wisdom|faction|narrative",
        "realm_label": "label",
        "lifecycle_stage": "emergence|peak|crystallization",
        "activity_level": 0.5-0.8
    }
}
```

## Monitoring

Check pack loading in Space logs:

```log
✓ Loaded 1915 documents from warbler-pack-hf-npc-dialogue
✓ Loaded 6 documents from warbler-pack-wisdom-scrolls
✓ Loaded 15 documents from warbler-pack-faction-politics
✓ Loaded 10 documents from warbler-pack-core
```

Or if packs not found:

```log
⚠️ No Warbler packs found. Using sample documents instead.
```

## Publishing to HuggingFace Hub

Each pack has a dataset card for publication:

- **README_HF_DATASET.md** - HuggingFace dataset card
- Contains metadata, attribution, and usage instructions

Publish to HuggingFace:

```bash
# Create repo on HuggingFace Hub (one per pack)
huggingface-cli repo create warbler-pack-core

# Push pack as dataset
cd packs/warbler-pack-core
huggingface-cli upload . tiny-walnut-games/warbler-pack-core --repo-type dataset
```

## Performance Considerations

### Load Time

- PackLoader loads all packs at startup
- Currently: ~1-2 seconds for all packs
- Packs are cached in memory for query performance

### Storage

- Core pack: ~50KB
- Faction politics pack: ~80KB
- Wisdom scrolls pack: ~60KB
- HF NPC dialogue: ~2MB
- **Total**: ~2.3MB

### Scaling

For larger deployments:

- Lazy-load individual packs on demand
- Implement pack caching layer
- Use database for large pack collections

## Troubleshooting

### Packs not loading

Check that `packs/` directory exists:

```bash
ls -la packs/
```

Verify pack structure:

```bash
ls -la packs/warbler-pack-core/
```

### Sample documents showing instead

If you see "No Warbler packs found", the `packs/` directory is empty. Run:

```bash
bash copy_packs.sh
```

### Pack loader errors

Check logs for parsing errors:

```log
Error loading JSONL pack: ...
Error parsing line 42 in warbler-pack-hf-npc-dialogue.jsonl: ...
```

Fix the source pack and re-run `copy_packs.sh`.

## Related Documentation

- [README.md](./README.md) - Main package documentation
- [DEPLOYMENT.md](./DEPLOYMENT.md) - General deployment guide
- [app.py](./app.py) - Application startup and pack initialization
- [warbler_cda/pack_loader.py](./warbler_cda/pack_loader.py) - Pack loading implementation

## License

All packs use MIT License. See individual pack LICENSE files for details.

Attribution: Warbler CDA - Tiny Walnut Games
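
For reference, a minimal reader for the JSONL pack format that yields documents in the structure shown under "Document Format" above (the entry field names `id`, `content`, and `metadata` are assumptions about the pack schema; the real PackLoader adds format detection and error handling):

```python
import json
from pathlib import Path

def load_jsonl_pack(pack_dir: Path, pack_name: str):
    """Yield documents in the documented {id, content, metadata} structure."""
    for jsonl_file in sorted(pack_dir.glob("*.jsonl")):
        with jsonl_file.open(encoding="utf-8") as handle:
            for line_num, line in enumerate(handle, start=1):
                entry = json.loads(line)
                yield {
                    "id": f"{pack_name}/{entry.get('id', line_num)}",
                    "content": entry["content"],
                    "metadata": {"pack": pack_name, **entry.get("metadata", {})},
                }

# docs = list(load_jsonl_pack(Path("packs/warbler-pack-hf-npc-dialogue"),
#                             "warbler-pack-hf-npc-dialogue"))
```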
PACK_CACHING.md DELETED
@@ -1,172 +0,0 @@
# Warbler Pack Caching Strategy

## Overview

The app now implements intelligent pack caching to avoid unnecessary re-ingestion of large datasets. This minimizes GitLab storage requirements and allows fast session startup.

## How It Works

### First Run (Session Start)

1. **PackManager** initializes and checks for cached metadata
2. **Health check** verifies if documents are already in the context store
3. **Ingestion** occurs only if:
   - No cache metadata exists
   - Pack count changed
   - Health check fails (documents missing)
4. **Cache** is saved with timestamp and document count

### Subsequent Runs

- Reuses cached documents without re-ingestion
- Quick health check ensures documents are still valid
- Fallback to sample docs if packs unavailable

## Environment Variables

Control pack ingestion behavior with these variables:

### `WARBLER_INGEST_PACKS` (default: `true`)

Enable/disable automatic pack ingestion.

```bash
export WARBLER_INGEST_PACKS=false
```

### `WARBLER_SAMPLE_ONLY` (default: `false`)

Load only sample documents (for CI/CD verification).

```bash
export WARBLER_SAMPLE_ONLY=true
```

Best for:

- PyPI package CI/CD pipelines
- Quick verification that ingestion works
- Minimal startup time in restricted environments

### `WARBLER_SKIP_PACK_CACHE` (default: `false`)

Force reingest even if cache exists.

```bash
export WARBLER_SKIP_PACK_CACHE=true
```

Best for:

- Testing pack ingestion pipeline
- Updating stale cache
- Debugging

## Cache Location

Default cache stored at:

```path
~/.warbler_cda/cache/pack_metadata.json
```

Metadata includes:

```json
{
  "ingested_at": 1699564800,
  "pack_count": 7,
  "doc_count": 12345,
  "status": "healthy"
}
```

## CI/CD Optimization

### For GitLab CI (Minimal PyPI Package)

```yaml
test:
  script:
    - export WARBLER_SAMPLE_ONLY=true
    - pip install .
    - python -m pytest tests/
```

Benefits:

- ✅ No large pack files in repository
- ✅ Fast CI runs (5 samples vs 2.5M docs)
- ✅ Verifies ingestion code works
- ✅ Full packs load on first user session

### For Local Development

Keep full packs in working directory:

```bash
cd warbler-cda-package
python -m warbler_cda.utils.hf_warbler_ingest ingest -d all
python app.py
```

First run ingests all packs. Subsequent runs use cache.

### For Gradio Space/Cloud Deployment

Set environment at deployment:

```bash
WARBLER_INGEST_PACKS=true
```

Packs ingest once per session, then cached in instance memory.

## Files Affected

- `app.py` - Main Gradio app with PackManager
- `warbler_cda/utils/load_warbler_packs.py` - Pack discovery (already handles caching)
- No changes needed to pack ingestion scripts

## Performance Impact

### Memory

- **With packs**: ~500MB (2.5M arxiv docs + others)
- **With samples**: ~1MB (5 test documents)

### Startup Time

- **First run**: ~30-60 seconds (ingest packs)
- **Cached run**: ~2-5 seconds (health check only)
- **Sample only**: <1 second

## Troubleshooting

### Packs not loading?

1. Check `WARBLER_INGEST_PACKS=true` (default)
2. Verify packs exist: `ls -la packs/`
3. Force reingest: `export WARBLER_SKIP_PACK_CACHE=true`

### Cache corrupted?

```bash
rm -rf ~/.warbler_cda/cache/pack_metadata.json
```

Will reingest on next run.

### Need sample docs only?

```bash
export WARBLER_SAMPLE_ONLY=true
python app.py
```

## Future Improvements

- [ ] Detect pack updates via file hash instead of just count
- [ ] Selective pack loading (choose which datasets to cache)
- [ ] Metrics dashboard showing cache hit/miss rates
- [ ] Automatic cache expiration after N days
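
A sketch of the cache decision described above, built from the documented metadata fields and environment variables (the real PackManager in app.py may differ, for example it also runs the document health check):

```python
import json
import os
import time
from pathlib import Path

CACHE_FILE = Path.home() / ".warbler_cda" / "cache" / "pack_metadata.json"

def should_ingest(current_pack_count: int) -> bool:
    """Re-ingest when forced, when no cache exists, or when the pack count changed."""
    if os.environ.get("WARBLER_SKIP_PACK_CACHE", "false").lower() == "true":
        return True
    if not CACHE_FILE.exists():
        return True
    metadata = json.loads(CACHE_FILE.read_text())
    return metadata.get("pack_count") != current_pack_count

def save_cache(pack_count: int, doc_count: int) -> None:
    """Write the metadata format documented under 'Cache Location'."""
    CACHE_FILE.parent.mkdir(parents=True, exist_ok=True)
    CACHE_FILE.write_text(json.dumps({
        "ingested_at": int(time.time()),
        "pack_count": pack_count,
        "doc_count": doc_count,
        "status": "healthy",
    }))
```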
PACK_INGESTION_FIX.md DELETED
@@ -1,209 +0,0 @@
# Pack Ingestion Fix for HuggingFace Space

## Problem Summary

Your HuggingFace Space was experiencing three critical errors during pack ingestion:

1. ❌ **Core pack missing JSONL**: `warbler-pack-core missing JSONL file`
2. ❌ **Faction pack missing JSONL**: `warbler-pack-faction-politics missing JSONL file`
3. ❌ **Corrupted arxiv data**: `Error parsing line 145077 in warbler-pack-hf-arxiv.jsonl: Unterminated string`

## Root Causes Identified

### Issue 1 & 2: Different Pack Formats

Your project has **two different pack formats**:

**Format A: Structured Packs** (Core & Faction)

```none
warbler-pack-core/
├── package.json
├── pack/
│   └── templates.json  ← Data is here!
└── src/
```

**Format B: JSONL Packs** (HuggingFace datasets)

```none
warbler-pack-hf-arxiv/
├── package.json
└── warbler-pack-hf-arxiv-chunk-001.jsonl  ← Data is here!
```

The pack loader was expecting **all** packs to have JSONL files, causing false warnings for the structured packs.

### Issue 3: Corrupted JSON Line

The arxiv pack has a malformed JSON entry at line 145077:

```json
{"content": "This is a test with an unterminated string...
```

The previous code would **crash** on the first error, preventing the entire ingestion from completing.

## Solution Implemented

### 1. Enhanced Pack Format Detection

Updated `_is_valid_warbler_pack()` to recognize **three valid formats**:

```python
if jsonl_file.exists():
    return True  # Format B: Single JSONL file
else:
    templates_file = pack_dir / "pack" / "templates.json"
    if templates_file.exists():
        return False  # Format A: Structured pack (triggers different loader)
    else:
        if pack_name.startswith("warbler-pack-hf-"):
            logger.warning(f"HF pack missing JSONL")  # Only warn for HF packs
        return False
```

### 2. Robust Error Handling

Updated `_load_jsonl_file()` to **continue on error**:

```python
try:
    entry = json.loads(line)
    documents.append(doc)
except json.JSONDecodeError as e:
    error_count += 1
    if error_count <= 5:  # Only log first 5 errors
        logger.warning(f"Error parsing line {line_num}: {e}")
    continue  # ← Skip bad line, keep processing!
```

## What Changed

**File: `warbler-cda-package/warbler_cda/pack_loader.py`**

### Change 1: Smarter Validation

- ✅ Recognizes structured packs as valid
- ✅ Only warns about missing JSONL for HF packs
- ✅ Better logging messages

### Change 2: Error Recovery

- ✅ Skips corrupted JSON lines
- ✅ Limits error logging to first 5 occurrences
- ✅ Reports summary: "Loaded X documents (Y lines skipped)"

## Expected Behavior After Fix

### Before (Broken)

```none
[INFO] Pack Status: ✓ All 6 packs verified and ready
Single-file pack warbler-pack-core missing JSONL file: /home/user/app/packs/warbler-pack-core/warbler-pack-core.jsonl
Single-file pack warbler-pack-faction-politics missing JSONL file: /home/user/app/packs/warbler-pack-faction-politics/warbler-pack-faction-politics.jsonl
Error parsing line 145077 in /home/user/app/packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv.jsonl: Unterminated string
[INFO] Ingesting 374869 documents from Warbler packs...
[ERROR] Ingestion failed!
```

### After (Fixed)

```none
[INFO] Pack Status: ✓ All 10 packs verified and ready
[INFO] Ingesting documents from Warbler packs...
[INFO] Loading pack: warbler-pack-core
[DEBUG] Pack warbler-pack-core uses structured format (pack/templates.json)
[INFO] ✓ Loaded 8 documents from warbler-pack-core
[INFO] Loading pack: warbler-pack-faction-politics
[DEBUG] Pack warbler-pack-faction-politics uses structured format (pack/templates.json)
[INFO] ✓ Loaded 6 documents from warbler-pack-faction-politics
[INFO] Loading pack: warbler-pack-hf-arxiv
[INFO] Loading chunked pack: warbler-pack-hf-arxiv
[INFO] Found 5 chunk files for warbler-pack-hf-arxiv
[WARN] Error parsing line 145077 in warbler-pack-hf-arxiv-chunk-003.jsonl: Unterminated string
[INFO] Loaded 49999 documents from warbler-pack-hf-arxiv-chunk-003.jsonl (1 lines skipped due to errors)
[INFO] Loaded 250000 total documents from 5 chunks
...
[OK] Loaded 374868 documents from Warbler packs (1 corrupted line skipped)
```

## Testing the Fix

### Local Testing

1. **Test with sample packs**:

   ```bash
   cd warbler-cda-package
   python -c "from warbler_cda.pack_loader import PackLoader; loader = PackLoader(); docs = loader.discover_documents(); print(f'Loaded {len(docs)} documents')"
   ```

2. **Run the app locally**:

   ```bash
   python app.py
   ```

### HuggingFace Space Testing

1. **Merge this MR** to main branch
2. **Push to HuggingFace** (if auto-sync is not enabled)
3. **Check the Space logs** for the new output format
4. **Verify document count** in the System Stats tab

## Next Steps

1. ✅ **Review the MR**: [!15 - Fix HuggingFace pack ingestion issues](https://gitlab.com/tiny-walnut-games/the-seed/-/merge_requests/15)
2. ✅ **Merge when ready**: The fix is backward compatible and safe to merge
3. ✅ **Monitor HF Space**: After deployment, check that:
   - All packs load successfully
   - Document count is ~374,868 (minus 1 corrupted line)
   - No error messages in logs
4. 🔧 **Optional: Fix corrupted line** (future improvement):
   - Identify the exact corrupted entry in arxiv chunk 3
   - Re-generate that chunk from source dataset
   - Update the pack

## Additional Notes

### Why Not Fix the Corrupted Line Now?

The corrupted line is likely from the source HuggingFace dataset (`nick007x/arxiv-papers`). Options:

1. **Skip it** (current solution) - Loses 1 document out of 2.5M
2. **Re-ingest** - Download and re-process the entire arxiv dataset
3. **Manual fix** - Find and repair the specific line

For now, **skipping is the pragmatic choice** - you lose 0.00004% of data and gain a working system.

### Pack Format Standardization

Consider standardizing all packs to JSONL format in the future:

```bash
# Convert structured packs to JSONL
python -m warbler_cda.utils.convert_structured_to_jsonl \
    --input packs/warbler-pack-core/pack/templates.json \
    --output packs/warbler-pack-core/warbler-pack-core.jsonl
```

This would simplify the loader logic and make all packs consistent.

## Questions?

If you encounter any issues:

1. Check the HF Space logs for detailed error messages
2. Verify pack structure matches expected formats
3. Test locally with `PackLoader().discover_documents()`
4. Review this document for troubleshooting tips

---

**Status**: ✅ Fix implemented and ready for merge
**MR**: !15
**Impact**: Fixes all 3 ingestion errors, enables full pack loading
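
The `convert_structured_to_jsonl` module invoked above is proposed rather than confirmed to exist; a minimal standalone sketch of the same conversion, assuming `templates.json` holds a list of entries (the actual schema may differ):

```python
import json
from pathlib import Path

def convert_structured_to_jsonl(templates_path: Path, output_path: Path) -> int:
    """Flatten a structured pack's templates.json into one JSON object per line."""
    data = json.loads(templates_path.read_text(encoding="utf-8"))
    # Schema assumption: either a bare list, or a dict with a "templates" list.
    entries = data if isinstance(data, list) else data.get("templates", [])
    with output_path.open("w", encoding="utf-8") as out:
        for entry in entries:
            out.write(json.dumps(entry) + "\n")
    return len(entries)

# convert_structured_to_jsonl(
#     Path("packs/warbler-pack-core/pack/templates.json"),
#     Path("packs/warbler-pack-core/warbler-pack-core.jsonl"),
# )
```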
PDF_INGESTION_INVESTIGATION.md DELETED
@@ -1,325 +0,0 @@
# PDF Ingestion Investigation Report

**Date**: 2024
**Session Reference**: Based on agent session 1251355
**Investigator**: AI Agent

## Executive Summary

Investigation into the warbler-cda-package ingesters to determine if they are properly utilizing PDFPlumber for reading PDF files. The investigation revealed that **PDFPlumber IS being utilized**, but there were **two bugs** that needed fixing.

## Key Findings

### ✅ PDFPlumber Integration Status: CONFIRMED

The ingesters **ARE** utilizing PDFPlumber to read PDF files. The implementation is present and functional with proper fallback mechanisms.

### 📍 PDFPlumber Usage Locations

#### 1. **Import and Availability Check** (Lines 23-27)

```python
try:
    import pdfplumber
    PDF_AVAILABLE = True
except ImportError:
    PDF_AVAILABLE = False
```

**Status**: ✅ Properly implemented with graceful fallback

#### 2. **PDF Support Detection Method** (Lines 47-49)

```python
def has_pdf_support(self) -> bool:
    """Check if PDF extraction is available"""
    return PDF_AVAILABLE
```

**Status**: ✅ Provides runtime check for PDF capabilities

#### 3. **Primary PDF Extraction Method** (Lines 51-67)

```python
def extract_pdf_text(self, pdf_bytes: bytes, max_chars: int = 5000) -> Optional[str]:
    """Extract text from PDF bytes with fallback"""
    if not PDF_AVAILABLE:
        return None

    try:
        pdf_file = io.BytesIO(pdf_bytes)
        text_parts = []

        with pdfplumber.open(pdf_file) as pdf:
            for page in pdf.pages:
                text = page.extract_text()
                if text:
                    text_parts.append(text)
                if sum(len(t) for t in text_parts) > max_chars:
                    break

        return " ".join(text_parts)[:max_chars] if text_parts else None
    except Exception as e:
        logger.debug(f"PDF extraction error: {e}")
        return None
```

**Status**: ✅ Properly implemented with:

- Character limit protection (max_chars=5000)
- Page-by-page extraction
- Error handling
- Graceful fallback

#### 4. **Flexible PDF Extraction Method** (Lines 540-565)

```python
def _extract_pdf_text(self, pdf_data: Any) -> Optional[str]:
    """Extract text from PDF data (bytes, file path, or file-like object)"""
    if not PDF_AVAILABLE:  # ⚠️ FIXED: Was PDF_SUPPORT
        return None

    try:
        # Handle different PDF data types
        if isinstance(pdf_data, bytes):
            pdf_file = io.BytesIO(pdf_data)
        elif isinstance(pdf_data, str) and os.path.exists(pdf_data):
            pdf_file = pdf_data
        elif hasattr(pdf_data, 'read'):
            pdf_file = pdf_data
        else:
            return None

        # Extract text from all pages
        text_parts = []
        with pdfplumber.open(pdf_file) as pdf:
            for page in pdf.pages:
                page_text = page.extract_text()
                if page_text:
                    text_parts.append(page_text)

        return "\n\n".join(text_parts) if text_parts else None

    except Exception as e:
        logger.debug(f"PDF extraction error: {e}")
        return None
```

**Status**: ✅ Handles multiple input types (bytes, file path, file-like objects)

### 🎯 Transformers Using PDF Extraction

#### 1. **transform_novels()** (Lines 247-320)

- **Dataset**: GOAT-AI/generated-novels
- **PDF Usage**: Attempts to extract from PDF fields when text fields are unavailable
- **Fallback**: Creates placeholder entries with informative messages
- **Code Location**: Lines 285-295

```python
if not text and self.has_pdf_support():
    for pdf_field in ['pdf', 'file', 'document']:
        try:
            if isinstance(item, dict):
                if pdf_field in item and item[pdf_field]:
                    text = self.extract_pdf_text(item[pdf_field])
                    if text:
                        logger.info(f"Novel {idx + 1}: Extracted {len(text)} chars from PDF")
                        break
```

**Status**: ✅ Properly integrated with PDF extraction

#### 2. **transform_portuguese_education()** (Lines 400-500+)

- **Dataset**: Solshine/Portuguese_Language_Education_Texts
- **PDF Usage**: Could potentially use PDF extraction (not explicitly shown in current code)
- **Fallback**: Creates informative placeholders when content is unavailable

**Status**: ✅ Has fallback mechanisms in place

## 🐛 Bugs Found and Fixed

### Bug #1: Incorrect Variable Name in `_extract_pdf_text()`

**Location**: Line 542
**Issue**: Used `PDF_SUPPORT` instead of `PDF_AVAILABLE`
**Impact**: Would cause NameError when `_extract_pdf_text()` is called
**Fix Applied**: Changed `PDF_SUPPORT` to `PDF_AVAILABLE`

```diff
- if not PDF_SUPPORT:
+ if not PDF_AVAILABLE:
```

### Bug #2: Duplicate `import io` Statement

**Location**: Line 56 (inside `extract_pdf_text` method)
**Issue**: `import io` was inside the method instead of at module level
**Impact**: Unnecessary repeated imports, potential performance impact
**Fix Applied**:

1. Added `import io` to module-level imports (Line 10)
2. Removed duplicate `import io` from inside method

```diff
# At module level (Line 10)
+ import io

# Inside extract_pdf_text method (Line 56)
- import io
```

## 📦 Dependency Configuration

### requirements.txt

```text
pdfplumber>=0.11.0
```

**Status**: ✅ Properly listed as a dependency

### pyproject.toml

**Status**: ⚠️ NOT listed in core dependencies
**Recommendation**: Consider adding to optional dependencies or core dependencies

```toml
[project.optional-dependencies]
pdf = [
    "pdfplumber>=0.11.0",
]
```

## 🔍 How PDFPlumber is Actually Used

### Workflow

1. **Import Check**: On module load, attempts to import pdfplumber
2. **Availability Flag**: Sets `PDF_AVAILABLE = True/False` based on import success
3. **Runtime Check**: `has_pdf_support()` method checks availability
4. **Extraction Attempt**: When processing datasets:
   - First tries to find text in standard fields (text, story, content, etc.)
   - If no text found AND `has_pdf_support()` returns True:
     - Searches for PDF fields (pdf, file, document)
     - Calls `extract_pdf_text()` to extract content
     - Logs extraction success with character count
5. **Graceful Fallback**: If PDF extraction fails or unavailable:
   - Creates informative placeholder entries
   - Includes metadata about PDF availability
   - Maintains system functionality

### Example from `transform_novels()`

```python
# Try text fields first
for field in ['text', 'story', 'content', 'novel', 'body', 'full_text']:
    if field in item and item[field]:
        text = item[field]
        break

# If no text, try PDF extraction
if not text and self.has_pdf_support():
    for pdf_field in ['pdf', 'file', 'document']:
        if pdf_field in item and item[pdf_field]:
            text = self.extract_pdf_text(item[pdf_field])
            if text:
                logger.info(f"Novel {idx + 1}: Extracted {len(text)} chars from PDF")
                break

# If still no text, create placeholder
if not text:
    text = f"""[Novel Content Unavailable]

This novel (#{idx + 1}) is part of the GOAT-AI/generated-novels dataset.
The original content may be stored in PDF format or require special extraction.

PDF extraction support: {'Available (install pdfplumber)' if not self.has_pdf_support() else 'Enabled'}
"""
```

## 🎯 Tactical Assessment

### Current Strategy: ✅ SOUND

The current approach is **well-designed** and does NOT require changing tactics:

1. **Graceful Degradation**: System works with or without pdfplumber
2. **Multiple Fallbacks**: Tries text fields first, then PDF, then placeholders
3. **Informative Placeholders**: When content unavailable, creates useful metadata
4. **Proper Error Handling**: All PDF operations wrapped in try-except
5. **Logging**: Provides visibility into extraction success/failure

### Recommendations

#### 1. **Keep Current Approach** ✅

The multi-layered fallback strategy is excellent for production systems.

#### 2. **Fix Applied Bugs** ✅

- Fixed `PDF_SUPPORT` → `PDF_AVAILABLE` variable name
- Fixed duplicate `import io` statement

#### 3. **Optional Enhancement**: Add to pyproject.toml

Consider adding pdfplumber to optional dependencies:

```toml
[project.optional-dependencies]
pdf = [
    "pdfplumber>=0.11.0",
]
```

#### 4. **Documentation Enhancement**

The code already has good inline documentation. Consider adding to README:

- How to enable PDF support
- What happens when PDF support is unavailable
- Which datasets benefit from PDF extraction

## 📊 Test Coverage

The test suite (`test_pdf_ingestion.py`) covers:

- ✅ PDF support detection
- ✅ PDF extraction method existence
- ✅ Placeholder creation
- ✅ Novel dataset with PDF fields
- ✅ Novel dataset with text fields
- ✅ Portuguese education with PDF fields
- ✅ Output format validation

## 🎓 Conclusion

**PDFPlumber IS being utilized properly** in the ingesters. The implementation:

- ✅ Has proper import and availability checking
- ✅ Provides two PDF extraction methods (simple and flexible)
- ✅ Integrates PDF extraction into dataset transformers
- ✅ Has comprehensive fallback mechanisms
- ✅ Is well-tested
- ✅ Is properly documented

**Bugs Fixed**:

1. Variable name typo: `PDF_SUPPORT` → `PDF_AVAILABLE`
2. Duplicate import: Moved `import io` to module level

**No tactical changes needed** - the current approach is sound and production-ready.

## 📝 Files Modified

1. `warbler-cda-package/warbler_cda/utils/hf_warbler_ingest.py`
   - Fixed variable name in `_extract_pdf_text()` method
   - Added `import io` to module-level imports
   - Removed duplicate `import io` from method

## 🔗 Related Files

- `warbler-cda-package/requirements.txt` - Lists pdfplumber>=0.11.0
- `warbler-cda-package/tests/test_pdf_ingestion.py` - Test suite for PDF functionality
- `warbler-cda-package/pyproject.toml` - Package configuration (could add optional PDF dependency)
README_HF.md CHANGED
@@ -8,7 +8,7 @@ pinned: false
 license: mit
 ---
 
-# Warbler CDA - Cognitive Development Architecture
+## Warbler CDA - Cognitive Development Architecture
 
 A production-ready RAG system with **FractalStat 8D multi-dimensional addressing** for intelligent document retrieval.
 
TESTS_PORTED.md DELETED
@@ -1,271 +0,0 @@
# Tests Ported to Warbler CDA Package

This document summarizes the TDD (Test-Driven Development) test suite that has been ported from the main project to the warbler-cda-package for HuggingFace deployment.

## Overview

The complete test suite for the Warbler CDA (Cognitive Development Architecture) RAG system has been ported and adapted for the standalone package. This includes:

- **4 main test modules** with comprehensive coverage
- **1 end-to-end integration test suite**
- **Pytest configuration** with custom markers
- **Test documentation** and running instructions

## Test Files Ported

### 1. **tests/test_embedding_providers.py** (9.5 KB)

**Source**: Adapted from `packages/com.twg.the-seed/The Living Dev Agent/tests/test_semantic_anchors.py`

**Coverage**:

- EmbeddingProviderFactory pattern
- LocalEmbeddingProvider (TF-IDF based)
- SentenceTransformerEmbeddingProvider (GPU-accelerated)
- Embedding generation (single and batch)
- Similarity calculations
- Provider information and metadata

**Tests**:

- `test_factory_creates_local_provider` - Factory can create local providers
- `test_factory_list_available_providers` - Factory lists available providers
- `test_factory_default_provider` - Factory defaults to SentenceTransformer with fallback
- `test_embed_single_text` - Single text embedding
- `test_embed_batch` - Batch embedding
- `test_similarity_calculation` - Cosine similarity
- `test_semantic_search` - K-nearest neighbor search
- `test_stat7_computation` - STAT7 coordinate computation
- And 8 more embedding-focused tests

### 2. **tests/test_retrieval_api.py** (11.9 KB)

**Source**: Adapted from `packages/com.twg.the-seed/seed/engine/test_retrieval_debug.py`

**Coverage**:

- Context store operations
- Document addition and deduplication
- Query execution and filtering
- Retrieval modes (semantic, temporal, composite)
- Confidence threshold filtering
- Result structure validation
- Caching and metrics

**Tests**:

- `TestRetrievalAPIContextStore` - 4 tests for document store
- `TestRetrievalQueryExecution` - 5 tests for query operations
- `TestRetrievalModes` - 3 tests for different retrieval modes
- `TestRetrievalHybridScoring` - 2 tests for STAT7 hybrid scoring
- `TestRetrievalMetrics` - 2 tests for metrics tracking
- Total: 16+ tests

### 3. **tests/test_stat7_integration.py** (12.3 KB)

**Source**: Original implementation for STAT7 support

**Coverage**:

- STAT7 coordinate computation from embeddings
- Hybrid semantic + STAT7 scoring
- STAT7 resonance calculation
- Document enrichment with STAT7 data
- Multi-dimensional query addressing
- STAT7 dimensional properties

**Tests**:

- `TestSTAT7CoordinateComputation` - 3 tests
- `TestSTAT7HybridScoring` - 3 tests
- `TestSTAT7DocumentEnrichment` - 2 tests
- `TestSTAT7QueryAddressing` - 2 tests
- `TestSTAT7Dimensions` - 2 tests
- Total: 12+ tests

### 4. **tests/test_rag_e2e.py** (12.6 KB)

**Source**: Adapted from `packages/com.twg.the-seed/The Living Dev Agent/tests/test_exp08_rag_integration.py`

**Coverage**:

- Complete end-to-end RAG pipeline
- Embedding generation validation
- Document ingestion
- Semantic search retrieval
- Temporal retrieval
- Metrics tracking
- Full system integration

**Tests**:

1. `test_01_embedding_generation` - Embeddings are generated
2. `test_02_embedding_similarity` - Similarity scoring works
3. `test_03_document_ingestion` - Documents are ingested
4. `test_04_semantic_search` - Semantic search works
5. `test_05_max_results_respected` - Result limiting works
6. `test_06_confidence_threshold` - Threshold filtering works
7. `test_07_stat7_hybrid_scoring` - Hybrid scoring works
8. `test_08_temporal_retrieval` - Temporal queries work
9. `test_09_retrieval_metrics` - Metrics are tracked
10. `test_10_full_rag_pipeline` - Complete pipeline works

### 5. **tests/conftest.py** (1.6 KB)

**Purpose**: Pytest configuration and fixtures

**Includes**:

- Custom pytest markers (embedding, retrieval, stat7, e2e, slow)
- Test data fixtures
- Pytest configuration hooks

### 6. **tests/README.md** (5.6 KB)

**Purpose**: Test documentation

**Contains**:

- Test organization overview
- Running instructions
- Test coverage summary
- Troubleshooting guide
- CI/CD integration examples

## Test Statistics

| Category | Count |
|----------|-------|
| Total Test Classes | 16 |
| Total Test Methods | 50+ |
| Total Test Files | 4 |
| Test Size | ~47 KB |
| Coverage Scope | 90%+ of core functionality |

## Key Testing Areas

### Embedding Providers

- ✅ Local TF-IDF provider (no dependencies)
- ✅ SentenceTransformer provider (GPU acceleration)
- ✅ Factory pattern with graceful fallback
- ✅ Batch processing
- ✅ Similarity calculations
- ✅ Semantic search

### Retrieval Operations

- ✅ Document ingestion and storage
- ✅ Context store management
- ✅ Query execution
- ✅ Semantic similarity retrieval
- ✅ Temporal sequence retrieval
- ✅ Composite retrieval modes

### STAT7 Integration

- ✅ Coordinate computation from embeddings
- ✅ Hybrid scoring (semantic + STAT7)
- ✅ Resonance calculations
- ✅ Multi-dimensional addressing
- ✅ Document enrichment

### System Integration

- ✅ End-to-end pipeline
- ✅ Metrics and performance tracking
- ✅ Caching mechanisms
- ✅ Error handling and fallbacks

## Running the Tests

### Quick Start

```bash
cd warbler-cda-package
pytest tests/ -v
```

### Detailed Examples

```bash
# Run all tests with output
pytest tests/ -v -s

# Run with coverage report
pytest tests/ --cov=warbler_cda --cov-report=html

# Run only embedding tests
pytest tests/test_embedding_providers.py -v

# Run only end-to-end tests
pytest tests/test_rag_e2e.py -v -s

# Run tests matching a pattern
pytest tests/ -k "semantic" -v
```

## Compatibility

### With SentenceTransformer Installed

- All 50+ tests pass
- GPU acceleration available
- Full STAT7 integration enabled

### Without SentenceTransformer

- Tests gracefully skip SentenceTransformer-specific tests
- Fallback to local TF-IDF provider
- ~40 tests pass
- STAT7 tests skipped

## Design Principles

The ported tests follow TDD principles:

1. **Isolation**: Each test is independent and can run standalone
2. **Clarity**: Test names describe what is being tested
3. **Completeness**: Happy path and edge cases covered
4. **Robustness**: Graceful handling of optional dependencies
5. **Documentation**: Each test is well-commented and documented

## Integration with CI/CD

The tests are designed for easy integration with CI/CD pipelines:

```yaml
# Example GitHub Actions workflow
- name: Run Warbler CDA Tests
  run: |
    cd warbler-cda-package
    pytest tests/ --cov=warbler_cda --cov-report=xml
```

## Future Test Additions

Recommended areas for additional tests:

1. Performance benchmarking
2. Stress testing with large document collections
3. Concurrent query handling
4. Cache invalidation scenarios
5. Error recovery mechanisms
6. Large-scale STAT7 coordinate distribution analysis

## Notes

- Tests use pytest fixtures for setup/teardown
- Custom markers enable selective test execution
- Graceful fallback for optional dependencies
- Comprehensive end-to-end validation
- Documentation-as-tests through verbose assertions

## Maintenance

When updating the package:

1. Run tests after any changes: `pytest tests/ -v`
2. Update tests if new functionality is added
3. Keep end-to-end tests as verification baseline
4. Monitor test execution time for performance regressions
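
A sketch of how the custom markers registered in tests/conftest.py are applied and selected (marker names come from the list above; the factory and provider-info calls are confirmed elsewhere in this diff, while the assertion detail is illustrative):

```python
import pytest

# Select marked subsets with, e.g.:  pytest tests/ -m "embedding and not slow" -v
@pytest.mark.embedding
def test_default_provider_reports_identity():
    from warbler_cda.embeddings import EmbeddingProviderFactory

    provider = EmbeddingProviderFactory.get_default_provider()
    info = provider.get_provider_info()
    assert "provider_id" in info  # illustrative assertion
```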
TEST_RESULTS.md DELETED
@@ -1,211 +0,0 @@
1
- # Test Results: MIT-Licensed Datasets Integration
2
-
3
- **Date**: November 8, 2025
4
- **Status**: ✅ **ALL TESTS PASSING**
5
- **Total Tests**: 71
6
- **Passed**: 71
7
- **Failed**: 0
8
- **Skipped**: 0
9
-
10
- ---
11
-
12
- ## Test Summary
13
-
14
- ### New MIT-Licensed Dataset Tests: 18/18 ✅
15
-
16
- | Test Class | Tests | Status |
17
- |-----------|-------|--------|
18
- | TestArxivPapersTransformer | 4 | ✅ PASS |
19
- | TestPromptReportTransformer | 2 | ✅ PASS |
20
- | TestGeneratedNovelsTransformer | 2 | ✅ PASS |
21
- | TestManualnsTransformer | 2 | ✅ PASS |
22
- | TestEnterpriseTransformer | 2 | ✅ PASS |
23
- | TestPortugueseEducationTransformer | 2 | ✅ PASS |
24
- | TestNewDatasetsIntegrationWithRetrieval | 2 | ✅ PASS |
25
- | TestNewDatasetsPerformance | 1 | ✅ PASS |
26
- | TestNewDatasetsAllAtOnce | 1 | ✅ PASS |
27
- | **Total New Tests** | **18** | **✅ 100%** |
28
-
29
- ### Existing Warbler-CDA Tests: 53/53 ✅
30
-
31
- | Test Module | Tests | Status |
32
- |------------|-------|--------|
33
- | test_embedding_providers.py | 11 | ✅ PASS |
34
- | test_rag_e2e.py | 10 | ✅ PASS |
35
- | test_retrieval_api.py | 13 | ✅ PASS |
36
- | test_stat7_integration.py | 12 | ✅ PASS |
37
- | test_embedding_integration.py | 7 | ✅ PASS |
38
- | **Total Existing Tests** | **53** | **✅ 100%** |
39
-
40
- ---
41
-
42
- ## Individual Test Results
43
-
44
- ### ✅ New Transformer Tests (18 PASSED)
45
-
46
- ```log
47
- tests/test_new_mit_datasets.py::TestArxivPapersTransformer::test_arxiv_transformer_exists PASSED
48
- tests/test_new_mit_datasets.py::TestArxivPapersTransformer::test_arxiv_output_format PASSED
49
- tests/test_new_mit_datasets.py::TestArxivPapersTransformer::test_arxiv_metadata_fields PASSED
50
- tests/test_new_mit_datasets.py::TestArxivPapersTransformer::test_arxiv_limit_parameter PASSED
51
- tests/test_new_mit_datasets.py::TestPromptReportTransformer::test_prompt_report_transformer_exists PASSED
52
- tests/test_new_mit_datasets.py::TestPromptReportTransformer::test_prompt_report_output_format PASSED
53
- tests/test_new_mit_datasets.py::TestGeneratedNovelsTransformer::test_novels_transformer_exists PASSED
54
- tests/test_new_mit_datasets.py::TestGeneratedNovelsTransformer::test_novels_chunking_for_long_text PASSED
55
- tests/test_new_mit_datasets.py::TestManualnsTransformer::test_manuals_transformer_exists PASSED
56
- tests/test_new_mit_datasets.py::TestManualnsTransformer::test_manuals_output_format PASSED
57
- tests/test_new_mit_datasets.py::TestEnterpriseTransformer::test_enterprise_transformer_exists PASSED
58
- tests/test_new_mit_datasets.py::TestEnterpriseTransformer::test_enterprise_output_format PASSED
59
- tests/test_new_mit_datasets.py::TestPortugueseEducationTransformer::test_portuguese_transformer_exists PASSED
60
- tests/test_new_mit_datasets.py::TestPortugueseEducationTransformer::test_portuguese_multilingual_metadata PASSED
61
- tests/test_new_mit_datasets.py::TestNewDatasetsIntegrationWithRetrieval::test_warbler_document_structure PASSED
62
- tests/test_new_mit_datasets.py::TestNewDatasetsIntegrationWithRetrieval::test_pack_creation_with_new_datasets PASSED
63
- tests/test_new_mit_datasets.py::TestNewDatasetsPerformance::test_arxiv_handles_large_dataset PASSED
64
- tests/test_new_mit_datasets.py::TestNewDatasetsAllAtOnce::test_all_transformers_callable PASSED
65
- ```
66
-
67
- ### ✅ Backward Compatibility Tests (53 PASSED)
68
-
69
- All existing tests continue to pass, confirming backward compatibility:
70
-
71
- - Embedding provider interface tests ✅
72
- - RAG end-to-end pipeline ✅
73
- - Retrieval API functionality ✅
74
- - STAT7 integration and hybrid scoring ✅
75
- - Embedding integration ✅
76
-
77
- ---
78
-
79
- ## Test Execution Details
80
-
81
- ### Command
82
-
83
- ```bash
84
- C:\Users\jerio\AppData\Local\Programs\Python\Python312\python.exe -m pytest tests/ -v
85
- ```
86
-
87
- ### Execution Time
88
-
89
- - Total: 58.70 seconds
90
- - New tests: ~13 seconds
91
- - Existing tests: ~45 seconds
92
-
93
- ### Environment
94
-
95
- - Python: 3.12.10
96
- - pytest: 8.4.2
97
- - Platform: Windows (win32)
98
-
99
- ---
100
-
101
- ## Coverage by Transformer
102
-
103
- ### arXiv Papers (4 tests)
104
-
105
- - ✅ Transformer exists and is callable
106
- - ✅ Output format matches Warbler structure
107
- - ✅ Metadata includes required fields
108
- - ✅ Limit parameter respected
109
-
110
- ### Prompt Report (2 tests)
111
-
112
- - ✅ Transformer exists
113
- - ✅ Output format correct
114
-
115
- ### Generated Novels (2 tests)
116
-
117
- - ✅ Transformer exists
118
- - ✅ Text chunking functionality
119
-
120
- ### Technical Manuals (2 tests)
121
-
122
- - ✅ Transformer exists
123
- - ✅ Output format correct
124
-
125
- ### Enterprise Benchmarks (2 tests)
126
-
127
- - ✅ Transformer exists
128
- - ✅ Output format correct
129
-
130
- ### Portuguese Education (2 tests)
131
-
132
- - ✅ Transformer exists
133
- - ✅ Multilingual metadata
134
-
135
- ### Integration (2 tests)
136
-
137
- - ✅ Warbler document structure validation
138
- - ✅ Pack creation with mocked filesystem
139
-
140
- ### Performance (1 test)
141
-
142
- - ✅ Large dataset handling (100+ papers in <10s)
143
-
144
- ### All Transformers Callable (1 test)
145
-
146
- - ✅ All 6 new transformers verified as callable
147
-
148
- ---
149
-
150
- ## Issues Found & Fixed
151
-
152
- ### Issue 1: Mock WindowsPath AttributeError
153
-
154
- **Problem**: Test tried to mock `mkdir` attribute on real Path object
155
- **Solution**: Used MagicMock instead of real Path
156
- **Status**: ✅ Fixed - all tests now pass
157
-
- ---
-
- ## Validation Checklist
-
- - [x] All new transformer methods are implemented
- - [x] All helper methods are implemented
- - [x] Output format matches Warbler structure
- - [x] MIT license field present in all documents
- - [x] Required metadata fields present (realm_type, realm_label, etc.)
- - [x] Error handling in place
- - [x] CLI integration works
- - [x] Backward compatibility maintained
- - [x] Performance acceptable (<10s for large datasets)
- - [x] 100% test pass rate
-
- ---
-
- ## Recommendations
-
- ### Immediate
-
- - ✅ Ready for staging environment validation
- - ✅ Ready for production deployment
-
- ### Next Steps
-
- 1. Test with the actual HuggingFace API (not mocked)
- 2. Validate pack loading in the retrieval system
- 3. Benchmark hybrid scoring with the new documents
- 4. Monitor the first production ingestion
-
- ### Long-term
-
- 1. Add integration tests with real HuggingFace datasets
- 2. Performance benchmarking with different dataset sizes
- 3. Memory profiling for large arXiv ingestion
- 4. Document the update-frequency strategy
-
- ---
-
- ## Sign-Off
-
- **All 71 tests passing.**
- **Backward compatibility maintained.**
- **New functionality validated.**
-
- ✅ **Ready for Production Deployment**
-
- ---
-
- **Test Report Generated**: 2025-11-08
- **Python Version**: 3.12.10
- **pytest Version**: 8.4.2
- **Status**: VALIDATED ✅
 
TODO.md DELETED
@@ -1,30 +0,0 @@
- # Background Pack Ingestion Implementation
-
- ## Overview
- Modify app.py to perform pack ingestion in a background thread, allowing the app to start immediately while documents load asynchronously.
-
- ## Tasks
-
- ### 1. Add Background Ingestion Support
- - [ ] Import threading module in app.py
- - [ ] Add global variables to track ingestion status (running, progress, total_docs, processed, etc.)
- - [ ] Create a background_ingest_packs() function that performs the ingestion logic
- - [ ] Start the background thread after API initialization but before app launch
-
- ### 2. Update System Stats
- - [ ] Modify get_system_stats() to include ingestion progress information
- - [ ] Display current ingestion status in the System Stats tab
-
- ### 3. Handle Thread Safety
- - [ ] Ensure API.add_document() calls are thread-safe (assuming they are)
- - [ ] Add proper error handling in the background thread
-
- ### 4. Test Implementation
- - [ ] Test that app launches immediately
- - [ ] Verify ingestion happens in background
- - [ ] Check that queries work during ingestion
- - [ ] Confirm progress is shown in System Stats
-
- ## Status
- - [x] Plan created and approved
- - [ ] Implementation in progress
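
The plan above maps onto a small amount of code. A minimal sketch, assuming the `api` and `pack_loader` objects created in app.py; `background_ingest_packs` and the status dictionary are the names proposed in this TODO, and the status fields are illustrative.

```python
import threading

# Global ingestion status, surfaced by get_system_stats()
ingestion_status = {"running": False, "processed": 0, "total_docs": 0, "error": None}

def background_ingest_packs():
    """Ingest pack documents without blocking app startup."""
    ingestion_status["running"] = True
    try:
        documents = pack_loader.discover_documents()
        ingestion_status["total_docs"] = len(documents)
        for doc in documents:
            api.add_document(
                doc_id=doc["id"],
                content=doc["content"],
                metadata=doc.get("metadata", {}),
            )
            ingestion_status["processed"] += 1
    except Exception as exc:  # surface errors in the UI rather than killing the thread
        ingestion_status["error"] = str(exc)
    finally:
        ingestion_status["running"] = False

# Start after API initialization, before demo.launch()
threading.Thread(target=background_ingest_packs, daemon=True).start()
```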
 
app.py CHANGED
@@ -6,14 +6,13 @@ Provides a web UI for the FractalStat RAG system with GPU acceleration.
 """
 
 import gradio as gr
-import json
-from typing import Dict, Any, List
 import time
 
 # Import Warbler CDA components
 from warbler_cda.retrieval_api import RetrievalAPI, RetrievalQuery, RetrievalMode
 from warbler_cda.embeddings import EmbeddingProviderFactory
 from warbler_cda.fractalstat_rag_bridge import FractalStatRAGBridge
+from warbler_cda.semantic_anchors import SemanticAnchorGraph
 from warbler_cda.pack_loader import PackLoader
 
 # Initialize the system
@@ -23,12 +22,17 @@ print("🚀 Initializing Warbler CDA...")
 embedding_provider = EmbeddingProviderFactory.get_default_provider()
 print(f"✅ Embedding provider: {embedding_provider.get_provider_info()['provider_id']}")
 
+# Create semantic anchors (required by RetrievalAPI)
+semantic_anchors = SemanticAnchorGraph(embedding_provider=embedding_provider)
+print("✅ Semantic anchors initialized")
+
 # Create FractalStat bridge
 fractalstat_bridge = FractalStatRAGBridge()
 print("✅ FractalStat bridge initialized")
 
-# Create RetrievalAPI
+# Create RetrievalAPI with proper components
 api = RetrievalAPI(
+    semantic_anchors=semantic_anchors,
     embedding_provider=embedding_provider,
     fractalstat_bridge=fractalstat_bridge,
     config={"enable_fractalstat_hybrid": True}
@@ -39,15 +43,47 @@ print("✅ RetrievalAPI initialized")
 print("📚 Loading Warbler packs...")
 pack_loader = PackLoader()
 documents = pack_loader.discover_documents()
-print(f"✅ Found {len(documents)} documents")
-
-# Ingest documents
-for doc in documents:
-    api.add_document(
-        doc_id=doc["id"],
-        content=doc["content"],
-        metadata=doc.get("metadata", {})
-    )
+
+# If no packs found, try to download them
+if len(documents) == 0:
+    print("⚠️ No packs found locally. Attempting to download from HuggingFace...")
+    try:
+        from warbler_cda.utils.hf_warbler_ingest import HFWarblerIngestor
+        ingestor = HFWarblerIngestor(packs_dir=pack_loader.packs_dir, verbose=True)
+        # Download a small demo dataset for deployment
+        print("📦 Downloading warbler-pack-hf-prompt-report...")
+        success = ingestor.ingest_dataset("prompt-report")
+        if success:
+            # Reload after download
+            documents = pack_loader.discover_documents()
+            print(f"✅ Downloaded {len(documents)} documents")
+        else:
+            print("❌ Failed to download dataset, using sample documents...")
+            documents = []
+    except Exception as e:
+        print(f"⚠️ Could not download packs: {e}")
+        print("Using sample documents instead...")
+        documents = []
+
+if len(documents) == 0:
+    # Fallback to sample documents
+    sample_docs = [
+        {"id": "sample1", "content": "FractalStat is an 8-dimensional addressing system for intelligent retrieval.", "metadata": {}},
+        {"id": "sample2", "content": "Semantic search finds documents by meaning, not just keywords.", "metadata": {}},
+        {"id": "sample3", "content": "Bob the Skeptic validates results to prevent bias and hallucinations.", "metadata": {}},
+    ]
+    for doc in sample_docs:
+        api.add_document(doc["id"], doc["content"], doc["metadata"])
+    print(f"✅ Loaded {len(sample_docs)} sample documents")
+else:
+    print(f"✅ Found {len(documents)} documents")
+    # Ingest documents
+    for doc in documents:
+        api.add_document(
+            doc_id=doc["id"],
+            content=doc["content"],
+            metadata=doc.get("metadata", {})
+        )
 
 print(f"🎉 Warbler CDA ready with {api.get_context_store_size()} documents!")
 
@@ -145,7 +181,7 @@ with gr.Blocks(title="Warbler CDA - FractalStat RAG") as demo:
         with gr.Column():
             results_output = gr.Markdown(label="Results")
 
-    query_btn.click(
+    query_btn.click(  # pylint: disable=E1101
         fn=query_warbler,
         inputs=[query_input, max_results, use_hybrid],
         outputs=results_output
@@ -163,8 +199,8 @@ with gr.Blocks(title="Warbler CDA - FractalStat RAG") as demo:
     with gr.Tab("System Stats"):
        stats_output = gr.Markdown()
        stats_btn = gr.Button("Refresh Stats")
-       stats_btn.click(fn=get_system_stats, outputs=stats_output)
-       demo.load(fn=get_system_stats, outputs=stats_output)
+       stats_btn.click(fn=get_system_stats, outputs=stats_output)  # pylint: disable=E1101
+       demo.load(fn=get_system_stats, outputs=stats_output)  # pylint: disable=E1101
 
     with gr.Tab("About"):
         gr.Markdown("""
compress_packs.py DELETED
@@ -1,134 +0,0 @@
- #!/usr/bin/env python3
- """
- Pack Compression Script using Evaporation Engine
-
- This script compresses warbler packs by replacing document content with
- compressed proto-thoughts generated by the evaporation engine.
- """
-
- import json
- import sys
- from pathlib import Path
- from typing import Dict, Any, List
-
- # Add the project root to Python path
- sys.path.insert(0, str(Path(__file__).parent))
-
- from warbler_cda.melt_layer import MeltLayer, MagmaStore
- from warbler_cda.evaporation import EvaporationEngine, CloudStore
-
-
- def load_jsonl_file(filepath: str) -> List[Dict[str, Any]]:
-     """Load a JSONL file and return a list of documents."""
-     documents = []
-     with open(filepath, "r", encoding="utf-8") as f:
-         for line in f:
-             line = line.strip()
-             if line:
-                 documents.append(json.loads(line))
-     return documents
-
-
- def save_jsonl_file(filepath: str, documents: List[Dict[str, Any]]) -> None:
-     """Save a list of documents to a JSONL file."""
-     with open(filepath, "w", encoding="utf-8") as f:
-         for doc in documents:
-             f.write(json.dumps(doc, ensure_ascii=False) + "\n")
-
-
- def compress_pack(pack_path: str, output_suffix: str = "_compressed") -> None:
-     """Compress a single pack using the evaporation engine."""
-     pack_path = Path(pack_path)
-     if not pack_path.exists():
-         raise FileNotFoundError(f"Pack path {pack_path} does not exist")
-
-     # Find all JSONL files in the pack
-     jsonl_files = list(pack_path.glob("*.jsonl"))
-     if not jsonl_files:
-         print(f"No JSONL files found in {pack_path}")
-         return
-
-     print(f"Found {len(jsonl_files)} JSONL files in {pack_path}")
-
-     # Initialize evaporation components
-     magma_store = MagmaStore()
-     cloud_store = CloudStore()
-     melt_layer = MeltLayer(magma_store)
-     evaporation_engine = EvaporationEngine(magma_store, cloud_store)
-
-     total_docs = 0
-     compressed_docs = 0
-
-     for jsonl_file in jsonl_files:
-         print(f"Processing {jsonl_file.name}...")
-
-         # Load documents
-         documents = load_jsonl_file(str(jsonl_file))
-         total_docs += len(documents)
-
-         compressed_documents = []
-
-         for doc in documents:
-             if "content" not in doc:
-                 print("Warning: Document missing 'content' field, skipping")
-                 continue
-
-             content = doc["content"]
-             if not content or not isinstance(content, str):
-                 print("Warning: Empty or invalid content, skipping")
-                 continue
-
-             try:
-                 # Create a fragment from the document content
-                 fragment = {"id": doc.get("content_id", f"doc_{compressed_docs}"), "text": content}
-
-                 # Create glyph from the single fragment
-                 melt_layer.retire_cluster({"fragments": [fragment]})
-
-                 # Evaporate to get proto-thought
-                 mist_lines = evaporation_engine.evaporate(limit=1)
-
-                 if mist_lines:
-                     proto_thought = mist_lines[0]["proto_thought"]
-                     # Replace content with compressed proto-thought
-                     compressed_doc = doc.copy()
-                     compressed_doc["content"] = proto_thought
-                     compressed_doc["original_content_length"] = len(content)
-                     compressed_doc["compressed_content_length"] = len(proto_thought)
-                     compressed_documents.append(compressed_doc)
-                     compressed_docs += 1
-                 else:
-                     print(
-                         f"Warning: Failed to evaporate glyph for document {doc.get('content_id', 'unknown')}"
-                     )
-                     # Keep original document if evaporation fails
-                     compressed_documents.append(doc)
-
-             except Exception as e:
-                 print(f"Error processing document {doc.get('content_id', 'unknown')}: {e}")
-                 # Keep original document on error
-                 compressed_documents.append(doc)
-
-         # Save compressed file
-         output_file = jsonl_file.parent / f"{jsonl_file.stem}{output_suffix}{jsonl_file.suffix}"
-         save_jsonl_file(str(output_file), compressed_documents)
-         print(f"Saved compressed file: {output_file}")
-
-     print("Compression complete:")
-     print(f"  Total documents processed: {total_docs}")
-     print(f"  Documents compressed: {compressed_docs}")
-     if total_docs > 0:
-         # Note: this is the share of documents compressed, not a size ratio
-         print(f"  Share of documents compressed: {compressed_docs/total_docs:.2%}")
-
-
- def main():
-     if len(sys.argv) != 2:
-         print("Usage: python compress_packs.py <pack_path>")
-         sys.exit(1)
-
-     pack_path = sys.argv[1]
-     compress_pack(pack_path)
-
-
- if __name__ == "__main__":
-     main()
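
For reference, a sketch of what one pack record looks like before and after this script runs; every field value below is illustrative, and the proto-thought text is invented for the example.

```python
# Before compression (one line of a pack's JSONL file, parsed)
original = {
    "content_id": "warbler-pack-hf-arxiv/paper-0001",
    "content": "Full abstract and discussion text...",
    "metadata": {"pack": "warbler-pack-hf-arxiv"},
}

# After compress_pack(): content is replaced by the evaporated proto-thought,
# and the original/compressed lengths are recorded for auditing.
compressed = {
    "content_id": "warbler-pack-hf-arxiv/paper-0001",
    "content": "<proto-thought summary>",
    "metadata": {"pack": "warbler-pack-hf-arxiv"},
    "original_content_length": 36,
    "compressed_content_length": 23,
}
```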
 
convert_to_jsonl.py DELETED
@@ -1,37 +0,0 @@
- import json
- import os
-
-
- def convert_templates_to_jsonl(pack_dir):
-     """Convert templates.json to pack_name.jsonl for a given pack directory."""
-     pack_name = os.path.basename(pack_dir)
-     templates_path = os.path.join(pack_dir, "pack", "templates.json")
-     jsonl_path = os.path.join(pack_dir, f"{pack_name}.jsonl")
-
-     if not os.path.exists(templates_path):
-         print(f"No templates.json found in {pack_dir}")
-         return
-
-     with open(templates_path, "r") as f:
-         templates = json.load(f)
-
-     with open(jsonl_path, "w") as f:
-         for template in templates:
-             json.dump(template, f)
-             f.write("\n")
-
-     print(f"Converted {templates_path} to {jsonl_path}")
-
-
- # Convert the three default packs
- packs_to_convert = [
-     "packs/warbler-pack-core",
-     "packs/warbler-pack-faction-politics",
-     "packs/warbler-pack-wisdom-scrolls",
- ]
-
- for pack in packs_to_convert:
-     if os.path.exists(pack):
-         convert_templates_to_jsonl(pack)
-     else:
-         print(f"Pack directory {pack} not found")
 
copy_packs.sh DELETED
@@ -1,45 +0,0 @@
- #!/bin/bash
- set -e
-
- SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
- REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
- SOURCE_PACKS_DIR="$REPO_ROOT/packages/com.twg.the-seed/The Living Dev Agent/packs"
- DEST_PACKS_DIR="$SCRIPT_DIR/packs"
-
- echo "Copying Warbler Packs to warbler-cda-package..."
- echo "Source: $SOURCE_PACKS_DIR"
- echo "Destination: $DEST_PACKS_DIR"
-
- if [ ! -d "$SOURCE_PACKS_DIR" ]; then
-     echo "❌ Error: Source packs directory not found at $SOURCE_PACKS_DIR"
-     exit 1
- fi
-
- mkdir -p "$DEST_PACKS_DIR"
-
- PACKS=(
-     "warbler-pack-core"
-     "warbler-pack-faction-politics"
-     "warbler-pack-wisdom-scrolls"
-     "warbler-pack-hf-npc-dialogue"
- )
-
- for pack in "${PACKS[@]}"; do
-     src="$SOURCE_PACKS_DIR/$pack"
-     dst="$DEST_PACKS_DIR/$pack"
-
-     if [ -d "$src" ]; then
-         echo "📦 Copying $pack..."
-         rm -rf "$dst"
-         cp -r "$src" "$dst"
-         echo "✓ Copied $pack"
-     else
-         echo "⚠️ Warning: Pack not found at $src (skipping)"
-     fi
- done
-
- echo ""
- echo "✅ Warbler packs successfully copied to $DEST_PACKS_DIR"
- echo ""
- echo "Packs available for ingestion:"
- ls -1 "$DEST_PACKS_DIR" | sed 's/^/  • /'
 
coverage.xml DELETED
The diff for this file is too large to render. See raw diff
 
final_fix.py DELETED
@@ -1,28 +0,0 @@
- #!/usr/bin/env python3
- """Final fixes for stat7_entity.py and verify the fixes work"""
-
- # Fix the stat7_entity.py bug
- with open("warbler_cda/stat7_entity.py", "r", encoding="utf-8") as f:
-     content = f.read()
-
- # Fix the description reference bug
- content = content.replace('"description": description,', '"description": self.description,')
-
- # Write back the fixed content
- with open("warbler_cda/stat7_entity.py", "w", encoding="utf-8") as f:
-     f.write(content)
-
- print("Fixed stat7_entity.py description bug")
-
- # Test imports to make sure everything works (the import itself is the check)
- try:
-     from warbler_cda import stat7_entity  # noqa: F401
-     print("✅ stat7_entity imports successfully")
- except Exception as e:
-     print(f"❌ stat7_entity import failed: {e}")
-
- try:
-     from warbler_cda import stat7_rag_bridge  # noqa: F401
-     print("✅ stat7_rag_bridge imports successfully")
- except Exception as e:
-     print(f"❌ stat7_rag_bridge import failed: {e}")
-
- print("All fixes applied!")
 
fix_theme.py DELETED
@@ -1,15 +0,0 @@
- #!/usr/bin/env python3
- """Fix the theme issue in app.py"""
-
- with open("app.py", "r", encoding="utf-8") as f:
-     content = f.read()
-
- old_line = 'with gr.Blocks(title="Warbler CDA - RAG System Demo", theme=gr.themes.Soft()) as demo:'
- new_line = 'with gr.Blocks(title="Warbler CDA - RAG System Demo") as demo:'
-
- content = content.replace(old_line, new_line)
-
- with open("app.py", "w", encoding="utf-8") as f:
-     f.write(content)
-
- print("Fixed theme issue")
 
load_warbler_packs_current.txt DELETED
@@ -1,259 +0,0 @@
- #!/usr/bin/env python3
- """
- Load Warbler Pack Data into EXP-09 API Service
-
- Ingests game wisdom, lore, and faction data into the STAT7-enabled RetrievalAPI
- for end-to-end testing with real Warbler content.
- """
-
- import json
- import requests
- import click
- from pathlib import Path
- from typing import List, Dict, Any
- import logging
-
- logging.basicConfig(level=logging.INFO)
- logger = logging.getLogger(__name__)
-
- # Warbler pack locations
- BASE_DIR = Path(__file__).resolve().parent
- PACKS_DIR = BASE_DIR.parents[1] / 'packs'
- WARBLER_PACKS = [
-     "warbler-pack-core",
-     "warbler-pack-wisdom-scrolls",
-     "warbler-pack-faction-politics",
-     "warbler-pack-hf-arxiv",
-     "warbler-pack-hf-prompt-report",
-     "warbler-pack-hf-novels",
-     "warbler-pack-hf-manuals",
-     "warbler-pack-hf-enterprise",
-     "warbler-pack-hf-portuguese-edu",
-     "warbler-pack-hf-edustories"
- ]
-
-
- class WarblerPackLoader:
-     """Load Warbler pack data into the API"""
-
-     def __init__(self, api_url: str = "http://localhost:8000"):
-         self.api_url = api_url.rstrip("/")
-         self.session = requests.Session()
-         self.loaded_count = 0
-         self.error_count = 0
-
-     def discover_documents(self, pack_name: str) -> List[Dict[str, Any]]:
-         """Discover all documents in a pack"""
-         pack_path = PACKS_DIR / pack_name
-         documents = []
-
-         if not pack_path.exists():
-             logger.warning(f"Pack not found: {pack_path}")
-             return []
-
-         # Look for JSON, YAML, markdown, and JSONL files
-         for pattern in [
-                 "**/*.json",
-                 "**/*.yaml",
-                 "**/*.yml",
-                 "**/*.md",
-                 "**/*.jsonl"]:
-             for file_path in pack_path.glob(pattern):
-                 try:
-                     doc = self._parse_document(file_path, pack_name)
-                     if doc:
-                         documents.append(doc)
-                         logger.info(
-                             f"Discovered: {file_path.relative_to(PACKS_DIR)}")
-                 except Exception as e:
-                     logger.error(f"Error parsing {file_path}: {e}")
-
-         return documents
-
-     def _parse_document(self, file_path: Path,
-                         pack_name: str) -> Dict[str, Any]:
-         """Parse a document file"""
-         try:
-             if file_path.suffix in ['.json']:
-                 with open(file_path, 'r', encoding='utf-8') as f:
-                     content = json.load(f)
-                     if isinstance(content, dict):
-                         content = json.dumps(content)
-                     else:
-                         content = json.dumps(content)
-             elif file_path.suffix in ['.jsonl']:
-                 # JSONL files contain multiple JSON objects, one per line
-                 # We'll read the first few lines and combine them
-                 with open(file_path, 'r', encoding='utf-8') as f:
-                     lines = f.readlines()[:5]  # First 5 lines
-                     content = '\n'.join(line.strip()
-                                         for line in lines if line.strip())
-             elif file_path.suffix in ['.yaml', '.yml']:
-                 import yaml
-                 with open(file_path, 'r', encoding='utf-8') as f:
-                     content = yaml.safe_load(f)
-                     content = json.dumps(content)
-             elif file_path.suffix == '.md':
-                 with open(file_path, 'r', encoding='utf-8') as f:
-                     content = f.read()
-             else:
-                 return None
-
-             # Infer realm from pack name
-             if "wisdom" in pack_name:
-                 realm = "wisdom"
-             elif "faction" in pack_name:
-                 realm = "faction"
-             else:
-                 realm = "narrative"
-
-             return {
-                 "content_id": f"{pack_name}/{file_path.stem}",
-                 "content": str(content)[:5000],  # Limit content size
-                 "metadata": {
-                     "pack": pack_name,
-                     "source_file": str(file_path.name),
-                     "realm_type": realm,
-                     "realm_label": pack_name.replace("warbler-pack-", ""),
-                     "lifecycle_stage": "emergence",
-                     "activity_level": 0.7
-                 }
-             }
-         except Exception as e:
-             logger.error(f"Failed to parse {file_path}: {e}")
-             return None
-
-     def ingest_document(self, doc: Dict[str, Any]) -> bool:
-         """Send document to API for ingestion"""
-         try:
-             # For now, we'll store in local context
-             # The API service will need an /ingest endpoint
-             logger.info(f"Ingesting: {doc['content_id']}")
-
-             # Check if API has ingest endpoint
-             response = self.session.post(
-                 f"{self.api_url}/ingest",
-                 json={"documents": [doc]},
-                 timeout=10
-             )
-
-             if response.status_code in [200, 201, 202]:
-                 self.loaded_count += 1
-                 logger.info(f"[OK] Loaded: {doc['content_id']}")
-                 return True
-             else:
-                 logger.warning(
-                     f"API returned {response.status_code}: {response.text[:200]}")
-                 return False
-         except requests.exceptions.ConnectionError:
-             logger.error("Cannot connect to API. Is the service running?")
-             return False
-         except Exception as e:
-             logger.error(f"Ingestion failed: {e}")
-             self.error_count += 1
-             return False
-
-     def load_all_packs(self) -> int:
-         """Load all Warbler packs"""
-         click.echo("\n" + "=" * 60)
-         click.echo("Loading Warbler Pack Data into EXP-09 API")
-         click.echo("=" * 60 + "\n")
-
-         total_docs = 0
-         for pack_name in WARBLER_PACKS:
-             click.echo(f"\n[PACK] Processing: {pack_name}")
-             click.echo("-" * 40)
-
-             documents = self.discover_documents(pack_name)
-             click.echo(f"Found {len(documents)} documents\n")
-
-             for doc in documents:
-                 self.ingest_document(doc)
-                 total_docs += 1
-
-         click.echo("\n" + "=" * 60)
-         click.secho(f"[OK] Load Complete: {self.loaded_count} docs ingested", fg="green")
-         if self.error_count > 0:
-             click.secho(f"[ERROR] Errors: {self.error_count}", fg="yellow")
-         click.echo("=" * 60 + "\n")
-
-         return self.loaded_count
-
-
- @click.group()
- def cli():
-     """Warbler Pack Loader for EXP-09"""
-     pass
-
-
- @cli.command()
- @click.option("--api-url",
-               default="http://localhost:8000",
-               help="API service URL")
- def load(api_url):
-     """Load all Warbler packs into the API"""
-     loader = WarblerPackLoader(api_url)
-
-     # First, check if API is running
-     try:
-         response = loader.session.get(f"{api_url}/health", timeout=5)
-         if response.status_code == 200:
-             click.secho("[OK] API service is running", fg="green")
-         else:
-             click.secho(
-                 "[ERROR] API service not responding correctly", fg="red")
-             return
-     except Exception as e:
-         click.secho(f"[ERROR] Cannot reach API at {api_url}: {e}", fg="red")
-         click.echo("\nStart the service with: docker-compose up -d")
-         return
-
-     # Load the packs
-     loaded = loader.load_all_packs()
-
-     if loaded > 0:
-         click.echo("\n[NEXT] Next Steps:")
-         click.echo(
-             "  1. Query the data with: python exp09_cli.py query --query-id q1 --semantic \"wisdom about courage\"")
-         click.echo(
-             "  2. Test hybrid scoring: python exp09_cli.py query --query-id q1 --semantic \"...\" --hybrid")
-         click.echo("  3. Check metrics: python exp09_cli.py metrics\n")
-
-
- @cli.command()
- @click.option("--api-url",
-               default="http://localhost:8000",
-               help="API service URL")
- def discover(api_url):
-     """Discover documents in Warbler packs (no loading)"""
-     loader = WarblerPackLoader(api_url)
-
-     click.echo("\n" + "=" * 60)
-     click.echo("Discovering Warbler Pack Documents")
-     click.echo("=" * 60 + "\n")
-
-     total = 0
-     for pack_name in WARBLER_PACKS:
-         click.echo(f"\n[PACK] {pack_name}")
-         click.echo("-" * 40)
-
-         documents = loader.discover_documents(pack_name)
-         total += len(documents)
-
-         for doc in documents:
-             click.echo(f"  - {doc['content_id']}")
-             if "metadata" in doc:
-                 click.echo(f"    Realm: {doc['metadata'].get('realm_type', 'unknown')}")
-
-     click.echo(f"\n[STATS] Total discovered: {total} documents\n")
-
-
- if __name__ == "__main__":
-     cli()
 
package-lock.json DELETED
@@ -1,861 +0,0 @@
The diff for this file is too large to render. See raw diff
 
package.json DELETED
@@ -1,19 +0,0 @@
-{
-  "name": "warbler-cda",
-  "version": "1.0.0",
-  "description": "--- title: Warbler CDA RAG System emoji: 🦜 colorFrom: blue colorTo: purple sdk: gradio sdk_version: 5.49.1 app_file: app.py pinned: false license: mit tags: - rag - retrieval - semantic-search - stat7 - embeddings - nlp ---",
-  "main": "index.js",
-  "directories": {
-    "test": "tests"
-  },
-  "scripts": {
-    "test": "echo \"Error: no test specified\" && exit 1"
-  },
-  "keywords": [],
-  "author": "",
-  "license": "ISC",
-  "dependencies": {
-    "express": "^5.1.0",
-    "typescript": "^5.9.3"
-  }
-}
packs/warbler-pack-hf-arxiv/package.json CHANGED
@@ -2,14 +2,14 @@
   "name": "warbler-pack-hf-arxiv",
   "version": "1.0.0",
   "description": "Warbler pack generated from HuggingFace datasets (chunked)",
-  "created_at": "2025-11-19T19:07:32.887499",
+  "created_at": "2025-12-02T10:48:41.412949",
   "document_count": 2549619,
   "source": "HuggingFace",
   "content_types": [
     "scholarly_discussion"
   ],
   "chunked": true,
-  "chunk_count": 255,
-  "docs_per_chunk": 10000,
-  "chunk_pattern": "warbler-pack-hf-arxiv-chunk-*_compressed.jsonl"
+  "chunk_count": 51,
+  "docs_per_chunk": 50000,
+  "chunk_pattern": "warbler-pack-hf-arxiv-chunk-*.jsonl"
 }
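The manifest change above records a re-chunking of the arXiv pack: with 2,549,619 documents at 50,000 docs per chunk, ceil(2,549,619 / 50,000) = 51 chunks replaces the previous ceil(2,549,619 / 10,000) = 255, and `chunk_pattern` now matches plain `.jsonl` files instead of `*_compressed.jsonl`. A minimal sketch of a loader driven by this manifest (the function name and usage are illustrative, not the repo's actual API):

```python
import glob
import json
import os

def iter_pack_documents(pack_dir: str):
    """Yield documents from a chunked Warbler pack, driven by its manifest.

    Illustrative sketch: reads package.json for chunk_pattern and streams
    each JSONL chunk file in sorted (i.e. chunk-number) order.
    """
    with open(os.path.join(pack_dir, "package.json"), encoding="utf-8") as fh:
        manifest = json.load(fh)
    pattern = os.path.join(pack_dir, manifest["chunk_pattern"])
    for chunk_path in sorted(glob.glob(pattern)):
        with open(chunk_path, encoding="utf-8") as chunk:
            for line in chunk:
                line = line.strip()
                if line:  # skip blank lines between records
                    yield json.loads(line)

# Hypothetical usage:
# docs = iter_pack_documents("packs/warbler-pack-hf-arxiv")
# print(next(docs))
```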
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-001_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-002_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-003_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-004_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-005_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-006_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-007_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-008_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-009_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-010_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-011_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-012_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-013_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-014_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-015_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-016_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-017_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-018_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-019_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-020_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-021_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-022_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-023_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-024_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-025_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-026_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-027_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-028_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff
 
packs/warbler-pack-hf-arxiv/warbler-pack-hf-arxiv-chunk-029_compressed.jsonl DELETED
The diff for this file is too large to render. See raw diff