Bellok committed on
Commit 9692d79 · Parent: 52b882f

feat(app): increase ArXiv dataset ingestion limit to 100k papers


Scale up from 10k to support a massive knowledge base, now that CPU embeddings are solid and reliable.

Files changed (1): app.py (+2 −2)
app.py CHANGED
@@ -92,8 +92,8 @@ if len(documents) == 0:
     try:
        print(f"📦 Downloading {dataset} (timeout: 3 minutes)...")

-       # Standard limit for HF Spaces - 10k papers works with CPU embeddings
-       arxiv_limit = 10000 if dataset == "arxiv" else None  # Back to full capacity
+       # Scale up now that CPU embeddings are solid - 100k papers for massive knowledge base
+       arxiv_limit = 100000 if dataset == "arxiv" else None  # Maximum capacity now!

        success = ingestor.ingest_dataset(dataset, arxiv_limit=arxiv_limit)
        if success:
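The two changed lines reduce to a single limit-selection rule: cap only the arxiv dataset, leave every other dataset uncapped. A minimal standalone sketch of that rule (the `resolve_limit` helper is hypothetical and not part of app.py; the real ingestion call is stubbed out here):

```python
# Hypothetical sketch of the limit selection in app.py after this commit.
ARXIV_LIMIT = 100_000  # raised from 10_000 in this commit


def resolve_limit(dataset: str):
    """Return the per-dataset ingestion cap: only arxiv is limited,
    all other datasets are ingested in full (None = no cap)."""
    return ARXIV_LIMIT if dataset == "arxiv" else None
```

With this shape, future capacity changes touch one constant instead of an inline literal inside the `try` block.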