Bellok committed on
Commit 9692d79 · Parent: 52b882f

feat(app): increase ArXiv dataset ingestion limit to 100k papers


Scale up from 10k to support a massive knowledge base, now that CPU embeddings are solid and reliable.

Files changed (1): app.py (+2 −2)
app.py CHANGED
@@ -92,8 +92,8 @@ if len(documents) == 0:
     try:
        print(f"📦 Downloading {dataset} (timeout: 3 minutes)...")

-       # Standard limit for HF Spaces - 10k papers works with CPU embeddings
-       arxiv_limit = 10000 if dataset == "arxiv" else None  # Back to full capacity
+       # Scale up now that CPU embeddings are solid - 100k papers for massive knowledge base
+       arxiv_limit = 100000 if dataset == "arxiv" else None  # Maximum capacity now!

        success = ingestor.ingest_dataset(dataset, arxiv_limit=arxiv_limit)
        if success:
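The two changed lines reduce to a single limit-selection rule: cap only the arxiv dataset, leave every other dataset uncapped. A minimal standalone sketch of that rule (the `resolve_limit` helper is hypothetical and not part of app.py; the real ingestion call is stubbed out here):

```python
# Hypothetical sketch of the limit selection in app.py after this commit.
ARXIV_LIMIT = 100_000  # raised from 10_000 in this commit


def resolve_limit(dataset: str):
    """Return the per-dataset ingestion cap: only arxiv is limited,
    all other datasets are ingested in full (None = no cap)."""
    return ARXIV_LIMIT if dataset == "arxiv" else None
```

With this shape, future capacity changes touch one constant instead of an inline literal inside the `try` block.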