Bellok committed
Commit 9692d79
1 Parent(s): 52b882f
feat(app): increase ArXiv dataset ingestion limit to 100k papers
Scale up from 10k to support a massive knowledge base, now that CPU embeddings are solid and reliable.
app.py CHANGED
@@ -92,8 +92,8 @@ if len(documents) == 0:
 try:
     print(f"📦 Downloading {dataset} (timeout: 3 minutes)...")

-    #
-    arxiv_limit =

+    # Scale up now that CPU embeddings are solid - 100k papers for massive knowledge base
+    arxiv_limit = 100000 if dataset == "arxiv" else None  # Maximum capacity now!

     success = ingestor.ingest_dataset(dataset, arxiv_limit=arxiv_limit)
     if success:
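
The diff shows only the call site, not how `ingest_dataset` uses `arxiv_limit`. As a rough sketch of one way such a cap could be applied, assuming the ingestor iterates over records one at a time: `resolve_limit` and `take_limited` below are hypothetical helpers for illustration and do not appear in app.py.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def resolve_limit(dataset: str, arxiv_cap: int = 100_000) -> Optional[int]:
    """Mirror the conditional in the diff: cap only the "arxiv" dataset.

    Hypothetical helper; the commit inlines this expression in app.py.
    """
    return arxiv_cap if dataset == "arxiv" else None


def take_limited(records: Iterable[T], limit: Optional[int]) -> Iterator[T]:
    """Yield at most `limit` records; a limit of None means no cap.

    Assumption: the real ingestor may enforce the limit differently.
    """
    # islice stops after `limit` items, or runs to exhaustion when limit is None.
    return islice(records, limit)


if __name__ == "__main__":
    papers = (f"paper-{i}" for i in range(10))
    # Use a small cap here so the demonstration stays readable.
    for paper in take_limited(papers, resolve_limit("arxiv", arxiv_cap=3)):
        print(paper)
```

Passing `None` for every other dataset keeps the call signature uniform, so the call site does not need a separate branch for uncapped datasets.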