# Configuration Guide

## Overview

DeepCritical uses **Pydantic Settings** for centralized configuration management. All settings are defined in the `Settings` class in `src/utils/config.py` and can be configured via environment variables or a `.env` file.

The configuration system provides:

- **Type Safety**: Strongly-typed fields with Pydantic validation
- **Environment File Support**: Automatically loads from a `.env` file (if present)
- **Case-Insensitive**: Environment variable names are matched case-insensitively
- **Singleton Pattern**: Global `settings` instance for easy access throughout the codebase
- **Validation**: Automatic validation on load with helpful error messages

## Quick Start

1. Create a `.env` file in the project root
2. Set at least one LLM API key (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `HF_TOKEN`)
3. Optionally configure other services as needed
4. The application will automatically load and validate your configuration
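
As a quick sanity check for step 2, the following sketch parses `.env`-style text with the standard library and reports whether at least one LLM key is set. It is a simplified stand-in, not the project's loader (Pydantic Settings does the real parsing), and the helper names `parse_env_text` and `has_llm_key` are hypothetical.

```python
LLM_KEY_NAMES = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "HF_TOKEN", "HUGGINGFACE_API_KEY")


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and # comments (simplified)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Upper-case the key to mimic case-insensitive variable names.
        values[key.strip().upper()] = value.strip().strip("'\"")
    return values


def has_llm_key(env: dict) -> bool:
    """Return True if any known LLM key has a non-empty value."""
    return any(env.get(name) for name in LLM_KEY_NAMES)


example = """
# Minimal .env
LLM_PROVIDER=openai
OPENAI_API_KEY=your_openai_api_key_here
"""
print(has_llm_key(parse_env_text(example)))
```

Real `.env` syntax has more cases (quoting, `export` prefixes, inline comments), so treat this only as a pre-flight check.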

## Configuration System Architecture

### Settings Class

The `Settings` class extends `BaseSettings` from `pydantic_settings` and defines all application configuration:

```13:21:src/utils/config.py
class Settings(BaseSettings):
    """Strongly-typed application settings."""

    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=False,
        extra="ignore",
    )
```

### Singleton Instance

A global `settings` instance is available for import:

```234:235:src/utils/config.py
# Singleton for easy import
settings = get_settings()
```
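
The body of `get_settings()` is not shown in this guide. One common way to implement such an accessor is to cache the constructed object with `functools.lru_cache`; the sketch below assumes that approach and uses a hypothetical `DemoSettings` dataclass in place of the real `Settings` class.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical stand-in for Settings, with two fields only."""

    max_iterations: int
    log_level: str


@lru_cache(maxsize=1)
def get_demo_settings() -> DemoSettings:
    """Build the settings once; later calls return the cached instance."""
    return DemoSettings(
        max_iterations=int(os.environ.get("MAX_ITERATIONS", "10")),
        log_level=os.environ.get("LOG_LEVEL", "INFO"),
    )


first = get_demo_settings()
second = get_demo_settings()
print(first is second)  # True: both names refer to the same cached object
```

With this pattern, changes to the environment after the first call are not picked up until the cache is cleared (`get_demo_settings.cache_clear()`), which matters mainly in tests.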

### Usage Pattern

Access configuration throughout the codebase:

```python
from src.utils.config import settings

# Check if API keys are available
if settings.has_openai_key:
    # Use OpenAI
    pass

# Access configuration values
max_iterations = settings.max_iterations
web_search_provider = settings.web_search_provider
```

## Required Configuration

### LLM Provider

You must configure at least one LLM provider. The system supports:

- **OpenAI**: Requires `OPENAI_API_KEY`
- **Anthropic**: Requires `ANTHROPIC_API_KEY`
- **HuggingFace**: Optional `HF_TOKEN` or `HUGGINGFACE_API_KEY` (can work without key for public models)

#### OpenAI Configuration

```bash
LLM_PROVIDER=openai
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-5.1
```

The default model is defined in the `Settings` class:

```29:29:src/utils/config.py
    openai_model: str = Field(default="gpt-5.1", description="OpenAI model name")
```

#### Anthropic Configuration

```bash
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=your_anthropic_api_key_here
ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
```

The default model is defined in the `Settings` class:

```30:32:src/utils/config.py
    anthropic_model: str = Field(
        default="claude-sonnet-4-5-20250929", description="Anthropic model"
    )
```

#### HuggingFace Configuration

HuggingFace can work without an API key for public models, but an API key provides higher rate limits:

```bash
# Option 1: Using HF_TOKEN (preferred)
HF_TOKEN=your_huggingface_token_here

# Option 2: Using HUGGINGFACE_API_KEY (alternative)
HUGGINGFACE_API_KEY=your_huggingface_api_key_here

# Default model
HUGGINGFACE_MODEL=meta-llama/Llama-3.1-8B-Instruct
```

The HuggingFace token can be set via either environment variable:

```33:35:src/utils/config.py
    hf_token: str | None = Field(
        default=None, alias="HF_TOKEN", description="HuggingFace API token"
    )
```

```57:59:src/utils/config.py
    huggingface_api_key: str | None = Field(
        default=None, description="HuggingFace API token (HF_TOKEN or HUGGINGFACE_API_KEY)"
    )
```

## Optional Configuration

### Embedding Configuration

DeepCritical supports multiple embedding providers for semantic search and RAG:

```bash
# Embedding Provider: "openai", "local", or "huggingface"
EMBEDDING_PROVIDER=local

# OpenAI Embedding Model (used by LlamaIndex RAG)
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Local Embedding Model (sentence-transformers, used by EmbeddingService)
LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2

# HuggingFace Embedding Model
HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
```

The embedding provider configuration:

```47:50:src/utils/config.py
    embedding_provider: Literal["openai", "local", "huggingface"] = Field(
        default="local",
        description="Embedding provider to use",
    )
```

**Note**: OpenAI embeddings require `OPENAI_API_KEY`. The local provider (default) uses sentence-transformers and requires no API key.

### Web Search Configuration

DeepCritical supports multiple web search providers:

```bash
# Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo"
# Default: "duckduckgo" (no API key required)
WEB_SEARCH_PROVIDER=duckduckgo

# Serper API Key (for Google search via Serper)
SERPER_API_KEY=your_serper_api_key_here

# SearchXNG Host URL (for self-hosted search)
SEARCHXNG_HOST=http://localhost:8080

# Brave Search API Key
BRAVE_API_KEY=your_brave_api_key_here

# Tavily API Key
TAVILY_API_KEY=your_tavily_api_key_here
```

The web search provider configuration:

```71:74:src/utils/config.py
    web_search_provider: Literal["serper", "searchxng", "brave", "tavily", "duckduckgo"] = Field(
        default="duckduckgo",
        description="Web search provider to use",
    )
```

**Note**: DuckDuckGo is the default and requires no API key, making it ideal for development and testing.
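
The relationship between provider and required credential can be summarised as a lookup table. This sketch mirrors the variables listed above; `REQUIRED_VARIABLE` and `missing_variable` are hypothetical names, not part of the codebase.

```python
import os
from typing import Mapping, Optional

REQUIRED_VARIABLE = {
    "duckduckgo": None,  # no credential needed
    "serper": "SERPER_API_KEY",
    "searchxng": "SEARCHXNG_HOST",
    "brave": "BRAVE_API_KEY",
    "tavily": "TAVILY_API_KEY",
}


def missing_variable(provider: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the variable that still needs to be set, or None if ready."""
    required = REQUIRED_VARIABLE[provider]
    if required is None or env.get(required):
        return None
    return required


print(missing_variable("duckduckgo", {}))
print(missing_variable("brave", {}))
print(missing_variable("brave", {"BRAVE_API_KEY": "example"}))
```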

### PubMed Configuration

PubMed search supports an optional NCBI API key for higher rate limits:

```bash
# NCBI API Key (optional, for higher rate limits: 10 req/sec vs 3 req/sec)
NCBI_API_KEY=your_ncbi_api_key_here
```

The PubMed tool uses this configuration:

```22:29:src/tools/pubmed.py
    def __init__(self, api_key: str | None = None) -> None:
        self.api_key = api_key or settings.ncbi_api_key
        # Ignore placeholder values from .env.example
        if self.api_key == "your-ncbi-key-here":
            self.api_key = None

        # Use shared rate limiter
        self._limiter = get_pubmed_limiter(self.api_key)
```
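
`get_pubmed_limiter` itself is not shown in this guide. As an illustration of how the key affects throughput, this sketch derives the minimum spacing between requests from the documented limits (3 requests per second without a key, 10 with one). The names `requests_per_second` and `IntervalLimiter` are hypothetical, and the real limiter may work differently.

```python
import time
from typing import Optional

PLACEHOLDER_KEY = "your-ncbi-key-here"  # placeholder value ignored by the PubMed tool


def requests_per_second(api_key: Optional[str]) -> int:
    """Documented NCBI limits: 10 req/sec with a key, 3 req/sec without."""
    if api_key and api_key != PLACEHOLDER_KEY:
        return 10
    return 3


class IntervalLimiter:
    """Blocks so that consecutive calls are at least 1/rate seconds apart."""

    def __init__(self, rate: int) -> None:
        self.min_interval = 1.0 / rate
        self._last_call: Optional[float] = None

    def wait(self) -> None:
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


limiter = IntervalLimiter(requests_per_second(None))
for _ in range(3):
    limiter.wait()  # the first call returns at once, later calls are spaced out
```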

### Agent Configuration

Control agent behavior and research loop execution:

```bash
# Maximum iterations per research loop (1-50, default: 10)
MAX_ITERATIONS=10

# Search timeout in seconds
SEARCH_TIMEOUT=30

# Use graph-based execution for research flows
USE_GRAPH_EXECUTION=false
```

The agent configuration fields:

```80:85:src/utils/config.py
    # Agent Configuration
    max_iterations: int = Field(default=10, ge=1, le=50)
    search_timeout: int = Field(default=30, description="Seconds to wait for search")
    use_graph_execution: bool = Field(
        default=False, description="Use graph-based execution for research flows"
    )
```

### Budget & Rate Limiting Configuration

Control resource limits for research loops:

```bash
# Default token budget per research loop (1000-1000000, default: 100000)
DEFAULT_TOKEN_LIMIT=100000

# Default time limit per research loop in minutes (1-120, default: 10)
DEFAULT_TIME_LIMIT_MINUTES=10

# Default iterations limit per research loop (1-50, default: 10)
DEFAULT_ITERATIONS_LIMIT=10
```

The budget configuration with validation:

```87:105:src/utils/config.py
    # Budget & Rate Limiting Configuration
    default_token_limit: int = Field(
        default=100000,
        ge=1000,
        le=1000000,
        description="Default token budget per research loop",
    )
    default_time_limit_minutes: int = Field(
        default=10,
        ge=1,
        le=120,
        description="Default time limit per research loop (minutes)",
    )
    default_iterations_limit: int = Field(
        default=10,
        ge=1,
        le=50,
        description="Default iterations limit per research loop",
    )
```
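
How these limits are enforced is not covered here. The following is a minimal sketch of a budget check, assuming a research loop that tracks tokens, elapsed time, and iterations itself. `BudgetTracker` and its methods are hypothetical names, with defaults copied from the fields above.

```python
import time
from dataclasses import dataclass, field


@dataclass
class BudgetTracker:
    """Hypothetical tracker; defaults mirror the documented settings."""

    token_limit: int = 100_000
    time_limit_minutes: int = 10
    iterations_limit: int = 10
    tokens_used: int = 0
    iterations: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def record(self, tokens: int) -> None:
        """Account for one finished iteration."""
        self.tokens_used += tokens
        self.iterations += 1

    def exceeded(self) -> bool:
        """True once any of the three limits has been reached."""
        elapsed_minutes = (time.monotonic() - self.started_at) / 60
        return (
            self.tokens_used >= self.token_limit
            or self.iterations >= self.iterations_limit
            or elapsed_minutes >= self.time_limit_minutes
        )


budget = BudgetTracker(token_limit=5_000, iterations_limit=3)
while not budget.exceeded():
    budget.record(tokens=2_000)  # pretend each iteration costs 2,000 tokens
print(budget.iterations, budget.tokens_used)
```

In this run the loop stops after the third iteration, where both the token and the iteration limits are reached.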

### RAG Service Configuration

Configure the Retrieval-Augmented Generation service:

```bash
# ChromaDB collection name for RAG
RAG_COLLECTION_NAME=deepcritical_evidence

# Number of top results to retrieve from RAG (1-50, default: 5)
RAG_SIMILARITY_TOP_K=5

# Automatically ingest evidence into RAG
RAG_AUTO_INGEST=true
```

The RAG configuration:

```127:141:src/utils/config.py
    # RAG Service Configuration
    rag_collection_name: str = Field(
        default="deepcritical_evidence",
        description="ChromaDB collection name for RAG",
    )
    rag_similarity_top_k: int = Field(
        default=5,
        ge=1,
        le=50,
        description="Number of top results to retrieve from RAG",
    )
    rag_auto_ingest: bool = Field(
        default=True,
        description="Automatically ingest evidence into RAG",
    )
```

### ChromaDB Configuration

Configure the vector database for embeddings and RAG:

```bash
# ChromaDB storage path
CHROMA_DB_PATH=./chroma_db

# Whether to persist ChromaDB to disk
CHROMA_DB_PERSIST=true

# ChromaDB server host (for remote ChromaDB, optional)
CHROMA_DB_HOST=localhost

# ChromaDB server port (for remote ChromaDB, optional)
CHROMA_DB_PORT=8000
```

The ChromaDB configuration:

```113:125:src/utils/config.py
    chroma_db_path: str = Field(default="./chroma_db", description="ChromaDB storage path")
    chroma_db_persist: bool = Field(
        default=True,
        description="Whether to persist ChromaDB to disk",
    )
    chroma_db_host: str | None = Field(
        default=None,
        description="ChromaDB server host (for remote ChromaDB)",
    )
    chroma_db_port: int | None = Field(
        default=None,
        description="ChromaDB server port (for remote ChromaDB)",
    )
```
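
The guide does not show how these four fields are combined. One plausible reading is that a configured host selects a remote server, and otherwise the persist flag chooses between on-disk and in-memory storage. The sketch below encodes that assumption as a pure function; `resolve_chroma_mode` is a hypothetical helper, and falling back to port 8000 when only a host is given is also an assumption.

```python
from typing import Optional


def resolve_chroma_mode(
    path: str = "./chroma_db",
    persist: bool = True,
    host: Optional[str] = None,
    port: Optional[int] = None,
) -> dict:
    """Decide which kind of ChromaDB client the settings describe."""
    if host:
        # Remote server: the host takes priority over local storage options.
        return {"mode": "remote", "host": host, "port": port or 8000}
    if persist:
        return {"mode": "persistent", "path": path}
    return {"mode": "in-memory"}


print(resolve_chroma_mode())
print(resolve_chroma_mode(host="localhost", port=8000))
print(resolve_chroma_mode(persist=False))
```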

### External Services

#### Modal Configuration

Modal is used for secure sandbox execution of statistical analysis:

```bash
# Modal Token ID (for Modal sandbox execution)
MODAL_TOKEN_ID=your_modal_token_id_here

# Modal Token Secret
MODAL_TOKEN_SECRET=your_modal_token_secret_here
```

The Modal configuration:

```110:112:src/utils/config.py
    # External Services
    modal_token_id: str | None = Field(default=None, description="Modal token ID")
    modal_token_secret: str | None = Field(default=None, description="Modal token secret")
```

### Logging Configuration

Configure structured logging:

```bash
# Log Level: "DEBUG", "INFO", "WARNING", or "ERROR"
LOG_LEVEL=INFO
```

The logging configuration:

```107:108:src/utils/config.py
    # Logging
    log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"
```

Logging is configured via the `configure_logging()` function:

```212:231:src/utils/config.py
def configure_logging(settings: Settings) -> None:
    """Configure structured logging with the configured log level."""
    # Set stdlib logging level from settings
    logging.basicConfig(
        level=getattr(logging, settings.log_level),
        format="%(message)s",
    )

    structlog.configure(
        processors=[
            structlog.stdlib.filter_by_level,
            structlog.stdlib.add_logger_name,
            structlog.stdlib.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.JSONRenderer(),
        ],
        wrapper_class=structlog.stdlib.BoundLogger,
        context_class=dict,
        logger_factory=structlog.stdlib.LoggerFactory(),
    )
```
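
The `getattr(logging, settings.log_level)` call works because each allowed `LOG_LEVEL` value is also the name of an integer constant in the standard `logging` module. A small standard-library check:

```python
import logging

# Map each accepted LOG_LEVEL name to the numeric level used by logging.
levels = {name: getattr(logging, name) for name in ("DEBUG", "INFO", "WARNING", "ERROR")}
for name, value in levels.items():
    print(name, value)
```

Messages below the configured numeric level are dropped, so `LOG_LEVEL=WARNING` suppresses `DEBUG` and `INFO` output.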

## Configuration Properties

The `Settings` class provides helpful properties for checking configuration state:

### API Key Availability

Check which API keys are available:

```171:189:src/utils/config.py
    @property
    def has_openai_key(self) -> bool:
        """Check if OpenAI API key is available."""
        return bool(self.openai_api_key)

    @property
    def has_anthropic_key(self) -> bool:
        """Check if Anthropic API key is available."""
        return bool(self.anthropic_api_key)

    @property
    def has_huggingface_key(self) -> bool:
        """Check if HuggingFace API key is available."""
        return bool(self.huggingface_api_key or self.hf_token)

    @property
    def has_any_llm_key(self) -> bool:
        """Check if any LLM API key is available."""
        return self.has_openai_key or self.has_anthropic_key or self.has_huggingface_key
```

**Usage:**

```python
from src.utils.config import settings

# Check API key availability
if settings.has_openai_key:
    # Use OpenAI
    pass

if settings.has_anthropic_key:
    # Use Anthropic
    pass

if settings.has_huggingface_key:
    # Use HuggingFace
    pass

if settings.has_any_llm_key:
    # At least one LLM is available
    pass
```

### Service Availability

Check if external services are configured:

```143:146:src/utils/config.py
    @property
    def modal_available(self) -> bool:
        """Check if Modal credentials are configured."""
        return bool(self.modal_token_id and self.modal_token_secret)
```

```191:204:src/utils/config.py
    @property
    def web_search_available(self) -> bool:
        """Check if web search is available (either no-key provider or API key present)."""
        if self.web_search_provider == "duckduckgo":
            return True  # No API key required
        if self.web_search_provider == "serper":
            return bool(self.serper_api_key)
        if self.web_search_provider == "searchxng":
            return bool(self.searchxng_host)
        if self.web_search_provider == "brave":
            return bool(self.brave_api_key)
        if self.web_search_provider == "tavily":
            return bool(self.tavily_api_key)
        return False
```

**Usage:**

```python
from src.utils.config import settings

# Check service availability
if settings.modal_available:
    # Use Modal sandbox
    pass

if settings.web_search_available:
    # Web search is configured
    pass
```

### API Key Retrieval

Get the API key for the configured provider:

```148:160:src/utils/config.py
    def get_api_key(self) -> str:
        """Get the API key for the configured provider."""
        if self.llm_provider == "openai":
            if not self.openai_api_key:
                raise ConfigurationError("OPENAI_API_KEY not set")
            return self.openai_api_key

        if self.llm_provider == "anthropic":
            if not self.anthropic_api_key:
                raise ConfigurationError("ANTHROPIC_API_KEY not set")
            return self.anthropic_api_key

        raise ConfigurationError(f"Unknown LLM provider: {self.llm_provider}")
```

For OpenAI-specific operations (e.g., Magentic mode):

```162:169:src/utils/config.py
    def get_openai_api_key(self) -> str:
        """Get OpenAI API key (required for Magentic function calling)."""
        if not self.openai_api_key:
            raise ConfigurationError(
                "OPENAI_API_KEY not set. Magentic mode requires OpenAI for function calling. "
                "Use mode='simple' for other providers."
            )
        return self.openai_api_key
```
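
Because `get_api_key()` raises when the selected provider's key is missing, and in the excerpt above has no branch for `huggingface`, callers may prefer to check availability first. The sketch below shows one possible fallback order using a hypothetical `KeyState` stand-in instead of the real `settings` object; the order OpenAI, Anthropic, HuggingFace is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyState:
    """Hypothetical stand-in exposing the same key fields as Settings."""

    openai_api_key: Optional[str] = None
    anthropic_api_key: Optional[str] = None
    hf_token: Optional[str] = None


def pick_provider(keys: KeyState) -> str:
    """Return the first provider whose key is present (assumed order)."""
    if keys.openai_api_key:
        return "openai"
    if keys.anthropic_api_key:
        return "anthropic"
    # HuggingFace can serve public models without a token.
    return "huggingface"


print(pick_provider(KeyState(anthropic_api_key="example-key")))
print(pick_provider(KeyState()))
```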

## Configuration Usage in Codebase

The configuration system is used throughout the codebase:

### LLM Factory

The LLM factory uses settings to create appropriate models:

```129:144:src/utils/llm_factory.py
    if settings.llm_provider == "huggingface":
        model_name = settings.huggingface_model or "meta-llama/Llama-3.1-8B-Instruct"
        hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
        return HuggingFaceModel(model_name, provider=hf_provider)

    if settings.llm_provider == "openai":
        if not settings.openai_api_key:
            raise ConfigurationError("OPENAI_API_KEY not set for pydantic-ai")
        provider = OpenAIProvider(api_key=settings.openai_api_key)
        return OpenAIModel(settings.openai_model, provider=provider)

    if settings.llm_provider == "anthropic":
        if not settings.anthropic_api_key:
            raise ConfigurationError("ANTHROPIC_API_KEY not set for pydantic-ai")
        anthropic_provider = AnthropicProvider(api_key=settings.anthropic_api_key)
        return AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
```

### Embedding Service

The embedding service uses local embedding model configuration:

```29:31:src/services/embeddings.py
    def __init__(self, model_name: str | None = None):
        self._model_name = model_name or settings.local_embedding_model
        self._model = SentenceTransformer(self._model_name)
```

### Orchestrator Factory

The orchestrator factory uses settings to determine mode:

```69:80:src/orchestrator_factory.py
def _determine_mode(explicit_mode: str | None) -> str:
    """Determine which mode to use."""
    if explicit_mode:
        if explicit_mode in ("magentic", "advanced"):
            return "advanced"
        return "simple"

    # Auto-detect: advanced if paid API key available
    if settings.has_openai_key:
        return "advanced"

    return "simple"
```

## Environment Variables Reference

### Required (at least one LLM)

- `OPENAI_API_KEY` - OpenAI API key (required for OpenAI provider)
- `ANTHROPIC_API_KEY` - Anthropic API key (required for Anthropic provider)
- `HF_TOKEN` or `HUGGINGFACE_API_KEY` - HuggingFace API token (optional, can work without for public models)

#### LLM Configuration Variables

- `LLM_PROVIDER` - Provider to use: `"openai"`, `"anthropic"`, or `"huggingface"` (default: `"openai"`)
- `OPENAI_MODEL` - OpenAI model name (default: `"gpt-5.1"`)
- `ANTHROPIC_MODEL` - Anthropic model name (default: `"claude-sonnet-4-5-20250929"`)
- `HUGGINGFACE_MODEL` - HuggingFace model ID (default: `"meta-llama/Llama-3.1-8B-Instruct"`)

#### Embedding Configuration Variables

- `EMBEDDING_PROVIDER` - Provider: `"openai"`, `"local"`, or `"huggingface"` (default: `"local"`)
- `OPENAI_EMBEDDING_MODEL` - OpenAI embedding model (default: `"text-embedding-3-small"`)
- `LOCAL_EMBEDDING_MODEL` - Local sentence-transformers model (default: `"all-MiniLM-L6-v2"`)
- `HUGGINGFACE_EMBEDDING_MODEL` - HuggingFace embedding model (default: `"sentence-transformers/all-MiniLM-L6-v2"`)

#### Web Search Configuration Variables

- `WEB_SEARCH_PROVIDER` - Provider: `"serper"`, `"searchxng"`, `"brave"`, `"tavily"`, or `"duckduckgo"` (default: `"duckduckgo"`)
- `SERPER_API_KEY` - Serper API key (required for Serper provider)
- `SEARCHXNG_HOST` - SearchXNG host URL (required for SearchXNG provider)
- `BRAVE_API_KEY` - Brave Search API key (required for Brave provider)
- `TAVILY_API_KEY` - Tavily API key (required for Tavily provider)

#### PubMed Configuration Variables

- `NCBI_API_KEY` - NCBI API key (optional, increases rate limit from 3 to 10 req/sec)

#### Agent Configuration Variables

- `MAX_ITERATIONS` - Maximum iterations per research loop (1-50, default: `10`)
- `SEARCH_TIMEOUT` - Search timeout in seconds (default: `30`)
- `USE_GRAPH_EXECUTION` - Use graph-based execution (default: `false`)

#### Budget Configuration Variables

- `DEFAULT_TOKEN_LIMIT` - Default token budget per research loop (1000-1000000, default: `100000`)
- `DEFAULT_TIME_LIMIT_MINUTES` - Default time limit in minutes (1-120, default: `10`)
- `DEFAULT_ITERATIONS_LIMIT` - Default iterations limit (1-50, default: `10`)

#### RAG Configuration Variables

- `RAG_COLLECTION_NAME` - ChromaDB collection name (default: `"deepcritical_evidence"`)
- `RAG_SIMILARITY_TOP_K` - Number of top results to retrieve (1-50, default: `5`)
- `RAG_AUTO_INGEST` - Automatically ingest evidence into RAG (default: `true`)

#### ChromaDB Configuration Variables

- `CHROMA_DB_PATH` - ChromaDB storage path (default: `"./chroma_db"`)
- `CHROMA_DB_PERSIST` - Whether to persist ChromaDB to disk (default: `true`)
- `CHROMA_DB_HOST` - ChromaDB server host (optional, for remote ChromaDB)
- `CHROMA_DB_PORT` - ChromaDB server port (optional, for remote ChromaDB)

#### External Services Variables

- `MODAL_TOKEN_ID` - Modal token ID (optional, for Modal sandbox execution)
- `MODAL_TOKEN_SECRET` - Modal token secret (optional, for Modal sandbox execution)

#### Logging Configuration Variables

- `LOG_LEVEL` - Log level: `"DEBUG"`, `"INFO"`, `"WARNING"`, or `"ERROR"` (default: `"INFO"`)

## Validation

Settings are validated on load using Pydantic validation:

- **Type Checking**: All fields are strongly typed
- **Range Validation**: Numeric fields have min/max constraints (e.g., `ge=1, le=50` for `max_iterations`)
- **Literal Validation**: Enum fields only accept specific values (e.g., `Literal["openai", "anthropic", "huggingface"]`)
- **Required Fields**: API keys are checked when accessed via `get_api_key()` or `get_openai_api_key()`
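
To make the range and literal rules concrete, here is a dependency-free sketch of the two checks. It only imitates what Pydantic does for `max_iterations` and `llm_provider`; the helper names are hypothetical, and real failures surface as Pydantic validation errors rather than `ValueError`.

```python
def check_range(name: str, value: int, ge: int, le: int) -> int:
    """Reject values outside the inclusive range [ge, le]."""
    if not ge <= value <= le:
        raise ValueError(f"{name} must be between {ge} and {le}, got {value}")
    return value


def check_literal(name: str, value: str, allowed: tuple) -> str:
    """Reject values that are not one of the allowed strings."""
    if value not in allowed:
        raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
    return value


PROVIDERS = ("openai", "anthropic", "huggingface")

check_range("max_iterations", 10, ge=1, le=50)  # passes
check_literal("llm_provider", "openai", PROVIDERS)  # passes

try:
    check_range("max_iterations", 51, ge=1, le=50)
except ValueError as exc:
    print(exc)
```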

### Validation Examples

The `max_iterations` field has range validation:

```81:81:src/utils/config.py
    max_iterations: int = Field(default=10, ge=1, le=50)
```

The `llm_provider` field has literal validation:

```26:28:src/utils/config.py
    llm_provider: Literal["openai", "anthropic", "huggingface"] = Field(
        default="openai", description="Which LLM provider to use"
    )
```

## Error Handling

Configuration errors raise `ConfigurationError` from `src/utils/exceptions.py`:

```22:25:src/utils/exceptions.py
class ConfigurationError(DeepCriticalError):
    """Raised when configuration is invalid."""

    pass
```

### Error Handling Example

```python
from src.utils.config import settings
from src.utils.exceptions import ConfigurationError

try:
    api_key = settings.get_api_key()
except ConfigurationError as e:
    print(f"Configuration error: {e}")
```

### Common Configuration Errors

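1. **Missing API Key**: `get_api_key()` or `get_openai_api_key()` is called but the required API key is not set; this raises `ConfigurationError`
2. **Invalid Provider**: `llm_provider` is set to a value outside its `Literal` options; Pydantic rejects this when the settings load
3. **Out of Range**: A numeric value falls outside its min/max constraints; Pydantic rejects this when the settings load
4. **Invalid Literal**: Another enum-like field (e.g., `web_search_provider` or `log_level`) receives an unsupported value; Pydantic rejects this when the settings load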

## Configuration Best Practices

1. **Use `.env` File**: Store sensitive keys in a `.env` file and add that file to `.gitignore`
2. **Check Availability**: Use properties like `has_openai_key` before accessing API keys
3. **Handle Errors**: Always catch `ConfigurationError` when calling `get_api_key()`
4. **Validate Early**: Configuration is validated on import, so errors surface immediately
5. **Use Defaults**: Leverage sensible defaults for optional configuration

## Future Enhancements

The following configurations are planned for future phases:

1. **Additional LLM Providers**: DeepSeek, OpenRouter, Gemini, Perplexity, Azure OpenAI, Local models
2. **Model Selection**: Reasoning/main/fast model configuration
3. **Service Integration**: Additional service integrations and configurations