# LLM Council - Comprehensive Guide

## Overview
The LLM Council is a sophisticated multi-agent system that uses multiple Large Language Models (LLMs) to collectively answer questions through a 3-stage deliberation process:
- Stage 1 - Individual Responses: Each council member independently answers the question
- Stage 2 - Peer Review: Council members rank each other's anonymized responses
- Stage 3 - Synthesis: A chairman model synthesizes the final answer based on all inputs
**Current implementation:** uses FREE HuggingFace models (60%) + cheap OpenAI models (40%).
## Architecture

### Current Implementation
```
┌──────────────────────────────────────────────────────────────┐
│                        User Question                         │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│  Stage 1: Parallel Responses from 3-5 Council Models         │
│   • Model 1: Individual answer                               │
│   • Model 2: Individual answer                               │
│   • Model 3: Individual answer                               │
│   • (etc...)                                                 │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│  Stage 2: Peer Rankings (Anonymized)                         │
│   • Each model ranks all responses (Response A, B, C...)     │
│   • Aggregate rankings calculated                            │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│  Stage 3: Chairman Synthesis                                 │
│   • Reviews all responses + rankings                         │
│   • Generates final comprehensive answer                     │
└──────────────────────────────────────────────────────────────┘
```
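To make the flow concrete, here is a minimal sketch of how the three stages could be wired together. The helper names (`query_model`, `run_council`) are illustrative, not the repo's actual API; the production code adds streaming, retries, and per-stage UI output.

```python
import asyncio

async def query_model(model: dict, prompt: str) -> str:
    """Placeholder for a single chat-completion call (one client sketch appears later)."""
    raise NotImplementedError

async def run_council(question: str, council: list[dict], chairman: dict) -> str:
    # Stage 1: each council member answers independently, in parallel.
    answers = await asyncio.gather(*(query_model(m, question) for m in council))

    # Stage 2: members rank the anonymized answers (labeled Response A, B, ...).
    labeled = "\n\n".join(
        f"Response {chr(65 + i)}:\n{a}" for i, a in enumerate(answers)
    )
    rank_prompt = (
        f"Question: {question}\n\nRank these responses from best to worst:\n{labeled}"
    )
    rankings = await asyncio.gather(*(query_model(m, rank_prompt) for m in council))

    # Stage 3: the chairman synthesizes a final answer from answers + rankings.
    synth_prompt = (
        f"Question: {question}\n\nCandidate responses:\n{labeled}\n\n"
        "Peer rankings:\n" + "\n".join(rankings) +
        "\n\nSynthesize the best possible final answer."
    )
    return await query_model(chairman, synth_prompt)
```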
## Current Models (FREE HuggingFace + OpenAI)

### Council Members (5 models)

**FREE HuggingFace Models (via Inference API):**
- `meta-llama/Llama-3.3-70B-Instruct` - Meta's latest Llama (FREE)
- `Qwen/Qwen2.5-72B-Instruct` - Alibaba's Qwen (FREE)
- `mistralai/Mixtral-8x7B-Instruct-v0.1` - Mistral MoE (FREE)

**OpenAI Models (paid but cheap):**
- `gpt-4o-mini` - Fast, affordable GPT-4 variant
- `gpt-3.5-turbo` - Ultra cheap, still capable

### Chairman
- `gpt-4o-mini` - Final synthesis model
**Benefits of Current Setup:**
- 60% of models are completely FREE (HuggingFace)
- 40% use cheap OpenAI models ($0.001-0.01 per query)
- 90-99% cost reduction compared to all-paid alternatives
- No experimental/beta endpoints - all stable APIs
- Diverse model providers for varied perspectives
## Alternative Model Configurations

### All-FREE Council (100% HuggingFace)

```python
COUNCIL_MODELS = [
    {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"},
    {"provider": "huggingface", "model": "Qwen/Qwen2.5-72B-Instruct"},
    {"provider": "huggingface", "model": "mistralai/Mixtral-8x7B-Instruct-v0.1"},
    {"provider": "huggingface", "model": "meta-llama/Llama-3.1-405B-Instruct"},
    {"provider": "huggingface", "model": "microsoft/Phi-3.5-MoE-instruct"},
]
CHAIRMAN_MODEL = {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"}
```
Cost: $0.00 per query!
Premium Council (OpenAI + HuggingFace)
COUNCIL_MODELS = [
{"provider": "openai", "model": "gpt-4o"},
{"provider": "openai", "model": "gpt-4-turbo"},
{"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"},
{"provider": "huggingface", "model": "Qwen/Qwen2.5-72B-Instruct"},
{"provider": "openai", "model": "gpt-3.5-turbo"},
]
CHAIRMAN_MODEL = {"provider": "openai", "model": "gpt-4o"}
Cost: ~$0.05-0.15 per query
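Either configuration works unchanged because every entry carries a `provider` field. As a rough sketch of how that field might be dispatched (the repo's actual client may differ), both providers can be driven through the OpenAI SDK, since the HuggingFace router speaks the same chat-completions format:

```python
import os
from openai import AsyncOpenAI

# Both providers accept the OpenAI chat-completions format, so one SDK can
# serve both; only the base URL and API key differ.
openai_client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
hf_client = AsyncOpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HUGGINGFACE_API_KEY"],
)

async def query_model(model: dict, prompt: str) -> str:
    client = hf_client if model["provider"] == "huggingface" else openai_client
    resp = await client.chat.completions.create(
        model=model["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

With this shape, swapping a council seat is just an edit to `COUNCIL_MODELS`.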
## Running on Hugging Face Spaces

### Prerequisites
**OpenAI API Key:**
- Sign up at platform.openai.com
- Go to API Keys → Create new secret key
- Copy your key (starts with `sk-`)
- Add billing info and credits ($5-10 is plenty)

**HuggingFace API Token:**
- Sign up at huggingface.co
- Go to Settings → Access Tokens → New token
- Copy your token (starts with `hf_`)
- FREE! No billing required

**HuggingFace Account:** for deploying Spaces
### Step-by-Step Deployment
#### Method 1: Deploy Your Existing Code
**1. Create a new Space**
- Go to huggingface.co/new-space
- Choose "Gradio" as the SDK
- Select SDK version: 6.0.0
- Choose hardware: CPU (free)

**2. Push your code**

```bash
# Clone your space
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME

# Copy your LLM Council code
cp -r /path/to/llm_council/* .

# Commit and push
git add .
git commit -m "Initial deployment"
git push
```

**3. Configure secrets**
- Go to your Space → Settings → Repository secrets
- Add secret #1:
  - Name: `OPENAI_API_KEY`
  - Value: your OpenAI key (starts with `sk-`)
- Add secret #2:
  - Name: `HUGGINGFACE_API_KEY`
  - Value: your HuggingFace token (starts with `hf_`)
**4. Space auto-restarts**
- HF Spaces will automatically rebuild and deploy
- Check the "Logs" tab to verify a successful startup
### Required Files Structure

```
your-space/
├── README.md          # Space configuration
├── requirements.txt   # Python dependencies
├── app.py             # Main Gradio app
├── .env.example       # Environment template
└── backend/
    ├── __init__.py
    ├── config.py      # Model configuration
    ├── council.py     # 3-stage logic
    ├── openrouter.py  # API client
    ├── storage.py     # Data storage
    └── main.py        # FastAPI (optional)
```
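For orientation, a bare-bones `app.py` could look like the sketch below. The import paths and the `run_council` signature are hypothetical; the real app also streams per-stage progress into the UI.

```python
import gradio as gr

# Hypothetical module paths mirroring the file tree above.
from backend.config import COUNCIL_MODELS, CHAIRMAN_MODEL
from backend.council import run_council

async def ask(question: str) -> str:
    # Run the full 3-stage deliberation and return the chairman's answer.
    return await run_council(question, COUNCIL_MODELS, CHAIRMAN_MODEL)

demo = gr.Interface(
    fn=ask,
    inputs=gr.Textbox(label="Question", lines=3),
    outputs=gr.Markdown(label="Council answer"),
    title="LLM Council",
)

if __name__ == "__main__":
    demo.launch()  # Spaces serves on port 7860 by default
```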
## Environment Variables

### Required Variables

For local development (`.env` file):

```
OPENAI_API_KEY=sk-proj-your-key-here
HUGGINGFACE_API_KEY=hf_your-token-here
```

For HuggingFace Spaces (Settings → Repository secrets):
- Secret 1: `OPENAI_API_KEY=sk-proj-...`
- Secret 2: `HUGGINGFACE_API_KEY=hf_...`
### API Endpoints Used

**HuggingFace Inference API:**
- Endpoint: `https://router.huggingface.co/v1/chat/completions`
- Format: OpenAI-compatible
- Cost: FREE for the Inference API
- Models: Llama, Qwen, Mixtral, etc.

**OpenAI API:**
- Endpoint: `https://api.openai.com/v1/chat/completions`
- Format: native OpenAI
- Cost: pay-per-token (very cheap for mini/3.5-turbo)
- Models: gpt-4o-mini, gpt-3.5-turbo, gpt-4o
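Because the router is OpenAI-compatible, a raw request is just a standard chat-completions POST. A minimal sketch with `httpx` (the payload fields are the standard chat-completions ones; error handling trimmed for brevity):

```python
import os
import httpx

async def hf_chat(model: str, prompt: str) -> str:
    # Authenticate with the HuggingFace token; payload follows the OpenAI schema.
    headers = {"Authorization": f"Bearer {os.environ['HUGGINGFACE_API_KEY']}"}
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    async with httpx.AsyncClient(timeout=60.0) as client:
        r = await client.post(
            "https://router.huggingface.co/v1/chat/completions",
            headers=headers,
            json=payload,
        )
        r.raise_for_status()
        return r.json()["choices"][0]["message"]["content"]
```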
Create the `.env` file locally with the two keys shown above, and **do not** commit it to git. For Hugging Face Spaces, use Repository secrets instead of a `.env` file.
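Loading works the same way in both environments, because Spaces injects secrets as ordinary environment variables. A minimal sketch:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env locally; harmless no-op on Spaces

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
HUGGINGFACE_API_KEY = os.getenv("HUGGINGFACE_API_KEY")
if not (OPENAI_API_KEY and HUGGINGFACE_API_KEY):
    raise RuntimeError("Set OPENAI_API_KEY and HUGGINGFACE_API_KEY")
```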
## Dependencies

```
gradio>=6.0.0
httpx>=0.27.0
python-dotenv>=1.0.0
openai>=1.0.0  # For OpenAI API
```

Note: the system uses:
- `httpx` for async HTTP requests to the HuggingFace API
- the `openai` SDK for OpenAI API calls
- `python-dotenv` to load environment variables from `.env`
## Running Locally

```bash
# 1. Clone the repository (use your own Space URL)
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME

# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Create a .env file with both API keys
echo OPENAI_API_KEY=sk-proj-your-key-here > .env
echo HUGGINGFACE_API_KEY=hf_your-token-here >> .env

# 5. Run the app
python app.py
```
The app will be available at `http://localhost:7860`.
## Code Architecture

### Key Components

**1. Dual API Client** (`backend/api_client.py`):
- Supports both HuggingFace and OpenAI APIs
- Automatic retry logic with exponential backoff
- Graceful error handling and fallbacks
- Parallel model querying for efficiency
**2. FREE Model Configuration** (`backend/config_free.py`):
- Mix of FREE HuggingFace + cheap OpenAI models
- Configurable timeouts and retries
- Easy to customize and extend
**3. Council Orchestration** (`backend/council_free.py`):
- Stage 1: Parallel response collection
- Stage 2: Peer ranking system
- Stage 3: Chairman synthesis with streaming
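The exact Stage 2 aggregation scheme lives in the repo; one simple, plausible approach is averaging each anonymized response's rank position across all judges (lower is better). A hypothetical sketch:

```python
from collections import defaultdict

def aggregate_rankings(ballots: list[list[str]]) -> list[tuple[str, float]]:
    # Each ballot is one judge's ordering of response labels, best first.
    totals: dict[str, int] = defaultdict(int)
    for ballot in ballots:                          # e.g. ["B", "A", "C"]
        for position, label in enumerate(ballot, start=1):
            totals[label] += position
    avg = {label: total / len(ballots) for label, total in totals.items()}
    return sorted(avg.items(), key=lambda kv: kv[1])

# aggregate_rankings([["B", "A", "C"], ["A", "B", "C"]])
# -> [("B", 1.5), ("A", 1.5), ("C", 3.0)]
```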
### Error Handling Features
- Retry logic with exponential backoff (3 attempts; sketched below)
- Graceful handling of individual model failures
- Detailed error logging for debugging
- Timeout management (60s default)
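The retry behavior above can be captured by a small generic wrapper; this is an illustrative pattern, not the repo's exact code:

```python
import asyncio
import logging

logger = logging.getLogger("llm_council")

async def with_retries(coro_factory, attempts: int = 3, base_delay: float = 1.0):
    # Re-run the coroutine on failure, doubling the delay each time (1s, 2s, 4s...).
    for attempt in range(1, attempts + 1):
        try:
            return await coro_factory()
        except Exception as exc:  # network errors, 5xx responses, timeouts, ...
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

Usage would look like `answer = await with_retries(lambda: query_model(model, prompt))`.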
### Benefits of Current Architecture
- Cost Efficient: 60% FREE models, 40% ultra-cheap
- Robust: Retry logic handles transient failures
- Fast: Parallel execution minimizes wait time
- Flexible: Easy to add/remove models
- Observable: Detailed logging for debugging
## Performance Characteristics

### Typical Response Times (Current Setup)
- Stage 1: 10-30 seconds (5 models in parallel)
- Stage 2: 15-45 seconds (peer rankings)
- Stage 3: 15-40 seconds (synthesis with streaming)
- Total: ~40-115 seconds per question
### Cost per Query (Current Setup)
- FREE HuggingFace portion: $0.00 (3 models)
- OpenAI portion: $0.001-0.01 (2 models)
- Total: ~$0.001-0.01 per query
Comparison to alternatives:
- 90-99% cheaper than all-paid services
- Similar quality to premium setups
- Faster than sequential execution
*Costs vary based on prompt length and response complexity.*
## Troubleshooting

### Common Issues

**"401 Unauthorized" errors**
- Check that both API keys are set correctly
- Verify the OpenAI key starts with `sk-`
- Verify the HuggingFace token starts with `hf_`
- Ensure your OpenAI account has billing/credits enabled
- Check that the Space secrets are named exactly `OPENAI_API_KEY` and `HUGGINGFACE_API_KEY`

**Timeout errors**
- Increase the timeout in `backend/config_free.py`
- Check network connectivity
- Some models may be slow; consider replacing them

**Space won't start**
- Verify `requirements.txt` includes all dependencies
- Check the logs in the Space → Logs tab
- Ensure both secrets are added (not just one)
- Verify Python version compatibility (3.10+)

**Some models fail, others work**
- Normal! The system is designed to handle partial failures
- Check the logs to see which models failed
- The HuggingFace API may rate-limit (rare)
- The OpenAI API requires billing setup

**HuggingFace 410 error**
- The old endpoint is deprecated
- Ensure you are using `router.huggingface.co/v1/chat/completions`
- Update `backend/api_client.py` if needed
## Best Practices

### Model Selection
- Use 3-5 council members (sweet spot for quality vs speed)
- Mix FREE HuggingFace + cheap OpenAI for best value
- Choose diverse models for varied perspectives
- Match chairman to task complexity
### Cost Management
- Start with current setup ($0.001-0.01 per query)
- Consider all-FREE HuggingFace config for $0 cost
- Monitor OpenAI usage at platform.openai.com/usage
- Set spending limits in OpenAI billing settings
### Quality Optimization
- Use more council members for important queries (5-7)
- Use a stronger chairman (gpt-4o instead of gpt-4o-mini)
- Use the Premium Council for important queries; a reasoning-focused council suits math/logic problems
- Adjust timeouts based on model speed
- Test different model combinations
### Security
- NEVER commit `.env` to git (add it to `.gitignore`)
- Use HuggingFace Space secrets for production
- Rotate API keys periodically
- Monitor usage for anomalies
- Set spending limits
## Contributing
Suggestions for improvement:
- Add caching for repeated questions
- Implement conversation history
- Add custom model configurations via UI
- Support for different voting mechanisms
- Add cost tracking and estimates
## License
Check the original repository for license information.