Spaces:
Running
Running
| # π User Guide - ZeroGPU LLM Inference | |
| ## Quick Start (5 Minutes) | |
| ### 1. Choose Your Model | |
| The model dropdown shows 30+ options organized by size: | |
| - **Compact (<2B)**: Fast, lightweight - great for quick responses | |
| - **Mid-size (2-8B)**: Best balance of speed and quality | |
| - **Large (14B+)**: Highest quality, slower but more capable | |
| **Recommendation for beginners**: Start with `Qwen3-4B-Instruct-2507` | |
| ### 2. Try an Example Prompt | |
| Click on any example below the chat box to get started: | |
| - "Explain quantum computing in simple terms" | |
| - "Write a Python function..." | |
| - "What are the latest developments..." (requires web search) | |
| ### 3. Start Chatting! | |
| Type your message and press Enter or click "π€ Send" | |
| ## Core Features | |
| ### π¬ Chat Interface | |
| The main chat area shows: | |
| - Your messages on one side | |
| - AI responses with a π€ avatar | |
| - Copy button on each message | |
| - Smooth streaming as tokens generate | |
| **Tips:** | |
| - Press Enter to send (Shift+Enter for new line) | |
| - Click Copy button to save responses | |
| - Scroll up to review history | |
| - Use Clear Chat to start fresh | |
| ### π€ Model Selection | |
| **When to use each size:** | |
| | Model Size | Best For | Speed | Quality | | |
| |------------|----------|-------|---------| | |
| | <2B | Quick questions, testing | β‘β‘β‘ | ββ | | |
| | 2-8B | General chat, coding help | β‘β‘ | βββ | | |
| | 14B+ | Complex reasoning, long-form | β‘ | ββββ | | |
| **Specialized Models:** | |
| - **Phi-4-mini-Reasoning**: Math, logic problems | |
| - **Qwen2.5-Coder**: Programming tasks | |
| - **DeepSeek-R1-Distill**: Step-by-step reasoning | |
| - **Apriel-1.5-15b-Thinker**: Multimodal understanding | |
| ### π Web Search | |
| Enable this when you need: | |
| - Current events and news | |
| - Recent information (after model training cutoff) | |
| - Facts that change frequently | |
| - Real-time data | |
| **How it works:** | |
| 1. Toggle "π Enable Web Search" | |
| 2. Web search settings accordion appears | |
| 3. System prompt updates automatically | |
| 4. Search runs in background (won't block chat) | |
| 5. Results injected into context | |
| **Settings explained:** | |
| - **Max Results**: How many search results to fetch (4 is good default) | |
| - **Max Chars/Result**: Limit length per result (50 prevents overwhelming context) | |
| - **Search Timeout**: Maximum wait time (5s recommended) | |
| ### π System Prompt | |
| This defines the AI's personality and behavior. | |
| **Default prompts:** | |
| - Without search: Helpful, creative assistant | |
| - With search: Includes search results and current date | |
| **Customization ideas:** | |
| ``` | |
| You are a professional code reviewer... | |
| You are a creative writing coach... | |
| You are a patient tutor explaining concepts simply... | |
| You are a technical documentation writer... | |
| ``` | |
| ## Advanced Features | |
| ### ποΈ Advanced Generation Parameters | |
| Click the accordion to reveal these controls: | |
| #### Max Tokens (64-16384) | |
| - **What it does**: Sets maximum response length | |
| - **Lower (256-512)**: Quick, concise answers | |
| - **Medium (1024)**: Balanced (default) | |
| - **Higher (2048+)**: Long-form content, detailed explanations | |
| #### Temperature (0.1-2.0) | |
| - **What it does**: Controls randomness/creativity | |
| - **Low (0.1-0.3)**: Focused, deterministic (good for facts, code) | |
| - **Medium (0.7)**: Balanced creativity (default) | |
| - **High (1.2-2.0)**: Very creative, unpredictable (stories, brainstorming) | |
| #### Top-K (1-100) | |
| - **What it does**: Limits token choices to top K most likely | |
| - **Lower (10-20)**: More focused | |
| - **Medium (40)**: Balanced (default) | |
| - **Higher (80-100)**: More varied vocabulary | |
| #### Top-P (0.1-1.0) | |
| - **What it does**: Nucleus sampling threshold | |
| - **Lower (0.5-0.7)**: Conservative choices | |
| - **Medium (0.9)**: Balanced (default) | |
| - **Higher (0.95-1.0)**: Full vocabulary range | |
| #### Repetition Penalty (1.0-2.0) | |
| - **What it does**: Reduces repeated words/phrases | |
| - **Low (1.0-1.1)**: Allows some repetition | |
| - **Medium (1.2)**: Balanced (default) | |
| - **High (1.5+)**: Strongly avoids repetition (may hurt coherence) | |
| ### Preset Configurations | |
| **For Creative Writing:** | |
| ``` | |
| Temperature: 1.2 | |
| Top-P: 0.95 | |
| Top-K: 80 | |
| Max Tokens: 2048 | |
| ``` | |
| **For Code Generation:** | |
| ``` | |
| Temperature: 0.3 | |
| Top-P: 0.9 | |
| Top-K: 40 | |
| Max Tokens: 1024 | |
| Repetition Penalty: 1.1 | |
| ``` | |
| **For Factual Q&A:** | |
| ``` | |
| Temperature: 0.5 | |
| Top-P: 0.85 | |
| Top-K: 30 | |
| Max Tokens: 512 | |
| Enable Web Search: Yes | |
| ``` | |
| **For Reasoning Tasks:** | |
| ``` | |
| Model: Phi-4-mini-Reasoning or DeepSeek-R1 | |
| Temperature: 0.7 | |
| Max Tokens: 2048 | |
| ``` | |
| ## Tips & Tricks | |
| ### π― Getting Better Results | |
| 1. **Be Specific**: "Write a Python function to sort a list" β "Write a Python function that sorts a list of dictionaries by a specific key" | |
| 2. **Provide Context**: "Explain recursion" β "Explain recursion to someone learning programming for the first time, with a simple example" | |
| 3. **Use System Prompts**: Define role/expertise in system prompt instead of every message | |
| 4. **Iterate**: Use follow-up questions to refine responses | |
| 5. **Experiment with Models**: Try different models for the same task | |
| ### β‘ Performance Tips | |
| 1. **Start Small**: Test with smaller models first | |
| 2. **Adjust Max Tokens**: Don't request more than you need | |
| 3. **Use Cancel**: Stop bad generations early | |
| 4. **Clear Cache**: Clear chat if experiencing slowdowns | |
| 5. **One Task at a Time**: Don't send multiple requests simultaneously | |
| ### π When to Use Web Search | |
| **β Good use cases:** | |
| - "What happened in the latest SpaceX launch?" | |
| - "Current cryptocurrency prices" | |
| - "Recent AI research papers" | |
| - "Today's weather in Paris" | |
| **β Don't need search for:** | |
| - General knowledge questions | |
| - Code writing/debugging | |
| - Math problems | |
| - Creative writing | |
| - Theoretical explanations | |
| ### π Understanding Thinking Mode | |
| Some models output `<think>...</think>` blocks: | |
| ``` | |
| <think> | |
| Let me break this down step by step... | |
| First, I need to consider... | |
| </think> | |
| Here's the answer: ... | |
| ``` | |
| **In the UI:** | |
| - Thinking shows as "π Thought" | |
| - Answer shows separately | |
| - Helps you see the reasoning process | |
| **Best for:** | |
| - Complex math problems | |
| - Multi-step reasoning | |
| - Debugging logic | |
| - Learning how AI thinks | |
| ## Troubleshooting | |
| ### Generation is Slow | |
| - Try a smaller model | |
| - Reduce Max Tokens | |
| - Disable web search if not needed | |
| - Clear chat history | |
| ### Responses are Repetitive | |
| - Increase Repetition Penalty | |
| - Reduce Temperature slightly | |
| - Try different model | |
| ### Responses are Random/Nonsensical | |
| - Decrease Temperature | |
| - Reduce Top-P | |
| - Reduce Top-K | |
| - Try more stable model | |
| ### Web Search Not Working | |
| - Check timeout isn't too short | |
| - Verify internet connection | |
| - Try increasing Max Results | |
| - Check search query in debug panel | |
| ### Cancel Button Doesn't Work | |
| - Wait a moment (might be processing) | |
| - Refresh page if persists | |
| - Check browser console for errors | |
| ## Keyboard Shortcuts | |
| - **Enter**: Send message | |
| - **Shift+Enter**: New line in input | |
| - **Ctrl+C**: Copy (when text selected) | |
| - **Ctrl+A**: Select all in input | |
| ## Best Practices | |
| ### For Beginners | |
| 1. Start with example prompts | |
| 2. Use default settings initially | |
| 3. Try 2-4 different models | |
| 4. Gradually explore advanced settings | |
| 5. Read responses fully before replying | |
| ### For Power Users | |
| 1. Create custom system prompts | |
| 2. Fine-tune parameters per task | |
| 3. Use debug panel for prompt engineering | |
| 4. Experiment with model combinations | |
| 5. Utilize web search strategically | |
| ### For Developers | |
| 1. Study the debug output | |
| 2. Test code generation thoroughly | |
| 3. Use lower temperature for determinism | |
| 4. Compare multiple models | |
| 5. Save working configurations | |
| ## Privacy & Safety | |
| - **No data collection**: Conversations not stored permanently | |
| - **Model limitations**: May produce incorrect information | |
| - **Verify important info**: Don't rely solely on AI for critical decisions | |
| - **Web search**: Uses DuckDuckGo (privacy-focused) | |
| - **Open source**: Code is transparent and auditable | |
| ## Support & Feedback | |
| Found a bug? Have a suggestion? | |
| - Check GitHub issues | |
| - Submit feature requests | |
| - Contribute improvements | |
| - Share your use cases | |
| --- | |
| **Happy chatting! π** | |