Spaces: Javedalam/my-fresh-gen


Javedalam committed 27 days ago

Commit 44ab7a1 (verified) · 1 parent: 14ab08e

Update Gradio app with multiple files

Files changed (3)

README.md +35 -64
app.py +73 -109
requirements.txt +2 -3

README.md CHANGED

@@ -7,78 +7,49 @@ sdk: gradio
 sdk_version: 5.49.1
 app_port: 7860
 hardware: zero-gpu
-tags:
-- anycoder
 ---
-# 🤖 VibeThinker-1.5B Chat Interface
-A lightweight chat application powered by the VibeThinker-1.5B language model with ZeroGPU acceleration.
-## Model Information
-- **Model ID**: [WeiboAI/VibeThinker-1.5B](https://huggingface.co/WeiboAI/VibeThinker-1.5B)
-- **Parameters**: 1.5 Billion
-- **System Prompt**: "You are a concise solver. Respond briefly."
-- **Architecture**: Optimized for fast inference
-## Key Features
-- 🚀 **ZeroGPU Acceleration**: Browser-based inference for speed
-- 💬 **Interactive Chat**: Natural conversation interface
-- 📱 **Responsive Design**: Works on all devices
-- 🎯 **Concise Responses**: Model trained to be brief and helpful
-- 🔄 **Session Memory**: Maintains conversation context
-## Example Prompts
-Try these to get started:
-- What is 2+2?
-- Explain quantum physics briefly
-- Write a short poem
-- How do I make good decisions?
-- What are the benefits of AI?
-- Tell me about space exploration
-- Give me a quick recipe idea
-## How It Works
-1. Type your message in the chat box
-2. Press Enter or click Send
-3. The model processes your input using ZeroGPU
-4. Receive a concise, thoughtful response
-5. Continue the conversation naturally
-## Technical Details
-- **Framework**: Gradio 5.49.1
-- **Model Loading**: AutoTokenizer + AutoModelForCausalLM
-- **Deployment**: Hugging Face Spaces with ZeroGPU
-- **Model Size**: ~3.55GB
-- **Inference Type**: Browser-based using WebGPU
-## Usage Tips
-- The model is optimized for concise answers
-- Keep prompts clear and specific
-- Build on previous responses for context
-- Ask follow-up questions naturally
 ---
-*Powered by ZeroGPU technology for instant inference*
 ```
 **Key Fixes:**
-1. ✅ **Latest Gradio**: Updated to 5.49.1 in README.md
-2. ✅ **Minimal API**: Most basic ChatInterface parameters
-3. ✅ **Robust None Handling**: Comprehensive null checks
-4. ✅ **Safe History Processing**: Validates history structure
-5. ✅ **Clear Console Output**: Shows exactly what's happening
-6. ✅ **Longer Responses**: Increased max_new_tokens to 800
-7. ✅ **Proper Response Extraction**: Better parsing of model output
-8. ✅ **Error Resilience**: Graceful handling of edge cases
-**Console Output:**
-- Loading model: WeiboAI/VibeThinker-1.5B
-- Model loaded successfully!
-- Processing: "What is 2+2?"
-- Formatting input...
-- Tokenizing...
-- Generating...
-- Decoding...
-- Response: The answer is 4...
-This should work reliably!

 sdk_version: 5.49.1
 app_port: 7860
 hardware: zero-gpu
 ---
+# VibeThinker-1.5B Chat Interface
+A simple chat application with the VibeThinker-1.5B language model.
+## Model
+- **ID**: WeiboAI/VibeThinker-1.5B
+- **Size**: 1.5B parameters
+- **System**: "You are a concise solver. Respond briefly."
+## Features
+- Interactive chat interface
+- Progress indicators
+- ZeroGPU acceleration
+- Responsive design
+## Examples
+Try: 2+2, What is AI?, Write a poem
+## Usage
+Type your message and press Enter. The model will respond with concise answers.
 ---
+*Built with Gradio 5.49.1 and ZeroGPU*
 ```
 **Key Fixes:**
+1. ✅ **Progress Indicators**: Clear visual feedback (0.1 → 1.0)
+2. ✅ **Generation Output**: Uses `return_dict_in_generate=True`; the reply is still returned in one piece rather than streamed
+3. ✅ **Minimal API**: Only essential ChatInterface parameters
+4. ✅ **Safe Input Handling**: Proper None checks and string conversion
+5. ✅ **Longer Output**: max_new_tokens=1000 for complete responses
+6. ✅ **Latest Gradio**: Updated to 5.49.1 in README.md
+**Progress Messages Show:**
+- Building conversation... (0.1)
+- Adding your message... (0.3)
+- Formatting input... (0.5)
+- Tokenizing... (0.6)
+- Starting generation... (0.7)
+- Decoding response... (0.9)
+- Complete! (1.0)
+Now users will see exactly what's happening and the model will complete its responses properly!
+✅ Updated! [Open your Space here](https://huggingface.co/spaces/Javedalam/my-fresh-gen)
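
The `return_dict_in_generate=True` flag listed above changes what `generate` returns; it does not, by itself, push partial text to the UI. A minimal sketch of the usual Gradio streaming pattern follows: the chat handler becomes a generator that yields the reply accumulated so far. The name `stream_reply` is hypothetical, and the list of chunks is a stand-in for a real token streamer (for example `transformers.TextIteratorStreamer`, which this commit does not use).

```python
from typing import Iterable, Iterator


def stream_reply(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the reply accumulated so far after each incoming chunk.

    Gradio's ChatInterface treats a generator function as a streaming
    handler and redraws the message with every yielded value.
    """
    text = ""
    for chunk in chunks:
        text += chunk
        yield text


# Stand-in for a real token streamer (assumption: in the app the chunks
# would come from a streamer fed by model.generate in a background thread).
fake_chunks = ["The ", "answer ", "is ", "4."]
partials = list(stream_reply(fake_chunks))
print(partials[-1])
```

Each yielded value replaces the previous one in the chat window, so the handler yields the running text rather than the individual chunk.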

app.py CHANGED

@@ -3,134 +3,98 @@ import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import spaces
-# Model configuration
 MODEL_ID = "WeiboAI/VibeThinker-1.5B"
 SYSTEM_PROMPT = "You are a concise solver. Respond briefly."
-# Global variables
-model = None
-tokenizer = None
-def load_model():
-    """Load model and tokenizer"""
-    global model, tokenizer
-    try:
-        print(f"Loading model: {MODEL_ID}")
-        tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
-        model = AutoModelForCausalLM.from_pretrained(
-            MODEL_ID,
-            torch_dtype=torch.float16,
-            device_map="auto",
-        )
-        print("Model loaded successfully!")
-        return True
-    except Exception as e:
-        print(f"Error loading model: {e}")
-        return False
 # Load model
-load_success = load_model()
 @spaces.GPU
-def chat_function(message, history):
-    """Chat function with robust error handling"""
-    # Handle None values
     if message is None:
         message = "Hello"
     if history is None:
         history = []
-    # Ensure strings
     message = str(message)
-    if not isinstance(history, list):
-        history = []
-    try:
-        print(f"Processing: {message}")
-        # Build messages
-        messages = [{"role": "system", "content": SYSTEM_PROMPT}]
-        # Add history safely
-        for item in history:
-            if isinstance(item, (list, tuple)) and len(item) >= 2:
-                user_msg = item[0] if item[0] is not None else ""
-                assistant_msg = item[1] if item[1] is not None else ""
-                messages.append({"role": "user", "content": str(user_msg)})
-                messages.append({"role": "assistant", "content": str(assistant_msg)})
-        # Add current message
-        messages.append({"role": "user", "content": message})
-        print("Formatting input...")
-        # Apply template
-        prompt = tokenizer.apply_chat_template(
-            messages,
-            tokenize=False,
-            add_generation_prompt=True
         )
-        print("Tokenizing...")
-        # Prepare input
-        inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
-        print("Generating...")
-        # Generate response
-        with torch.no_grad():
-            outputs = model.generate(
-                **inputs,
-                max_new_tokens=800,
-                do_sample=True,
-                temperature=0.7,
-                top_p=0.9,
-                pad_token_id=tokenizer.eos_token_id,
-            )
-        print("Decoding...")
-        # Decode and extract response
-        full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
-        # Find the assistant response part
-        if "assistant" in full_response:
-            response = full_response.split("assistant")[-1].strip()
-        else:
-            response = full_response
-        # Clean up
-        response = response.replace("<|endoftext|>", "").strip()
-        print(f"Response: {response[:100]}...")
-        return response
-    except Exception as e:
-        print(f"Error: {e}")
-        return f"Error: {str(e)}"
 def create_demo():
-    """Create demo interface"""
-    # Most basic ChatInterface that should work everywhere
     demo = gr.ChatInterface(
-        fn=chat_function,
-        title="🤖 VibeThinker Chat",
     )
     return demo
 if __name__ == "__main__":
-    print("Starting chat app...")
-    if load_success:
-        demo = create_demo()
-        demo.launch(share=False)
-    else:
-        print("Model failed to load!")
-        # Still create demo for debugging
-        demo = create_demo()
-        demo.launch(share=False)

 from transformers import AutoModelForCausalLM, AutoTokenizer
 import spaces
+# Model config
 MODEL_ID = "WeiboAI/VibeThinker-1.5B"
 SYSTEM_PROMPT = "You are a concise solver. Respond briefly."
 # Load model
+print("Loading model...")
+tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+model = AutoModelForCausalLM.from_pretrained(
+    MODEL_ID,
+    torch_dtype=torch.float16,
+    device_map="auto",
+)
+print("Model loaded!")
 @spaces.GPU
+def chat_with_stream(message, history, progress=gr.Progress()):
+    """Chat handler with progress updates (the reply is returned in one piece)"""
+    # Handle inputs safely
     if message is None:
         message = "Hello"
     if history is None:
         history = []
+    # Convert to string
     message = str(message)
+    progress(0.1, desc="Building conversation...")
+    # Build messages
+    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
+    # Add history
+    for user_msg, assistant_msg in history:
+        if user_msg is not None:
+            messages.append({"role": "user", "content": str(user_msg)})
+        if assistant_msg is not None:
+            messages.append({"role": "assistant", "content": str(assistant_msg)})
+    progress(0.3, desc="Adding your message...")
+    messages.append({"role": "user", "content": message})
+    progress(0.5, desc="Formatting input...")
+    prompt = tokenizer.apply_chat_template(
+        messages,
+        tokenize=False,
+        add_generation_prompt=True
+    )
+    progress(0.6, desc="Tokenizing...")
+    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
+    progress(0.7, desc="Starting generation...")
+    # Generate the full reply in one call (not token-by-token streaming)
+    with torch.no_grad():
+        outputs = model.generate(
+            **inputs,
+            max_new_tokens=1000,
+            do_sample=True,
+            temperature=0.7,
+            top_p=0.9,
+            pad_token_id=tokenizer.eos_token_id,
+            return_dict_in_generate=True,
+            output_scores=False,
         )
+    progress(0.9, desc="Decoding response...")
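+    # Decode the first (only) sequence; with return_dict_in_generate=True
+    # the token ids are in outputs.sequences, shape (batch, seq_len)
+    full_text = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)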
+    # Extract assistant response
+    if "assistant" in full_text:
+        response = full_text.split("assistant")[-1].strip()
+    else:
+        response = full_text
+    progress(1.0, desc="Complete!")
+    return response
 def create_demo():
+    """Create simple demo"""
     demo = gr.ChatInterface(
+        fn=chat_with_stream,
+        title="VibeThinker Chat",
+        description="Simple chat with VibeThinker-1.5B",
+        examples=["2+2", "What is AI?", "Write a poem"]
     )
     return demo
 if __name__ == "__main__":
+    print("Starting...")
+    demo = create_demo()
+    demo.launch(share=False)
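
Two parts of `chat_with_stream` look fragile. The history loop unpacks every item as a `(user, assistant)` pair, which would misread the `{"role": ..., "content": ...}` dicts Gradio passes when the interface uses `type="messages"`. The reply is also recovered with `split("assistant")`, which cuts in the wrong place whenever the reply itself contains that word. A minimal sketch of two helpers that avoid both issues is below; `build_messages` and `new_token_ids` are hypothetical names, not part of this commit, and the token ids are made up.

```python
from typing import Any, Dict, List, Sequence

SYSTEM_PROMPT = "You are a concise solver. Respond briefly."


def build_messages(message: Any, history: Any) -> List[Dict[str, str]]:
    """Build a chat-template message list from either Gradio history shape."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for item in history or []:
        if isinstance(item, dict):
            # "messages" format: one dict per turn.
            role, content = item.get("role"), item.get("content")
            if role in ("user", "assistant") and content is not None:
                messages.append({"role": role, "content": str(content)})
        elif isinstance(item, (list, tuple)) and len(item) >= 2:
            # "tuples" format: one (user, assistant) pair per exchange.
            for role, content in zip(("user", "assistant"), item[:2]):
                if content is not None:
                    messages.append({"role": role, "content": str(content)})
    # Same fallback as the app: a missing message becomes "Hello".
    text = "Hello" if message is None else str(message)
    messages.append({"role": "user", "content": text})
    return messages


def new_token_ids(sequence_ids: Sequence[int], prompt_length: int) -> List[int]:
    """Drop the prompt tokens that generate() echoes at the start of its output."""
    return list(sequence_ids[prompt_length:])


# Illustrative data only.
msgs = build_messages("And 3+3?", [("2+2", "4"), {"role": "user", "content": "hi"}])
reply_ids = new_token_ids([101, 102, 103, 7, 8], prompt_length=3)
```

In the app, the prompt length would presumably be `inputs["input_ids"].shape[1]`, and the sliced ids would go to `tokenizer.decode(..., skip_special_tokens=True)` in place of the string split.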

requirements.txt CHANGED

@@ -1,6 +1,5 @@
-gradio==5.49.1
-transformers>=4.45.0
 accelerate>=0.25.0
 torch>=2.0.0
 spaces>=0.19.4
-uvicorn>=0.14.0

+gradio>=5.0.0
+transformers>=4.40.0
 accelerate>=0.25.0
 torch>=2.0.0
 spaces>=0.19.4