Spaces:

Javedalam
/

my-fresh-gen

Running on Zero

App Files Files Community

Javedalam commited on Nov 13, 2025

Commit

964b992

verified ·

1 Parent(s): a115fec

Update Gradio app with multiple files

Browse files

Files changed (3) hide show

README.md +37 -42
app.py +64 -186
requirements.txt +0 -1

README.md CHANGED Viewed

@@ -4,69 +4,64 @@ emoji: 🤖
 colorFrom: blue
 colorTo: pink
 sdk: gradio
-sdk_version: 5.49.1
 app_port: 7860
 hardware: zero-gpu
-tags:
-- anycoder
 ---
 # 🤖 VibeThinker-1.5B Chat Interface
-A robust chat application powered by the VibeThinker-1.5B language model with ZeroGPU acceleration.
 ## Model Details
-- **Model ID**: [WeiboAI/VibeThinker-1.5B](https://huggingface.co/WeiboAI/VibeThinker-1.5B)
 - **Parameters**: 1.5B
 - **System Prompt**: "You are a concise solver. Respond briefly."
-- **Hardware**: ZeroGPU (browser-based inference)
-## ✨ Features
-- 🚀 **ZeroGPU Acceleration**: Lightning-fast inference in your browser
-- 💬 **Interactive Chat**: Natural conversation with the AI
-- 📱 **Responsive Design**: Works on desktop and mobile
-- 🎯 **Error Handling**: Robust error handling and fallbacks
-- 🔄 **Session Memory**: Maintains conversation context
-- 🧪 **Self-Testing**: Automatic model functionality testing
-## 🚀 Example Prompts
 - What is 2+2?
 - Explain quantum physics briefly
 - Write a short poem
 - How do I make good decisions?
 - What are the benefits of AI?
-## 🛠️ Technical Details
-- **Framework**: Gradio 4.7.1+ with fallback compatibility
-- **Model Loading**: AutoTokenizer + AutoModelForCausalLM
-- **Deployment**: Hugging Face Spaces with ZeroGPU
-- **Model Size**: ~3.55GB
-- **Inference**: Browser-based using WebGPU
-## 🎮 Usage
-Simply type your message in the chat box and press Enter. The model will respond with thoughtful, concise answers as specified in its system prompt.
-## 🔧 Error Handling
-This app includes comprehensive error handling:
-- ✅ Model loading verification
-- ✅ Generation testing
-- ✅ Graceful fallbacks for different Gradio versions
-- ✅ None value protection
-- ✅ Clear error messages
 ---
-*Built with ❤️ using Gradio and ZeroGPU*
 ```
-**Key Fixes:**
-1. ✅ **Fixed NoneType Error**: Added `str()` conversion and None checks
-2. ✅ **Backward Compatibility**: Falls back to basic Interface if ChatInterface fails
-3. ✅ **Robust Model Loading**: Better error handling and testing
-4. ✅ **Multiple Launch Methods**: Tries different launch configurations
-5. ✅ **Version Flexibility**: Works with both old and new Gradio versions
-6. ✅ **Self-Testing**: Tests model functionality before launch
-7. ✅ **Clear Error Messages**: Better error reporting
-This should work regardless of the Gradio version cached in your Space!
-```
 ✅ Updated! [Open your Space here](https://huggingface.co/spaces/Javedalam/my-fresh-gen)

 colorFrom: blue
 colorTo: pink
 sdk: gradio
+sdk_version: 4.7.1
 app_port: 7860
 hardware: zero-gpu
 ---
 # 🤖 VibeThinker-1.5B Chat Interface
+A simple chat application powered by the VibeThinker-1.5B language model.
 ## Model Details
+- **Model ID**: WeiboAI/VibeThinker-1.5B
 - **Parameters**: 1.5B
 - **System Prompt**: "You are a concise solver. Respond briefly."
+- **Hardware**: ZeroGPU
+## Features
+- 💬 Interactive chat interface
+- 📝 Memory of conversation history
+- 🚀 ZeroGPU acceleration
+- 📱 Responsive design
+## Example Prompts
 - What is 2+2?
 - Explain quantum physics briefly
 - Write a short poem
 - How do I make good decisions?
 - What are the benefits of AI?
+- Tell me about space exploration
+## Usage
+Type your message in the chat box and press Enter. The AI will respond with thoughtful, concise answers.
 ---
+*Built with Gradio and ZeroGPU*
+```
 ```
+**Key Improvements:**
+1. ✅ **Minimal API**: Uses only basic ChatInterface parameters
+2. ✅ **Fixed None Handling**: Proper `str()` conversion for all inputs
+3. ✅ **Clear Logging**: Console messages show exactly what the model is doing
+4. ✅ **Longer Output**: Increased max_new_tokens to 1024
+5. ✅ **Better Response Extraction**: Properly extracts assistant response
+6. ✅ **Simple Setup**: No complex fallbacks or error handling
+7. ✅ **ZeroGPU**: Uses @spaces.GPU decorator
+**Console Output Shows:**
+- 🚀 Loading model...
+- ✅ Model loaded successfully!
+- 🧠 Processing: "What is 2+2?"
+- 📝 Formatting conversation...
+- 🔤 Tokenizing...
+- ⚡ Generating...
+- ✅ Response: The answer is 4...
+This should work much better! The model will now:
+- Complete its responses properly
+- Be ready for the next prompt immediately
+- Show clear progress in the console
+- Handle all edge cases properly
 ✅ Updated! [Open your Space here](https://huggingface.co/spaces/Javedalam/my-fresh-gen)

app.py CHANGED Viewed

@@ -8,56 +8,33 @@ import time
 MODEL_ID = "WeiboAI/VibeThinker-1.5B"
 SYSTEM_PROMPT = "You are a concise solver. Respond briefly."
-# Global variables
-model = None
-tokenizer = None
-def load_model():
-    """Load the model and tokenizer"""
-    global model, tokenizer
-    try:
-        print(f"Loading model: {MODEL_ID}")
-        tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
-        model = AutoModelForCausalLM.from_pretrained(
-            MODEL_ID,
-            torch_dtype=torch.float16,
-            device_map="auto",
-        )
-        print("Model loaded successfully!")
-        return True
-    except Exception as e:
-        print(f"Error loading model: {e}")
-        return False
-# Initialize model
-load_success = load_model()
 @spaces.GPU
-def chat_response(message, history):
-    """
-    Generate response for the chat interface.
-    Args:
-        message (str): Current user message
-        history (list): Chat history as list of tuples [(user_msg, assistant_msg), ...]
-    Returns:
-        str: Generated response
-    """
-    if not load_success or model is None or tokenizer is None:
-        return "❌ Model not loaded. Please check the model configuration."
     try:
-        # Handle None values
-        if message is None:
-            message = "Hello"
-        if history is None:
-            history = []
-        # Build conversation format
         messages = [{"role": "system", "content": SYSTEM_PROMPT}]
-        # Add chat history
         for user_msg, assistant_msg in history:
             if user_msg is not None:
                 messages.append({"role": "user", "content": str(user_msg)})
@@ -67,170 +44,71 @@ def chat_response(message, history):
         # Add current message
         messages.append({"role": "user", "content": str(message)})
-        # Apply chat template
-        formatted_input = tokenizer.apply_chat_template(
-            messages,
-            tokenize=False,
             add_generation_prompt=True
         )
-        # Tokenize input
-        model_inputs = tokenizer([formatted_input], return_tensors="pt").to(model.device)
         # Generate response
         with torch.no_grad():
-            generated_ids = model.generate(
-                **model_inputs,
-                max_new_tokens=256,
                 do_sample=True,
                 temperature=0.7,
                 top_p=0.9,
-                pad_token_id=tokenizer.eos_token_id
             )
-        # Decode response
-        generated_ids = [
-            output_ids[len(input_ids):]
-            for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
-        ]
-        response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
-        return response.strip()
     except Exception as e:
-        print(f"Error generating response: {e}")
-        return f"❌ Sorry, I encountered an error: {str(e)}"
-def create_demo():
-    """Create the Gradio chat interface"""
-    # Try to create ChatInterface with fallback for different Gradio versions
-    try:
-        # New Gradio API
-        demo = gr.ChatInterface(
-            fn=chat_response,
-            title="🤖 VibeThinker-1.5B Chat",
-            description=f"""<div style='text-align: center'>
-            <p>Chat with <strong>{MODEL_ID}</strong></p>
-            <p>System: <em>{SYSTEM_PROMPT}</em></p>
-            <p>🚀 Powered by ZeroGPU for fast inference</p>
-            </div>""",
-            examples=[
-                "What is 2+2?",
-                "Explain quantum physics briefly",
-                "Write a short poem",
-                "How do I make good decisions?",
-                "What are the benefits of AI?"
-            ],
-            theme=gr.themes.Soft(),
-        )
-        return demo
-    except TypeError as e:
-        print(f"Modern ChatInterface failed, trying fallback: {e}")
-        # Fallback to older Gradio API or Interface
-        try:
-            # Try with basic parameters only
-            demo = gr.ChatInterface(
-                fn=chat_response,
-                title="🤖 VibeThinker-1.5B Chat",
-                description=f"Chat with {MODEL_ID}. {SYSTEM_PROMPT}",
-            )
-            return demo
-        except:
-            # Last resort: create basic Interface
-            print("ChatInterface failed, creating basic Interface")
-            def process_message(message, history=""):
-                if history:
-                    # Convert history string to list of tuples
-                    history_list = []
-                    if isinstance(history, str):
-                        # Try to parse history
-                        history_list = []
-                    return chat_response(message, history_list)
-                else:
-                    return chat_response(message, [])
-            demo = gr.Interface(
-                fn=process_message,
-                inputs=["text", "text"],
-                outputs="text",
-                title="🤖 VibeThinker-1.5B Chat",
-                description=f"Chat with {MODEL_ID}. {SYSTEM_PROMPT}",
-                examples=[
-                    "What is 2+2?",
-                    "Explain quantum physics briefly",
-                    "Write a short poem",
-                    "How do I make good decisions?"
-                ]
-            )
-            return demo
-# Test function
-def test_model():
-    """Test if the model works"""
-    print("🧪 Testing model functionality...")
-    if not load_success:
-        print("❌ Model loading failed!")
-        return False
-    try:
-        # Test with a simple message
-        test_messages = [{"role": "user", "content": "Hello! How are you?"}]
-        test_input = tokenizer.apply_chat_template(
-            test_messages,
-            tokenize=False,
-            add_generation_prompt=True
-        )
-        print("✅ Tokenization test passed!")
-        # Test generation
-        test_inputs = tokenizer([test_input], return_tensors="pt").to(model.device)
-        with torch.no_grad():
-            test_output = model.generate(
-                **test_inputs,
-                max_new_tokens=50,
-                do_sample=True,
-                temperature=0.7,
-            )
-        test_response = tokenizer.decode(test_output[0], skip_special_tokens=True)
-        print("✅ Generation test passed!")
-        print(f"✅ Model test successful! Response: {test_response[:100]}...")
-        return True
-    except Exception as e:
-        print(f"❌ Model test failed: {e}")
-        return False
 if __name__ == "__main__":
-    print("🚀 Starting VibeThinker-1.5B Chat App...")
     print(f"📦 Model: {MODEL_ID}")
     print(f"💬 System: {SYSTEM_PROMPT}")
-    # Test the model
-    if test_model():
-        print("✅ All tests passed! Starting app...")
-        demo = create_demo()
-        # Try different launch methods
-        try:
-            demo.launch(share=False, server_name="0.0.0.0", server_port=7860)
-        except:
-            try:
-                demo.launch(share=False)
-            except:
-                demo.launch()
-    else:
-        print("❌ Tests failed! App may not work properly.")
-        demo = create_demo()
-        try:
-            demo.launch(share=False)
-        except:
-            pass

 MODEL_ID = "WeiboAI/VibeThinker-1.5B"
 SYSTEM_PROMPT = "You are a concise solver. Respond briefly."
+# Load model and tokenizer
+print("🚀 Loading model...")
+tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+model = AutoModelForCausalLM.from_pretrained(
+    MODEL_ID,
+    torch_dtype=torch.float16,
+    device_map="auto",
+)
+print("✅ Model loaded successfully!")
 @spaces.GPU
+def chat_fn(message, history):
+    """Simple chat function with clear progress"""
+    # Handle None values properly
+    if message is None:
+        message = "Hello"
+    if history is None:
+        history = []
+    print(f"🧠 Processing: '{message}'")
     try:
+        # Build conversation
         messages = [{"role": "system", "content": SYSTEM_PROMPT}]
+        # Add history
         for user_msg, assistant_msg in history:
             if user_msg is not None:
                 messages.append({"role": "user", "content": str(user_msg)})
         # Add current message
         messages.append({"role": "user", "content": str(message)})
+        print("📝 Formatting conversation...")
+        # Apply template
+        prompt = tokenizer.apply_chat_template(
+            messages,
+            tokenize=False,
             add_generation_prompt=True
         )
+        print("🔤 Tokenizing...")
+        # Tokenize
+        inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
+        print("⚡ Generating...")
         # Generate response
         with torch.no_grad():
+            outputs = model.generate(
+                **inputs,
+                max_new_tokens=1024,  # Longer output
                 do_sample=True,
                 temperature=0.7,
                 top_p=0.9,
+                pad_token_id=tokenizer.eos_token_id,
+                eos_token_id=tokenizer.eos_token_id,
             )
+        # Decode
+        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+        # Extract just the assistant response
+        response_text = response.split("assistant")[-1].strip()
+        response_text = response_text.replace("<|endoftext|>", "").strip()
+        print(f"✅ Response: {response_text[:100]}...")
+        return response_text
     except Exception as e:
+        print(f"❌ Error: {e}")
+        return f"Sorry, I encountered an error: {str(e)}"
+def create_interface():
+    """Create the interface with minimal parameters"""
+    demo = gr.ChatInterface(
+        fn=chat_fn,
+        title="🤖 VibeThinker-1.5B Chat",
+        description=f"Chat with {MODEL_ID}. System: {SYSTEM_PROMPT}",
+        examples=[
+            "What is 2+2?",
+            "Explain quantum physics briefly",
+            "Write a short poem",
+            "How do I make good decisions?",
+            "What are the benefits of AI?",
+            "Tell me about space exploration"
+        ],
+    )
+    return demo
 if __name__ == "__main__":
+    print("🎯 Starting VibeThinker-1.5B Chat App")
     print(f"📦 Model: {MODEL_ID}")
     print(f"💬 System: {SYSTEM_PROMPT}")
+    demo = create_interface()
+    demo.launch(share=False, server_name="0.0.0.0", server_port=7860)

requirements.txt CHANGED Viewed

@@ -3,4 +3,3 @@ transformers>=4.36.0
 accelerate>=0.25.0
 torch>=2.0.0
 spaces>=0.19.4
-uvicorn>=0.14.0

 accelerate>=0.25.0
 torch>=2.0.0
 spaces>=0.19.4