Javedalam commited on
Commit
964b992
ยท
verified ยท
1 Parent(s): a115fec

Update Gradio app with multiple files

Browse files
Files changed (3) hide show
  1. README.md +37 -42
  2. app.py +64 -186
  3. requirements.txt +0 -1
README.md CHANGED
@@ -4,69 +4,64 @@ emoji: ๐Ÿค–
4
  colorFrom: blue
5
  colorTo: pink
6
  sdk: gradio
7
- sdk_version: 5.49.1
8
  app_port: 7860
9
  hardware: zero-gpu
10
- tags:
11
- - anycoder
12
  ---
13
  # ๐Ÿค– VibeThinker-1.5B Chat Interface
14
 
15
- A robust chat application powered by the VibeThinker-1.5B language model with ZeroGPU acceleration.
16
 
17
  ## Model Details
18
- - **Model ID**: [WeiboAI/VibeThinker-1.5B](https://huggingface.co/WeiboAI/VibeThinker-1.5B)
19
  - **Parameters**: 1.5B
20
  - **System Prompt**: "You are a concise solver. Respond briefly."
21
- - **Hardware**: ZeroGPU (browser-based inference)
22
 
23
- ## โœจ Features
24
- - ๐Ÿš€ **ZeroGPU Acceleration**: Lightning-fast inference in your browser
25
- - ๐Ÿ’ฌ **Interactive Chat**: Natural conversation with the AI
26
- - ๐Ÿ“ฑ **Responsive Design**: Works on desktop and mobile
27
- - ๐ŸŽฏ **Error Handling**: Robust error handling and fallbacks
28
- - ๐Ÿ”„ **Session Memory**: Maintains conversation context
29
- - ๐Ÿงช **Self-Testing**: Automatic model functionality testing
30
 
31
- ## ๐Ÿš€ Example Prompts
32
  - What is 2+2?
33
  - Explain quantum physics briefly
34
  - Write a short poem
35
  - How do I make good decisions?
36
  - What are the benefits of AI?
 
37
 
38
- ## ๐Ÿ› ๏ธ Technical Details
39
- - **Framework**: Gradio 4.7.1+ with fallback compatibility
40
- - **Model Loading**: AutoTokenizer + AutoModelForCausalLM
41
- - **Deployment**: Hugging Face Spaces with ZeroGPU
42
- - **Model Size**: ~3.55GB
43
- - **Inference**: Browser-based using WebGPU
44
-
45
- ## ๐ŸŽฎ Usage
46
- Simply type your message in the chat box and press Enter. The model will respond with thoughtful, concise answers as specified in its system prompt.
47
-
48
- ## ๐Ÿ”ง Error Handling
49
- This app includes comprehensive error handling:
50
- - โœ… Model loading verification
51
- - โœ… Generation testing
52
- - โœ… Graceful fallbacks for different Gradio versions
53
- - โœ… None value protection
54
- - โœ… Clear error messages
55
 
56
  ---
57
- *Built with โค๏ธ using Gradio and ZeroGPU*
 
58
  ```
59
 
60
- **Key Fixes:**
61
- 1. โœ… **Fixed NoneType Error**: Added `str()` conversion and None checks
62
- 2. โœ… **Backward Compatibility**: Falls back to basic Interface if ChatInterface fails
63
- 3. โœ… **Robust Model Loading**: Better error handling and testing
64
- 4. โœ… **Multiple Launch Methods**: Tries different launch configurations
65
- 5. โœ… **Version Flexibility**: Works with both old and new Gradio versions
66
- 6. โœ… **Self-Testing**: Tests model functionality before launch
67
- 7. โœ… **Clear Error Messages**: Better error reporting
68
 
69
- This should work regardless of the Gradio version cached in your Space!
70
- ```
 
 
 
 
 
 
 
 
 
 
 
 
71
 
72
  โœ… Updated! [Open your Space here](https://huggingface.co/spaces/Javedalam/my-fresh-gen)
 
4
  colorFrom: blue
5
  colorTo: pink
6
  sdk: gradio
7
+ sdk_version: 4.7.1
8
  app_port: 7860
9
  hardware: zero-gpu
 
 
10
  ---
11
  # ๐Ÿค– VibeThinker-1.5B Chat Interface
12
 
13
+ A simple chat application powered by the VibeThinker-1.5B language model.
14
 
15
  ## Model Details
16
+ - **Model ID**: WeiboAI/VibeThinker-1.5B
17
  - **Parameters**: 1.5B
18
  - **System Prompt**: "You are a concise solver. Respond briefly."
19
+ - **Hardware**: ZeroGPU
20
 
21
+ ## Features
22
+ - ๐Ÿ’ฌ Interactive chat interface
23
+ - ๐Ÿ“ Memory of conversation history
24
+ - ๐Ÿš€ ZeroGPU acceleration
25
+ - ๐Ÿ“ฑ Responsive design
 
 
26
 
27
+ ## Example Prompts
28
  - What is 2+2?
29
  - Explain quantum physics briefly
30
  - Write a short poem
31
  - How do I make good decisions?
32
  - What are the benefits of AI?
33
+ - Tell me about space exploration
34
 
35
+ ## Usage
36
+ Type your message in the chat box and press Enter. The AI will respond with thoughtful, concise answers.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
37
 
38
  ---
39
+ *Built with Gradio and ZeroGPU*
40
+ ```
41
  ```
42
 
43
+ **Key Improvements:**
44
+ 1. โœ… **Minimal API**: Uses only basic ChatInterface parameters
45
+ 2. โœ… **Fixed None Handling**: Proper `str()` conversion for all inputs
46
+ 3. โœ… **Clear Logging**: Console messages show exactly what the model is doing
47
+ 4. โœ… **Longer Output**: Increased max_new_tokens to 1024
48
+ 5. โœ… **Better Response Extraction**: Properly extracts assistant response
49
+ 6. โœ… **Simple Setup**: No complex fallbacks or error handling
50
+ 7. โœ… **ZeroGPU**: Uses @spaces.GPU decorator
51
 
52
+ **Console Output Shows:**
53
+ - ๐Ÿš€ Loading model...
54
+ - โœ… Model loaded successfully!
55
+ - ๐Ÿง  Processing: "What is 2+2?"
56
+ - ๐Ÿ“ Formatting conversation...
57
+ - ๐Ÿ”ค Tokenizing...
58
+ - โšก Generating...
59
+ - โœ… Response: The answer is 4...
60
+
61
+ This should work much better! The model will now:
62
+ - Complete its responses properly
63
+ - Be ready for the next prompt immediately
64
+ - Show clear progress in the console
65
+ - Handle all edge cases properly
66
 
67
  โœ… Updated! [Open your Space here](https://huggingface.co/spaces/Javedalam/my-fresh-gen)
app.py CHANGED
@@ -8,56 +8,33 @@ import time
8
  MODEL_ID = "WeiboAI/VibeThinker-1.5B"
9
  SYSTEM_PROMPT = "You are a concise solver. Respond briefly."
10
 
11
- # Global variables
12
- model = None
13
- tokenizer = None
14
-
15
- def load_model():
16
- """Load the model and tokenizer"""
17
- global model, tokenizer
18
- try:
19
- print(f"Loading model: {MODEL_ID}")
20
- tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
21
- model = AutoModelForCausalLM.from_pretrained(
22
- MODEL_ID,
23
- torch_dtype=torch.float16,
24
- device_map="auto",
25
- )
26
- print("Model loaded successfully!")
27
- return True
28
- except Exception as e:
29
- print(f"Error loading model: {e}")
30
- return False
31
-
32
- # Initialize model
33
- load_success = load_model()
34
 
35
  @spaces.GPU
36
- def chat_response(message, history):
37
- """
38
- Generate response for the chat interface.
39
 
40
- Args:
41
- message (str): Current user message
42
- history (list): Chat history as list of tuples [(user_msg, assistant_msg), ...]
 
 
43
 
44
- Returns:
45
- str: Generated response
46
- """
47
- if not load_success or model is None or tokenizer is None:
48
- return "โŒ Model not loaded. Please check the model configuration."
49
 
50
  try:
51
- # Handle None values
52
- if message is None:
53
- message = "Hello"
54
- if history is None:
55
- history = []
56
-
57
- # Build conversation format
58
  messages = [{"role": "system", "content": SYSTEM_PROMPT}]
59
 
60
- # Add chat history
61
  for user_msg, assistant_msg in history:
62
  if user_msg is not None:
63
  messages.append({"role": "user", "content": str(user_msg)})
@@ -67,170 +44,71 @@ def chat_response(message, history):
67
  # Add current message
68
  messages.append({"role": "user", "content": str(message)})
69
 
70
- # Apply chat template
71
- formatted_input = tokenizer.apply_chat_template(
72
- messages,
73
- tokenize=False,
 
 
74
  add_generation_prompt=True
75
  )
76
 
77
- # Tokenize input
78
- model_inputs = tokenizer([formatted_input], return_tensors="pt").to(model.device)
 
 
 
 
79
 
80
  # Generate response
81
  with torch.no_grad():
82
- generated_ids = model.generate(
83
- **model_inputs,
84
- max_new_tokens=256,
85
  do_sample=True,
86
  temperature=0.7,
87
  top_p=0.9,
88
- pad_token_id=tokenizer.eos_token_id
 
89
  )
90
 
91
- # Decode response
92
- generated_ids = [
93
- output_ids[len(input_ids):]
94
- for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
95
- ]
96
 
97
- response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 
 
98
 
99
- return response.strip()
 
100
 
101
  except Exception as e:
102
- print(f"Error generating response: {e}")
103
- return f"โŒ Sorry, I encountered an error: {str(e)}"
104
 
105
- def create_demo():
106
- """Create the Gradio chat interface"""
107
 
108
- # Try to create ChatInterface with fallback for different Gradio versions
109
- try:
110
- # New Gradio API
111
- demo = gr.ChatInterface(
112
- fn=chat_response,
113
- title="๐Ÿค– VibeThinker-1.5B Chat",
114
- description=f"""<div style='text-align: center'>
115
- <p>Chat with <strong>{MODEL_ID}</strong></p>
116
- <p>System: <em>{SYSTEM_PROMPT}</em></p>
117
- <p>๐Ÿš€ Powered by ZeroGPU for fast inference</p>
118
- </div>""",
119
- examples=[
120
- "What is 2+2?",
121
- "Explain quantum physics briefly",
122
- "Write a short poem",
123
- "How do I make good decisions?",
124
- "What are the benefits of AI?"
125
- ],
126
- theme=gr.themes.Soft(),
127
- )
128
- return demo
129
-
130
- except TypeError as e:
131
- print(f"Modern ChatInterface failed, trying fallback: {e}")
132
-
133
- # Fallback to older Gradio API or Interface
134
- try:
135
- # Try with basic parameters only
136
- demo = gr.ChatInterface(
137
- fn=chat_response,
138
- title="๐Ÿค– VibeThinker-1.5B Chat",
139
- description=f"Chat with {MODEL_ID}. {SYSTEM_PROMPT}",
140
- )
141
- return demo
142
- except:
143
- # Last resort: create basic Interface
144
- print("ChatInterface failed, creating basic Interface")
145
-
146
- def process_message(message, history=""):
147
- if history:
148
- # Convert history string to list of tuples
149
- history_list = []
150
- if isinstance(history, str):
151
- # Try to parse history
152
- history_list = []
153
- return chat_response(message, history_list)
154
- else:
155
- return chat_response(message, [])
156
-
157
- demo = gr.Interface(
158
- fn=process_message,
159
- inputs=["text", "text"],
160
- outputs="text",
161
- title="๐Ÿค– VibeThinker-1.5B Chat",
162
- description=f"Chat with {MODEL_ID}. {SYSTEM_PROMPT}",
163
- examples=[
164
- "What is 2+2?",
165
- "Explain quantum physics briefly",
166
- "Write a short poem",
167
- "How do I make good decisions?"
168
- ]
169
- )
170
- return demo
171
-
172
- # Test function
173
- def test_model():
174
- """Test if the model works"""
175
- print("๐Ÿงช Testing model functionality...")
176
 
177
- if not load_success:
178
- print("โŒ Model loading failed!")
179
- return False
180
-
181
- try:
182
- # Test with a simple message
183
- test_messages = [{"role": "user", "content": "Hello! How are you?"}]
184
- test_input = tokenizer.apply_chat_template(
185
- test_messages,
186
- tokenize=False,
187
- add_generation_prompt=True
188
- )
189
- print("โœ… Tokenization test passed!")
190
-
191
- # Test generation
192
- test_inputs = tokenizer([test_input], return_tensors="pt").to(model.device)
193
- with torch.no_grad():
194
- test_output = model.generate(
195
- **test_inputs,
196
- max_new_tokens=50,
197
- do_sample=True,
198
- temperature=0.7,
199
- )
200
-
201
- test_response = tokenizer.decode(test_output[0], skip_special_tokens=True)
202
- print("โœ… Generation test passed!")
203
- print(f"โœ… Model test successful! Response: {test_response[:100]}...")
204
- return True
205
-
206
- except Exception as e:
207
- print(f"โŒ Model test failed: {e}")
208
- return False
209
 
210
  if __name__ == "__main__":
211
- print("๐Ÿš€ Starting VibeThinker-1.5B Chat App...")
212
  print(f"๐Ÿ“ฆ Model: {MODEL_ID}")
213
  print(f"๐Ÿ’ฌ System: {SYSTEM_PROMPT}")
214
 
215
- # Test the model
216
- if test_model():
217
- print("โœ… All tests passed! Starting app...")
218
-
219
- demo = create_demo()
220
-
221
- # Try different launch methods
222
- try:
223
- demo.launch(share=False, server_name="0.0.0.0", server_port=7860)
224
- except:
225
- try:
226
- demo.launch(share=False)
227
- except:
228
- demo.launch()
229
- else:
230
- print("โŒ Tests failed! App may not work properly.")
231
-
232
- demo = create_demo()
233
- try:
234
- demo.launch(share=False)
235
- except:
236
- pass
 
8
  MODEL_ID = "WeiboAI/VibeThinker-1.5B"
9
  SYSTEM_PROMPT = "You are a concise solver. Respond briefly."
10
 
11
+ # Load model and tokenizer
12
+ print("๐Ÿš€ Loading model...")
13
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
14
+ model = AutoModelForCausalLM.from_pretrained(
15
+ MODEL_ID,
16
+ torch_dtype=torch.float16,
17
+ device_map="auto",
18
+ )
19
+ print("โœ… Model loaded successfully!")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
20
 
21
  @spaces.GPU
22
+ def chat_fn(message, history):
23
+ """Simple chat function with clear progress"""
 
24
 
25
+ # Handle None values properly
26
+ if message is None:
27
+ message = "Hello"
28
+ if history is None:
29
+ history = []
30
 
31
+ print(f"๐Ÿง  Processing: '{message}'")
 
 
 
 
32
 
33
  try:
34
+ # Build conversation
 
 
 
 
 
 
35
  messages = [{"role": "system", "content": SYSTEM_PROMPT}]
36
 
37
+ # Add history
38
  for user_msg, assistant_msg in history:
39
  if user_msg is not None:
40
  messages.append({"role": "user", "content": str(user_msg)})
 
44
  # Add current message
45
  messages.append({"role": "user", "content": str(message)})
46
 
47
+ print("๐Ÿ“ Formatting conversation...")
48
+
49
+ # Apply template
50
+ prompt = tokenizer.apply_chat_template(
51
+ messages,
52
+ tokenize=False,
53
  add_generation_prompt=True
54
  )
55
 
56
+ print("๐Ÿ”ค Tokenizing...")
57
+
58
+ # Tokenize
59
+ inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
60
+
61
+ print("โšก Generating...")
62
 
63
  # Generate response
64
  with torch.no_grad():
65
+ outputs = model.generate(
66
+ **inputs,
67
+ max_new_tokens=1024, # Longer output
68
  do_sample=True,
69
  temperature=0.7,
70
  top_p=0.9,
71
+ pad_token_id=tokenizer.eos_token_id,
72
+ eos_token_id=tokenizer.eos_token_id,
73
  )
74
 
75
+ # Decode
76
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
 
 
 
77
 
78
+ # Extract just the assistant response
79
+ response_text = response.split("assistant")[-1].strip()
80
+ response_text = response_text.replace("<|endoftext|>", "").strip()
81
 
82
+ print(f"โœ… Response: {response_text[:100]}...")
83
+ return response_text
84
 
85
  except Exception as e:
86
+ print(f"โŒ Error: {e}")
87
+ return f"Sorry, I encountered an error: {str(e)}"
88
 
89
+ def create_interface():
90
+ """Create the interface with minimal parameters"""
91
 
92
+ demo = gr.ChatInterface(
93
+ fn=chat_fn,
94
+ title="๐Ÿค– VibeThinker-1.5B Chat",
95
+ description=f"Chat with {MODEL_ID}. System: {SYSTEM_PROMPT}",
96
+ examples=[
97
+ "What is 2+2?",
98
+ "Explain quantum physics briefly",
99
+ "Write a short poem",
100
+ "How do I make good decisions?",
101
+ "What are the benefits of AI?",
102
+ "Tell me about space exploration"
103
+ ],
104
+ )
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
105
 
106
+ return demo
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
107
 
108
  if __name__ == "__main__":
109
+ print("๐ŸŽฏ Starting VibeThinker-1.5B Chat App")
110
  print(f"๐Ÿ“ฆ Model: {MODEL_ID}")
111
  print(f"๐Ÿ’ฌ System: {SYSTEM_PROMPT}")
112
 
113
+ demo = create_interface()
114
+ demo.launch(share=False, server_name="0.0.0.0", server_port=7860)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
requirements.txt CHANGED
@@ -3,4 +3,3 @@ transformers>=4.36.0
3
  accelerate>=0.25.0
4
  torch>=2.0.0
5
  spaces>=0.19.4
6
- uvicorn>=0.14.0
 
3
  accelerate>=0.25.0
4
  torch>=2.0.0
5
  spaces>=0.19.4