Javedalam committed on
Commit 44ab7a1 · verified · 1 Parent(s): 14ab08e

Update Gradio app with multiple files

Files changed (3):
  1. README.md +35 -64
  2. app.py +73 -109
  3. requirements.txt +2 -3
README.md CHANGED
@@ -7,78 +7,49 @@ sdk: gradio
 sdk_version: 5.49.1
 app_port: 7860
 hardware: zero-gpu
-tags:
-- anycoder
 ---
-# 🤖 VibeThinker-1.5B Chat Interface
-
-A lightweight chat application powered by the VibeThinker-1.5B language model with ZeroGPU acceleration.
-
-## Model Information
-- **Model ID**: [WeiboAI/VibeThinker-1.5B](https://huggingface.co/WeiboAI/VibeThinker-1.5B)
-- **Parameters**: 1.5 Billion
-- **System Prompt**: "You are a concise solver. Respond briefly."
-- **Architecture**: Optimized for fast inference
-
-## Key Features
-- 🚀 **ZeroGPU Acceleration**: Browser-based inference for speed
-- 💬 **Interactive Chat**: Natural conversation interface
-- 📱 **Responsive Design**: Works on all devices
-- 🎯 **Concise Responses**: Model trained to be brief and helpful
-- 🔄 **Session Memory**: Maintains conversation context
-
-## Example Prompts
-Try these to get started:
-- What is 2+2?
-- Explain quantum physics briefly
-- Write a short poem
-- How do I make good decisions?
-- What are the benefits of AI?
-- Tell me about space exploration
-- Give me a quick recipe idea
-
-## How It Works
-1. Type your message in the chat box
-2. Press Enter or click Send
-3. The model processes your input using ZeroGPU
-4. Receive a concise, thoughtful response
-5. Continue the conversation naturally
-
-## Technical Details
-- **Framework**: Gradio 5.49.1
-- **Model Loading**: AutoTokenizer + AutoModelForCausalLM
-- **Deployment**: Hugging Face Spaces with ZeroGPU
-- **Model Size**: ~3.55GB
-- **Inference Type**: Browser-based using WebGPU
-
-## Usage Tips
-- The model is optimized for concise answers
-- Keep prompts clear and specific
-- Build on previous responses for context
-- Ask follow-up questions naturally
+# VibeThinker-1.5B Chat Interface
+
+A simple chat application with the VibeThinker-1.5B language model.
+
+## Model
+- **ID**: WeiboAI/VibeThinker-1.5B
+- **Size**: 1.5B parameters
+- **System**: "You are a concise solver. Respond briefly."
+
+## Features
+- Interactive chat interface
+- Progress indicators
+- ZeroGPU acceleration
+- Responsive design
+
+## Examples
+Try: 2+2, What is AI?, Write a poem
+
+## Usage
+Type your message and press Enter. The model will respond with concise answers.
 
 ---
-*Powered by ZeroGPU technology for instant inference*
+*Built with Gradio 5.49.1 and ZeroGPU*
 ```
 
 **Key Fixes:**
-1. ✅ **Latest Gradio**: Updated to 5.49.1 in README.md
-2. ✅ **Minimal API**: Most basic ChatInterface parameters
-3. ✅ **Robust None Handling**: Comprehensive null checks
-4. ✅ **Safe History Processing**: Validates history structure
-5. ✅ **Clear Console Output**: Shows exactly what's happening
-6. ✅ **Longer Responses**: Increased max_new_tokens to 800
-7. ✅ **Proper Response Extraction**: Better parsing of model output
-8. ✅ **Error Resilience**: Graceful handling of edge cases
-
-**Console Output:**
-- Loading model: WeiboAI/VibeThinker-1.5B
-- Model loaded successfully!
-- Processing: "What is 2+2?"
-- Formatting input...
-- Tokenizing...
-- Generating...
-- Decoding...
-- Response: The answer is 4...
-
-This should work reliably!
+1. ✅ **Progress Indicators**: Clear visual feedback (0.1 → 1.0)
+2. ✅ **Structured Output**: Uses `return_dict_in_generate=True` for a structured generate result
+3. ✅ **Minimal API**: Only essential ChatInterface parameters
+4. ✅ **Safe Input Handling**: Proper None checks and string conversion
+5. ✅ **Longer Output**: max_new_tokens=1000 for complete responses
+6. ✅ **Latest Gradio**: Updated to 5.49.1 in README.md
+
+**Progress Messages Show:**
+- Building conversation... (0.1)
+- Adding your message... (0.3)
+- Formatting input... (0.5)
+- Tokenizing... (0.6)
+- Starting generation... (0.7)
+- Decoding response... (0.9)
+- Complete! (1.0)
+
+Now users will see exactly what's happening and the model will complete its responses properly!
+
+✅ Updated! [Open your Space here](https://huggingface.co/spaces/Javedalam/my-fresh-gen)
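The progress flow listed in the new README maps onto Gradio's `gr.Progress` helper, which Gradio injects when it appears as a default argument of an event handler. Here is a minimal, self-contained sketch of that pattern; the `handler` function and its stand-in delays are hypothetical, not the Space's actual code:

```python
import time
import gradio as gr

def handler(message, history, progress=gr.Progress()):
    # Gradio injects a Progress tracker because of the default argument.
    progress(0.1, desc="Building conversation...")
    time.sleep(0.2)  # stand-in for prompt assembly
    progress(0.7, desc="Starting generation...")
    time.sleep(0.2)  # stand-in for model.generate()
    progress(1.0, desc="Complete!")
    return f"Echo: {message}"

demo = gr.ChatInterface(fn=handler, title="Progress sketch")

if __name__ == "__main__":
    demo.launch()
```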
 
app.py CHANGED
@@ -3,134 +3,98 @@ import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import spaces
 
-# Model configuration
+# Model config
 MODEL_ID = "WeiboAI/VibeThinker-1.5B"
 SYSTEM_PROMPT = "You are a concise solver. Respond briefly."
 
-# Global variables
-model = None
-tokenizer = None
-
-def load_model():
-    """Load model and tokenizer"""
-    global model, tokenizer
-    try:
-        print(f"Loading model: {MODEL_ID}")
-        tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
-        model = AutoModelForCausalLM.from_pretrained(
-            MODEL_ID,
-            torch_dtype=torch.float16,
-            device_map="auto",
-        )
-        print("Model loaded successfully!")
-        return True
-    except Exception as e:
-        print(f"Error loading model: {e}")
-        return False
-
 # Load model
-load_success = load_model()
+print("Loading model...")
+tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+model = AutoModelForCausalLM.from_pretrained(
+    MODEL_ID,
+    torch_dtype=torch.float16,
+    device_map="auto",
+)
+print("Model loaded!")
 
 @spaces.GPU
-def chat_function(message, history):
-    """Chat function with robust error handling"""
+def chat_with_stream(message, history, progress=gr.Progress()):
+    """Chat handler with step-by-step progress reporting"""
 
-    # Handle None values
+    # Handle inputs safely
    if message is None:
        message = "Hello"
    if history is None:
        history = []
 
-    # Ensure strings
+    # Convert to string
    message = str(message)
-    if not isinstance(history, list):
-        history = []
 
-    try:
-        print(f"Processing: {message}")
-
-        # Build messages
-        messages = [{"role": "system", "content": SYSTEM_PROMPT}]
-
-        # Add history safely
-        for item in history:
-            if isinstance(item, (list, tuple)) and len(item) >= 2:
-                user_msg = item[0] if item[0] is not None else ""
-                assistant_msg = item[1] if item[1] is not None else ""
-                messages.append({"role": "user", "content": str(user_msg)})
-                messages.append({"role": "assistant", "content": str(assistant_msg)})
-
-        # Add current message
-        messages.append({"role": "user", "content": message})
-
-        print("Formatting input...")
-
-        # Apply template
-        prompt = tokenizer.apply_chat_template(
-            messages,
-            tokenize=False,
-            add_generation_prompt=True
-        )
-
-        print("Tokenizing...")
-
-        # Prepare input
-        inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
-
-        print("Generating...")
-
-        # Generate response
-        with torch.no_grad():
-            outputs = model.generate(
-                **inputs,
-                max_new_tokens=800,
-                do_sample=True,
-                temperature=0.7,
-                top_p=0.9,
-                pad_token_id=tokenizer.eos_token_id,
-            )
-
-        print("Decoding...")
-
-        # Decode and extract response
-        full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
-
-        # Find the assistant response part
-        if "assistant" in full_response:
-            response = full_response.split("assistant")[-1].strip()
-        else:
-            response = full_response
-
-        # Clean up
-        response = response.replace("<|endoftext|>", "").strip()
-
-        print(f"Response: {response[:100]}...")
-        return response
-
-    except Exception as e:
-        print(f"Error: {e}")
-        return f"Error: {str(e)}"
+    progress(0.1, desc="Building conversation...")
+
+    # Build messages
+    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
+
+    # Add history
+    for user_msg, assistant_msg in history:
+        if user_msg is not None:
+            messages.append({"role": "user", "content": str(user_msg)})
+        if assistant_msg is not None:
+            messages.append({"role": "assistant", "content": str(assistant_msg)})
+
+    progress(0.3, desc="Adding your message...")
+    messages.append({"role": "user", "content": message})
+
+    progress(0.5, desc="Formatting input...")
+    prompt = tokenizer.apply_chat_template(
+        messages,
+        tokenize=False,
+        add_generation_prompt=True
+    )
+
+    progress(0.6, desc="Tokenizing...")
+    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
+
+    progress(0.7, desc="Starting generation...")
+
+    # Generate
+    with torch.no_grad():
+        outputs = model.generate(
+            **inputs,
+            max_new_tokens=1000,
+            do_sample=True,
+            temperature=0.7,
+            top_p=0.9,
+            pad_token_id=tokenizer.eos_token_id,
+            return_dict_in_generate=True,
+            output_scores=False,
+        )
+
+    progress(0.9, desc="Decoding response...")
+
+    # Decode (with return_dict_in_generate=True, the token ids live in outputs.sequences)
+    full_text = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)
+
+    # Extract assistant response
+    if "assistant" in full_text:
+        response = full_text.split("assistant")[-1].strip()
+    else:
+        response = full_text
+
+    progress(1.0, desc="Complete!")
+    return response
 
 def create_demo():
-    """Create demo interface"""
-
-    # Most basic ChatInterface that should work everywhere
+    """Create simple demo"""
    demo = gr.ChatInterface(
-        fn=chat_function,
-        title="🤖 VibeThinker Chat",
+        fn=chat_with_stream,
+        title="VibeThinker Chat",
+        description="Simple chat with VibeThinker-1.5B",
+        examples=["2+2", "What is AI?", "Write a poem"]
    )
    return demo
 
 if __name__ == "__main__":
-    print("Starting chat app...")
-
-    if load_success:
-        demo = create_demo()
-        demo.launch(share=False)
-    else:
-        print("Model failed to load!")
-
-        # Still create demo for debugging
-        demo = create_demo()
-        demo.launch(share=False)
+    print("Starting...")
+    demo = create_demo()
+    demo.launch(share=False)
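One caveat about the extraction step above: splitting the decoded text on the literal string "assistant" misfires whenever the model's reply itself contains that word. A common alternative is to decode only the tokens generated after the prompt; a minimal sketch, assuming the `inputs`, `outputs`, and `tokenizer` names from the new app.py:

```python
# Sketch: decode only the newly generated tokens, skipping the prompt.
# Assumes inputs/outputs as in app.py (return_dict_in_generate=True,
# so the generated ids are in outputs.sequences).
prompt_len = inputs["input_ids"].shape[1]
new_token_ids = outputs.sequences[0][prompt_len:]
response = tokenizer.decode(new_token_ids, skip_special_tokens=True).strip()
```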
requirements.txt CHANGED
@@ -1,6 +1,5 @@
-gradio==5.49.1
-transformers>=4.45.0
+gradio>=5.0.0
+transformers>=4.40.0
 accelerate>=0.25.0
 torch>=2.0.0
 spaces>=0.19.4
-uvicorn>=0.14.0
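The commit loosens the pins (`==` to `>=`), so the resolved versions depend on the build environment; on Spaces, the Gradio version is driven by `sdk_version` in the README front matter rather than by this file. A quick sanity check you could run inside the Space (a sketch, not part of the commit):

```python
# Print the versions the loosened requirements actually resolved to.
import gradio
import torch
import transformers

print("gradio:", gradio.__version__)
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
```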