Tom Claude committed on
Commit fd63fb5 · 1 Parent(s): f189467

fix: update for Gradio 5+/6+ compatibility

- Move theme from gr.Blocks() to app.launch() (Gradio 6 API change)
- Update requirements to gradio>=5.0.0 for websockets 13+ support
- websockets 13+ required by supabase realtime module

Tested locally with fresh venv install.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (4)
  1. .env.example +9 -0
  2. SETUP.md +106 -0
  3. app.py +3 -3
  4. requirements.txt +2 -2
.env.example ADDED
@@ -0,0 +1,9 @@
+ # Jina AI API key (for jina-embeddings-v3)
+ JINA_API_KEY=your_jina_api_key_here
+
+ # Supabase Configuration
+ SUPABASE_URL=https://your-project.supabase.co
+ SUPABASE_KEY=your_supabase_anon_key_here
+
+ # HuggingFace API token (for Inference API)
+ HF_TOKEN=your_huggingface_token_here
SETUP.md ADDED
@@ -0,0 +1,106 @@
+ # Johnny Harris Knowledge Base
+
+ Vector database of Johnny Harris YouTube transcripts for style replication and topic search.
+
+ ## Database: `johnny_transcripts`
+
+ | Column | Type | Description |
+ |--------|------|-------------|
+ | `video_id` | text | YouTube video ID |
+ | `video_url` | text | Full YouTube URL |
+ | `title` | text | Video title |
+ | `chunk_text` | text | ~1000 words of transcript |
+ | `chunk_index` | int | Position in video (0, 1, 2...) |
+ | `total_chunks` | int | Total chunks for this video |
+ | `embedding` | vector(1024) | Jina v3 embedding |
+
+ ## Querying
+
+ ### 1. Similarity Search (Tab 1: Topic Coverage)
+
+ Embed the user's query with Jina v3, then find similar chunks:
+
+ ```python
+ from supabase import create_client
+
+ # Embed query with Jina v3
+ query_embedding = jina_embed(user_query) # returns 1024-dim vector
+
+ # Find similar chunks
+ result = supabase.rpc('match_transcripts', {
+     'query_embedding': query_embedding,
+     'match_threshold': 0.7,
+     'match_count': 10
+ }).execute()
+
+ # Returns: title, video_url, chunk_text, similarity score
+ ```
+
+ ### 2. Title Search (Quick lookup)
+
+ For simple keyword matching on titles:
+
+ ```python
+ result = supabase.from_('johnny_transcripts') \
+     .select('title, video_url') \
+     .ilike('title', f'%{keyword}%') \
+     .execute()
+ ```
+
+ ### 3. Get Full Context (Tab 2: Script Production)
+
+ After finding relevant chunks, fetch surrounding context:
+
+ ```python
+ # Get all chunks from a video for full context
+ result = supabase.from_('johnny_transcripts') \
+     .select('chunk_text, chunk_index') \
+     .eq('video_id', video_id) \
+     .order('chunk_index') \
+     .execute()
+
+ full_transcript = ' '.join([r['chunk_text'] for r in result.data])
+ ```
+
+ ## Gradio App Structure
+
+ **Tab 1: Topic Search**
+ - Input: Topic/question text
+ - Process: Embed with Jina v3 → similarity search
+ - Output: List of matching videos with titles, URLs, and relevant excerpts
+
+ **Tab 2: Script Production**
+ - Input: Bullet points, reference text, optional video links
+ - Process:
+   1. Embed inputs → find similar Johnny content
+   2. Pass Johnny excerpts + user input to LLM
+   3. Generate script in Johnny's style
+ - Output: Script draft with source citations
+
+ ## Environment Variables
+
+ ```
+ SUPABASE_URL=https://fkrixdhynljoecpiuyng.supabase.co
+ SUPABASE_DB_PASSWORD=...
+ JINA_API_KEY=...
+ ```
+
+ ## Jina v3 Embedding
+
+ ```python
+ import requests
+
+ def jina_embed(text: str) -> list[float]:
+     response = requests.post(
+         'https://api.jina.ai/v1/embeddings',
+         headers={'Authorization': f'Bearer {JINA_API_KEY}'},
+         json={
+             'model': 'jina-embeddings-v3',
+             'task': 'retrieval.query', # Use 'retrieval.query' for queries
+             'input': [text]
+         }
+     )
+     return response.json()['data'][0]['embedding']
+ ```
+
+ Note: Documents were embedded with `task: retrieval.passage`. Queries should use `task: retrieval.query`.
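Note on SETUP.md: the closing note pairs `retrieval.passage` (documents) with `retrieval.query` (queries), but `jina_embed` hard-codes the query task. The following is a minimal sketch, not part of this commit, of a helper that builds the request body for either side. The name `build_jina_payload` is hypothetical, and only the fields already shown in SETUP.md are set.

```python
from typing import Literal

# Task names taken from SETUP.md for jina-embeddings-v3.
JinaTask = Literal["retrieval.query", "retrieval.passage"]
_ALLOWED_TASKS = ("retrieval.query", "retrieval.passage")


def build_jina_payload(texts: list[str], task: JinaTask) -> dict:
    """Build the JSON body for a POST to https://api.jina.ai/v1/embeddings.

    Hypothetical helper: it validates the task name so that documents and
    queries cannot silently be embedded with a mistyped task.
    """
    if task not in _ALLOWED_TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    if not texts:
        raise ValueError("texts must not be empty")
    return {
        "model": "jina-embeddings-v3",
        "task": task,
        "input": list(texts),
    }


# One body for a search query, one for transcript chunks being indexed.
query_body = build_jina_payload(["how does Johnny cover borders?"], "retrieval.query")
passage_body = build_jina_payload(["chunk one", "chunk two"], "retrieval.passage")
```

The returned dict could be passed as the `json=` argument of the `requests.post` call shown in SETUP.md.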
app.py CHANGED
@@ -199,8 +199,7 @@ def create_app():
      """Create and configure the Gradio application"""
 
      with gr.Blocks(
-         title="NewPress AI - Johnny Harris Script Assistant",
-         theme=gr.themes.Soft()
+         title="NewPress AI - Johnny Harris Script Assistant"
      ) as app:
          app.queue() # Enable queue before defining event handlers for progress to work
 
@@ -314,5 +313,6 @@ if __name__ == "__main__":
      app.launch(
          server_name="0.0.0.0",
          server_port=7860,
-         share=False
+         share=False,
+         theme="soft"
      )
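Note on app.py: the commit message describes `theme` on `launch()` as a Gradio 6 API change, while requirements.txt in this commit also admits Gradio 5.x. The following is a sketch, not part of this commit and not checked against specific Gradio releases, of choosing where to pass the theme by inspecting the callable's signature. `accepts_param`, `theme_kwargs`, and the two stand-in functions are hypothetical names.

```python
import inspect
from typing import Callable


def accepts_param(func: Callable, name: str) -> bool:
    """Return True if `func` declares a parameter called `name`."""
    return name in inspect.signature(func).parameters


def theme_kwargs(launch_func: Callable, theme: str) -> tuple[dict, dict]:
    """Split the theme into (blocks_kwargs, launch_kwargs).

    If launch_func declares `theme`, the theme goes to launch();
    otherwise it goes to the Blocks constructor.
    """
    if accepts_param(launch_func, "theme"):
        return {}, {"theme": theme}
    return {"theme": theme}, {}


# Stand-ins for the two launch() shapes; these are not Gradio functions.
def launch_with_theme(server_name=None, server_port=None, share=False, theme=None): ...
def launch_without_theme(server_name=None, server_port=None, share=False): ...


new_style = theme_kwargs(launch_with_theme, "soft")
old_style = theme_kwargs(launch_without_theme, "soft")
```

In app.py this might be used as `blocks_kw, launch_kw = theme_kwargs(gr.Blocks.launch, "soft")`, with the two dicts unpacked into `gr.Blocks(...)` and `app.launch(...)` respectively.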
requirements.txt CHANGED
@@ -1,5 +1,5 @@
- # Gradio for UI
- gradio==4.44.0
+ # Gradio for UI (6.x supports websockets 13+ needed by supabase)
+ gradio>=5.0.0
 
  # Supabase client for vector store
  supabase>=2.0.0
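Note on requirements.txt: the new comment refers to 6.x while the specifier is `gradio>=5.0.0`, so the installed major version depends on the resolver. The following is a small sketch, not part of this commit, of a startup check that reports which major versions were actually installed. `parse_major` and `installed_major` are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def parse_major(version_string: str) -> int:
    """Return the leading integer of a version string such as '13.1'."""
    return int(version_string.split(".", 1)[0])


def installed_major(package: str) -> Optional[int]:
    """Major version of an installed distribution, or None if it is absent."""
    try:
        return parse_major(version(package))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Packages named in this commit's message and requirements.txt.
    for name in ("gradio", "websockets", "supabase"):
        print(name, installed_major(name))
```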