Space: tomvaillant/newpress-ai
Tom Claude committed 16 days ago

Commit fd63fb5 (1 parent: f189467)

fix: update for Gradio 5+/6+ compatibility

- Move theme from gr.Blocks() to app.launch() (Gradio 6 API change)
- Update requirements to gradio>=5.0.0 for websockets 13+ support
- websockets 13+ required by supabase realtime module

Tested locally with fresh venv install.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (4)

.env.example +9 -0
SETUP.md +106 -0
app.py +3 -3
requirements.txt +2 -2

.env.example ADDED

	@@ -0,0 +1,9 @@

+# Jina AI API key (for jina-embeddings-v3)
+JINA_API_KEY=your_jina_api_key_here
+# Supabase Configuration
+SUPABASE_URL=https://your-project.supabase.co
+SUPABASE_KEY=your_supabase_anon_key_here
+# HuggingFace API token (for Inference API)
+HF_TOKEN=your_huggingface_token_here
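
Reviewer note: the four variables above are all the template defines, so a startup check that fails early when one is missing may save debugging time. The following is a minimal sketch, not code from this commit; `load_settings` and `REQUIRED_VARS` are hypothetical names, and the assumption is that all four variables are required at runtime.

```python
import os
from typing import Mapping

# Variable names copied from the .env.example added in this commit.
# Treating all four as required is an assumption.
REQUIRED_VARS = ("JINA_API_KEY", "SUPABASE_URL", "SUPABASE_KEY", "HF_TOKEN")


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, raising if any is missing or empty.

    Hypothetical helper for illustration; it is not part of app.py.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing a plain dict instead of `os.environ` keeps the check easy to exercise in tests without touching the real environment.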

SETUP.md ADDED

	@@ -0,0 +1,106 @@

+# Johnny Harris Knowledge Base
+Vector database of Johnny Harris YouTube transcripts for style replication and topic search.
+## Database: `johnny_transcripts`
+| Column | Type | Description |
+|--------|------|-------------|
+| `video_id` | text | YouTube video ID |
+| `video_url` | text | Full YouTube URL |
+| `title` | text | Video title |
+| `chunk_text` | text | ~1000 words of transcript |
+| `chunk_index` | int | Position in video (0, 1, 2...) |
+| `total_chunks` | int | Total chunks for this video |
+| `embedding` | vector(1024) | Jina v3 embedding |
+## Querying
+### 1. Similarity Search (Tab 1: Topic Coverage)
+Embed the user's query with Jina v3, then find similar chunks:
+```python
+from supabase import create_client
+# Embed query with Jina v3
+query_embedding = jina_embed(user_query)  # returns 1024-dim vector
+# Find similar chunks
+result = supabase.rpc('match_transcripts', {
+    'query_embedding': query_embedding,
+    'match_threshold': 0.7,
+    'match_count': 10
+}).execute()
+# Returns: title, video_url, chunk_text, similarity score
+```
+### 2. Title Search (Quick lookup)
+For simple keyword matching on titles:
+```python
+result = supabase.from_('johnny_transcripts') \
+    .select('title, video_url') \
+    .ilike('title', f'%{keyword}%') \
+    .execute()
+```
+### 3. Get Full Context (Tab 2: Script Production)
+After finding relevant chunks, fetch surrounding context:
+```python
+# Get all chunks from a video for full context
+result = supabase.from_('johnny_transcripts') \
+    .select('chunk_text, chunk_index') \
+    .eq('video_id', video_id) \
+    .order('chunk_index') \
+    .execute()
+full_transcript = ' '.join([r['chunk_text'] for r in result.data])
+```
+## Gradio App Structure
+**Tab 1: Topic Search**
+- Input: Topic/question text
+- Process: Embed with Jina v3 → similarity search
+- Output: List of matching videos with titles, URLs, and relevant excerpts
+**Tab 2: Script Production**
+- Input: Bullet points, reference text, optional video links
+- Process:
+  1. Embed inputs → find similar Johnny content
+  2. Pass Johnny excerpts + user input to LLM
+  3. Generate script in Johnny's style
+- Output: Script draft with source citations
+## Environment Variables
+```
+SUPABASE_URL=https://fkrixdhynljoecpiuyng.supabase.co
+SUPABASE_DB_PASSWORD=...
+JINA_API_KEY=...
+```
+## Jina v3 Embedding
+```python
+import requests
+def jina_embed(text: str) -> list[float]:
+    response = requests.post(
+        'https://api.jina.ai/v1/embeddings',
+        headers={'Authorization': f'Bearer {JINA_API_KEY}'},
+        json={
+            'model': 'jina-embeddings-v3',
+            'task': 'retrieval.query',  # Use 'retrieval.query' for queries
+            'input': [text]
+        }
+    )
+    return response.json()['data'][0]['embedding']
+```
+Note: Documents were embedded with `task: retrieval.passage`. Queries should use `task: retrieval.query`.
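
Reviewer note: SETUP.md calls `match_transcripts` with `match_threshold` and `match_count`, but the SQL function itself is not in this commit. The sketch below shows, in plain Python, what such a function presumably does. It assumes the score is cosine similarity, that rows must score strictly above the threshold, and that results are sorted best first. `cosine_similarity` and `match_chunks` are hypothetical names used only for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_chunks(query_embedding, rows, match_threshold=0.7, match_count=10):
    """Keep rows scoring above the threshold, best first, capped at match_count.

    Assumed behaviour of `match_transcripts`; the real function may differ.
    Each row is a dict with at least an 'embedding' key.
    """
    scored = []
    for row in rows:
        score = cosine_similarity(query_embedding, row["embedding"])
        if score > match_threshold:
            scored.append({**row, "similarity": score})
    scored.sort(key=lambda r: r["similarity"], reverse=True)
    return scored[:match_count]
```

In the real table the embeddings are 1024-dimensional Jina v3 vectors; the logic is the same for any dimension as long as query and rows agree.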

app.py CHANGED

@@ -199,8 +199,7 @@ def create_app():
     """Create and configure the Gradio application"""
     with gr.Blocks(
-        title="NewPress AI - Johnny Harris Script Assistant",
-        theme=gr.themes.Soft()
+        title="NewPress AI - Johnny Harris Script Assistant"
     ) as app:
         app.queue()  # Enable queue before defining event handlers for progress to work
@@ -314,5 +313,6 @@ if __name__ == "__main__":
     app.launch(
         server_name="0.0.0.0",
         server_port=7860,
-        share=False
+        share=False,
+        theme="soft"
     )
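
Reviewer note: the commit message describes the `theme` move as a Gradio 6 API change, while requirements.txt also admits 5.x releases. If both major versions need to work, the keyword could be routed by version. This is only a sketch under that assumption: `theme_kwargs` is a hypothetical helper, and it takes the commit message at its word that `launch()` accepts `theme` from 6.0 on and `gr.Blocks()` accepts it before that.

```python
def theme_kwargs(gradio_version: str, theme: str = "soft") -> tuple[dict, dict]:
    """Split the theme setting into (blocks_kwargs, launch_kwargs).

    Assumption taken from the commit message: Gradio 6+ expects `theme` in
    launch(), earlier versions expect it in gr.Blocks().
    """
    major = int(gradio_version.split(".")[0])
    if major >= 6:
        return {}, {"theme": theme}
    return {"theme": theme}, {}
```

In app.py this might be used as `blocks_kwargs, launch_kwargs = theme_kwargs(gr.__version__)`, with the two dicts unpacked into `gr.Blocks(...)` and `app.launch(...)` respectively.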

requirements.txt CHANGED

@@ -1,5 +1,5 @@
-# Gradio for UI
-gradio==4.44.0
+# Gradio for UI (6.x supports websockets 13+ needed by supabase)
+gradio>=5.0.0
 # Supabase client for vector store
 supabase>=2.0.0