---
title: VOICE SEMENTLE
emoji: 🎙️
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: "6.0.0"
app_file: client/app.py
pinned: false
tags:
  - mcp-in-action-track-creative
---

# 🎙️ Voice Sementle

> **Daily voice puzzle game: guess the meme, song, or movie quote, but you have to SAY IT RIGHT!**

It's not just *what* you say, it's *how* you say it. Your **pitch, rhythm, energy, and pronunciation** all matter.

🗓️ New puzzle every day • 🎭 3 genres (memes, songs, movies) • 🧠 AI hints that get smarter

---

## 📋 Submission Info

| | |
|---|---|
| **Track** | MCP in Action: Creative |
| **MCP Used** | [VoiceKit MCP](https://huggingface.co/spaces/MCP-1st-Birthday/voicekit) |
| **Framework** | Gradio 6.0 |

**📢 Social Post:** [View on LinkedIn](https://www.linkedin.com/posts/traceychoi911_mcpinaction-buildwithmcp-gradio-activity-7400151841759494145-lA8U?utm_source=li_share&utm_content=feedcontent&utm_medium=g_dt_web&utm_campaign=copy)

**📢 Social Post:** [View on X](https://x.com/ChoiTracey24876/status/1994388486699245591?s=20)

**🎬 Demo Video:** [Watch on YouTube](https://youtu.be/7VWELEUr-wE)

**👥 Team:** [@LisaVLee](https://huggingface.co/LisaVLee), [@SabaPivot](https://huggingface.co/SabaPivot), [@daheepk](https://huggingface.co/daheepk), [@tchoi911](https://huggingface.co/tchoi911), [@Lucian25](https://huggingface.co/Lucian25)

---

## ✅ Track 2 Requirements

| Requirement | How We Fulfill It |
|-------------|-------------------|
| **Autonomous Agent** | **Two agents**: MCP Advisor (voice analysis) + Chatbot (text + audio hints) |
| **MCP as Tools** | VoiceKit MCP (`voicekit_analyze_voice_similarity`) for voice analysis |
| **Gradio App** | Built with Gradio 6.0 |
| **Tool Calling** | Chatbot autonomously calls `generate_audio_hint` → ElevenLabs TTS |

---

## 🎮 How It Works

```
1. 🎯 Daily puzzle loads (meme / song / movie quote)
2. 🎤 You record your voice guess
3. 🔊 MCP analyzes: pitch, rhythm, energy, pronunciation, transcript
4. 🧠 AI agent generates progressive hints (vague → specific)
5. 🔊 Ask for an audio hint → Agent calls ElevenLabs TTS with voice cloning
6. 🏆 Overall score > 85 = WIN!
```

---

## 🤖 Agentic Architecture (Two Agents)

```
┌────────────────────────────────────────────────────────────────┐
│                         VOICE SEMENTLE                         │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ AGENT 1: MCP Advisor                                       │ │
│ │ User Voice → [VoiceKit MCP] → 6 Scores → [Gemini] → Advice │ │
│ └────────────────────────────────────────────────────────────┘ │
│                                                                │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ AGENT 2: Chatbot (Tool Calling)                            │ │
│ │ User Chat → [Gemini] → Text Response                       │ │
│ │             ↓ (autonomous decision)                        │ │
│ │             Tool Call: generate_audio_hint                 │ │
│ │             ↓                                              │ │
│ │             [ElevenLabs IVC + TTS] → Audio Hint            │ │
│ └────────────────────────────────────────────────────────────┘ │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```

### Agent 1: MCP Advisor

Analyzes the player's recording via **VoiceKit MCP** and automatically generates advice.

- Connects to the VoiceKit MCP server and calls `voicekit_analyze_voice_similarity`
- Returns 6 scores: pitch, rhythm, energy, pronunciation, transcript, overall
- Gemini generates progressive advice based on the scores and the attempt count

**Progressive Advice Strategy:**

- **Attempt 1**: Extremely vague (no category revealed)
- **Attempt 2**: Vague hint + category mentioned
- **Attempts 3-4**: More specific context
- **Attempts 5-6**: Quite specific (era, usage)
- **Attempts 7-10**: Very specific (syllables, first letter, rhymes)
- **Attempt 11+**: Pronunciation coaching mode

### Agent 2: Chatbot (with Tool Calling)

A conversational chatbot that provides **text hints** and can **autonomously call tools**.

- Answers user questions about the game
- Provides additional hints on request
- **Tool calling**: Autonomously decides when to call `generate_audio_hint` → ElevenLabs TTS

---

## 🔊 Audio Hints (Agentic Tool Calling)

The agent has access to `generate_audio_hint` and **autonomously decides when to use it**:

```python
# User: "Can I hear how it sounds?"
# Agent decides to call the tool:
generate_audio_hint(hint_type="syllable")
# → Clones the voice from the reference audio (ElevenLabs IVC)
# → Generates TTS with eleven_multilingual_v2
# → Returns the audio to the user
```

---

## 🛠️ Tech Stack

| Component | Technology |
|-----------|------------|
| Frontend | Gradio 6.0 |
| Voice Analysis | VoiceKit MCP (SSE) |
| Hint Agent | Gemini 2.5 Flash |
| Audio Hints | ElevenLabs IVC + TTS |
| Database | PostgreSQL |

---

## 📊 Scoring (6 Metrics)

| Metric | What It Measures |
|--------|------------------|
| 🎵 Pitch | Tone accuracy |
| 🥁 Rhythm | Timing & cadence |
| ⚡ Energy | Intensity level |
| 🗣️ Pronunciation | Clarity |
| 📝 Transcript | Correct words (STT) |
| 🏆 Overall | Combined score (> 85 = win) |

---

## 🎯 Why Voice Sementle?

| Judging Criteria | Our Approach |
|------------------|--------------|
| **UI/UX** | Polished Gradio 6 interface, intuitive game flow |
| **Functionality** | MCP + agentic chatbot + tool calling |
| **Creativity** | First voice-based guessing game with performance scoring |
| **Documentation** | Clear README, architecture diagrams |
| **Real-world Impact** | Fun consumer app; language-learning potential |

---

## 🎮 Try It Now!

👆 **Click the interface above to start playing!**

1. Allow microphone access
2. Record your voice guess
3. Get scored on pitch, rhythm, energy & pronunciation
4. Ask for hints or audio examples
5. Keep trying until you win!

---

**Built for [MCP's 1st Birthday Hackathon](https://huggingface.co/MCP-1st-Birthday)** 🎂

*Celebrating one year of Model Context Protocol!*
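---

The Python sketch below shows how Agent 1's progressive advice schedule and the win rule from the Scoring table could be expressed in code. It is illustrative only and is not taken from the app's source. The names `VoiceScores`, `advice_tier`, `is_win`, and `build_advice_prompt` are hypothetical. The 0-100 score scale is an assumption inferred from the "> 85 = win" rule, and the prompt text is a placeholder for whatever prompt the app actually sends to Gemini.

```python
from dataclasses import dataclass, fields

# Win rule from the README: the overall score must be strictly greater than 85.
WIN_THRESHOLD = 85.0


@dataclass
class VoiceScores:
    """The six metrics reported by VoiceKit MCP (a 0-100 scale is assumed)."""
    pitch: float
    rhythm: float
    energy: float
    pronunciation: float
    transcript: float
    overall: float


def advice_tier(attempt: int) -> str:
    """Map a 1-based attempt number to the hint specificity in the strategy list."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt == 1:
        return "extremely vague, no category revealed"
    if attempt == 2:
        return "vague, category mentioned"
    if attempt <= 4:
        return "more specific context"
    if attempt <= 6:
        return "quite specific: era, usage"
    if attempt <= 10:
        return "very specific: syllables, first letter, rhymes"
    return "pronunciation coaching"


def is_win(scores: VoiceScores) -> bool:
    """True when the overall score clears the win threshold."""
    return scores.overall > WIN_THRESHOLD


def build_advice_prompt(scores: VoiceScores, attempt: int) -> str:
    """Hypothetical advice prompt; the app's real Gemini prompt is not documented here."""
    metric_lines = "\n".join(
        f"- {f.name}: {getattr(scores, f.name):.0f}" for f in fields(scores)
    )
    return (
        f"Attempt {attempt}. Hint level: {advice_tier(attempt)}.\n"
        f"Scores:\n{metric_lines}\n"
        "Give one short piece of advice without revealing the answer."
    )


if __name__ == "__main__":
    scores = VoiceScores(pitch=70, rhythm=62, energy=80,
                         pronunciation=75, transcript=40, overall=66)
    print(is_win(scores))  # False: 66 is not above 85
    print(build_advice_prompt(scores, attempt=3))
```

Keeping the tier boundaries in one function means the hint schedule can be tuned without touching the prompt template or the scoring logic.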