Lindow committed on
Commit 3fac7d8 · 1 Parent(s): 7602502

add mcp and gradio app
.agent/docs/common_core_spec.md DELETED
@@ -1,136 +0,0 @@
1
- This is the authoritative **Common Core Data Specification**. It contains the exact source locations, data schemas, field definitions, and the specific processing logic required to interpret the hierarchy correctly.
2
-
3
- **Use this document as the source of truth for `tools/build_data.py`.**
4
-
5
- ---
6
-
7
- # Data Specification: Common Core Standards
8
-
9
- **Authority:** Common Standards Project (GitHub)
10
- **License:** Creative Commons Attribution 4.0 (CC BY 4.0)
11
- **Format:** JSON (Flat List of Objects)
12
-
13
- ## 1. Source Locations
14
-
15
- We are using the "Clean Data" export from the Common Standards Project. These files are static JSON dumps where each file represents a full Subject.
16
-
17
- | Subject | Direct Download URL |
18
- | :----------------- | :--------------------------------------------------------------------------------------------------------------------------- |
19
- | **Mathematics** | `https://raw.githubusercontent.com/commoncurriculum/common-standards-project/master/data/clean-data/CCSSI/Mathematics.json` |
20
- | **ELA / Literacy** | `https://raw.githubusercontent.com/commoncurriculum/common-standards-project/master/data/clean-data/CCSSI/ELA-Literacy.json` |
21
-
22
- ---
23
-
24
- ## 2. The Data Structure (Glossary)
25
-
26
- The JSON file contains a root object. The actual standards are located in the `standards` dictionary, keyed by their internal GUID.
27
-
28
- ### **Root Object**
29
-
30
- ```json
31
- {
32
- "subject": "Mathematics",
33
- "standards": {
34
- "6051566A...": { ... }, // Standard Object
35
- "5E367098...": { ... } // Standard Object
36
- }
37
- }
38
- ```
39
-
40
- ### **Standard Object (The Item)**
41
-
42
- Each item represents a node in the curriculum tree. It could be a broad **Domain**, a grouping **Cluster**, or a specific **Standard**.
43
-
44
- | Field Name | Type | Definition & Usage |
45
- | :---------------------- | :-------------- | :------------------------------------------------------------------------------------------------------------------------------------ |
46
- | **`id`** | `String (GUID)` | The internal unique identifier. Used for lookups in `ancestorIds`. |
47
- | **`statementNotation`** | `String` | **The Display Code.** (e.g., `CCSS.Math.Content.1.OA.A.1`). This is what teachers recognize. Use this for the UI. |
48
- | **`description`** | `String` | The text content. **Warning:** For standards, this text is often incomplete without its parent context (see Hierarchy below). |
49
- | **`statementLabel`** | `String` | The hierarchy type. Critical values: <br>• `Domain` (Highest) <br>• `Cluster` (Grouping) <br>• `Standard` (The actionable item) |
50
- | **`gradeLevels`** | `Array[String]` | Scope of the standard. <br>• Format: `["01", "02"]` (Grades 1 & 2), `["K"]` (Kindergarten), `["09", "10", "11", "12"]` (High School). |
51
- | **`ancestorIds`** | `Array[GUID]` | **CRITICAL.** An ordered list of parent IDs (from root to immediate parent). You must resolve these to build the full context. |
52
-
53
- ---
54
-
55
- ## 3. Hierarchy & Context (The "Interpretation" Problem)
56
-
57
- **The Problem:**
58
- A standard's description often relies on its parent "Cluster" for meaning.
59
-
60
- - _Cluster Text:_ "Understand the place value system."
61
- - _Standard Text:_ "Recognize that in a multi-digit number, a digit in one place represents 10 times as much..."
62
-
63
- If you only embed the _Standard Text_, the vector will miss the concept of "Place Value."
64
-
65
- **The Solution (Processing Logic):**
66
- To generate the **Search String** for embedding, you must concatenate the hierarchy.
67
-
68
- 1. **Domain:** The broad category (e.g., "Number and Operations in Base Ten").
69
- 2. **Cluster:** The specific topic (e.g., "Generalize place value understanding").
70
- 3. **Standard:** The task.
71
-
72
- **Formula:**
73
-
74
- ```text
75
- "{Subject} {Grade}: {Domain Text} - {Cluster Text} - {Standard Text}"
76
- ```
77
-
78
- ---
79
-
80
- ## 4. Build Pipeline Specification (`tools/build_data.py`)
81
-
82
- This specific logic ensures we extract meaningful vectors.
83
-
84
- ### **Step A: Ingestion**
85
-
86
- 1. Download both JSON files.
87
- 2. Merge the `standards` dictionaries into a single **Lookup Map** (Memory: `Map<GUID, Object>`).
88
-
89
- ### **Step B: Iteration & Filtering**
90
-
91
- Iterate through the Lookup Map.
92
- **Filter Rule:**
93
-
94
- - **KEEP** if `statementLabel` equals `"Standard"`.
95
- - **DISCARD** if `statementLabel` is `"Domain"`, `"Cluster"`, or `"Component"`. (We only index the actionable leaves).
96
-
97
- ### **Step C: Context Resolution (The "Breadcrumb" Loop)**
98
-
99
- For every kept Standard:
100
-
101
- 1. Initialize `context_text = ""`
102
- 2. Iterate through `ancestorIds`:
103
- - Use the ID to look up the Parent Object in the **Lookup Map**.
104
- - Append `Parent.description` to `context_text`.
105
- 3. Construct the final string:
106
- - `full_text = f"{context_text} {current_standard.description}"`
107
- 4. **Vectorize `full_text`**.
108
-
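Steps B and C above can be sketched together as follows. This is a minimal sketch, not the final `tools/build_data.py`: `build_search_strings` is a hypothetical name, `standards` is the merged lookup map from Step A, and the parts are joined with `" - "` to match the formula in Section 3.

```python
# Minimal sketch of Steps B and C. Assumes `standards` is the merged
# Map<GUID, Object> from Step A; joins with " - " per the Section 3 formula.
def build_search_strings(standards: dict) -> list[dict]:
    out = []
    for guid, item in standards.items():
        if item.get("statementLabel") != "Standard":
            continue  # Step B: discard Domain/Cluster/Component nodes
        # Step C: resolve ancestorIds (root -> immediate parent) into context.
        parts = [standards[a]["description"]
                 for a in item.get("ancestorIds", []) if a in standards]
        parts.append(item["description"])
        out.append({"guid": guid, "full_text": " - ".join(parts)})
    return out
```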
109
- ### **Step D: Output Schema (`data/standards.json`)**
110
-
111
- The clean, flat JSON file you save for the App to load must look like this:
112
-
113
- ```json
114
- [
115
- {
116
- "id": "CCSS.Math.Content.1.OA.A.1", // From 'statementNotation'
117
- "guid": "6051566A...", // From 'id'
118
- "grade": "01", // From 'gradeLevels[0]'
119
- "subject": "Mathematics", // From 'subject'
120
- "description": "Use addition and subtraction within 20 to solve word problems...", // From 'description'
121
- "full_context": "Operations and Algebraic Thinking - Represent and solve problems... - Use addition and..." // The text we used for embedding
122
- }
123
- ]
124
- ```
125
-
126
- ---
127
-
128
- ## 5. Summary of Valid `gradeLevels`
129
-
130
- When processing, normalize these strings if necessary, but typically they appear as:
131
-
132
- - `K` (Kindergarten)
133
- - `01` - `08` (Grades 1-8)
134
- - `09-12` (High School generic)
135
-
136
- _Note: If `gradeLevels` is an array `["09", "10", "11", "12"]`, you can display it as "High School" or "Grades 9-12"._
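The display rule in the note could be a small helper along these lines (`display_grades` is a hypothetical name, not part of the spec):

```python
# Hypothetical display helper for the note above.
def display_grades(grade_levels: list[str]) -> str:
    if grade_levels == ["09", "10", "11", "12"]:
        return "High School"
    if grade_levels == ["K"]:
        return "Kindergarten"
    if len(grade_levels) == 1:
        return f"Grade {int(grade_levels[0])}"  # "01" -> "Grade 1"
    return "Grades " + ", ".join(str(int(g)) for g in grade_levels)
```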
.agent/specs/002_mcp/spec.md ADDED
@@ -0,0 +1,574 @@
1
+ # MCP Server Sprint Specification
2
+
3
+ ## Overview
4
+
5
+ This sprint builds the MCP (Model Context Protocol) server that exposes the Pinecone database of educational standards to MCP clients (e.g., Claude Desktop). The server provides two methods of interacting with the Pinecone database:
6
+
7
+ 1. **Semantic Search**: Vector similarity search using natural language queries to find relevant standards
8
+ 2. **Direct ID Lookup**: Retrieve a specific standard by its GUID or identifier
9
+
10
+ ---
11
+
12
+ ## MCP Server Architecture
13
+
14
+ ### Entry Point
15
+
16
+ The MCP server will be implemented as `server.py` in the project root. This file is minimal and focused on:
17
+
18
+ - Setting up the FastMCP server instance
19
+ - Defining tool decorators that delegate to logic in `src/`
20
+ - Running the server
21
+
22
+ The bulk of the logic lives in the `src/` directory, which serves as the authoritative location for shared code.
23
+
24
+ ### Framework
25
+
26
+ Use `mcp.server.fastmcp.FastMCP` for the MCP server implementation. This provides a simple decorator-based API for defining tools.
27
+
28
+ ### Code Organization
29
+
30
+ **Separation of Concerns:**
31
+
32
+ - `server.py` (root): Server setup, tool definitions, and delegation to `src/` modules
33
+ - `src/`: Authoritative location for all business logic, including Pinecone client and tool implementations
34
+ - `tools/`: CLI tools that import shared logic from `src/`
35
+
36
+ ### Pinecone Client Migration
37
+
38
+ Move `tools/pinecone_client.py` to `src/pinecone_client.py`. Update all imports in `tools/` to import from `src.pinecone_client` instead. This establishes `src/` as the definitive source for Pinecone interaction logic.
39
+
40
+ ### Configuration
41
+
42
+ Create a configuration module `src/mcp_config.py` that wraps or duplicates Pinecone configuration settings. This provides isolation from the CLI tools' configuration.
43
+
44
+ **Configuration Requirements:**
45
+
46
+ - `PINECONE_API_KEY`: Pinecone API key (required)
47
+ - `PINECONE_INDEX_NAME`: Name of the Pinecone index (default: `common-core-standards`)
48
+ - `PINECONE_NAMESPACE`: Namespace for records (default: `standards`)
49
+
50
+ ---
51
+
52
+ ## MCP Tools
53
+
54
+ The server exposes two tools matching the MVP spec interface:
55
+
56
+ ### Tool 1: `find_relevant_standards`
57
+
58
+ Performs semantic search over educational standards using vector similarity.
59
+
60
+ **Parameters:**
61
+
62
+ - `activity` (str, required): Natural language description of the learning activity
63
+ - `max_results` (int, optional, default: 5): Maximum number of standards to return
64
+ - `grade` (str, optional): Grade level filter (e.g., "K", "01", "05", "09")
65
+ - `subject` (str, optional): Subject filter (e.g., "Mathematics", "ELA-Literacy")
66
+
67
+ **Implementation:**
68
+
69
+ - Use Pinecone's `search()` method with integrated embeddings (`llama-text-embed-v2`)
70
+ - Pass the user's query text via `query={"inputs": {"text": activity}, "top_k": max_results}`
71
+ - Embeddings are generated automatically by Pinecone
72
+ - Apply metadata filters inside the query dict: `"filter": {...}` (only include key if filters exist)
73
+ - Always rerank results using `rerank={"model": "bge-reranker-v2-m3", "top_n": max_results, "rank_fields": ["content"]}`
74
+ - Access results via `results['result']['hits']`, extracting `_id`, `_score`, and `fields` from each hit
75
+ - Return top N matches sorted by reranked relevance score
76
+
77
+ **Response Format:**
78
+ Returns a JSON string with structured format:
79
+
80
+ ```json
81
+ {
82
+ "success": true,
83
+ "results": [
84
+ {
85
+ "_id": "EA60C8D165F6481B90BFF782CE193F93",
86
+ "content": "...",
87
+ "subject": "Mathematics",
88
+ "education_levels": ["01"],
89
+ "statement_notation": "CCSS.Math.Content.1.OA.A.1",
90
+ "standard_set_title": "...",
91
+ "score": 0.85
92
+ }
93
+ ],
94
+ "message": "Found 5 matching standards"
95
+ }
96
+ ```
97
+
98
+ ### Tool 2: `get_standard_details`
99
+
100
+ Retrieves a specific standard by its GUID or identifier.
101
+
102
+ **Parameters:**
103
+
104
+ - `standard_id` (str, required): The standard's GUID (`_id` field) or identifier
105
+
106
+ **Implementation:**
107
+
108
+ - Use Pinecone's `fetch()` method with the standard's GUID (`_id` field)
109
+ - If the identifier is not a GUID, may need to query by metadata filter on `statement_notation` or `asn_identifier` fields
110
+
111
+ **Response Format:**
112
+ Returns a JSON string with structured format:
113
+
114
+ ```json
115
+ {
116
+ "success": true,
117
+ "results": [
118
+ {
119
+ "_id": "EA60C8D165F6481B90BFF782CE193F93",
120
+ "content": "...",
121
+ "subject": "Mathematics",
122
+ "education_levels": ["01"],
123
+ "statement_notation": "CCSS.Math.Content.1.OA.A.1",
124
+ "standard_set_title": "...",
125
+ "all_metadata_fields": "..."
126
+ }
127
+ ],
128
+ "message": "Retrieved standard details"
129
+ }
130
+ ```
131
+
132
+ ---
133
+
134
+ ## Error Handling
135
+
136
+ All tools catch exceptions and return structured error responses with `error_type` fields:
137
+
138
+ **Error Response Format:**
139
+
140
+ ```json
141
+ {
142
+ "success": false,
143
+ "results": [],
144
+ "message": "Error description",
145
+ "error_type": "error_category"
146
+ }
147
+ ```
148
+
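Both `src/` modules could share a tiny helper that emits this shape. A sketch, with `error_response` as a hypothetical name:

```python
import json

def error_response(message: str, error_type: str) -> str:
    """Build the structured error JSON returned by every tool."""
    return json.dumps({
        "success": False,
        "results": [],
        "message": message,
        "error_type": error_type,
    })
```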
149
+ **Valid `error_type` values:**
150
+
151
+ - `no_results`: Query returned no matches
152
+ - `invalid_input`: Malformed input (empty string, invalid ID format, etc.)
153
+ - `api_error`: Pinecone API failure or connection error
154
+ - `not_found`: Standard ID doesn't exist (for `get_standard_details`)
155
+
156
+ For `get_standard_details` with invalid ID, include a helpful suggestion:
157
+
158
+ ```json
159
+ {
160
+ "success": false,
161
+ "results": [],
162
+ "message": "Standard 'XYZ.123' not found. Try using find_relevant_standards with a keyword search instead.",
163
+ "error_type": "not_found"
164
+ }
165
+ ```
166
+
167
+ ---
168
+
169
+ ## Pinecone Integration
170
+
171
+ ### Implementation Approach
172
+
173
+ Use the shared `PineconeClient` class from `src/pinecone_client.py`. The MCP server and CLI tools both import and use this shared client, ensuring consistency and avoiding code duplication. The `src/` directory is the authoritative location for Pinecone interaction logic.
174
+
175
+ ### Extending PineconeClient
176
+
177
+ Add search and fetch methods to `src/pinecone_client.py`:
178
+
179
+ - `search_standards()`: Semantic search with filters
180
+ - `fetch_standard()`: Direct ID lookup
181
+
182
+ These methods encapsulate the Pinecone query logic and can be used by both the MCP server and CLI tools.
183
+
184
+ ### Semantic Search Implementation
185
+
186
+ - Use Pinecone's `search()` method with integrated embeddings
187
+ - The index is configured with `llama-text-embed-v2` model and `field_map text=content`
188
+ - Pass query text to Pinecone's `search()` method using the `inputs` parameter - embeddings are generated automatically
189
+ - Always rerank results using the `bge-reranker-v2-m3` model for improved relevance
190
+ - Build filter dictionary dynamically (only include filters with values):
191
+ - If `grade` provided: `{"education_levels": {"$in": [grade]}}`
192
+ - If `subject` provided: `{"subject": {"$eq": subject}}`
193
+ - Combine with `$and` if both: `{"$and": [grade_filter, subject_filter]}`
194
+ - Add filter to query dict only if it exists: `query_dict["filter"] = filter_dict`
195
+ - **Important**: Do not set `filter` to `None` — omit the key entirely when no filters are provided
196
+
197
+ ### Direct ID Lookup Implementation
198
+
199
+ - Use Pinecone's `fetch()` method with the standard's GUID (`_id` field)
200
+ - The `_id` field corresponds to the standard's GUID from the source data
201
+
202
+ ### Pinecone Record Structure
203
+
204
+ Records in Pinecone follow the `PineconeRecord` model structure:
205
+
206
+ - `_id`: Standard GUID (string)
207
+ - `content`: Text content for embedding (string)
208
+ - `subject`: Subject name (string)
209
+ - `education_levels`: List of grade levels (list[str])
210
+ - `statement_notation`: CCSS notation if available (str | None)
211
+ - `standard_set_id`, `standard_set_title`, `jurisdiction_id`, etc.: Additional metadata fields
212
+
213
+ ---
214
+
215
+ ## File Structure
216
+
217
+ ```
218
+ common_core_mcp/
219
+ ├── server.py # MCP server entry point - minimal setup only (NEW)
220
+ ├── src/
221
+ │ ├── pinecone_client.py # Pinecone client (MOVED from tools/) (MODIFIED)
222
+ │ ├── mcp_config.py # MCP-specific configuration (NEW)
223
+ │ ├── search.py # Semantic search logic (NEW)
224
+ │ └── lookup.py # Direct ID lookup logic (NEW)
225
+ ├── tools/ # CLI tools (MODIFIED - imports from src/)
226
+ │ └── pinecone_client.py # (REMOVED - moved to src/)
227
+ ├── data/ # Existing data directory (unchanged)
228
+ └── pyproject.toml # Dependencies (mcp already included)
229
+ ```
230
+
231
+ ### Files to Create
232
+
233
+ - **`server.py`** (project root): Minimal MCP server setup and tool definitions
234
+ - **`src/mcp_config.py`**: MCP-specific configuration module
235
+ - **`src/search.py`**: Semantic search implementation logic
236
+ - **`src/lookup.py`**: Direct ID lookup implementation logic
237
+
238
+ ### Files to Move
239
+
240
+ - **`tools/pinecone_client.py`** → **`src/pinecone_client.py`**: Move Pinecone client to authoritative `src/` location
241
+
242
+ ### Files to Modify
243
+
244
+ - **`tools/cli.py`**: Update imports to use `from src.pinecone_client import PineconeClient` (replace `from tools.pinecone_client`)
245
+ - **`tools/pinecone_processor.py`**: Update imports to use `from src.pinecone_client import PineconeClient` (replace `from tools.pinecone_client`)
246
+ - **`src/pinecone_client.py`**: Add `search_standards()` and `fetch_standard()` methods for MCP server use
247
+ - Any other files in `tools/` that import `pinecone_client`: Update to import from `src.pinecone_client`
248
+
249
+ ### Files to Reference (Existing)
250
+
251
+ - **`tools/pinecone_models.py`**: Contains `PineconeRecord` model structure for reference (may also move to `src/` in future)
252
+ - **`tools/config.py`**: Contains `ToolsSettings` for reference (but MCP uses separate config)
253
+
254
+ ---
255
+
256
+ ## Implementation Details
257
+
258
+ ### Server Entry Point (`server.py`)
259
+
260
+ The `server.py` file should be minimal and focused on setup:
261
+
262
+ 1. Import FastMCP and create server instance:
263
+
264
+ ```python
265
+ from mcp.server.fastmcp import FastMCP
266
+ mcp = FastMCP("CommonCore")
267
+ ```
268
+
269
+ 2. Import tool logic functions from `src/` modules
270
+ 3. Define thin wrapper functions with `@mcp.tool()` decorator that delegate to `src/` logic
271
+ 4. Run server with `mcp.run()` when executed directly
272
+
273
+ **Example Structure:**
274
+
275
+ ```python
276
+ from mcp.server.fastmcp import FastMCP
277
+ from src.search import find_relevant_standards_impl
278
+ from src.lookup import get_standard_details_impl
279
+
280
+ mcp = FastMCP("CommonCore")
281
+
282
+ @mcp.tool()
283
+ def find_relevant_standards(activity: str, max_results: int = 5, grade: str | None = None, subject: str | None = None) -> str:
284
+ """Returns educational standards relevant to the activity."""
285
+ return find_relevant_standards_impl(activity, max_results, grade, subject)
286
+
287
+ @mcp.tool()
288
+ def get_standard_details(standard_id: str) -> str:
289
+ """Returns full metadata for a standard by its GUID or identifier."""
290
+ return get_standard_details_impl(standard_id)
291
+
292
+ if __name__ == "__main__":
293
+ mcp.run()
294
+ ```
295
+
296
+ **Execution:**
297
+
298
+ - Run with: `uv run server.py` or `python server.py`
299
+ - Server communicates via stdio (FastMCP handles transport automatically)
300
+
301
+ ### Configuration Module (`src/mcp_config.py`)
302
+
303
+ Create a configuration module that:
304
+
305
+ - Loads environment variables from `.env` file
306
+ - Provides settings for Pinecone connection (API key, index name, namespace)
307
+ - Can duplicate or wrap settings from `tools/config.py` but maintains isolation
308
+
309
+ **Function Signature:**
310
+
311
+ ```python
312
+ def get_mcp_settings() -> McpSettings:
313
+ """Get MCP server configuration settings."""
314
+ # Returns settings object with pinecone_api_key, pinecone_index_name, pinecone_namespace
315
+ ```
316
+
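A stdlib-only sketch of the module's shape (the actual implementation uses Pydantic `BaseSettings` loaded from `.env`; this dataclass version only illustrates the fields and the singleton behavior):

```python
import os
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class McpSettings:
    pinecone_api_key: str
    pinecone_index_name: str = "common-core-standards"
    pinecone_namespace: str = "standards"

@lru_cache(maxsize=1)  # cached so repeated calls return the same instance
def get_mcp_settings() -> McpSettings:
    key = os.environ.get("PINECONE_API_KEY", "")
    if not key:
        raise ValueError("PINECONE_API_KEY is required")
    return McpSettings(
        pinecone_api_key=key,
        pinecone_index_name=os.environ.get("PINECONE_INDEX_NAME", "common-core-standards"),
        pinecone_namespace=os.environ.get("PINECONE_NAMESPACE", "standards"),
    )
```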
317
+ ### Search Module (`src/search.py`)
318
+
319
+ Contains the implementation logic for semantic search.
320
+
321
+ **Function Signature:**
322
+
323
+ ```python
324
+ def find_relevant_standards_impl(
325
+ activity: str,
326
+ max_results: int = 5,
327
+ grade: str | None = None,
328
+ subject: str | None = None
329
+ ) -> str:
330
+ """
331
+ Implementation of semantic search over educational standards.
332
+
333
+ Args:
334
+ activity: Description of the learning activity
335
+ max_results: Maximum number of standards to return (default: 5)
336
+ grade: Optional grade level filter (e.g., "K", "01", "05", "09")
337
+ subject: Optional subject filter (e.g., "Mathematics", "ELA-Literacy")
338
+
339
+ Returns:
340
+ JSON string with structured response containing matching standards
341
+ """
342
+ # Uses PineconeClient from src.pinecone_client
343
+ # Handles error cases and returns JSON response
344
+ ```
345
+
346
+ ### Lookup Module (`src/lookup.py`)
347
+
348
+ Contains the implementation logic for direct ID lookup.
349
+
350
+ **Function Signature:**
351
+
352
+ ```python
353
+ def get_standard_details_impl(standard_id: str) -> str:
354
+ """
355
+ Implementation of direct standard lookup by ID.
356
+
357
+ Args:
358
+ standard_id: The standard's GUID (_id field) or identifier
359
+
360
+ Returns:
361
+ JSON string with structured response containing standard details
362
+ """
363
+ # Uses PineconeClient from src.pinecone_client
364
+ # Handles error cases and returns JSON response
365
+ ```
366
+
367
+ ### PineconeClient Extensions (`src/pinecone_client.py`)
368
+
369
+ Add methods to the `PineconeClient` class (moved from `tools/`):
370
+
371
+ **New Methods:**
372
+
373
+ ```python
374
+ def search_standards(
375
+ self,
376
+ query_text: str,
377
+ top_k: int = 5,
378
+ grade: str | None = None,
379
+ subject: str | None = None
380
+ ) -> list[dict]:
381
+ """
382
+ Perform semantic search over standards.
383
+
384
+ Args:
385
+ query_text: Natural language query
386
+ top_k: Maximum number of results
387
+ grade: Optional grade filter
388
+ subject: Optional subject filter
389
+
390
+ Returns:
391
+ List of result dictionaries with metadata and scores
392
+ """
393
+
394
+ def fetch_standard(self, standard_id: str) -> dict | None:
395
+ """
396
+ Fetch a standard by its GUID.
397
+
398
+ Args:
399
+ standard_id: Standard GUID (_id field)
400
+
401
+ Returns:
402
+ Standard dictionary with metadata, or None if not found
403
+ """
404
+ ```
405
+
406
+ ### Pinecone Query Implementation
407
+
408
+ **Semantic Search Workflow (`src/search.py`):**
409
+
410
+ 1. Import `PineconeClient` from `src.pinecone_client`
411
+ 2. Initialize client instance (or use singleton pattern)
412
+ 3. Call `client.search_standards()` with parameters
413
+ 4. Format results into JSON response structure
414
+ 5. Handle errors and return appropriate error responses
415
+
416
+ **Implementation in `PineconeClient.search_standards()` (`src/pinecone_client.py`):**
417
+
418
+ 1. Build Pinecone filter dictionary from optional parameters:
419
+ - If `grade` provided: Add `{"education_levels": {"$in": [grade]}}`
420
+ - If `subject` provided: Add `{"subject": {"$eq": subject}}`
421
+ - Combine filters with `$and` if both provided
422
+ 2. Build the query dictionary:
423
+ - `"inputs": {"text": query_text}` for text queries (embeddings generated automatically)
424
+ - `"top_k": top_k * 2` to get more candidates for reranking
425
+ - `"filter": filter_dict` only if filters are provided (omit key if no filters)
426
+ 3. Call `index.search()` with:
427
+ - `namespace=namespace` (from config)
428
+ - `query=query_dict` (the constructed query dictionary)
429
+ - `rerank={"model": "bge-reranker-v2-m3", "top_n": top_k, "rank_fields": ["content"]}` (always enabled)
430
+ 4. Access results via `results['result']['hits']`
431
+ 5. Extract `_id`, `_score`, and `fields` from each hit and return list of result dictionaries
432
+
433
+ **Response Parsing:**
434
+
435
+ Access search results via `results['result']['hits']`. Each hit contains:
436
+
437
+ - `hit['_id']`: Record ID
438
+ - `hit['_score']`: Reranked relevance score
439
+ - `hit['fields']`: Dictionary of metadata fields (e.g., `hit['fields']['content']`, `hit['fields']['subject']`)
440
+
441
+ Example parsing:
442
+
443
+ ```python
444
+ for hit in results['result']['hits']:
445
+ record = {
446
+ "_id": hit["_id"],
447
+ "score": hit["_score"],
448
+ **hit["fields"] # Spread all metadata fields
449
+ }
450
+ ```
451
+
452
+ **Direct ID Lookup Workflow (`src/lookup.py`):**
453
+
454
+ 1. Import `PineconeClient` from `src.pinecone_client`
455
+ 2. Initialize client instance (or use singleton pattern)
456
+ 3. Call `client.fetch_standard()` with standard_id
457
+ 4. If found, format into JSON response structure
458
+ 5. If not found, return error response with `error_type: "not_found"`
459
+
460
+ **Implementation in `PineconeClient.fetch_standard()` (`src/pinecone_client.py`):**
461
+
462
+ 1. Call `index.fetch()` with:
463
+ - `ids=[standard_id]`
464
+ - `namespace=namespace` (from config)
465
+ 2. Extract result from returned dictionary
466
+ 3. Return standard dictionary with metadata, or None if not found
467
+
468
+ ### Error Handling Implementation
469
+
470
+ Error handling is implemented in the `src/` modules (`src/search.py` and `src/lookup.py`):
471
+
472
+ **In `src/search.py` (`find_relevant_standards_impl`):**
473
+
474
+ 1. Validate input parameters (e.g., empty strings, None values)
475
+ 2. Wrap `PineconeClient.search_standards()` call in try/except
476
+ 3. Catch `PineconeException` and map to appropriate `error_type`
477
+ 4. Handle empty results case
478
+ 5. Return structured JSON error responses
479
+ 6. Never raise exceptions - always return JSON response
480
+
481
+ **In `src/lookup.py` (`get_standard_details_impl`):**
482
+
483
+ 1. Validate input parameters (e.g., empty strings, None values)
484
+ 2. Wrap `PineconeClient.fetch_standard()` call in try/except
485
+ 3. Catch `PineconeException` and map to appropriate `error_type`
486
+ 4. Handle None result (not found)
487
+ 5. Return structured JSON error responses
488
+ 6. Never raise exceptions - always return JSON response
489
+
490
+ **Error Mapping:**
491
+
492
+ - `PineconeException` → `error_type: "api_error"`
493
+ - Empty `activity` or `standard_id` → `error_type: "invalid_input"`
494
+ - No results from query → `error_type: "no_results"`
495
+ - ID not found in fetch (returns None) → `error_type: "not_found"`
496
+
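Put together in `find_relevant_standards_impl`, the mapping might look like this. This is a sketch, not the final implementation: `client` is injected as a parameter only so the sketch stays self-contained, and the real code catches `PineconeException` where this catches `Exception`:

```python
import json

def find_relevant_standards_impl(activity, max_results=5, grade=None, subject=None,
                                 client=None):
    # Sketch only: `client` is injected here; the real module builds its own
    # PineconeClient from src.pinecone_client.
    if not activity or not activity.strip():
        return json.dumps({"success": False, "results": [],
                           "message": "Activity description is required",
                           "error_type": "invalid_input"})
    try:
        results = client.search_standards(activity, top_k=max_results,
                                          grade=grade, subject=subject)
    except Exception as exc:  # real code: except PineconeException
        return json.dumps({"success": False, "results": [],
                           "message": str(exc), "error_type": "api_error"})
    if not results:
        return json.dumps({"success": False, "results": [],
                           "message": "No matching standards found",
                           "error_type": "no_results"})
    return json.dumps({"success": True, "results": results,
                       "message": f"Found {len(results)} matching standards"})
```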
497
+ ---
498
+
499
+ ## Dependencies
500
+
501
+ The `mcp` package is already included in `pyproject.toml`. Ensure `pinecone` is also available (already included). No additional dependencies are required for this sprint.
502
+
503
+ ---
504
+
505
+ ## Running the Server
506
+
507
+ ### Local Development
508
+
509
+ 1. Ensure environment variables are set in `.env`:
510
+
511
+ ```
512
+ PINECONE_API_KEY=your_api_key_here
513
+ PINECONE_INDEX_NAME=common-core-standards
514
+ PINECONE_NAMESPACE=standards
515
+ ```
516
+
517
+ 2. Run the server:
518
+
519
+ ```bash
520
+ uv run server.py
521
+ ```
522
+
523
+ Or:
524
+
525
+ ```bash
526
+ python server.py
527
+ ```
528
+
529
+ 3. The server communicates via stdio. FastMCP handles the MCP protocol transport automatically.
530
+
531
+ ### Claude Desktop Integration
532
+
533
+ To connect Claude Desktop to the local MCP server, add configuration:
534
+
535
+ **macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
536
+ **Windows:** `%APPDATA%\Claude\claude_desktop_config.json`
537
+
538
+ ```json
539
+ {
540
+ "mcpServers": {
541
+ "common-core": {
542
+ "command": "uv",
543
+ "args": ["run", "server.py"],
544
+ "cwd": "/absolute/path/to/common_core_mcp"
545
+ }
546
+ }
547
+ }
548
+ ```
549
+
550
+ **Important:** Replace `/absolute/path/to/common_core_mcp` with the actual absolute path to your project directory.
551
+
552
+ ---
553
+
554
+ ## Testing
555
+
556
+ Skip tests for this sprint. Focus on getting the server working first. Tests can be added in a future sprint.
557
+
558
+ ### Manual Validation
559
+
560
+ To validate the server works:
561
+
562
+ 1. Run `server.py` and verify it starts without errors
563
+ 2. Connect Claude Desktop and verify tools appear
564
+ 3. Test `find_relevant_standards` with a sample activity
565
+ 4. Test `get_standard_details` with a known GUID
566
+ 5. Test error cases (invalid ID, empty query, etc.)
567
+
568
+ ---
569
+
570
+ ## Limitations and Future Work
571
+
572
+ - **Tools Only**: The MCP server only supports tools for now. Prompts and resources are not included in this sprint.
573
+ - **No Reasoning**: The server does not include LLM reasoning/explanations for why standards match activities. This matches the MVP spec's `ask_llama` functionality but is deferred for now.
574
+ - **Limited Filters**: Only `grade` and `subject` filters are supported initially. Additional filters (e.g., `is_leaf`, `standard_set_id`, `jurisdiction_id`) can be added in future sprints.
.agent/specs/002_mcp/tasks.md ADDED
@@ -0,0 +1,69 @@
1
+ # Spec Tasks
2
+
3
+ ## Tasks
4
+
5
+ - [x] 1. Create MCP Configuration Module
6
+
7
+ - [x] 1.1 Create `src/mcp_config.py` with `McpSettings` class using Pydantic BaseSettings
8
+ - [x] 1.2 Add `pinecone_api_key` (required), `pinecone_index_name` (default: "common-core-standards"), and `pinecone_namespace` (default: "standards") fields
9
+ - [x] 1.3 Configure to load from `.env` file with `env_file=".env"` and `env_file_encoding="utf-8"`
10
+ - [x] 1.4 Create `get_mcp_settings()` function that returns a singleton `McpSettings` instance
11
+ - [x] 1.5 Add validation to ensure `pinecone_api_key` is not empty (raise ValueError if missing)
12
+
13
+ - [x] 2. Move PineconeClient to src/ Directory
14
+
15
+ - [x] 2.1 Move `tools/pinecone_client.py` to `src/pinecone_client.py`
16
+ - [x] 2.2 Update imports in `src/pinecone_client.py`: change `from tools.config import get_settings` to use `src.mcp_config.get_mcp_settings()` instead
17
+ - [x] 2.3 Update `PineconeClient.__init__()` to use `get_mcp_settings()` from `src.mcp_config`
18
+ - [x] 2.4 Verify `src/pinecone_client.py` imports `PineconeRecord` from `tools.pinecone_models` (keep this import for now)
19
+
20
+ - [x] 3. Update Tools Imports to Use src.pinecone_client
21
+
22
+ - [x] 3.1 Update `tools/cli.py`: replace `from tools.pinecone_client import PineconeClient` with `from src.pinecone_client import PineconeClient` (2 occurrences)
23
+ - [x] 3.2 Check `tools/pinecone_processor.py` for any `pinecone_client` imports and update if present
24
+ - [x] 3.3 Verify all imports work correctly by checking for any remaining references to `tools.pinecone_client`
25
+
26
+ - [x] 4. Add search_standards() Method to PineconeClient
+
+ - [x] 4.1 Add `search_standards()` method signature: `def search_standards(self, query_text: str, top_k: int = 5, grade: str | None = None, subject: str | None = None) -> list[dict]`
+ - [x] 4.2 Build filter dictionary dynamically: create empty list, add `{"education_levels": {"$in": [grade]}}` if grade provided, add `{"subject": {"$eq": subject}}` if subject provided, combine with `$and` if both exist
+ - [x] 4.3 Build query dictionary: `{"inputs": {"text": query_text}, "top_k": top_k * 2}` (double for reranking candidates), add `"filter": filter_dict` only if filter_dict exists
+ - [x] 4.4 Call `index.search()` with `namespace=self.namespace`, `query=query_dict`, and `rerank={"model": "bge-reranker-v2-m3", "top_n": top_k, "rank_fields": ["content"]}`
+ - [x] 4.5 Parse results: access `results['result']['hits']`, extract `_id`, `_score`, and `fields` from each hit, combine into dict with `{"_id": hit["_id"], "score": hit["_score"], **hit["fields"]}`
+ - [x] 4.6 Return list of result dictionaries
+
+ - [x] 5. Add fetch_standard() Method to PineconeClient
+
+ - [x] 5.1 Add `fetch_standard()` method signature: `def fetch_standard(self, standard_id: str) -> dict | None`
+ - [x] 5.2 Call `index.fetch()` with `ids=[standard_id]` and `namespace=self.namespace`
+ - [x] 5.3 Extract result from `result.records` dictionary using `standard_id` as key
+ - [x] 5.4 If record found, extract `_id` and all fields from `record.fields`, combine into dict
+ - [x] 5.5 Return dictionary with metadata or `None` if not found
+
+ - [x] 6. Create search.py Module with Semantic Search Implementation
+
+ - [x] 6.1 Create `src/search.py` with imports: `json`, `PineconeClient` from `src.pinecone_client`, `PineconeException` from `pinecone.exceptions`
+ - [x] 6.2 Implement `find_relevant_standards_impl()` function with signature matching spec (activity, max_results=5, grade=None, subject=None) -> str
+ - [x] 6.3 Add input validation: check if `activity` is empty or None, return error JSON with `error_type: "invalid_input"` if invalid
+ - [x] 6.4 Wrap `PineconeClient.search_standards()` call in try/except, catch `PineconeException` and return error JSON with `error_type: "api_error"`
+ - [x] 6.5 Handle empty results: if results list is empty, return error JSON with `error_type: "no_results"` and message "No matching standards found"
+ - [x] 6.6 Format successful results: create response dict with `success: True`, `results` list (each with `_id`, `score`, and all metadata fields), and `message` with count
+ - [x] 6.7 Return JSON string using `json.dumps()` with proper formatting
+
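The response envelope shared by steps 6.3-6.7 can be factored into two small helpers (a sketch; the helper names are hypothetical, while the JSON keys follow the steps above):

```python
import json


def error_response(error_type: str, message: str) -> str:
    """Serialize a failed call in the shared envelope (steps 6.3-6.5)."""
    return json.dumps(
        {"success": False, "results": [], "message": message, "error_type": error_type},
        indent=2,
    )


def success_response(results: list) -> str:
    """Serialize matching standards with a count message (steps 6.6-6.7)."""
    return json.dumps(
        {
            "success": True,
            "results": results,
            "message": f"Found {len(results)} matching standards",
            "error_type": None,
        },
        indent=2,
    )
```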
+ - [x] 7. Create lookup.py Module with Direct ID Lookup Implementation
+
+ - [x] 7.1 Create `src/lookup.py` with imports: `json`, `PineconeClient` from `src.pinecone_client`, `PineconeException` from `pinecone.exceptions`
+ - [x] 7.2 Implement `get_standard_details_impl()` function with signature: `(standard_id: str) -> str`
+ - [x] 7.3 Add input validation: check if `standard_id` is empty or None, return error JSON with `error_type: "invalid_input"` if invalid
+ - [x] 7.4 Wrap `PineconeClient.fetch_standard()` call in try/except, catch `PineconeException` and return error JSON with `error_type: "api_error"`
+ - [x] 7.5 Handle not found: if `fetch_standard()` returns `None`, return error JSON with `error_type: "not_found"` and helpful message suggesting to use `find_relevant_standards`
+ - [x] 7.6 Format successful result: create response dict with `success: True`, `results` list containing single standard dict with all metadata, and `message: "Retrieved standard details"`
+ - [x] 7.7 Return JSON string using `json.dumps()` with proper formatting
+
+ - [x] 8. Create server.py MCP Entry Point
+ - [x] 8.1 Create `server.py` in project root with imports: `FastMCP` from `mcp.server.fastmcp`, `find_relevant_standards_impl` from `src.search`, `get_standard_details_impl` from `src.lookup`
+ - [x] 8.2 Initialize FastMCP server: `mcp = FastMCP("CommonCore")`
+ - [x] 8.3 Define `find_relevant_standards` tool with `@mcp.tool()` decorator, signature matching spec (activity, max_results=5, grade=None, subject=None) -> str, docstring "Returns educational standards relevant to the activity"
+ - [x] 8.4 Define `get_standard_details` tool with `@mcp.tool()` decorator, signature `(standard_id: str) -> str`, docstring "Returns full metadata for a standard by its GUID or identifier"
+ - [x] 8.5 Add `if __name__ == "__main__": mcp.run()` block to start server
+ - [x] 8.6 Verify server starts without errors by running `uv run server.py`
.agent/specs/003_gradio/spec.md ADDED
@@ -0,0 +1,1167 @@
+ # Gradio MCP Server Sprint Specification
+
+ ## Overview
+
+ This sprint replaces the existing FastMCP server implementation with a Gradio-based MCP server that can be hosted publicly on Hugging Face Spaces. The Gradio app will expose the Common Core Standards MCP tools and include a chat interface that demonstrates MCP tool calling capabilities. This enables public access to the MCP server and provides a demonstration interface for the hackathon submission.
+
+ ## Key Changes
+
+ - Replace `server.py` (FastMCP) with `app.py` (Gradio MCP server)
+ - Update dependencies to use Gradio 6.0.0+ with MCP support
+ - Remove the FastMCP dependency from `pyproject.toml`
+ - Create Hugging Face Space configuration files
+ - Implement a chat interface with MCP tool calling support
+ - Update the README to meet hackathon requirements
+
+ ## User Stories
+
+ 1. **As a developer**, I want to access the MCP server via a public Hugging Face Space URL so that I can use it from any MCP client without running it locally.
+
+ 2. **As a user**, I want to interact with a chat interface that can answer questions about educational standards using the MCP tools, so that I can see how the MCP server works in practice.
+
+ 3. **As a hackathon judge**, I want to see a working MCP server hosted on Hugging Face Spaces with proper documentation, so that I can evaluate the submission.
+
+ ## Technical Architecture
+
+ ### Gradio MCP Server Implementation
+
+ Gradio 6 provides native MCP server support. When `mcp_server=True` is set in `demo.launch()`, Gradio automatically:
+
+ 1. Converts each API endpoint (function) into an MCP tool
+ 2. Uses function docstrings and type hints to generate tool descriptions and parameter schemas
+ 3. Exposes the MCP server at `http://your-server:port/gradio_api/mcp/`
+ 4. Provides an SSE (Server-Sent Events) endpoint for MCP clients
+
+ ### Function to MCP Tool Conversion
+
+ Gradio automatically converts functions with proper docstrings and type hints into MCP tools:
+
+ - **Function name** → Tool name
+ - **Docstring** → Tool description
+ - **Type hints** → Parameter schema
+ - **Default values** → Default parameter values (from component initial values)
+
+ ### MCP Server Endpoints
+
+ When `mcp_server=True` is enabled:
+
+ - **MCP Schema**: `http://your-server:port/gradio_api/mcp/schema`
+ - **MCP SSE Endpoint**: `http://your-server:port/gradio_api/mcp/` (for MCP clients)
+ - **MCP Documentation**: Available via the "View API" link in the Gradio app footer
+
+ ### MCP Server Activation
+
+ **We will use the `demo.launch(mcp_server=True)` parameter approach** (not the environment variable method). This provides explicit control and makes the MCP server activation clear in the code.
+
+ ## Implementation Details
+
+ ### Dependencies Update
+
+ **File: `pyproject.toml`**
+
+ **Explicit Requirement**: Update the Gradio dependency to version 6.0.0 or higher **with MCP extras**:
+
+ ```toml
+ dependencies = [
+     "gradio[mcp]>=6.0.0",
+     "pinecone",
+     "python-dotenv",
+     "typer",
+     "requests",
+     "rich",
+     "loguru",
+     "pydantic>=2.0.0",
+     "pydantic-settings>=2.0.0",
+     "huggingface_hub",
+ ]
+ ```
+
+ **Important Notes:**
+
+ - The `[mcp]` extra ensures all MCP dependencies are installed
+ - Remove the standalone `mcp` package dependency if present (FastMCP is no longer used)
+ - Add `huggingface_hub` for Inference API access in the chat interface
+
+ ### Gradio App Structure
+
+ **File: `app.py`** (new file, replaces `server.py`)
+
+ The Gradio app should:
+
+ 1. **Expose MCP Tools**: Create functions that wrap the existing `src/search.py` and `src/lookup.py` implementations
+ 2. **Enable MCP Server**: Set `mcp_server=True` in `demo.launch()`
+ 3. **Include Chat Interface**: Use `gr.ChatInterface` with a function that supports MCP tool calling
+
+ **Function Requirements for MCP Tools:**
+
+ - Functions must have detailed docstrings in the format:
+
+   ```python
+   def function_name(param1: type, param2: type) -> return_type:
+       """
+       Description of what the function does.
+
+       Args:
+           param1: Description of param1
+           param2: Description of param2
+
+       Returns:
+           Description of return value
+       """
+   ```
+
+ - Type hints are required for all parameters
+ - Default values can be set via component initial values (e.g., `gr.Textbox("default value")`)
+
+ **Example Structure:**
+
+ ```python
+ import gradio as gr
+ from src.search import find_relevant_standards_impl
+ from src.lookup import get_standard_details_impl
+
+ def find_relevant_standards(
+     activity: str,
+     max_results: int = 5,
+     grade: str | None = None,
+     subject: str | None = None,
+ ) -> str:
+     """
+     Searches for educational standards relevant to a learning activity using semantic search.
+
+     This function performs a vector similarity search over the Common Core Standards database
+     to find standards that match the described learning activity. Results are ranked by relevance
+     and can be filtered by grade level and subject area.
+
+     Args:
+         activity: A natural language description of the learning activity, lesson, or educational
+             objective. Examples: "teaching fractions to third graders", "reading comprehension
+             activities", "solving quadratic equations". This is the primary search query and should
+             be descriptive and specific for best results.
+
+         max_results: The maximum number of standards to return. Must be between 1 and 20.
+             Default is 5. Higher values return more results but may include less relevant matches.
+
+         grade: Optional grade level filter. Must be one of the following valid grade level codes:
+             - "K" for Kindergarten
+             - "01" for Grade 1
+             - "02" for Grade 2
+             - "03" for Grade 3
+             - "04" for Grade 4
+             - "05" for Grade 5
+             - "06" for Grade 6
+             - "07" for Grade 7
+             - "08" for Grade 8
+             - "09" for Grade 9
+             - "10" for Grade 10
+             - "11" for Grade 11
+             - "12" for Grade 12
+             - "09-12" for high school range (when standards span multiple high school grades)
+
+             If None or empty string, no grade filtering is applied and standards from all grade
+             levels may be returned. The grade filter uses exact matching against the education_levels
+             metadata field in the database.
+
+         subject: Optional subject area filter. Common values include:
+             - "Mathematics" or "Math"
+             - "ELA-Literacy" or "English Language Arts"
+             - "Science"
+             - "Social Studies"
+             - Other subject names as they appear in the standards database
+
+             If None or empty string, no subject filtering is applied. The subject filter uses
+             case-insensitive matching against the subject metadata field.
+
+     Returns:
+         A JSON string containing a structured response with the following format:
+         {
+             "success": true|false,
+             "results": [
+                 {
+                     "_id": "standard_guid",
+                     "content": "full standard text with hierarchy",
+                     "subject": "Mathematics",
+                     "education_levels": ["03"],
+                     "statement_notation": "3.NF.A.1",
+                     "standard_set_title": "Grade 3",
+                     "score": 0.85
+                 },
+                 ...
+             ],
+             "message": "Found N matching standards" or error message,
+             "error_type": null or error type if success is false
+         }
+
+         On success, the results array contains up to max_results standards, sorted by relevance
+         score (highest first). Each result includes the full standard content, metadata, and
+         relevance score. On error, success is false and an error message describes the issue.
+     """
+     # Handle empty string from dropdown (convert to None)
+     if grade == "":
+         grade = None
+     if subject == "":
+         subject = None
+
+     # Ensure max_results is an integer (gr.Number returns float by default)
+     max_results = int(max_results)
+
+     return find_relevant_standards_impl(activity, max_results, grade, subject)
+
+ def get_standard_details(standard_id: str) -> str:
+     """
+     Retrieves complete metadata and content for a specific educational standard by its identifier.
+
+     This function performs a direct lookup of a standard using its unique identifier. The identifier
+     can be either the standard's GUID (a unique UUID-like string) or its statement notation
+     (the human-readable code like "3.NF.A.1" or "CCSS.Math.Content.3.NF.A.1").
+
+     Args:
+         standard_id: The unique identifier for the standard. This can be:
+             - A GUID (e.g., "EA60C8D165F6481B90BFF782CE193F93"): The internal database ID
+             - A statement notation (e.g., "3.NF.A.1"): The standard's notation code
+             - An ASN identifier (e.g., "S21238682"): If available in the standard's metadata
+
+             The function will attempt to match the identifier against multiple fields in the database.
+             GUIDs provide the fastest and most reliable lookup. Statement notations may match
+             multiple standards if the notation format is ambiguous.
+
+     Returns:
+         A JSON string containing a structured response with the following format:
+         {
+             "success": true|false,
+             "results": [
+                 {
+                     "_id": "standard_guid",
+                     "content": "full standard text with hierarchy",
+                     "subject": "Mathematics",
+                     "education_levels": ["03"],
+                     "statement_notation": "3.NF.A.1",
+                     "standard_set_title": "Grade 3",
+                     "asn_identifier": "S21238682",
+                     "depth": 3,
+                     "is_leaf": true,
+                     "parent_id": "parent_guid",
+                     "ancestor_ids": [...],
+                     "child_ids": [...],
+                     ... (all available metadata fields)
+                 }
+             ],
+             "message": "Retrieved standard details" or error message,
+             "error_type": null or error type if success is false
+         }
+
+         On success, the results array contains exactly one standard object with all available
+         metadata fields including hierarchy relationships, content, and identifiers. On error
+         (e.g., standard not found), success is false and the message provides guidance, such as
+         suggesting to use find_relevant_standards for searching.
+
+     Raises:
+         This function does not raise exceptions. All errors are returned as JSON responses
+         with success=false and appropriate error messages.
+     """
+     return get_standard_details_impl(standard_id)
+
+ # Chat interface function - see complete implementation in Chat Interface Implementation section below
+
+ # Create Gradio interface
+ demo = gr.TabbedInterface(
+     [
+         gr.Interface(
+             fn=find_relevant_standards,
+             inputs=[
+                 gr.Textbox(label="Activity Description", placeholder="Describe a learning activity..."),
+                 gr.Number(label="Max Results", value=5, minimum=1, maximum=20),
+                 gr.Dropdown(
+                     label="Grade (optional)",
+                     choices=["", "K", "01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12", "09-12"],
+                     value=None,
+                     info="Select a grade level to filter results",
+                 ),
+                 gr.Textbox(label="Subject (optional)", placeholder="e.g., Mathematics, ELA-Literacy"),
+             ],
+             outputs=gr.JSON(label="Results"),
+             title="Find Relevant Standards",
+             description="Search for educational standards relevant to a learning activity.",
+             api_name="find_relevant_standards",
+         ),
+         gr.Interface(
+             fn=get_standard_details,
+             inputs=gr.Textbox(label="Standard ID", placeholder="Enter a standard GUID or identifier..."),
+             outputs=gr.JSON(label="Standard Details"),
+             title="Get Standard Details",
+             description="Retrieve full metadata for a specific standard by its ID.",
+             api_name="get_standard_details",
+         ),
+         gr.ChatInterface(
+             fn=chat_with_standards,  # See complete implementation in Chat Interface Implementation section
+             type="messages",  # Required in Gradio 6 - uses OpenAI-style message format
+             title="Chat with Standards",
+             description="Ask questions about educational standards. The AI will use MCP tools to find relevant information.",
+             examples=["What standards apply to teaching fractions in 3rd grade?", "Find standards for reading comprehension"],
+         ),
+     ],
+     ["Search", "Lookup", "Chat"],
+ )
+
+ if __name__ == "__main__":
+     demo.launch(mcp_server=True)
+ ```
+
+ ### Chat Interface Implementation
+
+ **Priority: First Priority** - The chat interface is a required deliverable for this sprint.
+
+ **Minimum Viable Implementation:**
+
+ - Use the Hugging Face Inference API with a free/open model that supports MCP tool calling
+ - The model should be able to call the MCP tools (`find_relevant_standards` and `get_standard_details`)
+ - The chat function should integrate with the MCP server to answer questions about educational requirements
+
+ **Model Selection (Researched and Verified):**
+
+ **Selected Model: `Qwen/Qwen2.5-7B-Instruct`**
+
+ This model has been verified to:
+
+ - Support tool/function calling via the Hugging Face Inference API
+ - Be available through Inference Providers (Together AI, Featherless AI)
+ - Have good performance for chat applications
+ - Support the OpenAI-compatible function calling format used by InferenceClient
+ - Be actively maintained and widely used (57.9M+ downloads as of research date)
+
+ **Important:** The model requires specifying an inference provider (e.g., `provider="together"` or `provider="nebius"`) when using InferenceClient.
+
+ **Alternative (for more complex queries):** `Qwen/Qwen2.5-72B-Instruct` (larger, more capable, available via the Nebius provider)
+
+ **Implementation Details:**
+
+ The chat function will use Hugging Face's `InferenceClient` with function calling. Since the MCP tools (`find_relevant_standards` and `get_standard_details`) are exposed by the same Gradio app, we can call them directly as Python functions rather than making HTTP requests to the MCP server endpoint. This is more efficient and simpler.
+
+ **Complete Chat Function Implementation:**
+
+ ```python
+ import os
+ import json
+ from typing import Any
+ from huggingface_hub import InferenceClient
+ from src.search import find_relevant_standards_impl
+ from src.lookup import get_standard_details_impl
+
+ # Initialize the Hugging Face Inference Client
+ # Use HF_TOKEN from environment (automatically available in Hugging Face Spaces)
+ # Provider is required for models that need Inference Providers (e.g., Together AI, Nebius)
+ HF_TOKEN = os.environ.get("HF_TOKEN")
+ client = InferenceClient(
+     provider="together",  # Required: specifies the inference provider for tool calling
+     token=HF_TOKEN,
+ )
+
+ # Define the function schemas in OpenAI format for the model
+ TOOLS = [
+     {
+         "type": "function",
+         "function": {
+             "name": "find_relevant_standards",
+             "description": "Searches for educational standards relevant to a learning activity using semantic search. Use this when the user asks about standards for a specific activity, lesson, or educational objective.",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "activity": {
+                         "type": "string",
+                         "description": "A natural language description of the learning activity, lesson, or educational objective. Be specific and descriptive."
+                     },
+                     "max_results": {
+                         "type": "integer",
+                         "description": "Maximum number of standards to return (1-20). Default is 5.",
+                         "default": 5,
+                         "minimum": 1,
+                         "maximum": 20
+                     },
+                     "grade": {
+                         "type": "string",
+                         "description": "Optional grade level filter. Valid values: K, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, or 09-12 for high school range.",
+                         "enum": ["K", "01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12", "09-12"]
+                     },
+                     "subject": {
+                         "type": "string",
+                         "description": "Optional subject area filter (e.g., 'Mathematics', 'ELA-Literacy', 'Science')."
+                     }
+                 },
+                 "required": ["activity"]
+             }
+         }
+     },
+     {
+         "type": "function",
+         "function": {
+             "name": "get_standard_details",
+             "description": "Retrieves complete metadata and content for a specific educational standard by its identifier (GUID or statement notation). Use this when the user asks about a specific standard or wants details about a standard mentioned in previous results.",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "standard_id": {
+                         "type": "string",
+                         "description": "The unique identifier for the standard. Can be a GUID (UUID-like string) or statement notation (e.g., '3.NF.A.1')."
+                     }
+                 },
+                 "required": ["standard_id"]
+             }
+         }
+     }
+ ]
+
+ # Function registry for executing tool calls
+ AVAILABLE_FUNCTIONS = {
+     "find_relevant_standards": find_relevant_standards_impl,
+     "get_standard_details": get_standard_details_impl,
+ }
+
+ def chat_with_standards(message: str, history: list) -> str:
+     """
+     Chat function that uses MCP tools via Hugging Face Inference API with tool calling.
+
+     This function integrates with Qwen2.5-7B-Instruct to answer questions about educational
+     standards. The model can call find_relevant_standards and get_standard_details tools
+     to retrieve information and provide accurate responses.
+
+     Args:
+         message: The user's current message/query
+         history: Chat history in Gradio 6 messages format. Each message is a dict with
+             "role" and "content" keys. In Gradio 6, content uses structured format:
+             [{"type": "text", "text": "..."}, ...] for text content.
+
+     Returns:
+         The assistant's response as a string, incorporating information from MCP tools
+         when relevant.
+     """
+     # Convert Gradio 6 history format to OpenAI messages format
+     # Gradio 6 uses structured content: {"role": "user", "content": [{"type": "text", "text": "..."}]}
+     messages = []
+     if history:
+         for msg in history:
+             if isinstance(msg, dict):
+                 role = msg.get("role", "user")
+                 content = msg.get("content", "")
+
+                 # Handle Gradio 6 structured content format
+                 if isinstance(content, list):
+                     # Extract text from content blocks
+                     text_parts = []
+                     for block in content:
+                         if isinstance(block, dict) and block.get("type") == "text":
+                             text_parts.append(block.get("text", ""))
+                     content = " ".join(text_parts)
+
+                 messages.append({
+                     "role": role,
+                     "content": content,
+                 })
+
+     # Add system message to guide the model
+     system_message = {
+         "role": "system",
+         "content": "You are a helpful assistant that answers questions about educational standards. You have access to tools that can search for standards and retrieve standard details. Use these tools when users ask about standards, learning activities, or educational requirements. Always provide clear, helpful responses based on the tool results."
+     }
+
+     # Add current user message
+     messages.append({"role": "user", "content": message})
+
+     # Prepare full message list with system message
+     full_messages = [system_message] + messages
+
+     try:
+         # Initial API call with tools
+         response = client.chat.completions.create(
+             model="Qwen/Qwen2.5-7B-Instruct",
+             messages=full_messages,
+             tools=TOOLS,
+             tool_choice="auto",  # Let the model decide when to call functions
+             temperature=0.7,
+             max_tokens=1000,
+         )
+
+         response_message = response.choices[0].message
+
+         # Check if model wants to call functions
+         if response_message.tool_calls:
+             # Add assistant's tool call request to messages
+             full_messages.append(response_message)
+
+             # Process each tool call
+             for tool_call in response_message.tool_calls:
+                 function_name = tool_call.function.name
+                 function_args = json.loads(tool_call.function.arguments)
+
+                 # Execute the function
+                 if function_name in AVAILABLE_FUNCTIONS:
+                     if function_name == "find_relevant_standards":
+                         result = AVAILABLE_FUNCTIONS[function_name](
+                             activity=function_args.get("activity", ""),
+                             max_results=function_args.get("max_results", 5),
+                             grade=function_args.get("grade"),
+                             subject=function_args.get("subject"),
+                         )
+                     elif function_name == "get_standard_details":
+                         result = AVAILABLE_FUNCTIONS[function_name](
+                             standard_id=function_args.get("standard_id", ""),
+                         )
+                     else:
+                         result = json.dumps({"error": f"Unknown function: {function_name}"})
+                 else:
+                     result = json.dumps({"error": f"Function {function_name} not available"})
+
+                 # Add function result to messages
+                 full_messages.append({
+                     "role": "tool",
+                     "tool_call_id": tool_call.id,
+                     "name": function_name,
+                     "content": result,
+                 })
+
+             # Get final response with function results
+             final_response = client.chat.completions.create(
+                 model="Qwen/Qwen2.5-7B-Instruct",
+                 messages=full_messages,
+                 temperature=0.7,
+                 max_tokens=1000,
+             )
+
+             return final_response.choices[0].message.content
+         else:
+             # No tool calls, return direct response
+             return response_message.content
+
+     except Exception as e:
+         # Error handling
+         return f"I apologize, but I encountered an error: {str(e)}. Please try again or rephrase your question."
+ ```
+
+
539
+ **Key Implementation Points:**
540
+
541
+ 1. **Direct Function Calls**: Since the MCP tools are in the same Python process, we call the underlying implementation functions (`find_relevant_standards_impl` and `get_standard_details_impl`) directly rather than making HTTP requests to the MCP server endpoint.
542
+
543
+ 2. **Tool Schema Conversion**: The MCP tools are converted to OpenAI function calling format, which is what `InferenceClient` expects. The schemas match the function signatures and docstrings.
544
+
545
+ 3. **Tool Calling Workflow**:
546
+
547
+ - First API call includes tools and lets model decide if/when to call them
548
+ - If model requests tool calls, execute them and add results to conversation
549
+ - Second API call generates final response incorporating tool results
550
+
551
+ 4. **Error Handling**: All errors are caught and returned as user-friendly messages.
552
+
553
+ 5. **Model Configuration**: Uses `Qwen/Qwen2.5-7B-Instruct` via Together AI provider with `tool_choice="auto"` to let the model decide when to use tools.
554
+
555
+ 6. **Gradio 6 History Format**: The chat function handles Gradio 6's structured content format where content is a list of typed blocks (e.g., `[{"type": "text", "text": "..."}]`) rather than simple strings.
556
+
557
+ ### Hugging Face Space Configuration
558
+
559
+ **CRITICAL: Space must be created in the MCP-1st-Birthday organization**
560
+
561
+ **Required Files:**
562
+
563
+ 1. **`app.py`**: Main Gradio application entry point (as described above)
564
+
565
+ 2. **`requirements.txt`**: Python dependencies
566
+
567
+ - Extract from `pyproject.toml` or manually specify
568
+ - Must include: `gradio[mcp]>=6.0.0`, `pinecone`, `python-dotenv`, `pydantic>=2.0.0`, `pydantic-settings>=2.0.0`, `huggingface_hub`
569
+ - The `[mcp]` extra ensures all MCP dependencies are included
570
+ - `huggingface_hub` is required for Inference API access in the chat interface
571
+
572
+ 3. **`README.md`**: Updated with hackathon requirements (see README Requirements section below)
573
+
574
+ 4. **`.env.example`**: Template for environment variables
575
+
576
+ ```
577
+ PINECONE_API_KEY=your_api_key_here
578
+ PINECONE_INDEX_NAME=common-core-standards
579
+ PINECONE_NAMESPACE=standards
580
+ HF_TOKEN=your_huggingface_token_here
581
+ # Note: MCP_SERVER_URL is not needed since we call functions directly
582
+ ```
583
+
584
+ 5. **Space Configuration** (via Hugging Face UI):
585
+ - **Organization**: Must be `MCP-1st-Birthday` (create Space in this organization)
586
+ - SDK: `gradio`
587
+ - Python version: 3.12+
588
+ - Environment variables: Set `PINECONE_API_KEY`, `HF_TOKEN`, and other required variables in Space settings
589
+ - **Visibility**: Can be public or private (both work for MCP servers)
590
+
591
+ ### Hackathon Registration and Submission Requirements
592
+
593
+ **Before Building:**
594
+
595
+ 1. **Join the Organization** (REQUIRED):
596
+
597
+ - Go to https://huggingface.co/MCP-1st-Birthday
598
+ - Click "Request to join this org" (top right)
599
+ - Wait for approval (usually automatic or quick)
600
+
601
+ 2. **Complete Registration** (REQUIRED):
602
+
603
+ - Complete the official registration form (linked on the hackathon page)
604
+ - Registration link: Available on the hackathon page
605
+
606
+ 3. **Team Members** (if applicable):
607
+ - If working in a team (2-5 people), **all** members must:
608
+ - Join the MCP-1st-Birthday organization individually
609
+ - Complete the registration form individually
610
+ - Be listed in the README with their Hugging Face usernames
611
+
612
+ **Submission Requirements (All Must Be Completed by November 30, 2025, 11:59 PM UTC):**
613
+
614
+ 1. **Hugging Face Space** (REQUIRED):
615
+
616
+ - Space must be in the `MCP-1st-Birthday` organization
617
+ - Space must be functional and accessible
618
+ - Code must be pushed to the Space repository
619
+
620
+ 2. **README.md** (REQUIRED):
621
+
622
+ - Must include track tag: `building-mcp-track-consumer`
623
+ - Must include team member usernames (if team)
624
+ - Must include demo video link
625
+ - Must include social media post link
626
+ - Must include clear documentation (see README Requirements section)
627
+
628
+ 3. **Demo Video** (REQUIRED):
629
+
630
+ - **Length:** 1-5 minutes
631
+ - **Content:** Must show the MCP server in action, specifically demonstrating:
632
+ - Integration with an MCP client (Claude Desktop, Cursor, or similar)
633
+ - The MCP tools being used through the client
634
+ - The Gradio web interface
635
+ - The chat interface using MCP tools
636
+ - **Hosting:** YouTube, Vimeo, or similar platform
637
+ - **Link:** Must be included in the README
638
+
639
+ 4. **Social Media Post** (REQUIRED):
640
+
641
+ - Post about your project on X/Twitter, LinkedIn, or similar
642
+ - Include information about the project and hackathon
643
+ - **Link:** Must be included in the README (not just submission form)
644
+
645
+ 5. **Functionality Requirements**:
646
+ - Working MCP server (exposed via Gradio)
647
+ - Integration with MCP client (demonstrated in video)
648
+ - Published as Hugging Face Space
649
+
650
+ **Judging Criteria (To Guide Implementation):**
651
+
652
+ Projects will be evaluated on:
653
+
654
+ 1. **Completeness**: Space, video, documentation, and social link all present
655
+ 2. **Functionality**: Works effectively, uses Gradio 6 and MCP features
656
+ 3. **Real-world Impact**: Useful tool with potential for real-world application
657
+ 4. **Creativity**: Innovative or original idea and implementation
658
+ 5. **Design/UI-UX**: Polished, intuitive, and easy to use
659
+ 6. **Documentation**: Well-communicated in README and/or demo video
660
+
661
+ **Additional Considerations for Judging:**
662
+
663
+ - **Community Choice Award**: Based on social media engagement, Space interactions (Discussions tab), and Discord community activity
664
+ - **Gradio 6 Features**: Use of Gradio 6 capabilities (MCP server, ChatInterface, etc.)
665
+ - **MCP Integration**: Effective use of MCP protocol and tool exposure
666
+
667
+ ### README Requirements
668
+
669
+ **File: `README.md`**
670
+
671
+ The README is a critical component of the hackathon submission and must include all required elements. Follow this structure:
672
+
673
+ #### 1. **Hackathon Track Tag (REQUIRED)**
674
+
675
+ **Must be included in the README metadata or prominently at the top:**
676
+
677
+ Add the track tag `building-mcp-track-consumer` to classify this as a Consumer MCP Server entry. This tag is **mandatory** for submission eligibility.
678
+
679
+ **Placement options:**
680
+
681
+ - In the README frontmatter (if using YAML frontmatter)
682
+ - As a tag/badge at the top of the README
683
+ - In a "Hackathon" or "Submission" section
684
+
685
+ **Example:**
686
+
687
+ ```markdown
688
+ ---
689
+ tags:
690
+ - building-mcp-track-consumer
691
+ - mcp
692
+ - gradio
693
+ - education
694
+ ---
695
+ ```
696
+
697
+ Or as a badge:
698
+
699
+ ```markdown
700
+ ![Hackathon Track](https://img.shields.io/badge/Track-Consumer%20MCP%20Server-blue)
701
+ ```
702
+
703
+ #### 2. **Project Title and Description**
704
+
705
+ Clear, compelling explanation of:
706
+
707
+ - What the MCP server does
708
+ - Its purpose and capabilities
709
+ - Why it's useful for consumers (teachers, students, parents, etc.)
710
+ - Key features and benefits
711
+
712
+ #### 3. **Team Information (If Applicable)**
713
+
714
+ **If working in a team (2-5 members):**
715
+
716
+ - Include Hugging Face usernames of **all** team members
717
+ - Format: "Built by @username1, @username2, @username3"
718
+ - All team members must be members of the MCP-1st-Birthday organization
719
+
720
+ **If working solo:**
721
+
722
+ - Optional: Include your Hugging Face username
723
+ - Format: "Built by @username"
724
+
725
+ #### 4. **Usage Instructions**
726
+
727
+ **A. Gradio Web Interface:**
728
+
729
+ - How to use the web interface
730
+ - What each tab/component does
731
+ - Example queries or use cases
732
+
733
+ **B. MCP Client Integration (REQUIRED for demo video):**
734
+
735
+ - How to connect an MCP client (Claude Desktop, Cursor, etc.) to the Space
736
+ - MCP server URL: `https://your-space-name.hf.space/gradio_api/mcp/`
737
+ - Step-by-step configuration instructions
738
+ - Example MCP client configuration:
739
+ ```json
740
+ {
741
+ "mcpServers": {
742
+ "common-core": {
743
+ "url": "https://your-space-name.hf.space/gradio_api/mcp/"
744
+ }
745
+ }
746
+ }
747
+ ```
748
+ - Screenshots of the MCP client showing the tools available
749
+
750
+ #### 5. **Setup Instructions**
751
+
752
+ - Local development setup (if applicable)
753
+ - Environment variables needed
754
+ - Installation steps
755
+ - How to run locally
756
+
757
+ #### 6. **Visual Documentation**
758
+
759
+ - **Screenshots or GIFs** of the interface in action
760
+ - Show the Gradio web interface
761
+ - Show MCP client integration (if possible)
762
+ - Demonstrate key features
763
+
764
+ #### 7. **Demo Video Link (REQUIRED)**
765
+
766
+ **Must include a link to a demo video that:**
767
+
768
+ - **Length:** 1-5 minutes
769
+ - **Content Requirements:**
770
+ - Shows the MCP server **in action**
771
+ - **Specifically demonstrates integration with an MCP client** (Claude Desktop, Cursor, or similar)
772
+ - Shows the MCP tools being used through the client
773
+ - Demonstrates the Gradio web interface
774
+ - Shows the chat interface using MCP tools
775
+ - **Platform:** YouTube, Vimeo, or other video hosting service
776
+ - **Format:** Include the video link prominently in the README
777
+
778
+ **Example section:**
779
+
780
+ ```markdown
781
+ ## 🎥 Demo Video
782
+
783
+ Watch the demo video showing the MCP server in action:
784
+
785
+ [![Demo Video](video-thumbnail-url)](video-url)
786
+
787
+ The video demonstrates:
788
+
789
+ - MCP server integration with Claude Desktop
790
+ - Using the Gradio web interface
791
+ - Chat interface with tool calling
792
+ ```
793
+
794
+ #### 8. **Social Media Post Link (REQUIRED)**
795
+
796
+ **Must include a link to a social media post about the project:**
797
+
798
+ - Platform: X/Twitter, LinkedIn, or similar
799
+ - Content: Post about your project, the hackathon, and what you built
800
+ - **This link must be included in the README** (not just in submission form)
801
+ - Format: "Share on [Twitter](link) | [LinkedIn](link)"
802
+
803
+ **Example section:**
804
+
805
+ ```markdown
806
+ ## 📱 Social Media
807
+
808
+ Check out our project announcement:
809
+
810
+ - [Twitter/X Post](your-twitter-post-url)
811
+ - [LinkedIn Post](your-linkedin-post-url)
812
+ ```
813
+
814
+ #### 9. **Technical Details**
815
+
816
+ - Architecture overview
817
+ - Technologies used (Gradio 6, MCP, etc.)
818
+ - How the MCP tools work
819
+ - API documentation (if applicable)
820
+
821
+ #### 10. **Acknowledgments**
822
+
823
+ - Hackathon organizers
824
+ - Libraries and tools used
825
+ - Any inspiration or references
826
+
827
+ **README Checklist for Hackathon Submission:**
828
+
829
+ - [ ] Track tag `building-mcp-track-consumer` included
830
+ - [ ] Team member usernames listed (if team)
831
+ - [ ] Clear project description
832
+ - [ ] Usage instructions for web interface
833
+ - [ ] MCP client integration instructions
834
+ - [ ] Screenshots/GIFs included
835
+ - [ ] Demo video link included (1-5 minutes, shows MCP client integration)
836
+ - [ ] Social media post link included
837
+ - [ ] Setup/installation instructions
838
+ - [ ] Technical details documented
839
+
840
+ ### File Changes Summary
841
+
842
+ **Files to Create:**
843
+
844
+ - `app.py`: Main Gradio application with MCP server and chat interface
845
+ - `requirements.txt`: Python dependencies for Hugging Face Space
846
+ - `.env.example`: Environment variable template
847
+ - `README.md`: Updated with hackathon requirements (or update existing)
848
+
849
+ **Files to Delete:**
850
+
851
+ - `server.py`: Replaced by `app.py`
852
+
853
+ **Files to Modify:**
854
+
855
+ - `pyproject.toml`: Update Gradio to `gradio[mcp]>=6.0.0`, add `huggingface_hub`, remove standalone `mcp` dependency if present
856
+
857
+ **Files to Reference (Existing):**
858
+
859
+ - `src/search.py`: Contains `find_relevant_standards_impl()` function
860
+ - `src/lookup.py`: Contains `get_standard_details_impl()` function
861
+ - `src/pinecone_client.py`: Pinecone client implementation
862
+ - `src/mcp_config.py`: Configuration settings
863
+
864
+ ## Technical Specifications (Verified from Documentation)
865
+
866
+ ### Gradio MCP Server Syntax
867
+
868
+ **Enabling MCP Server:**
869
+
870
+ ```python
871
+ demo.launch(mcp_server=True)
872
+ ```
873
+
874
+ Or via environment variable:
875
+
876
+ ```bash
877
+ export GRADIO_MCP_SERVER=True
878
+ ```
879
+
880
+ **MCP Server Endpoints:**
881
+
882
+ - Schema: `{base_url}/gradio_api/mcp/schema`
883
+ - SSE Endpoint: `{base_url}/gradio_api/mcp/` (for MCP clients)
884
+
885
+ ### Function Signature Requirements
886
+
887
+ Functions exposed as MCP tools must:
888
+
889
+ 1. Have type hints for all parameters
890
+ 2. Have detailed docstrings with Args and Returns sections
891
+ 3. Return a value (not None, unless explicitly typed as `str | None`)
892
+
893
+ **Example:**
894
+
895
+ ```python
896
+ def find_relevant_standards(
897
+ activity: str,
898
+ max_results: int = 5,
899
+ grade: str | None = None,
900
+ subject: str | None = None,
901
+ ) -> str:
902
+ """
903
+ Returns educational standards relevant to the activity.
904
+
905
+ Args:
906
+ activity: Natural language description of the learning activity
907
+ max_results: Maximum number of standards to return (default: 5)
908
+ grade: Optional grade level filter (e.g., "K", "01", "05", "09")
909
+ subject: Optional subject filter (e.g., "Mathematics", "ELA-Literacy")
910
+
911
+ Returns:
912
+ JSON string with structured response containing matching standards
913
+ """
914
+ # Implementation
915
+ ```
916
+
917
+ ### Repository Structure for Hugging Face Spaces
918
+
919
+ **Required Files:**
920
+
921
+ - `app.py` (or `main.py`): Entry point for the Gradio app
922
+ - `requirements.txt`: Python dependencies
923
+ - `README.md`: Project documentation
924
+
925
+ **Optional but Recommended:**
926
+
927
+ - `.env.example`: Environment variable template
928
+ - `src/`: Source code directory (already exists)
929
+
930
+ **Space Configuration:**
931
+
932
+ - SDK: Set to `gradio` in Hugging Face Space settings
933
+ - Python version: 3.12+ (matches project requirement)
934
+ - Environment variables: Configure in Space settings UI
935
+
936
+ ### Exposing Functions as MCP Endpoints
937
+
938
+ **Automatic Conversion:**
939
+
940
+ - Any function passed to `gr.Interface()` or `gr.ChatInterface()` is automatically exposed as an MCP tool
941
+ - Function name becomes the tool name
942
+ - Docstring becomes the tool description
943
+ - Type hints define the parameter schema
944
+
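Conceptually, the automatic conversion above yields a tool schema roughly like the following. This is an illustrative sketch only (the field names are assumptions, not Gradio's exact output); inspect `/gradio_api/mcp/schema` on a running app for the real shape:

```python
# Illustrative MCP tool schema derived from the find_relevant_standards
# signature: name from the function, description from the docstring,
# parameter types from the type hints.
tool_schema = {
    "name": "find_relevant_standards",
    "description": "Returns educational standards relevant to the activity.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "activity": {"type": "string"},
            "max_results": {"type": "integer", "default": 5},
            "grade": {"type": ["string", "null"]},
            "subject": {"type": ["string", "null"]},
        },
        "required": ["activity"],
    },
}
```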
945
+ **API Name Customization:**
946
+
947
+ ```python
948
+ gr.Interface(
949
+ fn=find_relevant_standards,
950
+ # ... inputs and outputs ...
951
+ api_name="find_relevant_standards", # Custom API endpoint name
952
+ )
953
+ ```
954
+
955
+ **API Visibility Control:**
956
+
957
+ ```python
958
+ gr.Interface(
959
+ fn=find_relevant_standards,
960
+ # ... inputs and outputs ...
961
+ api_visibility="public", # "public", "private", or "undocumented"
962
+ )
963
+ ```
964
+
965
+ **API Description Customization:**
966
+
967
+ ```python
968
+ gr.Interface(
969
+ fn=find_relevant_standards,
970
+ # ... inputs and outputs ...
971
+ api_description="Custom description for MCP tool", # Overrides docstring
972
+ )
973
+ ```
974
+
975
+ ## Chat Interface MCP Integration
976
+
977
+ The chat interface must:
978
+
979
+ 1. Use a Hugging Face model that supports tool calling (e.g., `Qwen/Qwen2.5-7B-Instruct`)
980
+ 2. Specify an inference provider (e.g., `provider="together"`) for the model
981
+ 3. Handle Gradio 6's structured content format for chat history
982
+ 4. Handle tool calling: detect tool requests, execute functions directly, return results to model
983
+
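Requirement 4 (execute functions directly, return results to the model) can be sketched as a small dispatch loop. The registry contents here are stubs; in `app.py` the entries would point at the real `_impl` functions from `src/`:

```python
import json

# Hypothetical registry mapping tool names to implementations; the lambda is a
# stand-in for find_relevant_standards_impl from src/search.py.
AVAILABLE_FUNCTIONS = {
    "find_relevant_standards": lambda **kwargs: json.dumps({"standards": [], **kwargs}),
}

def run_tool_calls(tool_calls: list) -> list:
    """Execute each requested tool directly and build 'tool' role messages
    that are appended to the conversation before the final model call."""
    results = []
    for call in tool_calls:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"])
        output = AVAILABLE_FUNCTIONS[name](**args)
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": output}
        )
    return results
```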
984
+ **Implementation Notes:**
985
+
986
+ - The chat interface and MCP tools are in the same Gradio app
987
+ - We call the underlying Python functions directly rather than making HTTP requests to the MCP server
988
+ - The model must be configured to call tools when answering questions about educational standards
989
+ - **Gradio 6 History Format**: Content is now structured as `[{"type": "text", "text": "..."}]` rather than simple strings. The chat function must extract text from these content blocks.
990
+
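The history conversion mentioned in the last note can be sketched as a small helper that tolerates both plain-string and structured content:

```python
def history_to_messages(history: list) -> list:
    """Convert Gradio 6 ChatInterface history, whose content may be a list of
    structured blocks like [{"type": "text", "text": "..."}], into plain
    {"role", "content"} dicts suitable for a chat-completion API."""
    messages = []
    for msg in history:
        content = msg.get("content", "")
        if isinstance(content, list):
            # Gradio 6 structured format: keep only the text blocks.
            content = " ".join(
                block.get("text", "")
                for block in content
                if isinstance(block, dict) and block.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    return messages
```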
991
+ ## Testing and Validation
992
+
993
+ ### Local Testing
994
+
995
+ 1. Run `app.py` locally:
996
+
997
+ ```bash
998
+ python app.py
999
+ ```
1000
+
1001
+ 2. Verify MCP server is running:
1002
+
1003
+ - Check console output for MCP server URL
1004
+ - Visit `http://localhost:7860/gradio_api/mcp/schema` to view tools
1005
+
1006
+ 3. Test MCP client connection:
1007
+
1008
+ - Configure Claude Desktop or Cursor to use `http://localhost:7860/gradio_api/mcp/`
1009
+ - Verify tools appear in the client
1010
+
1011
+ 4. Test chat interface:
1012
+ - Interact with the chat interface in the Gradio UI
1013
+ - Verify it can call MCP tools and return educational standards information
1014
+
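Step 2 of local testing can also be scripted. The parser below assumes the schema payload is either a list of tool objects or a `{"tools": [...]}` wrapper; inspect the live response to confirm before relying on it:

```python
import json

SCHEMA_URL = "http://localhost:7860/gradio_api/mcp/schema"  # local dev default

def list_tools(schema_json: str) -> list:
    """Return the tool names advertised in an MCP schema payload.

    The payload shape (a list of tool objects, or {"tools": [...]}) is an
    assumption; check the response at SCHEMA_URL for the actual structure.
    """
    schema = json.loads(schema_json)
    tools = schema if isinstance(schema, list) else schema.get("tools", [])
    return [tool["name"] for tool in tools]
```

Usage: fetch the schema (e.g. `requests.get(SCHEMA_URL).text`) and confirm that `find_relevant_standards` and `get_standard_details` both appear in the returned list.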
1015
+ ### Hugging Face Space Deployment
1016
+
1017
+ 1. Push code to Hugging Face Space
1018
+ 2. Verify Space builds and runs successfully
1019
+ 3. Check MCP server endpoint: `https://your-space-name.hf.space/gradio_api/mcp/schema`
1020
+ 4. Test MCP client connection using the Space URL
1021
+ 5. Test chat interface in the Space UI
1022
+
1023
+ ## Risks and Assumptions
1024
+
1025
+ ### Risks
1026
+
1027
+ 1. **Chat Interface Complexity**: Implementing MCP tool calling with Hugging Face Inference API may be complex and require additional research or libraries.
1028
+
1029
+ 2. **Model/Provider Availability**: The selected model (`Qwen/Qwen2.5-7B-Instruct`) requires an inference provider (Together AI or Featherless AI). Provider availability and rate limits may affect performance.
1030
+
1031
+ 3. **MCP Client Configuration**: Users may need guidance on configuring MCP clients to connect to the Space.
1032
+
1033
+ 4. **Gradio 6 Breaking Changes**: Gradio 6 introduces several breaking changes including structured content format for ChatInterface history. Implementation must handle these changes correctly.
1034
+
1035
+ ### Assumptions
1036
+
1037
+ 1. Gradio 6.0.0+ includes all necessary MCP server functionality without additional packages (with `gradio[mcp]` extras).
1038
+
1039
+ 2. The existing `src/search.py` and `src/lookup.py` implementations can be directly called from Gradio functions without modification.
1040
+
1041
+ 3. Hugging Face Spaces automatically sets `GRADIO_MCP_SERVER=True` when Gradio 6+ is detected.
1042
+
1043
+ 4. The MCP server URL format for Hugging Face Spaces is `https://space-name.hf.space/gradio_api/mcp/`.
1044
+
1045
+ 5. The `Qwen/Qwen2.5-7B-Instruct` model is available via Together AI or Featherless AI inference providers and supports tool calling.
1046
+
1047
+ 6. Gradio 6 ChatInterface passes history in structured content format that must be parsed to extract text content.
1048
+
1049
+ ## Dependencies
1050
+
1051
+ - **Gradio 6.0.0+**: Required for MCP server support (install with `gradio[mcp]` extras)
1052
+ - **Hugging Face Hub/Inference API**: For chat interface model access
1053
+ - Requires `provider="together"` or similar for models that need inference providers
1054
+ - `Qwen/Qwen2.5-7B-Instruct` is available via Together AI and Featherless AI providers
1055
+ - **Existing dependencies**: Pinecone, pydantic, etc. (unchanged)
1056
+
1057
+ ## Deliverables
1058
+
1059
+ 1. ✅ `app.py`: Gradio application with MCP server and chat interface
1060
+ 2. ✅ `requirements.txt`: Dependencies for Hugging Face Space
1061
+ 3. ✅ Updated `README.md`: Hackathon-compliant documentation with all required elements
1062
+ 4. ✅ `.env.example`: Environment variable template
1063
+ 5. ✅ Updated `pyproject.toml`: Gradio 6.0.0+ dependency with MCP extras
1064
+ 6. ✅ Deleted `server.py`: Old FastMCP implementation removed
1065
+ 7. ✅ Working Hugging Face Space: Deployed in MCP-1st-Birthday organization
1066
+ 8. ✅ Chat interface: Functional with MCP tool calling
1067
+ 9. ✅ Demo video: 1-5 minutes showing MCP client integration
1068
+ 10. ✅ Social media post: Link included in README
1069
+
1070
+ ## Hackathon Submission Checklist
1071
+
1072
+ **Before Submission Deadline (November 30, 2025, 11:59 PM UTC):**
1073
+
1074
+ ### Registration (Complete Before Building)
1075
+
1076
+ - [ ] Joined MCP-1st-Birthday organization on Hugging Face
1077
+ - [ ] Completed official registration form
1078
+ - [ ] All team members joined organization and registered (if team)
1079
+
1080
+ ### Technical Implementation
1081
+
1082
+ - [ ] `app.py` created with Gradio MCP server
1083
+ - [ ] `requirements.txt` includes all dependencies
1084
+ - [ ] Space deployed and functional in MCP-1st-Birthday organization
1085
+ - [ ] MCP server accessible at `/gradio_api/mcp/` endpoint
1086
+ - [ ] Chat interface working with tool calling
1087
+ - [ ] All MCP tools (`find_relevant_standards`, `get_standard_details`) functional
1088
+
1089
+ ### Documentation (README.md)
1090
+
1091
+ - [ ] Track tag `building-mcp-track-consumer` included
1092
+ - [ ] Team member usernames listed (if team)
1093
+ - [ ] Clear project description and purpose
1094
+ - [ ] Usage instructions for Gradio web interface
1095
+ - [ ] MCP client integration instructions with configuration example
1096
+ - [ ] Setup/installation instructions
1097
+ - [ ] Screenshots or GIFs included
1098
+ - [ ] **Demo video link included** (1-5 minutes, shows MCP client integration)
1099
+ - [ ] **Social media post link included**
1100
+ - [ ] Technical details documented
1101
+
1102
+ ### Demo Video Requirements
1103
+
1104
+ - [ ] Video length: 1-5 minutes
1105
+ - [ ] Shows MCP server in action
1106
+ - [ ] **Demonstrates integration with MCP client** (Claude Desktop, Cursor, etc.)
1107
+ - [ ] Shows MCP tools being used through the client
1108
+ - [ ] Shows Gradio web interface
1109
+ - [ ] Shows chat interface using MCP tools
1110
+ - [ ] Video hosted on YouTube, Vimeo, or similar
1111
+ - [ ] Link included in README
1112
+
1113
+ ### Social Media Post
1114
+
1115
+ - [ ] Post created on X/Twitter, LinkedIn, or similar
1116
+ - [ ] Post mentions the project and hackathon
1117
+ - [ ] Link included in README (not just submission form)
1118
+
1119
+ ### Space Configuration
1120
+
1121
+ - [ ] Space created in `MCP-1st-Birthday` organization
1122
+ - [ ] SDK set to `gradio`
1123
+ - [ ] Python version 3.12+
1124
+ - [ ] Environment variables configured (PINECONE_API_KEY, HF_TOKEN, etc.)
1125
+ - [ ] Space is accessible and functional
1126
+
1127
+ ### Quality Checks
1128
+
1129
+ - [ ] Code follows best practices
1130
+ - [ ] Error handling implemented
1131
+ - [ ] UI is polished and intuitive
1132
+ - [ ] Documentation is clear and complete
1133
+ - [ ] All features work as expected
1134
+
1135
+ ## Next Steps After Sprint
1136
+
1137
+ 1. **Create Demo Video**:
1138
+
1139
+ - Record 1-5 minute video showing MCP client integration
1140
+ - Demonstrate all key features
1141
+ - Upload to YouTube or Vimeo
1142
+ - Add link to README
1143
+
1144
+ 2. **Create Social Media Post**:
1145
+
1146
+ - Post about the project on X/Twitter or LinkedIn
1147
+ - Include project highlights and hackathon information
1148
+ - Add link to README
1149
+
1150
+ 3. **Final README Polish**:
1151
+
1152
+ - Ensure all required elements are present
1153
+ - Add screenshots/GIFs
1154
+ - Verify all links work
1155
+ - Check formatting and clarity
1156
+
1157
+ 4. **Submit to Hackathon**:
1158
+
1159
+ - Verify all checklist items are complete
1160
+ - Submit before November 30, 2025, 11:59 PM UTC
1161
+ - Engage with community (Discord, Space discussions)
1162
+
1163
+ 5. **Future Enhancements** (Post-Hackathon):
1164
+ - Fine-tune chat interface model selection and configuration
1165
+ - Add error handling and user feedback improvements
1166
+ - Consider adding more MCP tools or resources
1167
+ - Optimize performance and user experience
.agent/specs/003_gradio/tasks.md ADDED
@@ -0,0 +1,76 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Spec Tasks
2
+
3
+ ## Tasks
4
+
5
+ - [x] 1. Update Dependencies in pyproject.toml
6
+
7
+ - [x] 1.1 Update Gradio dependency from `gradio>=5.0.0,<6.0.0` to `gradio[mcp]>=6.0.0` to enable MCP server support
8
+ - [x] 1.2 Add `huggingface_hub` to dependencies list for Inference API access in chat interface
9
+ - [x] 1.3 Remove standalone `mcp` package dependency (FastMCP is no longer used, Gradio 6 includes MCP support)
10
+ - [x] 1.4 Verify all other dependencies remain unchanged (pinecone, python-dotenv, typer, requests, rich, loguru, pydantic>=2.0.0, pydantic-settings>=2.0.0)
11
+
12
+ - [x] 2. Create app.py with MCP Tool Wrapper Functions
13
+
14
+ - [x] 2.1 Create `app.py` in project root with imports: `gradio as gr`, `find_relevant_standards_impl` from `src.search`, `get_standard_details_impl` from `src.lookup`
15
+ - [x] 2.2 Implement `find_relevant_standards()` function with signature: `(activity: str, max_results: int = 5, grade: str | None = None, subject: str | None = None) -> str`
16
+ - [x] 2.3 Add comprehensive docstring to `find_relevant_standards()` following spec format with Args and Returns sections, including all grade level codes and subject examples
17
+ - [x] 2.4 Add input handling: convert empty string `grade` and `subject` to `None`, convert `max_results` float to int (Gradio Number returns float)
18
+ - [x] 2.5 Implement `get_standard_details()` function with signature: `(standard_id: str) -> str`
19
+ - [x] 2.6 Add comprehensive docstring to `get_standard_details()` following spec format with Args, Returns, and Raises sections
20
+ - [x] 2.7 Delegate both functions to their respective `_impl` functions from `src/` modules
21
+
22
+ - [x] 3. Create Gradio Interface Structure with TabbedInterface
23
+
24
+ - [x] 3.1 Create `gr.Interface` for `find_relevant_standards` with inputs: `gr.Textbox` (activity), `gr.Number` (max_results, min=1, max=20, value=5), `gr.Dropdown` (grade with choices including empty string), `gr.Textbox` (subject, optional)
25
+ - [x] 3.2 Configure `find_relevant_standards` interface with `gr.JSON` output, title "Find Relevant Standards", description, and `api_name="find_relevant_standards"`
26
+ - [x] 3.3 Create `gr.Interface` for `get_standard_details` with `gr.Textbox` input (standard_id) and `gr.JSON` output
27
+ - [x] 3.4 Configure `get_standard_details` interface with title "Get Standard Details", description, and `api_name="get_standard_details"`
28
+ - [x] 3.5 Create `gr.ChatInterface` with `fn=chat_with_standards` (placeholder for now), `type="messages"`, title, description, and example prompts
29
+ - [x] 3.6 Combine all three interfaces into `gr.TabbedInterface` with tab labels: ["Search", "Lookup", "Chat"]
30
+ - [x] 3.7 Add `if __name__ == "__main__": demo.launch(mcp_server=True)` to enable MCP server
31
+
32
+ - [x] 4. Implement Chat Interface with Hugging Face Inference API
33
+
34
+ - [x] 4.1 Add imports to `app.py`: `os`, `json`, `InferenceClient` from `huggingface_hub`
35
+ - [x] 4.2 Initialize `InferenceClient` with `provider="together"` and `token=os.environ.get("HF_TOKEN")` at module level
36
+ - [x] 4.3 Define `TOOLS` list with OpenAI function calling format schemas for `find_relevant_standards` and `get_standard_details` matching the function signatures
37
+ - [x] 4.4 Create `AVAILABLE_FUNCTIONS` dict mapping function names to their `_impl` implementations
38
+ - [x] 4.5 Implement `chat_with_standards(message: str, history: list) -> str` function signature
39
+ - [x] 4.6 Add Gradio 6 history format conversion: extract text from structured content blocks `[{"type": "text", "text": "..."}]` format
40
+ - [x] 4.7 Build message list with system message, converted history, and current user message
41
+ - [x] 4.8 Implement tool calling workflow: initial API call with tools, detect tool_calls, execute functions, add results, get final response
42
+ - [x] 4.9 Add error handling with try/except returning user-friendly error messages
43
+ - [x] 4.10 Configure API calls: model `"Qwen/Qwen2.5-7B-Instruct"`, `tool_choice="auto"`, `temperature=0.7`, `max_tokens=1000`
44
+
45
+ - [x] 5. Create requirements.txt for Hugging Face Space Deployment
46
+
47
+ - [x] 5.1 Create `requirements.txt` in project root
48
+ - [x] 5.2 Extract dependencies from `pyproject.toml` or manually specify: `gradio[mcp]>=6.0.0`, `pinecone`, `python-dotenv`, `pydantic>=2.0.0`, `pydantic-settings>=2.0.0`, `huggingface_hub`
49
+ - [x] 5.3 Ensure `[mcp]` extra is included in Gradio dependency specification
50
+ - [x] 5.4 Verify all required dependencies for both MCP server and chat interface are included
51
+
52
+ - [x] 6. Create .env.example Template File
53
+
54
+ - [x] 6.1 Create `.env.example` in project root
55
+ - [x] 6.2 Add `PINECONE_API_KEY=your_api_key_here` with comment explaining Pinecone API key requirement
56
+ - [x] 6.3 Add `PINECONE_INDEX_NAME=common-core-standards` with default value
57
+ - [x] 6.4 Add `PINECONE_NAMESPACE=standards` with default value
58
+ - [x] 6.5 Add `HF_TOKEN=your_huggingface_token_here` with comment explaining Hugging Face token requirement for chat interface
59
+ - [x] 6.6 Add comment noting that `MCP_SERVER_URL` is not needed since functions are called directly
60
+
61
+ - [x] 7. Create README.md with Code and Documentation Sections
62
+
63
+ - [x] 7.1 Create `README.md` in project root with hackathon track tag `building-mcp-track-consumer` in frontmatter or badge format
64
+ - [x] 7.2 Add project title and description explaining the MCP server purpose, capabilities, and target users (teachers, students, parents)
65
+ - [x] 7.3 Add team information section (placeholder for username if solo, or format for team members)
66
+ - [x] 7.4 Add "Usage Instructions" section with subsection A: Gradio Web Interface usage, tab descriptions, and example queries
67
+ - [x] 7.5 Add subsection B: MCP Client Integration instructions with MCP server URL format, step-by-step configuration, example JSON config, and note about screenshots
68
+ - [x] 7.6 Add "Setup Instructions" section with local development setup, environment variables, installation steps, and how to run locally
69
+ - [x] 7.7 Add "Technical Details" section with architecture overview, technologies used (Gradio 6, MCP), how MCP tools work, and API documentation references
70
+ - [x] 7.8 Add "Visual Documentation" section placeholder noting screenshots/GIFs should be added (but do not create actual media files)
71
+ - [x] 7.9 Add "Acknowledgments" section with hackathon organizers, libraries/tools used, and inspiration/references
72
+ - [x] 7.10 Add placeholder sections for "Demo Video" and "Social Media" with note that links will be added separately (exclude from code tasks)
73
+
74
+ - [x] 8. Delete Old FastMCP server.py File
75
+ - [x] 8.1 Delete `server.py` from project root (replaced by `app.py`)
76
+ - [x] 8.2 Verify no other files reference `server.py` that would break
.cursor/commands/{spec_draft.md → draft_spec.md} RENAMED
@@ -1,37 +1,38 @@
1
  I am working on developing a comprehensive spec document for the next development sprint.
2
 
3
  <goal>
4
- Solidify the current spec_draft document into a comprehensive specification for the next development sprint through iterative refinement.
5
 
6
  The spec draft represents the rough notes and ideas for the next sprint. These notes are likely incomplete and require additional details and decisions to obtain sufficient information to move forward with the sprint.
7
 
8
- READ: @.cursor/commands/finalize_spec.md to see the complete requirements for the finalized spec. The goal is to reach the level of specificity and clarity required to create this final spec.
9
  </goal>
10
 
11
  <process>
12
  <overview>
13
- Iteratively carry out the following steps to progressively refine the requirements for this sprint. Use `Requests for Input` only to gather information that cannot be inferred from the user's selection of a Recommendation; do not ask to confirm details already specified by a selected option. The initial `spec_draft` may be a loose assortment of notes, ideas, and thoughts; treat it accordingly in the first round.
 
14
 
15
  First round: produce a response that includes Recommendations and Requests for Input. The user will reply by selecting exactly one option per Recommendation (or asking for refinement if none fit) and answering only those questions that cannot be inferred from selected options.
16
 
17
- After each user response: update the `spec_draft` to incorporate the selected options with minimal, focused edits. Remove any conflicting or superseded information made obsolete by the selection. Avoid unrelated formatting or editorial changes.
 
 
18
 
19
- Repeat this back-and-forth until ambiguity is removed and the draft aligns with the requirements in `@.cursor/commands/finalize_spec.md`.
20
  </overview>
21
 
22
  <steps>
23
- - READ the spec_draft.
24
  - IDENTIFY anything in the spec draft that is confusing, conflicting, unclear, or missing. Identify important decisions that need to be made.
25
  - REVIEW the current state of the project to fully understand how these new requirements fit into what already exists.
 
26
  - RECOMMEND specific additions or updates to the draft spec to resolve confusion, add clarity, fill gaps, or add specificity. Recommendations may provide a single option when appropriate or multiple options when needed. Each Recommendation expects selection of one and only one option by the user.
  - ASK targeted questions to acquire details, decisions, or preferences from the user.
- - APPLY the user's selections: make minimal, localized edits to the `spec_draft` to incorporate the chosen options and remove conflicting content. Incorporate all information contained in the selected options; do not omit details. Do not change unrelated text, structure, or formatting.
  - REFINE: if the user rejects the provided options, revise the Recommendations based on feedback and repeat selection and apply.
  </steps>

- <end_conditions>
- - Continue this process until the draft is unambiguous and conforms to `@.cursor/commands/finalize_spec.md`, or the user directs you to do otherwise.
- - Do not stop after a single round unless the draft already satisfies all requirements in `@.cursor/commands/finalize_spec.md`.
  </end_conditions>
  </process>

@@ -42,6 +43,7 @@ READ: @.cursor/commands/finalize_spec.md to see the complete requirements for th
  Using incrementing section numbers is essential for helping the user quickly reference specific options or questions in their responses.
  Responses must strictly follow the Format section. Include only the specified sections and no additional commentary or subsections.
  The agent is responsible for updating the spec draft after each user response.
  </overview>

  <guidelines>
@@ -52,7 +54,7 @@ READ: @.cursor/commands/finalize_spec.md to see the complete requirements for th
  - Do not ask confirmation questions about facts stated by options; assume the selected option is authoritative.
  - Use numbered sections that increment.
  - Use incrementing decimals for recommendation options and request for input questions.
- - After the user selects options, apply minimal, focused edits to the `spec_draft` reflecting only those selections. Remove conflicting or superseded content. Avoid broad formatting or editorial changes to unrelated content.
  - Do not clutter options or questions with information already clear and unambiguous from the current draft.
  - Do not add subsections beyond those defined in the Format.
  </guidelines>
@@ -60,7 +62,9 @@ READ: @.cursor/commands/finalize_spec.md to see the complete requirements for th
  <format>

  # Recommendations
  ## 1: Section Title
  Short overview providing background on the section.

  **Option 1.1**
@@ -70,16 +74,20 @@ Specifics of the first option.
  Specifics of the second option.

  ## 2: Section Title
  Short overview providing background on the section.

  **Option 2.1**
  Specifics of the first option.

  # Request for Input
  ## 3: Section Title
  Short overview providing background on the section.

  **Questions**
  - 3.1 Some question.
  - 3.2 Another question.

@@ -99,14 +107,10 @@ Short overview providing background on the section.
  7 Directions that indicate the user's preference in response to the question.
  8 Clear directive in response to the question.
  ```
  </user_selection_format>

- <selection_and_editing_rules>
- - One and only one option must be selected per Recommendation. If none fit, request refinement.
- - Apply edits narrowly: change only text directly impacted by the chosen option(s).
- - Incorporate all information from the selected options into the draft.
- - Remove or rewrite conflicting statements made obsolete by the selection.
- - Preserve unrelated content and overall formatting; do not perform wide editorial passes.
  </selection_and_editing_rules>
  </response>

@@ -115,8 +119,41 @@ Short overview providing background on the section.
  </guardrails>

  <finalize_spec_compliance_checklist>
- - [ ] All information required by @.cursor/commands/finalize_spec.md is present.
  - [ ] Requirements are testable and unambiguous.
  - [ ] Risks, dependencies, and assumptions captured.
  - [ ] Approval received.
- </finalize_spec_compliance_checklist>
I am working on developing a comprehensive spec document for the next development sprint.

<goal>
+ Solidify the current spec document into a comprehensive specification for the next development sprint through iterative refinement.

The spec draft represents the rough notes and ideas for the next sprint. These notes are likely incomplete and require additional details and decisions to obtain sufficient information to move forward with the sprint.

+ READ: `<finalized_spec_requirements>` to see the complete requirements for the finalized spec. The goal is to reach the level of specificity and clarity required to create this final spec.
</goal>

<process>
<overview>
+
+ Iteratively carry out the following steps to progressively refine the requirements for this sprint. Use `Requests for Input` only to gather information that cannot be inferred from the user's selection of a Recommendation; do not ask to confirm details already specified by a selected option. The initial spec draft may be a loose assortment of notes, ideas, and thoughts; treat it accordingly in the first round.

First round: produce a response that includes Recommendations and Requests for Input. The user will reply by selecting exactly one option per Recommendation (or asking for refinement if none fit) and answering only those questions that cannot be inferred from selected options.

+ After each user response: update the spec draft to incorporate the selected options with minimal, focused edits. Remove any conflicting or superseded information made obsolete by the selection. Avoid unrelated formatting or editorial changes.
+
+ Repeat this back-and-forth until ambiguity is removed and the draft aligns with the requirements in `<finalized_spec_requirements>`.

</overview>

<steps>
+ - READ the spec draft.
  - IDENTIFY anything in the spec draft that is confusing, conflicting, unclear, or missing. Identify important decisions that need to be made.
  - REVIEW the current state of the project to fully understand how these new requirements fit into what already exists.
+ - RESEARCH any technical questions, library options, or implementation approaches that need to be resolved. Conduct this research during spec development so that specific, concrete guidance can be included in the final spec rather than leaving research tasks for the implementer.
  - RECOMMEND specific additions or updates to the draft spec to resolve confusion, add clarity, fill gaps, or add specificity. Recommendations may provide a single option when appropriate or multiple options when needed. Each Recommendation expects selection of one and only one option by the user.
  - ASK targeted questions to acquire details, decisions, or preferences from the user.
+ - APPLY the user's selections: make minimal, localized edits to the spec draft to incorporate the chosen options and remove conflicting content. Incorporate all information contained in the selected options; do not omit details. Do not change unrelated text, structure, or formatting.
  - REFINE: if the user rejects the provided options, revise the Recommendations based on feedback and repeat selection and apply.
  </steps>

+ <end_conditions>
+ - Continue this process until the draft is unambiguous and conforms to `<finalized_spec_requirements>`, or the user directs you to do otherwise.
+ - Do not stop after a single round unless the draft already satisfies all requirements in `<finalized_spec_requirements>`.
  </end_conditions>
  </process>

  Using incrementing section numbers is essential for helping the user quickly reference specific options or questions in their responses.
  Responses must strictly follow the Format section. Include only the specified sections and no additional commentary or subsections.
  The agent is responsible for updating the spec draft after each user response.
+
  </overview>

  <guidelines>
  - Do not ask confirmation questions about facts stated by options; assume the selected option is authoritative.
  - Use numbered sections that increment.
  - Use incrementing decimals for recommendation options and request for input questions.
+ - After the user selects options, apply minimal, focused edits to the spec draft reflecting only those selections. Remove conflicting or superseded content. Avoid broad formatting or editorial changes to unrelated content.
  - Do not clutter options or questions with information already clear and unambiguous from the current draft.
  - Do not add subsections beyond those defined in the Format.
  </guidelines>

  <format>

  # Recommendations
+
  ## 1: Section Title
+
  Short overview providing background on the section.

  **Option 1.1**

  Specifics of the second option.

  ## 2: Section Title
+
  Short overview providing background on the section.

  **Option 2.1**
  Specifics of the first option.

  # Request for Input
+
  ## 3: Section Title
+
  Short overview providing background on the section.

  **Questions**
+
  - 3.1 Some question.
  - 3.2 Another question.

  7 Directions that indicate the user's preference in response to the question.
  8 Clear directive in response to the question.
  ```
+
  </user_selection_format>

+ <selection_and_editing_rules>
+ - One and only one option must be selected per Recommendation. If none fit, request refinement.
+ - Apply edits narrowly: change only text directly impacted by the chosen option(s).
+ - Incorporate all information from the selected options into the draft.
+ - Remove or rewrite conflicting statements made obsolete by the selection.
+ - Preserve unrelated content and overall formatting; do not perform wide editorial passes.
  </selection_and_editing_rules>
  </response>

  </guardrails>

  <finalize_spec_compliance_checklist>
+
+ - [ ] All information required by `<finalized_spec_requirements>` is present.
  - [ ] Requirements are testable and unambiguous.
+ - [ ] All research completed and findings documented in the spec.
+ - [ ] All decisions made and documented; no decision-making left for the implementer.
+ - [ ] No research tasks or decision points left for the implementer (or explicitly documented as blockers).
  - [ ] Risks, dependencies, and assumptions captured.
  - [ ] Approval received.
+
+ </finalize_spec_compliance_checklist>
+
+ <finalized_spec_requirements>
+ The spec acts as the comprehensive source of truth for this sprint and should include all the necessary context and technical details to implement this sprint. It should leave no ambiguity for important details necessary to properly implement the changes required.
+
+ The spec.md will act as a reference for an LLM coding agent responsible for completing this sprint.
+
+ The spec must not include any directions for the implementer to conduct research or make decisions. All research must be completed during spec development, and all decisions must be made and documented in the spec. If there are pending decisions or research that cannot be completed during spec development, these must be explicitly documented as blockers or prerequisites that prevent implementation from proceeding.
+
+ The spec should include the following information if applicable:
+
+ - An overview of the changes implemented in this sprint.
+ - User stories for the new functionality, if applicable.
+ - An outline of any new data models proposed.
+ - Any other technical details determined in the spec_draft or related conversations.
+ - Specific filepaths for any files that need to be added, edited, or deleted as part of this sprint.
+ - Specific files or modules relevant to this sprint.
+ - Details on how things should function, such as a function, workflow, or other process.
+ - Descriptions of what any new functions, services, etc. are supposed to do.
+ - Any reasoning or rationale behind decisions, preferences, or changes that provides context for the sprint and its changes.
+ - Any other information required to properly understand this sprint, the desired changes, the expected deliverables, or important technical details.
+
+ Strive to retain all the final decisions and implementation details provided in the spec draft and related conversations. Cleaning and organizing these raw notes is desirable, but do not exclude or leave out information provided in the spec draft if it is relevant to this sprint. If there is information in the spec draft that is outdated and negated or revised by further direction in the draft or related conversation, you should leave that stale information out of the final spec.
+
+ The spec should have all the information a junior developer needs to complete this sprint. They should be able to independently find answers to any questions they have about this sprint and how to implement it in this document. The spec defines exactly what should be implemented and how; it does not require the implementer to make decisions or conduct research. All technical research, library selection, design decisions, and implementation approaches must be resolved and documented in the spec before implementation begins.
+
+ **Code Examples in Specs:**
+ Use code examples sparingly and only when they provide clarity that text cannot achieve. Keep examples small and focused on specific scenarios, usage patterns, or situations that are difficult to express concisely in prose. Prefer code examples when they are more explicit or concise than equivalent text descriptions. Avoid code examples for obvious implementations or concepts that can be clearly explained in bullet points or brief text. If explicitly directed, longer code examples are appropriate. The guiding principle is to maintain a balance of conciseness, precision, and comprehensiveness; choose the format (code or text) that best achieves this balance.
+ </finalized_spec_requirements>
.cursor/commands/finalize_spec.md DELETED
@@ -1,21 +0,0 @@
- Convert the spec_draft document into a final draft in the spec.md file.
-
- The spec acts as the comprehensive source of truth for this sprint and should include all the necessary context and technical details to implement this sprint. It should leave no ambiguity for important details necessary to properly implement the changes required.
-
- The spec.md will act as a reference for an LLM coding agent responsible for completing this sprint.
-
- The spec should include the following information if applicable:
- - An overview of the changes implemented in this sprint.
- - User stories for the new functionality, if applicable.
- - An outline of any new data models proposed.
- - An other technical details determined in the spec_draft or related conversations.
- - Specific filepaths for files for any files that need to be added, edited, or deleted as part of this sprint.
- - Specific files or modules relevant to this sprint.
- - Details on how things should function such as a function, workflow, or other process.
- - Describe what any new functions, services, ect. are supposed to do.
- - Any reasoning or rationale behind decisions, preferences, or changes that provides context for the sprint and its changes.
- - Any other information required to properly understand this sprint, the desired changes, the expected deliverables, or important technical details.
-
- Strive to retain all the final decisions and implementation details provided in the spec_draft and related conversations. Cleaning and organizing these raw notes is desirable, but do not exclude or leave out information provided in the spec_draft if it is relevant to this sprint. If there is information in the spec_draft that is outdated and negated or revised by further direction in the draft or related conversation, you should leave that stale information out of the final spec.
-
- The spec should have all the information a junior developer needs to complete this sprint. They should be able to independently find answers to any questions they have about this sprint and how to implement it in this document.
.cursor/rules/standards/code_style/readme.mdc ADDED
@@ -0,0 +1,99 @@
+ ---
+ description: Guidelines for writing README documents.
+ alwaysApply: false
+ ---
+
+ # README Generation Rules
+
+ You are an expert technical writer and software engineer. When asked to write, update, or critique a README.md, adhere strictly to the following principles, style guide, and structure.
+
+ ## 1. Core Principles
+
+ - **The 15-Minute Rule:** The primary goal is to minimize "Time-to-Hello-World." A user must be able to install and run the project (or a specific example) within 15 minutes.
+ - **Truth Over Fluff:** NEVER hallucinate features. If a feature is planned, put it in a "Roadmap" section. If code does not exist to support a claim, do not write it.
+ - **Inverted Pyramid:** Place the most critical information (What is it? How do I run it?) at the top.
+
+ ## 2. Tone & Style Guidelines
+
+ - **No Marketing Fluff:** DELETE adjectives like "seamless," "easy," "robust," "blazing fast," "state-of-the-art," and "comprehensive." Let the code prove its value.
+ - **Active Voice:** Use the imperative mood.
+   - _Bad:_ "The script can be run by the user..."
+   - _Good:_ "Run the script..."
+ - **Present Tense:** Avoid "will."
+   - _Bad:_ "Clicking save will write the file."
+   - _Good:_ "Click save. The system writes the file."
+ - **Second Person:** Address the user as "you" (implied).
+ - **Concrete & Specific:**
+   - _Bad:_ "Requires a database."
+   - _Good:_ "Requires PostgreSQL v14+ running on port 5432."
+
+ ## 3. Formatting Standards (Markdown)
+
+ - **Semantic Headers:** DO NOT skip header levels (e.g., jumping from H1 to H3).
+ - **Code Blocks:**
+   - ALWAYS use fenced code blocks (three backticks) with a language identifier (e.g., `bash`, `python`).
+   - NEVER use indentation (4 spaces) for code blocks.
+ - **Inline Code:** Use single backticks for: filenames, directories, variable names, methods, and boolean values (`true`/`false`). Do not bold these.
+ - **Lists:**
+   - Use hyphens `-` for bullet points.
+   - Use `1.` for ALL numbered list items (Markdown renders the order automatically).
+ - **Alerts:** Use GitHub-standard alert syntax:
+   ```markdown
+   > [!NOTE]
+   > text
+   ```
+ - **Links:** Use descriptive link text. Never use "click here."
+
+ ## 4. Structural Template & Logic
+
+ Determine if the project is a **Library** (code used by other code) or an **Application** (standalone tool).
+
+ ### Section 1: The Hook (Required)
+
+ - **H1:** Project Name.
+ - **Description:** ONE paragraph.
+   1. What is it? (Concrete noun).
+   2. What problem does it solve?
+   3. Why is it distinct? (Metrics, not adjectives).
+ - **Bad Description:** "A holistic solution for data."
+ - **Good Description:** "A Python library that converts CSV to JSON without loading the file into memory."
+
+ ### Section 2: Visuals (Required)
+
+ - Include a placeholder for a screenshot, GIF, or terminal output.
+ - `![Description of visual content](relative/path/to/image.png)`
+
+ ### Section 3: Installation & Usage (Context Dependent)
+
+ **IF LIBRARY (e.g., npm, pip):**
+
+ 1. **Install:** `pip install package-name`
+ 2. **Quick Start:** Provide a "Copy-Paste" block.
+    - MUST be a self-contained code snippet that actually runs.
+    - MUST use real API method names found in the context.
+    - DO NOT use generic placeholders like `foo` or `bar` unless necessary.
+
+ **IF APPLICATION (e.g., Web App, CLI):**
+
+ 1. **Prerequisites:** Strict list (e.g., "Node v18+", "Docker").
+ 2. **Setup:**
+    ```bash
+    git clone [repo]
+    npm install
+    cp .env.example .env
+    npm run start
+    ```
+
+ ### Section 4: Deep Dive (Optional but Recommended)
+
+ - **Configuration:** Env variables, flags.
+ - **Architecture:** High-level diagram explanation (if complex).
+ - **Roadmap:** Planned features (clearly marked as "Future").
+
+ ## 5. "AI-Proofing" Verification Checklist
+
+ Before finalizing the output, perform these checks:
+
+ 1. **Hallucination Check:** Do the installation commands actually exist in the codebase (e.g., is there a `requirements.txt` or `package.json` matching the install instructions)?
+ 2. **API Check:** Do the methods used in the "Quick Start" match the actual function definitions in the provided files?
+ 3. **Adverb Purge:** Remove all adverbs ending in "ly" (e.g., "automatically," "intuitively") unless essential for technical accuracy.
.env.example CHANGED
@@ -1,7 +1,13 @@
- # Common Standards Project API Configuration
- CSP_API_KEY=your_generated_api_key_here
-
  # Pinecone Configuration
- PINECONE_API_KEY=your_pinecone_api_key_here
  PINECONE_INDEX_NAME=common-core-standards
  PINECONE_NAMESPACE=standards

  # Pinecone Configuration
+ # Get your API key from https://app.pinecone.io/
+ PINECONE_API_KEY=your_api_key_here
  PINECONE_INDEX_NAME=common-core-standards
  PINECONE_NAMESPACE=standards
+
+ # Hugging Face Configuration
+ # Get your token from https://huggingface.co/settings/tokens
+ # Required for chat interface Inference API access
+ HF_TOKEN=your_huggingface_token_here
+
+ # Note: MCP_SERVER_URL is not needed since we call functions directly
+ # The MCP server is automatically exposed by Gradio when mcp_server=True
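The variables above are consumed at app startup, after `load_dotenv()` has copied the `.env` entries into the process environment. A minimal sketch of that read pattern using only the standard library (the `load_config` helper is illustrative, not a function in this repo; the fallback defaults mirror the values in `.env.example`):

```python
import os

def load_config() -> dict:
    """Read the settings declared in .env.example from the environment.

    Keys without a safe default (API keys, tokens) come back as None
    when unset, so the app can fail fast with a clear error message.
    """
    return {
        "pinecone_api_key": os.environ.get("PINECONE_API_KEY"),
        "pinecone_index": os.environ.get("PINECONE_INDEX_NAME", "common-core-standards"),
        "pinecone_namespace": os.environ.get("PINECONE_NAMESPACE", "standards"),
        "hf_token": os.environ.get("HF_TOKEN"),
    }

if __name__ == "__main__":
    # Simulate a populated environment, as load_dotenv() would produce.
    os.environ.setdefault("PINECONE_INDEX_NAME", "common-core-standards")
    print(load_config()["pinecone_index"])
```

No `MCP_SERVER_URL` variable is needed because, per the note above, Gradio itself exposes the MCP endpoint when the app is launched with `mcp_server=True`.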
app.py ADDED
@@ -0,0 +1,419 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Gradio MCP server for Common Core Standards search and lookup."""
2
+
3
+ import os
4
+ import json
5
+ from typing import Any
6
+
7
+ from dotenv import load_dotenv
8
+ import gradio as gr
9
+ from huggingface_hub import InferenceClient
10
+
11
+ # Load environment variables from .env file
12
+ load_dotenv()
13
+
14
+ from src.search import find_relevant_standards_impl
15
+ from src.lookup import get_standard_details_impl
16
+
17
+ # Initialize the Hugging Face Inference Client
18
+ # Use HF_TOKEN from environment (automatically available in Hugging Face Spaces)
19
+ # Provider is required for models that need Inference Providers (e.g., Together AI, Nebius)
20
+ HF_TOKEN = os.environ.get("HF_TOKEN")
21
+ client = InferenceClient(
22
+ provider="together", # Required: specifies the inference provider for tool calling
23
+ token=HF_TOKEN
24
+ )
25
+
26
+ # Define the function schemas in OpenAI format for the model
27
+ TOOLS = [
28
+ {
29
+ "type": "function",
30
+ "function": {
31
+ "name": "find_relevant_standards",
32
+ "description": "Searches for educational standards relevant to a learning activity using semantic search. Use this when the user asks about standards for a specific activity, lesson, or educational objective.",
33
+ "parameters": {
34
+ "type": "object",
35
+ "properties": {
36
+ "activity": {
37
+ "type": "string",
38
+ "description": "A natural language description of the learning activity, lesson, or educational objective. Be specific and descriptive."
39
+ },
40
+ "max_results": {
41
+ "type": "integer",
42
+ "description": "Maximum number of standards to return (1-20). Default is 5.",
43
+ "default": 5,
44
+ "minimum": 1,
45
+ "maximum": 20
46
+ },
47
+ "grade": {
48
+ "type": "string",
49
+ "description": "Optional grade level filter. Valid values: K, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, or 09-12 for high school range.",
50
+ "enum": ["K", "01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12", "09-12"]
51
+ }
52
+ },
53
+ "required": ["activity"]
54
+ }
55
+ }
56
+ },
57
+ {
58
+ "type": "function",
59
+ "function": {
60
+ "name": "get_standard_details",
61
+ "description": "Retrieves complete metadata and content for a specific educational standard by its GUID (_id field). Use this when you have the exact GUID from a previous search result. This function ONLY accepts GUIDs, not statement notations or other identifiers. For searching by content or notation, use find_relevant_standards instead.",
62
+ "parameters": {
63
+ "type": "object",
64
+ "properties": {
65
+ "standard_id": {
66
+ "type": "string",
67
+ "description": "The standard's GUID (_id field) - must be a valid GUID format (e.g., 'EA60C8D165F6481B90BFF782CE193F93'). This function does NOT accept statement notations or other identifier formats."
68
+ }
69
+ },
70
+ "required": ["standard_id"]
71
+ }
72
+ }
73
+ }
74
+ ]
75
+
76
+ def find_relevant_standards(
77
+ activity: str,
78
+ max_results: int = 5,
79
+ grade: str | None = None,
80
+ ) -> str:
81
+ """
82
+ Searches for educational standards relevant to a learning activity using semantic search.
83
+
84
+ This function performs a vector similarity search over the Common Core Standards database
85
+ to find standards that match the described learning activity. Results are ranked by relevance
86
+ and can be filtered by grade level.
87
+
88
+ Args:
89
+ activity: A natural language description of the learning activity, lesson, or educational
90
+ objective. Examples: "teaching fractions to third graders", "reading comprehension
91
+ activities", "solving quadratic equations". This is the primary search query and should
92
+ be descriptive and specific for best results.
93
+
94
+ max_results: The maximum number of standards to return. Must be between 1 and 20.
95
+ Default is 5. Higher values return more results but may include less relevant matches.
96
+
97
+ grade: Optional grade level filter. Must be one of the following valid grade level codes:
98
+ - "K" for Kindergarten
99
+ - "01" for Grade 1
100
+ - "02" for Grade 2
101
+ - "03" for Grade 3
102
+ - "04" for Grade 4
103
+ - "05" for Grade 5
104
+ - "06" for Grade 6
105
+ - "07" for Grade 7
106
+ - "08" for Grade 8
107
+ - "09" for Grade 9
108
+ - "10" for Grade 10
109
+ - "11" for Grade 11
110
+ - "12" for Grade 12
111
+ - "09-12" for high school range (when standards span multiple high school grades)
112
+
113
+ If None or empty string, no grade filtering is applied and standards from all grade
114
+ levels may be returned. The grade filter uses exact matching against the education_levels
115
+ metadata field in the database.
116
+
117
+ Returns:
118
+ A JSON string containing a structured response with the following format:
119
+ {
120
+ "success": true|false,
121
+ "results": [
122
+ {
123
+ "_id": "standard_guid",
124
+ "content": "full standard text with hierarchy",
125
+ "subject": "Mathematics",
126
+ "education_levels": ["03"],
127
+ "statement_notation": "3.NF.A.1",
128
+ "standard_set_title": "Grade 3",
129
+ "score": 0.85
130
+ },
131
+ ...
132
+ ],
133
+ "message": "Found N matching standards" or error message,
134
+ "error_type": null or error type if success is false
135
+ }
136
+
137
+ On success, the results array contains up to max_results standards, sorted by relevance
138
+ score (highest first). Each result includes the full standard content, metadata, and
139
+ relevance score. On error, success is false and an error message describes the issue.
140
+ """
141
+ # Handle empty string from dropdown (convert to None)
142
+ if grade == "":
143
+ grade = None
144
+
145
+ # Ensure max_results is an integer (gr.Number returns float by default)
146
+ max_results = int(max_results)
147
+
148
+ return find_relevant_standards_impl(activity, max_results, grade)
149
+
150
+
151
+ def get_standard_details(standard_id: str) -> str:
152
+ """
153
+ Retrieves complete metadata and content for a specific educational standard by its GUID.
154
+
155
+ This function performs a direct lookup using the standard's GUID (_id field) only.
156
+ It does NOT accept statement notations, ASN identifiers, or any other identifier formats.
157
+ Use find_relevant_standards to search for standards by content or metadata.
158
+
159
+ Args:
160
+ standard_id: The standard's GUID (_id field) - must be a valid GUID format
161
+ (e.g., "EA60C8D165F6481B90BFF782CE193F93"). This is the GUID returned in
162
+ search results from find_relevant_standards.
163
+
164
+ Returns:
165
+ A JSON string containing a structured response with the following format:
166
+ {
167
+ "success": true|false,
168
+ "results": [
169
+ {
170
+ "_id": "standard_guid",
171
+ "content": "full standard text with hierarchy",
172
+ "subject": "Mathematics",
173
+ "education_levels": ["03"],
174
+ "statement_notation": "3.NF.A.1",
175
+ "standard_set_title": "Grade 3",
176
+ "asn_identifier": "S21238682",
177
+ "depth": 3,
178
+ "is_leaf": true,
179
+ "parent_id": "parent_guid",
180
+ "ancestor_ids": [...],
181
+ "child_ids": [...],
182
+ ... (all available metadata fields)
183
+ }
184
+ ],
185
+ "message": "Retrieved standard details" or error message,
186
+ "error_type": null or error type if success is false
187
+ }
188
+
189
+ On success, the results array contains exactly one standard object with all available
190
+ metadata fields including hierarchy relationships, content, and identifiers. On error
191
+ (e.g., standard not found), success is false and the message provides guidance, such as
192
+ suggesting to use find_relevant_standards for searching.
193
+
194
+ Raises:
195
+ This function does not raise exceptions. All errors are returned as JSON responses
196
+ with success=false and appropriate error messages.
197
+ """
198
+ return get_standard_details_impl(standard_id)
199
+
200
+
201
+ def chat_with_standards(message: str, history: list):
202
+ """
203
+ Chat function that uses MCP tools via Hugging Face Inference API with tool calling.
204
+
205
+ This function integrates with Qwen2.5-7B-Instruct to answer questions about educational
206
+ standards. The model can call find_relevant_standards and get_standard_details tools
207
+ to retrieve information and provide accurate responses.
208
+
209
+ Args:
210
+ message: The user's current message/query
211
+ history: Chat history in Gradio 6 messages format. Each message is a dict with
212
+ "role" and "content" keys. In Gradio 6, content uses structured format:
213
+ [{"type": "text", "text": "..."}, ...] for text content.
214
+
215
+ Returns:
216
+ Structured content as a list of content blocks. When tool calls are made, includes:
217
+ - Expandable JSON blocks showing tool call results
218
+        - The final assistant response as text
+    When no tool calls are made, returns a simple text response.
+    """
+    # Convert Gradio 6 history format to OpenAI messages format
+    # Gradio 6 uses structured content: {"role": "user", "content": [{"type": "text", "text": "..."}]}
+    messages = []
+    if history:
+        for msg in history:
+            if isinstance(msg, dict):
+                role = msg.get("role", "user")
+                content = msg.get("content", "")
+
+                # Handle Gradio 6 structured content format
+                if isinstance(content, list):
+                    # Extract text from content blocks
+                    text_parts = []
+                    for block in content:
+                        if isinstance(block, dict) and block.get("type") == "text":
+                            text_parts.append(block.get("text", ""))
+                    content = " ".join(text_parts)
+
+                messages.append({
+                    "role": role,
+                    "content": content
+                })
+
+    # Add system message to guide the model
+    system_message = {
+        "role": "system",
+        "content": "You are a helpful assistant for parents and teachers. Your role is to help them plan educational activities and find educational requirements for activities they might have already done. You have access to tools that can search for standards and retrieve standard details. Use these tools when users ask about standards, learning activities, or educational requirements. Always provide clear, helpful responses based on the tool results."
+    }
+
+    # Add current user message
+    messages.append({"role": "user", "content": message})
+
+    # Prepare full message list with system message
+    full_messages = [system_message] + messages
+
+    try:
+        # Initial API call with tools
+        response = client.chat.completions.create(
+            model="Qwen/Qwen2.5-7B-Instruct",
+            messages=full_messages,
+            tools=TOOLS,
+            tool_choice="auto",  # Let the model decide when to call functions
+            temperature=0.7,
+            max_tokens=1000,
+        )
+
+        response_message = response.choices[0].message
+
+        # Check if model wants to call functions
+        if response_message.tool_calls:
+            # Add assistant's tool call request to messages
+            full_messages.append(response_message)
+
+            # Store tool call results for display
+            tool_results = []
+
+            # Process each tool call
+            for tool_call in response_message.tool_calls:
+                function_name = tool_call.function.name
+                function_args = json.loads(tool_call.function.arguments)
+
+                # Execute the function
+                if function_name == "find_relevant_standards":
+                    print(f"Finding relevant standards for activity: {function_args.get('activity', '')}")
+                    result = find_relevant_standards_impl(
+                        activity=function_args.get("activity", ""),
+                        max_results=function_args.get("max_results", 5),
+                        grade=function_args.get("grade"),
+                    )
+                elif function_name == "get_standard_details":
+                    print(f"Getting standard details for standard ID: {function_args.get('standard_id', '')}")
+                    result = get_standard_details_impl(
+                        standard_id=function_args.get("standard_id", "")
+                    )
+                else:
+                    result = json.dumps({"error": f"Function {function_name} not available"})
+
+                # Parse result JSON for display
+                try:
+                    result_data = json.loads(result) if isinstance(result, str) else result
+                except json.JSONDecodeError:
+                    result_data = {"raw_result": result}
+
+                # Store tool call info for display
+                tool_results.append({
+                    "function": function_name,
+                    "arguments": function_args,
+                    "result": result_data
+                })
+
+                # Add function result to messages
+                full_messages.append({
+                    "role": "tool",
+                    "tool_call_id": tool_call.id,
+                    "name": function_name,
+                    "content": result,
+                })
+
+            # Get final response with function results
+            final_response = client.chat.completions.create(
+                model="Qwen/Qwen2.5-7B-Instruct",
+                messages=full_messages,
+                temperature=0.7,
+                max_tokens=1000,
+            )
+
+            # Build structured response with tool call results and final answer
+            response_blocks = []
+
+            # Add tool call results as expandable JSON blocks using markdown
+            for i, tool_result in enumerate(tool_results):
+                # Format arguments and result as pretty JSON
+                args_json = json.dumps(tool_result["arguments"], indent=2)
+                result_json = json.dumps(tool_result["result"], indent=2)
+
+                # Create collapsible markdown section
+                tool_markdown = f"""<details>
+<summary><strong>🔧 Tool Call: {tool_result["function"]}</strong></summary>
+
+**Arguments:**
+```json
+{args_json}
+```
+
+**Result:**
+```json
+{result_json}
+```
+</details>
+"""
+                response_blocks.append({
+                    "type": "text",
+                    "text": tool_markdown
+                })
+
+            # Add separator before final response
+            response_blocks.append({
+                "type": "text",
+                "text": "---\n"
+            })
+
+            # Add final assistant response as text
+            response_blocks.append({
+                "type": "text",
+                "text": final_response.choices[0].message.content
+            })
+
+            return response_blocks
+        else:
+            # No tool calls, return direct response as text
+            return response_message.content
+
+    except Exception as e:
+        # Error handling
+        return f"I apologize, but I encountered an error: {str(e)}. Please try again or rephrase your question."
+
+
+# Create Gradio interface
+demo = gr.TabbedInterface(
+    [
+        gr.ChatInterface(
+            fn=chat_with_standards,  # See complete implementation above
+            title="Chat with Standards",
+            description="Ask questions about educational standards. The AI will use MCP tools to find relevant information.",
+            examples=["What standards apply to teaching fractions in 3rd grade?", "Find standards for reading comprehension"],
+            api_visibility="private",  # Hide from MCP server - only expose search and lookup tools
+        ),
+        gr.Interface(
+            fn=find_relevant_standards,
+            inputs=[
+                gr.Textbox(label="Activity Description", placeholder="Describe a learning activity..."),
+                gr.Number(label="Max Results", value=5, minimum=1, maximum=20),
+                gr.Dropdown(
+                    label="Grade (optional)",
+                    choices=["", "K", "01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12", "09-12"],
+                    value=None,
+                    info="Select a grade level to filter results"
+                ),
+            ],
+            outputs=gr.JSON(label="Results"),
+            title="Find Relevant Standards",
+            description="Search for educational standards relevant to a learning activity.",
+            api_name="find_relevant_standards",
+        ),
+        gr.Interface(
+            fn=get_standard_details,
+            inputs=gr.Textbox(label="Standard ID", placeholder="Enter a standard GUID or identifier..."),
+            outputs=gr.JSON(label="Standard Details"),
+            title="Get Standard Details",
+            description="Retrieve full metadata for a specific standard by its ID.",
+            api_name="get_standard_details",
+        ),
+    ],
+    ["Chat", "Search", "Lookup"],
+)
+
+if __name__ == "__main__":
+    demo.launch(mcp_server=True)
+
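The Gradio-6-to-OpenAI history conversion at the top of `chat_with_standards` is the part most likely to break when the history format changes, so it is worth exercising in isolation. Below is a minimal, self-contained sketch of that same loop; the `flatten_history` helper name is ours, introduced only for illustration and not part of this commit:

```python
def flatten_history(history: list[dict]) -> list[dict]:
    """Convert Gradio 6 structured chat history to flat OpenAI-style messages."""
    messages = []
    for msg in history or []:
        if not isinstance(msg, dict):
            continue
        content = msg.get("content", "")
        if isinstance(content, list):
            # Keep only the text blocks, joined with spaces
            content = " ".join(
                block.get("text", "")
                for block in content
                if isinstance(block, dict) and block.get("type") == "text"
            )
        messages.append({"role": msg.get("role", "user"), "content": content})
    return messages


history = [
    {"role": "user", "content": [{"type": "text", "text": "Hi"}]},
    {"role": "assistant", "content": "Hello!"},
]
print(flatten_history(history))
```

Note that non-text content blocks (images, files) are silently dropped, which matches the behavior of the loop in `app.py`.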
pyproject.toml CHANGED
@@ -3,8 +3,7 @@ name = "common-core-mcp"
 version = "0.1.0"
 requires-python = ">=3.12"
 dependencies = [
-    "mcp",
-    "gradio>=5.0.0,<6.0.0",
+    "gradio[mcp]>=6.0.0",
     "pinecone",
     "python-dotenv",
     "typer",
@@ -13,6 +12,7 @@ dependencies = [
     "loguru",
     "pydantic>=2.0.0",
     "pydantic-settings>=2.0.0",
+    "huggingface_hub",
 ]
 
 [project.optional-dependencies]
requirements.txt ADDED
@@ -0,0 +1,11 @@
+gradio[mcp]>=6.0.0
+pinecone
+python-dotenv
+typer
+requests
+rich
+loguru
+pydantic>=2.0.0
+pydantic-settings>=2.0.0
+huggingface_hub
+
src/lookup.py ADDED
@@ -0,0 +1,81 @@
+"""Direct ID lookup implementation for educational standards."""
+
+from __future__ import annotations
+
+import json
+
+from pinecone.exceptions import PineconeException
+
+from src.pinecone_client import PineconeClient
+
+
+def get_standard_details_impl(standard_id: str) -> str:
+    """
+    Implementation of direct standard lookup by GUID only.
+
+    This function only accepts GUIDs (_id field) from Pinecone. It does NOT accept
+    statement_notation or other identifier formats. Use find_relevant_standards to
+    search for standards by content or metadata.
+
+    Args:
+        standard_id: The standard's GUID (_id field) - must be a valid GUID format
+            (e.g., "EA60C8D165F6481B90BFF782CE193F93")
+
+    Returns:
+        JSON string with structured response containing standard details
+    """
+    # Input validation
+    if not standard_id or not standard_id.strip():
+        return json.dumps(
+            {
+                "success": False,
+                "results": [],
+                "message": "Standard ID cannot be empty",
+                "error_type": "invalid_input",
+            }
+        )
+
+    try:
+        # Initialize client and fetch standard
+        client = PineconeClient()
+        result = client.fetch_standard(standard_id.strip())
+
+        # Handle not found
+        if result is None:
+            return json.dumps(
+                {
+                    "success": False,
+                    "results": [],
+                    "message": f"Standard with GUID '{standard_id}' not found. This function only accepts GUIDs (e.g., 'EA60C8D165F6481B90BFF782CE193F93'). For statement notations or other identifiers, use find_relevant_standards with a keyword search instead.",
+                    "error_type": "not_found",
+                }
+            )
+
+        # Format successful result
+        response = {
+            "success": True,
+            "results": [result],
+            "message": "Retrieved standard details",
+        }
+
+        return json.dumps(response, indent=2)
+
+    except PineconeException as e:
+        return json.dumps(
+            {
+                "success": False,
+                "results": [],
+                "message": f"Pinecone API error: {str(e)}",
+                "error_type": "api_error",
+            }
+        )
+    except Exception as e:
+        return json.dumps(
+            {
+                "success": False,
+                "results": [],
+                "message": f"Unexpected error: {str(e)}",
+                "error_type": "api_error",
+            }
+        )
+
src/mcp_config.py ADDED
@@ -0,0 +1,33 @@
+"""MCP server configuration module."""
+
+from __future__ import annotations
+
+from pydantic_settings import BaseSettings, SettingsConfigDict
+
+
+class McpSettings(BaseSettings):
+    """Configuration settings for the MCP server."""
+
+    model_config = SettingsConfigDict(
+        env_file=".env",
+        env_file_encoding="utf-8",
+        case_sensitive=False,
+        extra="ignore",
+    )
+
+    # Pinecone Configuration
+    pinecone_api_key: str = ""
+    pinecone_index_name: str = "common-core-standards"
+    pinecone_namespace: str = "standards"
+
+
+_settings: McpSettings | None = None
+
+
+def get_mcp_settings() -> McpSettings:
+    """Get the singleton MCP settings instance."""
+    global _settings
+    if _settings is None:
+        _settings = McpSettings()
+    return _settings
+
{tools → src}/pinecone_client.py RENAMED
@@ -12,10 +12,10 @@ from loguru import logger
 from pinecone import Pinecone
 from pinecone.exceptions import PineconeException
 
-from tools.config import get_settings
+from src.mcp_config import get_mcp_settings
 from tools.pinecone_models import PineconeRecord
 
-settings = get_settings()
+settings = get_mcp_settings()
 
 
 class PineconeClient:
@@ -205,6 +205,8 @@ class PineconeClient:
             "normalized_subject",
             "publication_status",
             "parent_id",  # Must be omitted when None (Pinecone doesn't accept null)
+            "document_id",
+            "document_valid",
         }
         for field in optional_fields:
             if record_dict.get(field) is None:
@@ -212,6 +214,103 @@ class PineconeClient:
 
         return record_dict
 
+    def search_standards(
+        self,
+        query_text: str,
+        top_k: int = 5,
+        grade: str | None = None,
+    ) -> list[dict]:
+        """
+        Perform semantic search over standards.
+
+        Args:
+            query_text: Natural language query
+            top_k: Maximum number of results
+            grade: Optional grade filter
+
+        Returns:
+            List of result dictionaries with metadata and scores
+        """
+        # Build filter dictionary dynamically
+        # Always filter to only leaf nodes (actual standards, not parent categories)
+        filter_parts = [{"is_leaf": {"$eq": True}}]
+
+        if grade:
+            filter_parts.append({"education_levels": {"$in": [grade]}})
+
+        filter_dict = None
+        if len(filter_parts) == 1:
+            filter_dict = filter_parts[0]
+        elif len(filter_parts) == 2:
+            filter_dict = {"$and": filter_parts}
+
+        # Build query dictionary
+        query_dict: dict[str, Any] = {
+            "inputs": {"text": query_text},
+            "top_k": top_k * 2,  # Get more candidates for reranking
+        }
+        if filter_dict:
+            query_dict["filter"] = filter_dict
+
+        # Call search with reranking
+        results = self.index.search(
+            namespace=self.namespace,
+            query=query_dict,
+            rerank={"model": "bge-reranker-v2-m3", "top_n": top_k, "rank_fields": ["content"]},
+        )
+
+        # Parse results
+        hits = results.get("result", {}).get("hits", [])
+        parsed_results = []
+        for hit in hits:
+            result_dict = {
+                "_id": hit["_id"],
+                "score": hit["_score"],
+                **hit.get("fields", {}),
+            }
+            parsed_results.append(result_dict)
+
+        return parsed_results
+
+    def fetch_standard(self, standard_id: str) -> dict | None:
+        """
+        Fetch a standard by its GUID (_id field only).
+
+        This method performs a direct lookup using Pinecone's fetch() API, which only
+        works with the standard's GUID (_id field). It does NOT search by statement_notation,
+        asn_identifier, or any other metadata fields.
+
+        Args:
+            standard_id: Standard GUID (_id field) - must be the exact GUID format
+                (e.g., "EA60C8D165F6481B90BFF782CE193F93")
+
+        Returns:
+            Standard dictionary with metadata, or None if not found
+        """
+        result = self.index.fetch(ids=[standard_id], namespace=self.namespace)
+
+        # Extract vectors from FetchResponse
+        # FetchResponse.vectors is a dict mapping ID to Vector objects
+        vectors = result.vectors
+
+        if not vectors or standard_id not in vectors:
+            return None
+
+        vector = vectors[standard_id]
+
+        # Extract metadata from Vector object
+        # Vector has: id, values (embedding), and metadata (dict with all fields)
+        metadata = vector.metadata or {}
+        vector_id = vector.id
+
+        # Combine _id with all metadata fields
+        record_dict = {
+            "_id": vector_id,
+            **metadata,
+        }
+
+        return record_dict
+
     @staticmethod
     def is_uploaded(set_dir: Path) -> bool:
         """
src/search.py ADDED
@@ -0,0 +1,86 @@
+"""Semantic search implementation for educational standards."""
+
+from __future__ import annotations
+
+import json
+
+from pinecone.exceptions import PineconeException
+
+from src.pinecone_client import PineconeClient
+
+
+def find_relevant_standards_impl(
+    activity: str,
+    max_results: int = 5,
+    grade: str | None = None,
+) -> str:
+    """
+    Implementation of semantic search over educational standards.
+
+    Args:
+        activity: Description of the learning activity
+        max_results: Maximum number of standards to return (default: 5)
+        grade: Optional grade level filter (e.g., "K", "01", "05", "09")
+
+    Returns:
+        JSON string with structured response containing matching standards
+    """
+    # Input validation
+    if not activity or not activity.strip():
+        return json.dumps(
+            {
+                "success": False,
+                "results": [],
+                "message": "Activity description cannot be empty",
+                "error_type": "invalid_input",
+            }
+        )
+
+    try:
+        # Initialize client and perform search
+        client = PineconeClient()
+        results = client.search_standards(
+            query_text=activity.strip(),
+            top_k=max_results,
+            grade=grade,
+        )
+
+        # Handle empty results
+        if not results:
+            return json.dumps(
+                {
+                    "success": False,
+                    "results": [],
+                    "message": "No matching standards found",
+                    "error_type": "no_results",
+                }
+            )
+
+        # Format successful results
+        response = {
+            "success": True,
+            "results": results,
+            "message": f"Found {len(results)} matching standards",
+        }
+
+        return json.dumps(response, indent=2)
+
+    except PineconeException as e:
+        return json.dumps(
+            {
+                "success": False,
+                "results": [],
+                "message": f"Pinecone API error: {str(e)}",
+                "error_type": "api_error",
+            }
+        )
+    except Exception as e:
+        return json.dumps(
+            {
+                "success": False,
+                "results": [],
+                "message": f"Unexpected error: {str(e)}",
+                "error_type": "api_error",
+            }
+        )
+
tools/cli.py CHANGED
@@ -470,7 +470,7 @@ def pinecone_init():
     Uses integrated embeddings with llama-text-embed-v2 model.
     """
     try:
-        from tools.pinecone_client import PineconeClient
+        from src.pinecone_client import PineconeClient
 
         console.print("[bold]Initializing Pinecone...[/bold]")
 
@@ -555,7 +555,7 @@ def pinecone_upload(
     If neither is provided, you'll be prompted to confirm uploading all sets.
     """
     try:
-        from tools.pinecone_client import PineconeClient
+        from src.pinecone_client import PineconeClient
         from tools.pinecone_models import ProcessedStandardSet
         import json
 
tools/pinecone_models.py CHANGED
@@ -31,8 +31,8 @@ class PineconeRecord(BaseModel):
     subject: str
     normalized_subject: str | None = None
     education_levels: list[str]
-    document_id: str
-    document_valid: str
+    document_id: str | None = None
+    document_valid: str | None = None
     publication_status: str | None = None
     jurisdiction_id: str
     jurisdiction_title: str