Commit 11df203 (parent: 69d9c55): Set up base gradio server
Files changed:

- .kiro/specs/hf-eda-mcp-server/design.md +305 -0
- .kiro/specs/hf-eda-mcp-server/requirements.md +60 -0
- .kiro/specs/hf-eda-mcp-server/tasks.md +102 -0
- .kiro/steering/product.md +16 -0
- .kiro/steering/structure.md +49 -0
- .kiro/steering/tech.md +36 -0
- .vscode/settings.json +3 -0
- pdm.lock +0 -0
- pyproject.toml +41 -0
- src/hf_eda_mcp/__init__.py +11 -0
- src/hf_eda_mcp/__main__.py +46 -0
- src/hf_eda_mcp/integrations/__init__.py +8 -0
- src/hf_eda_mcp/integrations/hf_client.py +7 -0
- src/hf_eda_mcp/server.py +30 -0
- src/hf_eda_mcp/services/__init__.py +8 -0
- src/hf_eda_mcp/services/dataset_service.py +7 -0
- src/hf_eda_mcp/tools/__init__.py +7 -0
- src/hf_eda_mcp/tools/analysis.py +7 -0
- src/hf_eda_mcp/tools/metadata.py +7 -0
- src/hf_eda_mcp/tools/sampling.py +7 -0
- tests/__init__.py +0 -0

.kiro/specs/hf-eda-mcp-server/design.md (ADDED)
# Design Document

## Overview

The hf-eda-mcp system is designed as a Gradio-based MCP server that provides exploratory data analysis tools for HuggingFace datasets. The system leverages Gradio's built-in MCP server capabilities to automatically expose EDA functions as MCP tools, enabling seamless integration with AI assistants and other MCP-compatible systems.

The architecture follows a modular approach where core EDA functionality is implemented as separate Python functions, which are then wrapped in Gradio interfaces and automatically converted to MCP tools through Gradio's native MCP integration.

## Architecture

### High-Level Architecture

```mermaid
graph TB
    subgraph "MCP Client (AI Assistant)"
        A[AI Assistant]
    end

    subgraph "hf-eda-mcp Server"
        B[Gradio App with MCP Server]
        C[EDA Tools Module]
        D[Dataset Service]
        E[HuggingFace Integration]
    end

    subgraph "External Services"
        F[HuggingFace Hub]
    end

    A -->|MCP Protocol| B
    B --> C
    C --> D
    D --> E
    E -->|API Calls| F
```

### Component Architecture

The system is organized into the following key components:

1. **Gradio MCP Server**: The main application that hosts the MCP server and web interface
2. **EDA Tools Module**: Core analysis functions for dataset exploration
3. **Dataset Service**: Handles dataset loading, caching, and metadata retrieval
4. **HuggingFace Integration**: Manages authentication and API interactions with HF Hub

## Components and Interfaces

### 1. Gradio MCP Server (`src/hf_eda_mcp/server.py`)

**Purpose**: Main application entry point that creates Gradio interfaces for EDA tools and enables MCP server functionality.

**Key Responsibilities**:
- Initialize Gradio app with MCP server enabled
- Define Gradio interfaces for each EDA tool
- Handle MCP protocol communication
- Manage server configuration and startup

**Interface**:
```python
import gradio as gr

def create_gradio_app() -> gr.Blocks:
    """Create and configure the main Gradio application with MCP server."""

def launch_server(port: int = 7860, mcp_server: bool = True) -> None:
    """Launch the Gradio app with MCP server enabled."""
```

### 2. EDA Tools Module (`src/hf_eda_mcp/tools/`)

**Purpose**: Contains individual EDA functions that will be exposed as MCP tools.

#### Dataset Metadata Tool (`tools/metadata.py`)
```python
def get_dataset_metadata(dataset_id: str, config_name: str = None) -> dict:
    """
    Retrieve comprehensive metadata for a HuggingFace dataset.

    Args:
        dataset_id: HuggingFace dataset identifier (e.g., 'squad', 'glue')
        config_name: Optional configuration name for multi-config datasets

    Returns:
        Dictionary containing dataset metadata including:
        - Basic info (size, splits, features)
        - Configuration details
        - Download statistics
        - Dataset card information
    """
```

#### Dataset Sampling Tool (`tools/sampling.py`)
```python
def get_dataset_sample(
    dataset_id: str,
    split: str = "train",
    num_samples: int = 10,
    config_name: str = None
) -> dict:
    """
    Retrieve a sample of rows from a HuggingFace dataset.

    Args:
        dataset_id: HuggingFace dataset identifier
        split: Dataset split to sample from
        num_samples: Number of samples to retrieve
        config_name: Optional configuration name

    Returns:
        Dictionary containing sampled data and metadata
    """
```

#### Basic Analysis Tool (`tools/analysis.py`)
```python
def analyze_dataset_features(
    dataset_id: str,
    split: str = "train",
    sample_size: int = 1000,
    config_name: str = None
) -> dict:
    """
    Perform basic exploratory analysis on dataset features.

    Args:
        dataset_id: HuggingFace dataset identifier
        split: Dataset split to analyze
        sample_size: Number of samples to use for analysis
        config_name: Optional configuration name

    Returns:
        Dictionary containing feature analysis results:
        - Feature types and distributions
        - Missing value statistics
        - Summary statistics for numerical features
        - Unique value counts for categorical features
    """
```
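
As a sketch of the kind of computation `analyze_dataset_features` performs internally, the following pure-Python helper derives missing-value counts, unique counts, and summary statistics from a list-of-dicts sample. The helper name and exact output keys are illustrative, not fixed by this design:

```python
import math
from collections import Counter

def summarize_features(rows: list) -> dict:
    """Per-feature missing counts, unique counts, and basic statistics
    for a list-of-dicts sample (illustrative helper, not part of the spec)."""
    features = {name for row in rows for name in row}
    summary = {}
    for name in sorted(features):
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        info = {
            "missing_count": len(values) - len(present),
            "missing_percentage": 100.0 * (len(values) - len(present)) / len(values),
            "unique_count": len(set(present)),
        }
        if present and all(isinstance(v, (int, float)) for v in present):
            # Numerical feature: mean/std/min/max
            mean = sum(present) / len(present)
            info["statistics"] = {
                "mean": mean,
                "std": math.sqrt(sum((v - mean) ** 2 for v in present) / len(present)),
                "min": min(present),
                "max": max(present),
            }
        else:
            # Categorical/text feature: most frequent values
            info["statistics"] = {"top_values": Counter(present).most_common(3)}
        summary[name] = info
    return summary

rows = [{"age": 30, "label": "a"}, {"age": 50, "label": "a"}, {"age": None, "label": "b"}]
report = summarize_features(rows)
```

The same per-feature dictionaries map naturally onto the `FeatureAnalysis` model defined under Data Models below.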

### 3. Dataset Service (`src/hf_eda_mcp/services/dataset_service.py`)

**Purpose**: Centralized service for dataset operations, caching, and metadata management.

**Key Responsibilities**:
- Load datasets from HuggingFace Hub
- Cache dataset metadata and samples
- Handle authentication for private datasets
- Manage dataset configuration and splits

**Interface**:
```python
class DatasetService:
    def __init__(self, cache_dir: str = None, token: str = None):
        """Initialize dataset service with optional caching and authentication."""

    def load_dataset_info(self, dataset_id: str, config_name: str = None) -> DatasetInfo:
        """Load dataset information from HuggingFace Hub."""

    def load_dataset_sample(self, dataset_id: str, split: str, num_samples: int, config_name: str = None) -> Dataset:
        """Load a sample from the specified dataset."""

    def get_cached_metadata(self, dataset_id: str, config_name: str = None) -> dict:
        """Retrieve cached metadata or fetch if not available."""
```
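
The caching behavior behind `get_cached_metadata` could look like the minimal in-memory TTL cache below. The class name, the `ttl_seconds` parameter, and the injected `fetch` callable are assumptions for illustration; the design does not prescribe a cache implementation:

```python
import time

class MetadataCache:
    """In-memory TTL cache keyed by (dataset_id, config_name) — a minimal
    sketch of the caching layer, not the spec's implementation."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (timestamp, metadata dict)

    def get_or_fetch(self, dataset_id, config_name, fetch):
        key = (dataset_id, config_name)
        entry = self._store.get(key)
        if entry is not None and time.monotonic() - entry[0] < self.ttl:
            return entry[1]  # fresh cache hit: no API call
        metadata = fetch(dataset_id, config_name)  # e.g. a HF Hub API call
        self._store[key] = (time.monotonic(), metadata)
        return metadata
```

Repeated requests for the same dataset within the TTL window then hit the cache instead of the Hub API.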

### 4. HuggingFace Integration (`src/hf_eda_mcp/integrations/hf_client.py`)

**Purpose**: Handles all interactions with the HuggingFace Hub API and datasets library.

**Key Responsibilities**:
- Authenticate with HuggingFace Hub
- Fetch dataset information using huggingface_hub
- Load datasets using the datasets library
- Handle errors and rate limiting

## Data Models

### Dataset Metadata Model
```python
from datetime import datetime
from typing import Dict, List

from pydantic import BaseModel

class DatasetMetadata(BaseModel):
    id: str
    author: str
    description: str
    features: Dict[str, str]
    splits: Dict[str, int]
    configs: List[str]
    size_bytes: int
    downloads: int
    likes: int
    tags: List[str]
    created_at: datetime
    last_modified: datetime
```

### Analysis Result Model
```python
from typing import Any, Dict

from pydantic import BaseModel

class FeatureAnalysis(BaseModel):
    feature_name: str
    feature_type: str
    missing_count: int
    missing_percentage: float
    unique_count: int
    statistics: Dict[str, Any]  # mean/std/min/max for numerical; top values for categorical
```

### Sample Data Model
```python
from typing import Any, Dict, List

from pydantic import BaseModel

class DatasetSample(BaseModel):
    dataset_id: str
    split: str
    config_name: str
    sample_size: int
    data: List[Dict[str, Any]]
    schema: Dict[str, str]
```

## Error Handling

### Error Categories and Handling Strategy

1. **Dataset Not Found Errors**
   - Return structured error response with suggestions
   - Log error for monitoring
   - Provide helpful error messages to users

2. **Authentication Errors**
   - Handle private dataset access gracefully
   - Provide clear instructions for authentication
   - Support both token-based and login-based auth

3. **Network and API Errors**
   - Implement retry logic with exponential backoff
   - Cache successful responses to reduce API calls
   - Provide fallback responses when possible

4. **Data Processing Errors**
   - Validate input parameters before processing
   - Handle malformed or unexpected data gracefully
   - Provide partial results when possible

### Error Response Format
```python
from typing import Any, Dict, List

from pydantic import BaseModel

class ErrorResponse(BaseModel):
    error_type: str
    message: str
    details: Dict[str, Any]
    suggestions: List[str]
```
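
The retry-with-exponential-backoff strategy for network and API errors can be sketched as a small stdlib-only decorator. The decorator name, the `retries`/`base_delay` parameters, and the injectable `sleep` hook are illustrative assumptions, not part of the spec:

```python
import time
from functools import wraps

def with_retries(retries: int = 3, base_delay: float = 0.5,
                 sleep=time.sleep, retry_on=(ConnectionError, TimeoutError)):
    """Retry a flaky call with exponential backoff: base_delay, 2x, 4x, ..."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == retries:
                        raise  # out of attempts: surface the error
                    sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator
```

Making `sleep` injectable keeps the backoff schedule unit-testable without real delays, which fits the Testing Strategy below.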

## Testing Strategy

### Unit Testing
- Test individual EDA functions with mock datasets
- Validate data processing and analysis logic
- Test error handling for various edge cases
- Mock HuggingFace API calls for consistent testing

### Integration Testing
- Test Gradio interface creation and MCP tool exposure
- Validate end-to-end dataset loading and analysis workflows
- Test authentication and private dataset access
- Verify MCP protocol compliance

### Performance Testing
- Test with large datasets to ensure efficient sampling
- Validate caching mechanisms for repeated requests
- Monitor memory usage during dataset processing
- Test concurrent request handling

### Test Data Strategy
- Use small, well-known public datasets for testing
- Create mock datasets for edge case testing
- Test with various dataset formats and configurations
- Include datasets with missing values and data quality issues
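
"Mock HuggingFace API calls for consistent testing" can be done with the stdlib's `unittest.mock`. In this sketch, `describe_dataset` and the client's `dataset_info` call are stand-ins for the real tools and HF client defined elsewhere in this design:

```python
from unittest.mock import MagicMock

def describe_dataset(client, dataset_id: str) -> str:
    """Toy EDA helper that would normally hit the Hub via the client."""
    info = client.dataset_info(dataset_id)
    return f"{dataset_id}: {info['downloads']} downloads"

# Stub out the network-facing client so the test is fast and deterministic.
fake_client = MagicMock()
fake_client.dataset_info.return_value = {"downloads": 1234, "splits": ["train"]}

assert describe_dataset(fake_client, "squad") == "squad: 1234 downloads"
fake_client.dataset_info.assert_called_once_with("squad")
```

The same pattern (inject the client, replace it with a `MagicMock` in tests) keeps unit tests independent of Hub availability.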

## Configuration and Deployment

### Environment Configuration
- Support for HuggingFace authentication tokens
- Configurable cache directory and size limits
- Adjustable sampling limits and timeouts
- Optional logging and monitoring configuration

### Deployment Options
1. **Local Development**: Run as a standalone Gradio app
2. **HuggingFace Spaces**: Deploy as a hosted MCP server
3. **MCP Client Integration**: Direct integration with MCP-compatible systems

### MCP Server Configuration
The server will be configured to work with standard MCP clients through Gradio's built-in MCP support:

```json
{
  "mcpServers": {
    "hf-eda-mcp-server": {
      "url": "https://your-space.hf.space/gradio_api/mcp/sse"
    }
  }
}
```

.kiro/specs/hf-eda-mcp-server/requirements.md (ADDED)
# Requirements Document

## Introduction

This document specifies the requirements for hf-eda-mcp, a Model Context Protocol (MCP) server that provides tools for Exploratory Data Analysis (EDA) of datasets hosted on HuggingFace. The system will enable AI assistants to perform structured dataset exploration and analysis through MCP-compatible interfaces.

## Glossary

- **MCP Server**: A server implementation following the Model Context Protocol that provides tools accessible to AI systems
- **HuggingFace Dataset**: A dataset hosted on the HuggingFace Hub platform
- **EDA Tools**: Functions that perform exploratory data analysis operations on datasets
- **Dataset Metadata Tool**: Tool to fetch information about a dataset including size, features, splits, and configuration details
- **Gradio**: A Python library for building web interfaces and applications
- **Dataset Sample**: A subset of rows from a dataset used for analysis and preview

## Requirements

### Requirement 1

**User Story:** As a data scientist, I want to retrieve metadata from HuggingFace datasets, so that I can understand the structure and properties of datasets before analysis.

#### Acceptance Criteria

1. WHEN a dataset identifier is provided, THE MCP Server SHALL retrieve comprehensive metadata including dataset size, feature types, splits, and configuration details
2. THE MCP Server SHALL validate dataset identifiers and return appropriate error messages for invalid or non-existent datasets
3. THE MCP Server SHALL format metadata in a structured, readable format for AI assistant consumption
4. THE MCP Server SHALL handle datasets with multiple configurations and return configuration-specific metadata when requested

### Requirement 2

**User Story:** As an AI assistant, I want to access dataset samples through MCP tools, so that I can perform analysis on actual data content.

#### Acceptance Criteria

1. WHEN a dataset sample is requested, THE MCP Server SHALL return a configurable number of rows from the specified dataset
2. THE MCP Server SHALL support sampling from different dataset splits (train, validation, test)
3. WHERE a specific configuration is specified, THE MCP Server SHALL return samples from that configuration
4. THE MCP Server SHALL handle large datasets efficiently by implementing appropriate streaming and sampling strategies
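
One way to satisfy criterion 4 above — drawing a uniform sample from a streamed dataset without materializing it — is reservoir sampling. The sketch below is illustrative only; the requirement does not mandate a specific algorithm:

```python
import random

def reservoir_sample(stream, k: int, seed: int = 0) -> list:
    """Uniformly sample k items from an iterable of unknown length,
    holding at most k items in memory (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(stream):
        if i < k:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # item i survives with probability k/(i+1)
            if j < k:
                reservoir[j] = row
    return reservoir

# Works over any iterator, e.g. a streamed HF dataset split.
sample = reservoir_sample(range(1_000_000), k=10)
```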

### Requirement 3

**User Story:** As a developer, I want the MCP server to follow standard MCP protocols, so that it can integrate seamlessly with MCP-compatible AI systems.

#### Acceptance Criteria

1. THE MCP Server SHALL implement the standard MCP protocol for tool discovery and execution
2. THE MCP Server SHALL provide proper tool schemas and descriptions for all available EDA functions
3. THE MCP Server SHALL handle MCP requests and responses according to protocol specifications
4. THE MCP Server SHALL support graceful error handling and return appropriate MCP error responses

### Requirement 4

**User Story:** As a data analyst, I want basic exploratory analysis tools, so that I can quickly understand dataset characteristics and quality.

#### Acceptance Criteria

1. THE MCP Server SHALL provide tools to analyze dataset feature distributions and statistics
2. THE MCP Server SHALL identify missing values and data quality issues in dataset samples
3. THE MCP Server SHALL generate summary statistics for numerical and categorical features
4. WHERE applicable, THE MCP Server SHALL detect and report potential data anomalies or inconsistencies

.kiro/specs/hf-eda-mcp-server/tasks.md (ADDED)
# Implementation Plan

- [x] 1. Set up project structure and dependencies
  - Create package directory structure following Python best practices
  - Configure pyproject.toml with required dependencies (gradio, datasets, huggingface_hub)
  - Set up basic package initialization files
  - _Requirements: 3.1, 4.1, 4.2_

- [ ] 2. Implement HuggingFace integration layer
- [ ] 2.1 Create HuggingFace client wrapper
  - Write HfClient class to handle authentication and API interactions
  - Implement dataset info retrieval using huggingface_hub
  - Add error handling for authentication and network issues
  - _Requirements: 1.2, 4.3_

- [ ] 2.2 Implement dataset service with caching
  - Create DatasetService class for centralized dataset operations
  - Add metadata caching to reduce API calls
  - Implement dataset loading and sampling functionality
  - _Requirements: 1.1, 2.1, 2.2_

- [ ] 3. Create core EDA tools
- [ ] 3.1 Implement dataset metadata tool
  - Write get_dataset_metadata function to retrieve comprehensive dataset information
  - Format metadata response with dataset size, features, splits, and configuration details
  - Handle multi-configuration datasets appropriately
  - _Requirements: 1.1, 1.3, 1.4_

- [ ] 3.2 Implement dataset sampling tool
  - Create get_dataset_sample function for retrieving dataset samples
  - Support different splits (train, validation, test) and configurable sample sizes
  - Implement efficient sampling strategies for large datasets
  - _Requirements: 2.1, 2.2, 2.3_

- [ ] 3.3 Implement basic analysis tool
  - Write analyze_dataset_features function for exploratory data analysis
  - Generate feature statistics, missing value analysis, and data quality insights
  - Handle different data types (numerical, categorical, text) appropriately
  - _Requirements: 5.1, 5.2, 5.3, 5.4_

- [ ] 4. Create Gradio interfaces and MCP server
- [ ] 4.1 Design Gradio interfaces for each EDA tool
  - Create Gradio interface for metadata retrieval with appropriate input/output components
  - Build interface for dataset sampling with split and sample size controls
  - Design interface for feature analysis with configuration options
  - _Requirements: 3.1, 3.2_

- [ ] 4.2 Implement main Gradio application
  - Create main Gradio app that combines all EDA tool interfaces
  - Enable MCP server functionality using Gradio's built-in MCP support
  - Configure proper tool descriptions and schemas for MCP exposure
  - _Requirements: 3.1, 3.2, 3.3_

- [ ] 4.3 Add server configuration and startup
  - Implement server launch function with configurable parameters
  - Add environment variable support for authentication and configuration
  - Include proper logging and error handling for server operations
  - _Requirements: 4.1, 4.2, 4.4_

- [ ] 5. Implement error handling and validation
- [ ] 5.1 Add input validation for all tools
  - Validate dataset identifiers and configuration names
  - Check split names and sample size parameters
  - Provide helpful error messages for invalid inputs
  - _Requirements: 1.2, 2.1_
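
Task 5.1's validation step might look like the sketch below. The identifier pattern is a deliberate simplification of the Hub's actual repo-id rules, and the 1–1000 sample cap is an assumed limit, not one set by this plan:

```python
import re

# HF repo ids are "name" or "namespace/name"; this pattern is a
# simplification of the Hub's real rules, used only for illustration.
_DATASET_ID_RE = re.compile(r"^[\w.-]+(/[\w.-]+)?$")

def validate_sample_request(dataset_id: str, split: str, num_samples: int) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if not _DATASET_ID_RE.match(dataset_id):
        problems.append(f"'{dataset_id}' is not a valid dataset identifier")
    if split not in {"train", "validation", "test"}:
        problems.append(f"unknown split '{split}' (expected train/validation/test)")
    if not (1 <= num_samples <= 1000):
        problems.append("num_samples must be between 1 and 1000")
    return problems
```

Returning a list of problems (rather than raising on the first one) lets the tool surface all input errors at once, per "provide helpful error messages for invalid inputs".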

- [ ] 5.2 Implement comprehensive error handling
  - Handle dataset not found errors with suggestions
  - Manage authentication errors for private datasets
  - Add retry logic for network and API failures
  - _Requirements: 1.2, 4.3_

- [ ]* 5.3 Write unit tests for core functionality
  - Create tests for HuggingFace client and dataset service
  - Test EDA tools with mock datasets and various edge cases
  - Validate error handling and input validation logic
  - _Requirements: 1.1, 2.1, 5.1_

- [ ] 6. Integration and deployment setup
- [ ] 6.1 Create main entry point and CLI
  - Implement main module for running the server
  - Add command-line interface for server configuration
  - Include help documentation and usage examples
  - _Requirements: 4.1, 4.2_

- [ ] 6.2 Add deployment configuration
  - Create configuration for HuggingFace Spaces deployment
  - Add Docker configuration for containerized deployment
  - Include MCP client configuration examples
  - _Requirements: 4.1, 4.2_

- [ ]* 6.3 Write integration tests
  - Test end-to-end workflows from MCP client perspective
  - Validate Gradio interface functionality and MCP tool exposure
  - Test with real HuggingFace datasets for integration validation
  - _Requirements: 3.1, 3.2, 3.3_

- [ ]* 7. Documentation and examples
  - Create comprehensive README with installation and usage instructions
  - Add example MCP client configurations for popular clients
  - Include API documentation for all available tools
  - _Requirements: 4.2_

.kiro/steering/product.md (ADDED)
# Product Overview

**hf-eda-mcp** is an MCP (Model Context Protocol) server that provides tools for Exploratory Data Analysis (EDA) of HuggingFace datasets.

## Purpose
- Enables AI assistants to perform data analysis on HuggingFace datasets
- Provides structured tools for dataset exploration and visualization
- Integrates with MCP-compatible AI systems for seamless data analysis workflows

## Target Users
- Data scientists and researchers working with HuggingFace datasets
- AI developers building applications that need dataset analysis capabilities
- Anyone needing programmatic access to dataset exploration tools

## License
Apache License 2.0 - Open source project encouraging community contributions and commercial use.

.kiro/steering/structure.md (ADDED)
# Project Structure

## Current Organization
```
hf-eda-mcp/
├── .git/        # Git version control
├── .kiro/       # Kiro AI assistant configuration
│   ├── settings/  # Kiro settings (MCP config, etc.)
│   └── steering/  # AI guidance documents
├── .vscode/     # VSCode configuration
├── .gitignore   # Python-focused gitignore
├── LICENSE      # Apache 2.0 license
└── README.md    # Project documentation
```

## Expected Structure (for MCP server)
Based on MCP server conventions, the project will likely expand to:

```
hf-eda-mcp/
├── scripts/
├── src/
│   └── hf_eda_mcp/      # Main package directory
│       ├── __init__.py  # Package initialization
│       ├── server.py    # MCP server implementation
│       └── tools/       # EDA tool implementations
├── tests/               # Test suite
├── docs/                # Documentation
├── requirements.txt     # Dependencies (or pyproject.toml)
└── setup.py             # Package setup (or pyproject.toml)
```

## Naming Conventions
- **Package/Module**: snake_case (hf_eda_mcp)
- **Classes**: PascalCase (DatasetAnalyzer)
- **Functions/Variables**: snake_case (analyze_dataset)
- **Constants**: UPPER_SNAKE_CASE (DEFAULT_BATCH_SIZE)

## File Organization Principles
- Keep MCP tools modular and focused
- Separate data processing logic from MCP server logic
- Use clear, descriptive names for EDA functions
- Group related analysis tools together
- Follow Python package structure best practices

## Configuration Files
- Use `.kiro/settings/mcp.json` for MCP server configuration
- Environment variables for sensitive data (API keys, etc.)
- Support multiple dependency management systems

.kiro/steering/tech.md (ADDED)
# Technology Stack

## Primary Technologies
- **Python**: Core programming language
- **MCP (Model Context Protocol)**: Server framework for AI tool integration
- **HuggingFace**: Dataset ecosystem and APIs

## Development Environment
- **Python Package Management**: pdm (multiple managers supported)
- **Virtual Environments**: Standard Python venv/virtualenv workflow
- **IDE**: VSCode with Kiro agent integration

## Key Dependencies
- HuggingFace libraries (datasets, transformers ecosystem, gradio)
- MCP server framework (gradio)
- Data analysis libraries (likely pandas, numpy, matplotlib/seaborn for EDA)

## Common Commands
```bash
# Environment setup
pdm sync

# Testing
pytest
# OR
python -m pytest

# Linting and formatting
ruff check .
ruff format .
```

## MCP Integration
- Designed to run as an MCP server
- Provides tools accessible to MCP-compatible AI systems
- Configuration through standard MCP server protocols

.vscode/settings.json (ADDED)
{
  "kiroAgent.configureMCP": "Enabled"
}

pdm.lock (ADDED; diff too large to render)

pyproject.toml (ADDED)
[project]
name = "hf-eda-mcp"
version = "0.1.0"
description = "MCP server for EDA on HuggingFace datasets"
authors = [
    {name = "Khalil Guetari", email = "khalil.guetari@momentslab.com"},
]
dependencies = [
    "gradio>=5.49.1",
    "datasets>=4.3.0",
    "huggingface_hub>=0.20.0",
    "pydantic>=2.0.0",
    "pandas>=2.0.0",
    "numpy>=1.24.0"
]
requires-python = ">=3.13"
readme = "README.md"
license = {text = "Apache-2.0"}

[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"

[project.scripts]
hf-eda-mcp = "hf_eda_mcp.server:launch_server"

[tool.pdm]
distribution = true

[tool.pdm.dev-dependencies]
test = [
    "pytest>=7.0.0",
    "pytest-asyncio>=0.21.0",
    "pytest-cov>=4.0.0"
]
lint = [
    "ruff>=0.1.0",
    "black>=23.0.0",
    "mypy>=1.0.0"
]
src/hf_eda_mcp/__init__.py
ADDED

```python
"""
HuggingFace EDA MCP Server package.

A Model Context Protocol (MCP) server that provides tools for
Exploratory Data Analysis (EDA) of datasets hosted on HuggingFace.
"""

from .server import create_gradio_app, launch_server

__version__ = "0.1.0"
__all__ = ["create_gradio_app", "launch_server"]
```
src/hf_eda_mcp/__main__.py
ADDED

```python
"""
Main entry point for the hf-eda-mcp server.

This module allows the package to be run as a module using:
    python -m hf_eda_mcp
"""

import argparse
import sys

from .server import launch_server


def main():
    """Main entry point with command line argument parsing."""
    parser = argparse.ArgumentParser(
        description="HuggingFace EDA MCP Server",
        prog="hf-eda-mcp",
    )

    parser.add_argument(
        "--port",
        type=int,
        default=7860,
        help="Port to run the server on (default: 7860)",
    )

    parser.add_argument(
        "--no-mcp",
        action="store_true",
        help="Disable MCP server functionality",
    )

    args = parser.parse_args()

    try:
        launch_server(port=args.port, mcp_server=not args.no_mcp)
    except KeyboardInterrupt:
        print("\nServer stopped by user")
        sys.exit(0)
    except Exception as e:
        print(f"Error starting server: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
```
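The flag-to-parameter mapping in `main()` can be sketched in isolation. `parse_cli` below is a hypothetical helper (not part of the package) that mirrors how the parser turns `--port` and `--no-mcp` into the `port` and `mcp_server` arguments passed to `launch_server`:

```python
import argparse


def parse_cli(argv):
    """Mirror of main()'s flag handling: --no-mcp is a store_true flag,
    so MCP stays enabled unless it is explicitly passed."""
    parser = argparse.ArgumentParser(prog="hf-eda-mcp")
    parser.add_argument("--port", type=int, default=7860)
    parser.add_argument("--no-mcp", action="store_true")
    args = parser.parse_args(argv)
    # Equivalent to launch_server(port=args.port, mcp_server=not args.no_mcp)
    return {"port": args.port, "mcp_server": not args.no_mcp}
```

With no arguments this yields the defaults (`port=7860`, MCP enabled); passing `--no-mcp` flips only the `mcp_server` value.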
src/hf_eda_mcp/integrations/__init__.py
ADDED

```python
"""
Integration module for external services.

This package contains integration classes for HuggingFace Hub
and other external services.
"""

__all__ = []
```
src/hf_eda_mcp/integrations/hf_client.py
ADDED

```python
"""
HuggingFace client wrapper for API interactions.

This module will be implemented in task 2.1.
"""

# Placeholder - will be implemented in task 2.1
```
src/hf_eda_mcp/server.py
ADDED

```python
"""
Main Gradio application with MCP server functionality.

This module provides the main entry point for the hf-eda-mcp server,
creating Gradio interfaces for EDA tools and enabling MCP server functionality.
"""

import gradio as gr


def create_gradio_app() -> gr.Blocks:
    """Create and configure the main Gradio application with MCP server."""
    # Placeholder implementation - will be expanded in later tasks
    with gr.Blocks(title="HF EDA MCP Server") as app:
        gr.Markdown("# HuggingFace EDA MCP Server")
        gr.Markdown("MCP server for exploratory data analysis of HuggingFace datasets.")

    return app


def launch_server(port: int = 7860, mcp_server: bool = True) -> None:
    """Launch the Gradio app, optionally with the MCP server enabled."""
    app = create_gradio_app()

    # Forward mcp_server to Gradio so the --no-mcp flag actually takes effect
    app.launch(server_port=port, share=False, show_error=True, mcp_server=mcp_server)


if __name__ == "__main__":
    launch_server()
```
src/hf_eda_mcp/services/__init__.py
ADDED

```python
"""
Services module for dataset operations and integrations.

This package contains service classes for dataset management, caching,
and external API integrations.
"""

__all__ = []
```
src/hf_eda_mcp/services/dataset_service.py
ADDED

```python
"""
Dataset service for centralized dataset operations and caching.

This module will be implemented in task 2.2.
"""

# Placeholder - will be implemented in task 2.2
```
src/hf_eda_mcp/tools/__init__.py
ADDED

```python
"""
EDA tools module for HuggingFace datasets.

This package contains individual EDA functions that will be exposed as MCP tools.
"""

__all__ = []
```
src/hf_eda_mcp/tools/analysis.py
ADDED

```python
"""
Basic analysis tool for exploratory data analysis.

This module will be implemented in task 3.3.
"""

# Placeholder - will be implemented in task 3.3
```
src/hf_eda_mcp/tools/metadata.py
ADDED

```python
"""
Dataset metadata tool for retrieving HuggingFace dataset information.

This module will be implemented in task 3.1.
"""

# Placeholder - will be implemented in task 3.1
```
src/hf_eda_mcp/tools/sampling.py
ADDED

```python
"""
Dataset sampling tool for retrieving dataset samples.

This module will be implemented in task 3.2.
"""

# Placeholder - will be implemented in task 3.2
```
tests/__init__.py
ADDED

File without changes