Diomedes Git
committed on
Commit · a79e071
1 Parent(s): 850fc8d
added cursor.md files to root and sub folders, deleted some stuff

Browse files
- dev_diary.md +0 -109
- phase2.md +0 -84
- phase3.md +0 -34
- project_analysis.md +0 -55
- src/cluas_mcp/common/api_clients.py +0 -149
- src/cluas_mcp/common/cache.py +0 -20
- src/cluas_mcp/common/cursor.md +0 -2
- steps_taken.md +0 -13
- tests/olderidea.py +0 -395
- ticket_list.md +0 -67
dev_diary.md
DELETED
# Development Diary

## 2024-01-XX - MVP Character Skeletons and Gradio Group Chat

### Goal

Create working skeletons for the three missing characters (Magpie, Raven, Crow) with placeholder tools, expand the MCP server to handle all tools, and build a Gradio group chat interface for the hackathon demo.

### Implementation

**Character Skeletons Created:**

- **Magpie** (Sanguine temperament): Enthusiastic trend-spotter with tools `search_web`, `find_trending_topics`, `get_quick_facts`
- **Raven** (Choleric temperament): Passionate activist with tools `search_news`, `get_environmental_data`, `verify_claim`
- **Crow** (Phlegmatic temperament): Calm observer with tools `get_bird_sightings`, `get_weather_patterns`, `analyze_temporal_patterns`

All characters follow the Corvus pattern, with Groq client setup, system prompts matching their temperaments, and stub `respond()` methods that return mock messages for the MVP.

**Tool Entrypoints:**

Created three new entrypoint modules grouped by tool type:

- `src/cluas_mcp/web/web_search_entrypoint.py` - 3 functions
- `src/cluas_mcp/news/news_search_entrypoint.py` - 3 functions
- `src/cluas_mcp/observation/observation_entrypoint.py` - 3 functions

All return structured mock data matching the expected real response formats, with TODO comments for future full implementation.

**MCP Server Expansion:**

Updated `src/cluas_mcp/server.py` to:

- List all 10 tools (9 new + `academic_search`) in `list_tools()`
- Route all tool calls to the appropriate entrypoints in `call_tool()` (the dispatch pattern is sketched below)
- Add formatting functions for all tool result types
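The routing reduces to a name-to-callable table. A minimal sketch of that dispatch, assuming each entrypoint module exposes a function named after its tool (the real `server.py` additionally wraps results through the MCP protocol and the formatting helpers):

```python
# Sketch of the call_tool() dispatch; tool and module names are from this
# diary, but the function-per-tool layout inside each module is assumed.
from src.cluas_mcp.web import web_search_entrypoint as web
from src.cluas_mcp.news import news_search_entrypoint as news
from src.cluas_mcp.observation import observation_entrypoint as obs

TOOL_ROUTES = {
    "search_web": web.search_web,
    "find_trending_topics": web.find_trending_topics,
    "get_quick_facts": web.get_quick_facts,
    "search_news": news.search_news,
    "get_environmental_data": news.get_environmental_data,
    "verify_claim": news.verify_claim,
    "get_bird_sightings": obs.get_bird_sightings,
    "get_weather_patterns": obs.get_weather_patterns,
    "analyze_temporal_patterns": obs.analyze_temporal_patterns,
}

def call_tool(name: str, arguments: dict):
    """Route one tool call to its entrypoint; reject unknown names."""
    if name not in TOOL_ROUTES:
        raise ValueError(f"Unknown tool: {name}")
    return TOOL_ROUTES[name](**arguments)
```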
**Gradio Interface:**

Built `src/gradio/app.py` with:

- Sequential responses from all 4 characters
- Character names and emojis displayed
- Conversation history maintained
- Simple, MVP-focused implementation (no async handling yet)

### Issues Encountered and Resolved

1. **Circular Import Issue**: `src/gradio/__init__.py` was causing circular imports. **Resolution**: Deleted the file entirely, as it wasn't needed, and updated the root `app.py` to import directly from `src.gradio.app`.

2. **Import Path Inconsistencies**: Several files had incorrect import paths (missing the `src.` prefix):
   - `src/gradio/app.py` - character imports
   - `src/cluas_mcp/academic/semantic_scholar.py`
   - `src/cluas_mcp/academic/pubmed.py`
   - `src/cluas_mcp/academic/arxiv.py`
   - `src/cluas_mcp/academic/thing.py`

   **Resolution**: Fixed all imports to use the consistent `src.` prefix pattern.

3. **Gradio API Compatibility**: The `theme=gr.themes.Soft()` parameter is not supported in this Gradio version. **Resolution**: Removed the theme parameter.

4. **Gradio 6.x Migration**: The initial implementation used the Gradio 5.x tuple format for chat history. **Resolution**: Migrated to the Gradio 6.x messages format with structured content blocks (sketched below):
   - Changed from `List[Tuple[str, str]]` to `List[dict]` with `{"role": "user/assistant", "content": [{"type": "text", "text": "..."}]}`
   - Updated `get_character_response()` to parse the Gradio 6.x format and extract text from content blocks
   - Updated `chat_fn()` to return messages in the new structured format
   - Verified compatibility with Gradio 6.0.0-dev.4
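A minimal sketch of helpers for that messages format, as this diary describes it (the helper names are illustrative, not the app's actual functions):

```python
# Build and parse Gradio 6.x-style structured messages (a sketch; the
# format follows the description above, the helper names are made up).
def make_message(role: str, text: str) -> dict:
    return {"role": role, "content": [{"type": "text", "text": text}]}

def extract_text(message: dict) -> str:
    """Recover plain text from a structured message."""
    content = message.get("content", "")
    if isinstance(content, str):  # tolerate the old 5.x string form
        return content
    return " ".join(
        block.get("text", "")
        for block in content
        if block.get("type") == "text"
    )

history = [make_message("user", "What do ravens eat?")]
print(extract_text(history[0]))  # -> "What do ravens eat?"
```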
5. **Groq Tool Calling Error**: "Tool choice is none, but model called a tool" (400 error). **Resolution**: Added the `tool_choice="auto"` parameter to both Corvus's and Magpie's Groq API calls. This lets the model decide whether to use tools, rather than defaulting to none, which rejects tool calls. A sketch of the fixed call follows.
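A minimal sketch of the fix, using the Groq SDK's OpenAI-style chat completions interface; the model name and tool schema here are placeholders, not the project's actual values:

```python
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # placeholder model name
    messages=[{"role": "user", "content": "Find trending corvid topics"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "find_trending_topics",
            "description": "Find trending topics for a query",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }],
    tool_choice="auto",  # the fix: let the model decide, instead of "none"
)
```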
### Testing

- All characters instantiate successfully
- Character responses work (stub implementations)
- Gradio app imports and initializes correctly
- All imports resolve properly
- No linter errors

### Status

MVP complete and working. Magpie now has full Groq integration with tool calling. Raven and Crow still have stub implementations. All placeholder tools return structured mock data. Ready for hackathon demo. Future enhancements: full tool implementations for Raven/Crow, async responses, memory functionality, tool usage indicators.

### Character Integration Progress

- ✅ **Corvus**: Full Groq integration with academic_search tool (working)
- ✅ **Magpie**: Full Groq integration with 3 tools (search_web, find_trending_topics, get_quick_facts) - just implemented
- ⏳ **Raven**: Stub implementation (needs Groq integration)
- ⏳ **Crow**: Stub implementation (needs Groq integration)

### Commits

- `71f5dac` - Added character skeletons (Magpie, Raven, Crow) with placeholder tools, MCP server routes, and Gradio group chat interface
- `1868ae1` - Fixed import paths: removed gradio __init__.py, fixed all src. imports, removed theme parameter
- `1f44947` - Added documentation: steps_taken.md and dev_diary.md for character skeletons implementation
- `8696667` - Migrated chat_fn to Gradio 6.x messages format with structured content blocks
- `28718e5` - Implemented full Groq integration for Magpie with tool calling for search_web, find_trending_topics, and get_quick_facts
__________

End-of-Day Progress Report (2025-11-21, 23:26)

Accomplished

Feature branch merged
Merged the stable feature branch (character skeletons, MCP server tools, Gradio 6.x compatibility) into main; successfully pulled remote updates.

Groq integration for Magpie
Implemented complete Groq integration for Magpie, including tool-calling logic for all tools: search_web, find_trending_topics, get_quick_facts.
Added helper formatting and async support for responses.
Documented individual steps and committed granularly.

Fixed Groq API tool_use error
Resolved the 400 error ("Tool choice is none, but model called a tool") by adding tool_choice="auto" for both Corvus and Magpie. Both characters can now call their tools from the LLM.

Documentation
Updated the dev diary and steps_taken.md for each significant change.
Maintained a clean commit history (granular, logical commits).

Current Status
Corvus & Magpie: Working Groq integration & tool calls.
MCP: All 10 tools listed; MCP tool routing correct.
Gradio Chat: All 4 characters visible; the system runs.
Raven & Crow: Still stubs (to be upgraded).
No uncommitted changes; the project state is stable and demo-ready for the MVP.

Next Steps
Bring Raven and Crow up to parity (Groq + tools).
Continue UI polish and E2E testing.
Prepare for the next rounds of hackathon demo/test/iteration.

________________
phase2.md
DELETED
# Phase 2: Hackathon Preparation Plan

This document outlines a practical, 6-day plan to take the "Cluas" project from its current state to a complete, polished, and shareable project for the hackathon. The focus is on stability, user experience, and clear presentation.

---

## **Key Areas of Focus**

1. **User Experience (UX) & Interface Polish:** First impressions are critical. A clean, intuitive, and responsive UI will make the project stand out.
2. **Core Functionality Hardening:** The main features must be robust and handle errors gracefully.
3. **Deployment & Accessibility:** Judges need to be able to access and use the project with minimal friction.
4. **Presentation & Documentation:** A compelling narrative and clear instructions are as important as the code itself.

---

## **6-Day Action Plan**

### **Days 1-2: UI/UX & Core Logic Refinement**

The goal for these two days is to enhance the front-end experience and make the backend more resilient.

* **UI Polish:**
    * **Task:** Review the Gradio interface for clarity. Can a new user understand it immediately?
    * **Task:** Implement loading indicators for long-running processes (e.g., when the AI council is "thinking"). This provides crucial feedback to the user.
    * **Task:** Add a clear "About" or "How to Use" section within the Gradio app. Explain the roles of the different corvids.
    * **Task:** Improve the visual separation of messages from different agents. Use icons, labels, or colors to indicate who is speaking.

* **Error Handling:**
    * **Task:** Implement graceful error handling for external API failures (e.g., Groq, academic search tools). The app should display a user-friendly message like "Sorry, I couldn't fetch that information. Please try again." instead of crashing. (A sketch of such a wrapper follows this list.)
    * **Task:** Add basic input validation to prevent trivial errors.
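A minimal sketch of that graceful-failure wrapper (the function name and message text are illustrative, not a prescribed implementation):

```python
import logging

logger = logging.getLogger(__name__)

def safe_tool_call(tool, *args, **kwargs):
    """Run a research tool; surface a friendly message instead of crashing."""
    try:
        return tool(*args, **kwargs)
    except Exception:
        logger.exception("Tool %s failed", getattr(tool, "__name__", tool))
        return "Sorry, I couldn't fetch that information. Please try again."
```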
* **Conversation Flow:**
    * **Task:** Review the final synthesized answer. Ensure it's well-formatted and clearly presented as the culmination of the council's discussion.

### **Days 3-4: Deployment & Testing**

The focus now shifts to making the project accessible and finding any remaining bugs.

* **Deployment:**
    * **Task:** Choose a deployment platform. **Hugging Face Spaces** is an excellent and free choice for Gradio applications.
    * **Task:** Verify that `pyproject.toml` and `requirements.txt` contain all necessary dependencies for a clean installation.
    * **Task:** Create an `.env.example` file to show what environment variables are needed (like `GROQ_API_KEY`), but without a real key.
    * **Task:** Write clear, step-by-step deployment instructions in the `README.md`.
    * **Task:** Deploy a live version of the app and test it thoroughly.

* **End-to-End Testing:**
    * **Task:** Manually run through 5-10 complex user queries. Try to break the application.
    * **Task:** Ask a friend or colleague (ideally non-technical) to use the app. Watch how they interact with it and gather feedback. Fresh eyes will find issues you've become blind to.
    * **Task:** Fix any critical bugs discovered during this testing phase.

### **Day 5: Documentation & Presentation**

With a stable, deployed app, the focus is on crafting the project's story.

* **README.md Overhaul:**
    * **Task:** Update the `README.md` to be a comprehensive guide for a hackathon judge. It should be the central hub of your project.
    * **Task:** Add a compelling one-paragraph project pitch at the top. What is Cluas, and why is it cool?
    * **Task:** **Create and embed a short demo video or GIF** showing the app in action. This is the single most effective way to communicate your project's value.
    * **Task:** Add a clear "Getting Started" section for running the project locally.
    * **Task:** Include a prominent link to the live demo you deployed on Day 4.
    * **Task:** Add a brief "Technology Stack" section listing the key frameworks and APIs used.

* **Prepare Presentation Materials:**
    * **Task:** Create a short slide deck (5-7 slides) or a 2-minute video script explaining the project.
    * **Task:** Focus on the **Problem**, the **Solution (Your App)**, and the **Technology**.
    * **Task:** Practice your pitch. Be ready to explain your project clearly and concisely.

### **Day 6: Final Polish & Submission**

This is the last mile. No new features, just refinement.

* **Final Code Freeze:**
    * **Task:** Stop adding new features. Only commit critical, show-stopping bug fixes.
    * **Task:** Clean up the codebase: remove commented-out code, add docstrings to key functions, and ensure consistent formatting.

* **Review Submission Requirements:**
    * **Task:** Double-check all the hackathon rules and submission requirements. Don't be disqualified on a technicality.

* **Final Polish:**
    * **Task:** Do one last end-to-end test of the live demo.
    * **Task:** Proofread all your documentation (`README.md`, presentation).

* **Submit!**
    * **Task:** Submit your project with confidence. Good luck!
phase3.md
DELETED
# Phase 3 Analysis of the Cluas Project

This document provides an analysis of the Cluas project's current state. The analysis is based on a review of the project's documentation, source code, and tests.

## Overall Impressions

The Cluas project is in a very strong state. It has a clear and compelling vision, a well-designed modular architecture, and a solid implementation of its core features. The project is well on its way to achieving its goal of creating a "dialectic research tool" where AI agents collaborate to answer user questions.

## Strengths

* **Strong Concept and Vision:** The project's goal of creating a multi-agent AI research tool with memory and collaborative capabilities is both ambitious and well-defined. The `README.md` and `GEMINI.md` files do an excellent job of articulating this vision.
* **Excellent Modular Architecture:** The codebase is well-organized and easy to understand. The separation of concerns between the UI (`src/gradio`), the AI characters (`src/characters`), and the tools (`src/cluas_mcp`) is a key strength. This modularity will make it much easier to maintain and extend the project in the future.
* **Well-Defined Characters:** The four AI characters—Corvus, Magpie, Raven, and Crow—are well-defined with distinct personalities, roles, and tools. The use of detailed system prompts to shape the characters' behavior is very effective.
* **Memory Implementation:** The memory system, particularly as implemented for Corvus, is a standout feature. The ability for a character to recall past conversations and learned information is crucial to the project's vision of "research that remembers."
* **Robust Tool Integration:** The system for allowing characters to use external tools is well-designed. The code for handling tool calls, parsing arguments, and incorporating tool outputs into the conversation is robust and effective. The modular design of the `cluas_mcp` makes it easy to add new data sources and capabilities.
* **Flexible LLM Backend:** The support for both the Groq API and a local Ollama instance provides valuable flexibility for development and deployment.
* **Solid Testing Strategy:** The project includes a suite of tests, including integration tests that make live API calls and a structure for mocked, non-calling tests. This commitment to testing is essential for ensuring the quality and reliability of the codebase.

## Areas for Improvement and Next Steps

The project has a strong foundation, but there are several areas where it could be improved and extended.

* **Complete Character Implementations:**
    * **Crow:** Based on the current review, it's likely that Crow's implementation is not as complete as Corvus's and Magpie's. Flesh out Crow's personality, tools (e.g., for nature observation, pattern analysis), and response logic.
    * **Raven:** Similarly, ensure Raven's implementation as a "news monitor and fact-checker" is fully realized.
* **Full Ollama Support:** The Ollama backend is not yet fully implemented for all characters (e.g., Magpie). Completing this would provide a robust and fully functional local development environment, which is a significant advantage.
* **Enhanced UI Error Handling:** While the backend has some error handling, this could be more effectively communicated to the user in the Gradio interface. For example, if a tool fails, the UI could display a clear, user-friendly message explaining what went wrong, rather than just having the character fall silent or give a generic error message.
* **Reduce Code Duplication:** There is some repetition in the `_respond_groq` methods of the different character classes. Consider creating a base `Character` class that abstracts some of the common logic for handling LLM responses and tool calls (a sketch follows this list). This would reduce code duplication and make the character classes easier to maintain.
* **Structured Configuration Management:** Currently, API keys and other configuration are loaded directly from a `.env` file. As the project grows, it would be beneficial to adopt a more structured approach to configuration management. Libraries like Pydantic's settings management can provide type-safe, validated configuration objects (see the second sketch after this list), which can help to prevent configuration-related errors.
* **More Sophisticated Agent Interaction:** The current interaction model is sequential, with each character responding in a fixed order. To fully realize the "dialectic" vision, consider implementing a more dynamic interaction model. For example, a character could choose to respond to another character's statement, or a "moderator" agent could guide the conversation.
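A minimal sketch of the suggested base class, assuming the Groq SDK's OpenAI-style interface; attribute and method names are illustrative, not the project's actual API:

```python
from abc import ABC

class Character(ABC):
    """Shared LLM plumbing for the corvid characters (a sketch)."""

    def __init__(self, name: str, system_prompt: str, tools: list, client):
        self.name = name
        self.system_prompt = system_prompt
        self.tools = tools      # tool schemas advertised to the LLM
        self.client = client    # e.g. a groq.Groq() instance

    def _respond_groq(self, model: str, messages: list):
        # The duplicated per-character logic lives here exactly once.
        return self.client.chat.completions.create(
            model=model,
            messages=[{"role": "system", "content": self.system_prompt},
                      *messages],
            tools=self.tools,
            tool_choice="auto",
        )

class Magpie(Character):
    pass  # personality-specific prompt, tools, and tool handlers go here
```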
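And a sketch of Pydantic-based settings (this assumes the `pydantic-settings` package; field names beyond `GROQ_API_KEY` are illustrative):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    groq_api_key: str                  # read from GROQ_API_KEY
    ebird_api_key: str | None = None   # hypothetical optional key

settings = Settings()  # raises a validation error if GROQ_API_KEY is missing
```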
## Conclusion

The Cluas project is an impressive piece of work. It is a well-designed and well-implemented multi-agent AI system with a clear and compelling vision. The project's strengths far outweigh its current limitations, and it has a strong foundation for future development. By addressing the areas for improvement outlined above, the project can move even closer to its goal of creating a truly innovative and powerful research tool.
project_analysis.md
DELETED
# Project Analysis: Cluas

## Overview

**Cluas** is a Python-based multi-agent AI research tool designed for dialectic deliberation. The name "Cluas" is Gaelic for "ear," reflecting its purpose as a listening and information-gathering system. The project uses a "Corvid Council" of four AI agents (Corvus, Magpie, Raven, and Crow), each with a distinct persona and a specialized set of tools for research. The system facilitates a conversational research process where these agents can collaborate, debate, and build upon shared knowledge over time.

The primary interface is a web application built with Gradio, allowing users to interact with the council in two modes: a "Collaborative Mode" for synthesized answers and an "Active Mode" for direct participation in the discussion.

## Key Components

### 1. Application Core

- **`app.py`**: The main entry point for the Gradio web application.
- **`src/gradio/app.py`**: Contains the UI and logic for the Gradio chat interface. It manages the interaction between the user and the AI characters.
- **`src/orchestrator.py`**: This file is intended to be the central coordinator for the AI agents' interactions, managing the dialectic process and handling shared memory. It is currently a placeholder, with the Gradio app handling basic orchestration.
- **`src/characters/`**: This directory defines the different AI agent personas:
    - `corvus.py`: The scholar, focused on academic research.
    - `magpie.py`: The enthusiast, skilled at finding trends and quick facts.
    - `raven.py`: The activist, focused on news and fact-checking.
    - `crow.py`: The observer, specializing in environmental and temporal patterns.

### 2. Tooling and Integrations (MCP - Model Context Protocol)

- **`src/cluas_mcp/server.py`**: An MCP server that exposes the various research tools to the AI agents. This allows the agents to perform actions like searching academic papers, news, and the web.
- **`src/cluas_mcp/`**: This directory is organized by domain, with entry points for different types of searches:
    - **`academic/`**: Integrates with ArXiv, PubMed, and Semantic Scholar.
    - **`news/`**: Provides news search and claim verification.
    - **`web/`**: For general web searches and trending topics.
    - **`observation/`**: Connects to eBird for bird sighting data.
- **`src/cluas_mcp/common/`**: Contains shared utilities for API clients, caching, and data formatting.

### 3. Dependencies

- **`pyproject.toml` & `requirements.txt`**: Define the project dependencies. Key libraries include:
    - `gradio`: For the web UI.
    - `fastmcp` & `mcp`: For the multi-agent communication and tool-serving framework.
    - `groq`: For interacting with the Groq large language model API.
    - `feedparser`, `requests`, `tenacity`: For fetching data from external APIs and web sources.
    - `pytest`: For testing.

### 4. Testing

- **`tests/`**: The project has a testing suite with `pytest`.
    - **`clients/`**: Contains tests for the various API clients (ArXiv, PubMed, etc.), with both live and mocked tests.
    - **`integration/`**: Includes integration tests for the search entry points.

## Analysis Summary

The project is well-structured, with a clear separation of concerns between the UI (Gradio), the agent personas (characters), and the tool implementations (MCP). The use of a multi-agent system is a sophisticated approach to research, allowing for a more robust and nuanced exploration of topics.

The `orchestrator.py` file indicates a plan for a more advanced system that can manage complex interactions and a persistent shared memory, which is the core of the "dialectic" process described in the README.

The file `src/cluas_mcp/academic/thing.py` appears to be a temporary or test file and should be reviewed.

Overall, Cluas is an ambitious and interesting project with a solid foundation. The immediate next steps would likely involve implementing `orchestrator.py` to realize the full vision of a dialectic research tool.
src/cluas_mcp/common/api_clients.py
DELETED
from common.http import fetch_with_retry
import requests
import feedparser
import xml.etree.ElementTree as ET
import urllib.parse
from typing import List, Optional

class PubMedClient:

    @staticmethod
    def parse_id_list(xml: str) -> List[str]:
        """Parse XML and return a list of PubMed IDs."""
        try:
            root = ET.fromstring(xml)
        except ET.ParseError:
            return []  # invalid XML or rate limit page

        id_list = root.find(".//IdList")
        if id_list is None:
            return []

        return [elem.text for elem in id_list.findall("Id") if elem.text]

    @staticmethod
    def pubmed_search(
        keywords: List[str],
        extra_terms: Optional[List[str]] = None,
        retmax: int = 20,
    ) -> List[str]:
        """
        Search PubMed for (keywords OR ...) AND (extra_terms OR ...).
        Returns PubMed IDs.
        """
        # building grouped OR clauses
        base = "(" + " OR ".join(keywords) + ")"
        if extra_terms:
            base = f"{base} AND ({' OR '.join(extra_terms)})"

        # URL-encode the full term string
        term = urllib.parse.quote(base)

        url = (
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
            f"?db=pubmed&term={term}&retmax={retmax}&retmode=xml"
        )

        try:
            response = fetch_with_retry(url)
            response.raise_for_status()
            return PubMedClient.parse_id_list(response.text)
        except requests.exceptions.RequestException:
            # TODO: log this failure instead of swallowing it silently
            return []


class SemanticScholarClient:
    def search(self, query, max_results=5):
        # placeholder: implement Semantic Scholar API call
        return []


class ArxivClient:
    KEYWORDS = ['corvid', 'crow', 'raven', 'corvus', 'jay', 'magpie',
                'jackdaw', 'rook', 'chough', 'nutcracker']

    def search(self, query, max_results=5):
        q = " OR ".join([query] + self.KEYWORDS)
        url = (
            "https://export.arxiv.org/api/query?"
            f"search_query=all:({q})&start=0&max_results={max_results}&"
            "sortBy=lastUpdatedDate&sortOrder=descending"
        )
        data = requests.get(url, timeout=10).text
        feed = feedparser.parse(data)
        results = []
        for entry in feed.entries:
            if not getattr(entry, "summary", "").strip():
                continue  # skip entries without abstracts
            results.append({
                "title": getattr(entry, "title", "Untitled"),
                "abstract": getattr(entry, "summary", ""),
                "authors": [a.name for a in getattr(entry, "authors", [])],
                "published": getattr(entry, "published", ""),
                "arxiv_link": getattr(entry, "link", "")
            })
        return results


# possible endpoint?
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmode=json&term=corvid+AND+memory


# from Bio import Entrez
# Entrez.email = "your.email@domain.tld"  # required by NCBI
# Entrez.api_key = "YOUR_KEY_IF_YOU_HAVE_ONE"

# KEYWORDS = ['corvid','crow','raven','corvus','jay','magpie','jackdaw','rook','chough','nutcracker']
# SECONDARY = ['memory', 'feeding']

# term = "(" + " OR ".join(KEYWORDS) + ")" + " AND (" + " OR ".join(SECONDARY) + ")"
# handle = Entrez.esearch(db="pubmed", term=term, retmax=100)  # adjust retmax
# record = Entrez.read(handle)
# ids = record["IdList"]

# for pmid in ids:
#     handle2 = Entrez.efetch(db="pubmed", id=pmid, retmode="xml")
#     rec = Entrez.read(handle2)
#     article = rec['PubmedArticle'][0]
#     # parse title
#     title = article['MedlineCitation']['Article']['ArticleTitle']
#     # parse authors
#     authors = article['MedlineCitation']['Article']['AuthorList']
#     first_author = authors[0]['LastName'] + ", " + authors[0]['ForeName']
#     author_str = first_author + (", et al" if len(authors) > 1 else "")
#     # parse abstract
#     abstract = ""
#     if 'Abstract' in article['MedlineCitation']['Article']:
#         abstract = " ".join([x for x in article['MedlineCitation']['Article']['Abstract']['AbstractText']])
#     # parse DOI
#     doi = None
#     for aid in article['PubmedData']['ArticleIdList']:
#         if aid.attributes['IdType'] == 'doi':
#             doi = str(aid)
#     # parse a "conclusion" if the structured abstract includes it
#     conclusion = None
#     # one simple heuristic: look for segments labeled 'CONCLUSION' in the structured abstract
#     if 'Abstract' in article['MedlineCitation']['Article']:
#         for sec in article['MedlineCitation']['Article']['Abstract']['AbstractText']:
#             if hasattr(sec, "attributes") and sec.attributes.get('Label', '').upper() == 'CONCLUSION':
#                 conclusion = str(sec)
#     # fallback: maybe take the last sentence of the abstract
#     if conclusion is None and abstract:
#         conclusion = abstract.split('.')[-2] + '.'

#     # now you have doi, title, author_str, abstract, conclusion
#     print(pmid, doi, title, author_str, conclusion)


# f'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term={query}'
# f'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id={id[0]}&retmode=xml&rettype=abstract'
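For reference, a minimal usage sketch of the PubMedClient above (this performs a live NCBI call and may return an empty list if rate-limited):

```python
ids = PubMedClient.pubmed_search(["corvid", "crow"], ["memory"], retmax=5)
for pmid in ids:
    print(f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/")
```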
src/cluas_mcp/common/cache.py
DELETED
import json
from pathlib import Path

class CacheManager:
    def __init__(self, cache_file: str):
        self.cache_file = Path(cache_file)
        if not self.cache_file.exists():
            self.cache_file.write_text(json.dumps({}))

    def get(self, query: str):
        with open(self.cache_file, "r") as f:
            data = json.load(f)
        return data.get(query)

    def set(self, query: str, results):
        with open(self.cache_file, "r") as f:
            data = json.load(f)
        data[query] = results
        with open(self.cache_file, "w") as f:
            json.dump(data, f, indent=2)
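A minimal usage sketch of CacheManager (the file name and query string are arbitrary):

```python
cache = CacheManager("search_cache.json")

results = cache.get("corvid memory")
if results is None:
    results = ["...fetched from an API..."]  # stand-in for a real fetch
    cache.set("corvid memory", results)
```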
src/cluas_mcp/common/cursor.md
CHANGED
@@ -7,8 +7,6 @@ Shared helpers: http, cache, memory, formatting.
 - Preserve helper interfaces.

 # Important files
-- api_clients.py
-- cache.py
 - formatting.py
 - http.py
 - memory.py
steps_taken.md
DELETED
# Steps Taken

## 2024-01-XX - Character Skeletons and Gradio Chat Implementation

1. Created character skeletons for Magpie, Raven, and Crow following the Corvus pattern
2. Created tool entrypoint stubs grouped by type (web, news, observation) with structured mock data
3. Updated the MCP server to route all 9 new tools plus the existing academic_search
4. Built the Gradio group chat interface with sequential character responses
5. Fixed import paths: removed gradio __init__.py, fixed all src. imports, removed the unsupported theme parameter
6. Tested and verified that all characters instantiate and respond correctly
7. Migrated chat_fn to the Gradio 6.x messages format with structured content blocks (per the Gradio 6 migration guide)
8. Implemented full Groq integration for Magpie with tool calling (search_web, find_trending_topics, get_quick_facts)
tests/olderidea.py
DELETED
import requests
import feedparser
import xml.etree.ElementTree as ET
import urllib.parse
from typing import List, Optional, Dict, Any
import logging
from http import fetch_with_retry  # local helper module; shadows the stdlib 'http' package

logger = logging.getLogger(__name__)


class PubMedClient:
    """Client for searching and fetching articles from PubMed."""

    BASE_SEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    BASE_FETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

    @staticmethod
    def parse_id_list(xml: str) -> List[str]:
        """Parse XML and return a list of PubMed IDs."""
        try:
            root = ET.fromstring(xml)
        except ET.ParseError as e:
            logger.error(f"Failed to parse ID list XML: {e}")
            return []

        id_list = root.find(".//IdList")
        if id_list is None:
            return []

        return [elem.text for elem in id_list.findall("Id") if elem.text]

    @staticmethod
    def parse_articles(xml: str) -> List[Dict[str, Any]]:
        """Parse PubMed article XML into structured data."""
        try:
            root = ET.fromstring(xml)
        except ET.ParseError as e:
            logger.error(f"Failed to parse articles XML: {e}")
            return []

        articles = []
        for article_elem in root.findall(".//PubmedArticle"):
            try:
                article = PubMedClient._parse_single_article(article_elem)
                if article:
                    articles.append(article)
            except Exception as e:
                logger.warning(f"Failed to parse article: {e}")
                continue

        return articles

    @staticmethod
    def _parse_single_article(article_elem: ET.Element) -> Optional[Dict[str, Any]]:
        """Parse a single PubMed article element."""
        medline = article_elem.find(".//MedlineCitation")
        if medline is None:
            return None

        article_data = medline.find(".//Article")
        if article_data is None:
            return None

        # extract PMID
        pmid_elem = medline.find(".//PMID")
        pmid = pmid_elem.text if pmid_elem is not None else None

        # extract title
        title_elem = article_data.find(".//ArticleTitle")
        title = title_elem.text if title_elem is not None else "Untitled"

        # extract authors
        authors = []
        author_list = article_data.find(".//AuthorList")
        if author_list is not None:
            for author in author_list.findall(".//Author"):
                last_name = author.find(".//LastName")
                fore_name = author.find(".//ForeName")
                if last_name is not None:
                    name = last_name.text
                    if fore_name is not None:
                        name = f"{last_name.text}, {fore_name.text}"
                    authors.append(name)

        author_str = authors[0] if authors else "Unknown"
        if len(authors) > 1:
            author_str += " et al."

        # extract abstract
        abstract_parts = []
        abstract_elem = article_data.find(".//Abstract")
        if abstract_elem is not None:
            for abstract_text in abstract_elem.findall(".//AbstractText"):
                if abstract_text.text:
                    abstract_parts.append(abstract_text.text)
        abstract = " ".join(abstract_parts)

        # extract conclusion (from structured abstract)
        conclusion = None
        if abstract_elem is not None:
            for abstract_text in abstract_elem.findall(".//AbstractText"):
                label = abstract_text.get("Label", "")
                if label.upper() in ["CONCLUSION", "CONCLUSIONS"]:
                    conclusion = abstract_text.text
                    break

        # fallback: use last sentence of abstract as conclusion
        if conclusion is None and abstract:
            sentences = abstract.split('. ')
            if len(sentences) > 1:
                conclusion = sentences[-2] + '.'

        # extract DOI
        doi = None
        pubmed_data = article_elem.find(".//PubmedData")
        if pubmed_data is not None:
            article_id_list = pubmed_data.find(".//ArticleIdList")
            if article_id_list is not None:
                for article_id in article_id_list.findall(".//ArticleId"):
                    if article_id.get("IdType") == "doi":
                        doi = article_id.text
                        break

        # extract publication date
        pub_date = None
        pub_date_elem = article_data.find(".//ArticleDate")
        if pub_date_elem is None:
            pub_date_elem = medline.find(".//PubDate")

        if pub_date_elem is not None:
            year = pub_date_elem.find(".//Year")
            month = pub_date_elem.find(".//Month")
            day = pub_date_elem.find(".//Day")

            date_parts = []
            if year is not None:
                date_parts.append(year.text)
            if month is not None:
                date_parts.append(month.text)
            if day is not None:
                date_parts.append(day.text)
            pub_date = "-".join(date_parts)

        return {
            "pmid": pmid,
            "title": title,
            "authors": authors,
            "author_str": author_str,
            "abstract": abstract,
            "conclusion": conclusion,
            "doi": doi,
            "published": pub_date,
            "pubmed_link": f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/" if pmid else None
        }

    @staticmethod
    def pubmed_search(
        keywords: List[str],
        extra_terms: Optional[List[str]] = None,
        retmax: int = 20,
    ) -> List[str]:
        """
        Search PubMed for (keywords OR ...) AND (extra_terms OR ...).
        Returns PubMed IDs.
        """
        # building grouped OR clauses
        base = "(" + " OR ".join(keywords) + ")"
        if extra_terms:
            base = f"{base} AND ({' OR '.join(extra_terms)})"

        # URL-encode the full term string
        term = urllib.parse.quote(base)

        url = (
            f"{PubMedClient.BASE_SEARCH_URL}"
            f"?db=pubmed&term={term}&retmax={retmax}&retmode=xml"
        )

        try:
            response = fetch_with_retry(url)
            response.raise_for_status()
            return PubMedClient.parse_id_list(response.text)
        except requests.exceptions.RequestException as e:
            logger.error(f"PubMed search failed: {e}")
            return []

    @staticmethod
    def fetch_articles(pmids: List[str]) -> List[Dict[str, Any]]:
        """Fetch full article details for given PubMed IDs."""
        if not pmids:
            return []

        ids = ",".join(pmids)
        url = (
            f"{PubMedClient.BASE_FETCH_URL}"
            f"?db=pubmed&id={ids}&retmode=xml&rettype=abstract"
        )

        try:
            response = fetch_with_retry(url)
            response.raise_for_status()
            return PubMedClient.parse_articles(response.text)
        except requests.exceptions.RequestException as e:
            logger.error(f"PubMed fetch failed: {e}")
            return []

    @staticmethod
    def search_and_fetch(
        keywords: List[str],
        extra_terms: Optional[List[str]] = None,
        retmax: int = 20,
    ) -> List[Dict[str, Any]]:
        """
        Convenience method to search and fetch articles in one call.
        """
        pmids = PubMedClient.pubmed_search(keywords, extra_terms, retmax)
        if not pmids:
            logger.info("No PubMed IDs found for search")
            return []

        return PubMedClient.fetch_articles(pmids)


class SemanticScholarClient:
    """Client for searching Semantic Scholar API."""

    BASE_URL = "https://api.semanticscholar.org/graph/v1"

    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key
        self.headers = {}
        if api_key:
            self.headers["x-api-key"] = api_key

    def search(self, query: str, max_results: int = 5) -> List[Dict[str, Any]]:
        """
        Search Semantic Scholar for papers.

        Args:
            query: Search query string
            max_results: Maximum number of results to return

        Returns:
            List of paper dictionaries with title, abstract, authors, etc.
        """
        url = f"{self.BASE_URL}/paper/search"
        params = {
            "query": query,
            "limit": max_results,
            "fields": "title,abstract,authors,year,publicationDate,citationCount,url,externalIds"
        }

        try:
            response = requests.get(
                url,
                params=params,
                headers=self.headers,
                timeout=10
            )
            response.raise_for_status()
            data = response.json()

            results = []
            for paper in data.get("data", []):
                results.append({
                    "title": paper.get("title", "Untitled"),
                    "abstract": paper.get("abstract", ""),
                    "authors": [author.get("name", "") for author in paper.get("authors", [])],
                    "year": paper.get("year"),
                    "published": paper.get("publicationDate", ""),
                    "citation_count": paper.get("citationCount", 0),
                    "url": paper.get("url", ""),
                    "doi": paper.get("externalIds", {}).get("DOI"),
                    "arxiv_id": paper.get("externalIds", {}).get("ArXiv"),
                    "pmid": paper.get("externalIds", {}).get("PubMed")
                })

            return results

        except requests.exceptions.RequestException as e:
            logger.error(f"Semantic Scholar search failed: {e}")
            return []


class ArxivClient:
    """Client for searching arXiv papers."""

    DEFAULT_KEYWORDS = [
        'corvid', 'crow', 'raven', 'corvus', 'jay',
        'magpie', 'jackdaw', 'rook', 'chough', 'nutcracker'
    ]

    def __init__(self, default_keywords: Optional[List[str]] = None):
        """
        Initialize ArxivClient.

        Args:
            default_keywords: List of keywords to include in searches.
                If None, uses DEFAULT_KEYWORDS.
        """
        self.default_keywords = default_keywords or self.DEFAULT_KEYWORDS

    def search(
        self,
        query: str,
        additional_keywords: Optional[List[str]] = None,
        max_results: int = 5
    ) -> List[Dict[str, Any]]:
        """
        Search arXiv for papers.

        Args:
            query: Main search query
            additional_keywords: Keywords to OR with query. If None, uses default_keywords.
            max_results: Maximum number of results to return

        Returns:
            List of paper dictionaries with title, abstract, authors, etc.
        """
        keywords = additional_keywords if additional_keywords is not None else self.default_keywords

        # build query: query OR keyword1 OR keyword2 ...
        q_parts = [query] + keywords
        q = " OR ".join(q_parts)

        url = (
            f"https://export.arxiv.org/api/query?"
            f"search_query=all:({q})&start=0&max_results={max_results}&"
            "sortBy=lastUpdatedDate&sortOrder=descending"
        )

        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            feed = feedparser.parse(response.text)

            results = []
            for entry in feed.entries:
                # Skip entries without abstracts
                if not getattr(entry, "summary", "").strip():
                    continue

                results.append({
                    "title": getattr(entry, "title", "Untitled"),
                    "abstract": getattr(entry, "summary", ""),
                    "authors": [a.name for a in getattr(entry, "authors", [])],
                    "published": getattr(entry, "published", ""),
                    "updated": getattr(entry, "updated", ""),
                    "arxiv_link": getattr(entry, "link", ""),
                    "arxiv_id": getattr(entry, "id", "").split("/abs/")[-1] if hasattr(entry, "id") else None,
                    "categories": [tag.term for tag in getattr(entry, "tags", [])]
                })

            return results

        except requests.exceptions.RequestException as e:
            logger.error(f"arXiv search failed: {e}")
            return []
        except Exception as e:
            logger.error(f"Error parsing arXiv feed: {e}")
            return []


# example usage
if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    # PubMed example
    print("=== PubMed Search ===")
    keywords = ['corvid', 'crow', 'raven']
    extra = ['memory', 'cognition']
    articles = PubMedClient.search_and_fetch(keywords, extra, retmax=5)
    for article in articles:
        print(f"\nTitle: {article['title']}")
        print(f"Authors: {article['author_str']}")
        print(f"DOI: {article.get('doi', 'N/A')}")

    # arXiv example
    print("\n\n=== arXiv Search ===")
    arxiv = ArxivClient()
    papers = arxiv.search("intelligence", max_results=3)
    for paper in papers:
        print(f"\nTitle: {paper['title']}")
        print(f"Authors: {', '.join(paper['authors'][:3])}")
        print(f"Link: {paper['arxiv_link']}")

    # Semantic Scholar example
    print("\n\n=== Semantic Scholar Search ===")
    ss = SemanticScholarClient()
    papers = ss.search("corvid cognition", max_results=3)
    for paper in papers:
        print(f"\nTitle: {paper['title']}")
        print(f"Citations: {paper['citation_count']}")
        print(f"Year: {paper['year']}")
ticket_list.md
DELETED
# Ticket List for Cluas

This document outlines suggested tasks to improve the Cluas project, aimed at a junior to mid-level engineer.

## Core Improvements

These tickets address foundational aspects of the project to improve its robustness, maintainability, and developer experience.

- **TICKET-01: Implement a Linter and Formatter**
  - **Description**: The project currently lacks automated code linting and formatting. Introduce a tool like `ruff` to enforce a consistent code style and catch potential errors.
  - **Tasks**:
    1. Add `ruff` to the project dependencies in `pyproject.toml`.
    2. Create a `ruff.toml` or `pyproject.toml` configuration file with basic rules.
    3. Run `ruff format .` and `ruff check --fix .` to format the existing codebase.
    4. Update the `README.md` with instructions on how to run the linter.

- **TICKET-02: Expand Test Coverage for Entrypoints**
  - **Description**: The `tests/integration` directory only contains tests for `academic_search_entrypoint`. Similar tests should be created for the other entrypoints to ensure they work as expected.
  - **Tasks**:
    1. Create `test_news_search_entrypoint.py` in `tests/integration/`.
    2. Create `test_observation_entrypoint.py` in `tests/integration/`.
    3. Create `test_web_search_entrypoint.py` in `tests/integration/`.
    4. Write basic integration tests for each entrypoint, mocking the external API calls.

- **TICKET-03: Add Type Hinting**
  - **Description**: While some parts of the code use type hints, many functions are missing them. Gradually adding type hints will improve code clarity and allow for static analysis.
  - **Tasks**:
    1. Start with the files in `src/cluas_mcp/common/` and add type hints to all function signatures and variables.
    2. Continue adding type hints to the entrypoint files in `src/cluas_mcp/`.

- **TICKET-04: Improve the README.md**
  - **Description**: The `README.md` provides a good overview, but it could be improved with more practical information for developers.
  - **Tasks**:
    1. Add an "Installation" section with instructions on how to set up the project and install dependencies (e.g., using `uv`).
    2. Add a "Running the Application" section that explains how to start the Gradio app.
    3. Add a "Running Tests" section that consolidates the test commands from the bottom of the file.

- **TICKET-05: Refactor or Remove `thing.py`**
  - **Description**: The file `src/cluas_mcp/academic/thing.py` seems to be a temporary or test script. It should be either removed or refactored into a meaningful module.
  - **Tasks**:
    1. Analyze the purpose of the `print` statement in `thing.py`.
    2. If it's a leftover test script, delete the file.
    3. If it serves a purpose, rename the file to something descriptive and integrate it properly.

## Further Ideas

These are suggestions for new features or major improvements that could be implemented after the core improvements are complete.

- **IDEA-01: Implement the Orchestrator**
  - **Description**: The `src/orchestrator.py` file is currently a placeholder. Implementing it would be the next major step towards the project's vision of a dialectic research tool. (A rough sketch of one possible shape follows this ticket.)
  - **Tasks**:
    1. Design the `Orchestrator` class structure.
    2. Implement logic to pass user queries to the relevant characters.
    3. Develop a system for synthesizing responses from multiple characters.
    4. Integrate the orchestrator with the Gradio app.
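A minimal sketch of one possible Orchestrator shape (class, method, and attribute names are suggestions, not the project's actual design; it assumes each character exposes a `respond()` method and a `name` attribute):

```python
class Orchestrator:
    """Coordinates the corvid council for one round of dialectic discussion."""

    def __init__(self, characters):
        self.characters = characters  # e.g. [corvus, magpie, raven, crow]
        self.history = []             # shared conversation memory

    def ask(self, query: str) -> str:
        """Pass the query to each character, then synthesize the replies."""
        replies = []
        for character in self.characters:
            reply = character.respond(query, self.history)
            self.history.append({"speaker": character.name, "text": reply})
            replies.append(f"{character.name}: {reply}")
        return self.synthesize(replies)

    def synthesize(self, replies) -> str:
        # Placeholder synthesis; in practice this would likely be another
        # LLM call that condenses the council's discussion into one answer.
        return "\n\n".join(replies)
```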
- **IDEA-02: Create a Dockerfile**
  - **Description**: Containerizing the application with Docker would make it easier to deploy and run in a consistent environment.
  - **Tasks**:
    1. Create a `Dockerfile` that installs Python, copies the project files, and installs dependencies.
    2. Add a `docker-compose.yml` file for easier local development.

- **IDEA-03: Set Up a CI/CD Pipeline**
  - **Description**: A simple CI/CD pipeline (e.g., using GitHub Actions) could automatically run tests and linting on every push or pull request.
  - **Tasks**:
    1. Create a `.github/workflows/ci.yml` file.
    2. Define a workflow that runs `ruff check .` and `pytest`.