Feature spec: Gemini + LlamaIndex Vault Chat Agent (HF Space)

Objective

Add a second planning chat interface to the Hugging Face Space.
Use LlamaIndex for retrieval-augmented generation over the same Markdown vault used by Document-MCP.
Use Gemini (via LlamaIndex) as both the LLM and embedding model.
Optionally allow the agent to write new notes into the vault via constrained tools.

Non-goals

Do not change the existing MCP server or ChatGPT App widget behavior.
Do not introduce a new external database; rely on LlamaIndex storage or simple filesystem persistence for the hackathon.

High-Level Architecture

Vault: directory of Markdown notes already used by Document-MCP.
Indexer: Python module using LlamaIndex to scan the vault, split notes into chunks, and build a VectorStoreIndex backed by a simple vector store.
Chat backend: FastAPI endpoints that load the index, run RAG queries with Gemini, and return answers plus source notes.
HF Space frontend: a new chat panel that calls the backend, shows the assistant response, and lists linked sources (note titles and paths).

Backend details

Dependencies: llama-index-core, llama-index-llms-google-genai, llama-index-embeddings-google-genai.
Env vars: GOOGLE_API_KEY, VAULT_DIR, LLAMAINDEX_PERSIST_DIR.
On startup: if a persisted index exists, load it; otherwise, scan VAULT_DIR for markdown files, build a new index, and persist it under LLAMAINDEX_PERSIST_DIR.
Provide a helper get_or_build_index that returns a singleton VectorStoreIndex.
Implement a function rag_chat(messages) that:
    Takes a simple chat history array.
    Uses index.as_query_engine with Gemini as the LLM.
    Runs a query on the latest user message.
    Returns a dict with fields: answer (string), sources (a list of {title, path, snippet} entries), and notes_written (an empty list for now).
Expose POST /api/rag/chat in FastAPI that wraps rag_chat; a minimal backend sketch follows this list.
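
A minimal sketch of the backend pieces above, assuming llama-index 0.12-era imports and API; the Gemini model names (gemini-2.0-flash, text-embedding-004) are placeholders to confirm against current docs, not a committed choice:

```python
# backend/rag.py — sketch of get_or_build_index, rag_chat, and the endpoint.
import os

from fastapi import FastAPI
from pydantic import BaseModel
from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.llms.google_genai import GoogleGenAI
from llama_index.embeddings.google_genai import GoogleGenAIEmbedding

VAULT_DIR = os.environ["VAULT_DIR"]
PERSIST_DIR = os.environ.get("LLAMAINDEX_PERSIST_DIR", "./storage")

# Gemini as both LLM and embedding model; both read GOOGLE_API_KEY from the env.
# Model names are assumptions to verify.
Settings.llm = GoogleGenAI(model="gemini-2.0-flash")
Settings.embed_model = GoogleGenAIEmbedding(model_name="text-embedding-004")

_index = None  # module-level singleton


def get_or_build_index():
    """Load the persisted index if present; otherwise build and persist it."""
    global _index
    if _index is not None:
        return _index
    if os.path.isdir(PERSIST_DIR):
        storage = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
        _index = load_index_from_storage(storage)
    else:
        docs = SimpleDirectoryReader(
            input_dir=VAULT_DIR, required_exts=[".md"], recursive=True
        ).load_data()
        _index = VectorStoreIndex.from_documents(docs)
        _index.storage_context.persist(persist_dir=PERSIST_DIR)
    return _index


def rag_chat(messages: list[dict]) -> dict:
    """Answer the latest user message with RAG over the vault."""
    question = next(
        (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
    )
    engine = get_or_build_index().as_query_engine(similarity_top_k=4)
    response = engine.query(question)
    sources = [
        {
            "title": sn.node.metadata.get("file_name", "untitled"),
            "path": sn.node.metadata.get("file_path", ""),
            "snippet": sn.node.get_content()[:200],
        }
        for sn in response.source_nodes
    ]
    return {"answer": str(response), "sources": sources, "notes_written": []}


class ChatMessage(BaseModel):
    role: str
    content: str


class ChatRequest(BaseModel):
    messages: list[ChatMessage]


app = FastAPI()


@app.post("/api/rag/chat")
def chat(req: ChatRequest) -> dict:
    return rag_chat([m.model_dump() for m in req.messages])
```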

Frontend details

Add a new panel or tab labeled Gemini Planning Agent.
Layout: left side may keep the existing docs UI; right side is a chat view.
Chat view: list of messages and a composer textarea with a Send button.
On send: push the user message into local history, POST to /api/rag/chat, then append the assistant answer and its sources when the response arrives (see the request/response sketch after this list).
Under each assistant message, show a collapsible Sources section; clicking a source should either open the note in the existing viewer or show the snippet inline.
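
For reference, a hypothetical round-trip against the endpoint showing the payload the chat panel would send and the fields it renders; written here with Python requests rather than the browser fetch call, and the port (7860, the HF Spaces default) is an assumption:

```python
import requests

resp = requests.post(
    "http://localhost:7860/api/rag/chat",
    json={
        "messages": [
            {"role": "user", "content": "What are my open tasks for the demo?"}
        ]
    },
    timeout=60,
)
data = resp.json()
print(data["answer"])        # assistant message body
for src in data["sources"]:  # rendered in the collapsible Sources section
    print(src["title"], src["path"], src["snippet"][:80])
```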

Index refresh strategy

On every backend startup, attempt to load an existing index; rebuild if missing or invalid.
For hackathon scale, it is acceptable that index updates require a restart or redeploy.

Phase 2 (optional write tools)

Implement safe note-writing helpers (create_note, append_to_note, tag_note) that operate only in a dedicated agent folder inside the vault.
Register these as tools for a LlamaIndex-based agent using Gemini as the reasoning model, as sketched after this list.
Extend /api/rag/chat so that responses can include notes_written metadata when the agent creates or updates notes.
In the UI, show a small badge when a new note is created, with a link into the vault viewer.
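
A sketch of the sandboxed write tools and agent registration, under stated assumptions: the agent-notes folder name is illustrative, and ReActAgent is the pre-workflow llama-index agent API, which newer releases may replace with agent workflows:

```python
from pathlib import Path

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool

VAULT_DIR = Path("vault")
AGENT_DIR = VAULT_DIR / "agent-notes"  # the only folder the agent may write to


def _safe_path(name: str) -> Path:
    """Resolve a note name inside AGENT_DIR, rejecting path escapes."""
    path = (AGENT_DIR / f"{name}.md").resolve()
    if AGENT_DIR.resolve() not in path.parents:
        raise ValueError(f"refusing to write outside {AGENT_DIR}")
    return path


def create_note(name: str, content: str) -> str:
    """Create a new Markdown note in the agent folder and return its path."""
    AGENT_DIR.mkdir(parents=True, exist_ok=True)
    path = _safe_path(name)
    if path.exists():
        return f"error: {path} already exists"
    path.write_text(content, encoding="utf-8")
    return str(path)


def append_to_note(name: str, content: str) -> str:
    """Append text to an existing note in the agent folder."""
    path = _safe_path(name)
    with path.open("a", encoding="utf-8") as f:
        f.write("\n" + content)
    return str(path)


tools = [
    FunctionTool.from_defaults(fn=create_note),
    FunctionTool.from_defaults(fn=append_to_note),
]
# Gemini (Settings.llm from the backend sketch) drives tool selection.
agent = ReActAgent.from_tools(tools, verbose=True)
```

Paths returned by the tools feed the notes_written field on the response, which is what the UI badge links from.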

Implementation order

Wire dependencies and environment variables.
Implement get_or_build_index and verify indexing works.
Implement rag_chat and the /api/rag/chat endpoint.
Build the frontend chat UI and hook it up to the endpoint.
If time allows, add Phase 2 tools and surface created notes in the UI.