Spaces:
Running
Feature Specification: Multi-Tenant Obsidian-Like Docs Viewer
Feature Branch: 001-obsidian-docs-viewer
Created: 2025-11-15
Status: Draft
Input: User description: "Build a multi-tenant Obsidian-like docs viewer with FastMCP server (Python) exposing tools over MCP, HTTP API with Bearer auth for UI and MCP, multi-tenant vaults (per-user Markdown directories), indexing (full-text search, backlinks, tags), and a React + shadcn/ui frontend hosted in a Hugging Face Space with Obsidian-style UI (left: directory pane with vault explorer + search, right: main note pane with live-rendered Markdown and light editing). Primary workflow: AI (via MCP) writes/updates docs, humans read + occasionally tweak in the UI."
User Scenarios & Testing (mandatory)
User Story 1 - AI Agent Writes and Updates Documentation (Priority: P1)
An AI agent (Claude via MCP) needs to create and maintain structured documentation within a user's vault. The agent discovers existing notes, creates new notes with proper frontmatter and wikilinks, updates existing content, and automatically maintains the index for searchability.
Why this priority: This is the primary workflow. Without AI write capability via MCP, the core value proposition doesn't exist. This enables the "AI writes, humans read" paradigm.
Independent Test: Can be fully tested by configuring an MCP client (Claude Code/Desktop) with STDIO transport, issuing write_note commands, and verifying files are created with correct frontmatter, body content, and automatic index updates. Delivers immediate value for AI-driven documentation workflows without requiring the UI.
Acceptance Scenarios:
- Given an MCP client connected via STDIO transport, When the agent calls
write_notewith path "api/design.md", title "API Design", and markdown body, Then the file is created with YAML frontmatter containing title, created/updated timestamps, and the body content - Given a note already exists at "api/design.md" with version 3, When the agent calls
write_notewith updated content, Then the note is updated, version increments to 4, updated timestamp is set to now, and the full-text index is updated - Given a note containing wikilinks
[[Authentication Flow]]and[[Database Schema]], When the note is written, Then the link graph index records outgoing links and updates backlinks for target notes - Given a note with frontmatter tags
[backend, api], When the note is written, Then the tag index is updated to include this note under both tags - Given an MCP client requests
search_noteswith query "authentication", When notes containing "authentication" in title or body exist, Then results are returned ranked by title matches first, then body matches, with recency bonus
User Story 2 - Human Reads Documentation in Web UI (Priority: P1)
A human user logs into the web UI, browses their vault's directory tree, searches for notes, and reads rendered Markdown with working wikilinks and backlinks visible.
Why this priority: The read-first UI is the primary human interaction mode. Without this, humans can't consume the AI-generated documentation effectively. This is equally critical to P1 as the write capability.
Independent Test: Can be fully tested by opening the web UI, authenticating with a static token (local mode), clicking through directory tree items, and verifying rendered Markdown displays correctly with clickable wikilinks. Delivers value for documentation consumption without requiring MCP or editing features.
Acceptance Scenarios:
- Given a user is on the login page in local mode, When they access the UI with a valid static Bearer token in local storage, Then they see the directory pane with all vault notes organized by folder structure
- Given the directory pane shows nested folders, When the user clicks on a note "api/design.md", Then the main pane displays the rendered Markdown with title "API Design", body content with proper formatting, and metadata footer showing tags and timestamps
- Given a rendered note contains a wikilink
[[Authentication Flow]], When the user clicks the link, Then the UI navigates to and renders "authentication-flow.md" (resolved via normalized slug matching) - Given a note "auth.md" is referenced by 3 other notes, When the note is displayed, Then the backlinks section in the footer shows all 3 referring notes as clickable links
- Given the user types "database" into the search bar, When the search debounces and executes, Then matching notes appear in a dropdown with snippets, ranked by title matches (3x weight) then body matches, with recency bonus
User Story 3 - Human Edits Documentation in Web UI (Priority: P2)
A human user needs to make minor corrections or additions to AI-generated documentation. They click "Edit" on a note, see a split view with Markdown editor on the left and live preview on the right, make changes, and save with optimistic concurrency protection.
Why this priority: Enables human refinement of AI content. Important for quality but secondary to read/write workflows. Users can still achieve primary value (AI writes, humans read) without editing.
Independent Test: Can be tested by opening a note, clicking "Edit", modifying content in the textarea, clicking "Save", and verifying the file is updated with version conflict detection. Delivers value for collaborative human-AI documentation without requiring full MCP or advanced features.
Acceptance Scenarios:
- Given a user is viewing a rendered note with version 5, When they click the "Edit" button, Then the main pane switches to split view: left side shows markdown source in a textarea, right side shows live-rendered preview
- Given the user is in edit mode, When they modify the markdown body and click "Save", Then a PUT request is sent with
if_version: 5, the note is updated to version 6, updated timestamp is set to now, and the UI switches back to read mode - Given the user opened a note with version 5 and another user/agent updated it to version 6, When the first user clicks "Save", Then the server returns 409 Conflict and the UI displays "This note changed since you opened it; please reload before saving"
- Given the user edits the note title in frontmatter from "API Design" to "API Architecture", When they save, Then the file is updated with new title, directory tree reflects the change, and the title-based link resolution index is updated
- Given the user adds a new wikilink
[[New Feature]]that doesn't exist, When they save and view the rendered note, Then the wikilink is rendered as a "broken link" style with a "Create note" affordance
User Story 4 - Multi-Tenant Access via Hugging Face OAuth (Priority: P2)
Multiple users can sign in to the Hugging Face Space using "Sign in with HF", each getting isolated vaults, personalized API tokens for MCP access, and per-user indices.
Why this priority: Enables the production deployment model. Critical for multi-user scenarios but not needed for local PoC or single-user hackathon demos.
Independent Test: Can be tested by deploying to HF Space with OAuth enabled, signing in with two different HF accounts, creating notes in each vault, and verifying complete data isolation. Delivers value for hosted multi-tenant scenarios without requiring advanced features.
Acceptance Scenarios:
- Given a user visits the HF Space and is not authenticated, When they land on the app, Then they see a "Sign in with Hugging Face" button
- Given a user clicks "Sign in with Hugging Face", When HF OAuth flow completes successfully, Then the backend maps their HF username to an internal user_id, creates a vault directory at
/data/vaults/<user_id>/, initializes an empty index, and redirects to the main app UI - Given a user is authenticated via HF OAuth, When they call
POST /api/tokens, Then the server issues a JWT withsub=user_idandexp=now+90days, returning{"token": "<jwt>"} - Given an MCP client configures the HTTP transport with
Authorization: Bearer <jwt>, When the client callslist_notes, Then the server validates the JWT, extracts user_id, and returns notes only from that user's vault - Given two users (Alice and Bob) each create notes in their vaults, When Alice searches for notes, Then she sees only her own notes, never Bob's (complete data isolation)
User Story 5 - Full-Text Search with Index Health Monitoring (Priority: P3)
Users and AI agents can search across all notes using full-text queries, with results ranked by relevance (title matches weighted higher, recency bonus). Users can manually trigger index rebuilds if needed and view index health status.
Why this priority: Enhances discoverability but is a supporting feature. The basic search in P1/P2 stories is sufficient for MVP. Index rebuild is primarily a maintenance/troubleshooting tool.
Independent Test: Can be tested by creating notes with specific keywords, calling search_notes MCP tool or /api/search endpoint, and verifying ranking (title 3x weight, body 1x, recency bonus). Can verify rebuild by calling POST /api/index/rebuild and checking updated counts. Delivers value for large vaults without requiring other features.
Acceptance Scenarios:
- Given a vault with 50 notes, 10 containing "authentication" in body, and 2 containing "authentication" in title, When a search query "authentication" is executed, Then the 2 title-match notes are ranked first, followed by body-match notes, with notes updated in last 7 days receiving a +1.0 recency bonus
- Given a note contains tokens in both title and body matching the query, When the search is executed, Then the score is
(3 * title_hits) + (1 * body_hits) + recency_bonusand results are sorted by descending score - Given a user calls
GET /api/index/health, When the index exists, Then the response includesnote_count,last_full_rebuildtimestamp, andlast_incremental_updatetimestamp - Given a user has made many manual file changes outside the app, When they call
POST /api/index/rebuild, Then the server drops existing index rows for their user_id, re-scans all .md files, rebuilds full-text index, tag index, and link graph, and updateslast_full_rebuildtimestamp - Given the index shows
note_count: 100andlast_incremental_updateis 2 minutes ago, When a new note is written via MCP, Thennote_countincrements to 101,last_incremental_updateis set to now, and the new note is immediately searchable
Edge Cases
- Wikilink ambiguity: When
[[Note Name]]matches multiple files (e.g.,docs/setup.mdandguides/setup.md), the system resolves deterministically by preferring same-folder match first, then lexicographically smallest path. Ambiguous links may be flagged in backlinks view. - Concurrent edits: When Claude (via MCP, last-write-wins) and a human (via UI, optimistic concurrency) edit the same note simultaneously, the human's save will fail with 409 Conflict if the version changed, preventing silent data loss from the human perspective.
- Broken wikilinks: When a note contains
[[Non Existent Note]], it is rendered as a visually distinct "broken link" style. The index tracks unresolved links. UI offers "Create note" affordance on click. - Large note uploads: When a note exceeds 1 MiB UTF-8 text, the server returns
413 Payload Too Largewith a clear error message. - Vault limit exceeded: When a user attempts to create a note that would exceed 5,000 notes in their vault, the server returns
403 Forbiddenwith error code "vault_note_limit_exceeded". - Malformed frontmatter: When a note has invalid YAML frontmatter, the system treats it as a note without frontmatter, using the first
# Headingas title or filename stem as fallback. - Path traversal attempts: When a path contains
..or absolute path components, the vault module normalizes and validates against the user's vault root, rejecting any escape attempts with400 Bad Request. - Token expiration: When a JWT expires (after 90 days), API/MCP requests return
401 Unauthorized. User must re-authenticate and issue a new token viaPOST /api/tokens. - Case-insensitive wikilink resolution: When
[[api design]]and[[API Design]]both exist as notes, resolution uses case-insensitive normalized slug matching, with exact case matches preferred, then any case variation. - Empty search query: When search is called with an empty or whitespace-only query, the API returns an empty result set without error.
Requirements (mandatory)
Functional Requirements
Core Vault Operations
- FR-001: System MUST provide isolated vault directories per user under a configurable base path (e.g.,
/data/vaults/<user_id>/) - FR-002: System MUST support arbitrary nested folder structures within each vault, containing Markdown (.md) files
- FR-003: System MUST enforce path normalization and validation to prevent directory traversal attacks (no
..escapes, all paths relative to vault root) - FR-004: System MUST parse Markdown files with optional YAML frontmatter containing metadata fields (title, tags, created, updated, project, etc.)
- FR-005: System MUST use
python-frontmatterlibrary (or equivalent) to load and serialize frontmatter + body - FR-006: System MUST auto-manage
createdtimestamp (set once on creation if not provided) andupdatedtimestamp (always set to now on writes) - FR-007: System MUST reject notes exceeding 1 MiB UTF-8 text with
413 Payload Too Large - FR-008: System MUST reject vault operations that would exceed 5,000 notes per user with
403 Forbiddenand error code "vault_note_limit_exceeded" - FR-009: System MUST limit relative path strings to 256 characters maximum
Indexing and Search
- FR-010: System MUST maintain per-user indices for: (a) full-text search (token β note paths), (b) tag index (tag β note paths), (c) link graph (note β outgoing wikilinks, note β backlinks)
- FR-011: System MUST store indices in SQLite database with per-user isolation
- FR-012: System MUST support full-text search with simple tokenization (split on non-alphanumeric, case-insensitive)
- FR-013: System MUST rank search results using scoring formula:
(3 * title_hits) + (1 * body_hits) + recency_bonus, where recency_bonus is 1.0 for updates in last 7 days, 0.5 for last 30 days, 0 otherwise - FR-014: System MUST extract wikilinks from note bodies using regex pattern
\[\[([^\]]+)\]\] - FR-015: System MUST resolve wikilinks via case-insensitive normalized slug matching: normalize(link_text) matches normalize(filename_stem) or normalize(frontmatter_title)
- FR-016: System MUST handle ambiguous wikilinks deterministically: prefer same-folder match, then lexicographically smallest full path
- FR-017: System MUST track unresolved wikilinks (links with no matching note) in the index for UI display
- FR-018: System MUST update indices incrementally on every write/delete operation (synchronous, blocking)
- FR-019: System MUST provide manual full index rebuild capability that re-scans all vault files and reconstructs indices from scratch
Versioning and Concurrency
- FR-020: System MUST maintain a version counter (integer) per note, stored in the index (not in frontmatter)
- FR-021: System MUST increment version by 1 on every successful write operation
- FR-022: System MUST support optimistic concurrency for HTTP API writes: if
if_versionparameter is provided and does not match current version, return409 Conflict - FR-023: System MUST implement last-write-wins for MCP tool writes (no version checking)
Authentication and Authorization
- FR-024: System MUST support two authentication modes: (a) Local mode with static user_id "local-dev" and optional static Bearer token, (b) HF Space mode with OAuth-based per-user identity
- FR-025: System MUST use JWT tokens for API and MCP HTTP authentication, containing claims:
sub=user_id,exp=now+90days, signed with configurable secret - FR-026: System MUST validate Bearer tokens via
Authorization: Bearer <token>header on all protected endpoints - FR-027: System MUST extract user_id from validated JWT and scope all vault/index operations to that user
- FR-028: System MUST integrate with Hugging Face OAuth in Space mode, using
huggingface_hub.attach_huggingface_oauthandparse_huggingface_oauthhelpers - FR-029: System MUST map HF OAuth identity (username or ID) to internal user_id
- FR-030: System MUST create vault directory and initialize empty index on first login for new HF users
HTTP API
- FR-031: System MUST expose HTTP API using FastAPI (or equivalent) with JSON request/response format
- FR-032: System MUST provide endpoint
GET /api/mereturning user info (user_id, HF profile if applicable, authentication status) - FR-033: System MUST provide endpoint
POST /api/tokensto issue new JWT tokens for authenticated users - FR-034: System MUST provide endpoint
GET /api/notesto list notes with optional folder filtering, returning array of{path, title, last_modified} - FR-035: System MUST provide endpoint
GET /api/notes/{path}(where path is URL-encoded, includes.md) returning full note:{path, title, metadata, body, version, created, updated} - FR-036: System MUST provide endpoint
PUT /api/notes/{path}accepting{title, metadata, body, if_version?}to create/update notes - FR-037: System MUST provide endpoint
DELETE /api/notes/{path}to delete notes - FR-038: System MUST provide endpoint
GET /api/search?q=<query>returning ranked search results with snippets - FR-039: System MUST provide endpoint
GET /api/backlinks/{path}returning array of notes that reference the target note - FR-040: System MUST provide endpoint
GET /api/tagsreturning list of{tag, count}across all user notes - FR-041: System MUST provide endpoint
GET /api/index/healthreturning{note_count, last_full_rebuild, last_incremental_update} - FR-042: System MUST provide endpoint
POST /api/index/rebuildto trigger manual full index rebuild
MCP Server (FastMCP)
- FR-043: System MUST expose MCP server using FastMCP library with two transport modes: (a) STDIO for local development, (b) HTTP for remote/HF Space access
- FR-044: System MUST configure MCP HTTP transport to require
Authorization: Bearer <token>header and validate JWT - FR-045: System MUST provide MCP tool
list_noteswith input{folder?: string}returning[{path, title, last_modified}] - FR-046: System MUST provide MCP tool
read_notewith input{path: string}returning{path, title, metadata, body} - FR-047: System MUST provide MCP tool
write_notewith input{path: string, title?: string, metadata?: object, body: string}returning{status: "ok", path} - FR-048: System MUST provide MCP tool
delete_notewith input{path: string}returning{status: "ok"} - FR-049: System MUST provide MCP tool
search_noteswith input{query: string}returning[{path, title, snippet}] - FR-050: System MUST provide MCP tool
get_backlinkswith input{path: string}returning[{path, title}] - FR-051: System MUST provide MCP tool
get_tagswith input{}returning[{tag, count}] - FR-052: System MUST define all MCP tool inputs/outputs with JSON Schema using FastMCP/Pydantic models
Frontend (React + shadcn/ui)
- FR-053: System MUST provide a single-page React application built with Vite or Next.js, using shadcn/ui components
- FR-054: System MUST implement Obsidian-style layout: left sidebar (directory pane + search), right main pane (note viewer/editor)
- FR-055: System MUST display directory tree in left sidebar with collapsible folders and note leaf items, using shadcn
ScrollArea - FR-056: System MUST provide search input in left sidebar with debounced queries calling
GET /api/search, displaying results in dropdown - FR-057: System MUST render selected note in main pane as read-only Markdown by default using
react-markdownwith plugins for code blocks and links - FR-058: System MUST display note metadata in footer: tags (chips), created/updated timestamps, backlinks (clickable)
- FR-059: System MUST provide "Edit" button to switch main pane to edit mode: left side textarea with markdown source, right side live preview
- FR-060: System MUST provide "Save" button in edit mode that calls
PUT /api/notes/{path}withif_version, handling 409 Conflict by showing "Note changed, please reload" message - FR-061: System MUST render wikilinks as clickable links, resolving to target notes on click
- FR-062: System MUST render unresolved wikilinks as distinct "broken link" style with "Create note" affordance
- FR-063: System MUST provide "New note" button in left sidebar to create new notes with auto-generated frontmatter template
- FR-064: System MUST implement authentication flow for HF Space mode: landing page with "Sign in with Hugging Face" button, redirecting to main app after OAuth callback
- FR-065: System MUST call
GET /api/meon startup to detect authentication status andPOST /api/tokensto obtain Bearer token for API calls - FR-066: System MUST display user profile/settings view showing user_id and API token(s) with "copy" button for MCP configuration
- FR-067: System MUST store Bearer token in memory or localStorage and include in
Authorizationheader for all API requests - FR-068: System MUST display small index health indicator showing note count and last updated timestamp
Key Entities
User: Represents an authenticated user (local-dev or HF OAuth identity). Has a unique
user_id, optional HF profile data (username, avatar), and vault directory path.Vault: A user-specific directory tree containing Markdown notes. Has a root path (
/data/vaults/<user_id>/), arbitrary nested folders, and .md files.Note: A Markdown file with optional YAML frontmatter and body content. Key attributes:
path(relative to vault root, includes .md),title(from frontmatter or first H1 or filename stem),metadata(frontmatter key-value pairs),body(markdown content),version(integer counter),created(ISO timestamp),updated(ISO timestamp).Wikilink: A reference from one note to another using
[[link text]]syntax. Hassource_note_path,link_text,target_note_path(resolved via normalized slug matching, may be null if unresolved).Tag: A metadata label applied to notes via frontmatter
tags: [tag1, tag2]. Hastag_nameand count of associated notes.Index: Per-user data structures for efficient search and navigation. Contains: full-text inverted index (token β note paths), tag index (tag β note paths), link graph (note β outgoing wikilinks, note β backlinks).
Token (JWT): A signed JSON Web Token used for API and MCP authentication. Contains claims:
sub(user_id),exp(expiration timestamp), signed with server secret.
Success Criteria (mandatory)
Measurable Outcomes
- SC-001: AI agents can create, read, update, and delete notes via MCP STDIO transport in under 500ms per operation for vaults with up to 1,000 notes
- SC-002: Human users can browse directory tree, select a note, and view rendered Markdown in under 2 seconds from click to full render
- SC-003: Search queries return ranked results in under 1 second for vaults with up to 5,000 notes
- SC-004: Wikilink navigation (click to target note) completes in under 1 second, with ambiguous links resolved deterministically without user intervention
- SC-005: Concurrent edit conflicts (human vs AI) are detected and prevented 100% of the time via optimistic concurrency for UI writes
- SC-006: Index health accurately reflects vault state:
note_countmatches actual file count within 1 second of any write operation (incremental update) - SC-007: Multi-tenant isolation is complete: users never see or access other users' vaults or notes in API, MCP, or UI
- SC-008: OAuth authentication flow (HF Space mode) completes in under 10 seconds from "Sign in" click to main app view
- SC-009: Users can issue API tokens and configure MCP clients with documented steps, successfully making MCP HTTP requests within 5 minutes of first login
- SC-010: Manual index rebuild completes in under 30 seconds for vaults with up to 1,000 notes
- SC-011: System handles 10 concurrent users (each making read/write/search operations) without response time degradation beyond 2x baseline
- SC-012: Broken wikilinks are visually distinct and offer "Create note" affordance, with 90% of test users successfully creating target notes on first attempt
- SC-013: Note edit-save cycle with version conflict detection prevents silent overwrites in 100% of conflict scenarios
- SC-014: All API endpoints return appropriate HTTP status codes (200, 201, 400, 401, 403, 409, 413, 500) with clear error messages in JSON format
Assumptions
- Deployment target: Primary deployment is Hugging Face Space with OAuth. Local PoC uses static token or no auth.
- Note format: All notes are UTF-8 encoded Markdown with optional YAML frontmatter. No binary files or non-.md formats in vaults.
- Wikilink syntax: Only
[[wikilink]]syntax is supported (Obsidian-style). No[[link|alias]]or other variants in initial version. - Search sophistication: Simple tokenization (split on non-alphanum) is sufficient. No stemming, synonyms, or advanced NLP.
- Concurrency model: SQLite is acceptable for per-user indices given hackathon scale. Future versions may need distributed DB for higher concurrency.
- Frontend deployment: Frontend is served from the same Python process as backend (static files or integrated framework), not a separate service.
- MCP client types: Primary MCP clients are Claude Code, Claude Desktop (STDIO), and potentially other MCP-compatible tools via HTTP transport.
- Storage: Filesystem-based vault storage is sufficient. No object storage or cloud sync in initial version.
- Performance targets: Designed for individual user vaults (hundreds to low thousands of notes), not enterprise-scale knowledge bases.
- Security: HTTPS is assumed for HF Space deployment. JWT secret management follows standard env-var configuration practices.
Scope Boundaries
In Scope
- Multi-tenant vault storage with per-user directories
- Full-text search, tag index, and bidirectional link graph
- Wikilink resolution with ambiguity handling and broken link detection
- HTTP API with Bearer auth (JWT)
- FastMCP server with STDIO (local) and HTTP (remote) transports
- React + shadcn/ui frontend with Obsidian-style layout
- Read-first UI with secondary editing capability
- Optimistic concurrency for UI, last-write-wins for MCP
- HF OAuth integration for multi-user Space deployment
- Manual index rebuild and health monitoring
Out of Scope
- AI-driven re-organization or "smart" refactors of documentation structure
- Complex agentic flows or autonomous planning by AI
- Wikilink aliases (
[[link|display text]]) - Advanced Markdown features (footnotes, math rendering, diagrams) beyond basic code/link support
- Real-time collaborative editing (operational transforms or CRDTs)
- Version history or rollback beyond current version conflict detection
- Export to non-Markdown formats (PDF, HTML, etc.)
- Import from other note systems (Notion, Evernote, etc.)
- Mobile-optimized UI (desktop-first only)
- Offline support or PWA capabilities
- Fine-grained RBAC or multi-user permissions within a single vault (each user has their own isolated vault)
- Auto-save or draft states in UI editor