Research: Multi-Tenant Obsidian-Like Docs Viewer
Branch: 001-obsidian-docs-viewer | Date: 2025-11-15 | Plan: plan.md
Overview
This document captures technical research and decisions for the implementation of a multi-tenant Obsidian-like documentation viewer. Each section addresses a specific research topic from Phase 0 of the implementation plan.
1. FastMCP HTTP Transport Authentication (Bearer Token)
Decision
Use FastMCP's built-in BearerAuth mechanism with JWT token validation for HTTP transport authentication.
Implementation approach:
- Server: Configure FastMCP HTTP transport to accept the `Authorization: Bearer <token>` header
- Client: Pass the JWT as a plain string to the `auth` parameter (FastMCP adds the "Bearer" prefix automatically)
- Token format: JWT with claims `sub=user_id` and `exp=now+90days`, signed with `HS256` and the server secret
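To make the token format concrete, here is a dependency-free sketch that mints such a token by hand (three base64url segments: header, payload, HMAC-SHA256 signature). `SECRET_KEY` is a placeholder; in production you would load a real secret from config and use a JWT library such as PyJWT rather than hand-rolling:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET_KEY = b"change-me"  # placeholder: load the real secret from env/config

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(user_id: str, days: int = 90) -> str:
    """Mint an HS256 JWT with sub and exp claims, valid for `days` days."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(
        {"sub": user_id, "exp": int(time.time()) + days * 86400}).encode())
    signing_input = f"{header}.{payload}".encode()
    signature = b64url(hmac.new(SECRET_KEY, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"
```

A client would then pass the returned string straight to `Client(..., auth=token)`.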
Rationale
- Native FastMCP support: FastMCP provides first-class Bearer token authentication via the `BearerAuth` class and string token shortcuts
- Minimal configuration: Client code is as simple as `Client("https://...", auth="<token>")`
- Standard compliance: Uses the industry-standard `Authorization: Bearer` header pattern
- Transport flexibility: Works seamlessly with both HTTP and SSE (Server-Sent Events) transports
- Non-interactive workflow: Perfect for AI agents and service accounts that need programmatic access
Alternatives Considered
Alternative 1: Custom header authentication
- Rejected: FastMCP supports custom headers but requires manual implementation of auth logic
- Why rejected: More complex, loses benefit of FastMCP's built-in token handling and validation
Alternative 2: OAuth flow for MCP clients
- Rejected: FastMCP supports full OAuth 2.1 flows with browser-based authentication
- Why rejected: Overly complex for AI agent use case; requires interactive browser flow which doesn't suit MCP STDIO or programmatic access patterns
Alternative 3: API key-based authentication
- Rejected: Could use simple API keys instead of JWTs
- Why rejected: JWTs provide expiration, claims, and stateless validation; better security posture for multi-tenant system
Implementation Notes
Server-side setup:
from fastmcp import FastMCP
from fastmcp.server.auth import BearerAuthProvider
import jwt
# For token validation (if using an external issuer)
auth_provider = BearerAuthProvider(
    public_key="<RSA_PUBLIC_KEY>",
    issuer="https://your-issuer.com",
    audience="your-api"
)

# For internal JWT validation (our use case):
# validate manually in middleware/dependency injection
SECRET_KEY = "..."  # server-side signing secret (load from env/config)

def validate_jwt(token: str) -> str:
    payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
    return payload["sub"]  # user_id
Client-side setup:
from fastmcp import Client

# Simplest approach - pass the token as a string
async with Client(
    "https://fastmcp.cloud/mcp",
    auth="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
) as client:
    await client.call_tool("list_notes", {})

# Explicit approach - use the BearerAuth class
from fastmcp.client.auth import BearerAuth

async with Client(
    "https://fastmcp.cloud/mcp",
    auth=BearerAuth(token="eyJhbGci...")
) as client:
    await client.call_tool("list_notes", {})
Key points:
- Do NOT include "Bearer" prefix when passing token - FastMCP adds it automatically
- Token validation happens on every MCP tool call via HTTP transport
- STDIO transport bypasses authentication (local development only)
- For HF Space deployment, combine with HF OAuth to issue user-specific JWTs
References:
- FastMCP Bearer Auth docs: https://gofastmcp.com/clients/auth/bearer
- FastMCP authentication patterns: https://gyliu513.github.io/jekyll/update/2025/08/12/fastmcp-auth-patterns.html
2. Hugging Face Space OAuth Integration
Decision
Use huggingface_hub library's built-in OAuth helpers (attach_huggingface_oauth, parse_huggingface_oauth) for zero-configuration OAuth integration in HF Spaces.
Implementation approach:
- Add `hf_oauth: true` to the Space metadata in README.md
- Call `attach_huggingface_oauth(app)` to auto-register OAuth endpoints (`/oauth/huggingface/login`, `/oauth/huggingface/logout`, `/oauth/huggingface/callback`)
- Call `parse_huggingface_oauth(request)` in route handlers to extract authenticated user info
- Map the HF username/ID to an internal `user_id` for vault scoping
Rationale
- Zero-configuration: HF Spaces automatically injects OAuth environment variables (`OAUTH_CLIENT_ID`, `OAUTH_CLIENT_SECRET`, `OAUTH_SCOPES`) when `hf_oauth: true` is set
- Local dev friendly: `parse_huggingface_oauth` returns a mock user in local mode, enabling seamless development without OAuth setup
- Minimal code: Two function calls provide the complete OAuth flow (login redirect, callback handling, session management)
- First-class support: Official HF library with guaranteed compatibility with Spaces platform
- Standard OAuth 2.0: Under the hood, implements industry-standard OAuth with PKCE
Alternatives Considered
Alternative 1: Manual OAuth implementation
- Rejected: Implement the OAuth flow manually using `authlib` or `requests-oauthlib`
- Why rejected: Significantly more code; requires manual handling of PKCE, state validation, and token exchange; error-prone and loses HF Spaces auto-configuration
Alternative 2: Third-party auth provider (Auth0, WorkOS)
- Rejected: Use external auth service and connect HF as identity provider
- Why rejected: Adds unnecessary complexity and external dependencies for a system designed specifically for HF Spaces deployment
Alternative 3: Session-based auth without OAuth
- Rejected: Use simple username/password with cookie sessions
- Why rejected: Poor UX (users already have HF accounts), requires password management, doesn't leverage HF ecosystem integration
Implementation Notes
Space configuration (README.md frontmatter):
```yaml
title: Obsidian Docs Viewer
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
hf_oauth: true  # <-- Enable OAuth
```
**Backend integration** (FastAPI):
```python
from fastapi import FastAPI, HTTPException, Request
from huggingface_hub import attach_huggingface_oauth, parse_huggingface_oauth
app = FastAPI()
# Auto-register OAuth endpoints
attach_huggingface_oauth(app)
@app.get("/")
def index(request: Request):
oauth_info = parse_huggingface_oauth(request)
if oauth_info is None:
return {"message": "Not logged in", "login_url": "/oauth/huggingface/login"}
# Extract user info
user_id = oauth_info.user_info.preferred_username # or use 'sub' for UUID
display_name = oauth_info.user_info.name
avatar = oauth_info.user_info.picture
return {
"user_id": user_id,
"display_name": display_name,
"avatar": avatar
}
@app.get("/api/me")
def get_current_user(request: Request):
oauth_info = parse_huggingface_oauth(request)
if oauth_info is None:
raise HTTPException(status_code=401, detail="Not authenticated")
# Map to internal user model
user_id = oauth_info.user_info.preferred_username
# Initialize vault on first login if needed
vault_service.ensure_vault_exists(user_id)
return {
"user_id": user_id,
"hf_profile": {
"username": oauth_info.user_info.preferred_username,
"name": oauth_info.user_info.name,
"avatar": oauth_info.user_info.picture
}
}
Frontend integration (React):
// Check auth status on app load
useEffect(() => {
fetch('/api/me')
.then(res => {
if (res.ok) return res.json();
throw new Error('Not authenticated');
})
.then(user => setCurrentUser(user))
.catch(() => window.location.href = '/oauth/huggingface/login');
}, []);
Key points:
- `attach_huggingface_oauth` must be called BEFORE defining routes that need auth
- `parse_huggingface_oauth` returns `None` if not authenticated (check before accessing `user_info`)
- In local development, it returns a mocked user with a deterministic username (e.g., "local-user")
- OAuth tokens/sessions are managed by `huggingface_hub` (stored in cookies)
- For API/MCP access, issue a separate JWT after OAuth login via `POST /api/tokens`
Environment variables (auto-injected in HF Space):
- `OAUTH_CLIENT_ID`: Public client identifier
- `OAUTH_CLIENT_SECRET`: Secret for token exchange
- `OAUTH_SCOPES`: Space-specific scopes (typically `openid profile`)
References:
- HF OAuth docs: https://huggingface.co/docs/hub/spaces-oauth
- huggingface_hub API: https://huggingface.co/docs/huggingface_hub/en/package_reference/oauth
3. SQLite Schema Design for Multi-Index Storage
Decision
Use SQLite with FTS5 (Full-Text Search 5) for full-text indexing, plus separate regular tables for tags and link graph. Implement per-user isolation via user_id column in all tables.
Schema approach:
- Full-text index: FTS5 virtual table with `title` and `body` columns, using the `porter` tokenizer for stemming
- Tag index: Regular table with `user_id`, `tag`, `note_path` (many-to-many relationship)
- Link graph: Regular table with `user_id`, `source_path`, `target_path`, `link_text`, `is_resolved`
- Metadata index: Regular table with `user_id`, `note_path`, `version`, `created`, `updated`, `title` for fast lookups
- Index health: Regular table with `user_id`, `note_count`, `last_full_rebuild`, `last_incremental_update`
Rationale
- FTS5 performance: Native full-text search with inverted index, sub-100ms query times for thousands of documents
- Separate concerns: Full-text (FTS5), tags (simple lookup), and links (graph traversal) have different query patterns; separate tables optimize each
- Per-user isolation: a `user_id` column in all tables enables simple WHERE filtering without complex row-level security
- External content pattern: FTS5 with `content=''` (contentless) avoids storing document text twice (it already lives in the filesystem)
- Version tracking: Metadata table stores a version counter for optimistic concurrency without polluting frontmatter
- Prefix indexes: The FTS5 `prefix='2 3'` option enables fast autocomplete/prefix search
Alternatives Considered
Alternative 1: Single FTS5 table for everything
- Rejected: Store tags and links as UNINDEXED columns in FTS5 table
- Why rejected: FTS5 is optimized for full-text, not structured data; complex queries (e.g., "all notes with tag X") would require scanning all rows; tags/links don't benefit from tokenization
Alternative 2: Separate SQLite database per user
- Rejected: One `.db` file per user instead of a `user_id` column
- Why rejected: Increases file I/O overhead, complicates connection pooling, and makes global admin queries (e.g., total user count) harder
Alternative 3: PostgreSQL with pg_trgm or RUM indexes
- Rejected: Use full Postgres instead of SQLite
- Why rejected: Overkill for single-server deployment, adds deployment complexity, SQLite is sufficient for target scale (5,000 notes/user, 10 concurrent users)
Alternative 4: In-memory index only
- Rejected: Build inverted index in Python dict, no persistence
- Why rejected: Slow startup (rebuild on every restart), no durability, doesn't scale beyond single process
Implementation Notes
Schema definition:
-- Metadata index (fast lookups, version tracking)
CREATE TABLE IF NOT EXISTS note_metadata (
user_id TEXT NOT NULL,
note_path TEXT NOT NULL,
version INTEGER NOT NULL DEFAULT 1,
title TEXT NOT NULL,
created TEXT NOT NULL, -- ISO 8601 timestamp
updated TEXT NOT NULL, -- ISO 8601 timestamp
PRIMARY KEY (user_id, note_path)
);
CREATE INDEX idx_metadata_user ON note_metadata(user_id);
CREATE INDEX idx_metadata_updated ON note_metadata(user_id, updated DESC);
-- Full-text search index (FTS5, contentless)
-- Note: contentless tables store no column values; only rowid can be read back.
-- The FTS row shares its rowid with the corresponding note_metadata row so
-- search results can be joined back to note_path/title.
CREATE VIRTUAL TABLE IF NOT EXISTS note_fts USING fts5(
    title,
    body,
    content='',                  -- Contentless (we don't store the actual text)
    tokenize='porter unicode61', -- Stemming + Unicode support
    prefix='2 3'                 -- Prefix indexes for autocomplete
);
-- Tag index (many-to-many: notes <-> tags)
CREATE TABLE IF NOT EXISTS note_tags (
user_id TEXT NOT NULL,
note_path TEXT NOT NULL,
tag TEXT NOT NULL,
PRIMARY KEY (user_id, note_path, tag)
);
CREATE INDEX idx_tags_user_tag ON note_tags(user_id, tag);
CREATE INDEX idx_tags_user_path ON note_tags(user_id, note_path);
-- Link graph (outgoing links from notes)
CREATE TABLE IF NOT EXISTS note_links (
user_id TEXT NOT NULL,
source_path TEXT NOT NULL,
target_path TEXT, -- NULL if unresolved
link_text TEXT NOT NULL, -- Original [[link text]]
is_resolved INTEGER NOT NULL DEFAULT 0, -- Boolean: 0=broken, 1=resolved
PRIMARY KEY (user_id, source_path, link_text)
);
CREATE INDEX idx_links_user_source ON note_links(user_id, source_path);
CREATE INDEX idx_links_user_target ON note_links(user_id, target_path);
CREATE INDEX idx_links_unresolved ON note_links(user_id, is_resolved);
-- Index health tracking
CREATE TABLE IF NOT EXISTS index_health (
user_id TEXT PRIMARY KEY,
note_count INTEGER NOT NULL DEFAULT 0,
last_full_rebuild TEXT, -- ISO 8601 timestamp
last_incremental_update TEXT -- ISO 8601 timestamp
);
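To sanity-check the contentless pattern end to end, here is a self-contained sketch (simplified two-column FTS table; it assumes Python's bundled SQLite was compiled with FTS5, as standard CPython builds are). The FTS row reuses the `note_metadata` row's rowid, which is how `note_path` and `title` are recovered at query time, since contentless tables expose only the rowid:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE note_metadata (
    user_id TEXT NOT NULL, note_path TEXT NOT NULL,
    version INTEGER NOT NULL DEFAULT 1, title TEXT NOT NULL,
    PRIMARY KEY (user_id, note_path)
);
CREATE VIRTUAL TABLE note_fts USING fts5(
    title, body, content='', tokenize='porter unicode61', prefix='2 3'
);
""")

def index_note(user_id: str, note_path: str, title: str, body: str) -> None:
    # Insert the metadata row first, then reuse its rowid for the FTS row
    # so that search hits can be joined back to note_path/title.
    cur = conn.execute(
        "INSERT INTO note_metadata (user_id, note_path, title) VALUES (?, ?, ?)",
        (user_id, note_path, title))
    conn.execute(
        "INSERT INTO note_fts (rowid, title, body) VALUES (?, ?, ?)",
        (cur.lastrowid, title, body))

def search(user_id: str, query: str) -> list[tuple[str, str]]:
    return conn.execute("""
        SELECT m.note_path, m.title
        FROM note_fts
        JOIN note_metadata m ON m.rowid = note_fts.rowid
        WHERE note_fts MATCH ? AND m.user_id = ?
        ORDER BY bm25(note_fts, 3.0, 1.0)  -- smaller bm25 = better match
        LIMIT 50
    """, (query, user_id)).fetchall()

index_note("alice", "guides/setup.md", "Setup Guide", "Running the server")
index_note("bob", "notes/other.md", "Other", "Running shoes")
```

The `porter` tokenizer makes a query for "run" match "Running", and the join keeps results scoped to a single user.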
Query patterns:
# Full-text search with ranking
# Note: with content='' (contentless), only rowid is readable from note_fts,
# so note_path/title are recovered by joining the rowid back to note_metadata
# (the FTS row must have been inserted with the metadata row's rowid).
cursor.execute("""
    SELECT
        m.note_path,
        m.title,
        bm25(note_fts, 3.0, 1.0) AS score  -- title weight=3, body weight=1
    FROM note_fts
    JOIN note_metadata m ON m.rowid = note_fts.rowid
    WHERE note_fts MATCH ? AND m.user_id = ?
    ORDER BY score  -- bm25() returns smaller values for better matches
    LIMIT 50
""", (query, user_id))
# Get all notes with a specific tag
cursor.execute("""
SELECT DISTINCT note_path, title
FROM note_tags t
JOIN note_metadata m USING (user_id, note_path)
WHERE t.user_id = ? AND t.tag = ?
ORDER BY m.updated DESC
""", (user_id, tag))
# Get backlinks for a note
cursor.execute("""
SELECT DISTINCT l.source_path, m.title
FROM note_links l
JOIN note_metadata m ON l.user_id = m.user_id AND l.source_path = m.note_path
WHERE l.user_id = ? AND l.target_path = ?
ORDER BY m.updated DESC
""", (user_id, target_path))
# Get all unresolved links for UI display
cursor.execute("""
SELECT source_path, link_text
FROM note_links
WHERE user_id = ? AND is_resolved = 0
""", (user_id,))
Incremental update strategy:
- On `write_note`: Remove all existing index rows for `(user_id, note_path)`, then insert new rows. Note that plain contentless FTS5 tables do not support DELETE; enable the `contentless_delete=1` option (SQLite 3.43+) if FTS rows must be removed individually
- Use transactions to ensure atomicity (delete old + insert new = single atomic operation)
- Update `index_health.last_incremental_update` on every write
Full rebuild strategy:
- Delete all index rows for the `user_id`
- Scan all `.md` files in the vault directory
- Parse each file and insert into all indexes
- Update `index_health.note_count` and `last_full_rebuild`
Key points:
- FTS5 with `content=''` is contentless: rows are not synchronized automatically (manual INSERTs), only `rowid` can be read back in queries, and individual DELETEs require the `contentless_delete=1` option (SQLite 3.43+)
- Use the `porter` tokenizer for English stemming (a search for "running" matches "run")
- The `bm25()` function provides relevance ranking (better than a simple MATCH count); smaller values mean better matches
- Prefix indexes (`prefix='2 3'`) enable fast `MATCH 'prefix*'` queries
- In ordinary FTS5 tables, UNINDEXED columns are retrievable but not searchable; in contentless tables they cannot be read back either, which is why this design joins on rowid instead
References:
- SQLite FTS5 docs: https://www.sqlite.org/fts5.html
- FTS5 structure deep dive: https://darksi.de/13.sqlite-fts5-structure/
4. Wikilink Normalization and Resolution
Decision
Implement case-insensitive normalized slug matching with deterministic ambiguity resolution based on Obsidian's behavior.
Normalization algorithm:
- Extract the link text from `[[link text]]`
- Normalize: lowercase; replace runs of spaces/underscores with a dash; drop non-alphanumeric characters except dashes; collapse repeated dashes
- Match the normalized slug against normalized filename stems AND normalized frontmatter titles
- If multiple matches: prefer a same-folder match, then the lexicographically smallest path
Slug normalization function:
import re

def normalize_slug(text: str) -> str:
    """Normalize text to a slug for case-insensitive matching."""
    text = text.lower()
    text = re.sub(r'[\s_]+', '-', text)    # Spaces/underscores -> dash
    text = re.sub(r'[^a-z0-9-]', '', text)  # Keep only alphanumeric + dash
    text = re.sub(r'-+', '-', text)        # Collapse repeated dashes
    return text.strip('-')
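For instance, several spellings all collapse to the slug of `api-design.md` (the function is restated so the snippet is self-contained):

```python
import re

def normalize_slug(text: str) -> str:
    """Lowercase, map whitespace/underscores to dashes, drop other punctuation."""
    text = text.lower()
    text = re.sub(r'[\s_]+', '-', text)
    text = re.sub(r'[^a-z0-9-]', '', text)
    text = re.sub(r'-+', '-', text)
    return text.strip('-')

# All of these match the file "api-design.md":
spellings = ["API Design", "api_design", "Api  Design!", "API--Design"]
```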
Rationale
- Obsidian compatibility: Matches Obsidian's link resolution behavior (case-insensitive, flexible matching)
- User-friendly: Users don't need to remember exact case or spacing (e.g., `[[API Design]]` matches `api-design.md`)
- Deterministic: Same-folder preference plus a lexicographic tiebreaker ensures consistent resolution
- Efficient indexing: Normalized slugs can be pre-computed and indexed for O(1) lookup
- Graceful fallback: Broken links are tracked and displayed distinctly in UI
Alternatives Considered
Alternative 1: Exact case-sensitive matching
- Rejected: Require `[[exact-filename]]` to match `exact-filename.md` exactly
- Why rejected: Brittle user experience; doesn't match Obsidian behavior; forces users to remember exact capitalization
Alternative 2: Fuzzy matching (Levenshtein distance)
- Rejected: Use string similarity to find "close enough" matches
- Why rejected: Non-deterministic, slower, can match wrong notes ("Setup" matches "Startup"), confusing UX
Alternative 3: Path-based links only
- Rejected: Require full paths like `[[guides/setup]]` instead of `[[Setup]]`
- Why rejected: Verbose; doesn't match Obsidian's short-link paradigm; poor UX for large vaults
Alternative 4: UUID-based links
- Rejected: Use unique IDs like `[[#uuid-123]]` for stable references
- Why rejected: Not human-readable; requires additional metadata; doesn't match Obsidian convention
Implementation Notes
Resolution algorithm (priority order):
from pathlib import Path

def resolve_wikilink(user_id: str, link_text: str, current_note_folder: str) -> str | None:
    """Resolve a wikilink to a note path, or None if unresolved."""
    normalized = normalize_slug(link_text)
    # Collect candidate note paths whose filename stem or title matches
    candidates: set[str] = set()
    for note in list_all_notes(user_id):
        # Match against the filename stem
        if normalize_slug(Path(note.path).stem) == normalized:
            candidates.add(note.path)
        # Match against the frontmatter title
        elif note.title and normalize_slug(note.title) == normalized:
            candidates.add(note.path)
    if not candidates:
        return None  # Unresolved link
    paths = sorted(candidates)  # Lexicographic order doubles as the tiebreaker
    if len(paths) == 1:
        return paths[0]  # Unique match
    # Ambiguity resolution:
    # 1. Prefer a same-folder match
    same_folder = [p for p in paths if str(Path(p).parent) == current_note_folder]
    if same_folder:
        return same_folder[0]
    # 2. Otherwise, the lexicographically smallest path
    return paths[0]
Index optimization:
Pre-compute normalized slugs for all notes and store in note_metadata table:
ALTER TABLE note_metadata ADD COLUMN normalized_title_slug TEXT;
ALTER TABLE note_metadata ADD COLUMN normalized_path_slug TEXT;
CREATE INDEX idx_metadata_title_slug ON note_metadata(user_id, normalized_title_slug);
CREATE INDEX idx_metadata_path_slug ON note_metadata(user_id, normalized_path_slug);
Link extraction from Markdown:
import re

def extract_wikilinks(markdown_body: str) -> list[str]:
    """Extract all wikilink texts from a markdown body."""
    pattern = r'\[\[([^\]]+)\]\]'
    return re.findall(pattern, markdown_body)
Update link graph on write:
def update_link_graph(user_id: str, note_path: str, body: str):
    """Update outgoing links for a note (backlinks are derived by querying note_links)."""
    current_folder = str(Path(note_path).parent)
    # Extract wikilinks from the body, deduplicated (the primary key forbids
    # repeating the same link text for one source note) but order-preserving
    link_texts = list(dict.fromkeys(extract_wikilinks(body)))
    # Delete old links from this note
    db.execute("DELETE FROM note_links WHERE user_id=? AND source_path=?",
               (user_id, note_path))
    # Insert new links
    for link_text in link_texts:
        target_path = resolve_wikilink(user_id, link_text, current_folder)
        is_resolved = 1 if target_path else 0
        db.execute("""
            INSERT INTO note_links (user_id, source_path, target_path, link_text, is_resolved)
            VALUES (?, ?, ?, ?, ?)
        """, (user_id, note_path, target_path, link_text, is_resolved))
UI rendering:
// Transform wikilinks to clickable links in rendered Markdown
function transformWikilinks(markdown: string, linkIndex: Record<string, string>): string {
return markdown.replace(/\[\[([^\]]+)\]\]/g, (match, linkText) => {
const targetPath = linkIndex[linkText];
if (targetPath) {
// Resolved link
return `<a href="#/note/${encodeURIComponent(targetPath)}" class="wikilink">${linkText}</a>`;
} else {
// Broken link
return `<a href="#/create/${encodeURIComponent(linkText)}" class="wikilink broken">${linkText}</a>`;
}
});
}
Key points:
- Pre-compute and cache slug mappings for performance (avoid re-scanning on every link resolution)
- Same-folder preference matches Obsidian's behavior (local references are intuitive)
- Lexicographic tiebreaker ensures determinism (same input always resolves to same output)
- Track the `is_resolved` flag to identify broken links for UI warnings/affordances
- Update the entire link graph for a note on every write (incremental update, not a full rebuild)
Edge cases:
- Empty link text `[[]]`: ignore/skip
- Nested brackets `[[foo [[bar]]]]`: the naive regex fails; use a proper parser or restrict the pattern to non-nested links
- Link with pipe `[[link|display]]`: out of scope for MVP; treat the entire string as link text
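If pipe aliases do need handling later, here is a sketch that also covers the other two edge cases (empty links are skipped; nested brackets are sidestepped by disallowing `[` and `]` inside a link):

```python
import re

# Disallowing '[' and ']' inside the link avoids the nested-bracket failure mode
WIKILINK = re.compile(r'\[\[([^\[\]]+?)\]\]')

def extract_wikilinks(markdown_body: str) -> list[str]:
    """Return link targets; '[[target|display]]' yields 'target', '[[]]' is skipped."""
    links = []
    for raw in WIKILINK.findall(markdown_body):
        target = raw.split('|', 1)[0].strip()  # drop the display alias, if any
        if target:
            links.append(target)
    return links
```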
5. React + shadcn/ui Directory Tree Component
Decision
Use shadcn-extension Tree View component with built-in virtualization via @tanstack/react-virtual for directory tree rendering.
Component choice: shadcn-extension Tree View
- Installation: Available at https://shadcn-extension.vercel.app/docs/tree-view
- Features: Virtualization, accordion-based expand/collapse, keyboard navigation, selection, custom icons
- Why this one: Only shadcn tree component with native virtualization support; critical for large vaults (5,000 notes)
Rationale
- Virtualization required: 5,000 notes would create 5,000+ DOM nodes without virtualization; TanStack Virtual renders only visible rows (~20-50 nodes)
- Performance: Virtualization reduces initial render from ~2s to <100ms for large trees
- shadcn ecosystem: Consistent styling with other shadcn/ui components (Button, ScrollArea, etc.)
- Accessibility: Built on Radix UI primitives with keyboard navigation and ARIA support
- Customizable: Supports custom icons per node, expand/collapse callbacks, and selection handling
Alternatives Considered
Alternative 1: MrLightful's shadcn Tree View
- Rejected: Feature-rich component with drag-and-drop, custom icons
- Why rejected: No virtualization support; would cause performance issues with 1,000+ notes
Alternative 2: Neigebaie's shadcn Tree View
- Rejected: Advanced features (multi-select, checkboxes, context menus)
- Why rejected: No virtualization; overkill for simple directory browsing
Alternative 3: react-arborist
- Rejected: Powerful tree view library with virtualization and drag-and-drop
- Why rejected: Not part of shadcn ecosystem; requires custom styling to match UI; heavier dependency
Alternative 4: Custom implementation with react-window
- Rejected: Build a tree view from scratch using `react-window` or `react-virtual`
- Why rejected: Significant development effort; reinventing the wheel; shadcn-extension already provides this
Implementation Notes
Installation:
npx shadcn add https://shadcn-extension.vercel.app/registry/tree-view.json
Component usage:
import { Tree, TreeNode } from "@/components/ui/tree-view";
interface FileTreeNode {
id: string;
name: string;
path: string;
isFolder: boolean;
children?: FileTreeNode[];
}
function DirectoryTree({ vault, onSelectNote }: Props) {
// Transform vault notes into tree structure
const treeData = useMemo(() => buildTree(vault.notes), [vault.notes]);
return (
<Tree
data={treeData}
onSelectChange={(nodeId) => {
const node = findNode(treeData, nodeId);
if (!node.isFolder) {
onSelectNote(node.path);
}
}}
// Virtualization is enabled by default
className="w-full h-full"
/>
);
}
// Transform a flat list of note paths into a hierarchical tree
function buildTree(notes: Note[]): FileTreeNode[] {
  // Intermediate nodes keep children in a Map for O(1) lookup while building
  type MapNode = Omit<FileTreeNode, 'children'> & { childMap?: Map<string, MapNode> };
  const root = new Map<string, MapNode>();
  for (const note of notes) {
    const parts = note.path.split('/');
    let currentLevel = root;
    for (let i = 0; i < parts.length; i++) {
      const part = parts[i];
      const isFile = i === parts.length - 1;
      const id = parts.slice(0, i + 1).join('/');
      if (!currentLevel.has(part)) {
        currentLevel.set(part, {
          id,
          name: isFile ? note.title : part,
          path: id,
          isFolder: !isFile,
          childMap: isFile ? undefined : new Map(),
        });
      }
      if (!isFile) {
        currentLevel = currentLevel.get(part)!.childMap!;
      }
    }
  }
  // Convert the intermediate Maps into the children arrays the Tree expects
  const toArray = (level: Map<string, MapNode>): FileTreeNode[] =>
    Array.from(level.values()).map(({ childMap, ...node }) => ({
      ...node,
      children: childMap ? toArray(childMap) : undefined,
    }));
  return toArray(root);
}
Styling for Obsidian-like appearance:
/* Custom styles for file tree */
.tree-view-node {
@apply py-1 px-2 rounded hover:bg-accent transition-colors;
}
.tree-view-node.selected {
@apply bg-accent text-accent-foreground font-medium;
}
.tree-view-folder {
@apply flex items-center gap-2;
}
.tree-view-file {
@apply flex items-center gap-2 text-sm;
}
/* Icons */
.folder-icon {
@apply text-yellow-500;
}
.file-icon {
@apply text-gray-500;
}
Collapsible behavior:
// Track expanded folders in state
const [expanded, setExpanded] = useState<Set<string>>(new Set(['root']));
<Tree
data={treeData}
expanded={expanded}
onExpandedChange={setExpanded}
// Auto-expand to selected note's folder
onSelectChange={(nodeId) => {
const path = nodeId.split('/');
const folders = path.slice(0, -1);
setExpanded(new Set([...expanded, ...folders]));
}}
/>
Key points:
- Virtualization is automatic with shadcn-extension Tree View (uses TanStack Virtual internally)
- Must transform the flat note list into a nested tree structure (use the `buildTree` utility)
- Track expanded/collapsed state separately from the tree data
- Custom icons per node type (folder vs file) via the `icon` prop
- Use the `ScrollArea` component from shadcn to wrap the tree for custom scrollbars
Performance targets:
- Initial render: <200ms for 5,000 notes
- Expand/collapse: <50ms per folder
- Search filter: <100ms to re-render filtered tree
Accessibility:
- Keyboard navigation: Arrow keys to navigate, Enter to select, Space to expand/collapse
- Screen reader support: ARIA labels for folders/files, expand/collapse state
- Focus management: Visible focus indicators, focus restoration after selection
6. Optimistic Concurrency Implementation
Decision
Use version counter (integer) stored in SQLite index with if_version parameter for UI writes. Implement ETag-like validation via If-Match header in HTTP API.
Approach:
- Version counter: Integer field in the `note_metadata` table, incremented on every write
- UI writes: Include `if_version: N` in the `PUT /api/notes/{path}` body
- Server validation: Compare `if_version` with the current version; return `409 Conflict` on mismatch
- MCP writes: No version checking (last-write-wins)
- ETag header: Return `ETag: "<version>"` in the `GET /api/notes/{path}` response for HTTP compliance
Rationale
- Simple implementation: Integer counter is trivial to increment and compare
- Explicit versioning: Version in request body makes intent clear ("I'm updating version 5")
- Database-backed: Version persists in index, not frontmatter (keeps note content clean)
- HTTP-friendly: Can expose as ETag header for standards compliance
- Performance: Integer comparison is O(1), no hash computation needed
Alternatives Considered
Alternative 1: ETag with content hash
- Rejected: Compute MD5/SHA hash of note content, return as ETag header
- Why rejected: Hash computation on every read adds latency; version counter is sufficient and faster
Alternative 2: Last-Modified timestamps
- Rejected: Use the `updated` timestamp with an `If-Unmodified-Since` header
- Why rejected: Timestamp precision issues (SQLite stores ISO strings, not microsecond precision); race conditions if multiple updates land within the same second
Alternative 3: Version in frontmatter
- Rejected: Store `version: 5` in the YAML frontmatter
- Why rejected: Pollutes user-facing metadata; incrementing the version requires parsing/re-serializing frontmatter; harder to manage
Alternative 4: MVCC (Multi-Version Concurrency Control)
- Rejected: Store multiple versions of each note, allow rollback
- Why rejected: Complex implementation; storage overhead; out of scope for MVP (no version history requirement)
Implementation Notes
Schema note:
-- The version column is already part of note_metadata in Section 3's schema;
-- for pre-existing deployments that lack it:
ALTER TABLE note_metadata ADD COLUMN version INTEGER NOT NULL DEFAULT 1;
API endpoint implementation:
from typing import Optional

from fastapi import Depends, Header, HTTPException
from fastapi.responses import JSONResponse
@app.put("/api/notes/{path}")
async def update_note(
path: str,
body: NoteUpdateRequest,
user_id: str = Depends(get_current_user),
if_match: Optional[str] = Header(None) # ETag header support
):
# Get current version
current = get_note_metadata(user_id, path)
# Check if_version in body OR If-Match header
expected_version = body.if_version or (int(if_match.strip('"')) if if_match else None)
if expected_version is not None and current.version != expected_version:
raise HTTPException(
status_code=409,
detail={
"error": "version_conflict",
"message": "Note was updated by another process",
"current_version": current.version,
"provided_version": expected_version
}
)
# Update note and increment version
new_version = current.version + 1
save_note(user_id, path, body.content)
update_metadata(user_id, path, version=new_version, updated=now())
return {
"status": "ok",
"version": new_version
}
@app.get("/api/notes/{path}")
async def get_note(
path: str,
user_id: str = Depends(get_current_user)
):
note = load_note(user_id, path)
return JSONResponse(
content={
"path": note.path,
"title": note.title,
"metadata": note.metadata,
"body": note.body,
"version": note.version,
"created": note.created,
"updated": note.updated
},
headers={
"ETag": f'"{note.version}"', # Expose version as ETag
"Cache-Control": "no-cache" # Prevent stale reads
}
)
Frontend implementation (React):
interface Note {
path: string;
title: string;
body: string;
version: number;
// ...
}
async function saveNote(note: Note, newBody: string) {
try {
const response = await fetch(`/api/notes/${encodeURIComponent(note.path)}`, {
method: 'PUT',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${token}`,
// Option 1: Version in body
},
body: JSON.stringify({
body: newBody,
if_version: note.version // Optimistic concurrency check
})
});
if (response.status === 409) {
const error = await response.json();
alert(`Conflict: Note was updated (current version: ${error.current_version}). Please reload and try again.`);
return;
}
const updated = await response.json();
// Update local state with new version
setNote({ ...note, body: newBody, version: updated.version });
} catch (error) {
console.error('Save failed:', error);
}
}
MCP tool implementation (last-write-wins):
@mcp.tool()
async def write_note(path: str, body: str, title: str | None = None) -> dict:
    """Write a note via MCP (no version checking; last-write-wins)."""
    user_id = get_user_from_context()
    # Load the existing note to get the current version (if it exists)
    try:
        current = get_note_metadata(user_id, path)
        new_version = current.version + 1
    except NotFoundError:
        new_version = 1  # New note
    # Write without a version check (last-write-wins)
    save_note(user_id, path, body, title)
    update_metadata(user_id, path, version=new_version, updated=now())
    return {"status": "ok", "path": path, "version": new_version}
Conflict resolution UI:
function ConflictDialog({ currentVersion, serverVersion }: Props) {
return (
<Alert variant="destructive">
<AlertTitle>Version Conflict</AlertTitle>
<AlertDescription>
This note was updated while you were editing (version {currentVersion} → {serverVersion}).
<div className="mt-4 space-x-2">
<Button onClick={reload}>Reload and Discard Changes</Button>
<Button variant="outline" onClick={saveAsCopy}>Save as Copy</Button>
</div>
</AlertDescription>
</Alert>
);
}
Key points:
- Version counter starts at 1 for new notes, increments on every write
- The HTTP API returns `409 Conflict` with a detailed error message (current vs provided version)
- The ETag header is optional but recommended for HTTP standards compliance
- MCP writes skip version check (AI agents don't need optimistic concurrency)
- Frontend shows clear error message with options: reload, save as copy, or manual merge
Performance considerations:
- Version check is single integer comparison (O(1))
- No need to read entire note content for validation
- Version update is atomic (SQLite transaction)
References:
- Optimistic concurrency patterns: https://event-driven.io/en/how_to_use_etag_header_for_optimistic_concurrency/
- HTTP conditional requests: https://developer.mozilla.org/en-US/docs/Web/HTTP/Conditional_requests
7. Markdown Frontmatter Parsing with Fallback
Decision
Use the `python-frontmatter` library for YAML parsing with a try-except wrapper to handle malformed frontmatter gracefully. Fallback strategy: malformed YAML → treat as no frontmatter; extract the title from the first `# Heading` or the filename stem.
Parsing approach:
import logging
import re

import frontmatter  # python-frontmatter
import yaml
from pathlib import Path

logger = logging.getLogger(__name__)
def parse_note(file_path: str) -> dict:
"""Parse note with frontmatter fallback."""
try:
# Attempt to parse frontmatter
post = frontmatter.load(file_path)
metadata = dict(post.metadata)
body = post.content
except (yaml.scanner.ScannerError, yaml.parser.ParserError) as e:
# Malformed YAML - treat entire file as body
with open(file_path, 'r', encoding='utf-8') as f:
full_text = f.read()
metadata = {}
body = full_text
# Log warning for debugging
logger.warning(f"Malformed frontmatter in {file_path}: {e}")
# Extract title (priority: frontmatter > first H1 > filename)
title = (
metadata.get('title') or
extract_first_heading(body) or
Path(file_path).stem
)
return {
'title': title,
'metadata': metadata,
'body': body
}
def extract_first_heading(markdown: str) -> str | None:
"""Extract first # Heading from markdown body."""
match = re.match(r'^#\s+(.+)$', markdown, re.MULTILINE)
return match.group(1).strip() if match else None
Rationale
- Graceful degradation: Malformed YAML doesn't break the system; note is still readable
- User-friendly: Non-technical users may create invalid YAML; system should be forgiving
- Simple implementation: Try-except wrapper is minimal code; `python-frontmatter` handles the valid cases
- Fallback chain: Title extraction has a clear priority order (explicit > inferred > default)
- Debugging support: Log warnings for malformed YAML so admins can fix source files
Alternatives Considered
Alternative 1: Strict parsing (fail on malformed YAML)
- Rejected: Raise error and reject note with invalid frontmatter
- Why rejected: Poor UX; users may accidentally create invalid YAML (e.g., unquoted colons); breaks read-first workflow
Alternative 2: TOML or JSON frontmatter
- Rejected: Use `+++` (TOML) or `{ }` (JSON) delimiters instead of YAML
- Why rejected: Obsidian uses YAML exclusively; compatibility is critical
Alternative 3: Lenient YAML parser
- Rejected: Use `ruamel.yaml` with error recovery instead of PyYAML
- Why rejected: Adds complexity; `python-frontmatter` uses PyYAML internally; the fallback strategy is simpler
Alternative 4: Partial frontmatter extraction
- Rejected: Parse valid keys, ignore malformed keys
- Why rejected: Difficult to implement; unclear semantics (which keys are valid?); safer to treat all as invalid
Implementation Notes
Error types to catch:
```python
import yaml

try:
    post = frontmatter.load(file_path)
except (
    yaml.scanner.ScannerError,  # Invalid YAML syntax (e.g., unmatched quotes)
    yaml.parser.ParserError,    # Invalid YAML structure
    UnicodeDecodeError          # Non-UTF8 file encoding
) as e:
    # Fallback to no frontmatter
    pass
```
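As a concrete illustration of the fallback, an unquoted colon is enough to make PyYAML fail, and the parser then degrades to empty metadata. A minimal sketch, using `yaml.safe_load` to stand in for `python-frontmatter`'s YAML handler:

```python
import yaml

def frontmatter_or_empty(raw: str) -> dict:
    # Any YAML error (ScannerError, ParserError, ...) collapses to "no frontmatter"
    try:
        data = yaml.safe_load(raw)
        return data if isinstance(data, dict) else {}
    except yaml.YAMLError:
        return {}

frontmatter_or_empty("title: API Design: Overview")  # unquoted colon -> {}
frontmatter_or_empty("title: Setup Guide")           # valid -> {'title': 'Setup Guide'}
```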
Common malformed YAML examples:

```yaml
title: API Design: Overview  # Unquoted colon - INVALID
tags: [backend, api]
```

```yaml
title: "Setup Guide  # Unclosed quote - INVALID
description: Installation steps
```

```yaml
  title: Indented incorrectly  # Bad indentation - INVALID
tags:
  - frontend
```
**Auto-fix on write** (optional enhancement):
```python
def save_note(user_id: str, path: str, title: str, metadata: dict, body: str):
    """Save note with valid frontmatter (auto-fix on write)."""
    # Merge title into metadata
    metadata['title'] = title
    # Create Post object with validated metadata
    post = frontmatter.Post(body, **metadata)
    # Serialize with valid YAML
    file_content = frontmatter.dumps(post)
    # Write to file
    full_path = get_vault_path(user_id) / path
    full_path.write_text(file_content, encoding='utf-8')
```
Title extraction regex:
```python
def extract_first_heading(markdown: str) -> str | None:
    """Extract first # Heading (must be H1, not H2/H3)."""
    # Match # Heading (H1 only, not ## or ###); optional trailing {#attr} block
    pattern = r'^#\s+(.+?)(?:\s+\{[^}]+\})?\s*$'
    match = re.search(pattern, markdown, re.MULTILINE)
    if match:
        heading = match.group(1).strip()
        # Remove Markdown formatting (e.g., **bold**, *italic*)
        heading = re.sub(r'[*_`]', '', heading)
        return heading
    return None
```
Fallback priority:
1. `metadata.get('title')` - Explicit frontmatter title
2. `extract_first_heading(body)` - First `# Heading` in body
3. `Path(file_path).stem` - Filename without `.md` extension
Validation warnings:
```python
# Add validation warnings to API response
if malformed_frontmatter:
    warnings.append({
        "type": "malformed_frontmatter",
        "message": "YAML frontmatter is invalid and was ignored",
        "line": error.problem_mark.line if hasattr(error, 'problem_mark') else None
    })
```
UI display for warnings:
```tsx
function NoteViewer({ note, warnings }: Props) {
  return (
    <div>
      {warnings.map(w => (
        <Alert key={w.type} variant="warning">
          <AlertTitle>Warning</AlertTitle>
          <AlertDescription>{w.message}</AlertDescription>
        </Alert>
      ))}
      <Markdown>{note.body}</Markdown>
    </div>
  );
}
```
Key points:
- Always catch `yaml.scanner.ScannerError` and `yaml.parser.ParserError` from PyYAML
- Log warnings with the file path and error details for admin debugging
- Prefer graceful fallback over strict validation (read-first workflow)
- Auto-fix on write ensures newly saved notes have valid frontmatter
- Extract the title from the first `# Heading`, not `## Subheading` (H1 only)
References:
- python-frontmatter docs: https://python-frontmatter.readthedocs.io/
- PyYAML error handling: https://pyyaml.org/wiki/PyYAMLDocumentation
8. JWT Token Management in React
Decision
Use a hybrid approach: store the short-lived access token (JWT) in memory (React state/context) and the long-lived refresh token in an HttpOnly cookie (server-managed). For the MVP without refresh tokens, store the JWT in memory only with a 90-day expiration.
MVP approach (no refresh tokens):
- Store JWT in React Context (memory)
- Token expires after 90 days (long-lived)
- On app load, check if a token exists in memory → if not, redirect to login
- No localStorage (XSS vulnerability mitigation)
- No refresh flow (acceptable for MVP scale)
Production approach (with refresh tokens):
- Access token: 15-minute expiration, stored in memory
- Refresh token: 90-day expiration, stored in HttpOnly cookie
- Automatic refresh before access token expires
- Refresh endpoint: `POST /api/auth/refresh` (validates cookie, issues new access token)
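The refresh cookie's security attributes are set when the server issues it at login, not when it is read. A stdlib-only sketch of the resulting `Set-Cookie` value (the cookie name and path are assumptions; FastAPI's `response.set_cookie` accepts the same attributes):

```python
from http.cookies import SimpleCookie

def refresh_cookie_header(token: str, max_age: int = 90 * 24 * 3600) -> str:
    # Build the Set-Cookie attribute string for the refresh token
    cookie = SimpleCookie()
    cookie["refresh_token"] = token
    cookie["refresh_token"]["httponly"] = True      # unreadable from JS (XSS)
    cookie["refresh_token"]["samesite"] = "Strict"  # not sent cross-site (CSRF)
    cookie["refresh_token"]["secure"] = True        # HTTPS only
    cookie["refresh_token"]["max-age"] = max_age    # 90-day lifetime
    cookie["refresh_token"]["path"] = "/api/auth"   # scope to auth endpoints only
    return cookie["refresh_token"].OutputString()
```

Scoping the cookie path to the auth endpoints keeps the refresh token out of every other request.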
Rationale
- XSS protection: Memory storage prevents JavaScript-based token theft (localStorage is vulnerable to XSS)
- CSRF protection: The SameSite attribute keeps the refresh cookie from being sent on cross-site requests, while HttpOnly keeps it unreadable by JS
- Industry best practice (2025): Hybrid approach is current security standard for React SPAs
- Acceptable UX: User logs in once per 90 days (or once per session if memory-only)
- No additional dependencies: Built-in React Context API handles memory storage
Alternatives Considered
Alternative 1: localStorage for JWT
- Rejected: Store JWT via `localStorage.setItem('token', jwt)`
- Why rejected: Vulnerable to XSS attacks (malicious scripts can read localStorage); XSS remains in the OWASP Top 10; unacceptable security risk for a multi-tenant system
Alternative 2: sessionStorage for JWT
- Rejected: Store JWT in `sessionStorage` (cleared on tab close)
- Why rejected: Poor UX (re-login on every new tab); still vulnerable to XSS
Alternative 3: Cookies for both access and refresh tokens
- Rejected: Store JWT in regular cookies (not HttpOnly)
- Why rejected: Accessible to JS, so vulnerable to XSS; and since cookies are sent automatically, CSRF protection (SameSite) is still required
Alternative 4: No token storage (re-authenticate on every request)
- Rejected: Use HF OAuth on every API call
- Why rejected: Unacceptable latency; OAuth flow is slow (~2-3s per request)
Implementation Notes
MVP implementation (memory-only, 90-day JWT):
```tsx
// Auth context (memory storage)
import { createContext, useContext, useState, useEffect } from 'react';

interface AuthContextType {
  token: string | null;
  setToken: (token: string) => void;
  logout: () => void;
}

const AuthContext = createContext<AuthContextType | null>(null);

export function AuthProvider({ children }: { children: React.ReactNode }) {
  const [token, setTokenState] = useState<string | null>(null);

  const setToken = (newToken: string) => {
    setTokenState(newToken);
  };

  const logout = () => {
    setTokenState(null);
    window.location.href = '/oauth/huggingface/logout';
  };

  return (
    <AuthContext.Provider value={{ token, setToken, logout }}>
      {children}
    </AuthContext.Provider>
  );
}

export function useAuth() {
  const context = useContext(AuthContext);
  if (!context) throw new Error('useAuth must be used within AuthProvider');
  return context;
}
```
```tsx
// App initialization (fetch token after OAuth)
function App() {
  const { token, setToken } = useAuth();
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    // Check if authenticated via HF OAuth
    fetch('/api/me')
      .then(res => {
        if (!res.ok) throw new Error('Not authenticated');
        return res.json();
      })
      .then(user => {
        // Issue JWT token for API access
        return fetch('/api/tokens', { method: 'POST' });
      })
      .then(res => res.json())
      .then(data => {
        setToken(data.token);
        setLoading(false);
      })
      .catch(() => {
        // Redirect to OAuth login
        window.location.href = '/oauth/huggingface/login';
      });
  }, []);

  if (loading) return <div>Loading...</div>;
  return <MainApp />;
}
```
```ts
// API client (hooks can't be called in plain functions, so the token is passed in)
async function apiRequest(endpoint: string, token: string, options: RequestInit = {}) {
  const response = await fetch(`/api${endpoint}`, {
    ...options,
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${token}`,
      ...options.headers
    }
  });
  if (response.status === 401) {
    // Token expired or invalid - caller should trigger logout
    throw new Error('Unauthorized');
  }
  return response;
}
```
Production implementation (with refresh tokens):
```ts
// Token refresh logic
let refreshPromise: Promise<string> | null = null;

async function refreshAccessToken(): Promise<string> {
  // Prevent multiple concurrent refresh calls
  if (refreshPromise) return refreshPromise;

  refreshPromise = fetch('/api/auth/refresh', {
    method: 'POST',
    credentials: 'include' // Send HttpOnly cookie
  })
    .then(res => {
      if (!res.ok) throw new Error('Refresh failed');
      return res.json();
    })
    .then(data => {
      setToken(data.access_token);
      refreshPromise = null;
      return data.access_token;
    })
    .catch(err => {
      refreshPromise = null;
      logout();
      throw err;
    });

  return refreshPromise;
}
```
```ts
// Automatic refresh before token expires
useEffect(() => {
  if (!token) return;

  // Parse token to get expiration (JWT payloads are base64url, not plain base64)
  const b64 = token.split('.')[1].replace(/-/g, '+').replace(/_/g, '/');
  const payload = JSON.parse(atob(b64));
  const expiresAt = payload.exp * 1000;
  const refreshAt = expiresAt - 5 * 60 * 1000; // 5 minutes before expiry

  // Clamp to 0 so an already-expired token refreshes immediately
  const timeoutId = setTimeout(refreshAccessToken, Math.max(0, refreshAt - Date.now()));
  return () => clearTimeout(timeoutId);
}, [token]);
```
Backend refresh endpoint:
```python
import jwt  # PyJWT
from fastapi import Cookie, HTTPException

@app.post("/api/auth/refresh")
async def refresh_token(
    # HttpOnly/SameSite are attributes of Set-Cookie; when reading,
    # FastAPI's Cookie() just extracts the value
    refresh_token: str | None = Cookie(None)
):
    if not refresh_token:
        raise HTTPException(status_code=401, detail="No refresh token")

    # Validate refresh token
    try:
        payload = jwt.decode(refresh_token, SECRET_KEY, algorithms=["HS256"])
        user_id = payload["sub"]
    except jwt.ExpiredSignatureError:
        raise HTTPException(status_code=401, detail="Refresh token expired")
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid refresh token")

    # Issue new access token (15-minute expiry)
    access_token = create_jwt(user_id, expiration_minutes=15)
    return {"access_token": access_token, "token_type": "bearer"}
```
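The `create_jwt` helper used by the endpoint isn't defined in this section. A stdlib-only sketch of the token it needs to mint (in practice PyJWT's `jwt.encode(payload, SECRET_KEY, algorithm="HS256")` does the same; the secret shown is a placeholder):

```python
import base64, hashlib, hmac, json, time

SECRET_KEY = b"change-me"  # assumption: loaded from env/config in real code

def _b64url(data: bytes) -> str:
    # base64url without padding, per RFC 7515
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def create_jwt(user_id: str, expiration_minutes: int = 15) -> str:
    # HS256-signed JWT carrying the sub/exp claims described above
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({
        "sub": user_id,
        "exp": int(time.time()) + expiration_minutes * 60,
    }).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(SECRET_KEY, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"
```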
Key points:
- Memory storage = token lost on page refresh (re-login required) → acceptable for MVP
- HttpOnly cookies cannot be accessed by JavaScript (XSS protection)
- Set `SameSite=strict` on the refresh token cookie (CSRF protection)
- Refresh token rotation: issue a new refresh token on each refresh (advanced security)
- Use `credentials: 'include'` in fetch to send HttpOnly cookies
- Parse the JWT client-side to schedule refresh (or use a server-sent expiry hint)
Security checklist:
- ✅ Access token in memory (XSS-resistant)
- ✅ Refresh token in HttpOnly cookie (XSS-resistant)
- ✅ SameSite=strict on cookies (CSRF-resistant)
- ✅ HTTPS required (prevent MITM)
- ✅ Short access token expiry (limit blast radius)
- ✅ Token refresh before expiry (seamless UX)
- ✅ Logout clears both tokens
MVP vs Production tradeoff:
- MVP: 90-day JWT in memory → simpler, acceptable for hackathon/PoC
- Production: 15-min access + 90-day refresh → better security, more complex
References:
- JWT storage best practices: https://www.descope.com/blog/post/developer-guide-jwt-storage
- HttpOnly cookies vs localStorage: https://www.wisp.blog/blog/understanding-token-storage-local-storage-vs-httponly-cookies
- React authentication patterns: https://marmelab.com/blog/2020/07/02/manage-your-jwt-react-admin-authentication-in-memory.html
Summary of Key Decisions
| Topic | Decision | Primary Rationale |
|---|---|---|
| FastMCP Auth | Bearer token with JWT validation | Native FastMCP support, minimal config, standard-compliant |
| HF OAuth | `attach_huggingface_oauth` + `parse_huggingface_oauth` | Zero-config, local dev friendly, official HF support |
| SQLite Schema | FTS5 for full-text + separate tables for tags/links | Performance, per-user isolation, optimized query patterns |
| Wikilink Resolution | Case-insensitive slug matching + same-folder preference | Obsidian compatibility, user-friendly, deterministic |
| Directory Tree | shadcn-extension Tree View with virtualization | Only shadcn option with virtualization for 5K+ notes |
| Optimistic Concurrency | Version counter in SQLite + `if_version` param | Simple, fast, HTTP-friendly, no content hashing overhead |
| Frontmatter Parsing | `python-frontmatter` + fallback to no frontmatter | Graceful degradation, user-friendly error handling |
| JWT Management | Memory storage (MVP) or memory + HttpOnly cookie (prod) | XSS protection, industry best practice (2025) |
Next Steps
With research complete, proceed to Phase 1: Data Model & Contracts:
- Create `data-model.md` with detailed Pydantic models and SQLite schemas
- Create `contracts/http-api.yaml` with OpenAPI 3.1 specification
- Create `contracts/mcp-tools.json` with MCP tool schemas (JSON Schema format)
- Create `quickstart.md` with setup instructions and testing workflows

After Phase 1, run `/speckit.tasks` to generate dependency-ordered implementation tasks.