Spaces:
Running
Data Model: Multi-Tenant Obsidian-Like Docs Viewer
Feature Branch: 001-obsidian-docs-viewer
Created: 2025-11-15
Status: Draft
Table of Contents
- Overview
- Entity Relationship Diagram
- Core Entities
- Index Entities
- Authentication Entities
- SQLite Schema
- Pydantic Models
- TypeScript Type Definitions
- Validation Rules
- State Transitions
- Relationships and Constraints
Overview
This document defines the complete data model for a multi-tenant Obsidian-like documentation viewer. The system stores:
- User accounts with HF OAuth identity mapping
- Vaults as per-user directory trees containing Markdown notes
- Notes with YAML frontmatter, version tracking, and full-text indexing
- Wikilinks for bidirectional linking between notes
- Tags for categorization and filtering
- Index metadata for search optimization and health monitoring
Design principles:
- Per-user isolation: All data scoped by
user_id - Filesystem-backed: Notes stored as
.mdfiles, metadata in SQLite - Version-controlled: Integer version counter for optimistic concurrency
- Search-optimized: SQLite FTS5 for full-text search, separate indexes for tags and links
Entity Relationship Diagram
erDiagram
USER ||--o{ VAULT : owns
VAULT ||--o{ NOTE : contains
NOTE ||--o{ WIKILINK : has_outgoing
NOTE ||--o{ WIKILINK : has_incoming
NOTE ||--o{ TAG : tagged_with
USER ||--|| INDEX_HEALTH : tracks
USER ||--o{ TOKEN : issues
USER {
string user_id PK
string hf_username
string hf_name
string hf_avatar_url
datetime created
}
VAULT {
string user_id FK
string root_path
int note_count
}
NOTE {
string user_id FK
string note_path PK
int version
string title
json metadata
string body
datetime created
datetime updated
int size_bytes
}
WIKILINK {
string user_id FK
string source_path FK
string target_path FK
string link_text
bool is_resolved
}
TAG {
string user_id FK
string tag_name
string note_path FK
}
INDEX_HEALTH {
string user_id PK
int note_count
datetime last_full_rebuild
datetime last_incremental_update
}
TOKEN {
string token_id PK
string user_id FK
datetime issued_at
datetime expires_at
string token_type
}
Key relationships:
- One user owns one vault (1:1)
- One vault contains many notes (1:N)
- One note has many outgoing wikilinks (1:N)
- One note may be referenced by many backlinks (1:N)
- One note can have many tags (N:M via junction table)
- One user has one index health record (1:1)
- One user can issue many tokens (1:N)
Core Entities
User
Represents an authenticated user with HF OAuth identity.
Attributes:
user_id(string, PK): Internal unique identifier, derived from HF username or UUIDhf_username(string, nullable): HuggingFace username (e.g., "alice")hf_name(string, nullable): Display name from HF profilehf_avatar_url(string, nullable): Profile picture URLcreated(datetime): Account creation timestamp (ISO 8601)
Notes:
- In local mode,
user_id = "local-dev"with null HF fields - In HF Space mode,
user_id = hf_username(normalized to lowercase) createdtimestamp set on first OAuth login (vault initialization)
Lifecycle:
- User authenticates via HF OAuth
- Backend maps HF identity to
user_id - If new user, create vault directory and initialize index
- Return user info to frontend
Vault
A per-user directory tree containing Markdown notes.
Attributes:
user_id(string, FK): Owner of the vaultroot_path(string): Absolute filesystem path to vault root (e.g.,/data/vaults/alice/)note_count(int): Cached count of notes in vault (denormalized from index)
Constraints:
- Max 5,000 notes per vault (enforced by FR-008)
- Root path must exist and be writable
- Directory structure is arbitrary (user-defined nested folders)
Filesystem layout example:
/data/vaults/alice/
βββ README.md
βββ api/
β βββ design.md
β βββ endpoints.md
βββ guides/
β βββ setup.md
β βββ deployment.md
βββ notes/
βββ meeting-2025-01-15.md
Note
A Markdown file with optional YAML frontmatter and body content.
Attributes:
user_id(string, FK): Owner of the notenote_path(string, PK): Relative path to vault root, includes.md(e.g.,api/design.md)version(int): Optimistic concurrency version counter (starts at 1, increments on write)title(string): Display title (from frontmatter, first H1, or filename stem)metadata(JSON): Frontmatter key-value pairs (excludes auto-managed fields)body(string): Markdown content (excluding frontmatter)created(datetime, ISO 8601): Creation timestamp (auto-set if not in frontmatter)updated(datetime, ISO 8601): Last modification timestamp (auto-set on every write)size_bytes(int): UTF-8 byte size of full file content (frontmatter + body)
Constraints:
note_pathmax 256 characters, Unix-style separators (/), no..allowedsize_bytesmax 1 MiB (1,048,576 bytes) per FR-007versionstored in index, NOT in frontmattercreatedandupdatedstored in index, MAY appear in frontmatter (frontmatter is source of truth on read)
Title resolution priority (FR-006):
metadata.get('title')from frontmatter- First
# Headingin body (H1 only) - Filename stem (e.g.,
design.mdβ "design")
Metadata fields (common, but arbitrary):
tags(array of strings): Tag names for categorizationproject(string): Project identifiercreated(datetime): User-provided creation timestampupdated(datetime): User-provided update timestamp- Custom fields allowed (JSON object)
Index Entities
Wikilink
Represents a bidirectional link between two notes.
Attributes:
user_id(string, FK): Owner of the notessource_path(string, FK): Path of note containing the wikilinktarget_path(string, nullable, FK): Resolved path of linked note (null if unresolved)link_text(string): Original text from[[link text]]is_resolved(bool): True iftarget_pathis non-null, false if broken link
Extraction:
- Regex pattern:
\[\[([^\]]+)\]\] - Extract all matches from note body on every write
Resolution algorithm (FR-015, FR-016):
- Normalize
link_textto slug: lowercase, replace spaces/underscores with dash, strip non-alphanumeric - Match normalized slug against:
- Normalized filename stems (e.g.,
api-design.mdβ "api-design") - Normalized frontmatter titles (e.g.,
title: "API Design"β "api-design")
- Normalized filename stems (e.g.,
- If multiple matches:
- Prefer same-folder match (e.g.,
api/[[design]]βapi/design.mdoverguides/design.md) - Tiebreaker: lexicographically smallest path
- Prefer same-folder match (e.g.,
- If no match:
target_path = null,is_resolved = false
Slug normalization function:
import re
def normalize_slug(text: str) -> str:
text = text.lower()
text = re.sub(r'[\s_]+', '-', text) # Spaces/underscores β dash
text = re.sub(r'[^a-z0-9-]', '', text) # Keep alphanumeric + dash
text = re.sub(r'-+', '-', text) # Collapse dashes
return text.strip('-')
Backlinks:
- To get backlinks for
note_path, query:WHERE target_path = note_path - Backlinks are automatically updated when any note's wikilinks change
Tag
A metadata label applied to notes for categorization.
Attributes:
user_id(string, FK): Owner of the notesnote_path(string, FK): Path of tagged notetag_name(string): Tag identifier (lowercase, alphanumeric + hyphens)
Constraints:
- Many-to-many relationship: one note can have multiple tags, one tag can apply to multiple notes
- Tag names normalized: lowercase, strip whitespace
- Extracted from frontmatter
tags: [tag1, tag2]array
Tag count:
- Computed via
COUNT(DISTINCT note_path) GROUP BY tag_name - Used for tag cloud and filtering UI
Index Health
Tracks the state and freshness of per-user indices.
Attributes:
user_id(string, PK): Owner of the indexnote_count(int): Total number of notes indexedlast_full_rebuild(datetime, nullable, ISO 8601): Timestamp of last full index rebuildlast_incremental_update(datetime, nullable, ISO 8601): Timestamp of last incremental update (write/delete)
Usage:
- Displayed in UI as index health indicator
- Used to detect stale indices (e.g.,
note_countmismatch with actual file count) - Manual rebuild sets
last_full_rebuild = now()(FR-019) - Every write/delete sets
last_incremental_update = now()(FR-018)
Authentication Entities
Token (JWT)
A signed JSON Web Token used for API and MCP authentication.
JWT Claims (payload):
sub(string): Subject (user_id)iat(int): Issued at timestamp (Unix epoch)exp(int): Expiration timestamp (Unix epoch, iat + 90 days)
Header:
alg: "HS256": HMAC SHA-256 signature algorithmtyp: "JWT": Token type
Signature:
- Signed with server secret (env var
JWT_SECRET_KEY) - Validated on every API/MCP request via
Authorization: Bearer <token>header
Token lifecycle:
- User authenticates via HF OAuth
- User calls
POST /api/tokensto issue JWT - Frontend stores token in memory (React context)
- MCP clients pass token to
authparameter (FastMCP HTTP transport) - Server validates token on every request, extracts
user_idfromsubclaim - Token expires after 90 days, user must re-authenticate
Example token:
{
"header": {
"alg": "HS256",
"typ": "JWT"
},
"payload": {
"sub": "alice",
"iat": 1736956800,
"exp": 1744732800
},
"signature": "<HMAC-SHA256-signature>"
}
Encoded: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhbGljZSIsImlhdCI6MTczNjk1NjgwMCwiZXhwIjoxNzQ0NzMyODAwfQ.<signature>
SQLite Schema
Complete DDL for multi-index storage with per-user isolation.
Core Tables
note_metadata
Stores note metadata for fast lookups and version tracking.
CREATE TABLE IF NOT EXISTS note_metadata (
user_id TEXT NOT NULL,
note_path TEXT NOT NULL,
version INTEGER NOT NULL DEFAULT 1,
title TEXT NOT NULL,
created TEXT NOT NULL, -- ISO 8601 timestamp
updated TEXT NOT NULL, -- ISO 8601 timestamp
size_bytes INTEGER NOT NULL DEFAULT 0,
normalized_title_slug TEXT, -- Pre-computed for wikilink resolution
normalized_path_slug TEXT, -- Pre-computed for wikilink resolution
PRIMARY KEY (user_id, note_path)
);
CREATE INDEX idx_metadata_user ON note_metadata(user_id);
CREATE INDEX idx_metadata_updated ON note_metadata(user_id, updated DESC);
CREATE INDEX idx_metadata_title_slug ON note_metadata(user_id, normalized_title_slug);
CREATE INDEX idx_metadata_path_slug ON note_metadata(user_id, normalized_path_slug);
Notes:
- Composite primary key:
(user_id, note_path) versionstarts at 1, increments on every writenormalized_*_slugcolumns enable O(1) wikilink resolution- Index on
updated DESCfor recency-based sorting
note_fts
Full-text search index using SQLite FTS5.
CREATE VIRTUAL TABLE IF NOT EXISTS note_fts USING fts5(
user_id UNINDEXED,
note_path UNINDEXED,
title,
body,
content='', -- Contentless (external content pattern)
tokenize='porter unicode61', -- Stemming + Unicode support
prefix='2 3' -- Prefix indexes for autocomplete
);
Notes:
content=''(contentless): We manually INSERT/DELETE rows, no automatic syncUNINDEXEDcolumns are retrievable but not searchable (used for IDs)portertokenizer: English stemming (e.g., "running" matches "run")prefix='2 3': Enables fastMATCH 'prefix*'queries (2-char and 3-char prefixes)- Manual row management: On write,
DELETEold row +INSERTnew row - Ranking: Use
bm25(note_fts, 3.0, 1.0)for title weight=3x, body weight=1x
Query example:
SELECT
note_path,
title,
bm25(note_fts, 3.0, 1.0) AS rank
FROM note_fts
WHERE user_id = ? AND note_fts MATCH ?
ORDER BY rank DESC
LIMIT 50;
Safety: Incoming queries are tokenized into alphanumeric terms (per requirement to split on non-alphanumeric characters), each optionally preserving a trailing * for prefix searches, then wrapped in double quotes before being passed to MATCH. This neutralizes MATCH operators, trims punctuation such as apostrophes/ampersands, and prevents SQL syntax errors while preserving simple keyword semantics.
note_tags
Many-to-many junction table for note-tag relationships.
CREATE TABLE IF NOT EXISTS note_tags (
user_id TEXT NOT NULL,
note_path TEXT NOT NULL,
tag TEXT NOT NULL,
PRIMARY KEY (user_id, note_path, tag)
);
CREATE INDEX idx_tags_user_tag ON note_tags(user_id, tag);
CREATE INDEX idx_tags_user_path ON note_tags(user_id, note_path);
Notes:
- Composite primary key prevents duplicate tag assignments
- Index on
(user_id, tag)for "all notes with tag X" queries - Index on
(user_id, note_path)for "all tags for note Y" queries
Query examples:
-- Get all notes with tag "backend"
SELECT DISTINCT note_path, title
FROM note_tags t
JOIN note_metadata m USING (user_id, note_path)
WHERE t.user_id = ? AND t.tag = ?
ORDER BY m.updated DESC;
-- Get tag counts for user
SELECT tag, COUNT(DISTINCT note_path) as count
FROM note_tags
WHERE user_id = ?
GROUP BY tag
ORDER BY count DESC;
note_links
Stores wikilink graph for backlink navigation.
CREATE TABLE IF NOT EXISTS note_links (
user_id TEXT NOT NULL,
source_path TEXT NOT NULL,
target_path TEXT, -- NULL if unresolved
link_text TEXT NOT NULL,
is_resolved INTEGER NOT NULL DEFAULT 0, -- Boolean: 0=broken, 1=resolved
PRIMARY KEY (user_id, source_path, link_text)
);
CREATE INDEX idx_links_user_source ON note_links(user_id, source_path);
CREATE INDEX idx_links_user_target ON note_links(user_id, target_path);
CREATE INDEX idx_links_unresolved ON note_links(user_id, is_resolved);
Notes:
target_pathis nullable (null = broken link)is_resolvedis integer (0 or 1) for SQLite boolean representation- Composite primary key prevents duplicate links from same source with same text
- Index on
target_pathenables fast backlink queries
Query examples:
-- Get backlinks for a note
SELECT DISTINCT l.source_path, m.title
FROM note_links l
JOIN note_metadata m ON l.user_id = m.user_id AND l.source_path = m.note_path
WHERE l.user_id = ? AND l.target_path = ?
ORDER BY m.updated DESC;
-- Get all unresolved links for user
SELECT source_path, link_text
FROM note_links
WHERE user_id = ? AND is_resolved = 0;
index_health
Tracks index state and freshness per user.
CREATE TABLE IF NOT EXISTS index_health (
user_id TEXT PRIMARY KEY,
note_count INTEGER NOT NULL DEFAULT 0,
last_full_rebuild TEXT, -- ISO 8601 timestamp
last_incremental_update TEXT -- ISO 8601 timestamp
);
Notes:
- One row per user
last_full_rebuildset on manual rebuild (FR-042)last_incremental_updateset on every write/delete (FR-018)note_countis denormalized cache for quick health checks
Initialization Script
Complete schema initialization:
-- Enable FTS5 extension (usually built-in)
-- PRAGMA compile_options; -- Check if FTS5 is available
BEGIN TRANSACTION;
-- Core metadata table
CREATE TABLE IF NOT EXISTS note_metadata (
user_id TEXT NOT NULL,
note_path TEXT NOT NULL,
version INTEGER NOT NULL DEFAULT 1,
title TEXT NOT NULL,
created TEXT NOT NULL,
updated TEXT NOT NULL,
size_bytes INTEGER NOT NULL DEFAULT 0,
normalized_title_slug TEXT,
normalized_path_slug TEXT,
PRIMARY KEY (user_id, note_path)
);
CREATE INDEX IF NOT EXISTS idx_metadata_user ON note_metadata(user_id);
CREATE INDEX IF NOT EXISTS idx_metadata_updated ON note_metadata(user_id, updated DESC);
CREATE INDEX IF NOT EXISTS idx_metadata_title_slug ON note_metadata(user_id, normalized_title_slug);
CREATE INDEX IF NOT EXISTS idx_metadata_path_slug ON note_metadata(user_id, normalized_path_slug);
-- Full-text search index
CREATE VIRTUAL TABLE IF NOT EXISTS note_fts USING fts5(
user_id UNINDEXED,
note_path UNINDEXED,
title,
body,
content='',
tokenize='porter unicode61',
prefix='2 3'
);
-- Tag index
CREATE TABLE IF NOT EXISTS note_tags (
user_id TEXT NOT NULL,
note_path TEXT NOT NULL,
tag TEXT NOT NULL,
PRIMARY KEY (user_id, note_path, tag)
);
CREATE INDEX IF NOT EXISTS idx_tags_user_tag ON note_tags(user_id, tag);
CREATE INDEX IF NOT EXISTS idx_tags_user_path ON note_tags(user_id, note_path);
-- Link graph
CREATE TABLE IF NOT EXISTS note_links (
user_id TEXT NOT NULL,
source_path TEXT NOT NULL,
target_path TEXT,
link_text TEXT NOT NULL,
is_resolved INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (user_id, source_path, link_text)
);
CREATE INDEX IF NOT EXISTS idx_links_user_source ON note_links(user_id, source_path);
CREATE INDEX IF NOT EXISTS idx_links_user_target ON note_links(user_id, target_path);
CREATE INDEX IF NOT EXISTS idx_links_unresolved ON note_links(user_id, is_resolved);
-- Index health tracking
CREATE TABLE IF NOT EXISTS index_health (
user_id TEXT PRIMARY KEY,
note_count INTEGER NOT NULL DEFAULT 0,
last_full_rebuild TEXT,
last_incremental_update TEXT
);
COMMIT;
Pydantic Models
Python data models using Pydantic for validation and serialization.
User Models
from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional
class HFProfile(BaseModel):
"""HuggingFace OAuth profile information."""
username: str = Field(..., description="HF username")
name: Optional[str] = Field(None, description="Display name")
avatar_url: Optional[str] = Field(None, description="Profile picture URL")
class User(BaseModel):
"""User account with authentication info."""
user_id: str = Field(..., min_length=1, max_length=64, description="Internal user ID")
hf_profile: Optional[HFProfile] = Field(None, description="HF OAuth profile")
vault_path: str = Field(..., description="Absolute path to user's vault")
created: datetime = Field(..., description="Account creation timestamp")
class Config:
json_schema_extra = {
"example": {
"user_id": "alice",
"hf_profile": {
"username": "alice",
"name": "Alice Smith",
"avatar_url": "https://cdn-avatars.huggingface.co/v1/alice"
},
"vault_path": "/data/vaults/alice",
"created": "2025-01-15T10:30:00Z"
}
}
Note Models
from pathlib import Path
import re
class NoteMetadata(BaseModel):
"""Frontmatter metadata (arbitrary key-value pairs)."""
title: Optional[str] = None
tags: Optional[list[str]] = None
project: Optional[str] = None
created: Optional[datetime] = None
updated: Optional[datetime] = None
class Config:
extra = "allow" # Allow arbitrary fields
class Note(BaseModel):
"""Complete note with content and metadata."""
user_id: str = Field(..., description="Owner user ID")
note_path: str = Field(
...,
min_length=1,
max_length=256,
description="Relative path to vault root (includes .md)"
)
version: int = Field(..., ge=1, description="Optimistic concurrency version")
title: str = Field(..., min_length=1, description="Display title")
metadata: NoteMetadata = Field(default_factory=NoteMetadata, description="Frontmatter")
body: str = Field(..., description="Markdown content")
created: datetime = Field(..., description="Creation timestamp")
updated: datetime = Field(..., description="Last update timestamp")
size_bytes: int = Field(..., ge=0, le=1_048_576, description="File size in bytes")
@validator("note_path")
def validate_path(cls, v):
"""Validate note path format."""
# Must end with .md
if not v.endswith('.md'):
raise ValueError("Note path must end with .md")
# Must not contain ..
if '..' in v:
raise ValueError("Note path must not contain '..'")
# Must use Unix-style separators
if '\\' in v:
raise ValueError("Note path must use Unix-style separators (/)")
# Must not start with /
if v.startswith('/'):
raise ValueError("Note path must be relative (no leading /)")
return v
class Config:
json_schema_extra = {
"example": {
"user_id": "alice",
"note_path": "api/design.md",
"version": 5,
"title": "API Design",
"metadata": {
"tags": ["backend", "api"],
"project": "auth-service"
},
"body": "# API Design\n\nThis document describes...",
"created": "2025-01-10T09:00:00Z",
"updated": "2025-01-15T14:30:00Z",
"size_bytes": 4096
}
}
class NoteCreate(BaseModel):
"""Request to create a new note."""
note_path: str = Field(..., min_length=1, max_length=256)
title: Optional[str] = None
metadata: Optional[NoteMetadata] = None
body: str = Field(..., max_length=1_048_576)
class NoteUpdate(BaseModel):
"""Request to update an existing note."""
title: Optional[str] = None
metadata: Optional[NoteMetadata] = None
body: str = Field(..., max_length=1_048_576)
if_version: Optional[int] = Field(None, ge=1, description="Expected version for concurrency check")
class NoteSummary(BaseModel):
"""Lightweight note summary for listings."""
note_path: str
title: str
updated: datetime
Index Models
class Wikilink(BaseModel):
"""Bidirectional link between notes."""
user_id: str
source_path: str
target_path: Optional[str] = Field(None, description="Null if unresolved")
link_text: str
is_resolved: bool
class Config:
json_schema_extra = {
"example": {
"user_id": "alice",
"source_path": "api/design.md",
"target_path": "api/endpoints.md",
"link_text": "Endpoints",
"is_resolved": True
}
}
class Tag(BaseModel):
"""Tag with note count."""
tag_name: str
count: int = Field(..., ge=0)
class IndexHealth(BaseModel):
"""Index state and freshness metrics."""
user_id: str
note_count: int = Field(..., ge=0)
last_full_rebuild: Optional[datetime] = None
last_incremental_update: Optional[datetime] = None
class Config:
json_schema_extra = {
"example": {
"user_id": "alice",
"note_count": 142,
"last_full_rebuild": "2025-01-01T00:00:00Z",
"last_incremental_update": "2025-01-15T14:30:00Z"
}
}
Search Models
class SearchResult(BaseModel):
"""Full-text search result with snippet."""
note_path: str
title: str
snippet: str = Field(..., description="Highlighted excerpt from body")
score: float = Field(..., description="Relevance score (title 3x, body 1x, recency bonus)")
updated: datetime
class SearchRequest(BaseModel):
"""Full-text search query."""
query: str = Field(..., min_length=1, max_length=256)
limit: int = Field(50, ge=1, le=100)
Authentication Models
class TokenResponse(BaseModel):
"""JWT token issuance response."""
token: str = Field(..., description="JWT access token")
token_type: str = Field("bearer", description="Token type")
expires_at: datetime = Field(..., description="Expiration timestamp")
class JWTPayload(BaseModel):
"""JWT claims payload."""
sub: str = Field(..., description="Subject (user_id)")
iat: int = Field(..., description="Issued at (Unix timestamp)")
exp: int = Field(..., description="Expiration (Unix timestamp)")
TypeScript Type Definitions
Frontend type definitions for API contracts.
Core Types
/**
* User account with HF profile
*/
export interface User {
user_id: string;
hf_profile?: {
username: string;
name?: string;
avatar_url?: string;
};
vault_path: string;
created: string; // ISO 8601
}
/**
* Note metadata (frontmatter)
*/
export interface NoteMetadata {
title?: string;
tags?: string[];
project?: string;
created?: string; // ISO 8601
updated?: string; // ISO 8601
[key: string]: unknown; // Arbitrary fields
}
/**
* Complete note with content
*/
export interface Note {
user_id: string;
note_path: string;
version: number;
title: string;
metadata: NoteMetadata;
body: string;
created: string; // ISO 8601
updated: string; // ISO 8601
size_bytes: number;
}
/**
* Lightweight note summary for listings
*/
export interface NoteSummary {
note_path: string;
title: string;
updated: string; // ISO 8601
}
/**
* Request to create a note
*/
export interface NoteCreateRequest {
note_path: string;
title?: string;
metadata?: NoteMetadata;
body: string;
}
/**
* Request to update a note
*/
export interface NoteUpdateRequest {
title?: string;
metadata?: NoteMetadata;
body: string;
if_version?: number; // Optimistic concurrency
}
/**
* Wikilink with resolution status
*/
export interface Wikilink {
user_id: string;
source_path: string;
target_path: string | null; // Null if unresolved
link_text: string;
is_resolved: boolean;
}
/**
* Tag with note count
*/
export interface Tag {
tag_name: string;
count: number;
}
/**
* Index health metrics
*/
export interface IndexHealth {
user_id: string;
note_count: number;
last_full_rebuild: string | null; // ISO 8601
last_incremental_update: string | null; // ISO 8601
}
/**
* Search result with snippet
*/
export interface SearchResult {
note_path: string;
title: string;
snippet: string;
score: number;
updated: string; // ISO 8601
}
/**
* JWT token response
*/
export interface TokenResponse {
token: string;
token_type: "bearer";
expires_at: string; // ISO 8601
}
/**
* API error response
*/
export interface APIError {
error: string;
message: string;
detail?: Record<string, unknown>;
}
Validation Helpers
/**
* Validate note path format
*/
export function isValidNotePath(path: string): boolean {
return (
path.length > 0 &&
path.length <= 256 &&
path.endsWith('.md') &&
!path.includes('..') &&
!path.includes('\\') &&
!path.startsWith('/')
);
}
/**
* Normalize tag name (lowercase, trim)
*/
export function normalizeTag(tag: string): string {
return tag.toLowerCase().trim();
}
/**
* Normalize slug for wikilink resolution
*/
export function normalizeSlug(text: string): string {
return text
.toLowerCase()
.replace(/[\s_]+/g, '-') // Spaces/underscores β dash
.replace(/[^a-z0-9-]/g, '') // Keep alphanumeric + dash
.replace(/-+/g, '-') // Collapse dashes
.replace(/^-+|-+$/g, ''); // Trim dashes
}
/**
* Extract wikilinks from markdown body
*/
export function extractWikilinks(markdown: string): string[] {
const pattern = /\[\[([^\]]+)\]\]/g;
const matches: string[] = [];
let match;
while ((match = pattern.exec(markdown)) !== null) {
matches.push(match[1]);
}
return matches;
}
Validation Rules
Comprehensive validation constraints for all entities.
Note Path Validation
import re
from pathlib import Path
def validate_note_path(path: str) -> tuple[bool, str]:
"""
Validate note path format.
Returns (is_valid, error_message).
"""
# Length check
if not path or len(path) > 256:
return False, "Path must be 1-256 characters"
# Must end with .md
if not path.endswith('.md'):
return False, "Path must end with .md"
# Must not contain ..
if '..' in path:
return False, "Path must not contain '..'"
# Must use Unix-style separators
if '\\' in path:
return False, "Path must use Unix separators (/)"
# Must be relative
if path.startswith('/'):
return False, "Path must be relative (no leading /)"
# Must not have invalid characters
invalid_chars = ['<', '>', ':', '"', '|', '?', '*']
if any(c in path for c in invalid_chars):
return False, f"Path contains invalid characters: {invalid_chars}"
return True, ""
def sanitize_path(user_id: str, vault_root: str, note_path: str) -> Path:
"""
Sanitize and resolve note path within vault.
Raises ValueError if path escapes vault root.
"""
vault = Path(vault_root) / user_id
full_path = (vault / note_path).resolve()
# Ensure path is within vault
if not str(full_path).startswith(str(vault.resolve())):
raise ValueError(f"Path escapes vault root: {note_path}")
return full_path
Note Content Validation
def validate_note_content(body: str) -> tuple[bool, str]:
"""
Validate note content.
Returns (is_valid, error_message).
"""
# Size check (1 MiB max)
size_bytes = len(body.encode('utf-8'))
if size_bytes > 1_048_576:
return False, f"Note exceeds 1 MiB limit ({size_bytes} bytes)"
# UTF-8 validity
try:
body.encode('utf-8')
except UnicodeEncodeError as e:
return False, f"Invalid UTF-8 encoding: {e}"
return True, ""
def validate_frontmatter(metadata: dict) -> tuple[bool, str]:
"""
Validate frontmatter metadata.
Returns (is_valid, error_message).
"""
# Check for reserved fields
reserved = ['version'] # Version is managed by index, not frontmatter
for key in metadata.keys():
if key in reserved:
return False, f"Field '{key}' is reserved and cannot be set in frontmatter"
# Validate tags format
if 'tags' in metadata:
tags = metadata['tags']
if not isinstance(tags, list):
return False, "Field 'tags' must be an array"
if not all(isinstance(t, str) for t in tags):
return False, "All tags must be strings"
return True, ""
Vault Limits
def check_vault_limit(user_id: str, db) -> tuple[bool, str]:
"""
Check if vault is within note limit.
Returns (is_allowed, error_message).
"""
cursor = db.execute(
"SELECT note_count FROM index_health WHERE user_id = ?",
(user_id,)
)
row = cursor.fetchone()
if row is None:
return True, "" # New vault, no limit yet
note_count = row[0]
if note_count >= 5000:
return False, "Vault note limit exceeded (max 5,000 notes)"
return True, ""
Token Validation
import jwt
from datetime import datetime, timedelta
SECRET_KEY = "your-secret-key" # From env var
def create_jwt(user_id: str) -> str:
"""Create JWT with 90-day expiration."""
now = datetime.utcnow()
payload = {
"sub": user_id,
"iat": int(now.timestamp()),
"exp": int((now + timedelta(days=90)).timestamp())
}
return jwt.encode(payload, SECRET_KEY, algorithm="HS256")
def validate_jwt(token: str) -> tuple[bool, str, str]:
"""
Validate JWT and extract user_id.
Returns (is_valid, user_id, error_message).
"""
try:
payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
user_id = payload["sub"]
return True, user_id, ""
except jwt.ExpiredSignatureError:
return False, "", "Token expired"
except jwt.InvalidTokenError as e:
return False, "", f"Invalid token: {e}"
State Transitions
State machines for entity lifecycle and version management.
Note Lifecycle
stateDiagram-v2
[*] --> Creating: write_note (new path)
Creating --> Active: save to filesystem + insert metadata
Active --> Updating: write_note (existing path)
Updating --> Active: increment version + update index
Active --> Deleting: delete_note
Deleting --> [*]: remove file + delete index rows
Updating --> ConflictDetected: if_version mismatch (UI writes only)
ConflictDetected --> Active: reload or discard changes
State descriptions:
Creating: Note does not exist, first write in progress
- Validate path and content
- Set
version = 1,created = now(),updated = now() - Write file to filesystem
- Insert rows into
note_metadata,note_fts,note_tags,note_links
Active: Note exists and is readable/editable
- Can be read via API/MCP
- Can be updated via write_note
- Can be deleted via delete_note
Updating: Modification in progress
- Load current metadata (version, timestamps)
- If UI write: check
if_versionmatches current version - If version mismatch: transition to ConflictDetected
- If MCP write: skip version check (last-write-wins)
- Increment
version, setupdated = now() - Update file content
- Update all index rows (delete old, insert new)
ConflictDetected: Optimistic concurrency conflict (UI only)
- Return
409 Conflictwith current and expected versions - UI displays error: "Note changed since you opened it"
- User options: reload, save as copy, or discard changes
- Return
Deleting: Removal in progress
- Delete file from filesystem
- Delete rows from
note_metadata,note_fts,note_tags,note_links - Update backlinks (any note linking to deleted note now has unresolved link)
- Decrement
index_health.note_count
Version Increment Logic
def increment_version(user_id: str, note_path: str, if_version: int | None, db) -> int:
"""
Increment note version with optional concurrency check.
Returns new version number.
Raises ConflictError if if_version doesn't match.
"""
# Get current version
cursor = db.execute(
"SELECT version FROM note_metadata WHERE user_id = ? AND note_path = ?",
(user_id, note_path)
)
row = cursor.fetchone()
if row is None:
# New note
return 1
current_version = row[0]
# Optimistic concurrency check (UI writes only)
if if_version is not None and current_version != if_version:
raise ConflictError(
f"Version conflict: expected {if_version}, current is {current_version}"
)
# Increment version
new_version = current_version + 1
return new_version
Index Update Workflow
stateDiagram-v2
[*] --> IncrementalUpdate: write_note or delete_note
IncrementalUpdate --> DeleteOldRows: start transaction
DeleteOldRows --> ExtractMetadata: parse frontmatter + body
ExtractMetadata --> InsertNewRows: insert into all index tables
InsertNewRows --> UpdateHealth: set last_incremental_update
UpdateHealth --> [*]: commit transaction
[*] --> FullRebuild: POST /api/index/rebuild
FullRebuild --> DropUserRows: delete all rows for user_id
DropUserRows --> ScanVault: walk filesystem tree
ScanVault --> ProcessNote: for each .md file
ProcessNote --> ExtractMetadata
ProcessNote --> ScanVault: next file
ScanVault --> UpdateHealth: set last_full_rebuild
UpdateHealth --> [*]: commit transaction
Incremental update (on every write/delete):
- Start SQLite transaction
- Delete all existing rows for
(user_id, note_path)from:note_metadatanote_ftsnote_tagsnote_links
- Parse note content (frontmatter + body)
- Extract: title, tags, wikilinks
- Insert new rows into all index tables
- Resolve wikilinks and update
is_resolvedflags - Update
index_health.last_incremental_update = now() - Commit transaction
Full rebuild (manual trigger):
- Start SQLite transaction
- Delete all rows for
user_idfrom all index tables - Walk vault directory tree, find all
.mdfiles - For each file:
- Parse frontmatter + body
- Extract metadata, tags, wikilinks
- Insert rows into all index tables
- Resolve all wikilinks (second pass after all notes indexed)
- Update
index_health.note_countandlast_full_rebuild = now() - Commit transaction
Relationships and Constraints
Foreign Key Relationships
While SQLite supports foreign keys, we don't enforce them for performance reasons (multi-tenant with user-scoped queries). Instead, we rely on application-level referential integrity.
Logical relationships:
note_metadata.user_idβUser.user_idnote_tags.note_pathβnote_metadata.note_pathnote_links.source_pathβnote_metadata.note_pathnote_links.target_pathβnote_metadata.note_path(nullable)
Cascade semantics (application-enforced):
- On delete note: cascade delete from
note_tags,note_links(source),note_fts - On delete note: update
note_links(target) to setis_resolved = false
Uniqueness Constraints
| Table | Unique Constraint | Enforced By |
|---|---|---|
note_metadata |
(user_id, note_path) |
PRIMARY KEY |
note_tags |
(user_id, note_path, tag) |
PRIMARY KEY |
note_links |
(user_id, source_path, link_text) |
PRIMARY KEY |
index_health |
user_id |
PRIMARY KEY |
Cardinality
| Relationship | Type | Notes |
|---|---|---|
| User β Vault | 1:1 | One user owns one vault |
| Vault β Notes | 1:N | One vault contains many notes (max 5,000) |
| Note β Tags | N:M | Many-to-many via note_tags junction table |
| Note β Outgoing Links | 1:N | One note has many outgoing wikilinks |
| Note β Backlinks | 1:N | One note may be referenced by many backlinks |
| User β Tokens | 1:N | One user can issue multiple JWT tokens |
Invariants
Critical invariants maintained by the system:
- Version monotonicity:
note.versiononly increases (never decreases or resets) - Timestamp ordering:
note.created <= note.updatedalways - Path uniqueness: No two notes with same
(user_id, note_path)can exist - Size limit:
note.size_bytes <= 1_048_576always enforced - Vault limit:
COUNT(*) WHERE user_id = X <= 5000enforced before writes - Link consistency: If
note_links.target_pathis not null, target note must exist - Tag normalization: All
note_tags.tagvalues are lowercase - Index freshness:
index_health.last_incremental_updateis always >= most recentnote_metadata.updatedfor that user
Appendix
Common Queries Reference
-- Get all notes for user, sorted by recent update
SELECT note_path, title, updated
FROM note_metadata
WHERE user_id = ?
ORDER BY updated DESC
LIMIT 100;
-- Full-text search with title boost
SELECT
note_path,
title,
snippet(note_fts, 3, '<mark>', '</mark>', '...', 32) AS snippet,
bm25(note_fts, 3.0, 1.0) AS score
FROM note_fts
WHERE user_id = ? AND note_fts MATCH ?
ORDER BY score DESC
LIMIT 50;
-- Get all tags with counts
SELECT tag, COUNT(DISTINCT note_path) as count
FROM note_tags
WHERE user_id = ?
GROUP BY tag
ORDER BY count DESC;
-- Get backlinks for a note
SELECT DISTINCT l.source_path, m.title
FROM note_links l
JOIN note_metadata m ON l.user_id = m.user_id AND l.source_path = m.note_path
WHERE l.user_id = ? AND l.target_path = ?
ORDER BY m.updated DESC;
-- Get all unresolved wikilinks for a user
SELECT source_path, link_text, COUNT(*) as occurrences
FROM note_links
WHERE user_id = ? AND is_resolved = 0
GROUP BY source_path, link_text
ORDER BY occurrences DESC;
-- Check index health
SELECT note_count, last_full_rebuild, last_incremental_update
FROM index_health
WHERE user_id = ?;
Migration Strategy
For future schema changes:
# Example migration: Add column to note_metadata
def migrate_v1_to_v2(db):
"""Add normalized_title_slug column."""
db.execute("""
ALTER TABLE note_metadata
ADD COLUMN normalized_title_slug TEXT;
""")
# Backfill existing notes
db.execute("""
UPDATE note_metadata
SET normalized_title_slug = LOWER(
REPLACE(REPLACE(title, ' ', '-'), '_', '-')
);
""")
db.execute("""
CREATE INDEX idx_metadata_title_slug
ON note_metadata(user_id, normalized_title_slug);
""")
db.commit()
Document Status: Draft Last Updated: 2025-11-15 Next Review: After Phase 1 implementation