Spaces:

MCP-1st-Birthday
/

Vault.MCP

Running

App Files Files Community

Vault.MCP / specs /001-obsidian-docs-viewer /data-model.md

bigwolfe

sql query fully fixed

be2299f 27 days ago

preview code

raw

history blame contribute delete

41.4 kB

Data Model: Multi-Tenant Obsidian-Like Docs Viewer

Feature Branch: 001-obsidian-docs-viewer Created: 2025-11-15 Status: Draft

Overview
Entity Relationship Diagram
Core Entities
Index Entities
Authentication Entities
SQLite Schema
Pydantic Models
TypeScript Type Definitions
Validation Rules
State Transitions
Relationships and Constraints

Overview

This document defines the complete data model for a multi-tenant Obsidian-like documentation viewer. The system stores:

User accounts with HF OAuth identity mapping
Vaults as per-user directory trees containing Markdown notes
Notes with YAML frontmatter, version tracking, and full-text indexing
Wikilinks for bidirectional linking between notes
Tags for categorization and filtering
Index metadata for search optimization and health monitoring

Design principles:

Per-user isolation: All data scoped by user_id
Filesystem-backed: Notes stored as .md files, metadata in SQLite
Version-controlled: Integer version counter for optimistic concurrency
Search-optimized: SQLite FTS5 for full-text search, separate indexes for tags and links

Entity Relationship Diagram

erDiagram
    USER ||--o{ VAULT : owns
    VAULT ||--o{ NOTE : contains
    NOTE ||--o{ WIKILINK : has_outgoing
    NOTE ||--o{ WIKILINK : has_incoming
    NOTE ||--o{ TAG : tagged_with
    USER ||--|| INDEX_HEALTH : tracks
    USER ||--o{ TOKEN : issues

    USER {
        string user_id PK
        string hf_username
        string hf_name
        string hf_avatar_url
        datetime created
    }

    VAULT {
        string user_id FK
        string root_path
        int note_count
    }

    NOTE {
        string user_id FK
        string note_path PK
        int version
        string title
        json metadata
        string body
        datetime created
        datetime updated
        int size_bytes
    }

    WIKILINK {
        string user_id FK
        string source_path FK
        string target_path FK
        string link_text
        bool is_resolved
    }

    TAG {
        string user_id FK
        string tag_name
        string note_path FK
    }

    INDEX_HEALTH {
        string user_id PK
        int note_count
        datetime last_full_rebuild
        datetime last_incremental_update
    }

    TOKEN {
        string token_id PK
        string user_id FK
        datetime issued_at
        datetime expires_at
        string token_type
    }

Key relationships:

One user owns one vault (1:1)
One vault contains many notes (1:N)
One note has many outgoing wikilinks (1:N)
One note may be referenced by many backlinks (1:N)
One note can have many tags (N:M via junction table)
One user has one index health record (1:1)
One user can issue many tokens (1:N)

Core Entities

User

Represents an authenticated user with HF OAuth identity.

Attributes:

user_id (string, PK): Internal unique identifier, derived from HF username or UUID
hf_username (string, nullable): HuggingFace username (e.g., "alice")
hf_name (string, nullable): Display name from HF profile
hf_avatar_url (string, nullable): Profile picture URL
created (datetime): Account creation timestamp (ISO 8601)

Notes:

In local mode, user_id = "local-dev" with null HF fields
In HF Space mode, user_id = hf_username (normalized to lowercase)
created timestamp set on first OAuth login (vault initialization)

Lifecycle:

User authenticates via HF OAuth
Backend maps HF identity to user_id
If new user, create vault directory and initialize index
Return user info to frontend

Vault

A per-user directory tree containing Markdown notes.

Attributes:

user_id (string, FK): Owner of the vault
root_path (string): Absolute filesystem path to vault root (e.g., /data/vaults/alice/)
note_count (int): Cached count of notes in vault (denormalized from index)

Constraints:

Max 5,000 notes per vault (enforced by FR-008)
Root path must exist and be writable
Directory structure is arbitrary (user-defined nested folders)

Filesystem layout example:

/data/vaults/alice/
├── README.md
├── api/
│   ├── design.md
│   └── endpoints.md
├── guides/
│   ├── setup.md
│   └── deployment.md
└── notes/
    └── meeting-2025-01-15.md

Note

A Markdown file with optional YAML frontmatter and body content.

Attributes:

user_id (string, FK): Owner of the note
note_path (string, PK): Relative path to vault root, includes .md (e.g., api/design.md)
version (int): Optimistic concurrency version counter (starts at 1, increments on write)
title (string): Display title (from frontmatter, first H1, or filename stem)
metadata (JSON): Frontmatter key-value pairs (excludes auto-managed fields)
body (string): Markdown content (excluding frontmatter)
created (datetime, ISO 8601): Creation timestamp (auto-set if not in frontmatter)
updated (datetime, ISO 8601): Last modification timestamp (auto-set on every write)
size_bytes (int): UTF-8 byte size of full file content (frontmatter + body)

Constraints:

note_path max 256 characters, Unix-style separators (/), no .. allowed
size_bytes max 1 MiB (1,048,576 bytes) per FR-007
version stored in index, NOT in frontmatter
created and updated stored in index, MAY appear in frontmatter (frontmatter is source of truth on read)

Title resolution priority (FR-006):

metadata.get('title') from frontmatter
First # Heading in body (H1 only)
Filename stem (e.g., design.md → "design")

Metadata fields (common, but arbitrary):

tags (array of strings): Tag names for categorization
project (string): Project identifier
created (datetime): User-provided creation timestamp
updated (datetime): User-provided update timestamp
Custom fields allowed (JSON object)

Index Entities

Wikilink

Represents a bidirectional link between two notes.

Attributes:

user_id (string, FK): Owner of the notes
source_path (string, FK): Path of note containing the wikilink
target_path (string, nullable, FK): Resolved path of linked note (null if unresolved)
link_text (string): Original text from [[link text]]
is_resolved (bool): True if target_path is non-null, false if broken link

Extraction:

Regex pattern: \[\[([^\]]+)\]\]
Extract all matches from note body on every write

Resolution algorithm (FR-015, FR-016):

Normalize link_text to slug: lowercase, replace spaces/underscores with dash, strip non-alphanumeric
Match normalized slug against:
- Normalized filename stems (e.g., api-design.md → "api-design")
- Normalized frontmatter titles (e.g., title: "API Design" → "api-design")
If multiple matches:
- Prefer same-folder match (e.g., api/[[design]] → api/design.md over guides/design.md)
- Tiebreaker: lexicographically smallest path
If no match: target_path = null, is_resolved = false

Slug normalization function:

import re

def normalize_slug(text: str) -> str:
    text = text.lower()
    text = re.sub(r'[\s_]+', '-', text)  # Spaces/underscores → dash
    text = re.sub(r'[^a-z0-9-]', '', text)  # Keep alphanumeric + dash
    text = re.sub(r'-+', '-', text)  # Collapse dashes
    return text.strip('-')

Backlinks:

To get backlinks for note_path, query: WHERE target_path = note_path
Backlinks are automatically updated when any note's wikilinks change

Index Health

Tracks the state and freshness of per-user indices.

Attributes:

user_id (string, PK): Owner of the index
note_count (int): Total number of notes indexed
last_full_rebuild (datetime, nullable, ISO 8601): Timestamp of last full index rebuild
last_incremental_update (datetime, nullable, ISO 8601): Timestamp of last incremental update (write/delete)

Usage:

Displayed in UI as index health indicator
Used to detect stale indices (e.g., note_count mismatch with actual file count)
Manual rebuild sets last_full_rebuild = now() (FR-019)
Every write/delete sets last_incremental_update = now() (FR-018)

Authentication Entities

Token (JWT)

A signed JSON Web Token used for API and MCP authentication.

JWT Claims (payload):

sub (string): Subject (user_id)
iat (int): Issued at timestamp (Unix epoch)
exp (int): Expiration timestamp (Unix epoch, iat + 90 days)

Header:

alg: "HS256": HMAC SHA-256 signature algorithm
typ: "JWT": Token type

Signature:

Signed with server secret (env var JWT_SECRET_KEY)
Validated on every API/MCP request via Authorization: Bearer <token> header

Token lifecycle:

User authenticates via HF OAuth
User calls POST /api/tokens to issue JWT
Frontend stores token in memory (React context)
MCP clients pass token to auth parameter (FastMCP HTTP transport)
Server validates token on every request, extracts user_id from sub claim
Token expires after 90 days, user must re-authenticate

Example token:

{
  "header": {
    "alg": "HS256",
    "typ": "JWT"
  },
  "payload": {
    "sub": "alice",
    "iat": 1736956800,
    "exp": 1744732800
  },
  "signature": "<HMAC-SHA256-signature>"
}

Encoded: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhbGljZSIsImlhdCI6MTczNjk1NjgwMCwiZXhwIjoxNzQ0NzMyODAwfQ.<signature>

SQLite Schema

Complete DDL for multi-index storage with per-user isolation.

Core Tables

note_metadata

Stores note metadata for fast lookups and version tracking.

CREATE TABLE IF NOT EXISTS note_metadata (
    user_id TEXT NOT NULL,
    note_path TEXT NOT NULL,
    version INTEGER NOT NULL DEFAULT 1,
    title TEXT NOT NULL,
    created TEXT NOT NULL,  -- ISO 8601 timestamp
    updated TEXT NOT NULL,  -- ISO 8601 timestamp
    size_bytes INTEGER NOT NULL DEFAULT 0,
    normalized_title_slug TEXT,  -- Pre-computed for wikilink resolution
    normalized_path_slug TEXT,   -- Pre-computed for wikilink resolution
    PRIMARY KEY (user_id, note_path)
);

CREATE INDEX idx_metadata_user ON note_metadata(user_id);
CREATE INDEX idx_metadata_updated ON note_metadata(user_id, updated DESC);
CREATE INDEX idx_metadata_title_slug ON note_metadata(user_id, normalized_title_slug);
CREATE INDEX idx_metadata_path_slug ON note_metadata(user_id, normalized_path_slug);

Notes:

Composite primary key: (user_id, note_path)
version starts at 1, increments on every write
normalized_*_slug columns enable O(1) wikilink resolution
Index on updated DESC for recency-based sorting

note_fts

Full-text search index using SQLite FTS5.

CREATE VIRTUAL TABLE IF NOT EXISTS note_fts USING fts5(
    user_id UNINDEXED,
    note_path UNINDEXED,
    title,
    body,
    content='',  -- Contentless (external content pattern)
    tokenize='porter unicode61',  -- Stemming + Unicode support
    prefix='2 3'  -- Prefix indexes for autocomplete
);

Notes:

content='' (contentless): We manually INSERT/DELETE rows, no automatic sync
UNINDEXED columns are retrievable but not searchable (used for IDs)
porter tokenizer: English stemming (e.g., "running" matches "run")
prefix='2 3': Enables fast MATCH 'prefix*' queries (2-char and 3-char prefixes)
Manual row management: On write, DELETE old row + INSERT new row
Ranking: Use bm25(note_fts, 3.0, 1.0) for title weight=3x, body weight=1x

Query example:

SELECT
    note_path,
    title,
    bm25(note_fts, 3.0, 1.0) AS rank
FROM note_fts
WHERE user_id = ? AND note_fts MATCH ?
ORDER BY rank DESC
LIMIT 50;

Safety: Incoming queries are tokenized into alphanumeric terms (per requirement to split on non-alphanumeric characters), each optionally preserving a trailing * for prefix searches, then wrapped in double quotes before being passed to MATCH. This neutralizes MATCH operators, trims punctuation such as apostrophes/ampersands, and prevents SQL syntax errors while preserving simple keyword semantics.

note_tags

Many-to-many junction table for note-tag relationships.

CREATE TABLE IF NOT EXISTS note_tags (
    user_id TEXT NOT NULL,
    note_path TEXT NOT NULL,
    tag TEXT NOT NULL,
    PRIMARY KEY (user_id, note_path, tag)
);

CREATE INDEX idx_tags_user_tag ON note_tags(user_id, tag);
CREATE INDEX idx_tags_user_path ON note_tags(user_id, note_path);

Notes:

Composite primary key prevents duplicate tag assignments
Index on (user_id, tag) for "all notes with tag X" queries
Index on (user_id, note_path) for "all tags for note Y" queries

Query examples:

-- Get all notes with tag "backend"
SELECT DISTINCT note_path, title
FROM note_tags t
JOIN note_metadata m USING (user_id, note_path)
WHERE t.user_id = ? AND t.tag = ?
ORDER BY m.updated DESC;

-- Get tag counts for user
SELECT tag, COUNT(DISTINCT note_path) as count
FROM note_tags
WHERE user_id = ?
GROUP BY tag
ORDER BY count DESC;

note_links

Stores wikilink graph for backlink navigation.

CREATE TABLE IF NOT EXISTS note_links (
    user_id TEXT NOT NULL,
    source_path TEXT NOT NULL,
    target_path TEXT,  -- NULL if unresolved
    link_text TEXT NOT NULL,
    is_resolved INTEGER NOT NULL DEFAULT 0,  -- Boolean: 0=broken, 1=resolved
    PRIMARY KEY (user_id, source_path, link_text)
);

CREATE INDEX idx_links_user_source ON note_links(user_id, source_path);
CREATE INDEX idx_links_user_target ON note_links(user_id, target_path);
CREATE INDEX idx_links_unresolved ON note_links(user_id, is_resolved);

Notes:

target_path is nullable (null = broken link)
is_resolved is integer (0 or 1) for SQLite boolean representation
Composite primary key prevents duplicate links from same source with same text
Index on target_path enables fast backlink queries

Query examples:

-- Get backlinks for a note
SELECT DISTINCT l.source_path, m.title
FROM note_links l
JOIN note_metadata m ON l.user_id = m.user_id AND l.source_path = m.note_path
WHERE l.user_id = ? AND l.target_path = ?
ORDER BY m.updated DESC;

-- Get all unresolved links for user
SELECT source_path, link_text
FROM note_links
WHERE user_id = ? AND is_resolved = 0;

index_health

Tracks index state and freshness per user.

CREATE TABLE IF NOT EXISTS index_health (
    user_id TEXT PRIMARY KEY,
    note_count INTEGER NOT NULL DEFAULT 0,
    last_full_rebuild TEXT,  -- ISO 8601 timestamp
    last_incremental_update TEXT  -- ISO 8601 timestamp
);

Notes:

One row per user
last_full_rebuild set on manual rebuild (FR-042)
last_incremental_update set on every write/delete (FR-018)
note_count is denormalized cache for quick health checks

Initialization Script

Complete schema initialization:

-- Enable FTS5 extension (usually built-in)
-- PRAGMA compile_options;  -- Check if FTS5 is available

BEGIN TRANSACTION;

-- Core metadata table
CREATE TABLE IF NOT EXISTS note_metadata (
    user_id TEXT NOT NULL,
    note_path TEXT NOT NULL,
    version INTEGER NOT NULL DEFAULT 1,
    title TEXT NOT NULL,
    created TEXT NOT NULL,
    updated TEXT NOT NULL,
    size_bytes INTEGER NOT NULL DEFAULT 0,
    normalized_title_slug TEXT,
    normalized_path_slug TEXT,
    PRIMARY KEY (user_id, note_path)
);

CREATE INDEX IF NOT EXISTS idx_metadata_user ON note_metadata(user_id);
CREATE INDEX IF NOT EXISTS idx_metadata_updated ON note_metadata(user_id, updated DESC);
CREATE INDEX IF NOT EXISTS idx_metadata_title_slug ON note_metadata(user_id, normalized_title_slug);
CREATE INDEX IF NOT EXISTS idx_metadata_path_slug ON note_metadata(user_id, normalized_path_slug);

-- Full-text search index
CREATE VIRTUAL TABLE IF NOT EXISTS note_fts USING fts5(
    user_id UNINDEXED,
    note_path UNINDEXED,
    title,
    body,
    content='',
    tokenize='porter unicode61',
    prefix='2 3'
);

-- Tag index
CREATE TABLE IF NOT EXISTS note_tags (
    user_id TEXT NOT NULL,
    note_path TEXT NOT NULL,
    tag TEXT NOT NULL,
    PRIMARY KEY (user_id, note_path, tag)
);

CREATE INDEX IF NOT EXISTS idx_tags_user_tag ON note_tags(user_id, tag);
CREATE INDEX IF NOT EXISTS idx_tags_user_path ON note_tags(user_id, note_path);

-- Link graph
CREATE TABLE IF NOT EXISTS note_links (
    user_id TEXT NOT NULL,
    source_path TEXT NOT NULL,
    target_path TEXT,
    link_text TEXT NOT NULL,
    is_resolved INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, source_path, link_text)
);

CREATE INDEX IF NOT EXISTS idx_links_user_source ON note_links(user_id, source_path);
CREATE INDEX IF NOT EXISTS idx_links_user_target ON note_links(user_id, target_path);
CREATE INDEX IF NOT EXISTS idx_links_unresolved ON note_links(user_id, is_resolved);

-- Index health tracking
CREATE TABLE IF NOT EXISTS index_health (
    user_id TEXT PRIMARY KEY,
    note_count INTEGER NOT NULL DEFAULT 0,
    last_full_rebuild TEXT,
    last_incremental_update TEXT
);

COMMIT;

Pydantic Models

Python data models using Pydantic for validation and serialization.

User Models

from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional


class HFProfile(BaseModel):
    """HuggingFace OAuth profile information."""
    username: str = Field(..., description="HF username")
    name: Optional[str] = Field(None, description="Display name")
    avatar_url: Optional[str] = Field(None, description="Profile picture URL")


class User(BaseModel):
    """User account with authentication info."""
    user_id: str = Field(..., min_length=1, max_length=64, description="Internal user ID")
    hf_profile: Optional[HFProfile] = Field(None, description="HF OAuth profile")
    vault_path: str = Field(..., description="Absolute path to user's vault")
    created: datetime = Field(..., description="Account creation timestamp")

    class Config:
        json_schema_extra = {
            "example": {
                "user_id": "alice",
                "hf_profile": {
                    "username": "alice",
                    "name": "Alice Smith",
                    "avatar_url": "https://cdn-avatars.huggingface.co/v1/alice"
                },
                "vault_path": "/data/vaults/alice",
                "created": "2025-01-15T10:30:00Z"
            }
        }

Note Models

from pathlib import Path
import re


class NoteMetadata(BaseModel):
    """Frontmatter metadata (arbitrary key-value pairs)."""
    title: Optional[str] = None
    tags: Optional[list[str]] = None
    project: Optional[str] = None
    created: Optional[datetime] = None
    updated: Optional[datetime] = None

    class Config:
        extra = "allow"  # Allow arbitrary fields


class Note(BaseModel):
    """Complete note with content and metadata."""
    user_id: str = Field(..., description="Owner user ID")
    note_path: str = Field(
        ...,
        min_length=1,
        max_length=256,
        description="Relative path to vault root (includes .md)"
    )
    version: int = Field(..., ge=1, description="Optimistic concurrency version")
    title: str = Field(..., min_length=1, description="Display title")
    metadata: NoteMetadata = Field(default_factory=NoteMetadata, description="Frontmatter")
    body: str = Field(..., description="Markdown content")
    created: datetime = Field(..., description="Creation timestamp")
    updated: datetime = Field(..., description="Last update timestamp")
    size_bytes: int = Field(..., ge=0, le=1_048_576, description="File size in bytes")

    @validator("note_path")
    def validate_path(cls, v):
        """Validate note path format."""
        # Must end with .md
        if not v.endswith('.md'):
            raise ValueError("Note path must end with .md")

        # Must not contain ..
        if '..' in v:
            raise ValueError("Note path must not contain '..'")

        # Must use Unix-style separators
        if '\\' in v:
            raise ValueError("Note path must use Unix-style separators (/)")

        # Must not start with /
        if v.startswith('/'):
            raise ValueError("Note path must be relative (no leading /)")

        return v

    class Config:
        json_schema_extra = {
            "example": {
                "user_id": "alice",
                "note_path": "api/design.md",
                "version": 5,
                "title": "API Design",
                "metadata": {
                    "tags": ["backend", "api"],
                    "project": "auth-service"
                },
                "body": "# API Design\n\nThis document describes...",
                "created": "2025-01-10T09:00:00Z",
                "updated": "2025-01-15T14:30:00Z",
                "size_bytes": 4096
            }
        }


class NoteCreate(BaseModel):
    """Request to create a new note."""
    note_path: str = Field(..., min_length=1, max_length=256)
    title: Optional[str] = None
    metadata: Optional[NoteMetadata] = None
    body: str = Field(..., max_length=1_048_576)


class NoteUpdate(BaseModel):
    """Request to update an existing note."""
    title: Optional[str] = None
    metadata: Optional[NoteMetadata] = None
    body: str = Field(..., max_length=1_048_576)
    if_version: Optional[int] = Field(None, ge=1, description="Expected version for concurrency check")


class NoteSummary(BaseModel):
    """Lightweight note summary for listings."""
    note_path: str
    title: str
    updated: datetime

Index Models

class Wikilink(BaseModel):
    """Bidirectional link between notes."""
    user_id: str
    source_path: str
    target_path: Optional[str] = Field(None, description="Null if unresolved")
    link_text: str
    is_resolved: bool

    class Config:
        json_schema_extra = {
            "example": {
                "user_id": "alice",
                "source_path": "api/design.md",
                "target_path": "api/endpoints.md",
                "link_text": "Endpoints",
                "is_resolved": True
            }
        }


class Tag(BaseModel):
    """Tag with note count."""
    tag_name: str
    count: int = Field(..., ge=0)


class IndexHealth(BaseModel):
    """Index state and freshness metrics."""
    user_id: str
    note_count: int = Field(..., ge=0)
    last_full_rebuild: Optional[datetime] = None
    last_incremental_update: Optional[datetime] = None

    class Config:
        json_schema_extra = {
            "example": {
                "user_id": "alice",
                "note_count": 142,
                "last_full_rebuild": "2025-01-01T00:00:00Z",
                "last_incremental_update": "2025-01-15T14:30:00Z"
            }
        }

Search Models

class SearchResult(BaseModel):
    """Full-text search result with snippet."""
    note_path: str
    title: str
    snippet: str = Field(..., description="Highlighted excerpt from body")
    score: float = Field(..., description="Relevance score (title 3x, body 1x, recency bonus)")
    updated: datetime


class SearchRequest(BaseModel):
    """Full-text search query."""
    query: str = Field(..., min_length=1, max_length=256)
    limit: int = Field(50, ge=1, le=100)

Authentication Models

class TokenResponse(BaseModel):
    """JWT token issuance response."""
    token: str = Field(..., description="JWT access token")
    token_type: str = Field("bearer", description="Token type")
    expires_at: datetime = Field(..., description="Expiration timestamp")


class JWTPayload(BaseModel):
    """JWT claims payload."""
    sub: str = Field(..., description="Subject (user_id)")
    iat: int = Field(..., description="Issued at (Unix timestamp)")
    exp: int = Field(..., description="Expiration (Unix timestamp)")

TypeScript Type Definitions

Frontend type definitions for API contracts.

Core Types

/**
 * User account with HF profile
 */
export interface User {
  user_id: string;
  hf_profile?: {
    username: string;
    name?: string;
    avatar_url?: string;
  };
  vault_path: string;
  created: string;  // ISO 8601
}

/**
 * Note metadata (frontmatter)
 */
export interface NoteMetadata {
  title?: string;
  tags?: string[];
  project?: string;
  created?: string;  // ISO 8601
  updated?: string;  // ISO 8601
  [key: string]: unknown;  // Arbitrary fields
}

/**
 * Complete note with content
 */
export interface Note {
  user_id: string;
  note_path: string;
  version: number;
  title: string;
  metadata: NoteMetadata;
  body: string;
  created: string;  // ISO 8601
  updated: string;  // ISO 8601
  size_bytes: number;
}

/**
 * Lightweight note summary for listings
 */
export interface NoteSummary {
  note_path: string;
  title: string;
  updated: string;  // ISO 8601
}

/**
 * Request to create a note
 */
export interface NoteCreateRequest {
  note_path: string;
  title?: string;
  metadata?: NoteMetadata;
  body: string;
}

/**
 * Request to update a note
 */
export interface NoteUpdateRequest {
  title?: string;
  metadata?: NoteMetadata;
  body: string;
  if_version?: number;  // Optimistic concurrency
}

/**
 * Wikilink with resolution status
 */
export interface Wikilink {
  user_id: string;
  source_path: string;
  target_path: string | null;  // Null if unresolved
  link_text: string;
  is_resolved: boolean;
}

/**
 * Tag with note count
 */
export interface Tag {
  tag_name: string;
  count: number;
}

/**
 * Index health metrics
 */
export interface IndexHealth {
  user_id: string;
  note_count: number;
  last_full_rebuild: string | null;  // ISO 8601
  last_incremental_update: string | null;  // ISO 8601
}

/**
 * Search result with snippet
 */
export interface SearchResult {
  note_path: string;
  title: string;
  snippet: string;
  score: number;
  updated: string;  // ISO 8601
}

/**
 * JWT token response
 */
export interface TokenResponse {
  token: string;
  token_type: "bearer";
  expires_at: string;  // ISO 8601
}

/**
 * API error response
 */
export interface APIError {
  error: string;
  message: string;
  detail?: Record<string, unknown>;
}

Validation Helpers

/**
 * Validate note path format
 */
export function isValidNotePath(path: string): boolean {
  return (
    path.length > 0 &&
    path.length <= 256 &&
    path.endsWith('.md') &&
    !path.includes('..') &&
    !path.includes('\\') &&
    !path.startsWith('/')
  );
}

/**
 * Normalize tag name (lowercase, trim)
 */
export function normalizeTag(tag: string): string {
  return tag.toLowerCase().trim();
}

/**
 * Normalize slug for wikilink resolution
 */
export function normalizeSlug(text: string): string {
  return text
    .toLowerCase()
    .replace(/[\s_]+/g, '-')  // Spaces/underscores → dash
    .replace(/[^a-z0-9-]/g, '')  // Keep alphanumeric + dash
    .replace(/-+/g, '-')  // Collapse dashes
    .replace(/^-+|-+$/g, '');  // Trim dashes
}

/**
 * Extract wikilinks from markdown body
 */
export function extractWikilinks(markdown: string): string[] {
  const pattern = /\[\[([^\]]+)\]\]/g;
  const matches: string[] = [];
  let match;

  while ((match = pattern.exec(markdown)) !== null) {
    matches.push(match[1]);
  }

  return matches;
}

Validation Rules

Comprehensive validation constraints for all entities.

Note Path Validation

import re
from pathlib import Path

def validate_note_path(path: str) -> tuple[bool, str]:
    """
    Validate note path format.

    Returns (is_valid, error_message).
    """
    # Length check
    if not path or len(path) > 256:
        return False, "Path must be 1-256 characters"

    # Must end with .md
    if not path.endswith('.md'):
        return False, "Path must end with .md"

    # Must not contain ..
    if '..' in path:
        return False, "Path must not contain '..'"

    # Must use Unix-style separators
    if '\\' in path:
        return False, "Path must use Unix separators (/)"

    # Must be relative
    if path.startswith('/'):
        return False, "Path must be relative (no leading /)"

    # Must not have invalid characters
    invalid_chars = ['<', '>', ':', '"', '|', '?', '*']
    if any(c in path for c in invalid_chars):
        return False, f"Path contains invalid characters: {invalid_chars}"

    return True, ""


def sanitize_path(user_id: str, vault_root: str, note_path: str) -> Path:
    """
    Sanitize and resolve note path within vault.

    Raises ValueError if path escapes vault root.
    """
    vault = Path(vault_root) / user_id
    full_path = (vault / note_path).resolve()

    # Ensure path is within vault
    if not str(full_path).startswith(str(vault.resolve())):
        raise ValueError(f"Path escapes vault root: {note_path}")

    return full_path

Note Content Validation

def validate_note_content(body: str) -> tuple[bool, str]:
    """
    Validate note content.

    Returns (is_valid, error_message).
    """
    # Size check (1 MiB max)
    size_bytes = len(body.encode('utf-8'))
    if size_bytes > 1_048_576:
        return False, f"Note exceeds 1 MiB limit ({size_bytes} bytes)"

    # UTF-8 validity
    try:
        body.encode('utf-8')
    except UnicodeEncodeError as e:
        return False, f"Invalid UTF-8 encoding: {e}"

    return True, ""


def validate_frontmatter(metadata: dict) -> tuple[bool, str]:
    """
    Validate frontmatter metadata.

    Returns (is_valid, error_message).
    """
    # Check for reserved fields
    reserved = ['version']  # Version is managed by index, not frontmatter
    for key in metadata.keys():
        if key in reserved:
            return False, f"Field '{key}' is reserved and cannot be set in frontmatter"

    # Validate tags format
    if 'tags' in metadata:
        tags = metadata['tags']
        if not isinstance(tags, list):
            return False, "Field 'tags' must be an array"

        if not all(isinstance(t, str) for t in tags):
            return False, "All tags must be strings"

    return True, ""

Vault Limits

def check_vault_limit(user_id: str, db) -> tuple[bool, str]:
    """
    Check if vault is within note limit.

    Returns (is_allowed, error_message).
    """
    cursor = db.execute(
        "SELECT note_count FROM index_health WHERE user_id = ?",
        (user_id,)
    )
    row = cursor.fetchone()

    if row is None:
        return True, ""  # New vault, no limit yet

    note_count = row[0]

    if note_count >= 5000:
        return False, "Vault note limit exceeded (max 5,000 notes)"

    return True, ""

Token Validation

import jwt
from datetime import datetime, timedelta

SECRET_KEY = "your-secret-key"  # From env var

def create_jwt(user_id: str) -> str:
    """Create JWT with 90-day expiration."""
    now = datetime.utcnow()
    payload = {
        "sub": user_id,
        "iat": int(now.timestamp()),
        "exp": int((now + timedelta(days=90)).timestamp())
    }
    return jwt.encode(payload, SECRET_KEY, algorithm="HS256")


def validate_jwt(token: str) -> tuple[bool, str, str]:
    """
    Validate JWT and extract user_id.

    Returns (is_valid, user_id, error_message).
    """
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
        user_id = payload["sub"]
        return True, user_id, ""

    except jwt.ExpiredSignatureError:
        return False, "", "Token expired"

    except jwt.InvalidTokenError as e:
        return False, "", f"Invalid token: {e}"

State Transitions

State machines for entity lifecycle and version management.

Note Lifecycle

stateDiagram-v2
    [*] --> Creating: write_note (new path)
    Creating --> Active: save to filesystem + insert metadata
    Active --> Updating: write_note (existing path)
    Updating --> Active: increment version + update index
    Active --> Deleting: delete_note
    Deleting --> [*]: remove file + delete index rows

    Updating --> ConflictDetected: if_version mismatch (UI writes only)
    ConflictDetected --> Active: reload or discard changes

State descriptions:

Creating: Note does not exist, first write in progress
- Validate path and content
- Set version = 1, created = now(), updated = now()
- Write file to filesystem
- Insert rows into note_metadata, note_fts, note_tags, note_links
Active: Note exists and is readable/editable
- Can be read via API/MCP
- Can be updated via write_note
- Can be deleted via delete_note
Updating: Modification in progress
- Load current metadata (version, timestamps)
- If UI write: check if_version matches current version
- If version mismatch: transition to ConflictDetected
- If MCP write: skip version check (last-write-wins)
- Increment version, set updated = now()
- Update file content
- Update all index rows (delete old, insert new)
ConflictDetected: Optimistic concurrency conflict (UI only)
- Return 409 Conflict with current and expected versions
- UI displays error: "Note changed since you opened it"
- User options: reload, save as copy, or discard changes
Deleting: Removal in progress
- Delete file from filesystem
- Delete rows from note_metadata, note_fts, note_tags, note_links
- Update backlinks (any note linking to deleted note now has unresolved link)
- Decrement index_health.note_count

Version Increment Logic

def increment_version(user_id: str, note_path: str, if_version: int | None, db) -> int:
    """
    Increment note version with optional concurrency check.

    Returns new version number.
    Raises ConflictError if if_version doesn't match.
    """
    # Get current version
    cursor = db.execute(
        "SELECT version FROM note_metadata WHERE user_id = ? AND note_path = ?",
        (user_id, note_path)
    )
    row = cursor.fetchone()

    if row is None:
        # New note
        return 1

    current_version = row[0]

    # Optimistic concurrency check (UI writes only)
    if if_version is not None and current_version != if_version:
        raise ConflictError(
            f"Version conflict: expected {if_version}, current is {current_version}"
        )

    # Increment version
    new_version = current_version + 1

    return new_version

Index Update Workflow

stateDiagram-v2
    [*] --> IncrementalUpdate: write_note or delete_note
    IncrementalUpdate --> DeleteOldRows: start transaction
    DeleteOldRows --> ExtractMetadata: parse frontmatter + body
    ExtractMetadata --> InsertNewRows: insert into all index tables
    InsertNewRows --> UpdateHealth: set last_incremental_update
    UpdateHealth --> [*]: commit transaction

    [*] --> FullRebuild: POST /api/index/rebuild
    FullRebuild --> DropUserRows: delete all rows for user_id
    DropUserRows --> ScanVault: walk filesystem tree
    ScanVault --> ProcessNote: for each .md file
    ProcessNote --> ExtractMetadata
    ProcessNote --> ScanVault: next file
    ScanVault --> UpdateHealth: set last_full_rebuild
    UpdateHealth --> [*]: commit transaction

Incremental update (on every write/delete):

Start SQLite transaction
Delete all existing rows for (user_id, note_path) from:
- note_metadata
- note_fts
- note_tags
- note_links
Parse note content (frontmatter + body)
Extract: title, tags, wikilinks
Insert new rows into all index tables
Resolve wikilinks and update is_resolved flags
Update index_health.last_incremental_update = now()
Commit transaction

Full rebuild (manual trigger):

Start SQLite transaction
Delete all rows for user_id from all index tables
Walk vault directory tree, find all .md files
For each file:
- Parse frontmatter + body
- Extract metadata, tags, wikilinks
- Insert rows into all index tables
Resolve all wikilinks (second pass after all notes indexed)
Update index_health.note_count and last_full_rebuild = now()
Commit transaction

Relationships and Constraints

Foreign Key Relationships

While SQLite supports foreign keys, we don't enforce them for performance reasons (multi-tenant with user-scoped queries). Instead, we rely on application-level referential integrity.

Logical relationships:

note_metadata.user_id → User.user_id
note_tags.note_path → note_metadata.note_path
note_links.source_path → note_metadata.note_path
note_links.target_path → note_metadata.note_path (nullable)

Cascade semantics (application-enforced):

On delete note: cascade delete from note_tags, note_links (source), note_fts
On delete note: update note_links (target) to set is_resolved = false

Uniqueness Constraints

Table	Unique Constraint	Enforced By
`note_metadata`	`(user_id, note_path)`	PRIMARY KEY
`note_tags`	`(user_id, note_path, tag)`	PRIMARY KEY
`note_links`	`(user_id, source_path, link_text)`	PRIMARY KEY
`index_health`	`user_id`	PRIMARY KEY

Cardinality

Relationship	Type	Notes
User → Vault	1:1	One user owns one vault
Vault → Notes	1:N	One vault contains many notes (max 5,000)
Note → Tags	N:M	Many-to-many via `note_tags` junction table
Note → Outgoing Links	1:N	One note has many outgoing wikilinks
Note → Backlinks	1:N	One note may be referenced by many backlinks
User → Tokens	1:N	One user can issue multiple JWT tokens

Invariants

Critical invariants maintained by the system:

Version monotonicity: note.version only increases (never decreases or resets)
Timestamp ordering: note.created <= note.updated always
Path uniqueness: No two notes with same (user_id, note_path) can exist
Size limit: note.size_bytes <= 1_048_576 always enforced
Vault limit: COUNT(*) WHERE user_id = X <= 5000 enforced before writes
Link consistency: If note_links.target_path is not null, target note must exist
Tag normalization: All note_tags.tag values are lowercase
Index freshness: index_health.last_incremental_update is always >= most recent note_metadata.updated for that user

Appendix

Common Queries Reference

-- Get all notes for user, sorted by recent update
SELECT note_path, title, updated
FROM note_metadata
WHERE user_id = ?
ORDER BY updated DESC
LIMIT 100;

-- Full-text search with title boost
SELECT
    note_path,
    title,
    snippet(note_fts, 3, '<mark>', '</mark>', '...', 32) AS snippet,
    bm25(note_fts, 3.0, 1.0) AS score
FROM note_fts
WHERE user_id = ? AND note_fts MATCH ?
ORDER BY score DESC
LIMIT 50;

-- Get all tags with counts
SELECT tag, COUNT(DISTINCT note_path) as count
FROM note_tags
WHERE user_id = ?
GROUP BY tag
ORDER BY count DESC;

-- Get backlinks for a note
SELECT DISTINCT l.source_path, m.title
FROM note_links l
JOIN note_metadata m ON l.user_id = m.user_id AND l.source_path = m.note_path
WHERE l.user_id = ? AND l.target_path = ?
ORDER BY m.updated DESC;

-- Get all unresolved wikilinks for a user
SELECT source_path, link_text, COUNT(*) as occurrences
FROM note_links
WHERE user_id = ? AND is_resolved = 0
GROUP BY source_path, link_text
ORDER BY occurrences DESC;

-- Check index health
SELECT note_count, last_full_rebuild, last_incremental_update
FROM index_health
WHERE user_id = ?;

Migration Strategy

For future schema changes:

# Example migration: Add column to note_metadata
def migrate_v1_to_v2(db):
    """Add normalized_title_slug column."""
    db.execute("""
        ALTER TABLE note_metadata
        ADD COLUMN normalized_title_slug TEXT;
    """)

    # Backfill existing notes
    db.execute("""
        UPDATE note_metadata
        SET normalized_title_slug = LOWER(
            REPLACE(REPLACE(title, ' ', '-'), '_', '-')
        );
    """)

    db.execute("""
        CREATE INDEX idx_metadata_title_slug
        ON note_metadata(user_id, normalized_title_slug);
    """)

    db.commit()

Document Status: Draft Last Updated: 2025-11-15 Next Review: After Phase 1 implementation

Data Model: Multi-Tenant Obsidian-Like Docs Viewer

Table of Contents

Overview

Entity Relationship Diagram

Core Entities

User

Vault

Note

Index Entities

Wikilink

Tag

Index Health

Authentication Entities

Token (JWT)

SQLite Schema

Core Tables

note_metadata

note_fts

note_tags

note_links

index_health

Initialization Script

Pydantic Models

User Models

Note Models

Index Models

Search Models

Authentication Models

TypeScript Type Definitions

Core Types

Validation Helpers

Validation Rules

Note Path Validation

Note Content Validation

Vault Limits

Token Validation

State Transitions

Note Lifecycle

Version Increment Logic

Index Update Workflow

Relationships and Constraints

Foreign Key Relationships

Uniqueness Constraints

Cardinality

Invariants

Appendix

Common Queries Reference

Migration Strategy