# Data Model: Multi-Tenant Obsidian-Like Docs Viewer

**Feature Branch**: `001-obsidian-docs-viewer`
**Created**: 2025-11-15
**Status**: Draft

## Table of Contents

1. [Overview](#overview)
2. [Entity Relationship Diagram](#entity-relationship-diagram)
3. [Core Entities](#core-entities)
4. [Index Entities](#index-entities)
5. [Authentication Entities](#authentication-entities)
6. [SQLite Schema](#sqlite-schema)
7. [Pydantic Models](#pydantic-models)
8. [TypeScript Type Definitions](#typescript-type-definitions)
9. [Validation Rules](#validation-rules)
10. [State Transitions](#state-transitions)
11. [Relationships and Constraints](#relationships-and-constraints)

---

## Overview

This document defines the complete data model for a multi-tenant Obsidian-like documentation viewer. The system stores:

- **User accounts** with HF OAuth identity mapping
- **Vaults** as per-user directory trees containing Markdown notes
- **Notes** with YAML frontmatter, version tracking, and full-text indexing
- **Wikilinks** for bidirectional linking between notes
- **Tags** for categorization and filtering
- **Index metadata** for search optimization and health monitoring

**Design principles**:
- **Per-user isolation**: All data scoped by `user_id`
- **Filesystem-backed**: Notes stored as `.md` files, metadata in SQLite
- **Version-controlled**: Integer version counter for optimistic concurrency
- **Search-optimized**: SQLite FTS5 for full-text search, separate indexes for tags and links

---

## Entity Relationship Diagram

```mermaid
erDiagram
    USER ||--|| VAULT : owns
    VAULT ||--o{ NOTE : contains
    NOTE ||--o{ WIKILINK : has_outgoing
    NOTE ||--o{ WIKILINK : has_incoming
    NOTE ||--o{ TAG : tagged_with
    USER ||--|| INDEX_HEALTH : tracks
    USER ||--o{ TOKEN : issues

    USER {
        string user_id PK
        string hf_username
        string hf_name
        string hf_avatar_url
        datetime created
    }

    VAULT {
        string user_id FK
        string root_path
        int note_count
    }

    NOTE {
        string user_id FK
        string note_path PK
        int version
        string title
        json metadata
        string body
        datetime created
        datetime updated
        int size_bytes
    }

    WIKILINK {
        string user_id FK
        string source_path FK
        string target_path FK
        string link_text
        bool is_resolved
    }

    TAG {
        string user_id FK
        string tag_name
        string note_path FK
    }

    INDEX_HEALTH {
        string user_id PK
        int note_count
        datetime last_full_rebuild
        datetime last_incremental_update
    }

    TOKEN {
        string token_id PK
        string user_id FK
        datetime issued_at
        datetime expires_at
        string token_type
    }
```

**Key relationships**:
- One user owns one vault (1:1)
- One vault contains many notes (1:N)
- One note has many outgoing wikilinks (1:N)
- One note may be referenced by many backlinks (1:N)
- One note can have many tags (N:M via junction table)
- One user has one index health record (1:1)
- One user can issue many tokens (1:N)

---

## Core Entities

### User

Represents an authenticated user with HF OAuth identity.

**Attributes**:
- `user_id` (string, PK): Internal unique identifier, derived from HF username or UUID
- `hf_username` (string, nullable): HuggingFace username (e.g., "alice")
- `hf_name` (string, nullable): Display name from HF profile
- `hf_avatar_url` (string, nullable): Profile picture URL
- `created` (datetime): Account creation timestamp (ISO 8601)

**Notes**:
- In local mode, `user_id = "local-dev"` with null HF fields
- In HF Space mode, `user_id = hf_username` (normalized to lowercase)
- `created` timestamp set on first OAuth login (vault initialization)

**Lifecycle**:
1. User authenticates via HF OAuth
2. Backend maps HF identity to `user_id`
3. If new user, create vault directory and initialize index
4. Return user info to frontend
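
The identity mapping in step 2 can be sketched as below. This is a minimal illustration, not the project's implementation: the function name `resolve_user_id` is hypothetical, and it assumes that a missing HF username signals local mode.

```python
from typing import Optional

LOCAL_USER_ID = "local-dev"  # Fixed user_id used in local mode


def resolve_user_id(hf_username: Optional[str]) -> str:
    """Map an HF identity to the internal user_id (hypothetical helper)."""
    # Assumption: no HF username means the server runs in local mode.
    if hf_username is None:
        return LOCAL_USER_ID

    # HF Space mode: the user_id is the lowercased HF username.
    user_id = hf_username.strip().lower()
    if not user_id:
        raise ValueError("HF username must not be empty")
    return user_id
```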

---

### Vault

A per-user directory tree containing Markdown notes.

**Attributes**:
- `user_id` (string, FK): Owner of the vault
- `root_path` (string): Absolute filesystem path to vault root (e.g., `/data/vaults/alice/`)
- `note_count` (int): Cached count of notes in vault (denormalized from index)

**Constraints**:
- Max 5,000 notes per vault (enforced by FR-008)
- Root path must exist and be writable
- Directory structure is arbitrary (user-defined nested folders)

**Filesystem layout example**:
```
/data/vaults/alice/
├── README.md
├── api/
│   ├── design.md
│   └── endpoints.md
├── guides/
│   ├── setup.md
│   └── deployment.md
└── notes/
    └── meeting-2025-01-15.md
```

---

### Note

A Markdown file with optional YAML frontmatter and body content.

**Attributes**:
- `user_id` (string, FK): Owner of the note
- `note_path` (string, PK): Relative path to vault root, includes `.md` (e.g., `api/design.md`)
- `version` (int): Optimistic concurrency version counter (starts at 1, increments on write)
- `title` (string): Display title (from frontmatter, first H1, or filename stem)
- `metadata` (JSON): Frontmatter key-value pairs (excludes auto-managed fields)
- `body` (string): Markdown content (excluding frontmatter)
- `created` (datetime, ISO 8601): Creation timestamp (auto-set if not in frontmatter)
- `updated` (datetime, ISO 8601): Last modification timestamp (auto-set on every write)
- `size_bytes` (int): UTF-8 byte size of full file content (frontmatter + body)

**Constraints**:
- `note_path` max 256 characters, Unix-style separators (`/`), no `..` allowed
- `size_bytes` max 1 MiB (1,048,576 bytes) per FR-007
- `version` stored in index, NOT in frontmatter
- `created` and `updated` stored in index, MAY appear in frontmatter (frontmatter is source of truth on read)

**Title resolution priority** (FR-006):
1. `metadata.get('title')` from frontmatter
2. First `# Heading` in body (H1 only)
3. Filename stem (e.g., `design.md` → "design")
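
A minimal sketch of this priority order follows. The function name `resolve_title` and the H1 regex are illustrative assumptions; in particular, the regex does not skip `#` lines inside fenced code blocks, which a real parser would need to handle.

```python
import re
from pathlib import PurePosixPath

# Matches a level-1 heading only: one '#', then spaces or tabs, then text.
H1_PATTERN = re.compile(r"^#[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def resolve_title(note_path: str, metadata: dict, body: str) -> str:
    """Resolve a display title: frontmatter, then first H1, then filename stem."""
    # 1. Frontmatter title, if present and non-empty
    title = metadata.get("title")
    if isinstance(title, str) and title.strip():
        return title.strip()

    # 2. First H1 heading in the body
    match = H1_PATTERN.search(body)
    if match:
        return match.group(1)

    # 3. Filename stem, e.g. "api/design.md" -> "design"
    return PurePosixPath(note_path).stem
```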

**Metadata fields** (common, but arbitrary):
- `tags` (array of strings): Tag names for categorization
- `project` (string): Project identifier
- `created` (datetime): User-provided creation timestamp
- `updated` (datetime): User-provided update timestamp
- Custom fields allowed (JSON object)

---

## Index Entities

### Wikilink

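Represents a link from one note to another. Each link is stored once, on its source note; the reverse direction (backlinks) is derived by querying on `target_path`.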

**Attributes**:
- `user_id` (string, FK): Owner of the notes
- `source_path` (string, FK): Path of note containing the wikilink
- `target_path` (string, nullable, FK): Resolved path of linked note (null if unresolved)
- `link_text` (string): Original text from `[[link text]]`
- `is_resolved` (bool): True if `target_path` is non-null, false if broken link

**Extraction**:
- Regex pattern: `\[\[([^\]]+)\]\]`
- Extract all matches from note body on every write
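
A Python sketch of the extraction step, using the pattern above. The name `extract_wikilinks` is illustrative, and de-duplicating repeated link texts is an assumption made here so that each `(source_path, link_text)` pair is recorded once.

```python
import re

WIKILINK_PATTERN = re.compile(r"\[\[([^\]]+)\]\]")


def extract_wikilinks(body: str) -> list[str]:
    """Return unique link texts in order of first appearance."""
    seen: set[str] = set()
    links: list[str] = []
    for match in WIKILINK_PATTERN.finditer(body):
        text = match.group(1)
        if text not in seen:  # Assumption: repeated links are stored once
            seen.add(text)
            links.append(text)
    return links
```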

**Resolution algorithm** (FR-015, FR-016):
1. Normalize `link_text` to slug: lowercase, replace spaces/underscores with dash, strip non-alphanumeric
2. Match normalized slug against:
   - Normalized filename stems (e.g., `api-design.md` → "api-design")
   - Normalized frontmatter titles (e.g., `title: "API Design"` → "api-design")
3. If multiple matches:
   - Prefer same-folder match (e.g., `[[design]]` in a note under `api/` resolves to `api/design.md` over `guides/design.md`)
   - Tiebreaker: lexicographically smallest path
4. If no match: `target_path = null`, `is_resolved = false`

**Slug normalization function**:
```python
import re

def normalize_slug(text: str) -> str:
    text = text.lower()
    text = re.sub(r'[\s_]+', '-', text)  # Spaces/underscores → dash
    text = re.sub(r'[^a-z0-9-]', '', text)  # Keep alphanumeric + dash
    text = re.sub(r'-+', '-', text)  # Collapse dashes
    return text.strip('-')
```
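
The resolution steps can be sketched end to end as follows. This is an illustration under stated assumptions: `resolve_wikilink` is a hypothetical name, and the candidate notes are passed in as a plain dict of `note_path` to frontmatter title rather than read from the index. `normalize_slug` is repeated so the block runs on its own.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def normalize_slug(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[\s_]+", "-", text)
    text = re.sub(r"[^a-z0-9-]", "", text)
    text = re.sub(r"-+", "-", text)
    return text.strip("-")


def resolve_wikilink(
    link_text: str,
    source_path: str,
    notes: dict[str, Optional[str]],  # note_path -> frontmatter title (or None)
) -> Optional[str]:
    """Return the resolved target path, or None for a broken link."""
    slug = normalize_slug(link_text)
    if not slug:
        return None

    # Step 2: match against filename stems and frontmatter titles
    candidates = []
    for path, title in notes.items():
        stem_slug = normalize_slug(PurePosixPath(path).stem)
        title_slug = normalize_slug(title) if title else None
        if slug in (stem_slug, title_slug):
            candidates.append(path)

    if not candidates:
        return None  # Step 4: unresolved

    # Step 3: prefer the source note's folder, then the smallest path
    source_dir = PurePosixPath(source_path).parent
    same_folder = [p for p in candidates if PurePosixPath(p).parent == source_dir]
    return min(same_folder or candidates)
```

With notes `api/design.md` and `guides/design.md` both present, `[[design]]` written in `api/endpoints.md` resolves to `api/design.md`, while the same link written in `notes/todo.md` falls through to the lexicographic tiebreaker and also yields `api/design.md`.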

**Backlinks**:
- To get backlinks for `note_path`, query: `WHERE target_path = note_path`
- Backlinks are automatically updated when any note's wikilinks change

---

### Tag

A metadata label applied to notes for categorization.

**Attributes**:
- `user_id` (string, FK): Owner of the notes
- `note_path` (string, FK): Path of tagged note
- `tag_name` (string): Tag identifier (lowercase, alphanumeric + hyphens)

**Constraints**:
- Many-to-many relationship: one note can have multiple tags, one tag can apply to multiple notes
- Tag names normalized: lowercase, strip whitespace
- Extracted from frontmatter `tags: [tag1, tag2]` array

**Tag count**:
- Computed via `COUNT(DISTINCT note_path) GROUP BY tag_name`
- Used for tag cloud and filtering UI
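
A small sketch of tag extraction from frontmatter under the normalization rule above. The name `extract_tags` is illustrative, and silently ignoring malformed values (instead of rejecting the note) is an assumption made for brevity.

```python
def extract_tags(metadata: dict) -> list[str]:
    """Return normalized, de-duplicated tags from frontmatter metadata."""
    raw = metadata.get("tags") or []
    if not isinstance(raw, list):
        return []  # Assumption: a non-list `tags` value yields no tags here

    # Lowercase, strip whitespace, drop empties and duplicates
    tags = {t.strip().lower() for t in raw if isinstance(t, str) and t.strip()}
    return sorted(tags)
```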

---

### Index Health

Tracks the state and freshness of per-user indices.

**Attributes**:
- `user_id` (string, PK): Owner of the index
- `note_count` (int): Total number of notes indexed
- `last_full_rebuild` (datetime, nullable, ISO 8601): Timestamp of last full index rebuild
- `last_incremental_update` (datetime, nullable, ISO 8601): Timestamp of last incremental update (write/delete)

**Usage**:
- Displayed in UI as index health indicator
- Used to detect stale indices (e.g., `note_count` mismatch with actual file count)
- Manual rebuild sets `last_full_rebuild = now()` (FR-019)
- Every write/delete sets `last_incremental_update = now()` (FR-018)
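
One way to detect the `note_count` mismatch mentioned above is to compare the indexed count with the number of `.md` files on disk. The helper names below are hypothetical, and the sketch assumes every `.md` file under the vault root counts as a note.

```python
from pathlib import Path


def count_vault_notes(vault_root: str) -> int:
    """Count Markdown files under the vault root, including nested folders."""
    return sum(1 for p in Path(vault_root).rglob("*.md") if p.is_file())


def is_index_stale(vault_root: str, indexed_note_count: int) -> bool:
    """True if the index's cached note_count disagrees with the filesystem."""
    return count_vault_notes(vault_root) != indexed_note_count
```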

---

## Authentication Entities

### Token (JWT)

A signed JSON Web Token used for API and MCP authentication.

**JWT Claims** (payload):
- `sub` (string): Subject (user_id)
- `iat` (int): Issued at timestamp (Unix epoch)
- `exp` (int): Expiration timestamp (Unix epoch, iat + 90 days)

**Header**:
- `alg: "HS256"`: HMAC SHA-256 signature algorithm
- `typ: "JWT"`: Token type

**Signature**:
- Signed with server secret (env var `JWT_SECRET_KEY`)
- Validated on every API/MCP request via `Authorization: Bearer <token>` header

**Token lifecycle**:
1. User authenticates via HF OAuth
2. User calls `POST /api/tokens` to issue JWT
3. Frontend stores token in memory (React context)
4. MCP clients pass token to `auth` parameter (FastMCP HTTP transport)
5. Server validates token on every request, extracts `user_id` from `sub` claim
6. Token expires after 90 days, user must re-authenticate

**Example token**:
```json
{
  "header": {
    "alg": "HS256",
    "typ": "JWT"
  },
  "payload": {
    "sub": "alice",
    "iat": 1736956800,
    "exp": 1744732800
  },
  "signature": "<HMAC-SHA256-signature>"
}
```

Encoded: `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhbGljZSIsImlhdCI6MTczNjk1NjgwMCwiZXhwIjoxNzQ0NzMyODAwfQ.<signature>`
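
The payload segment is only base64url-encoded, not encrypted, so anyone holding the token can read the claims. The sketch below decodes the example token's payload without verifying the signature; `decode_jwt_payload` is an illustrative helper, suitable for inspection only and never for authentication.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # base64url segments in JWTs omit padding; restore it before decoding
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


token = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJzdWIiOiJhbGljZSIsImlhdCI6MTczNjk1NjgwMCwiZXhwIjoxNzQ0NzMyODAwfQ."
    "<signature>"
)
claims = decode_jwt_payload(token)
lifetime_days = (claims["exp"] - claims["iat"]) // 86_400
```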

---

## SQLite Schema

Complete DDL for multi-index storage with per-user isolation.

### Core Tables

#### note_metadata

Stores note metadata for fast lookups and version tracking.

```sql
CREATE TABLE IF NOT EXISTS note_metadata (
    user_id TEXT NOT NULL,
    note_path TEXT NOT NULL,
    version INTEGER NOT NULL DEFAULT 1,
    title TEXT NOT NULL,
    created TEXT NOT NULL,  -- ISO 8601 timestamp
    updated TEXT NOT NULL,  -- ISO 8601 timestamp
    size_bytes INTEGER NOT NULL DEFAULT 0,
    normalized_title_slug TEXT,  -- Pre-computed for wikilink resolution
    normalized_path_slug TEXT,   -- Pre-computed for wikilink resolution
    PRIMARY KEY (user_id, note_path)
);

CREATE INDEX IF NOT EXISTS idx_metadata_user ON note_metadata(user_id);
CREATE INDEX IF NOT EXISTS idx_metadata_updated ON note_metadata(user_id, updated DESC);
CREATE INDEX IF NOT EXISTS idx_metadata_title_slug ON note_metadata(user_id, normalized_title_slug);
CREATE INDEX IF NOT EXISTS idx_metadata_path_slug ON note_metadata(user_id, normalized_path_slug);
```

**Notes**:
- Composite primary key: `(user_id, note_path)`
- `version` starts at 1, increments on every write
- `normalized_*_slug` columns enable indexed wikilink resolution (a B-tree lookup instead of a scan over all notes)
- Index on `updated DESC` for recency-based sorting
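
The version counter supports optimistic concurrency with a single conditional `UPDATE`: the row changes only if the caller's expected version still matches. The sketch below is illustrative; `bump_version` is a hypothetical helper and the table is a reduced copy of `note_metadata` so the block runs on its own.

```python
import sqlite3


def bump_version(conn: sqlite3.Connection, user_id: str, note_path: str,
                 if_version: int, updated_iso: str) -> bool:
    """Increment version only if it still equals if_version; False on conflict."""
    cur = conn.execute(
        "UPDATE note_metadata SET version = version + 1, updated = ? "
        "WHERE user_id = ? AND note_path = ? AND version = ?",
        (updated_iso, user_id, note_path, if_version),
    )
    return cur.rowcount == 1  # 0 rows updated means a stale version (or no note)


# Demo against an in-memory database (reduced schema for illustration)
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE note_metadata (
        user_id TEXT NOT NULL,
        note_path TEXT NOT NULL,
        version INTEGER NOT NULL DEFAULT 1,
        title TEXT NOT NULL,
        created TEXT NOT NULL,
        updated TEXT NOT NULL,
        PRIMARY KEY (user_id, note_path)
    )
""")
conn.execute(
    "INSERT INTO note_metadata (user_id, note_path, title, created, updated) "
    "VALUES (?, ?, ?, ?, ?)",
    ("alice", "api/design.md", "API Design",
     "2025-01-10T09:00:00Z", "2025-01-10T09:00:00Z"),
)
first = bump_version(conn, "alice", "api/design.md", 1, "2025-01-15T14:30:00Z")
second = bump_version(conn, "alice", "api/design.md", 1, "2025-01-15T14:31:00Z")
```

The second call reuses the stale version `1` and updates nothing, which is the conflict case a caller would surface to the user.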

---

#### note_fts

Full-text search index using SQLite FTS5.

```sql
CREATE VIRTUAL TABLE IF NOT EXISTS note_fts USING fts5(
    user_id UNINDEXED,
    note_path UNINDEXED,
    title,
    body,
    tokenize='porter unicode61',  -- Stemming + Unicode support
    prefix='2 3'  -- Prefix indexes for autocomplete
);
```

**Notes**:
- The table stores its own content (no `content=''` option). A contentless table returns NULL for `user_id`, `note_path`, and `title`, and rejects plain `DELETE` statements unless created with `contentless_delete=1` (SQLite 3.43+), so it could not serve the query below or the delete-and-insert update pattern
- `UNINDEXED` columns are retrievable but not searchable (used for IDs)
- `porter` tokenizer: English stemming (e.g., "running" matches "run")
- `prefix='2 3'`: Enables fast `MATCH 'prefix*'` queries (2-char and 3-char prefixes)
- Manual row management: On write, `DELETE` old row + `INSERT` new row
- Ranking: `bm25()` takes one weight per column in declaration order, including `UNINDEXED` columns, so use `bm25(note_fts, 0.0, 0.0, 3.0, 1.0)` for title weight=3x, body weight=1x
- `bm25()` returns lower (more negative) values for better matches, so results are sorted ascending

**Query example**:
```sql
SELECT
    note_path,
    title,
    bm25(note_fts, 0.0, 0.0, 3.0, 1.0) AS rank
FROM note_fts
WHERE user_id = ? AND note_fts MATCH ?
ORDER BY rank  -- Ascending: best matches have the lowest bm25() value
LIMIT 50;
```

**Safety**: Incoming queries are split on non-alphanumeric characters into terms. Each term keeps an optional trailing `*` for prefix searches and is wrapped in double quotes before being passed to `MATCH`. Quoting neutralizes FTS5 operators such as `AND`, `OR`, `NOT`, and `NEAR`, drops punctuation such as apostrophes and ampersands, and prevents FTS5 query syntax errors while preserving simple keyword semantics.
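
A minimal sketch of such a sanitizer follows. The function name `sanitize_fts_query` is hypothetical, and placing the `*` after the closing quote is an assumption based on FTS5 syntax, where a `*` that follows a quoted string marks its final token as a prefix.

```python
import re

# One term = a run of letters/digits, optionally followed by '*'
TERM_PATTERN = re.compile(r"[^\W_]+\*?")


def sanitize_fts_query(raw: str) -> str:
    """Turn free-form user input into a safe FTS5 MATCH expression."""
    parts = []
    for term in TERM_PATTERN.findall(raw):
        if term.endswith("*"):
            parts.append(f'"{term[:-1]}"*')  # Prefix search: "api"*
        else:
            parts.append(f'"{term}"')        # Quoted, so AND/OR/NOT are literals
    # Terms separated by spaces are implicitly ANDed by FTS5
    return " ".join(parts)
```

An input with no alphanumeric characters produces an empty string; callers should return an empty result set in that case rather than pass it to `MATCH`.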

---

#### note_tags

Many-to-many junction table for note-tag relationships.

```sql
CREATE TABLE IF NOT EXISTS note_tags (
    user_id TEXT NOT NULL,
    note_path TEXT NOT NULL,
    tag TEXT NOT NULL,
    PRIMARY KEY (user_id, note_path, tag)
);

CREATE INDEX IF NOT EXISTS idx_tags_user_tag ON note_tags(user_id, tag);
CREATE INDEX IF NOT EXISTS idx_tags_user_path ON note_tags(user_id, note_path);
```

**Notes**:
- Composite primary key prevents duplicate tag assignments
- Index on `(user_id, tag)` for "all notes with tag X" queries
- Index on `(user_id, note_path)` for "all tags for note Y" queries

**Query examples**:
```sql
-- Get all notes with tag "backend"
SELECT DISTINCT note_path, title
FROM note_tags t
JOIN note_metadata m USING (user_id, note_path)
WHERE t.user_id = ? AND t.tag = ?
ORDER BY m.updated DESC;

-- Get tag counts for user
SELECT tag, COUNT(DISTINCT note_path) as count
FROM note_tags
WHERE user_id = ?
GROUP BY tag
ORDER BY count DESC;
```

---

#### note_links

Stores wikilink graph for backlink navigation.

```sql
CREATE TABLE IF NOT EXISTS note_links (
    user_id TEXT NOT NULL,
    source_path TEXT NOT NULL,
    target_path TEXT,  -- NULL if unresolved
    link_text TEXT NOT NULL,
    is_resolved INTEGER NOT NULL DEFAULT 0,  -- Boolean: 0=broken, 1=resolved
    PRIMARY KEY (user_id, source_path, link_text)
);

CREATE INDEX IF NOT EXISTS idx_links_user_source ON note_links(user_id, source_path);
CREATE INDEX IF NOT EXISTS idx_links_user_target ON note_links(user_id, target_path);
CREATE INDEX IF NOT EXISTS idx_links_unresolved ON note_links(user_id, is_resolved);
```

**Notes**:
- `target_path` is nullable (null = broken link)
- `is_resolved` is integer (0 or 1) for SQLite boolean representation
- Composite primary key prevents duplicate links from same source with same text
- Index on `target_path` enables fast backlink queries

**Query examples**:
```sql
-- Get backlinks for a note
SELECT DISTINCT l.source_path, m.title
FROM note_links l
JOIN note_metadata m ON l.user_id = m.user_id AND l.source_path = m.note_path
WHERE l.user_id = ? AND l.target_path = ?
ORDER BY m.updated DESC;

-- Get all unresolved links for user
SELECT source_path, link_text
FROM note_links
WHERE user_id = ? AND is_resolved = 0;
```

---

#### index_health

Tracks index state and freshness per user.

```sql
CREATE TABLE IF NOT EXISTS index_health (
    user_id TEXT PRIMARY KEY,
    note_count INTEGER NOT NULL DEFAULT 0,
    last_full_rebuild TEXT,  -- ISO 8601 timestamp
    last_incremental_update TEXT  -- ISO 8601 timestamp
);
```

**Notes**:
- One row per user
- `last_full_rebuild` set on manual rebuild (FR-042)
- `last_incremental_update` set on every write/delete (FR-018)
- `note_count` is denormalized cache for quick health checks

---

### Initialization Script

Complete schema initialization:

```sql
-- Enable FTS5 extension (usually built-in)
-- PRAGMA compile_options;  -- Check if FTS5 is available

BEGIN TRANSACTION;

-- Core metadata table
CREATE TABLE IF NOT EXISTS note_metadata (
    user_id TEXT NOT NULL,
    note_path TEXT NOT NULL,
    version INTEGER NOT NULL DEFAULT 1,
    title TEXT NOT NULL,
    created TEXT NOT NULL,
    updated TEXT NOT NULL,
    size_bytes INTEGER NOT NULL DEFAULT 0,
    normalized_title_slug TEXT,
    normalized_path_slug TEXT,
    PRIMARY KEY (user_id, note_path)
);

CREATE INDEX IF NOT EXISTS idx_metadata_user ON note_metadata(user_id);
CREATE INDEX IF NOT EXISTS idx_metadata_updated ON note_metadata(user_id, updated DESC);
CREATE INDEX IF NOT EXISTS idx_metadata_title_slug ON note_metadata(user_id, normalized_title_slug);
CREATE INDEX IF NOT EXISTS idx_metadata_path_slug ON note_metadata(user_id, normalized_path_slug);

-- Full-text search index
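-- No content='' option: a contentless table would not return user_id,
-- note_path, or title, and would reject plain DELETE statements
CREATE VIRTUAL TABLE IF NOT EXISTS note_fts USING fts5(
    user_id UNINDEXED,
    note_path UNINDEXED,
    title,
    body,
    tokenize='porter unicode61',
    prefix='2 3'
);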

-- Tag index
CREATE TABLE IF NOT EXISTS note_tags (
    user_id TEXT NOT NULL,
    note_path TEXT NOT NULL,
    tag TEXT NOT NULL,
    PRIMARY KEY (user_id, note_path, tag)
);

CREATE INDEX IF NOT EXISTS idx_tags_user_tag ON note_tags(user_id, tag);
CREATE INDEX IF NOT EXISTS idx_tags_user_path ON note_tags(user_id, note_path);

-- Link graph
CREATE TABLE IF NOT EXISTS note_links (
    user_id TEXT NOT NULL,
    source_path TEXT NOT NULL,
    target_path TEXT,
    link_text TEXT NOT NULL,
    is_resolved INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, source_path, link_text)
);

CREATE INDEX IF NOT EXISTS idx_links_user_source ON note_links(user_id, source_path);
CREATE INDEX IF NOT EXISTS idx_links_user_target ON note_links(user_id, target_path);
CREATE INDEX IF NOT EXISTS idx_links_unresolved ON note_links(user_id, is_resolved);

-- Index health tracking
CREATE TABLE IF NOT EXISTS index_health (
    user_id TEXT PRIMARY KEY,
    note_count INTEGER NOT NULL DEFAULT 0,
    last_full_rebuild TEXT,
    last_incremental_update TEXT
);

COMMIT;
```
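
From Python, the script can be applied with `sqlite3.Connection.executescript`. The sketch below also checks for FTS5 first, using the `PRAGMA compile_options` check hinted at in the script's comment. Both function names are hypothetical, and the check only covers FTS5 compiled into SQLite, not FTS5 loaded as a runtime extension.

```python
import sqlite3


def fts5_available(conn: sqlite3.Connection) -> bool:
    """Return True if this SQLite build reports the FTS5 compile option."""
    options = {row[0] for row in conn.execute("PRAGMA compile_options")}
    return "ENABLE_FTS5" in options


def init_schema(conn: sqlite3.Connection, schema_sql: str) -> None:
    """Apply the initialization script; IF NOT EXISTS makes re-runs harmless."""
    if not fts5_available(conn):
        raise RuntimeError("SQLite was built without FTS5")
    conn.executescript(schema_sql)
```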

---

## Pydantic Models

Python data models using Pydantic for validation and serialization.

### User Models

```python
from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional


class HFProfile(BaseModel):
    """HuggingFace OAuth profile information."""
    username: str = Field(..., description="HF username")
    name: Optional[str] = Field(None, description="Display name")
    avatar_url: Optional[str] = Field(None, description="Profile picture URL")


class User(BaseModel):
    """User account with authentication info."""
    user_id: str = Field(..., min_length=1, max_length=64, description="Internal user ID")
    hf_profile: Optional[HFProfile] = Field(None, description="HF OAuth profile")
    vault_path: str = Field(..., description="Absolute path to user's vault")
    created: datetime = Field(..., description="Account creation timestamp")

    class Config:
        json_schema_extra = {
            "example": {
                "user_id": "alice",
                "hf_profile": {
                    "username": "alice",
                    "name": "Alice Smith",
                    "avatar_url": "https://cdn-avatars.huggingface.co/v1/alice"
                },
                "vault_path": "/data/vaults/alice",
                "created": "2025-01-15T10:30:00Z"
            }
        }
```

---

### Note Models

```python
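from datetime import datetime
from pathlib import Path
from typing import Optional
import re

from pydantic import BaseModel, Field, validator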


class NoteMetadata(BaseModel):
    """Frontmatter metadata (arbitrary key-value pairs)."""
    title: Optional[str] = None
    tags: Optional[list[str]] = None
    project: Optional[str] = None
    created: Optional[datetime] = None
    updated: Optional[datetime] = None

    class Config:
        extra = "allow"  # Allow arbitrary fields


class Note(BaseModel):
    """Complete note with content and metadata."""
    user_id: str = Field(..., description="Owner user ID")
    note_path: str = Field(
        ...,
        min_length=1,
        max_length=256,
        description="Relative path to vault root (includes .md)"
    )
    version: int = Field(..., ge=1, description="Optimistic concurrency version")
    title: str = Field(..., min_length=1, description="Display title")
    metadata: NoteMetadata = Field(default_factory=NoteMetadata, description="Frontmatter")
    body: str = Field(..., description="Markdown content")
    created: datetime = Field(..., description="Creation timestamp")
    updated: datetime = Field(..., description="Last update timestamp")
    size_bytes: int = Field(..., ge=0, le=1_048_576, description="File size in bytes")

    @validator("note_path")
    def validate_path(cls, v):
        """Validate note path format."""
        # Must end with .md
        if not v.endswith('.md'):
            raise ValueError("Note path must end with .md")

        # Must not contain ..
        if '..' in v:
            raise ValueError("Note path must not contain '..'")

        # Must use Unix-style separators
        if '\\' in v:
            raise ValueError("Note path must use Unix-style separators (/)")

        # Must not start with /
        if v.startswith('/'):
            raise ValueError("Note path must be relative (no leading /)")

        return v

    class Config:
        json_schema_extra = {
            "example": {
                "user_id": "alice",
                "note_path": "api/design.md",
                "version": 5,
                "title": "API Design",
                "metadata": {
                    "tags": ["backend", "api"],
                    "project": "auth-service"
                },
                "body": "# API Design\n\nThis document describes...",
                "created": "2025-01-10T09:00:00Z",
                "updated": "2025-01-15T14:30:00Z",
                "size_bytes": 4096
            }
        }


class NoteCreate(BaseModel):
    """Request to create a new note."""
    note_path: str = Field(..., min_length=1, max_length=256)
    title: Optional[str] = None
    metadata: Optional[NoteMetadata] = None
    body: str = Field(..., max_length=1_048_576)


class NoteUpdate(BaseModel):
    """Request to update an existing note."""
    title: Optional[str] = None
    metadata: Optional[NoteMetadata] = None
    body: str = Field(..., max_length=1_048_576)
    if_version: Optional[int] = Field(None, ge=1, description="Expected version for concurrency check")


class NoteSummary(BaseModel):
    """Lightweight note summary for listings."""
    note_path: str
    title: str
    updated: datetime
```

---

### Index Models

```python
class Wikilink(BaseModel):
    """Bidirectional link between notes."""
    user_id: str
    source_path: str
    target_path: Optional[str] = Field(None, description="Null if unresolved")
    link_text: str
    is_resolved: bool

    class Config:
        json_schema_extra = {
            "example": {
                "user_id": "alice",
                "source_path": "api/design.md",
                "target_path": "api/endpoints.md",
                "link_text": "Endpoints",
                "is_resolved": True
            }
        }


class Tag(BaseModel):
    """Tag with note count."""
    tag_name: str
    count: int = Field(..., ge=0)


class IndexHealth(BaseModel):
    """Index state and freshness metrics."""
    user_id: str
    note_count: int = Field(..., ge=0)
    last_full_rebuild: Optional[datetime] = None
    last_incremental_update: Optional[datetime] = None

    class Config:
        json_schema_extra = {
            "example": {
                "user_id": "alice",
                "note_count": 142,
                "last_full_rebuild": "2025-01-01T00:00:00Z",
                "last_incremental_update": "2025-01-15T14:30:00Z"
            }
        }
```

---

### Search Models

```python
class SearchResult(BaseModel):
    """Full-text search result with snippet."""
    note_path: str
    title: str
    snippet: str = Field(..., description="Highlighted excerpt from body")
    score: float = Field(..., description="Relevance score (title 3x, body 1x, recency bonus)")
    updated: datetime


class SearchRequest(BaseModel):
    """Full-text search query."""
    query: str = Field(..., min_length=1, max_length=256)
    limit: int = Field(50, ge=1, le=100)
```

---

### Authentication Models

```python
class TokenResponse(BaseModel):
    """JWT token issuance response."""
    token: str = Field(..., description="JWT access token")
    token_type: str = Field("bearer", description="Token type")
    expires_at: datetime = Field(..., description="Expiration timestamp")


class JWTPayload(BaseModel):
    """JWT claims payload."""
    sub: str = Field(..., description="Subject (user_id)")
    iat: int = Field(..., description="Issued at (Unix timestamp)")
    exp: int = Field(..., description="Expiration (Unix timestamp)")
```

---

## TypeScript Type Definitions

Frontend type definitions for API contracts.

### Core Types

```typescript
/**
 * User account with HF profile
 */
export interface User {
  user_id: string;
  hf_profile?: {
    username: string;
    name?: string;
    avatar_url?: string;
  };
  vault_path: string;
  created: string;  // ISO 8601
}

/**
 * Note metadata (frontmatter)
 */
export interface NoteMetadata {
  title?: string;
  tags?: string[];
  project?: string;
  created?: string;  // ISO 8601
  updated?: string;  // ISO 8601
  [key: string]: unknown;  // Arbitrary fields
}

/**
 * Complete note with content
 */
export interface Note {
  user_id: string;
  note_path: string;
  version: number;
  title: string;
  metadata: NoteMetadata;
  body: string;
  created: string;  // ISO 8601
  updated: string;  // ISO 8601
  size_bytes: number;
}

/**
 * Lightweight note summary for listings
 */
export interface NoteSummary {
  note_path: string;
  title: string;
  updated: string;  // ISO 8601
}

/**
 * Request to create a note
 */
export interface NoteCreateRequest {
  note_path: string;
  title?: string;
  metadata?: NoteMetadata;
  body: string;
}

/**
 * Request to update a note
 */
export interface NoteUpdateRequest {
  title?: string;
  metadata?: NoteMetadata;
  body: string;
  if_version?: number;  // Optimistic concurrency
}

/**
 * Wikilink with resolution status
 */
export interface Wikilink {
  user_id: string;
  source_path: string;
  target_path: string | null;  // Null if unresolved
  link_text: string;
  is_resolved: boolean;
}

/**
 * Tag with note count
 */
export interface Tag {
  tag_name: string;
  count: number;
}

/**
 * Index health metrics
 */
export interface IndexHealth {
  user_id: string;
  note_count: number;
  last_full_rebuild: string | null;  // ISO 8601
  last_incremental_update: string | null;  // ISO 8601
}

/**
 * Search result with snippet
 */
export interface SearchResult {
  note_path: string;
  title: string;
  snippet: string;
  score: number;
  updated: string;  // ISO 8601
}

/**
 * JWT token response
 */
export interface TokenResponse {
  token: string;
  token_type: "bearer";
  expires_at: string;  // ISO 8601
}

/**
 * API error response
 */
export interface APIError {
  error: string;
  message: string;
  detail?: Record<string, unknown>;
}
```

---

### Validation Helpers

```typescript
/**
 * Validate note path format
 */
export function isValidNotePath(path: string): boolean {
  return (
    path.length > 0 &&
    path.length <= 256 &&
    path.endsWith('.md') &&
    !path.includes('..') &&
    !path.includes('\\') &&
    !path.startsWith('/')
  );
}

/**
 * Normalize tag name (lowercase, trim)
 */
export function normalizeTag(tag: string): string {
  return tag.toLowerCase().trim();
}

/**
 * Normalize slug for wikilink resolution
 */
export function normalizeSlug(text: string): string {
  return text
    .toLowerCase()
    .replace(/[\s_]+/g, '-')  // Spaces/underscores → dash
    .replace(/[^a-z0-9-]/g, '')  // Keep alphanumeric + dash
    .replace(/-+/g, '-')  // Collapse dashes
    .replace(/^-+|-+$/g, '');  // Trim dashes
}

/**
 * Extract wikilinks from markdown body
 */
export function extractWikilinks(markdown: string): string[] {
  const pattern = /\[\[([^\]]+)\]\]/g;
  const matches: string[] = [];
  let match;

  while ((match = pattern.exec(markdown)) !== null) {
    matches.push(match[1]);
  }

  return matches;
}
```

---

## Validation Rules

Comprehensive validation constraints for all entities.

### Note Path Validation

```python
import re
from pathlib import Path

def validate_note_path(path: str) -> tuple[bool, str]:
    """
    Validate note path format.

    Returns (is_valid, error_message).
    """
    # Length check
    if not path or len(path) > 256:
        return False, "Path must be 1-256 characters"

    # Must end with .md
    if not path.endswith('.md'):
        return False, "Path must end with .md"

    # Must not contain ..
    if '..' in path:
        return False, "Path must not contain '..'"

    # Must use Unix-style separators
    if '\\' in path:
        return False, "Path must use Unix separators (/)"

    # Must be relative
    if path.startswith('/'):
        return False, "Path must be relative (no leading /)"

    # Must not have invalid characters
    invalid_chars = ['<', '>', ':', '"', '|', '?', '*']
    if any(c in path for c in invalid_chars):
        return False, f"Path contains invalid characters: {invalid_chars}"

    return True, ""


def sanitize_path(user_id: str, vault_root: str, note_path: str) -> Path:
    """
    Sanitize and resolve note path within vault.

    Raises ValueError if path escapes vault root.
    """
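    vault = (Path(vault_root) / user_id).resolve()
    full_path = (vault / note_path).resolve()

    # Ensure path is within vault. Compare path components, not string
    # prefixes, so a sibling directory such as "alice-evil" cannot pass
    # for "alice" (Path.is_relative_to requires Python 3.9+).
    if not full_path.is_relative_to(vault):
        raise ValueError(f"Path escapes vault root: {note_path}")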

    return full_path
```

---

### Note Content Validation

```python
def validate_note_content(body: str) -> tuple[bool, str]:
    """
    Validate note content.

    Returns (is_valid, error_message).
    """
    # UTF-8 validity (e.g., lone surrogates cannot be encoded).
    # Encode once, inside the try block, so the error is actually caught.
    try:
        encoded = body.encode('utf-8')
    except UnicodeEncodeError as e:
        return False, f"Invalid UTF-8 encoding: {e}"

    # Size check (1 MiB max)
    size_bytes = len(encoded)
    if size_bytes > 1_048_576:
        return False, f"Note exceeds 1 MiB limit ({size_bytes} bytes)"

    return True, ""


def validate_frontmatter(metadata: dict) -> tuple[bool, str]:
    """
    Validate frontmatter metadata.

    Returns (is_valid, error_message).
    """
    # Check for reserved fields
    reserved = ['version']  # Version is managed by index, not frontmatter
    for key in metadata.keys():
        if key in reserved:
            return False, f"Field '{key}' is reserved and cannot be set in frontmatter"

    # Validate tags format
    if 'tags' in metadata:
        tags = metadata['tags']
        if not isinstance(tags, list):
            return False, "Field 'tags' must be an array"

        if not all(isinstance(t, str) for t in tags):
            return False, "All tags must be strings"

    return True, ""
```

---

### Vault Limits

```python
def check_vault_limit(user_id: str, db) -> tuple[bool, str]:
    """
    Check if vault is within note limit.

    Returns (is_allowed, error_message).
    """
    cursor = db.execute(
        "SELECT note_count FROM index_health WHERE user_id = ?",
        (user_id,)
    )
    row = cursor.fetchone()

    if row is None:
        return True, ""  # New vault, no limit yet

    note_count = row[0]

    if note_count >= 5000:
        return False, "Vault note limit reached (max 5,000 notes)"

    return True, ""
```

---

### Token Validation

```python
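import os
from datetime import datetime, timedelta, timezone

import jwt

SECRET_KEY = os.environ["JWT_SECRET_KEY"]  # Server secret, read from the environment

def create_jwt(user_id: str) -> str:
    """Create JWT with 90-day expiration."""
    # Use an aware UTC datetime: calling .timestamp() on a naive utcnow()
    # value treats it as local time and skews iat/exp on non-UTC hosts.
    now = datetime.now(timezone.utc)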
    payload = {
        "sub": user_id,
        "iat": int(now.timestamp()),
        "exp": int((now + timedelta(days=90)).timestamp())
    }
    return jwt.encode(payload, SECRET_KEY, algorithm="HS256")


def validate_jwt(token: str) -> tuple[bool, str, str]:
    """
    Validate JWT and extract user_id.

    Returns (is_valid, user_id, error_message).
    """
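    try:
        # Require "sub" and "exp" so a token missing either claim is
        # rejected as invalid instead of raising KeyError below
        payload = jwt.decode(
            token,
            SECRET_KEY,
            algorithms=["HS256"],
            options={"require": ["sub", "exp"]},
        )
        user_id = payload["sub"]
        return True, user_id, ""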

    except jwt.ExpiredSignatureError:
        return False, "", "Token expired"

    except jwt.InvalidTokenError as e:
        return False, "", f"Invalid token: {e}"
```

---

## State Transitions

State machines for entity lifecycle and version management.

### Note Lifecycle

```mermaid
stateDiagram-v2
    [*] --> Creating: write_note (new path)
    Creating --> Active: save to filesystem + insert metadata
    Active --> Updating: write_note (existing path)
    Updating --> Active: increment version + update index
    Active --> Deleting: delete_note
    Deleting --> [*]: remove file + delete index rows

    Updating --> ConflictDetected: if_version mismatch (UI writes only)
    ConflictDetected --> Active: reload or discard changes
```

**State descriptions**:

1. **Creating**: Note does not exist, first write in progress
   - Validate path and content
   - Set `version = 1`, `created = now()`, `updated = now()`
   - Write file to filesystem
   - Insert rows into `note_metadata`, `note_fts`, `note_tags`, `note_links`
   - Increment `index_health.note_count`

2. **Active**: Note exists and is readable/editable
   - Can be read via API/MCP
   - Can be updated via write_note
   - Can be deleted via delete_note

3. **Updating**: Modification in progress
   - Load current metadata (version, timestamps)
   - If UI write: check `if_version` matches current version
   - If version mismatch: transition to ConflictDetected
   - If MCP write: skip version check (last-write-wins)
   - Increment `version`, set `updated = now()`
   - Update file content
   - Update all index rows (delete old, insert new)

4. **ConflictDetected**: Optimistic concurrency conflict (UI only)
   - Return `409 Conflict` with current and expected versions
   - UI displays error: "Note changed since you opened it"
   - User options: reload, save as copy, or discard changes

5. **Deleting**: Removal in progress
   - Delete file from filesystem
   - Delete rows from `note_metadata`, `note_fts`, `note_tags`, `note_links`
   - Update backlinks (any note linking to the deleted note now has an unresolved link)
   - Decrement `index_health.note_count`

---

### Version Increment Logic

```python
class ConflictError(Exception):
    """Raised when if_version does not match the stored version."""


def increment_version(user_id: str, note_path: str, if_version: int | None, db) -> int:
    """
    Increment note version with optional concurrency check.

    Returns new version number.
    Raises ConflictError if if_version doesn't match.
    """
    # Get current version
    cursor = db.execute(
        "SELECT version FROM note_metadata WHERE user_id = ? AND note_path = ?",
        (user_id, note_path)
    )
    row = cursor.fetchone()

    if row is None:
        # New note
        return 1

    current_version = row[0]

    # Optimistic concurrency check (UI writes only)
    if if_version is not None and current_version != if_version:
        raise ConflictError(
            f"Version conflict: expected {if_version}, current is {current_version}"
        )

    # Increment version
    new_version = current_version + 1

    return new_version
```
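
The read-then-increment pattern above leaves a gap between reading the version and writing the new one. One way to close that gap is to fold the check into a single conditional `UPDATE` and inspect `rowcount`. The following is a minimal sketch, not the specified implementation: `bump_version` is a hypothetical helper, and it assumes only that `note_metadata` has `user_id`, `note_path`, and `version` columns.

```python
import sqlite3
from typing import Optional


class ConflictError(Exception):
    """Raised when if_version does not match the stored version."""


def bump_version(db: sqlite3.Connection, user_id: str, note_path: str,
                 if_version: Optional[int] = None) -> int:
    """Increment the stored version in one conditional UPDATE (hypothetical helper)."""
    sql = ("UPDATE note_metadata SET version = version + 1 "
           "WHERE user_id = ? AND note_path = ?")
    params = [user_id, note_path]
    if if_version is not None:
        # UI write: the row is only touched if the version still matches
        sql += " AND version = ?"
        params.append(if_version)

    cur = db.execute(sql, params)
    row = db.execute(
        "SELECT version FROM note_metadata WHERE user_id = ? AND note_path = ?",
        (user_id, note_path),
    ).fetchone()

    if row is None:
        raise LookupError(f"Note not found: {note_path}")
    if cur.rowcount == 0:
        # The row exists but the conditional UPDATE matched nothing
        raise ConflictError(
            f"Version conflict: expected {if_version}, current is {row[0]}"
        )
    return row[0]
```

Passing `if_version=None` keeps the last-write-wins behavior for MCP writes, while a UI write with a stale `if_version` leaves the row untouched and raises `ConflictError`.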

---

### Index Update Workflow

```mermaid
stateDiagram-v2
    [*] --> IncrementalUpdate: write_note or delete_note
    IncrementalUpdate --> DeleteOldRows: start transaction
    DeleteOldRows --> ExtractMetadata: parse frontmatter + body
    ExtractMetadata --> InsertNewRows: insert into all index tables
    InsertNewRows --> UpdateHealth: set last_incremental_update
    UpdateHealth --> [*]: commit transaction

    [*] --> FullRebuild: POST /api/index/rebuild
    FullRebuild --> DropUserRows: delete all rows for user_id
    DropUserRows --> ScanVault: walk filesystem tree
    ScanVault --> ProcessNote: for each .md file
    ProcessNote --> ExtractMetadata
    ProcessNote --> ScanVault: next file
    ScanVault --> UpdateHealth: set last_full_rebuild
    UpdateHealth --> [*]: commit transaction
```

**Incremental update** (on every write/delete; for a delete, steps 3-5 are skipped because there is no content to re-index):
1. Start SQLite transaction
2. Delete all existing rows for `(user_id, note_path)` from:
   - `note_metadata`
   - `note_fts`
   - `note_tags`
   - `note_links`
3. Parse note content (frontmatter + body)
4. Extract: title, tags, wikilinks
5. Insert new rows into all index tables
6. Resolve wikilinks and update `is_resolved` flags
7. Update `index_health.last_incremental_update = now()`
8. Commit transaction
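
The incremental steps above can be sketched as a single transactional function. This is an illustration under stated assumptions, not the project's code: `SCHEMA` is a simplified stand-in for the real index tables, `reindex_note` and `WIKILINK_RE` are hypothetical names, the caller supplies the already-parsed title and tags, and both `note_fts` and wikilink resolution (step 6) are left out for brevity.

```python
import re
import sqlite3
from datetime import datetime, timezone

# Simplified stand-in for the index tables (assumption, not the real schema)
SCHEMA = """
CREATE TABLE note_metadata (user_id TEXT, note_path TEXT, version INTEGER,
    title TEXT, updated TEXT, PRIMARY KEY (user_id, note_path));
CREATE TABLE note_tags (user_id TEXT, note_path TEXT, tag TEXT,
    PRIMARY KEY (user_id, note_path, tag));
CREATE TABLE note_links (user_id TEXT, source_path TEXT, target_path TEXT,
    link_text TEXT, is_resolved INTEGER DEFAULT 0,
    PRIMARY KEY (user_id, source_path, link_text));
CREATE TABLE index_health (user_id TEXT PRIMARY KEY,
    note_count INTEGER DEFAULT 0, last_full_rebuild TEXT,
    last_incremental_update TEXT);
"""

# Matches [[Target]] and [[Target|alias]], capturing only the target text
WIKILINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def reindex_note(db, user_id, note_path, title, tags, body, version):
    """Replace all index rows for one note inside one transaction."""
    now = datetime.now(timezone.utc).isoformat()
    links = sorted({m.strip() for m in WIKILINK_RE.findall(body)})
    norm_tags = sorted({t.lower() for t in tags})  # tags are stored lowercase
    key = (user_id, note_path)

    with db:  # commits on success, rolls back if any statement raises
        db.execute("DELETE FROM note_metadata WHERE user_id = ? AND note_path = ?", key)
        db.execute("DELETE FROM note_tags WHERE user_id = ? AND note_path = ?", key)
        db.execute("DELETE FROM note_links WHERE user_id = ? AND source_path = ?", key)
        db.execute(
            "INSERT INTO note_metadata (user_id, note_path, version, title, updated) "
            "VALUES (?, ?, ?, ?, ?)",
            (user_id, note_path, version, title, now),
        )
        db.executemany(
            "INSERT INTO note_tags (user_id, note_path, tag) VALUES (?, ?, ?)",
            [(user_id, note_path, t) for t in norm_tags],
        )
        db.executemany(
            "INSERT INTO note_links (user_id, source_path, link_text) VALUES (?, ?, ?)",
            [(user_id, note_path, link) for link in links],
        )
        db.execute("INSERT OR IGNORE INTO index_health (user_id) VALUES (?)", (user_id,))
        db.execute(
            "UPDATE index_health SET last_incremental_update = ? WHERE user_id = ?",
            (now, user_id),
        )
```

Because the deletes and inserts share one transaction, a reader never sees a note with its old rows removed and its new rows missing.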

**Full rebuild** (manual trigger):
1. Start SQLite transaction
2. Delete all rows for `user_id` from all index tables
3. Walk vault directory tree, find all `.md` files
4. For each file:
   - Parse frontmatter + body
   - Extract metadata, tags, wikilinks
   - Insert rows into all index tables
5. Resolve all wikilinks (second pass after all notes indexed)
6. Update `index_health.note_count` and `last_full_rebuild = now()`
7. Commit transaction
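
A two-pass rebuild along these lines might look like the sketch below. Several things here are assumptions and not part of the spec: the simplified `SCHEMA`, the hypothetical `rebuild_index` helper, the rule that a wikilink resolves when its text equals a note's file stem (case-insensitive), and the use of the file stem as the title. The source does not say how version numbers survive a rebuild, so the sketch carries existing versions across to keep them monotonic.

```python
import re
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

# Simplified stand-in for the index tables (assumption, not the real schema)
SCHEMA = """
CREATE TABLE note_metadata (user_id TEXT, note_path TEXT, version INTEGER,
    title TEXT, PRIMARY KEY (user_id, note_path));
CREATE TABLE note_links (user_id TEXT, source_path TEXT, target_path TEXT,
    link_text TEXT, is_resolved INTEGER,
    PRIMARY KEY (user_id, source_path, link_text));
CREATE TABLE index_health (user_id TEXT PRIMARY KEY,
    note_count INTEGER DEFAULT 0, last_full_rebuild TEXT);
"""

WIKILINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def rebuild_index(db: sqlite3.Connection, user_id: str, vault_root: Path) -> int:
    """Rebuild metadata and link rows for one user; returns the note count."""
    now = datetime.now(timezone.utc).isoformat()
    with db:  # one transaction for the whole rebuild
        # Remember versions so the rebuild does not reset them (assumption)
        old_versions = dict(db.execute(
            "SELECT note_path, version FROM note_metadata WHERE user_id = ?",
            (user_id,),
        ))
        db.execute("DELETE FROM note_metadata WHERE user_id = ?", (user_id,))
        db.execute("DELETE FROM note_links WHERE user_id = ?", (user_id,))

        stems = {}    # lowercase file stem -> note_path
        pending = []  # (source_path, link_text) pairs awaiting resolution
        count = 0

        # First pass: index every note and collect its outgoing links
        for md_file in sorted(vault_root.rglob("*.md")):
            note_path = md_file.relative_to(vault_root).as_posix()
            body = md_file.read_text(encoding="utf-8")
            db.execute(
                "INSERT INTO note_metadata (user_id, note_path, version, title) "
                "VALUES (?, ?, ?, ?)",
                (user_id, note_path, old_versions.get(note_path, 1), md_file.stem),
            )
            stems[md_file.stem.lower()] = note_path
            pending.extend((note_path, m.strip())
                           for m in set(WIKILINK_RE.findall(body)))
            count += 1

        # Second pass: resolve links now that every note is known
        for source_path, link_text in set(pending):
            target = stems.get(link_text.lower())
            db.execute(
                "INSERT INTO note_links "
                "(user_id, source_path, target_path, link_text, is_resolved) "
                "VALUES (?, ?, ?, ?, ?)",
                (user_id, source_path, target, link_text, int(target is not None)),
            )

        db.execute("INSERT OR IGNORE INTO index_health (user_id) VALUES (?)", (user_id,))
        db.execute(
            "UPDATE index_health SET note_count = ?, last_full_rebuild = ? "
            "WHERE user_id = ?",
            (count, now, user_id),
        )
    return count
```

Deferring resolution to the second pass means a link to a note that sorts later in the directory walk still resolves.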

---

## Relationships and Constraints

### Foreign Key Relationships

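SQLite supports foreign keys, but enforcement is off by default and must be enabled per connection with `PRAGMA foreign_keys = ON`. We leave it off for performance reasons: every query is already scoped by `user_id`, so constraint checks on each write add cost without adding isolation. Referential integrity is maintained at the application level instead.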

**Logical relationships**:
- `note_metadata.user_id` → `User.user_id`
- `note_tags.note_path` → `note_metadata.note_path`
- `note_links.source_path` → `note_metadata.note_path`
- `note_links.target_path` → `note_metadata.note_path` (nullable)

**Cascade semantics** (application-enforced):
- On delete note: cascade delete from `note_tags`, `note_links` (source), `note_fts`
- On delete note: update `note_links` (target) to set `is_resolved = false` and `target_path = NULL`, so that no row points at a note that no longer exists
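
Since the database does not cascade on its own, the delete path has to do it explicitly. The sketch below shows one possible shape, using a simplified `SCHEMA` and a hypothetical `delete_note_index` helper; `note_fts` is omitted, and clearing `target_path` on backlinks is an assumption made so that a non-null `target_path` always refers to an existing note.

```python
import sqlite3

# Simplified stand-in for the index tables (assumption, not the real schema)
SCHEMA = """
CREATE TABLE note_metadata (user_id TEXT, note_path TEXT, version INTEGER,
    PRIMARY KEY (user_id, note_path));
CREATE TABLE note_tags (user_id TEXT, note_path TEXT, tag TEXT,
    PRIMARY KEY (user_id, note_path, tag));
CREATE TABLE note_links (user_id TEXT, source_path TEXT, target_path TEXT,
    link_text TEXT, is_resolved INTEGER,
    PRIMARY KEY (user_id, source_path, link_text));
CREATE TABLE index_health (user_id TEXT PRIMARY KEY, note_count INTEGER);
"""


def delete_note_index(db: sqlite3.Connection, user_id: str, note_path: str) -> None:
    """Application-level cascade for a deleted note (hypothetical helper)."""
    key = (user_id, note_path)
    with db:  # all-or-nothing: commit on success, roll back on error
        # Cascade: rows owned by the deleted note
        db.execute("DELETE FROM note_tags WHERE user_id = ? AND note_path = ?", key)
        db.execute("DELETE FROM note_links WHERE user_id = ? AND source_path = ?", key)
        db.execute("DELETE FROM note_metadata WHERE user_id = ? AND note_path = ?", key)
        # Backlinks: other notes' links to this note become unresolved
        db.execute(
            "UPDATE note_links SET is_resolved = 0, target_path = NULL "
            "WHERE user_id = ? AND target_path = ?",
            key,
        )
        # Keep the per-user counter in step, never dropping below zero
        db.execute(
            "UPDATE index_health SET note_count = MAX(note_count - 1, 0) "
            "WHERE user_id = ?",
            (user_id,),
        )
```

The link rows of other notes are kept, only marked unresolved, so they can resolve again if a note is later created at the same path.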

---

### Uniqueness Constraints

| Table | Unique Constraint | Enforced By |
|-------|------------------|-------------|
| `note_metadata` | `(user_id, note_path)` | PRIMARY KEY |
| `note_tags` | `(user_id, note_path, tag)` | PRIMARY KEY |
| `note_links` | `(user_id, source_path, link_text)` | PRIMARY KEY |
| `index_health` | `user_id` | PRIMARY KEY |

---

### Cardinality

| Relationship | Type | Notes |
|-------------|------|-------|
| User → Vault | 1:1 | One user owns one vault |
| Vault → Notes | 1:N | One vault contains many notes (max 5,000) |
| Note → Tags | N:M | Many-to-many via `note_tags` junction table |
| Note → Outgoing Links | 1:N | One note has many outgoing wikilinks |
| Note → Backlinks | 1:N | One note may be referenced by many backlinks |
| User → Tokens | 1:N | One user can issue multiple JWT tokens |

---

### Invariants

Critical invariants maintained by the system:

1. **Version monotonicity**: `note.version` only increases (never decreases or resets)
2. **Timestamp ordering**: `note.created <= note.updated` always
3. **Path uniqueness**: No two notes with the same `(user_id, note_path)` can exist
4. **Size limit**: `note.size_bytes <= 1_048_576` always enforced
5. **Vault limit**: each `user_id` has at most 5,000 notes; checked before any write that creates a note
6. **Link consistency**: If `note_links.target_path` is not null, target note must exist
7. **Tag normalization**: All `note_tags.tag` values are lowercase
8. **Index freshness**: `index_health.last_incremental_update` is always >= most recent `note_metadata.updated` for that user
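
Several of these invariants can be spot-checked with plain SQL, which may be useful in tests or a health endpoint. The sketch below is illustrative only: `CHECKS` and `find_violations` are hypothetical names, the `SCHEMA` is a simplified stand-in, and the timestamp check assumes `created` and `updated` are stored as ISO 8601 strings that compare correctly as text. Note that SQLite's built-in `LOWER()` only folds ASCII letters.

```python
import sqlite3

# Simplified stand-in for the index tables (assumption, not the real schema)
SCHEMA = """
CREATE TABLE note_metadata (user_id TEXT, note_path TEXT, created TEXT,
    updated TEXT, size_bytes INTEGER, PRIMARY KEY (user_id, note_path));
CREATE TABLE note_tags (user_id TEXT, note_path TEXT, tag TEXT,
    PRIMARY KEY (user_id, note_path, tag));
CREATE TABLE note_links (user_id TEXT, source_path TEXT, target_path TEXT,
    link_text TEXT, is_resolved INTEGER,
    PRIMARY KEY (user_id, source_path, link_text));
"""

# Each query counts the rows that violate one invariant for a given user
CHECKS = {
    "timestamp_ordering":
        "SELECT COUNT(*) FROM note_metadata "
        "WHERE user_id = ? AND created > updated",
    "size_limit":
        "SELECT COUNT(*) FROM note_metadata "
        "WHERE user_id = ? AND size_bytes > 1048576",
    "tag_normalization":
        "SELECT COUNT(*) FROM note_tags "
        "WHERE user_id = ? AND tag <> LOWER(tag)",
    "link_consistency":
        "SELECT COUNT(*) FROM note_links l "
        "WHERE l.user_id = ? AND l.target_path IS NOT NULL "
        "AND NOT EXISTS (SELECT 1 FROM note_metadata m "
        "WHERE m.user_id = l.user_id AND m.note_path = l.target_path)",
}


def find_violations(db: sqlite3.Connection, user_id: str) -> dict[str, int]:
    """Return {invariant_name: violating_row_count}, omitting clean checks."""
    violations = {}
    for name, sql in CHECKS.items():
        count = db.execute(sql, (user_id,)).fetchone()[0]
        if count:
            violations[name] = count
    return violations
```

An empty result means none of the checked invariants is violated for that user; the remaining invariants (version monotonicity, index freshness) need history or cross-table timestamps and are not covered here.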

---

## Appendix

### Common Queries Reference

```sql
-- Get all notes for user, sorted by recent update
SELECT note_path, title, updated
FROM note_metadata
WHERE user_id = ?
ORDER BY updated DESC
LIMIT 100;

-- Full-text search with title boost
SELECT
    note_path,
    title,
    snippet(note_fts, 3, '<mark>', '</mark>', '...', 32) AS snippet,
    bm25(note_fts, 3.0, 1.0) AS score
FROM note_fts
WHERE user_id = ? AND note_fts MATCH ?
ORDER BY score  -- bm25() returns lower (more negative) values for better matches
LIMIT 50;

-- Get all tags with counts
SELECT tag, COUNT(DISTINCT note_path) as count
FROM note_tags
WHERE user_id = ?
GROUP BY tag
ORDER BY count DESC;

-- Get backlinks for a note
SELECT DISTINCT l.source_path, m.title
FROM note_links l
JOIN note_metadata m ON l.user_id = m.user_id AND l.source_path = m.note_path
WHERE l.user_id = ? AND l.target_path = ?
ORDER BY m.updated DESC;

-- Get unresolved wikilinks for a user, with the number of notes using each
-- (the primary key already makes (source_path, link_text) unique per user)
SELECT link_text, COUNT(*) AS occurrences
FROM note_links
WHERE user_id = ? AND is_resolved = 0
GROUP BY link_text
ORDER BY occurrences DESC;

-- Check index health
SELECT note_count, last_full_rebuild, last_incremental_update
FROM index_health
WHERE user_id = ?;
```

---

### Migration Strategy

For future schema changes:

```python
# Example migration: Add column to note_metadata
def migrate_v1_to_v2(db):
    """Add normalized_title_slug column."""
    db.execute("""
        ALTER TABLE note_metadata
        ADD COLUMN normalized_title_slug TEXT;
    """)

    # Backfill existing notes
    db.execute("""
        UPDATE note_metadata
        SET normalized_title_slug = LOWER(
            REPLACE(REPLACE(title, ' ', '-'), '_', '-')
        );
    """)

    db.execute("""
        CREATE INDEX idx_metadata_title_slug
        ON note_metadata(user_id, normalized_title_slug);
    """)

    db.commit()
```
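
To apply migrations like this exactly once per database, one common approach is to track the schema version in SQLite's built-in `PRAGMA user_version`. The runner below is a sketch of that idea; `MIGRATIONS`, `migrate`, and `add_title_slug` are hypothetical names, and the source does not state how schema versions are tracked.

```python
import sqlite3


def add_title_slug(db: sqlite3.Connection) -> None:
    """Schema v2: add the slug column (backfill and index omitted here)."""
    db.execute("ALTER TABLE note_metadata ADD COLUMN normalized_title_slug TEXT")


# Target schema version -> function that upgrades to it (hypothetical registry)
MIGRATIONS = {
    2: add_title_slug,
}


def migrate(db: sqlite3.Connection) -> int:
    """Apply every pending migration in order; returns the final version."""
    current = db.execute("PRAGMA user_version").fetchone()[0]
    for target in sorted(MIGRATIONS):
        if target > current:
            MIGRATIONS[target](db)
            # PRAGMA does not accept bound parameters; target is a trusted int
            db.execute(f"PRAGMA user_version = {target}")
            db.commit()
            current = target
    return current
```

Calling `migrate(db)` at startup is then safe to repeat: a database already at the latest version is left untouched.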

---

**Document Status**: Draft
**Last Updated**: 2025-11-15
**Next Review**: After Phase 1 implementation