bigwolfe
sql query fully fixed
be2299f
# Data Model: Multi-Tenant Obsidian-Like Docs Viewer
**Feature Branch**: `001-obsidian-docs-viewer`
**Created**: 2025-11-15
**Status**: Draft
## Table of Contents
1. [Overview](#overview)
2. [Entity Relationship Diagram](#entity-relationship-diagram)
3. [Core Entities](#core-entities)
4. [Index Entities](#index-entities)
5. [Authentication Entities](#authentication-entities)
6. [SQLite Schema](#sqlite-schema)
7. [Pydantic Models](#pydantic-models)
8. [TypeScript Type Definitions](#typescript-type-definitions)
9. [Validation Rules](#validation-rules)
10. [State Transitions](#state-transitions)
11. [Relationships and Constraints](#relationships-and-constraints)
---
## Overview
This document defines the complete data model for a multi-tenant Obsidian-like documentation viewer. The system stores:
- **User accounts** with HF OAuth identity mapping
- **Vaults** as per-user directory trees containing Markdown notes
- **Notes** with YAML frontmatter, version tracking, and full-text indexing
- **Wikilinks** for bidirectional linking between notes
- **Tags** for categorization and filtering
- **Index metadata** for search optimization and health monitoring
**Design principles**:
- **Per-user isolation**: All data scoped by `user_id`
- **Filesystem-backed**: Notes stored as `.md` files, metadata in SQLite
- **Version-controlled**: Integer version counter for optimistic concurrency
- **Search-optimized**: SQLite FTS5 for full-text search, separate indexes for tags and links
---
## Entity Relationship Diagram
```mermaid
erDiagram
USER ||--o{ VAULT : owns
VAULT ||--o{ NOTE : contains
NOTE ||--o{ WIKILINK : has_outgoing
NOTE ||--o{ WIKILINK : has_incoming
NOTE ||--o{ TAG : tagged_with
USER ||--|| INDEX_HEALTH : tracks
USER ||--o{ TOKEN : issues
USER {
string user_id PK
string hf_username
string hf_name
string hf_avatar_url
datetime created
}
VAULT {
string user_id FK
string root_path
int note_count
}
NOTE {
string user_id FK
string note_path PK
int version
string title
json metadata
string body
datetime created
datetime updated
int size_bytes
}
WIKILINK {
string user_id FK
string source_path FK
string target_path FK
string link_text
bool is_resolved
}
TAG {
string user_id FK
string tag_name
string note_path FK
}
INDEX_HEALTH {
string user_id PK
int note_count
datetime last_full_rebuild
datetime last_incremental_update
}
TOKEN {
string token_id PK
string user_id FK
datetime issued_at
datetime expires_at
string token_type
}
```
**Key relationships**:
- One user owns one vault (1:1)
- One vault contains many notes (1:N)
- One note has many outgoing wikilinks (1:N)
- One note may be referenced by many backlinks (1:N)
- One note can have many tags (N:M via junction table)
- One user has one index health record (1:1)
- One user can issue many tokens (1:N)
---
## Core Entities
### User
Represents an authenticated user with HF OAuth identity.
**Attributes**:
- `user_id` (string, PK): Internal unique identifier, derived from HF username or UUID
- `hf_username` (string, nullable): HuggingFace username (e.g., "alice")
- `hf_name` (string, nullable): Display name from HF profile
- `hf_avatar_url` (string, nullable): Profile picture URL
- `created` (datetime): Account creation timestamp (ISO 8601)
**Notes**:
- In local mode, `user_id = "local-dev"` with null HF fields
- In HF Space mode, `user_id = hf_username` (normalized to lowercase)
- `created` timestamp set on first OAuth login (vault initialization)
**Lifecycle**:
1. User authenticates via HF OAuth
2. Backend maps HF identity to `user_id`
3. If new user, create vault directory and initialize index
4. Return user info to frontend
---
### Vault
A per-user directory tree containing Markdown notes.
**Attributes**:
- `user_id` (string, FK): Owner of the vault
- `root_path` (string): Absolute filesystem path to vault root (e.g., `/data/vaults/alice/`)
- `note_count` (int): Cached count of notes in vault (denormalized from index)
**Constraints**:
- Max 5,000 notes per vault (enforced by FR-008)
- Root path must exist and be writable
- Directory structure is arbitrary (user-defined nested folders)
**Filesystem layout example**:
```
/data/vaults/alice/
β”œβ”€β”€ README.md
β”œβ”€β”€ api/
β”‚ β”œβ”€β”€ design.md
β”‚ └── endpoints.md
β”œβ”€β”€ guides/
β”‚ β”œβ”€β”€ setup.md
β”‚ └── deployment.md
└── notes/
└── meeting-2025-01-15.md
```
---
### Note
A Markdown file with optional YAML frontmatter and body content.
**Attributes**:
- `user_id` (string, FK): Owner of the note
- `note_path` (string, PK): Relative path to vault root, includes `.md` (e.g., `api/design.md`)
- `version` (int): Optimistic concurrency version counter (starts at 1, increments on write)
- `title` (string): Display title (from frontmatter, first H1, or filename stem)
- `metadata` (JSON): Frontmatter key-value pairs (excludes auto-managed fields)
- `body` (string): Markdown content (excluding frontmatter)
- `created` (datetime, ISO 8601): Creation timestamp (auto-set if not in frontmatter)
- `updated` (datetime, ISO 8601): Last modification timestamp (auto-set on every write)
- `size_bytes` (int): UTF-8 byte size of full file content (frontmatter + body)
**Constraints**:
- `note_path` max 256 characters, Unix-style separators (`/`), no `..` allowed
- `size_bytes` max 1 MiB (1,048,576 bytes) per FR-007
- `version` stored in index, NOT in frontmatter
- `created` and `updated` stored in index, MAY appear in frontmatter (frontmatter is source of truth on read)
**Title resolution priority** (FR-006):
1. `metadata.get('title')` from frontmatter
2. First `# Heading` in body (H1 only)
3. Filename stem (e.g., `design.md` β†’ "design")
**Metadata fields** (common, but arbitrary):
- `tags` (array of strings): Tag names for categorization
- `project` (string): Project identifier
- `created` (datetime): User-provided creation timestamp
- `updated` (datetime): User-provided update timestamp
- Custom fields allowed (JSON object)
---
## Index Entities
### Wikilink
Represents a bidirectional link between two notes.
**Attributes**:
- `user_id` (string, FK): Owner of the notes
- `source_path` (string, FK): Path of note containing the wikilink
- `target_path` (string, nullable, FK): Resolved path of linked note (null if unresolved)
- `link_text` (string): Original text from `[[link text]]`
- `is_resolved` (bool): True if `target_path` is non-null, false if broken link
**Extraction**:
- Regex pattern: `\[\[([^\]]+)\]\]`
- Extract all matches from note body on every write
**Resolution algorithm** (FR-015, FR-016):
1. Normalize `link_text` to slug: lowercase, replace spaces/underscores with dash, strip non-alphanumeric
2. Match normalized slug against:
- Normalized filename stems (e.g., `api-design.md` β†’ "api-design")
- Normalized frontmatter titles (e.g., `title: "API Design"` β†’ "api-design")
3. If multiple matches:
- Prefer same-folder match (e.g., `api/[[design]]` β†’ `api/design.md` over `guides/design.md`)
- Tiebreaker: lexicographically smallest path
4. If no match: `target_path = null`, `is_resolved = false`
**Slug normalization function**:
```python
import re
def normalize_slug(text: str) -> str:
text = text.lower()
text = re.sub(r'[\s_]+', '-', text) # Spaces/underscores β†’ dash
text = re.sub(r'[^a-z0-9-]', '', text) # Keep alphanumeric + dash
text = re.sub(r'-+', '-', text) # Collapse dashes
return text.strip('-')
```
**Backlinks**:
- To get backlinks for `note_path`, query: `WHERE target_path = note_path`
- Backlinks are automatically updated when any note's wikilinks change
---
### Tag
A metadata label applied to notes for categorization.
**Attributes**:
- `user_id` (string, FK): Owner of the notes
- `note_path` (string, FK): Path of tagged note
- `tag_name` (string): Tag identifier (lowercase, alphanumeric + hyphens)
**Constraints**:
- Many-to-many relationship: one note can have multiple tags, one tag can apply to multiple notes
- Tag names normalized: lowercase, strip whitespace
- Extracted from frontmatter `tags: [tag1, tag2]` array
**Tag count**:
- Computed via `COUNT(DISTINCT note_path) GROUP BY tag_name`
- Used for tag cloud and filtering UI
---
### Index Health
Tracks the state and freshness of per-user indices.
**Attributes**:
- `user_id` (string, PK): Owner of the index
- `note_count` (int): Total number of notes indexed
- `last_full_rebuild` (datetime, nullable, ISO 8601): Timestamp of last full index rebuild
- `last_incremental_update` (datetime, nullable, ISO 8601): Timestamp of last incremental update (write/delete)
**Usage**:
- Displayed in UI as index health indicator
- Used to detect stale indices (e.g., `note_count` mismatch with actual file count)
- Manual rebuild sets `last_full_rebuild = now()` (FR-019)
- Every write/delete sets `last_incremental_update = now()` (FR-018)
---
## Authentication Entities
### Token (JWT)
A signed JSON Web Token used for API and MCP authentication.
**JWT Claims** (payload):
- `sub` (string): Subject (user_id)
- `iat` (int): Issued at timestamp (Unix epoch)
- `exp` (int): Expiration timestamp (Unix epoch, iat + 90 days)
**Header**:
- `alg: "HS256"`: HMAC SHA-256 signature algorithm
- `typ: "JWT"`: Token type
**Signature**:
- Signed with server secret (env var `JWT_SECRET_KEY`)
- Validated on every API/MCP request via `Authorization: Bearer <token>` header
**Token lifecycle**:
1. User authenticates via HF OAuth
2. User calls `POST /api/tokens` to issue JWT
3. Frontend stores token in memory (React context)
4. MCP clients pass token to `auth` parameter (FastMCP HTTP transport)
5. Server validates token on every request, extracts `user_id` from `sub` claim
6. Token expires after 90 days, user must re-authenticate
**Example token**:
```json
{
"header": {
"alg": "HS256",
"typ": "JWT"
},
"payload": {
"sub": "alice",
"iat": 1736956800,
"exp": 1744732800
},
"signature": "<HMAC-SHA256-signature>"
}
```
Encoded: `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhbGljZSIsImlhdCI6MTczNjk1NjgwMCwiZXhwIjoxNzQ0NzMyODAwfQ.<signature>`
---
## SQLite Schema
Complete DDL for multi-index storage with per-user isolation.
### Core Tables
#### note_metadata
Stores note metadata for fast lookups and version tracking.
```sql
CREATE TABLE IF NOT EXISTS note_metadata (
user_id TEXT NOT NULL,
note_path TEXT NOT NULL,
version INTEGER NOT NULL DEFAULT 1,
title TEXT NOT NULL,
created TEXT NOT NULL, -- ISO 8601 timestamp
updated TEXT NOT NULL, -- ISO 8601 timestamp
size_bytes INTEGER NOT NULL DEFAULT 0,
normalized_title_slug TEXT, -- Pre-computed for wikilink resolution
normalized_path_slug TEXT, -- Pre-computed for wikilink resolution
PRIMARY KEY (user_id, note_path)
);
CREATE INDEX idx_metadata_user ON note_metadata(user_id);
CREATE INDEX idx_metadata_updated ON note_metadata(user_id, updated DESC);
CREATE INDEX idx_metadata_title_slug ON note_metadata(user_id, normalized_title_slug);
CREATE INDEX idx_metadata_path_slug ON note_metadata(user_id, normalized_path_slug);
```
**Notes**:
- Composite primary key: `(user_id, note_path)`
- `version` starts at 1, increments on every write
- `normalized_*_slug` columns enable O(1) wikilink resolution
- Index on `updated DESC` for recency-based sorting
---
#### note_fts
Full-text search index using SQLite FTS5.
```sql
CREATE VIRTUAL TABLE IF NOT EXISTS note_fts USING fts5(
user_id UNINDEXED,
note_path UNINDEXED,
title,
body,
content='', -- Contentless (external content pattern)
tokenize='porter unicode61', -- Stemming + Unicode support
prefix='2 3' -- Prefix indexes for autocomplete
);
```
**Notes**:
- `content=''` (contentless): We manually INSERT/DELETE rows, no automatic sync
- `UNINDEXED` columns are retrievable but not searchable (used for IDs)
- `porter` tokenizer: English stemming (e.g., "running" matches "run")
- `prefix='2 3'`: Enables fast `MATCH 'prefix*'` queries (2-char and 3-char prefixes)
- Manual row management: On write, `DELETE` old row + `INSERT` new row
- Ranking: Use `bm25(note_fts, 3.0, 1.0)` for title weight=3x, body weight=1x
**Query example**:
```sql
SELECT
note_path,
title,
bm25(note_fts, 3.0, 1.0) AS rank
FROM note_fts
WHERE user_id = ? AND note_fts MATCH ?
ORDER BY rank DESC
LIMIT 50;
```
**Safety**: Incoming queries are tokenized into alphanumeric terms (per requirement to split on non-alphanumeric characters), each optionally preserving a trailing `*` for prefix searches, then wrapped in double quotes before being passed to `MATCH`. This neutralizes MATCH operators, trims punctuation such as apostrophes/ampersands, and prevents SQL syntax errors while preserving simple keyword semantics.
---
#### note_tags
Many-to-many junction table for note-tag relationships.
```sql
CREATE TABLE IF NOT EXISTS note_tags (
user_id TEXT NOT NULL,
note_path TEXT NOT NULL,
tag TEXT NOT NULL,
PRIMARY KEY (user_id, note_path, tag)
);
CREATE INDEX idx_tags_user_tag ON note_tags(user_id, tag);
CREATE INDEX idx_tags_user_path ON note_tags(user_id, note_path);
```
**Notes**:
- Composite primary key prevents duplicate tag assignments
- Index on `(user_id, tag)` for "all notes with tag X" queries
- Index on `(user_id, note_path)` for "all tags for note Y" queries
**Query examples**:
```sql
-- Get all notes with tag "backend"
SELECT DISTINCT note_path, title
FROM note_tags t
JOIN note_metadata m USING (user_id, note_path)
WHERE t.user_id = ? AND t.tag = ?
ORDER BY m.updated DESC;
-- Get tag counts for user
SELECT tag, COUNT(DISTINCT note_path) as count
FROM note_tags
WHERE user_id = ?
GROUP BY tag
ORDER BY count DESC;
```
---
#### note_links
Stores wikilink graph for backlink navigation.
```sql
CREATE TABLE IF NOT EXISTS note_links (
user_id TEXT NOT NULL,
source_path TEXT NOT NULL,
target_path TEXT, -- NULL if unresolved
link_text TEXT NOT NULL,
is_resolved INTEGER NOT NULL DEFAULT 0, -- Boolean: 0=broken, 1=resolved
PRIMARY KEY (user_id, source_path, link_text)
);
CREATE INDEX idx_links_user_source ON note_links(user_id, source_path);
CREATE INDEX idx_links_user_target ON note_links(user_id, target_path);
CREATE INDEX idx_links_unresolved ON note_links(user_id, is_resolved);
```
**Notes**:
- `target_path` is nullable (null = broken link)
- `is_resolved` is integer (0 or 1) for SQLite boolean representation
- Composite primary key prevents duplicate links from same source with same text
- Index on `target_path` enables fast backlink queries
**Query examples**:
```sql
-- Get backlinks for a note
SELECT DISTINCT l.source_path, m.title
FROM note_links l
JOIN note_metadata m ON l.user_id = m.user_id AND l.source_path = m.note_path
WHERE l.user_id = ? AND l.target_path = ?
ORDER BY m.updated DESC;
-- Get all unresolved links for user
SELECT source_path, link_text
FROM note_links
WHERE user_id = ? AND is_resolved = 0;
```
---
#### index_health
Tracks index state and freshness per user.
```sql
CREATE TABLE IF NOT EXISTS index_health (
user_id TEXT PRIMARY KEY,
note_count INTEGER NOT NULL DEFAULT 0,
last_full_rebuild TEXT, -- ISO 8601 timestamp
last_incremental_update TEXT -- ISO 8601 timestamp
);
```
**Notes**:
- One row per user
- `last_full_rebuild` set on manual rebuild (FR-042)
- `last_incremental_update` set on every write/delete (FR-018)
- `note_count` is denormalized cache for quick health checks
---
### Initialization Script
Complete schema initialization:
```sql
-- Enable FTS5 extension (usually built-in)
-- PRAGMA compile_options; -- Check if FTS5 is available
BEGIN TRANSACTION;
-- Core metadata table
CREATE TABLE IF NOT EXISTS note_metadata (
user_id TEXT NOT NULL,
note_path TEXT NOT NULL,
version INTEGER NOT NULL DEFAULT 1,
title TEXT NOT NULL,
created TEXT NOT NULL,
updated TEXT NOT NULL,
size_bytes INTEGER NOT NULL DEFAULT 0,
normalized_title_slug TEXT,
normalized_path_slug TEXT,
PRIMARY KEY (user_id, note_path)
);
CREATE INDEX IF NOT EXISTS idx_metadata_user ON note_metadata(user_id);
CREATE INDEX IF NOT EXISTS idx_metadata_updated ON note_metadata(user_id, updated DESC);
CREATE INDEX IF NOT EXISTS idx_metadata_title_slug ON note_metadata(user_id, normalized_title_slug);
CREATE INDEX IF NOT EXISTS idx_metadata_path_slug ON note_metadata(user_id, normalized_path_slug);
-- Full-text search index
CREATE VIRTUAL TABLE IF NOT EXISTS note_fts USING fts5(
user_id UNINDEXED,
note_path UNINDEXED,
title,
body,
content='',
tokenize='porter unicode61',
prefix='2 3'
);
-- Tag index
CREATE TABLE IF NOT EXISTS note_tags (
user_id TEXT NOT NULL,
note_path TEXT NOT NULL,
tag TEXT NOT NULL,
PRIMARY KEY (user_id, note_path, tag)
);
CREATE INDEX IF NOT EXISTS idx_tags_user_tag ON note_tags(user_id, tag);
CREATE INDEX IF NOT EXISTS idx_tags_user_path ON note_tags(user_id, note_path);
-- Link graph
CREATE TABLE IF NOT EXISTS note_links (
user_id TEXT NOT NULL,
source_path TEXT NOT NULL,
target_path TEXT,
link_text TEXT NOT NULL,
is_resolved INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (user_id, source_path, link_text)
);
CREATE INDEX IF NOT EXISTS idx_links_user_source ON note_links(user_id, source_path);
CREATE INDEX IF NOT EXISTS idx_links_user_target ON note_links(user_id, target_path);
CREATE INDEX IF NOT EXISTS idx_links_unresolved ON note_links(user_id, is_resolved);
-- Index health tracking
CREATE TABLE IF NOT EXISTS index_health (
user_id TEXT PRIMARY KEY,
note_count INTEGER NOT NULL DEFAULT 0,
last_full_rebuild TEXT,
last_incremental_update TEXT
);
COMMIT;
```
---
## Pydantic Models
Python data models using Pydantic for validation and serialization.
### User Models
```python
from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional
class HFProfile(BaseModel):
"""HuggingFace OAuth profile information."""
username: str = Field(..., description="HF username")
name: Optional[str] = Field(None, description="Display name")
avatar_url: Optional[str] = Field(None, description="Profile picture URL")
class User(BaseModel):
"""User account with authentication info."""
user_id: str = Field(..., min_length=1, max_length=64, description="Internal user ID")
hf_profile: Optional[HFProfile] = Field(None, description="HF OAuth profile")
vault_path: str = Field(..., description="Absolute path to user's vault")
created: datetime = Field(..., description="Account creation timestamp")
class Config:
json_schema_extra = {
"example": {
"user_id": "alice",
"hf_profile": {
"username": "alice",
"name": "Alice Smith",
"avatar_url": "https://cdn-avatars.huggingface.co/v1/alice"
},
"vault_path": "/data/vaults/alice",
"created": "2025-01-15T10:30:00Z"
}
}
```
---
### Note Models
```python
from pathlib import Path
import re
class NoteMetadata(BaseModel):
"""Frontmatter metadata (arbitrary key-value pairs)."""
title: Optional[str] = None
tags: Optional[list[str]] = None
project: Optional[str] = None
created: Optional[datetime] = None
updated: Optional[datetime] = None
class Config:
extra = "allow" # Allow arbitrary fields
class Note(BaseModel):
"""Complete note with content and metadata."""
user_id: str = Field(..., description="Owner user ID")
note_path: str = Field(
...,
min_length=1,
max_length=256,
description="Relative path to vault root (includes .md)"
)
version: int = Field(..., ge=1, description="Optimistic concurrency version")
title: str = Field(..., min_length=1, description="Display title")
metadata: NoteMetadata = Field(default_factory=NoteMetadata, description="Frontmatter")
body: str = Field(..., description="Markdown content")
created: datetime = Field(..., description="Creation timestamp")
updated: datetime = Field(..., description="Last update timestamp")
size_bytes: int = Field(..., ge=0, le=1_048_576, description="File size in bytes")
@validator("note_path")
def validate_path(cls, v):
"""Validate note path format."""
# Must end with .md
if not v.endswith('.md'):
raise ValueError("Note path must end with .md")
# Must not contain ..
if '..' in v:
raise ValueError("Note path must not contain '..'")
# Must use Unix-style separators
if '\\' in v:
raise ValueError("Note path must use Unix-style separators (/)")
# Must not start with /
if v.startswith('/'):
raise ValueError("Note path must be relative (no leading /)")
return v
class Config:
json_schema_extra = {
"example": {
"user_id": "alice",
"note_path": "api/design.md",
"version": 5,
"title": "API Design",
"metadata": {
"tags": ["backend", "api"],
"project": "auth-service"
},
"body": "# API Design\n\nThis document describes...",
"created": "2025-01-10T09:00:00Z",
"updated": "2025-01-15T14:30:00Z",
"size_bytes": 4096
}
}
class NoteCreate(BaseModel):
"""Request to create a new note."""
note_path: str = Field(..., min_length=1, max_length=256)
title: Optional[str] = None
metadata: Optional[NoteMetadata] = None
body: str = Field(..., max_length=1_048_576)
class NoteUpdate(BaseModel):
"""Request to update an existing note."""
title: Optional[str] = None
metadata: Optional[NoteMetadata] = None
body: str = Field(..., max_length=1_048_576)
if_version: Optional[int] = Field(None, ge=1, description="Expected version for concurrency check")
class NoteSummary(BaseModel):
"""Lightweight note summary for listings."""
note_path: str
title: str
updated: datetime
```
---
### Index Models
```python
class Wikilink(BaseModel):
"""Bidirectional link between notes."""
user_id: str
source_path: str
target_path: Optional[str] = Field(None, description="Null if unresolved")
link_text: str
is_resolved: bool
class Config:
json_schema_extra = {
"example": {
"user_id": "alice",
"source_path": "api/design.md",
"target_path": "api/endpoints.md",
"link_text": "Endpoints",
"is_resolved": True
}
}
class Tag(BaseModel):
"""Tag with note count."""
tag_name: str
count: int = Field(..., ge=0)
class IndexHealth(BaseModel):
"""Index state and freshness metrics."""
user_id: str
note_count: int = Field(..., ge=0)
last_full_rebuild: Optional[datetime] = None
last_incremental_update: Optional[datetime] = None
class Config:
json_schema_extra = {
"example": {
"user_id": "alice",
"note_count": 142,
"last_full_rebuild": "2025-01-01T00:00:00Z",
"last_incremental_update": "2025-01-15T14:30:00Z"
}
}
```
---
### Search Models
```python
class SearchResult(BaseModel):
"""Full-text search result with snippet."""
note_path: str
title: str
snippet: str = Field(..., description="Highlighted excerpt from body")
score: float = Field(..., description="Relevance score (title 3x, body 1x, recency bonus)")
updated: datetime
class SearchRequest(BaseModel):
"""Full-text search query."""
query: str = Field(..., min_length=1, max_length=256)
limit: int = Field(50, ge=1, le=100)
```
---
### Authentication Models
```python
class TokenResponse(BaseModel):
"""JWT token issuance response."""
token: str = Field(..., description="JWT access token")
token_type: str = Field("bearer", description="Token type")
expires_at: datetime = Field(..., description="Expiration timestamp")
class JWTPayload(BaseModel):
"""JWT claims payload."""
sub: str = Field(..., description="Subject (user_id)")
iat: int = Field(..., description="Issued at (Unix timestamp)")
exp: int = Field(..., description="Expiration (Unix timestamp)")
```
---
## TypeScript Type Definitions
Frontend type definitions for API contracts.
### Core Types
```typescript
/**
* User account with HF profile
*/
export interface User {
user_id: string;
hf_profile?: {
username: string;
name?: string;
avatar_url?: string;
};
vault_path: string;
created: string; // ISO 8601
}
/**
* Note metadata (frontmatter)
*/
export interface NoteMetadata {
title?: string;
tags?: string[];
project?: string;
created?: string; // ISO 8601
updated?: string; // ISO 8601
[key: string]: unknown; // Arbitrary fields
}
/**
* Complete note with content
*/
export interface Note {
user_id: string;
note_path: string;
version: number;
title: string;
metadata: NoteMetadata;
body: string;
created: string; // ISO 8601
updated: string; // ISO 8601
size_bytes: number;
}
/**
* Lightweight note summary for listings
*/
export interface NoteSummary {
note_path: string;
title: string;
updated: string; // ISO 8601
}
/**
* Request to create a note
*/
export interface NoteCreateRequest {
note_path: string;
title?: string;
metadata?: NoteMetadata;
body: string;
}
/**
* Request to update a note
*/
export interface NoteUpdateRequest {
title?: string;
metadata?: NoteMetadata;
body: string;
if_version?: number; // Optimistic concurrency
}
/**
* Wikilink with resolution status
*/
export interface Wikilink {
user_id: string;
source_path: string;
target_path: string | null; // Null if unresolved
link_text: string;
is_resolved: boolean;
}
/**
* Tag with note count
*/
export interface Tag {
tag_name: string;
count: number;
}
/**
* Index health metrics
*/
export interface IndexHealth {
user_id: string;
note_count: number;
last_full_rebuild: string | null; // ISO 8601
last_incremental_update: string | null; // ISO 8601
}
/**
* Search result with snippet
*/
export interface SearchResult {
note_path: string;
title: string;
snippet: string;
score: number;
updated: string; // ISO 8601
}
/**
* JWT token response
*/
export interface TokenResponse {
token: string;
token_type: "bearer";
expires_at: string; // ISO 8601
}
/**
* API error response
*/
export interface APIError {
error: string;
message: string;
detail?: Record<string, unknown>;
}
```
---
### Validation Helpers
```typescript
/**
* Validate note path format
*/
export function isValidNotePath(path: string): boolean {
return (
path.length > 0 &&
path.length <= 256 &&
path.endsWith('.md') &&
!path.includes('..') &&
!path.includes('\\') &&
!path.startsWith('/')
);
}
/**
* Normalize tag name (lowercase, trim)
*/
export function normalizeTag(tag: string): string {
return tag.toLowerCase().trim();
}
/**
* Normalize slug for wikilink resolution
*/
export function normalizeSlug(text: string): string {
return text
.toLowerCase()
.replace(/[\s_]+/g, '-') // Spaces/underscores β†’ dash
.replace(/[^a-z0-9-]/g, '') // Keep alphanumeric + dash
.replace(/-+/g, '-') // Collapse dashes
.replace(/^-+|-+$/g, ''); // Trim dashes
}
/**
* Extract wikilinks from markdown body
*/
export function extractWikilinks(markdown: string): string[] {
const pattern = /\[\[([^\]]+)\]\]/g;
const matches: string[] = [];
let match;
while ((match = pattern.exec(markdown)) !== null) {
matches.push(match[1]);
}
return matches;
}
```
---
## Validation Rules
Comprehensive validation constraints for all entities.
### Note Path Validation
```python
import re
from pathlib import Path
def validate_note_path(path: str) -> tuple[bool, str]:
"""
Validate note path format.
Returns (is_valid, error_message).
"""
# Length check
if not path or len(path) > 256:
return False, "Path must be 1-256 characters"
# Must end with .md
if not path.endswith('.md'):
return False, "Path must end with .md"
# Must not contain ..
if '..' in path:
return False, "Path must not contain '..'"
# Must use Unix-style separators
if '\\' in path:
return False, "Path must use Unix separators (/)"
# Must be relative
if path.startswith('/'):
return False, "Path must be relative (no leading /)"
# Must not have invalid characters
invalid_chars = ['<', '>', ':', '"', '|', '?', '*']
if any(c in path for c in invalid_chars):
return False, f"Path contains invalid characters: {invalid_chars}"
return True, ""
def sanitize_path(user_id: str, vault_root: str, note_path: str) -> Path:
"""
Sanitize and resolve note path within vault.
Raises ValueError if path escapes vault root.
"""
vault = Path(vault_root) / user_id
full_path = (vault / note_path).resolve()
# Ensure path is within vault
if not str(full_path).startswith(str(vault.resolve())):
raise ValueError(f"Path escapes vault root: {note_path}")
return full_path
```
---
### Note Content Validation
```python
def validate_note_content(body: str) -> tuple[bool, str]:
"""
Validate note content.
Returns (is_valid, error_message).
"""
# Size check (1 MiB max)
size_bytes = len(body.encode('utf-8'))
if size_bytes > 1_048_576:
return False, f"Note exceeds 1 MiB limit ({size_bytes} bytes)"
# UTF-8 validity
try:
body.encode('utf-8')
except UnicodeEncodeError as e:
return False, f"Invalid UTF-8 encoding: {e}"
return True, ""
def validate_frontmatter(metadata: dict) -> tuple[bool, str]:
"""
Validate frontmatter metadata.
Returns (is_valid, error_message).
"""
# Check for reserved fields
reserved = ['version'] # Version is managed by index, not frontmatter
for key in metadata.keys():
if key in reserved:
return False, f"Field '{key}' is reserved and cannot be set in frontmatter"
# Validate tags format
if 'tags' in metadata:
tags = metadata['tags']
if not isinstance(tags, list):
return False, "Field 'tags' must be an array"
if not all(isinstance(t, str) for t in tags):
return False, "All tags must be strings"
return True, ""
```
---
### Vault Limits
```python
def check_vault_limit(user_id: str, db) -> tuple[bool, str]:
"""
Check if vault is within note limit.
Returns (is_allowed, error_message).
"""
cursor = db.execute(
"SELECT note_count FROM index_health WHERE user_id = ?",
(user_id,)
)
row = cursor.fetchone()
if row is None:
return True, "" # New vault, no limit yet
note_count = row[0]
if note_count >= 5000:
return False, "Vault note limit exceeded (max 5,000 notes)"
return True, ""
```
---
### Token Validation
```python
import jwt
from datetime import datetime, timedelta
SECRET_KEY = "your-secret-key" # From env var
def create_jwt(user_id: str) -> str:
"""Create JWT with 90-day expiration."""
now = datetime.utcnow()
payload = {
"sub": user_id,
"iat": int(now.timestamp()),
"exp": int((now + timedelta(days=90)).timestamp())
}
return jwt.encode(payload, SECRET_KEY, algorithm="HS256")
def validate_jwt(token: str) -> tuple[bool, str, str]:
"""
Validate JWT and extract user_id.
Returns (is_valid, user_id, error_message).
"""
try:
payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
user_id = payload["sub"]
return True, user_id, ""
except jwt.ExpiredSignatureError:
return False, "", "Token expired"
except jwt.InvalidTokenError as e:
return False, "", f"Invalid token: {e}"
```
---
## State Transitions
State machines for entity lifecycle and version management.
### Note Lifecycle
```mermaid
stateDiagram-v2
[*] --> Creating: write_note (new path)
Creating --> Active: save to filesystem + insert metadata
Active --> Updating: write_note (existing path)
Updating --> Active: increment version + update index
Active --> Deleting: delete_note
Deleting --> [*]: remove file + delete index rows
Updating --> ConflictDetected: if_version mismatch (UI writes only)
ConflictDetected --> Active: reload or discard changes
```
**State descriptions**:
1. **Creating**: Note does not exist, first write in progress
- Validate path and content
- Set `version = 1`, `created = now()`, `updated = now()`
- Write file to filesystem
- Insert rows into `note_metadata`, `note_fts`, `note_tags`, `note_links`
2. **Active**: Note exists and is readable/editable
- Can be read via API/MCP
- Can be updated via write_note
- Can be deleted via delete_note
3. **Updating**: Modification in progress
- Load current metadata (version, timestamps)
- If UI write: check `if_version` matches current version
- If version mismatch: transition to ConflictDetected
- If MCP write: skip version check (last-write-wins)
- Increment `version`, set `updated = now()`
- Update file content
- Update all index rows (delete old, insert new)
4. **ConflictDetected**: Optimistic concurrency conflict (UI only)
- Return `409 Conflict` with current and expected versions
- UI displays error: "Note changed since you opened it"
- User options: reload, save as copy, or discard changes
5. **Deleting**: Removal in progress
- Delete file from filesystem
- Delete rows from `note_metadata`, `note_fts`, `note_tags`, `note_links`
- Update backlinks (any note linking to deleted note now has unresolved link)
- Decrement `index_health.note_count`
---
### Version Increment Logic
```python
def increment_version(user_id: str, note_path: str, if_version: int | None, db) -> int:
"""
Increment note version with optional concurrency check.
Returns new version number.
Raises ConflictError if if_version doesn't match.
"""
# Get current version
cursor = db.execute(
"SELECT version FROM note_metadata WHERE user_id = ? AND note_path = ?",
(user_id, note_path)
)
row = cursor.fetchone()
if row is None:
# New note
return 1
current_version = row[0]
# Optimistic concurrency check (UI writes only)
if if_version is not None and current_version != if_version:
raise ConflictError(
f"Version conflict: expected {if_version}, current is {current_version}"
)
# Increment version
new_version = current_version + 1
return new_version
```
---
### Index Update Workflow
```mermaid
stateDiagram-v2
[*] --> IncrementalUpdate: write_note or delete_note
IncrementalUpdate --> DeleteOldRows: start transaction
DeleteOldRows --> ExtractMetadata: parse frontmatter + body
ExtractMetadata --> InsertNewRows: insert into all index tables
InsertNewRows --> UpdateHealth: set last_incremental_update
UpdateHealth --> [*]: commit transaction
[*] --> FullRebuild: POST /api/index/rebuild
FullRebuild --> DropUserRows: delete all rows for user_id
DropUserRows --> ScanVault: walk filesystem tree
ScanVault --> ProcessNote: for each .md file
ProcessNote --> ExtractMetadata
ProcessNote --> ScanVault: next file
ScanVault --> UpdateHealth: set last_full_rebuild
UpdateHealth --> [*]: commit transaction
```
**Incremental update** (on every write/delete):
1. Start SQLite transaction
2. Delete all existing rows for `(user_id, note_path)` from:
- `note_metadata`
- `note_fts`
- `note_tags`
- `note_links`
3. Parse note content (frontmatter + body)
4. Extract: title, tags, wikilinks
5. Insert new rows into all index tables
6. Resolve wikilinks and update `is_resolved` flags
7. Update `index_health.last_incremental_update = now()`
8. Commit transaction
**Full rebuild** (manual trigger):
1. Start SQLite transaction
2. Delete all rows for `user_id` from all index tables
3. Walk vault directory tree, find all `.md` files
4. For each file:
- Parse frontmatter + body
- Extract metadata, tags, wikilinks
- Insert rows into all index tables
5. Resolve all wikilinks (second pass after all notes indexed)
6. Update `index_health.note_count` and `last_full_rebuild = now()`
7. Commit transaction
---
## Relationships and Constraints
### Foreign Key Relationships
While SQLite supports foreign keys, we don't enforce them for performance reasons (multi-tenant with user-scoped queries). Instead, we rely on application-level referential integrity.
**Logical relationships**:
- `note_metadata.user_id` β†’ `User.user_id`
- `note_tags.note_path` β†’ `note_metadata.note_path`
- `note_links.source_path` β†’ `note_metadata.note_path`
- `note_links.target_path` β†’ `note_metadata.note_path` (nullable)
**Cascade semantics** (application-enforced):
- On delete note: cascade delete from `note_tags`, `note_links` (source), `note_fts`
- On delete note: update `note_links` (target) to set `is_resolved = false`
---
### Uniqueness Constraints
| Table | Unique Constraint | Enforced By |
|-------|------------------|-------------|
| `note_metadata` | `(user_id, note_path)` | PRIMARY KEY |
| `note_tags` | `(user_id, note_path, tag)` | PRIMARY KEY |
| `note_links` | `(user_id, source_path, link_text)` | PRIMARY KEY |
| `index_health` | `user_id` | PRIMARY KEY |
---
### Cardinality
| Relationship | Type | Notes |
|-------------|------|-------|
| User β†’ Vault | 1:1 | One user owns one vault |
| Vault β†’ Notes | 1:N | One vault contains many notes (max 5,000) |
| Note β†’ Tags | N:M | Many-to-many via `note_tags` junction table |
| Note β†’ Outgoing Links | 1:N | One note has many outgoing wikilinks |
| Note β†’ Backlinks | 1:N | One note may be referenced by many backlinks |
| User β†’ Tokens | 1:N | One user can issue multiple JWT tokens |
---
### Invariants
Critical invariants maintained by the system:
1. **Version monotonicity**: `note.version` only increases (never decreases or resets)
2. **Timestamp ordering**: `note.created <= note.updated` always
3. **Path uniqueness**: No two notes with same `(user_id, note_path)` can exist
4. **Size limit**: `note.size_bytes <= 1_048_576` always enforced
5. **Vault limit**: `COUNT(*) WHERE user_id = X <= 5000` enforced before writes
6. **Link consistency**: If `note_links.target_path` is not null, target note must exist
7. **Tag normalization**: All `note_tags.tag` values are lowercase
8. **Index freshness**: `index_health.last_incremental_update` is always >= most recent `note_metadata.updated` for that user
---
## Appendix
### Common Queries Reference
```sql
-- Get all notes for user, sorted by recent update
SELECT note_path, title, updated
FROM note_metadata
WHERE user_id = ?
ORDER BY updated DESC
LIMIT 100;
-- Full-text search with title boost
SELECT
note_path,
title,
snippet(note_fts, 3, '<mark>', '</mark>', '...', 32) AS snippet,
bm25(note_fts, 3.0, 1.0) AS score
FROM note_fts
WHERE user_id = ? AND note_fts MATCH ?
ORDER BY score DESC
LIMIT 50;
-- Get all tags with counts
SELECT tag, COUNT(DISTINCT note_path) as count
FROM note_tags
WHERE user_id = ?
GROUP BY tag
ORDER BY count DESC;
-- Get backlinks for a note
SELECT DISTINCT l.source_path, m.title
FROM note_links l
JOIN note_metadata m ON l.user_id = m.user_id AND l.source_path = m.note_path
WHERE l.user_id = ? AND l.target_path = ?
ORDER BY m.updated DESC;
-- Get all unresolved wikilinks for a user
SELECT source_path, link_text, COUNT(*) as occurrences
FROM note_links
WHERE user_id = ? AND is_resolved = 0
GROUP BY source_path, link_text
ORDER BY occurrences DESC;
-- Check index health
SELECT note_count, last_full_rebuild, last_incremental_update
FROM index_health
WHERE user_id = ?;
```
---
### Migration Strategy
For future schema changes:
```python
# Example migration: Add column to note_metadata
def migrate_v1_to_v2(db):
"""Add normalized_title_slug column."""
db.execute("""
ALTER TABLE note_metadata
ADD COLUMN normalized_title_slug TEXT;
""")
# Backfill existing notes
db.execute("""
UPDATE note_metadata
SET normalized_title_slug = LOWER(
REPLACE(REPLACE(title, ' ', '-'), '_', '-')
);
""")
db.execute("""
CREATE INDEX idx_metadata_title_slug
ON note_metadata(user_id, normalized_title_slug);
""")
db.commit()
```
---
**Document Status**: Draft
**Last Updated**: 2025-11-15
**Next Review**: After Phase 1 implementation