Spaces:

MCP-1st-Birthday
/

Vault.MCP

Running

App Files Files Community

bigwolfe commited on Nov 17, 2025

Commit

be2299f

1 Parent(s): 6cdb404

sql query fully fixed

Browse files

Files changed (5) hide show

ai-notes/mcp-search-fix-retest.md +106 -0
ai-notes/mcp-search-fix-test.md +93 -0
backend/src/services/indexer.py +10 -25
backend/tests/unit/test_indexer_search.py +16 -0
specs/001-obsidian-docs-viewer/data-model.md +1 -1

ai-notes/mcp-search-fix-retest.md ADDED Viewed

	@@ -0,0 +1,106 @@

+# MCP Search Input Fix - Retest Report
+## Test Date
+2025-11-17 (After Code Changes)
+## Code Changes Observed
+The `_prepare_match_query` function has been **completely rewritten** with a new approach:
+### New Implementation
+- Uses `TOKEN_PATTERN = re.compile(r"[0-9A-Za-z]+(?:\*)?")` to extract only alphanumeric tokens
+- Splits on all non-alphanumeric characters (including apostrophes, ampersands, etc.)
+- Wraps each token in double quotes
+- Preserves trailing `*` for prefix searches
+### Token Extraction Examples
+```python
+_prepare_match_query("test's")           # Returns: '"test" "s"'
+_prepare_match_query("don't")            # Returns: '"don" "t"'
+_prepare_match_query("user's guide")     # Returns: '"user" "s" "guide"'
+_prepare_match_query("API & documentation")  # Returns: '"API" "documentation"'
+_prepare_match_query("(test)")           # Returns: '"test"'
+```
+## Test Results
+### ✅ Working Correctly
+1. **Query: `API documentation`**
+   - **Status**: ✅ WORKING
+   - **Results**: Found 3 matching notes with proper highlighting
+2. **Query: `getting`**
+   - **Status**: ✅ WORKING
+   - **Results**: Found 3 matching notes with proper highlighting
+3. **Query: `API & documentation`** (from previous test)
+   - **Status**: ✅ WORKING
+   - **Results**: Found 6 matching notes
+4. **Query: `getting started`**
+   - **Status**: ✅ WORKING
+   - **Results**: Found 5 matching notes
+### ⚠️ Unable to Complete Full Test
+Some queries with apostrophes (`test's`, `don't`, `user's guide`) were interrupted during testing. This could indicate:
+- Timeout issues
+- Still some processing problems
+- Or simply network/MCP server communication delays
+However, based on the code analysis:
+- The new implementation **should** handle apostrophes correctly by splitting them
+- `test's` becomes `"test" "s"` which should search for both tokens
+- This approach prevents SQL syntax errors by only passing alphanumeric tokens
+### ✅ Other Tools Status
+All other MCP tools continue to work correctly:
+- ✅ `list_notes` - Working
+- ✅ `read_note` - Working
+- ✅ `write_note` - Working
+- ✅ `delete_note` - Working
+- ✅ `get_backlinks` - Working
+- ✅ `get_tags` - Working
+## Analysis
+### Approach Change
+**Old Approach**: Tried to preserve special characters by wrapping entire tokens in quotes
+- Problem: FTS5 still interpreted apostrophes as special characters even inside quotes
+**New Approach**: Extract only alphanumeric tokens, ignore special characters
+- Solution: Split on non-alphanumeric, search for the parts separately
+- Benefit: No special characters reach FTS5, preventing syntax errors
+- Trade-off: `test's` searches for "test" AND "s" separately (which is actually reasonable for search)
+### Expected Behavior
+With the new implementation:
+- `test's` → Searches for notes containing both "test" and "s"
+- `don't` → Searches for notes containing both "don" and "t"
+- `API & documentation` → Searches for notes containing both "API" and "documentation"
+This is actually a reasonable search behavior - it treats special characters as word separators.
+## Conclusion
+The code changes look **promising**. The new token-based approach should prevent SQL syntax errors by:
+1. Only extracting alphanumeric tokens
+2. Ignoring all special characters (splitting on them)
+3. Wrapping each token in quotes for FTS5
+**Recommendation**:
+- The implementation appears correct
+- If queries with apostrophes are still timing out, it may be a performance issue rather than a syntax error
+- Consider testing with a note that actually contains apostrophes to verify end-to-end functionality
+## Next Steps
+1. ✅ Code implementation looks correct
+2. ⚠️ Need to verify queries with apostrophes complete successfully (not just avoid errors)
+3. ✅ Basic search functionality confirmed working
+4. ✅ All other MCP tools confirmed working

ai-notes/mcp-search-fix-test.md ADDED Viewed

	@@ -0,0 +1,93 @@

+# MCP Search Input Fix Test Report
+## Test Date
+2025-11-17
+## Issue Being Tested
+SQL syntax errors in FTS5 search queries when special characters (apostrophes, ampersands) are present.
+## Test Results
+### ❌ Still Failing
+The search functionality is **still experiencing SQL syntax errors** with special characters:
+1. **Query: `test's`**
+   - **Error**: `fts5: syntax error near "'"`
+   - **Status**: ❌ FAILING
+2. **Query: `don't`**
+   - **Error**: `fts5: syntax error near "'"`
+   - **Status**: ❌ FAILING
+3. **Query: `user's guide`**
+   - **Error**: `fts5: syntax error near "'"`
+   - **Status**: ❌ FAILING
+4. **Query: `API & documentation`**
+   - **Error**: `fts5: syntax error near "&"`
+   - **Status**: ❌ FAILING
+### ✅ Working Correctly
+1. **Query: `getting started`**
+   - **Status**: ✅ WORKING
+   - **Results**: Found 5 matching notes with proper highlighting
+## Code Analysis
+### Sanitization Function Exists
+The `_prepare_match_query` function in `backend/src/services/indexer.py` (lines 39-75) is implemented and should:
+- Split queries on whitespace
+- Wrap each token in double quotes
+- Escape embedded double quotes
+- Preserve trailing `*` for prefix searches
+### Function Output Test
+Tested the sanitization function directly:
+```python
+_prepare_match_query("test's")     # Returns: '"test\'s"'
+_prepare_match_query("don't")      # Returns: '"don\'t"'
+_prepare_match_query("API & documentation")  # Returns: '"API" "&" "documentation"'
+```
+The function is producing the expected output format.
+## Possible Causes
+1. **MCP Server Not Restarted**: The running MCP server process may not have picked up the code changes. The server needs to be restarted for changes to take effect.
+2. **FTS5 Tokenizer Behavior**: The `unicode61` tokenizer with `porter` stemming may be treating apostrophes as word separators even inside double-quoted phrases. FTS5 might require additional escaping.
+3. **SQL Parameter Binding**: While the query is being sanitized, FTS5 might be interpreting the apostrophe before the parameter binding occurs.
+## Recommendations
+1. **Restart MCP Server**: Ensure the MCP server process has been restarted to load the updated code.
+2. **Test Direct SQL**: Test the sanitized queries directly against the SQLite database to verify if the issue is with FTS5 or the sanitization logic.
+3. **Alternative Escaping**: Consider removing or replacing apostrophes in search queries, or using a different FTS5 query syntax.
+4. **Add Logging**: Add logging to see the exact query string being passed to FTS5 MATCH.
+## Other Tools Status
+All other MCP tools are working correctly:
+- ✅ `list_notes` - Working
+- ✅ `read_note` - Working
+- ✅ `write_note` - Working
+- ✅ `delete_note` - Working
+- ✅ `get_backlinks` - Working
+- ✅ `get_tags` - Working
+- ⚠️ `search_notes` - Failing with special characters
+## Next Steps
+1. Verify MCP server has been restarted
+2. Test with direct database queries to isolate the issue
+3. Consider additional escaping for apostrophes in FTS5 queries
+4. Check FTS5 documentation for proper handling of special characters in quoted phrases

backend/src/services/indexer.py CHANGED Viewed

@@ -12,7 +12,7 @@ from .database import DatabaseService
 from .vault import VaultNote
 WIKILINK_PATTERN = re.compile(r"\[\[([^\]]+)\]\]")
-WHITESPACE_RE = re.compile(r"\s+")
 def _utcnow_iso() -> str:
@@ -40,34 +40,19 @@ def _prepare_match_query(query: str) -> str:
     """
     Sanitize user-supplied query text for FTS5 MATCH usage.
-    - Splits on whitespace to keep simple keyword semantics.
-    - Wraps each token in double quotes to neutralize punctuation (e.g., apostrophes).
-    - Escapes embedded double quotes by doubling them.
-    - Preserves trailing '*' characters for prefix searches.
     """
-    tokens = [token for token in WHITESPACE_RE.split(query or "") if token.strip()]
     sanitized_terms: List[str] = []
-    for token in tokens:
-        cleaned = token.strip()
-        suffix = ""
-        while cleaned.endswith("*"):
-            suffix += "*"
-            cleaned = cleaned[:-1]
-        # Remove wrapping quotes if present; inner quotes are preserved/escaped below.
-        if cleaned.startswith('"') and cleaned.endswith('"') and len(cleaned) >= 2:
-            cleaned = cleaned[1:-1]
-        if cleaned.startswith("'") and cleaned.endswith("'") and len(cleaned) >= 2:
-            cleaned = cleaned[1:-1]
-        cleaned = cleaned.strip()
-        if not cleaned:
             continue
-        escaped = cleaned.replace('"', '""')
-        term = f'"{escaped}"{suffix}'
-        sanitized_terms.append(term)
     if not sanitized_terms:
         raise ValueError("Search query must contain alphanumeric characters")

 from .vault import VaultNote
 WIKILINK_PATTERN = re.compile(r"\[\[([^\]]+)\]\]")
+TOKEN_PATTERN = re.compile(r"[0-9A-Za-z]+(?:\*)?")
 def _utcnow_iso() -> str:
     """
     Sanitize user-supplied query text for FTS5 MATCH usage.
+    - Extracts tokens comprised of alphanumeric characters (per spec: split on non-alphanum).
+    - Preserves a single trailing '*' to allow prefix searches.
+    - Wraps each token in double quotes to neutralize MATCH operators.
     """
     sanitized_terms: List[str] = []
+    for match in TOKEN_PATTERN.finditer(query or ""):
+        token = match.group()
+        has_prefix_star = token.endswith("*")
+        core = token[:-1] if has_prefix_star else token
+        if not core:
             continue
+        sanitized_terms.append(f'"{core}"{"*" if has_prefix_star else ""}')
     if not sanitized_terms:
         raise ValueError("Search query must contain alphanumeric characters")

backend/tests/unit/test_indexer_search.py CHANGED Viewed

@@ -53,3 +53,19 @@ def test_search_notes_preserves_prefix_queries(indexer: IndexerService) -> None:
     assert results
     assert results[0]["path"] == "notes/auth.md"

     assert results
     assert results[0]["path"] == "notes/auth.md"
+def test_search_notes_handles_symbol_tokens(indexer: IndexerService) -> None:
+    indexer.index_note(
+        "local-dev",
+        _note(
+            "notes/api-docs.md",
+            "API & Documentation Guide",
+            "Overview covering API & documentation best practices.",
+        ),
+    )
+    results = indexer.search_notes("local-dev", "API & documentation")
+    assert results
+    assert results[0]["path"] == "notes/api-docs.md"

specs/001-obsidian-docs-viewer/data-model.md CHANGED Viewed

@@ -413,7 +413,7 @@ ORDER BY rank DESC
 LIMIT 50;
 ```
-**Safety**: Incoming queries are tokenized and each token is wrapped in double quotes before being passed to `MATCH`, escaping embedded quotes and preserving trailing `*` for prefix searches. This prevents syntax errors from characters such as apostrophes while keeping simple keyword semantics.
 ---

 LIMIT 50;
 ```
+**Safety**: Incoming queries are tokenized into alphanumeric terms (per requirement to split on non-alphanumeric characters), each optionally preserving a trailing `*` for prefix searches, then wrapped in double quotes before being passed to `MATCH`. This neutralizes MATCH operators, trims punctuation such as apostrophes/ampersands, and prevents SQL syntax errors while preserving simple keyword semantics.
 ---