bigwolfe committed on
Commit
be2299f
·
1 Parent(s): 6cdb404

sql query fully fixed

ai-notes/mcp-search-fix-retest.md ADDED
@@ -0,0 +1,106 @@
+ # MCP Search Input Fix - Retest Report
+
+ ## Test Date
+ 2025-11-17 (After Code Changes)
+
+ ## Code Changes Observed
+
+ The `_prepare_match_query` function has been **completely rewritten** with a new approach:
+
+ ### New Implementation
+ - Uses `TOKEN_PATTERN = re.compile(r"[0-9A-Za-z]+(?:\*)?")` to extract only alphanumeric tokens
+ - Splits on all non-alphanumeric characters (including apostrophes, ampersands, etc.)
+ - Wraps each token in double quotes
+ - Preserves trailing `*` for prefix searches
+
+ ### Token Extraction Examples
+ ```python
+ _prepare_match_query("test's") # Returns: '"test" "s"'
+ _prepare_match_query("don't") # Returns: '"don" "t"'
+ _prepare_match_query("user's guide") # Returns: '"user" "s" "guide"'
+ _prepare_match_query("API & documentation") # Returns: '"API" "documentation"'
+ _prepare_match_query("(test)") # Returns: '"test"'
+ ```
+
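+ For reference, a minimal sketch of the rewritten helper, reconstructed from the `backend/src/services/indexer.py` diff later in this commit (the final space-join is inferred from the outputs above; surrounding module code is elided):
+
+ ```python
+ import re
+ from typing import List
+
+ # Alphanumeric runs, optionally ending in '*' to preserve prefix searches.
+ TOKEN_PATTERN = re.compile(r"[0-9A-Za-z]+(?:\*)?")
+
+
+ def _prepare_match_query(query: str) -> str:
+     """Sanitize user-supplied query text for FTS5 MATCH usage."""
+     sanitized_terms: List[str] = []
+     for match in TOKEN_PATTERN.finditer(query or ""):
+         token = match.group()
+         has_prefix_star = token.endswith("*")
+         core = token[:-1] if has_prefix_star else token
+         if not core:
+             continue
+         # Quote each token so FTS5 treats it as a literal term, not an operator.
+         sanitized_terms.append(f'"{core}"{"*" if has_prefix_star else ""}')
+     if not sanitized_terms:
+         raise ValueError("Search query must contain alphanumeric characters")
+     return " ".join(sanitized_terms)
+ ```
+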
+ ## Test Results
+
+ ### ✅ Working Correctly
+
+ 1. **Query: `API documentation`**
+ - **Status**: ✅ WORKING
+ - **Results**: Found 3 matching notes with proper highlighting
+
+ 2. **Query: `getting`**
+ - **Status**: ✅ WORKING
+ - **Results**: Found 3 matching notes with proper highlighting
+
+ 3. **Query: `API & documentation`** (from previous test)
+ - **Status**: ✅ WORKING
+ - **Results**: Found 6 matching notes
+
+ 4. **Query: `getting started`**
+ - **Status**: ✅ WORKING
+ - **Results**: Found 5 matching notes
+
+ ### ⚠️ Unable to Complete Full Test
+
+ Some queries with apostrophes (`test's`, `don't`, `user's guide`) were interrupted during testing. This could indicate:
+ - Timeout issues
+ - Lingering processing problems
+ - Or simply network/MCP server communication delays
+
+ However, based on the code analysis:
+ - The new implementation **should** handle apostrophes correctly by splitting on them
+ - `test's` becomes `"test" "s"`, which searches for both tokens
+ - This approach prevents SQL syntax errors by passing only alphanumeric tokens to FTS5
+
+ ### ✅ Other Tools Status
+
+ All other MCP tools continue to work correctly:
+ - ✅ `list_notes` - Working
+ - ✅ `read_note` - Working
+ - ✅ `write_note` - Working
+ - ✅ `delete_note` - Working
+ - ✅ `get_backlinks` - Working
+ - ✅ `get_tags` - Working
+
+ ## Analysis
+
+ ### Approach Change
+
+ **Old Approach**: Tried to preserve special characters by wrapping entire tokens in quotes
+ - Problem: FTS5 still interpreted apostrophes as special characters even inside quotes
+
+ **New Approach**: Extract only alphanumeric tokens and ignore special characters
+ - Solution: Split on non-alphanumeric characters and search for the resulting tokens separately
+ - Benefit: No special characters reach FTS5, preventing syntax errors
+ - Trade-off: `test's` searches for "test" AND "s" separately (which is reasonable for search)
+
+ ### Expected Behavior
+
+ With the new implementation:
+ - `test's` → Searches for notes containing both "test" and "s"
+ - `don't` → Searches for notes containing both "don" and "t"
+ - `API & documentation` → Searches for notes containing both "API" and "documentation"
+
+ This is reasonable search behavior: special characters are treated as word separators.
+
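+ As a quick illustration of these AND semantics, the sanitized string can be run against a throwaway FTS5 table (the table and column names here are purely illustrative, not the app's real schema):
+
+ ```python
+ import sqlite3
+
+ conn = sqlite3.connect(":memory:")
+ conn.execute("CREATE VIRTUAL TABLE demo USING fts5(body)")
+ conn.execute("INSERT INTO demo(body) VALUES ('this is a test of s search')")
+ conn.execute("INSERT INTO demo(body) VALUES ('test only, no lone letter here')")
+
+ # '"test" "s"' is what _prepare_match_query("test's") produces; FTS5 treats
+ # whitespace-separated quoted terms as an implicit AND.
+ rows = conn.execute("SELECT body FROM demo WHERE demo MATCH ?", ('"test" "s"',)).fetchall()
+ print(rows)  # only the row containing both "test" and "s"
+ ```
+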
+ ## Conclusion
+
+ The code changes look **promising**. The new token-based approach should prevent SQL syntax errors by:
+ 1. Extracting only alphanumeric tokens
+ 2. Ignoring all special characters (splitting on them)
+ 3. Wrapping each token in quotes for FTS5
+
+ **Recommendation**:
+ - The implementation appears correct
+ - If queries with apostrophes are still timing out, it may be a performance issue rather than a syntax error
+ - Consider testing with a note that actually contains apostrophes to verify end-to-end functionality (a sketch follows at the end of this report)
+
+ ## Next Steps
+
+ 1. ✅ Code implementation looks correct
+ 2. ⚠️ Need to verify that queries with apostrophes complete successfully (not just avoid errors)
+ 3. ✅ Basic search functionality confirmed working
+ 4. ✅ All other MCP tools confirmed working
+
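+ For the apostrophe follow-up, a hypothetical end-to-end test, modeled on `test_search_notes_handles_symbol_tokens` from the test diff below (the `indexer` fixture and the `_note` helper are assumed from that file):
+
+ ```python
+ def test_search_notes_handles_apostrophes(indexer: IndexerService) -> None:
+     indexer.index_note(
+         "local-dev",
+         _note(
+             "notes/users-guide.md",
+             "User's Guide",
+             "The user's guide covers test setup and search.",
+         ),
+     )
+
+     results = indexer.search_notes("local-dev", "user's guide")
+
+     assert results
+     assert results[0]["path"] == "notes/users-guide.md"
+ ```
+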
ai-notes/mcp-search-fix-test.md ADDED
@@ -0,0 +1,93 @@
+ # MCP Search Input Fix Test Report
+
+ ## Test Date
+ 2025-11-17
+
+ ## Issue Being Tested
+ SQL syntax errors in FTS5 search queries when special characters (apostrophes, ampersands) are present.
+
+ ## Test Results
+
+ ### ❌ Still Failing
+
+ The search functionality is **still experiencing SQL syntax errors** with special characters:
+
+ 1. **Query: `test's`**
+ - **Error**: `fts5: syntax error near "'"`
+ - **Status**: ❌ FAILING
+
+ 2. **Query: `don't`**
+ - **Error**: `fts5: syntax error near "'"`
+ - **Status**: ❌ FAILING
+
+ 3. **Query: `user's guide`**
+ - **Error**: `fts5: syntax error near "'"`
+ - **Status**: ❌ FAILING
+
+ 4. **Query: `API & documentation`**
+ - **Error**: `fts5: syntax error near "&"`
+ - **Status**: ❌ FAILING
+
+ ### ✅ Working Correctly
+
+ 1. **Query: `getting started`**
+ - **Status**: ✅ WORKING
+ - **Results**: Found 5 matching notes with proper highlighting
+
+ ## Code Analysis
+
+ ### Sanitization Function Exists
+
+ The `_prepare_match_query` function in `backend/src/services/indexer.py` (lines 39-75) is implemented and should:
+ - Split queries on whitespace
+ - Wrap each token in double quotes
+ - Escape embedded double quotes
+ - Preserve trailing `*` for prefix searches
+
+ ### Function Output Test
+
+ Tested the sanitization function directly:
+ ```python
+ _prepare_match_query("test's") # Returns: '"test\'s"'
+ _prepare_match_query("don't") # Returns: '"don\'t"'
+ _prepare_match_query("API & documentation") # Returns: '"API" "&" "documentation"'
+ ```
+
+ The function is producing the expected output format.
+
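+ For context, the earlier whitespace-based sanitizer looked roughly like this, reconstructed from the lines removed in the `indexer.py` diff below (not a verbatim copy; the final join is assumed):
+
+ ```python
+ import re
+ from typing import List
+
+ WHITESPACE_RE = re.compile(r"\s+")
+
+
+ def _prepare_match_query(query: str) -> str:
+     """Earlier approach: split on whitespace and quote each token."""
+     tokens = [token for token in WHITESPACE_RE.split(query or "") if token.strip()]
+     sanitized_terms: List[str] = []
+     for token in tokens:
+         cleaned = token.strip()
+         suffix = ""
+         while cleaned.endswith("*"):
+             suffix += "*"
+             cleaned = cleaned[:-1]
+         # Strip a single layer of wrapping quotes, if present.
+         if cleaned.startswith('"') and cleaned.endswith('"') and len(cleaned) >= 2:
+             cleaned = cleaned[1:-1]
+         if cleaned.startswith("'") and cleaned.endswith("'") and len(cleaned) >= 2:
+             cleaned = cleaned[1:-1]
+         cleaned = cleaned.strip()
+         if not cleaned:
+             continue
+         # Escape embedded double quotes, then wrap the token for FTS5.
+         escaped = cleaned.replace('"', '""')
+         sanitized_terms.append(f'"{escaped}"{suffix}')
+     if not sanitized_terms:
+         raise ValueError("Search query must contain alphanumeric characters")
+     return " ".join(sanitized_terms)
+ ```
+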
+ ## Possible Causes
+
+ 1. **MCP Server Not Restarted**: The running MCP server process may not have picked up the code changes. The server needs to be restarted for changes to take effect.
+
+ 2. **FTS5 Tokenizer Behavior**: The `unicode61` tokenizer with `porter` stemming may be treating apostrophes as word separators even inside double-quoted phrases. FTS5 might require additional escaping.
+
+ 3. **SQL Parameter Binding**: Although the query is sanitized, FTS5 might be interpreting the apostrophe before parameter binding takes effect.
+
+ ## Recommendations
+
+ 1. **Restart MCP Server**: Ensure the MCP server process has been restarted to load the updated code.
+
+ 2. **Test Direct SQL**: Test the sanitized queries directly against the SQLite database to verify whether the issue lies in FTS5 or in the sanitization logic (a sketch follows this list).
+
+ 3. **Alternative Escaping**: Consider removing or replacing apostrophes in search queries, or using a different FTS5 query syntax.
+
+ 4. **Add Logging**: Add logging to capture the exact query string being passed to the FTS5 `MATCH` clause.
+
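+ Following up on recommendation 2, one way to isolate FTS5 parsing from the sanitization logic is a scratch in-memory table (the table name and contents are purely illustrative):
+
+ ```python
+ import sqlite3
+
+ conn = sqlite3.connect(":memory:")
+ conn.execute("CREATE VIRTUAL TABLE scratch USING fts5(body)")
+ conn.execute("INSERT INTO scratch(body) VALUES ('the user''s guide to testing')")
+
+ # Compare the raw query with the sanitized form reported by _prepare_match_query.
+ for match_query in ("test's", '"test\'s"'):
+     try:
+         conn.execute("SELECT rowid FROM scratch WHERE scratch MATCH ?", (match_query,)).fetchall()
+         print(f"{match_query!r}: OK")
+     except sqlite3.OperationalError as exc:
+         print(f"{match_query!r}: {exc}")
+ ```
+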
+ ## Other Tools Status
+
+ All other MCP tools are working correctly:
+ - ✅ `list_notes` - Working
+ - ✅ `read_note` - Working
+ - ✅ `write_note` - Working
+ - ✅ `delete_note` - Working
+ - ✅ `get_backlinks` - Working
+ - ✅ `get_tags` - Working
+ - ⚠️ `search_notes` - Failing with special characters
+
+ ## Next Steps
+
+ 1. Verify MCP server has been restarted
+ 2. Test with direct database queries to isolate the issue
+ 3. Consider additional escaping for apostrophes in FTS5 queries
+ 4. Check FTS5 documentation for proper handling of special characters in quoted phrases
+
backend/src/services/indexer.py CHANGED
@@ -12,7 +12,7 @@ from .database import DatabaseService
 from .vault import VaultNote
 
 WIKILINK_PATTERN = re.compile(r"\[\[([^\]]+)\]\]")
-WHITESPACE_RE = re.compile(r"\s+")
+TOKEN_PATTERN = re.compile(r"[0-9A-Za-z]+(?:\*)?")
 
 
 def _utcnow_iso() -> str:
@@ -40,34 +40,19 @@ def _prepare_match_query(query: str) -> str:
     """
     Sanitize user-supplied query text for FTS5 MATCH usage.
 
-    - Splits on whitespace to keep simple keyword semantics.
-    - Wraps each token in double quotes to neutralize punctuation (e.g., apostrophes).
-    - Escapes embedded double quotes by doubling them.
-    - Preserves trailing '*' characters for prefix searches.
+    - Extracts tokens comprised of alphanumeric characters (per spec: split on non-alphanum).
+    - Preserves a single trailing '*' to allow prefix searches.
+    - Wraps each token in double quotes to neutralize MATCH operators.
     """
-    tokens = [token for token in WHITESPACE_RE.split(query or "") if token.strip()]
     sanitized_terms: List[str] = []
 
-    for token in tokens:
-        cleaned = token.strip()
-        suffix = ""
-        while cleaned.endswith("*"):
-            suffix += "*"
-            cleaned = cleaned[:-1]
-
-        # Remove wrapping quotes if present; inner quotes are preserved/escaped below.
-        if cleaned.startswith('"') and cleaned.endswith('"') and len(cleaned) >= 2:
-            cleaned = cleaned[1:-1]
-        if cleaned.startswith("'") and cleaned.endswith("'") and len(cleaned) >= 2:
-            cleaned = cleaned[1:-1]
-
-        cleaned = cleaned.strip()
-        if not cleaned:
+    for match in TOKEN_PATTERN.finditer(query or ""):
+        token = match.group()
+        has_prefix_star = token.endswith("*")
+        core = token[:-1] if has_prefix_star else token
+        if not core:
            continue
-
-        escaped = cleaned.replace('"', '""')
-        term = f'"{escaped}"{suffix}'
-        sanitized_terms.append(term)
+        sanitized_terms.append(f'"{core}"{"*" if has_prefix_star else ""}')
 
     if not sanitized_terms:
         raise ValueError("Search query must contain alphanumeric characters")
backend/tests/unit/test_indexer_search.py CHANGED
@@ -53,3 +53,19 @@ def test_search_notes_preserves_prefix_queries(indexer: IndexerService) -> None:
     assert results
     assert results[0]["path"] == "notes/auth.md"
 
+
+def test_search_notes_handles_symbol_tokens(indexer: IndexerService) -> None:
+    indexer.index_note(
+        "local-dev",
+        _note(
+            "notes/api-docs.md",
+            "API & Documentation Guide",
+            "Overview covering API & documentation best practices.",
+        ),
+    )
+
+    results = indexer.search_notes("local-dev", "API & documentation")
+
+    assert results
+    assert results[0]["path"] == "notes/api-docs.md"
+
specs/001-obsidian-docs-viewer/data-model.md CHANGED
@@ -413,7 +413,7 @@ ORDER BY rank DESC
 LIMIT 50;
 ```
 
-**Safety**: Incoming queries are tokenized and each token is wrapped in double quotes before being passed to `MATCH`, escaping embedded quotes and preserving trailing `*` for prefix searches. This prevents syntax errors from characters such as apostrophes while keeping simple keyword semantics.
+**Safety**: Incoming queries are tokenized into alphanumeric terms (per requirement to split on non-alphanumeric characters), each optionally preserving a trailing `*` for prefix searches, then wrapped in double quotes before being passed to `MATCH`. This neutralizes MATCH operators, trims punctuation such as apostrophes/ampersands, and prevents SQL syntax errors while preserving simple keyword semantics.
 
 ---
 