Security: Validate SQL identifiers in Spanner search tool to prevent injection#5952
Security: Validate SQL identifiers in Spanner search tool to prevent injection#5952Ashutosh0x wants to merge 1 commit into
Conversation
…tion Add input validation for structural SQL identifiers (table_name, columns, embedding_column_to_search) and filter expressions (additional_filter) in the Spanner vector search tool before they are interpolated into SQL via f-strings. Problem: The _generate_sql_for_knn() and _generate_sql_for_ann() functions build SQL queries using f-string interpolation. Since similarity_search is registered as a GoogleTool, the LLM populates these parameters at runtime. A prompt injection attack can cause the LLM to supply malicious values like: - table_name='docs JOIN secrets ON TRUE' (cross-table read) - columns=['(SELECT ... FROM INFORMATION_SCHEMA.TABLES)'] (schema enum) - additional_filter='1=1 UNION ALL SELECT password FROM creds' (exfil) Fix: - Regex-validate identifiers against a safe pattern (alphanumeric, _, .) - Reject common injection patterns (UNION, ;, --, /* */) in filters - Validate GSQL model name as identifier and PG endpoint as URI format - Enforce int type on top_k and num_leaves_to_search Fixes google#5913
|
Response from ADK Triaging Agent Hello @Ashutosh0x, thank you for creating this PR! It is fantastic to see SQL injection validation being added to the Spanner search tool. To help our reviewers process your contribution more efficiently, could you please update the PR to include:
Thank you for contributing to the ADK project! |
|
Hi @wukath — this PR fixes a SQL injection vulnerability in the Spanner search tool (issue #5913). The similarity_search function is registered as a GoogleTool, meaning the LLM populates parameters like able_name, columns, and additional_filter at runtime. These values are interpolated directly into SQL via f-strings in _generate_sql_for_knn() and _generate_sql_for_ann() without any validation — a prompt injection can cause the LLM to supply malicious values enabling UNION-based data exfiltration, schema enumeration, or cross-table JOINs. The fix adds regex-based identifier validation and keyword filtering at a single chokepoint (_validate_identifier, _validate_additional_filter), plus type enforcement on op_k. I also included a test suite covering the main injection primitives. Happy to adjust the approach if needed! |
Summary
Fix SQL injection vulnerabilities in google.adk.tools.spanner.search_tool where LLM-controllable parameters ( able_name, columns, embedding_column_to_search, additional_filter, op_k) are interpolated into SQL via f-strings without validation.
Problem
The _generate_sql_for_knn() and _generate_sql_for_ann() functions build SQL queries using f-string interpolation. Since similarity_search is registered as a GoogleTool in SpannerToolset.get_tools(), the LLM populates these parameters at runtime via tool calls. A successful prompt injection can cause the LLM to call the tool with malicious values, enabling:
UNION-based exfiltration via �dditional_filter:
sql WHERE 1=1 UNION ALL SELECT password, 0.0 FROM admin_credentialsINFORMATION_SCHEMA enumeration via columns:
sql SELECT (SELECT STRING_AGG(table_name, ',') FROM INFORMATION_SCHEMA.TABLES) AS dumpCross-table reads via able_name:
sql FROM documents JOIN admin_credentials ac ON TRUEFix
^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$or quoted with backticks/double-quotes)UNION,;,--,/* */) inadditional_filtertop_kandnum_leaves_to_searchtointTesting
Added
tests/unittests/tools/spanner/test_spanner_sql_validation.pywith tests covering:Fixes #5913