[SEA-NodeJS] SEA async execute, TLS & connection/statement options#413
Open
msrathore-db wants to merge 5 commits into
Open
[SEA-NodeJS] SEA async execute, TLS & connection/statement options#413msrathore-db wants to merge 5 commits into
msrathore-db wants to merge 5 commits into
Conversation
09fb870 to
32929aa
Compare
msrathore-db
added a commit
that referenced
this pull request
Jun 3, 2026
Code-review #413 (81/100). Validated each against the code + a live warehouse: - F1 (HIGH): the async poll loop threw plain HiveDriverError for server-driven Cancelled/Closed/Unknown. The DBSQLOperation facade only mirrors its cancelled/closed flags when `err instanceof OperationStateError` (and OperationStateError extends HiveDriverError, not the reverse), so a server-side cancel/close/admin-kill left the facade desynced. Now throws OperationStateError(Canceled/Closed/Unknown) — matching the Thrift backend. The Failed branch still surfaces the kernel SQL-error envelope via awaitResult. - F2 (MED): the server-Cancelled test asserted only instanceOf(HiveDriverError), which passes for both the correct and incorrect type — it couldn't catch F1. Now asserts instanceOf(OperationStateError) + errorCode, plus a new Closed test. - F3 (MED): queryTimeout was forwarded to submitStatement but the kernel ignores queryTimeoutSecs on submit (always wait_timeout=0s), so the documented public option was a silent no-op, and the poll loop had no client-side deadline (a stalled Running statement polled forever). Now enforced client-side: the poll loop tracks a deadline, best-effort cancels the statement on expiry, and throws OperationStateError(Timeout) — matching Thrift's server TIMEDOUT outcome. Stopped forwarding the ignored queryTimeoutSecs to the napi options. Validated live: a 2s timeout interrupts a slow cross-join with TIMEOUT. - F4 (LOW): customCaCert PEM string check now requires the END marker too (a truncated/headerless cert no longer passes), consistent with the Buffer path. - F5 (LOW): SeaAuth reads the SEA-only fields (checkServerCertificate / customCaCert / maxConnections) through `InternalConnectionOptions` instead of ad-hoc inline casts, so a typo'd key fails to compile. - F6 (LOW): corrected the poll-loop comment — the prior text justified polling by an incorrect "blocking awaitResult holds the mutex and queues cancel" claim; cancel() is documented lock-free. The real rationale (real-time status to the progress callback + cancel observed between ticks) is now stated. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
This was referenced Jun 3, 2026
Collaborator
|
🟠 P1 (Important — should fix)
🟡 P2 (Minor)
|
Wire the SEA connection-level and per-statement option surfaces onto the merged-kernel napi binding (thin forwarding — the kernel owns the behaviour): Connection options (SeaAuth.buildSeaConnectionOptions): - `maxConnections` → kernel pool size, validated as a positive integer within the napi u32 range. - TLS: `checkServerCertificate` (secure-by-default — omit to keep the kernel's verify-on default; `false` opts into insecure) and `customCaCert` (PEM string or Buffer; strings are PEM-sanity-checked and normalised to a Buffer before the FFI boundary), via the new `buildSeaTlsOptions`. - `intervalsAsString: true` is always set so SEA interval/duration columns render as strings — a byte-compatible drop-in for the Thrift backend. `complexTypesAsJson` is intentionally left at the kernel default (native Arrow), which already decodes identically to Thrift via the shared converter. Statement options (SeaSessionBackend.executeStatement, via buildExecuteOptions): - `queryTimeout` → `queryTimeoutSecs`; `rowLimit` → `rowLimit` (SEA-only cap). - `queryTags` serialised JS-side (reusing Thrift's `serializeQueryTags`) into the reserved `query_tags` conf key, merged with any explicit `statementConf` — the napi `queryTags` field can't carry null-valued tags, and the kernel rejects setting both. `queryTags` / `queryTimeout` are no longer rejected. - Still rejected (genuinely unsupported on SEA): `useCloudFetch`, `useLZ4Compression`, `stagingAllowedLocalPath`. `rowLimit` / `statementConf` added to the public `ExecuteStatementOptions`; SEA-only knobs (`maxConnections` / `checkServerCertificate` / `customCaCert`) added to the internal `InternalConnectionOptions`. Validated against a live warehouse: secure-by-default connect, maxConnections, checkServerCertificate, rowLimit (caps rows), queryTimeout, queryTags, statementConf, and non-PEM customCaCert rejection. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Switch the SEA query path from the blocking `executeStatement` to the kernel's async `submitStatement`, matching the Thrift backend's always-async (`runAsync: true`) model. `submitStatement` returns immediately with a pending `AsyncStatement` (kernel `wait_timeout=0s`) while the query runs server-side. SeaOperationBackend becomes dual-mode (exactly one of): - `asyncStatement` (query path): `waitUntilReady()` polls `status()` to a terminal state on a 100ms cadence (matching Thrift), firing the progress callback each tick. Polling `status()` rather than blocking on `awaitResult()` keeps `cancel()` responsive — a blocking awaitResult would hold the kernel statement mutex for the whole query and queue cancel behind it. On Succeeded it materialises the result handle (first fetch is free); on Failed it drives `awaitResult()` to surface the kernel's typed SQL-error envelope; on a server-side Cancelled/Closed/Unknown it throws a clear error. `status()` reports the real Pending/Running/Succeeded state. - `statement` (metadata path): the kernel `list*`/`get*` statement is already terminal, so `waitUntilReady()` stays the one-shot completion tick. The fetch pipeline is shared: `awaitResult()`'s `AsyncResultHandle` and the metadata `Statement` expose the same `fetchNextBatch()` / `schema()` surface, so `SeaResultsProvider` → `ArrowResultConverter` → `ResultSlicer` consume either interchangeably via a single memoised fetch handle. cancel()/close() route through a `lifecycleHandle` abstraction over whichever handle backs the op. Re-exports the kernel `AsyncStatement` / `AsyncResultHandle` types from `SeaNativeLoader`. Validated against a live warehouse: async fetchAll correctness, multi-row drain (5000 rows), long-running aggregate (count over 20M), kernel SQL-error surfacing, and cancellation mid-execution. PR1's params/metadata/getInfo all still pass through the new async path. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
… code-style) The CI "Check code style" step runs `prettier . --check` (whole repo); these files were committed without prettier formatting. Formatting-only. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Code-review #413 (81/100). Validated each against the code + a live warehouse: - F1 (HIGH): the async poll loop threw plain HiveDriverError for server-driven Cancelled/Closed/Unknown. The DBSQLOperation facade only mirrors its cancelled/closed flags when `err instanceof OperationStateError` (and OperationStateError extends HiveDriverError, not the reverse), so a server-side cancel/close/admin-kill left the facade desynced. Now throws OperationStateError(Canceled/Closed/Unknown) — matching the Thrift backend. The Failed branch still surfaces the kernel SQL-error envelope via awaitResult. - F2 (MED): the server-Cancelled test asserted only instanceOf(HiveDriverError), which passes for both the correct and incorrect type — it couldn't catch F1. Now asserts instanceOf(OperationStateError) + errorCode, plus a new Closed test. - F3 (MED): queryTimeout was forwarded to submitStatement but the kernel ignores queryTimeoutSecs on submit (always wait_timeout=0s), so the documented public option was a silent no-op, and the poll loop had no client-side deadline (a stalled Running statement polled forever). Now enforced client-side: the poll loop tracks a deadline, best-effort cancels the statement on expiry, and throws OperationStateError(Timeout) — matching Thrift's server TIMEDOUT outcome. Stopped forwarding the ignored queryTimeoutSecs to the napi options. Validated live: a 2s timeout interrupts a slow cross-join with TIMEOUT. - F4 (LOW): customCaCert PEM string check now requires the END marker too (a truncated/headerless cert no longer passes), consistent with the Buffer path. - F5 (LOW): SeaAuth reads the SEA-only fields (checkServerCertificate / customCaCert / maxConnections) through `InternalConnectionOptions` instead of ad-hoc inline casts, so a typo'd key fails to compile. - F6 (LOW): corrected the poll-loop comment — the prior text justified polling by an incorrect "blocking awaitResult holds the mutex and queues cancel" claim; cancel() is documented lock-free. The real rationale (real-time status to the progress callback + cancel observed between ticks) is now stated. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Consolidates the last net-new bit of the superseded #408: two SEA-path DBSQLParameterType variants for binding timezone-explicit timestamps. The type name flows through the existing param codec (toSparkParameter → sqlType), which the kernel accepts — validated live (SELECT ? with TIMESTAMP_NTZ/LTZ returns the bound values). On the Thrift backend they degrade to a plain TIMESTAMP bind. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
591a357 to
38eea67
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on #412 (params + metadata + getInfo). Review/merge #412 first; this PR retargets to
mainonce #412 lands.Consolidates the remaining SEA features into one PR (two commits), each a thin layer over the merged kernel.
Commit 1 — connection & statement options
Connection options (
SeaAuth.buildSeaConnectionOptions):maxConnections→ kernel pool size (validated positive-integer, u32 range).checkServerCertificate(secure-by-default — omit to keep the kernel's verify-on default;falseopts into insecure) andcustomCaCert(PEM string or Buffer; strings PEM-sanity-checked and normalised to a Buffer), via the newbuildSeaTlsOptions.intervalsAsString: truealways set (byte-compatible interval rendering vs Thrift);complexTypesAsJsonleft at the kernel's native-Arrow default.Statement options (
SeaSessionBackend, viabuildExecuteOptions):queryTimeout→queryTimeoutSecs;rowLimit→rowLimit(SEA-only cap).queryTagsserialised (reusing Thrift'sserializeQueryTags) into the reservedquery_tagsconf key, merged with explicitstatementConf.queryTags/queryTimeoutno longer rejected.useCloudFetch,useLZ4Compression,stagingAllowedLocalPath.Commit 2 — async execute
Switches the query path from blocking
executeStatementto the kernel's asyncsubmitStatement(matching Thrift's always-async model).SeaOperationBackendbecomes dual-mode:asyncStatement):waitUntilReady()pollsstatus()to terminal (100ms cadence), firing the progress callback each tick. Polling rather than blocking onawaitResult()keepscancel()responsive. On Succeeded it materialises the result handle; on Failed it surfaces the kernel's typed SQL-error envelope viaawaitResult(); on Cancelled/Closed it throws clearly.status()reports the real state.statement): already-terminal kernel statement;waitUntilReady()stays a one-shot tick.The fetch pipeline is shared —
AsyncResultHandleand the metadataStatementexpose the samefetchNextBatch()/schema()surface, consumed via a single memoised fetch handle.Testing
lib/lints clean. New unit coverage: TLS/maxConnections validation (connectionOptions.test.ts), statement-option forwarding, and the async lifecycle (poll-to-terminal, real status, Failed→error, server-side Cancelled, cancel-responsiveness).This pull request and its description were written by Isaac.