Skip to content

[SEA-NodeJS] SEA query parameters, metadata & getInfo#412

Open
msrathore-db wants to merge 4 commits into
mainfrom
msrathore/sea-params-metadata-getinfo
Open

[SEA-NodeJS] SEA query parameters, metadata & getInfo#412
msrathore-db wants to merge 4 commits into
mainfrom
msrathore/sea-params-metadata-getinfo

Conversation

@msrathore-db
Copy link
Copy Markdown
Contributor

What

Consolidates three SEA session-backend capabilities into one focused PR, each a thin adapter over the merged kernel (which does the heavy lifting):

1. Bound parameters

  • Forwards ordinalParameters / namedParameters to the kernel's ExecuteOptions.positionalParams / namedParams.
  • Reuses the existing DBSQLParameter.toSparkParameter codec (the same type-inference the Thrift path uses), then adapts the type name to the kernel codec: DECIMALDECIMAL(p,s), INTERVAL *INTERVAL, missing value → VOID/NULL.
  • Positional (?) and named (:name) parameters are mutually exclusive — rejected up-front.

2. Metadata

  • getCatalogs / getSchemas / getTables / getColumns / getFunctions / getTableTypes / getTypeInfo / getPrimaryKeys / getCrossReference forward to the kernel Connection.list* / get* methods and wrap the returned Statement.
  • The kernel owns SQL synthesis, column projection, and the client-side TABLE_TYPE filter — the driver only maps request fields to positional arguments. (The previously-proposed driver-side SeaTableTypeFilter is therefore not needed and was dropped.)

3. getInfo

  • Synthesized client-side (the SEA REST protocol / kernel have no getInfo endpoint), mirroring the three TGetInfoType values the Databricks Thrift server answers (server name / DBMS name / DBMS version) and rejecting every other value — byte-identical to the Thrift path.

Notes

Testing

  • Full unit suite green (1112 passing); lib/ lints clean.
  • Validated against a live warehouse via the SEA path: positional + named params return bound values, mutual-exclusion is rejected, all 9 metadata calls return rows (incl. kernel-side tableTypes=[VIEW] filter), and getInfo(CLI_DBMS_NAME) returns Spark SQL.

This pull request and its description were written by Isaac.

Wire three SEA session-backend capabilities onto the merged-kernel napi
surface, all as thin adapters over the kernel (which owns the heavy lifting):

- Bound parameters: forward `ordinalParameters`/`namedParameters` to the
  kernel's `ExecuteOptions.positionalParams`/`namedParams`, reusing the
  existing `DBSQLParameter.toSparkParameter` codec (DECIMAL→DECIMAL(p,s),
  INTERVAL→INTERVAL, NULL→VOID). Positional and named are mutually exclusive.
- Metadata: `getCatalogs`/`getSchemas`/`getTables`/`getColumns`/`getFunctions`/
  `getTableTypes`/`getTypeInfo`/`getPrimaryKeys`/`getCrossReference` forward to
  the kernel `Connection.list*`/`get*` calls and wrap the returned Statement.
  The kernel owns SQL synthesis, column projection, and the client-side
  `TABLE_TYPE` filter — the driver only maps request fields to arguments.
- getInfo: synthesized client-side (no SEA/kernel endpoint), mirroring the
  three TGetInfoType values the Databricks Thrift server answers and rejecting
  the rest, byte-identical to the Thrift path.

`SeaInputValidation` is slimmed to the one bind-time gate the driver needs
(`assertBindableValue`); marker counting is the kernel's job. Re-exports the
kernel's `ExecuteOptions`/`TypedValueInput` types from `SeaNativeLoader` rather
than re-declaring them, so the driver param shapes can't drift from the kernel.

Regenerates the committed napi `.d.ts` snapshot against the merged kernel.

Validated against a live warehouse: positional + named params, mutual-exclusion
rejection, all metadata calls (incl. kernel-side tableTypes filter), and getInfo.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Code-review-squad #412 review (79/100). Validated each finding against the
kernel source + a live warehouse before fixing:

- F1 (HIGH): getPrimaryKeys turned an omitted catalog into `?? ''`, which the
  kernel's `Identifier::new` rejects with an opaque "identifier must not be
  empty" — a regression vs Thrift (which forwards undefined). The napi
  `getPrimaryKeys(catalog: string, …)` requires a non-empty catalog, so reject
  up front with a clear, actionable message and document the divergence. Added
  a test (omitted + empty-string catalog) asserting the kernel call is never
  reached. (getCrossReference is unaffected — its parent args are nullable and
  passed through without `?? ''`.)

- F3 (MED): mutual-exclusivity of ordinal/named params threw HiveDriverError
  with a different message than the Thrift path, which throws
  ParameterError('Driver does not support both ordinal and named parameters.')
  (ParameterError extends Error, not HiveDriverError — so a caller catching it
  worked on Thrift and silently failed on SEA). Now throws the identical
  ParameterError + message. Test updated.

- F2 (MED): empirically re-verified against the live warehouse over Thrift —
  the server returns the exact same three values we synthesize ("Spark SQL" /
  "Spark SQL" / "3.1.1") and errors on every other TGetInfoType (probed
  CLI_MAX_DRIVER_CONNECTIONS, CLI_DATA_SOURCE_NAME, …). So throwing on an
  unsupported type matches Thrift's effective behaviour rather than narrowing
  it; softened the "byte-for-byte" assertion into the validated fact in both
  the SeaServerInfo and getInfo doc comments.

- F4 (LOW): getInfo's rejection now names the accepted enum identifiers/values
  (CLI_SERVER_NAME (13), CLI_DBMS_NAME (17), CLI_DBMS_VER (18)) so an
  agent/SDK consumer can self-correct.

- F5 (LOW): metadata + bound-param execute failures now emit a debug-level
  breadcrumb (mapped error + session id) before throwing, via a shared
  `logAndMapError` helper, matching the rest of the SEA backend's logging.

- F7 (LOW): dropped the vestigial `nativeStatement!` non-null assertion in
  executeStatement (typed the local, mirroring runMetadata).

- F8 (test): added coverage for the getPrimaryKeys omitted-catalog path (F1),
  the INTERVAL *→INTERVAL collapse, and the DECIMAL precision clamp-to-38.

F6 (kernel diagnostic getters unconsumed) is tracked as a follow-up — wiring
them in must land with the errorDetailsJson size-guard, out of scope here.

Validated live: getPrimaryKeys(with/without catalog), the ParameterError
parity, and the named-identifier getInfo error all behave as asserted.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
The CI "Check code style" step runs `prettier . --check` (whole repo), not
just eslint; these SEA test files were committed without prettier formatting
and failed it. Formatting-only; no behavioural change.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@gopalldb
Copy link
Copy Markdown
Collaborator

gopalldb commented Jun 3, 2026

🟠 P1 (Important — should fix)

  1. Decimal (p,s) derived by digit-counting misrepresents values; >38-digit clamp is silent data loss — SeaPositionalParams.ts:115-121. Precision/scale are inferred by counting digit characters in the string,
    which doesn't equal true significant digits: "0.00001" → DECIMAL(6,5) (leading 0 counted), and critically a 40-digit value → DECIMAL(38,0) while the full 40-digit value is still sent (I confirmed this
    output). Clamping the type to 38 while sending an unrepresentable value is internally inconsistent — the kernel will reject or silently truncate/round. The test positionalParams.test.ts:1408 blesses the
    40-digit clamp as correct, which is the wrong contract. → error at bind time on >38-digit overflow (clear ParameterError) rather than clamp-and-send; compute precision excluding insignificant leading zeros;
    consider accepting explicit precision/scale when the caller passes DBSQLParameterType.DECIMAL.
  2. assertBindableValue lets values through to stringification with format surprises — SeaInputValidation.ts:39-52. Validation gates shape only; a number like 1e21 stringifies to "1e+21", which combined with
    the digit-counting decimal logic mis-derives (p,s). This is shared toSparkParameter behavior (parity-neutral, not a SEA regression), but the decimal fix above should also reject non-[+-]?digits[.digits]?
    strings when the explicit type is DECIMAL. No test covers exponential/NaN/Infinity/bigint-edge stringification.

🟡 P2 (Minor)

  • Empty/non-numeric DECIMAL values ("", "abc") → DECIMAL(1,0) + raw value; type string is safe but the kernel rejects with an opaque error. → optional numeric-regex validation at bind for a clearer message.
  • getInfo hardcodes 3.1.1/Spark SQL (SeaServerInfo.ts:238-249) — can drift from the live server vs. Thrift; documented Thrift-parity rationale + comparator suite. Unsupported InfoType correctly throws.
    Acceptable.
  • native/sea/index.js loader — the [SEA-NodeJS] (1/3) SEA connect + auth (PAT + OAuth M2M/U2M) #409 garbled-name P0 is NOT present here. I checked: this loader uses clean @databricks/sql-kernel- names (one correct triple per branch), not the cross-triple
    garble. The only delta vs. base is adding AsyncStatement/AsyncResultHandle exports (matches index.d.ts). (package.json/optionalDependencies coverage is out of scope here — verify separately, but not
    introduced/regressed.)

@msrathore-db msrathore-db requested a review from gopalldb June 3, 2026 09:09
…ic, drop leading-zero miscount

Code-review #412 (gopalldb P1). `decimalPrecisionScale` derived precision by
counting digit characters, which (a) counted insignificant leading zeros
("0.00001" → DECIMAL(6,5)) and (b) clamped precision to 38 while still sending
the full value — so a 40-digit value produced DECIMAL(38,0) alongside an
unrepresentable 40-digit value, which the kernel rejects or silently truncates.
The prior test blessed that clamp as correct (wrong contract).

Now:
- Precision = significant integer digits (insignificant leading zeros stripped,
  matching Spark's decimal-literal rule) + fractional scale. "0.00001" →
  DECIMAL(5,5); "007.50" → DECIMAL(3,2); "0" → DECIMAL(1,0).
- A value needing precision > 38 throws ParameterError at bind time instead of
  clamp-and-send (P1.1).
- A non-numeric / exponential value ("1e+21", "abc", "") throws ParameterError
  with a clear message rather than mis-deriving (p,s) and surfacing an opaque
  kernel error (P1.2 / P2).

Validated live: 0.00001 binds as DECIMAL(5,5) and round-trips; a 40-digit value
is rejected client-side with ParameterError (no server round-trip). Unit tests
updated to assert the new reject contract + leading-zero handling.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@msrathore-db msrathore-db deployed to azure-prod June 3, 2026 09:36 — with GitHub Actions Active
@msrathore-db
Copy link
Copy Markdown
Contributor Author

@gopalldb thanks — the DECIMAL P1 was a real bug; fixed in the latest commit and validated live.

P1.1 — digit-counting precision + >38 clamp-and-send. Confirmed both: 0.00001DECIMAL(6,5) (leading 0 counted), and a 40-digit value → DECIMAL(38,0) while still sending the 40-digit value (internally inconsistent — kernel rejects/truncates). Fixed in decimalPrecisionScale:

  • Precision is now significant integer digits (insignificant leading zeros stripped, matching Spark's decimal-literal rule) + fractional scale: 0.00001DECIMAL(5,5), 007.50DECIMAL(3,2), 0DECIMAL(1,0).
  • A value needing precision > 38 throws ParameterError at bind time instead of clamp-and-send.
  • The test that blessed the 40-digit clamp now asserts the reject contract.

P1.2 / P2 — format surprises (1e+21, abc, ``) mis-derive (p,s). Fixed: when the explicit type is DECIMAL, the value must match `[+-]?digits[.digits]` — exponential / non-numeric / empty values now throw a clear `ParameterError` rather than emitting a bogus type the kernel rejects opaquely. (As you noted, the underlying `toSparkParameter` stringification is shared/parity-neutral, so I gated this at the DECIMAL-derivation step rather than changing shared behavior.)

Validated live: 0.00001 binds as DECIMAL(5,5) and round-trips; a 40-digit value is rejected client-side with ParameterError (no server round-trip). Unit tests cover leading-zero handling, >38 overflow, and non-numeric/exponential rejection.

P2 — getInfo 3.1.1/Spark SQL hardcode: acknowledged acceptable; I separately re-verified these against the live server (server returns the same three values, errors on every other InfoType). loader names: agreed — clean @databricks/sql-kernel-<triple> names, only the AsyncStatement/AsyncResultHandle exports added (matches index.d.ts); optionalDependencies coverage is tracked separately (#414 / the unpublished-packages note in native/sea/README.md).

This reply was written by Isaac.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants