Skip to content

[feature][integration][java] Add built-in Google Gemini chat model integration#718

Open
section9-lab wants to merge 7 commits into
apache:mainfrom
section9-lab:feature/gemini-chat-model-integration
Open

[feature][integration][java] Add built-in Google Gemini chat model integration#718
section9-lab wants to merge 7 commits into
apache:mainfrom
section9-lab:feature/gemini-chat-model-integration

Conversation

@section9-lab
Copy link
Copy Markdown

[## What is the purpose of the change

Add a dedicated Gemini chat model module (integrations/chat-models/gemini/) using the official google-genai Java SDK (v1.56.0), as proposed in #648.

Key design decisions

  • Native Gemini protocol: system messages → systemInstruction, roles user/model, tool calls via functionCall/functionResponse parts with native id (no tool_call_id).
  • Gemini 3 thoughtSignature: captured from response parts and echoed back on multi-turn tool-use turns (required by gemini-3-pro-preview).
  • Proxy support: base_url parameter routes requests through a local proxy (e.g. CC Switch). When api_key is omitted but base_url is set, a placeholder key satisfies the SDK validation while the proxy injects the real credential.
  • SDK: com.google.genai:google-genai:1.56.0 — supports both Gemini Developer API (API key) and Vertex AI (follow-up).

Scope

Java side: text + function calling only. Multimodal, streaming, full Vertex AI auth wiring, and the Python counterpart are tracked as follow-ups per #648.

Files changed

File Purpose
integrations/chat-models/gemini/pom.xml New module POM (parent: chat-models, dep: google-genai SDK)
GeminiChatModelConnection.java Connection class: chat(), message/tool/functionCall conversion, thoughtSignature round-trip
GeminiChatModelSetup.java Setup class: defaults (model, temperature, maxOutputTokens), getParameters()
GeminiChatModelConnectionTest.java 11 unit tests (constructor validation, role mapping, tool-call round-trip, signature preservation)
GeminiChatModelSetupTest.java 5 unit tests (defaults, custom params, validation)
integrations/chat-models/pom.xml Register gemini module
integrations/pom.xml Add google.genai.version property

Testing

  • 16 unit tests, all pass, no API key required (CI-safe).
  • End-to-end verified against real gemini-3-pro-preview via local proxy: text generation ✅, function calling ✅, thoughtSignature echo ✅, multi-turn tool-result → final answer ✅.

Closes #648

…tegration

Add a dedicated Gemini chat model module under integrations/chat-models/gemini/
using the official google-genai Java SDK (v1.56.0).

This module handles Gemini's native protocol differences:
- System messages passed as systemInstruction (not a system role)
- Conversation roles: user/model (assistant maps to model)
- Function calls via functionCall/functionResponse parts with native id
- Gemini 3 thoughtSignature capture and echo-back for multi-turn tool use
- Custom base_url support for proxy-based deployments (e.g. CC Switch)

Scope: text + function calling on the Java side.
Follow-ups: multimodal, streaming, Vertex AI auth, Python counterpart.

Closes apache#648
@github-actions github-actions Bot added doc-label-missing The Bot applies this label either because none or multiple labels were provided. fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. priority/major Default priority of the PR or issue. labels May 31, 2026
Copy link
Copy Markdown
Collaborator

@weiqingy weiqingy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on! A few questions inline, plus one cross-file finding below.

  1. The module builds (it's in the integrations/chat-models reactor), but I don't see it added to dist/pom.xml — every sibling connector is listed there (ollama, openai, anthropic, azureai, bedrock) so it gets bundled into the dist JARs copied into the Python wheel at python/flink_agents/lib/. Without that entry, GeminiChatModelConnection won't be in the shipped artifact and users can't load it from the released distribution. Would adding the gemini artifact alongside the siblings be the right move here? One thing worth checking when you do: google-genai:1.56.0's POM drags in protobuf-java, guava, google-auth-library-oauth2-http, Apache httpclient, and a few Jackson modules — more transitive weight than the existing connectors carry, so the dist shade config may need a look for relocation/duplicate-class issues.


case TOOL:
Object name = message.getExtraArgs().get("name");
if (name == null) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TOOL branch requires name in extraArgs, but the runtime never supplies it. The only producer of TOOL messages is ChatModelAction.java (~line 476), which populates only externalId — every sibling connector reads externalId too (AnthropicChatModelConnection.java:271, OpenAIChatCompletionsUtils.java:96). So any multi-turn tool-calling agent driven through the Flink runtime throws Tool message must have a 'name' in extraArgs for Gemini the moment a tool result is replayed: single-shot chat and the first tool request work, but the tool-result turn fails — and function calling is the headline of #648.

The connector already has what it needs to resolve this without the hard requirement: ToolCallAction sets externalId == original_id, and this connector writes both name and original_id onto the assistant turn's tool-call map in convertFunctionCall (lines 384, 378). chat() receives the full messages list including that prior ASSISTANT turn. Would building an original_id → name map from the assistant turns' toolCalls and looking up by the TOOL message's externalId work here? Part.fromFunctionResponse(name, map) needs the function name specifically (it exists in 1.56.0), which is why externalId alone isn't enough and the name-resolution step is needed. Keeping a clear error only when the name genuinely can't be resolved would preserve the safety net.

One thing I'd love to understand: does the multi-turn tool-result e2e run through ChatModelAction, or through a harness that pre-seeds name? If it's the runtime path I'd have expected this to surface — so I'm probably missing something about how the e2e is wired, and would be glad to be corrected here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in ff6da88. Implemented exactly the lookup you suggested: chat() now scans prior ASSISTANT turns to build a original_id → name map (see buildToolCallIdToNameMap), and the TOOL branch in convertToContent resolves the function name via that map keyed by the runtime-supplied externalId. Explicit name in extraArgs still takes precedence as a safety net; the error is only raised when the name genuinely can't be resolved.

To your last paragraph — you weren't missing anything. The e2e I ran was driving the SDK directly with pre-seeded name, not through ChatModelAction, which is exactly why this hadn't surfaced. Apologies for the gap.

@DisplayName("convertToContent maps TOOL role to a functionResponse part")
void testConvertToolMessage() {
ChatMessage tool = ChatMessage.tool("sunny, 22C");
tool.getExtraArgs().put("name", "get_weather");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line hand-injects name into extraArgs, but the production runtime never does — it supplies only externalId. Paired with testConvertToolMessageWithoutName (line 124) asserting the throw path, the suite codifies the broken behavior as the contract: both tests pass against a message shape the runtime never emits, so a green suite gives false confidence that runtime tool-calling works (this is what masks the TOOL name issue at GeminiChatModelConnection.java:294).

Would a realistic two-turn test help pin the real contract? Something like, if it's useful: an ASSISTANT message whose toolCalls carry {name, original_id} as produced by convertFunctionCall, followed by a TOOL message carrying only externalId (== that original_id, no name), then assert convertToContent/chat produces a functionResponse with the correct name. A test in that shape would lock the runtime contract in place once the resolution above lands — happy to be redirected if you see a simpler way to cover it.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ff6da88. Removed testConvertToolMessageWithoutName (it was locking in the wrong contract) and added two tests in the shape you described:

  • testRuntimeShapeToolMessageResolvesNameFromExternalId — TOOL message with only externalId and a lookup map; asserts the resulting functionResponse carries the correct name.
  • testRuntimeShapeMultiTurn — full ASSISTANT ({name, original_id} from convertFunctionCall) → TOOL (only externalId == original_id) round-trip, asserting the recovered name on the model side.

Also kept testConvertToolMessageWithExplicitName for the case where the caller does supply name directly, and added testConvertToolMessageThrowsWhenUnresolvable so the failure case is still locked down — but only when the name is genuinely missing.

builder.temperature(((Number) temperature).floatValue());
}

Object maxOutputTokens = arguments.remove("max_output_tokens");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildConfig reads temperature and max_output_tokens, but additional_kwargs (top_k, top_p, …) is never read here — even though GeminiChatModelSetup documents and forwards it via getParameters(). The net effect is a user who sets top_k/top_p through the documented additional_kwargs path sees them silently ignored: no error, just wrong sampling behavior. AnthropicChatModelConnection.java:212-217 removes and applies additional_kwargs via applyAdditionalKwargs. Would mirroring that — reading additional_kwargs and mapping known keys (top_k → topK, top_p → topP) onto the builder — be worth doing? GenerateContentConfig.Builder exposes topK/topP in 1.56.0.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ff6da88. Added applyAdditionalKwargs mirroring the Anthropic connector: top_k → topK, top_p → topP, stop_sequences → stopSequences. One small wrinkle worth flagging — Gemini's GenerateContentConfig.Builder types both topK and topP as Float (verified via javap on 1.56.0), which is unusual for topK but matches the upstream protocol. Unknown keys are silently ignored rather than rejected, mirroring how the sibling connectors handle forward-compatible parameters. Two new tests cover the forwarding (testApplyAdditionalKwargs) and the ignore-unknown behavior.

recordUsage(result, modelName, response);

return result;
} catch (Exception e) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model-name validation IllegalArgumentException at line 182 is thrown inside this try, so it gets re-wrapped into a generic RuntimeException — while the constructor throws IllegalArgumentException directly (line 114). A caller catching IllegalArgumentException to distinguish validation errors will catch the constructor case but not the in-chat model-name case. Is there a reason to keep the contract asymmetric across the two entry points, or would validating the model name before the try (or rethrowing validation IAEs unwrapped) make it consistent?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ff6da88. Did both of the things you suggested as a belt-and-braces approach: (1) moved the model validation out of the try block so the IAE is thrown at the same lexical point as the constructor's IAE, and (2) added an explicit catch (IllegalArgumentException) { throw; } clause before the generic catch, so any future validation IAEs from downstream callees (e.g. in convertToContent when a tool-result can't be resolved) also surface unwrapped instead of being re-wrapped into a generic RuntimeException.

builder.httpOptions(httpOptions.build());
}

if (useVertex) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Vertex AI constructor branch (and the vertex_ai/project/location params the javadoc advertises) has no test, unlike the other constructor paths. Worth a construction smoke test if Vertex is in scope for #648, or a note that it's a documented-but-untested follow-up?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took the smoke-test path in ff6da88, with a caveat. Added testConstructorVertexAiIsWired plus a javadoc note that the Vertex branch is wired and smoke-tested at construction but not e2e-tested.

Caveat: the SDK eagerly resolves ADC at Client.build(), which means a CI box without ADC throws while a dev box with gcloud auth succeeds. The test accepts both outcomes — what it actually asserts is that the vertex_ai flag is not silently dropped (which would make the flag dead code). It's a weaker assertion than I'd like, but tightening it requires either an ADC fixture or mocking the SDK builder, both of which felt out of scope for this PR. Happy to take a stronger approach if you have a preference.

dist/pom.xml:
- Register the gemini module as a dist dependency so it ships in the
  release JARs and Python wheel.

GeminiChatModelConnection:
- Recover the function name in TOOL messages from the prior ASSISTANT
  turn's tool-call map via externalId, matching how ChatModelAction
  actually emits tool-result turns (was previously requiring an
  explicit 'name' that the runtime never supplies).
- Forward additional_kwargs (top_k, top_p, stop_sequences) to the
  Gemini config, mirroring the Anthropic connector.
- Surface IllegalArgumentException from the model-name validation
  unwrapped, consistent with the constructor's error contract.
- Clamp the HttpOptions timeout in long arithmetic to avoid int
  overflow.
- Raise an explicit error when the response has no candidates
  (safety-block / quota) instead of returning an empty assistant
  message.
- Switch the default model to gemini-3.1-pro-preview, since
  gemini-3-pro-preview is deprecated on the live Google endpoint.

Tests:
- Replace the test that locked in the broken 'name required' contract
  with a realistic two-turn shape (ASSISTANT carries tool-call map,
  TOOL carries only externalId).
- Cover the additional_kwargs forwarding path.
- Add a smoke test that the Vertex AI builder branch is wired
  (documented as a build-time check, e2e is a follow-up).
@section9-lab
Copy link
Copy Markdown
Author

Thanks for the thorough review @weiqingy! All five inline comments are addressed in ff6da88 — replies inline on each. Two summary items for the top-level comment:

1. dist/pom.xml registration

Added gemini to the dist dependency list alongside the other connectors. Verified by clean-rebuilding dist/flink-2.2 and checking the produced JARs:

  • GeminiChatModelConnection/GeminiChatModelSetup classes are present in both thin and fat jars ✓
  • com/google/genai/* SDK classes: 3363 ✓
  • com/google/auth/*: 188 ✓ ; io/grpc/*: 4003 ✓ ; com/google/protobuf/*: 1474 ✓

2. Shade impact (the relocation/duplicate-class question)

Audited the shade output. A few data points:

Fat-jar size. dist/common grows from ~118 MB to ~225 MB (+104 MB). The delta is exactly the transitive set you flagged: google-genai + protobuf-java + grpc + google-auth + Apache httpclient. Not small, but not pathological for an LLM SDK either.

Duplicate-class warnings. Total class-overlap warnings: ~41 (vs. ~38 pre-existing without gemini). The single largest new overlap is at dist/flink-2.x shade time:

flink-agents-dist-common.jar, google-genai-1.56.0.jar define 3323 overlapping classes and resources

This is the same structural pattern that already exists for ollama4j (common + ollama4j → 90 overlapping): dist/flink-2.x shades common (which already contains the transitive deps) AND the transitive deps directly. It's last-write-wins on identical classes, so no runtime breakage — but it does inflate jar size.

No new cross-vendor conflicts. google-genai's transitive guava/jackson/error_prone/etc. all align with versions already in the reactor. NOTICE/LICENSE entries are overlapping (same as every connector) but collapsed by the existing ManifestResourceTransformer.

My suggestion: the cleanup belongs in dist/flink-2.x shade config (exclude transitive-deps-already-in-common from being pulled in twice), and it applies to ollama4j today too — so I'd treat it as a follow-up cleanup rather than gating this PR on it, unless you'd prefer I include it here.

Other changes since the previous push

While addressing the review I also:

  • Switched the default model from gemini-3-pro-preview to gemini-3.1-pro-preview, because the former is now deprecated on the live Google Developer API (404 "no longer available" when I ran a direct e2e against generativelanguage.googleapis.com).
  • Clamped the HttpOptions timeout computation in long to avoid int overflow.
  • Made convertResponse raise an explicit IllegalStateException on empty candidates (safety-block / quota), instead of silently returning an empty assistant message. Matches the Anthropic connector's contract.

E2E

Verified the full chain (text → function call → thoughtSignature echo → tool-result replay → final answer) directly against the real Google Developer API endpoint with gemini-3.1-pro-preview. No proxy in the path this time.

PTAL!

- extractSystemInstruction: use Content.fromParts(Part...) instead of
  Content.builder().parts(...).build(), since the system instruction
  doesn't carry a role.
- convertResponse: call response.checkFinishReason() so that
  unexpected finish reasons (SAFETY, MAX_TOKENS, RECITATION, …) raise
  IllegalArgumentException with the SDK's own message, instead of
  silently returning a partial/filtered assistant message. The IAE is
  propagated unwrapped by chat()'s catch block.
…tly ignoring

Three small improvements informed by re-reading the function:

- default branch (unknown key) and the type-mismatch branches now emit
  a SLF4J warning instead of dropping the kwarg silently. Reasoning:
  the Anthropic/OpenAI siblings can transparently forward unknown keys
  via SDK escape hatches (putAdditionalBodyProperty), but Gemini's
  GenerateContentConfig.Builder is AutoValue-generated and does not
  accept arbitrary body fields, so we cannot replicate the same
  transparency without risking client-level HttpOptions (baseUrl/
  timeout) being lost via per-request override. Logging is the
  zero-risk middle ground: users can see which kwarg was dropped.

- Extract the Number-to-Float coercion into applyFloat() so top_k and
  top_p share one place for type validation and the warn message.

- Build the stop_sequences list with a stream pipeline instead of an
  explicit accumulator loop.

Two new tests lock down that unknown keys and type-mismatched values
are ignored (do not leak into known fields) and do not throw.
…own keys

Re-checked the surrounding modules: all five sibling chat-model
connectors (anthropic, openai, azureai, bedrock, ollama) and the rest
of integrations/ avoid Logger almost entirely (1 usage across the
entire integrations tree). The earlier patch added 3 LOG.warn calls,
which would have made this the only Connection in the module to log.

Trim to one: keep the warn on the default branch (truly unknown key —
the Gemini SDK can't transparently forward those the way Anthropic/
OpenAI do via putAdditionalBodyProperty, so silent drop would be the
worst option), drop the warns on type-mismatched values (caller's
responsibility to pass the documented type, matches the lenient
behavior of the sibling connectors).

Drops the applyFloat helper since it's no longer needed at one call site.
@weiqingy
Copy link
Copy Markdown
Collaborator

weiqingy commented Jun 2, 2026

@section9-lab Thanks for addressing all the comments.

Confirming each:

  1. Tool-name resolutionbuildToolCallIdToNameMap + resolveToolFunctionName recover the name from externalId, with explicit name as a precedence override and a clear throw only when it's genuinely unresolvable. testRuntimeShapeMultiTurn exercises the full ASSISTANT→TOOL round-trip against the runtime shape — that's the contract locked in.
  2. Test contract — dropping testConvertToolMessageWithoutName and keeping testConvertToolMessageThrowsWhenUnresolvable for the genuinely-missing case is exactly right; the suite now fails if the runtime path regresses.
  3. additional_kwargstop_k/top_p/stop_sequences forwarded, and warning on dropped unknown keys is a nice touch beyond what I'd asked for. Good call flagging the Float topK — surprising, but it matches the upstream protocol.
  4. IAE contract — validating model before the try plus the explicit catch (IllegalArgumentException) { throw; } makes both entry points symmetric; nice that checkFinishReason()'s IAE surfaces unwrapped too.
  5. Vertex — smoke test + javadoc note is the right scope. Accepting both ADC outcomes while asserting the flag isn't dropped is a reasonable bound for a path that can't run e2e in CI.
  6. dist/pom.xml — gemini is now bundled alongside the siblings, and CI staying green says google-genai's heavier transitive set (protobuf/guava/auth/httpclient) shaded cleanly without duplicate-class fallout. That was my main worry, so 👍.

LGTM from my side.

@section9-lab
Copy link
Copy Markdown
Author

Thanks for the thorough review @weiqingy! Could you submit a formal approval via the GitHub review button when you get a chance? I don't have merge access on the repo, so I'd need a committer to approve + merge.

@weiqingy
Copy link
Copy Markdown
Collaborator

weiqingy commented Jun 2, 2026

Thanks for the thorough review @weiqingy! Could you submit a formal approval via the GitHub review button when you get a chance? I don't have merge access on the repo, so I'd need a committer to approve + merge.

@section9-lab I don’t have merge access to this repo. @wenjin272 or @xintongsong should be able to help review, approve and merge the PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-label-missing The Bot applies this label either because none or multiple labels were provided. fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. priority/major Default priority of the PR or issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] [integration] [java] Integrate built-in Google Gemini chat model

2 participants