Python: Add timeout parameter to FoundryAgent to fix ConnectTimeout on multi-turn conversations #6263
moonbox3 wants to merge 5 commits
…icrosoft#6241)
Expose a `timeout` parameter on `RawFoundryAgentChatClient`, `_FoundryAgentChatClient`, `RawFoundryAgent`, `FoundryAgent`, and `RawOpenAIChatClient` so callers can override the HTTP timeout used by the underlying AsyncOpenAI client.
Root cause: `RawFoundryAgentChatClient.__init__` called `project_client.get_openai_client()` without configuring any timeout, inheriting the OpenAI SDK default of `httpx.Timeout(connect=5.0)`. When connections are recycled between turns under load, the 5 s connect timeout fires and surfaces as `openai.APITimeoutError`.
Fix:
- `load_openai_service_settings` (`_shared.py`): accept `timeout` and include it in `client_args` for all three `AsyncOpenAI`/`AsyncAzureOpenAI` construction paths.
- `RawOpenAIChatClient.__init__` (`_chat_client.py`): accept `timeout` and forward it to `load_openai_service_settings`.
- `RawFoundryAgentChatClient.__init__` (`_agent.py`): accept `timeout` and set `openai_client.timeout = timeout` on the client returned by `get_openai_client()` before passing it to the base class.
- `_FoundryAgentChatClient`, `RawFoundryAgent`, `FoundryAgent`: accept and propagate `timeout` through the construction chain.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
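The first `Fix:` bullet (include `timeout` in `client_args`) can be pictured with the minimal sketch below. `build_client_args` is a hypothetical helper, not the PR's actual `load_openai_service_settings`. It assumes the key is added only when a value is supplied, because httpx-based clients generally treat an explicit `timeout=None` as "no timeout at all" rather than "keep the SDK default".

```python
from typing import Any, Dict, Optional


def build_client_args(
    api_key: str,
    base_url: Optional[str] = None,
    timeout: Optional[float] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble kwargs for an SDK client constructor."""
    client_args: Dict[str, Any] = {"api_key": api_key}
    if base_url is not None:
        client_args["base_url"] = base_url
    # Forward timeout only when the caller set one, so that timeout=None
    # leaves the SDK's own default in place instead of disabling timeouts.
    if timeout is not None:
        client_args["timeout"] = timeout
    return client_args


if __name__ == "__main__":
    print(build_client_args("sk-test"))                # {'api_key': 'sk-test'}
    print(build_client_args("sk-test", timeout=60.0))  # {'api_key': 'sk-test', 'timeout': 60.0}
```

Keeping the "unset" case as an absent key, rather than `timeout=None`, is what lets the default-`None` parameter preserve existing behavior.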
Expose a timeout parameter on RawFoundryAgentChatClient, _FoundryAgentChatClient, RawFoundryAgent, FoundryAgent, and RawOpenAIChatClient. When provided, the value is applied to the underlying AsyncOpenAI client so that connect timeouts under load or after connection recycling can be tuned by callers. Previously, get_openai_client() was called without any timeout override, so the SDK default of httpx.Timeout(connect=5.0) was inherited and could fire on multi-turn conversations where the underlying connection is recycled between turns. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…out` on multi-turn conversations
Fixes microsoft#6241
Pull request overview
Adds an optional timeout: float | None parameter through FoundryAgent → _FoundryAgentChatClient → RawFoundryAgentChatClient → RawOpenAIChatClient → load_openai_service_settings, so callers can override the OpenAI SDK's 5s default connect timeout that was causing ConnectTimeout on multi-turn Foundry conversations (issue #6241). Also bumps some azure-ai-agentserver-* dependencies and applies minor formatting touch-ups in unrelated test/source files.
Changes:
- Thread a `timeout` parameter through the OpenAI/Foundry client constructors and into the underlying `AsyncOpenAI`/`AsyncAzureOpenAI` client.
- Add unit tests asserting the parameter is exposed (not absorbed by `**kwargs`) and that the timeout is/isn't applied based on `None` vs non-`None`.
- Refresh `uv.lock` (notably `azure-ai-agentserver-core` 2.0.0b3→2.0.0b5, `azure-ai-agentserver-responses` 1.0.0b5→1.0.0b7, new `microsoft-opentelemetry`) plus minor formatting tweaks in bedrock and foundry_hosting tests.
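The "not absorbed by `**kwargs`" check from the test bullet can be expressed with `inspect.signature`. This is a minimal sketch; `ExampleChatClient`, `LegacyChatClient`, and `has_explicit_param` are hypothetical names, not the PR's actual tests.

```python
import inspect
from typing import Any, Optional


class ExampleChatClient:
    """Hypothetical constructor that names timeout explicitly."""

    def __init__(self, *, model: str, timeout: Optional[float] = None, **kwargs: Any) -> None:
        self.model = model
        self.timeout = timeout
        self.extra = kwargs


class LegacyChatClient:
    """Hypothetical constructor where timeout would fall into **kwargs."""

    def __init__(self, *, model: str, **kwargs: Any) -> None:
        self.model = model
        self.extra = kwargs


def has_explicit_param(obj: Any, name: str) -> bool:
    """Return True if `name` is a named parameter in obj's signature.

    **kwargs appears in the signature under its own name ("kwargs"),
    so a value it would swallow does not count as an explicit parameter.
    """
    return name in inspect.signature(obj).parameters
```

The distinction matters because `LegacyChatClient(model="m", timeout=60)` raises no error: the value silently lands in `extra`, which is exactly the failure mode such a test guards against.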
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/packages/openai/agent_framework_openai/_shared.py | Accept timeout and forward to AsyncOpenAI / AsyncAzureOpenAI constructors across all routing branches. |
| python/packages/openai/agent_framework_openai/_chat_client.py | Add timeout kwarg + docstrings to all RawOpenAIChatClient.__init__ overloads; pass into load_openai_service_settings. |
| python/packages/openai/tests/openai/test_openai_chat_client.py | Tests that timeout is an explicit parameter and accepted when a preconfigured client is supplied. |
| python/packages/foundry/agent_framework_foundry/_agent.py | Add timeout to RawFoundryAgentChatClient, _FoundryAgentChatClient, RawFoundryAgent, FoundryAgent; mutate openai_client.timeout post-get_openai_client. |
| python/packages/foundry/tests/foundry/test_foundry_agent.py | Coverage for timeout propagation and the None no-op across all four entry points. |
| python/packages/bedrock/agent_framework_bedrock/_chat_client.py | Formatting collapse of a few multi-line literals (no behavior change). |
| python/packages/bedrock/tests/test_bedrock_structured_output.py | Cosmetic blank line. |
| python/packages/foundry_hosting/tests/test_responses.py | Collapse two list comprehensions onto single lines (cosmetic). |
| python/uv.lock | Bump azure-ai-agentserver-core/responses and pull in microsoft-opentelemetry + opentelemetry-instrumentation-httpx/openai*/util-genai. |
moonbox3 left a comment
Automated Code Review
Reviewers: 4 | Confidence: 74% | Result: All clear
Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach
Automated review by automated agents
… timeout (microsoft#6241)
Replace the direct assignment `openai_client.timeout = timeout` with `openai_client = openai_client.with_options(timeout=timeout)` in `RawFoundryAgentChatClient.__init__`.
The Azure AI Projects SDK caches and returns a shared AsyncOpenAI client per AIProjectClient. Mutating its `.timeout` attribute leaked the override to all other code paths sharing that client (other agents, user code). `with_options()` returns a new client instance with the override applied, leaving the original shared client untouched.
Update tests to assert that `with_options` is called with the correct timeout and that the original shared client's `timeout` attribute is not mutated.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
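Why mutation leaks and `with_options()` does not can be seen in the sketch below. `SdkClientStub` and `ProjectClientStub` are hypothetical stand-ins that model only a `timeout` attribute and the per-project caching the commit message describes; the real `AsyncOpenAI.with_options()` accepts many more options and copies more state than this stub.

```python
import copy
from typing import Optional


class SdkClientStub:
    """Hypothetical stand-in for an SDK client with a timeout attribute."""

    def __init__(self, timeout: float = 5.0) -> None:
        self.timeout = timeout

    def with_options(self, *, timeout: float) -> "SdkClientStub":
        # Return a shallow copy carrying the override; self is left untouched.
        clone = copy.copy(self)
        clone.timeout = timeout
        return clone


class ProjectClientStub:
    """Hypothetical stand-in for a project client that caches one shared client."""

    def __init__(self) -> None:
        self._cached = SdkClientStub()

    def get_openai_client(self) -> SdkClientStub:
        return self._cached  # every caller receives the same object


def wrap_by_mutation(project: ProjectClientStub, timeout: Optional[float]) -> SdkClientStub:
    client = project.get_openai_client()
    if timeout is not None:
        client.timeout = timeout  # leaks into every other user of the cache
    return client


def wrap_by_copy(project: ProjectClientStub, timeout: Optional[float]) -> SdkClientStub:
    client = project.get_openai_client()
    if timeout is not None:
        client = client.with_options(timeout=timeout)  # scoped to this wrapper
    return client
```

After `wrap_by_mutation(project, 60.0)`, any later `project.get_openai_client()` call also sees a 60 s timeout; after `wrap_by_copy(project, 60.0)`, the cached client keeps its original value.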
Automated Code Review
Reviewers: 4 | Confidence: 89%
✓ Correctness
The PR correctly adds a `timeout` parameter through the FoundryAgent, RawFoundryAgent, _FoundryAgentChatClient, RawFoundryAgentChatClient, and RawOpenAIChatClient constructors, threading it down to `load_openai_service_settings`, where it is applied to all three client construction paths (AsyncOpenAI for OpenAI, AsyncOpenAI for the Azure /openai/v1 endpoint, AsyncAzureOpenAI for standard Azure). The Foundry agent correctly mutates the client's timeout attribute after obtaining it from the factory. Control flow, attribute mutation ordering, and MRO propagation are all correct. No correctness bugs found.
✓ Security Reliability
The PR cleanly threads a `timeout` parameter through all OpenAI/Azure client construction paths. The timeout is applied correctly: as a constructor argument for newly created clients in `load_openai_service_settings`, and via direct attribute mutation for Foundry's `project_client.get_openai_client()` path. No input validation issues, injection risks, or resource leaks were found. The silent no-op when both `async_client` and `timeout` are provided to `RawOpenAIChatClient` is tested and intentional (the user is responsible for configuring their own pre-built client).
✓ Test Coverage
The PR adds comprehensive test coverage for the timeout parameter in the foundry package (verifying that the timeout is or is not applied to the client). However, the openai package has a significant test coverage gap: `load_openai_service_settings` has 3 newly added code paths that thread `timeout` into the `AsyncOpenAI()`/`AsyncAzureOpenAI()` constructors, but none of these paths have unit tests. The only openai-package test (`test_raw_openai_chat_client_accepts_preconfigured_client_with_timeout`) exercises the pre-configured client path, where the timeout is intentionally ignored, and asserts only `client is not None`. A test verifying that the timeout is actually applied when `RawOpenAIChatClient` constructs its own client (the primary fix path) would strengthen confidence in the openai-layer implementation.
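A test for the primary fix path could patch the client constructor and inspect the keyword arguments it receives. The sketch below is illustrative only: `ChatClientSketch`, its `client_factory` class attribute, and `SdkClientStub` are hypothetical, whereas the real code constructs `AsyncOpenAI`/`AsyncAzureOpenAI` inside `load_openai_service_settings`.

```python
from typing import Any, Dict, Optional
from unittest import mock


class SdkClientStub:
    """Hypothetical stand-in for an SDK client constructor."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class ChatClientSketch:
    """Hypothetical wrapper that builds its own client unless one is supplied."""

    client_factory = SdkClientStub  # class attribute so a test can patch it

    def __init__(
        self,
        *,
        api_key: str,
        async_client: Any = None,
        timeout: Optional[float] = None,
    ) -> None:
        if async_client is not None:
            # Pre-built client: the caller owns its configuration.
            self.client = async_client
            return
        args: Dict[str, Any] = {"api_key": api_key}
        if timeout is not None:
            args["timeout"] = timeout
        self.client = self.client_factory(**args)


def check_timeout_reaches_constructor() -> None:
    with mock.patch.object(ChatClientSketch, "client_factory") as factory:
        chat = ChatClientSketch(api_key="test-key", timeout=60.0)
    # The timeout must arrive at the constructor, not merely be accepted.
    factory.assert_called_once_with(api_key="test-key", timeout=60.0)
    assert chat.client is factory.return_value


def check_prebuilt_client_is_used_as_is() -> None:
    prebuilt = SdkClientStub(timeout=5.0)
    with mock.patch.object(ChatClientSketch, "client_factory") as factory:
        chat = ChatClientSketch(api_key="test-key", async_client=prebuilt, timeout=60.0)
    factory.assert_not_called()
    assert chat.client is prebuilt


if __name__ == "__main__":
    check_timeout_reaches_constructor()
    check_prebuilt_client_is_used_as_is()
    print("ok")
```

Asserting on the constructor's call arguments, rather than only on `client is not None`, is what distinguishes "timeout applied" from "timeout accepted and dropped".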
✗ Design Approach
The timeout plumbing is mostly consistent, but one design gap remains in the Foundry wrapper path: after adding `timeout` as part of `RawFoundryAgentChatClient`'s public configuration, converting that client with `as_agent()` still recreates a `FoundryAgent` without preserving the timeout, which silently falls back to the SDK default on that path.
Flagged Issues
- `RawFoundryAgentChatClient.as_agent()` drops the newly added `timeout` setting. The method is documented to "reuse this client's Foundry configuration" but rebuilds `FoundryAgent` with only `project_client`, `agent_name`, and `agent_version` (lines 298-315). A caller using `RawFoundryAgentChatClient(..., timeout=60).as_agent()` will silently lose the timeout and fall back to the SDK default.
Automated review by moonbox3's agents
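One way to close the `as_agent()` gap flagged in this review is to keep the raw `timeout` value on the chat client and forward it when the agent is rebuilt. The sketch below uses hypothetical stand-ins (`ChatClientSketch`, `AgentSketch`) rather than the real `RawFoundryAgentChatClient` and `FoundryAgent`, and the thread does not show how, or whether, the PR resolved this.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AgentSketch:
    """Hypothetical stand-in for FoundryAgent's constructor arguments."""

    project_client: Any
    agent_name: str
    agent_version: Optional[str] = None
    timeout: Optional[float] = None


class ChatClientSketch:
    """Hypothetical stand-in for RawFoundryAgentChatClient."""

    def __init__(
        self,
        project_client: Any,
        agent_name: str,
        agent_version: Optional[str] = None,
        timeout: Optional[float] = None,
    ) -> None:
        self.project_client = project_client
        self.agent_name = agent_name
        self.agent_version = agent_version
        # Keep the raw setting so conversions can forward it later.
        self.timeout = timeout

    def as_agent(self) -> AgentSketch:
        # Forward every piece of configuration, including timeout; omitting it
        # would silently fall back to the SDK default on this path.
        return AgentSketch(
            project_client=self.project_client,
            agent_name=self.agent_name,
            agent_version=self.agent_version,
            timeout=self.timeout,
        )
```

Storing the raw value, rather than reading it back off the configured OpenAI client, keeps `None` distinguishable from "the SDK default" when the agent re-derives its own client.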
moonbox3 left a comment
Automated Code Review
Reviewers: 4 | Confidence: 91%
✓ Correctness
The PR correctly fixes a shared-client mutation bug by replacing `openai_client.timeout = timeout` with `openai_client = openai_client.with_options(timeout=timeout)` in `RawFoundryAgentChatClient.__init__`. The returned client is immediately reassigned and passed to `super().__init__`, so the timeout is applied correctly. Both `_FoundryAgentChatClient` and `FoundryAgent` funnel their `timeout` parameters through this single code path (lines 585 and 1003), so the fix covers all three public classes. Tests are consistent with the implementation. No correctness issues found.
✓ Security Reliability
The PR correctly fixes a shared-state mutation bug by replacing `openai_client.timeout = timeout` (direct attribute mutation of a shared client) with `openai_client = openai_client.with_options(timeout=timeout)` (a new client copy carrying the timeout). The change is surgical, confined to one line at _agent.py:268, and fully covered by the updated tests. No security or reliability issues were identified.
✓ Test Coverage
The production fix is correct: `openai_client.with_options(timeout=timeout)` is called and its return value is reassigned. All four timeout tests verify the call was made and the original mock was not mutated. However, none of them assert that the return value of `with_options` is actually stored in the constructed instance's `.client` attribute. A future regression that calls `with_options` but discards the return value would pass all tests while silently ignoring the timeout.
✓ Design Approach
I did not find a design-approach issue in this change. The new `with_options(timeout=...)` call in `python/packages/foundry/agent_framework_foundry/_agent.py` scopes the timeout to the wrapper-specific OpenAI client instance without mutating the shared client returned by `project_client.get_openai_client(...)`, and that matches the new tests' stated invariant. I also did not find conflicting lifecycle or ownership behavior in the surrounding Foundry wrapper code that would make this approach unsafe.
…ent (microsoft#6241)
The four timeout propagation tests verified that `with_options` was called but did not confirm that the returned (timeout-configured) client was actually stored on the instance. A silent discard of the return value would have left the tests green while the timeout had no effect.
Each test now captures the constructed instance and asserts:
`assert <instance>.client is openai_client_mock.with_options.return_value`
Affected tests:
- test_raw_foundry_agent_chat_client_init_applies_timeout_to_openai_client
- test_raw_foundry_agent_chat_client_init_applies_timeout_with_preview_enabled
- test_foundry_agent_chat_client_init_propagates_timeout
- test_foundry_agent_init_propagates_timeout_to_openai_client
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
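The strengthened assertion from this commit can be reproduced with `unittest.mock.MagicMock`. In the sketch below, `FoundryClientSketch` and `BuggyFoundryClientSketch` are hypothetical stand-ins for the wrapper's `__init__` logic; the second calls `with_options()` but discards the result, which is the regression the commit message describes.

```python
from typing import Any, Optional
from unittest.mock import MagicMock


class FoundryClientSketch:
    """Hypothetical stand-in: keeps the client returned by with_options()."""

    def __init__(self, openai_client: Any, timeout: Optional[float] = None) -> None:
        if timeout is not None:
            openai_client = openai_client.with_options(timeout=timeout)
        self.client = openai_client


class BuggyFoundryClientSketch:
    """Hypothetical regression: calls with_options() but throws the result away."""

    def __init__(self, openai_client: Any, timeout: Optional[float] = None) -> None:
        if timeout is not None:
            openai_client.with_options(timeout=timeout)
        self.client = openai_client


def timeout_is_applied(cls: type) -> bool:
    openai_client_mock = MagicMock()
    instance = cls(openai_client_mock, timeout=60.0)
    # The weaker check: both classes pass this one.
    openai_client_mock.with_options.assert_called_once_with(timeout=60.0)
    # The stronger check: the configured copy must be what the instance keeps.
    return instance.client is openai_client_mock.with_options.return_value
```

`timeout_is_applied(FoundryClientSketch)` returns `True` while `timeout_is_applied(BuggyFoundryClientSketch)` returns `False`, even though both satisfy the original `assert_called_once_with` check.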
moonbox3 left a comment
Automated Code Review
Reviewers: 4 | Confidence: 93% | Result: All clear
Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach
Motivation and Context
On Azure AI endpoints, idle connections between conversation turns can be recycled by the network, causing the next request to re-establish a TCP connection. The OpenAI SDK's default connect timeout (5 s) is too short for Azure AI Foundry endpoints under load, and users had no way to override it. The result was `httpx.ConnectTimeout` → `openai.APITimeoutError` on the second and every subsequent `agent.run()` call.
Fixes #6241
Description
The root cause is that `FoundryAgent` (and the underlying `RawFoundryAgentChatClient`) created the `AsyncOpenAI`/`AsyncAzureOpenAI` client without exposing any timeout knob, so callers were stuck with the SDK's hardcoded 5 s connect timeout. The fix adds an optional `timeout: float | None` parameter to `FoundryAgent`, `RawFoundryAgent`, `_FoundryAgentChatClient`, `RawFoundryAgentChatClient`, and `RawOpenAIChatClient`, threading it down to `load_openai_service_settings`, where it is forwarded as the `timeout` argument when constructing the underlying async OpenAI client. When `timeout=None` (the default), existing behavior is preserved. Tests cover that a non-`None` timeout is applied to the client, that `None` leaves the client's default intact, and that the parameter is present (not absorbed by `**kwargs`) on all public constructors.
Contribution Checklist