Python: Add timeout parameter to FoundryAgent to fix ConnectTimeout on multi-turn conversations#6263

Open
moonbox3 wants to merge 5 commits into microsoft:main from moonbox3:agent/fix-6241-1

@moonbox3
Contributor

@moonbox3 moonbox3 commented Jun 2, 2026

Motivation and Context

On Azure AI endpoints, idle connections between conversation turns can be recycled by the network, causing the next request to re-establish a TCP connection. The OpenAI SDK's default connect timeout (5 s) is too short for Azure AI Foundry endpoints under load, and users have no way to override it. The result is httpx.ConnectTimeout → openai.APITimeoutError on every second (and subsequent) agent.run() call.

Fixes #6241

Description

The root cause is that FoundryAgent (and the underlying RawFoundryAgentChatClient) created the AsyncOpenAI / AsyncAzureOpenAI client without exposing any timeout knob, so callers were stuck with the SDK's hardcoded 5 s connect timeout. The fix adds an optional timeout: float | None parameter to FoundryAgent, RawFoundryAgent, _FoundryAgentChatClient, RawFoundryAgentChatClient, and RawOpenAIChatClient, threading it down to load_openai_service_settings where it is forwarded as the timeout argument when constructing the underlying async OpenAI client. When timeout=None (the default), existing behavior is preserved. Tests cover that a non-None timeout is applied to the client, that None leaves the client's default intact, and that the parameter is present (not absorbed by **kwargs) on all public constructors.
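
A minimal sketch of the forwarding rule described above, using only the standard library. `build_client_args` and its parameters are hypothetical names chosen for illustration and are not the actual body of `load_openai_service_settings`. The sketch only shows the idea that `timeout` is added to the constructor arguments when it is not `None`, so the client's own default applies otherwise.

```python
from typing import Any, Dict, Optional


def build_client_args(
    api_key: str,
    base_url: Optional[str] = None,
    timeout: Optional[float] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for an async OpenAI-style client (hypothetical helper)."""
    client_args: Dict[str, Any] = {"api_key": api_key}
    if base_url is not None:
        client_args["base_url"] = base_url
    # Only forward timeout when the caller set it; omitting the key lets the
    # client fall back to its own default.
    if timeout is not None:
        client_args["timeout"] = timeout
    return client_args


default_args = build_client_args("sk-example")
tuned_args = build_client_args("sk-example", timeout=60.0)
```

Leaving the key out, rather than passing `timeout=None` through, is one way to keep the default path unchanged for callers who never set the parameter.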

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by an agent

Copilot and others added 3 commits June 2, 2026 09:40
…icrosoft#6241)

Expose a `timeout` parameter on `RawFoundryAgentChatClient`,
`_FoundryAgentChatClient`, `RawFoundryAgent`, `FoundryAgent`, and
`RawOpenAIChatClient` so callers can override the HTTP timeout used by
the underlying AsyncOpenAI client.

Root cause: `RawFoundryAgentChatClient.__init__` called
`project_client.get_openai_client()` without configuring any timeout,
inheriting the OpenAI SDK default of `httpx.Timeout(connect=5.0)`.
When connections are recycled between turns under load, the 5 s connect
timeout fires and surfaces as `openai.APITimeoutError`.

Fix:
- `load_openai_service_settings` (`_shared.py`): accept `timeout` and
  include it in `client_args` for all three `AsyncOpenAI`/
  `AsyncAzureOpenAI` construction paths.
- `RawOpenAIChatClient.__init__` (`_chat_client.py`): accept `timeout`
  and forward to `load_openai_service_settings`.
- `RawFoundryAgentChatClient.__init__` (`_agent.py`): accept `timeout`
  and set `openai_client.timeout = timeout` on the client returned by
  `get_openai_client()` before passing it to the base class.
- `_FoundryAgentChatClient`, `RawFoundryAgent`, `FoundryAgent`: accept
  and propagate `timeout` through the construction chain.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Expose a timeout parameter on RawFoundryAgentChatClient,
_FoundryAgentChatClient, RawFoundryAgent, FoundryAgent, and
RawOpenAIChatClient. When provided, the value is applied to the
underlying AsyncOpenAI client so that connect timeouts under load
or after connection recycling can be tuned by callers.

Previously, get_openai_client() was called without any timeout
override, so the SDK default of httpx.Timeout(connect=5.0) was
inherited and could fire on multi-turn conversations where the
underlying connection is recycled between turns.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 2, 2026 10:08
@moonbox3 moonbox3 added the python label Jun 2, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds an optional timeout: float | None parameter through FoundryAgent → _FoundryAgentChatClient → RawFoundryAgentChatClient → RawOpenAIChatClient → load_openai_service_settings, so callers can override the OpenAI SDK's 5 s default connect timeout that was causing ConnectTimeout on multi-turn Foundry conversations (issue #6241). Also bumps some azure-ai-agentserver-* dependencies and applies minor formatting touch-ups in unrelated test/source files.

Changes:

  • Thread a timeout parameter through the OpenAI/Foundry client constructors and into the underlying AsyncOpenAI/AsyncAzureOpenAI client.
  • Add unit tests asserting the parameter is exposed (not absorbed by **kwargs) and that the timeout is/isn't applied based on None vs non-None.
  • Refresh uv.lock (notably azure-ai-agentserver-core 2.0.0b3→2.0.0b5, azure-ai-agentserver-responses 1.0.0b5→1.0.0b7, new microsoft-opentelemetry) plus minor formatting tweaks in bedrock and foundry_hosting tests.
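
The "exposed, not absorbed by **kwargs" check in the second bullet can be sketched with `inspect.signature`. `ExampleClient` and `has_explicit_parameter` are hypothetical stand-ins for illustration; the PR's actual tests may be written differently.

```python
import inspect
from typing import Any, Optional


class ExampleClient:
    """Hypothetical stand-in for a client class with an explicit timeout parameter."""

    def __init__(self, model: str, *, timeout: Optional[float] = None, **kwargs: Any) -> None:
        self.model = model
        self.timeout = timeout
        self.extra = kwargs


def has_explicit_parameter(cls: type, name: str) -> bool:
    """Return True if `name` is a named parameter of cls.__init__, not the **kwargs catch-all."""
    params = inspect.signature(cls.__init__).parameters
    param = params.get(name)
    return param is not None and param.kind is not inspect.Parameter.VAR_KEYWORD
```

A signature-based check like this fails if someone later removes the named parameter and relies on `**kwargs` to accept it silently.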

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| python/packages/openai/agent_framework_openai/_shared.py | Accept timeout and forward to AsyncOpenAI / AsyncAzureOpenAI constructors across all routing branches. |
| python/packages/openai/agent_framework_openai/_chat_client.py | Add timeout kwarg + docstrings to all RawOpenAIChatClient.__init__ overloads; pass into load_openai_service_settings. |
| python/packages/openai/tests/openai/test_openai_chat_client.py | Tests that timeout is an explicit parameter and accepted when a preconfigured client is supplied. |
| python/packages/foundry/agent_framework_foundry/_agent.py | Add timeout to RawFoundryAgentChatClient, _FoundryAgentChatClient, RawFoundryAgent, FoundryAgent; mutate openai_client.timeout post-get_openai_client. |
| python/packages/foundry/tests/foundry/test_foundry_agent.py | Coverage for timeout propagation and the None no-op across all four entry points. |
| python/packages/bedrock/agent_framework_bedrock/_chat_client.py | Formatting collapse of a few multi-line literals (no behavior change). |
| python/packages/bedrock/tests/test_bedrock_structured_output.py | Cosmetic blank line. |
| python/packages/foundry_hosting/tests/test_responses.py | Collapse two list comprehensions onto single lines (cosmetic). |
| python/uv.lock | Bump azure-ai-agentserver-core/responses and pull in microsoft-opentelemetry + opentelemetry-instrumentation-httpx/openai*/util-genai. |

Comment thread python/packages/foundry/agent_framework_foundry/_agent.py
Contributor Author

@moonbox3 moonbox3 left a comment

Automated Code Review

Reviewers: 4 | Confidence: 74% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach


Automated review by automated agents

@github-actions

github-actions Bot commented Jun 2, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| packages/bedrock/agent_framework_bedrock | | | | |
| _chat_client.py | 445 | 98 | 77% | 304–305, 321–330, 336, 404, 413, 424, 426, 428, 433, 452–453, 477, 490, 502, 505, 513–514, 517–518, 520–521, 526–528, 530, 540–541, 563, 570, 579–580, 582–583, 585–587, 589, 591–592, 598–600, 603–604, 610–613, 619–629, 632, 651, 656, 701–702, 715, 741, 753, 758, 786, 790–791, 794, 812, 836, 848, 852, 866, 874–875, 879, 881–888 |
| packages/foundry/agent_framework_foundry | | | | |
| _agent.py | 242 | 56 | 76% | 119, 122, 244–245, 249–251, 256–259, 352, 425–426, 438–439, 451–453, 455–456, 458–464, 466–467, 469, 471, 477–479, 482–491, 495–496, 694–695, 698, 724, 734, 750, 820, 825, 829 |
| packages/openai/agent_framework_openai | | | | |
| _chat_client.py | 1084 | 148 | 86% | 276, 289, 639–643, 651–654, 660–664, 714–721, 723–725, 732–734, 780, 788, 811, 929, 1028, 1087, 1089, 1091, 1093, 1159, 1173, 1253, 1263, 1268, 1311, 1422–1423, 1438, 1647, 1652, 1656–1658, 1662–1663, 1746, 1756, 1783, 1789, 1799, 1805, 1810, 1816, 1821–1822, 1841, 1844–1847, 1861, 1863, 1871–1872, 1884, 1926, 2016, 2038–2039, 2054–2055, 2073–2074, 2117, 2283, 2321–2322, 2340, 2420–2428, 2458, 2568, 2603, 2618, 2638–2648, 2661, 2672–2676, 2690, 2704–2715, 2724, 2756–2759, 2769–2770, 2781–2783, 2797–2799, 2809–2810, 2816, 2831 |
| _shared.py | 156 | 16 | 89% | 223, 242–244, 256, 266, 278, 284, 306, 310, 344–345, 364, 383–384, 386 |
| TOTAL | 37790 | 4423 | 88% | |

Python Unit Test Overview

Tests Skipped Failures Errors Time
7516 34 💤 0 ❌ 0 🔥 1m 56s ⏱️

… timeout (microsoft#6241)

Replace direct assignment `openai_client.timeout = timeout` with
`openai_client = openai_client.with_options(timeout=timeout)` in
RawFoundryAgentChatClient.__init__.

The Azure AI Projects SDK caches and returns a shared AsyncOpenAI client
per AIProjectClient. Mutating its .timeout attribute leaked the override
to all other code paths sharing that client (other agents, user code).
with_options() returns a new client instance with the override applied,
leaving the original shared client untouched.

Update tests to assert with_options is called with the correct timeout
and that the original shared client's timeout attribute is not mutated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
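
The difference between mutating a shared client and taking a scoped copy, as described in the commit message above, can be illustrated with a small stand-in. `SharedClient` is a hypothetical class for illustration only; it mimics the copy-on-override behavior of `with_options()` and is not the Azure AI Projects or OpenAI SDK client.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class SharedClient:
    """Hypothetical stand-in for a cached client object with a timeout attribute."""

    timeout: float = 5.0

    def with_options(self, *, timeout: Optional[float] = None) -> "SharedClient":
        # Return a copy with the override applied; the original is left unchanged.
        return replace(self, timeout=self.timeout if timeout is None else timeout)


cached = SharedClient()

# Direct assignment changes the one object that every holder of `cached` sees.
same_object = cached
same_object.timeout = 60.0
leaked_timeout = cached.timeout  # the override is visible through the shared reference

# Reset, then use the copy-based approach instead.
cached.timeout = 5.0
scoped = cached.with_options(timeout=60.0)
```

In the copy-based variant, `scoped` carries the 60 s override while `cached` keeps its original value, which is the invariant the updated tests assert.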

@github-actions github-actions Bot left a comment

Automated Code Review

Reviewers: 4 | Confidence: 89%

✓ Correctness

The PR correctly adds a timeout parameter through the FoundryAgent, RawFoundryAgent, _FoundryAgentChatClient, RawFoundryAgentChatClient, and RawOpenAIChatClient constructors, threading it down to load_openai_service_settings where it is applied to all three client construction paths (AsyncOpenAI for OpenAI, AsyncOpenAI for Azure /openai/v1 endpoint, AsyncAzureOpenAI for standard Azure). The Foundry agent correctly mutates the client's timeout attribute after obtaining it from the factory. Control flow, attribute mutation ordering, and MRO propagation are all correct. No correctness bugs found.

✓ Security Reliability

The PR cleanly threads a timeout parameter through all OpenAI/Azure client construction paths. The timeout is applied correctly: as a constructor argument for newly-created clients in load_openai_service_settings, and via direct attribute mutation for Foundry's project_client.get_openai_client() path. No input validation issues, injection risks, or resource leaks were found. The silent no-op when both async_client and timeout are provided to RawOpenAIChatClient is tested and intentional (user is responsible for configuring their own pre-built client).

✓ Test Coverage

The PR adds comprehensive test coverage for the timeout parameter in the foundry package (verifying timeout is applied/not-applied to the client). However, the openai package has a significant test coverage gap: load_openai_service_settings has 3 newly-added code paths that thread timeout into AsyncOpenAI() / AsyncAzureOpenAI() constructors, but none of these paths have unit tests. The only openai-package test (test_raw_openai_chat_client_accepts_preconfigured_client_with_timeout) exercises the pre-configured client path where timeout is intentionally ignored, and asserts only client is not None. A test verifying timeout is actually applied when RawOpenAIChatClient constructs its own client (the primary fix path) would strengthen confidence in the openai-layer implementation.

✗ Design Approach

The timeout plumbing is mostly consistent, but one design gap remains in the Foundry wrapper path: after adding timeout as part of RawFoundryAgentChatClient's public configuration, converting that client with as_agent() still recreates a FoundryAgent without preserving the timeout, which silently falls back to the SDK default on that path.

Flagged Issues

  • RawFoundryAgentChatClient.as_agent() drops the newly added timeout setting. The method is documented to "reuse this client's Foundry configuration" but rebuilds FoundryAgent with only project_client, agent_name, and agent_version (lines 298-315). A caller using RawFoundryAgentChatClient(..., timeout=60).as_agent() will silently lose the timeout and fall back to the SDK default.
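
One possible way to close the gap flagged above, sketched with hypothetical stand-in classes. `ExampleChatClient` and `ExampleAgent` are illustrative names, not the real `RawFoundryAgentChatClient` or `FoundryAgent`, and this is not necessarily the fix the PR adopts. The idea is simply to store the timeout on the client so the conversion method can forward it.

```python
from typing import Optional


class ExampleAgent:
    """Hypothetical stand-in for the agent built by as_agent()."""

    def __init__(self, agent_name: str, *, timeout: Optional[float] = None) -> None:
        self.agent_name = agent_name
        self.timeout = timeout


class ExampleChatClient:
    """Hypothetical stand-in for a chat client that remembers its timeout."""

    def __init__(self, agent_name: str, *, timeout: Optional[float] = None) -> None:
        self.agent_name = agent_name
        # Keep the value so later conversions can forward it.
        self._timeout = timeout

    def as_agent(self) -> ExampleAgent:
        # Forward the stored timeout along with the rest of the configuration.
        return ExampleAgent(self.agent_name, timeout=self._timeout)


agent = ExampleChatClient("support-bot", timeout=60.0).as_agent()
```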

Automated review by moonbox3's agents

Comment thread python/packages/openai/tests/openai/test_openai_chat_client.py
Contributor Author

@moonbox3 moonbox3 left a comment

Automated Code Review

Reviewers: 4 | Confidence: 91%

✓ Correctness

The PR correctly fixes a shared-client mutation bug by replacing openai_client.timeout = timeout with openai_client = openai_client.with_options(timeout=timeout) in RawFoundryAgentChatClient.__init__. The returned client is immediately reassigned and passed to super().__init__, so the timeout is applied correctly. Both _FoundryAgentChatClient and FoundryAgent funnel their timeout parameters through this single code path (lines 585 and 1003), so the fix covers all three public classes. Tests are consistent with the implementation. No correctness issues found.

✓ Security Reliability

The PR correctly fixes a shared-state mutation bug by replacing openai_client.timeout = timeout (direct attribute mutation of a shared client) with openai_client = openai_client.with_options(timeout=timeout) (immutable copy with the new timeout). The change is surgical, confined to one line at _agent.py:268, and fully covered by the updated tests. No security or reliability issues were identified.

✓ Test Coverage

The production fix is correct — openai_client.with_options(timeout=timeout) is called and its return value is reassigned. All four timeout tests verify the call was made and the original mock was not mutated. However, none of them assert that the return value of with_options is actually stored in the constructed instance's .client attribute. A future regression that calls with_options but discards the return value would pass all tests while silently ignoring the timeout.

✓ Design Approach

I did not find a design-approach issue in this change. The new with_options(timeout=...) call in python/packages/foundry/agent_framework_foundry/_agent.py scopes timeout to the wrapper-specific OpenAI client instance without mutating the shared client returned by project_client.get_openai_client(...), and that matches the new tests' stated invariant. I also did not find conflicting lifecycle or ownership behavior in the surrounding Foundry wrapper code that would make this approach unsafe.


Automated review by automated agents

Comment thread python/packages/foundry/tests/foundry/test_foundry_agent.py
Comment thread python/packages/foundry/tests/foundry/test_foundry_agent.py
Comment thread python/packages/foundry/tests/foundry/test_foundry_agent.py
Comment thread python/packages/foundry/tests/foundry/test_foundry_agent.py
@moonbox3 moonbox3 enabled auto-merge June 2, 2026 10:38
…ent (microsoft#6241)

The four timeout propagation tests verified that with_options was called
but did not confirm that the returned (timeout-configured) client was
actually stored on the instance. A silent discard of the return value
would have left the tests green while the timeout had no effect.

Each test now captures the constructed instance and asserts:
  assert <instance>.client is openai_client_mock.with_options.return_value

Affected tests:
- test_raw_foundry_agent_chat_client_init_applies_timeout_to_openai_client
- test_raw_foundry_agent_chat_client_init_applies_timeout_with_preview_enabled
- test_foundry_agent_chat_client_init_propagates_timeout
- test_foundry_agent_init_propagates_timeout_to_openai_client

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
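
The identity assertion described in this commit can be sketched with `unittest.mock`. `ExampleWrapper` and `check_timeout_is_stored` are hypothetical stand-ins, not the real Foundry classes or tests; the sketch only shows why checking `.client` against `with_options.return_value` catches a discarded return value.

```python
from typing import Any, Optional
from unittest.mock import MagicMock


class ExampleWrapper:
    """Hypothetical stand-in for a wrapper that scopes a timeout to its own client."""

    def __init__(self, openai_client: Any, *, timeout: Optional[float] = None) -> None:
        if timeout is not None:
            # Reassign so the timeout-configured copy is the one that gets stored.
            openai_client = openai_client.with_options(timeout=timeout)
        self.client = openai_client


def check_timeout_is_stored() -> bool:
    """Build a wrapper around a mock and verify the configured copy was kept."""
    openai_client_mock = MagicMock()
    wrapper = ExampleWrapper(openai_client_mock, timeout=30.0)
    openai_client_mock.with_options.assert_called_once_with(timeout=30.0)
    # Identity check: False if the return value of with_options was discarded.
    return wrapper.client is openai_client_mock.with_options.return_value
```

If the reassignment inside `__init__` were dropped, the call assertion would still pass but the identity check would return `False`.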
Contributor Author

@moonbox3 moonbox3 left a comment

Automated Code Review

Reviewers: 4 | Confidence: 93% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach


Automated review by automated agents

@moonbox3 moonbox3 requested review from giles17 and semenshi June 2, 2026 11:04
Development

Successfully merging this pull request may close these issues.

Python: [Bug]: FoundryAgent causing ConnectTimeout on multi-turn conversations

3 participants