perf(tracing): raise span-queue batch defaults and make batch_size env-tunable #395
Open
NiteshDhanpal wants to merge 1 commit into
…v-tunable

The async span queue batched at 50 spans / 100ms linger. For high-volume span ingest that means many small upsert_batch PUTs, each a separate HTTP round trip and a separate INSERT statement on the backend. Raise the defaults to 200 spans / 250ms so batches fill before flushing, amortizing the per-request and per-statement overhead (still well under the backend's 1000-row cap).

Also make batch_size resolvable from AGENTEX_SPAN_QUEUE_BATCH_SIZE, matching the existing env-override pattern for linger_ms / max_size / max_retries / concurrency (batch_size was the only queue knob not tunable without an SDK release). Resolution order: explicit arg > AGENTEX_SPAN_QUEUE_BATCH_SIZE env > default.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Tune the async span-queue's batching so high-volume span ingest sends fewer, fuller batches, and make the batch size configurable per-deploy.
Defaults raised (`span_queue.py`):

- `_DEFAULT_BATCH_SIZE`: 50 → 200
- `_DEFAULT_LINGER_MS`: 100 → 250

At 50 spans / 100ms, the queue ships many small `upsert_batch` PUTs, each a separate HTTP round trip and a separate `INSERT ... ON CONFLICT` statement on the backend, and (downstream) more, smaller ClickHouse parts to merge. Spans typically arrive a few ms apart, so a 100ms linger rarely fills a batch. Raising the limits to 200 / 250ms lets batches fill before flushing, amortizing per-request and per-statement overhead. 200 stays well under the backend's 1000-row batch cap.

`batch_size` is now env-tunable. It was the only queue knob without an `AGENTEX_SPAN_QUEUE_*` override; every other parameter (`linger_ms`, `max_size`, `max_retries`, `concurrency`) already reads one. Added `AGENTEX_SPAN_QUEUE_BATCH_SIZE` with the same `_read_int_env` pattern, so the batch size can be tuned per-deploy without an SDK release.

Resolution order (matches the other knobs): explicit constructor arg > `AGENTEX_SPAN_QUEUE_BATCH_SIZE` env > default, clamped to a minimum of 1.

Trade-offs
Larger batches + longer linger slightly increase worst-case in-memory dwell and the loss window if a producer crashes before a flush (bounded by linger + queue semantics). The values keep worst-case ingest latency sub-second.
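To make the batching trade-off concrete, here is a rough simulation of how many PUTs each setting produces. It is a sketch under stated assumptions, not the SDK's drain loop: `simulate_batches` is a hypothetical helper, the flush rule (ship when a batch holds `batch_size` spans or once `linger_ms` has passed since its first span) is a simplification, and retries, concurrency, and `max_size` are not modeled.

```python
from typing import Iterable, List


def simulate_batches(arrivals_ms: Iterable[int], batch_size: int, linger_ms: int) -> List[int]:
    """Return the size of each flushed batch under a simplified model.

    Assumed flush rule (hypothetical, not the SDK's actual drain loop): a batch is
    flushed as soon as it holds `batch_size` spans, or once `linger_ms` has elapsed
    since its first span. `arrivals_ms` must be sorted timestamps in milliseconds.
    """
    sizes: List[int] = []
    pending = 0
    first_ts = 0
    for ts in arrivals_ms:
        # Linger expired before this span arrived: ship what is pending.
        if pending and ts - first_ts >= linger_ms:
            sizes.append(pending)
            pending = 0
        if pending == 0:
            first_ts = ts  # this span opens a new batch
        pending += 1
        # Batch is full: ship immediately without waiting for the linger.
        if pending >= batch_size:
            sizes.append(pending)
            pending = 0
    if pending:
        sizes.append(pending)  # final partial batch
    return sizes


# Steady trickle: one span every 5 ms, 1,000 spans total.
trickle = range(0, 5000, 5)
# Burst: 1,000 spans already queued at t=0.
burst = [0] * 1000

for label, arrivals in (("trickle", trickle), ("burst", burst)):
    old = simulate_batches(arrivals, batch_size=50, linger_ms=100)
    new = simulate_batches(arrivals, batch_size=200, linger_ms=250)
    print(f"{label}: {len(old)} PUTs at 50/100ms vs {len(new)} PUTs at 200/250ms")
```

Under these assumptions, a 5 ms trickle of 1,000 spans needs 50 PUTs (batches of 20) at the old defaults and 20 PUTs (batches of 50) at the new ones, while a burst of 1,000 queued spans needs 20 versus 5. In the trickle case the linger, not `batch_size`, caps each batch, which is consistent with raising both knobs together.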
Tests
Added `TestAsyncSpanQueueBatchSizeConfig`, covering: default, explicit-arg override, min-1 clamp, env override, explicit-arg-beats-env, and invalid-env-falls-back-to-default. All 37 tests in `test_span_queue.py` pass; `ruff check` is clean.

Note
Independent of #394 (skip span-start upsert): that PR changes whether a write happens; this one changes how writes are batched. Branched off `main`.

🤖 Generated with Claude Code
Greptile Summary
This PR raises the default `batch_size` (50 → 200) and `linger_ms` (100 → 250) for the async span queue to reduce the number of small `upsert_batch` HTTP round trips under high span volume. It also adds `AGENTEX_SPAN_QUEUE_BATCH_SIZE`, giving `batch_size` the `AGENTEX_SPAN_QUEUE_*` env override that it was the only knob to lack, using the same resolution pattern (explicit arg > env > default, clamped to a minimum of 1) as every other parameter.

- `span_queue.py`: default constants raised, with detailed rationale comments; the `batch_size` constructor parameter changed from `int = _DEFAULT_BATCH_SIZE` to `int | None = None` to support env-driven resolution.
- `test_span_queue.py`: new `TestAsyncSpanQueueBatchSizeConfig` class covers default, explicit override, min-1 clamp, env override, explicit-beats-env, and invalid-env-fallback scenarios, all consistent with the patterns used for other queue knobs.

Confidence Score: 5/5
Safe to merge — the changes are limited to tuning constants and adding a missing env-var override that follows an established, well-tested pattern already used by every other queue knob.
Both changes are narrow and mechanical: the constant bumps are well-justified and bounded below the backend's documented 1000-row cap, and the new env-override path mirrors the identical pattern applied to linger_ms, max_retries, and concurrency. Six dedicated tests cover all resolution-order branches, and asyncio_mode=auto ensures they actually run.
No files require special attention.
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[AsyncSpanQueue.__init__ called] --> B{batch_size arg provided?}
    B -- "None (default)" --> C["Read AGENTEX_SPAN_QUEUE_BATCH_SIZE env"]
    C --> D{Env var set?}
    D -- "Yes, valid int" --> E["max(1, int(env)) → _batch_size"]
    D -- "Yes, invalid" --> F["Log warning → _DEFAULT_BATCH_SIZE (200)"]
    D -- "Not set" --> G["_DEFAULT_BATCH_SIZE (200)"]
    B -- "Explicit int" --> H["max(1, batch_size) → _batch_size"]
    E --> Z[_batch_size resolved]
    F --> Z
    G --> Z
    H --> Z
    Z --> I["Drain loop uses _batch_size as batch fill cap"]
```

Reviews (1): Last reviewed commit: "perf(tracing): raise span-queue batch de..."
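The resolution order in the flowchart can be sketched as a small standalone function. This is an illustration, not the SDK's code: `resolve_batch_size` and this `_read_int_env` are hypothetical stand-ins written to match the branches shown (invalid env values log a warning and fall back to the default of 200; every other path is clamped to at least 1).

```python
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)

# Default value taken from the PR; the helper names below are hypothetical.
_DEFAULT_BATCH_SIZE = 200
_BATCH_SIZE_ENV = "AGENTEX_SPAN_QUEUE_BATCH_SIZE"


def _read_int_env(name: str, default: int) -> int:
    """Hypothetical env reader: parse an int, or warn and return `default`."""
    raw = os.environ.get(name)
    if raw is None:
        return default  # env var not set
    try:
        return int(raw)
    except ValueError:
        logger.warning("Invalid %s=%r; using default %d", name, raw, default)
        return default


def resolve_batch_size(batch_size: Optional[int] = None) -> int:
    """Explicit arg > AGENTEX_SPAN_QUEUE_BATCH_SIZE env > default, clamped to >= 1."""
    if batch_size is None:
        batch_size = _read_int_env(_BATCH_SIZE_ENV, _DEFAULT_BATCH_SIZE)
    return max(1, batch_size)
```

With this shape, a deploy can export `AGENTEX_SPAN_QUEUE_BATCH_SIZE=500` to change the batch size without a code change, while any caller that passes `batch_size` explicitly still wins over the environment.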