perf(tracing): skip span-start upsert by default (end-only ingest)#394
Draft
NiteshDhanpal wants to merge 2 commits into
Draft
perf(tracing): skip span-start upsert by default (end-only ingest)#394NiteshDhanpal wants to merge 2 commits into
NiteshDhanpal wants to merge 2 commits into
Conversation
Tracing writes each span twice — once on start (no end_time) and once on end — so the start row is only ever overwritten by the end write moments later. Persisting it doubles span-ingest write volume and, on the SGP backend, costs a non-HOT UPDATE (tsvector/GIN recompute + index churn) plus a dead tuple per span. Skip the span-start upsert by default so each span is persisted once, on end (a single INSERT). Set AGENTEX_TRACING_SKIP_SPAN_START=0/false/no/off to restore the start write when in-flight or never-ending spans must be visible. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
This PR is targeting The
See |
The end-only skip is governed by AGENTEX_TRACING_SKIP_SPAN_START (default ON) but was silent — an operator could only infer it from the absence of start-export metrics. Emit a one-time INFO at processor init stating whether span-start upsert is enabled or skipped, so the deployment's tracing mode is visible in logs. Off the hot path (once per construction). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Move the "end-only span ingest" optimization (originally done on the EGP backend in scaleapi#145863) into the SDK producer, so the wasted span-start write is never sent on the wire in the first place.
Tracing writes each span twice — once on start (no
end_time) and once on end. The start row is only ever overwritten by the end write moments later, so persisting it:joined_data_tsvectorGIN index is recomputed whenoutputarrives on the end write) plus a dead tuple.This change makes
SGPSyncTracingProcessor/SGPAsyncTracingProcessorskip the span-start upsert, so each span is persisted once, on end (a singleINSERTserver-side). Doing it in the SDK also eliminates the wasted start HTTP call entirely — not just the DB write.Behavior
on_span_start(sync) andon_spans_start(async) become no-ops by default.AGENTEX_TRACING_SKIP_SPAN_START=0(false/no/offalso work) to restore the start write — e.g. if you need in-flight spans visible before they complete, or spans that never end (process crash) to still be persisted.on_span_end/on_spans_endare unchanged; the end write already carries the full span (start_time + end_time + input/output), so nothing is lost in the default path for spans that complete normally.Parsing mirrors the SDK's existing
AGENTEX_TRACING_METRICSconvention (raw not in ("0","false","no","off")).Trade-offs
on_span_end) are not persisted.Both are reversible per-deployment via the env var.
Tests
on_span_start/on_spans_startare no-ops by default; emit when skip is disabled;_skip_span_start_enabled()env parsing (default + falsy/other values).AGENTEX_TRACING_SKIP_SPAN_START=0so they continue to exercise the start path.test_sgp_tracing_processor.pypass;ruff checkclean.Note
Branched off
v0.11.8per request; rebased ontomain(0.12.0) for a clean PR — the processor file and its tests are byte-identical between the tag andmain, so the code change is exactly the same.🤖 Generated with Claude Code
Greptile Summary
This PR moves the "end-only span ingest" optimization into the SDK producer itself, so the wasted span-start HTTP write is never sent on the wire. Each span is now persisted exactly once — on end — carrying the full payload (start_time, end_time, input, output), eliminating the non-HOT UPDATE / dead-tuple cost on the SGP backend.
_skip_span_start_enabled()readsAGENTEX_TRACING_SKIP_SPAN_STARTon every call (mirroring the existingAGENTEX_TRACING_METRICSconvention), defaults to ON, and can be reverted per-deployment by setting the var to0/false/no/off.SGPSyncTracingProcessor.on_span_startandSGPAsyncTracingProcessor.on_spans_startgain an early-return guard;on_span_end/on_spans_endare untouched.AGENTEX_TRACING_SKIP_SPAN_START=0to keep exercising the start-write path.Confidence Score: 5/5
Safe to merge — the change is a simple early-return guard on the span-start path with no effect on span-end writes, and the opt-out env var is documented and tested.
The implementation is minimal: two early-return guards, one helper function reading an env var, and startup log lines. The end-write path (which carries the full span payload) is completely unchanged. Tests cover the default behavior, the opt-in restore path, and env-var parsing including case/whitespace variants. The existing cross-pod (end-without-start) and batching tests continue to pass. There are no data-loss risks for spans that complete normally, and the trade-offs for in-flight or crashed spans are documented and reversible.
No files require special attention.
Important Files Changed
_skip_span_start_enabled()env-var guard (default ON) that makeson_span_start/on_spans_startearly-return no-ops;on_span_end/on_spans_endare unchanged. Startup log added to both processor__init__s.monkeypatch.setenv("AGENTEX_TRACING_SKIP_SPAN_START", "0")on the 5 existing tests that exercise the start-write path. All 31 tests remain valid.Sequence Diagram
sequenceDiagram participant SDK as SDK Processor participant Env as os.environ participant SGP as SGP Backend Note over SDK: Span starts SDK->>Env: _skip_span_start_enabled()? alt AGENTEX_TRACING_SKIP_SPAN_START unset or truthy (default) Env-->>SDK: True — early return (no-op) Note over SGP: No HTTP write for span start else "AGENTEX_TRACING_SKIP_SPAN_START=0/false/no/off" Env-->>SDK: False — proceed SDK->>SGP: upsert span (no end_time) end Note over SDK: Span ends (always) SDK->>SGP: upsert span (start_time + end_time + input/output) Note over SGP: Single INSERT (clean row, no dead tuple)Reviews (2): Last reviewed commit: "observability: log resolved span-start m..." | Re-trigger Greptile