Skip to content

feat(buildrunner): add vendor-agnostic BuildRunner interface + RFC#175

Merged
behinddwalls merged 2 commits into
mainfrom
preetam/build-mgr-interface
Jun 3, 2026
Merged

feat(buildrunner): add vendor-agnostic BuildRunner interface + RFC#175
behinddwalls merged 2 commits into
mainfrom
preetam/build-mgr-interface

Conversation

@behinddwalls
Copy link
Copy Markdown
Collaborator

@behinddwalls behinddwalls commented Jun 2, 2026

Summary

Why?

The build stage needs a vendor-agnostic abstraction for talking to a
Build Runner — one that fits the existing extension family (storage,
queue, conflict) and that does not have to change when the first real
backend lands. Design rationale lives in doc/rfc/build-runner.md.

What?

Introduces the BuildRunner contract — three verbs, all keyed by an entity.BuildID: Trigger submits a build (ordered base + head change sets plus a free-form metadata map) and returns its ID; Status fetches the current BuildStatus and runner-defined metadata; Cancel requests cancellation. See extension/buildrunner/build_runner.go for the signatures.

Highlights:

  • Trigger takes ordered base (assumed-good prefix from dependency
    batches) and head (changes being verified) — a runner can cache or
    short-circuit a prefix it has validated before, and attribute terminal
    failures to base vs head via BuildMetadata.
  • Build identifiers are the typed entity.BuildID (a lightweight
    {ID string} value, also the queue payload envelope) rather than a
    bare string, so the runner contract and the on-queue messages share
    one identifier type and the compiler distinguishes a build ID from
    other string IDs.
  • metadata is a free-form map[string]string for caller-supplied
    attributes (requester, ticket ID, trace ID) the runner MAY persist or
    echo back via Status.
  • Trigger and Cancel return promptly; Status MAY be lengthy.
  • The BuildStatus enum is narrowed to Accepted / Running /
    Succeeded / Failed / Cancelled so every backend can map its
    native lifecycle without leaking runner-specific stages.
  • BuildMetadata is added as a free-form map for round-tripping
    runner-defined attributes (build URL, duration, SHA, etc.).

Naming: the interface is BuildRunner (not BuildManager — the
"Manager" suffix doesn't describe what it does, and "Build" alone would
collide cognitively with entity.Build). The package is
extension/buildrunner to follow the repo convention used by
mergechecker, changeprovider, pusher, counter, and storage.

Implementation (noop backend, queue PublishAfter primitive, controller
wiring, and the poll-driven buildsignal loop) lands in the follow-up
PRs stacked on top of this one.

Test Plan

  • make test — entity/build tests pass; mock-only consumers of the
    interface compile and pass.
  • make fmt lint check-tidy check-gazelle check-mocks — clean.
  • make build — all targets compile.

Stack

  1. @ feat(buildrunner): add vendor-agnostic BuildRunner interface + RFC #175
  2. feat(buildrunner/noop): add pass-through BuildRunner stub #179
  3. feat(queue): add Publisher.PublishAfter delay primitive #176
  4. feat(buildrunner): noop impl, poll-driven buildsignal, pipeline wiring #177

@behinddwalls behinddwalls force-pushed the preetam/build-mgr-interface branch 2 times, most recently from 975793b to 293db26 Compare June 2, 2026 17:47
@behinddwalls behinddwalls changed the title feat(build): add vendor-agnostic BuildManager interface + RFC feat(buildrunner): add vendor-agnostic BuildRunner interface + RFC Jun 2, 2026
@behinddwalls behinddwalls marked this pull request as ready for review June 2, 2026 17:58
@behinddwalls behinddwalls requested review from a team and sbalabanov as code owners June 2, 2026 17:58
@behinddwalls behinddwalls force-pushed the preetam/build-mgr-interface branch from 2b8dd80 to 293db26 Compare June 2, 2026 18:02
@behinddwalls behinddwalls force-pushed the preetam/build-mgr-interface branch from 293db26 to 9e9cf91 Compare June 2, 2026 18:09
Comment thread doc/rfc/build-runner.md Outdated
Comment thread doc/rfc/build-runner.md Outdated
Comment thread extension/buildrunner/README.md Outdated
@behinddwalls behinddwalls force-pushed the preetam/build-mgr-interface branch 2 times, most recently from dc4b4cb to b6f1414 Compare June 2, 2026 18:47
@behinddwalls behinddwalls requested a review from JamyDev June 2, 2026 18:47
@behinddwalls behinddwalls force-pushed the preetam/build-mgr-interface branch from b6f1414 to 518ef64 Compare June 2, 2026 18:57
@albertywu
Copy link
Copy Markdown
Contributor

albertywu commented Jun 2, 2026

When would the orchestration engine call the synchronous build Status API vs use the buildsignal topic to handle reading build results asynchronously?

edit: nvm i see your 4th PR here: #177

@behinddwalls behinddwalls force-pushed the preetam/build-mgr-interface branch 3 times, most recently from 7d6bf16 to f612697 Compare June 2, 2026 22:53
## Summary

### Why?

The build stage needs a vendor-agnostic abstraction for talking to a
Build Runner — one that fits the existing extension family (storage,
queue, conflict) and that does not have to change when the first real
backend lands. Design rationale lives in `doc/rfc/build-runner.md`.

### What?

Introduces the `BuildRunner` contract — three verbs, all keyed by an `entity.BuildID`: `Trigger` submits a build (ordered `base` + `head` change sets plus a free-form metadata map) and returns its ID; `Status` fetches the current `BuildStatus` and runner-defined metadata; `Cancel` requests cancellation. See `extension/buildrunner/build_runner.go` for the signatures.

Highlights:

- `Trigger` takes ordered `base` (assumed-good prefix from dependency
  batches) and `head` (changes being verified) — a runner can cache or
  short-circuit a prefix it has validated before, and attribute terminal
  failures to base vs head via `BuildMetadata`.
- Build identifiers are the typed `entity.BuildID` (a lightweight
  `{ID string}` value, also the queue payload envelope) rather than a
  bare `string`, so the runner contract and the on-queue messages share
  one identifier type and the compiler distinguishes a build ID from
  other string IDs.
- `metadata` is a free-form `map[string]string` for caller-supplied
  attributes (requester, ticket ID, trace ID) the runner MAY persist or
  echo back via `Status`.
- `Trigger` and `Cancel` return promptly; `Status` MAY be lengthy.
- The `BuildStatus` enum is narrowed to `Accepted` / `Running` /
  `Succeeded` / `Failed` / `Cancelled` so every backend can map its
  native lifecycle without leaking runner-specific stages.
- `BuildMetadata` is added as a free-form map for round-tripping
  runner-defined attributes (build URL, duration, SHA, etc.).

Naming: the interface is `BuildRunner` (not `BuildManager` — the
"Manager" suffix doesn't describe what it does, and "Build" alone would
collide cognitively with `entity.Build`). The package is
`extension/buildrunner` to follow the repo convention used by
`mergechecker`, `changeprovider`, `pusher`, `counter`, and `storage`.

Implementation (noop backend, queue `PublishAfter` primitive, controller
wiring, and the poll-driven `buildsignal` loop) lands in the follow-up
PRs stacked on top of this one.

## Test Plan

- ✅ `make test` — entity/build tests pass; mock-only consumers of the
  interface compile and pass.
- ✅ `make fmt lint check-tidy check-gazelle check-mocks` — clean.
- ✅ `make build` — all targets compile.
@behinddwalls behinddwalls force-pushed the preetam/build-mgr-interface branch from f612697 to 4f0d367 Compare June 2, 2026 23:08
@behinddwalls behinddwalls merged commit ab8796d into main Jun 3, 2026
13 checks passed
@behinddwalls behinddwalls deleted the preetam/build-mgr-interface branch June 3, 2026 02:00
behinddwalls added a commit that referenced this pull request Jun 3, 2026
## Summary
### Why?

Wiring tests for the orchestrator's build stage need a `BuildRunner`
implementation, but real backends (BuildKite, Jenkins, etc.) won't land
until later. A noop that immediately reports success is also useful as
the local default in `make local-start` and as a best-case baseline
where every build passes.

### What?

Adds `extension/buildrunner/noop` — a `BuildRunner` that does no real
work. Every `Trigger` returns a unique `noop-<n>` build ID with no
error; every `Status` returns `BuildStatusSucceeded` with no metadata;
`Cancel` is a no-op. The atomic counter keeps IDs unique under
concurrent use.

## Test Plan
- ✅ `make test` — 36 unit tests pass, including the new
  `extension/buildrunner/noop:noop_test` (Trigger uniqueness, Status,
  Cancel, interface satisfaction).
- ✅ `make fmt lint check-tidy check-gazelle check-mocks` — clean.
- ✅ `make build` — all targets compile.

## Issues


## Stack
1. #175
1. @ #179
1. #176
1. #177
behinddwalls added a commit that referenced this pull request Jun 3, 2026
## Summary
### Why?

The orchestrator's build poll loop needs to space `Status` calls out
without consuming `retry_count` / DLQ retry slots. `Delivery.Nack` does
schedule redelivery after a delay, but it overloads `retry_count` —
which the operator relies on to flag real failures. The fix is a
separate "postpone this work" primitive, distinct from "this delivery
failed, try again", so both signals stay meaningful.

### What?

Adds `Publisher.PublishAfter(ctx, topic, msg, delayMs)` to the queue
extension: a fresh message inserted into the topic, made visible to
subscribers only after `delayMs` from now. Distinct from
`Delivery.Nack(requeueAfterMillis)`:

- `Nack` is "this delivery failed, try again" — it bumps `retry_count`
  and eventually trips DLQ.
- `PublishAfter` is "postpone this work" — `retry_count` resets to 0,
  DLQ stays available for true failures.

SQL backing in `extension/queue/mysql`:

- New `visible_after BIGINT UNSIGNED NOT NULL DEFAULT 0` column on
  `queue_messages`. Default 0 means immediately visible — back-compat
  for any existing rows.
- `messageStore.InsertDelayed(ctx, topic, messages, visibleAfterMs)` is
  the underlying primitive. `Insert` is now a thin wrapper that passes
  `visibleAfterMs = 0`.
- `FetchByOffset` gains a `nowMs` parameter and an
  `AND visible_after <= ?` predicate so subscribers skip rows whose
  delivery is still deferred.
- `MoveToDLQ` writes `visible_after = 0` explicitly: any delay on the
  original message has already been consumed by the time it failed.

The first consumer of `PublishAfter` is the orchestrator's poll-driven
`buildsignal` loop, which lands in the stacked PR on top of this one.

## Test Plan
- ✅ `bazel test //extension/queue/...` — all pass, including new
  coverage for `PublishAfter` (positive / zero / negative delay,
  closed-publisher) and for `FetchByOffset` skipping deferred rows via
  the new `visible_after` predicate.
- ✅ `make fmt lint check-tidy check-gazelle check-mocks` — clean.
- ✅ `make build` — all targets compile.

## Issues


## Stack
1. #175
1. #179
1. @ #176
1. #177
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants