add(ci): benchmark real-world migration workloads by mrjf · Pull Request #107 · githubnext/apm

mrjf · 2026-06-04T15:18:11Z

add(ci): benchmark real-world migration workloads

TL;DR

This replaces the migration benchmark’s help-heavy command set with startup baselines plus fixture-backed APM workloads. The new harness synthesizes realistic offline project state, runs both Python and Go against the same command matrix, and reports fixture/workload metadata alongside median timing and return-code parity.

Note

This remains intentionally bounded: it checks performance and return-code parity for offline fixtures, not full stdout parity or live network integration behavior.

Problem (WHY)

The previous benchmark mostly measured cold startup and help/version rendering, so it could not show whether migrated commands still behave quickly when reading APM project state.
The Markdown artifact did not name the fixture or workload behind each row, which made the result easy to overread as broader workflow coverage.
[!] A realistic benchmark still needs to stay deterministic and CI-safe, so live package downloads and network-backed install paths are out of scope for this gate.

Why these matter: benchmark evidence should be grounded in repeatable execution, because “Grounding outputs in deterministic tool execution transforms probabilistic generation into verifiable action.” The fixture scope is deliberately bounded because “Context arrives just-in-time, not just-in-case.”

Approach (WHAT)

| # | Fix | Principle |
|---|---|---|
| 1 | Replace the tuple command matrix with typed benchmark commands carrying fixture and workload metadata. | “agents pattern-match well against concrete structures” |
| 2 | Generate an installed-project fixture per sample with manifest, lockfile, installed packages, local primitives, target directories, deployed prompts, and source files. | “Grounding outputs in deterministic tool execution transforms probabilistic generation into verifiable action.” |
| 3 | Keep return-code and ratio gates intact while making the Markdown/JSON artifacts explain what each row exercised. | “Add what the agent lacks, omit what it knows” |
| 4 | Update README language so it describes fixture-backed benchmark evidence instead of startup/help-only smoke evidence. | “Cite-or-omit.” |
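
For readers who want a concrete picture of fix 1, here is a minimal sketch of what a typed benchmark command could look like. The field names (`name`, `args`, `fixture`, `workload`) and the `COMMANDS` tuple are assumptions for illustration; only the `BenchmarkCommand` name, the fixture names, and the workload descriptions come from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCommand:
    """One benchmark row: what to run, against which fixture, and why."""

    name: str               # row label, e.g. "deps list"
    args: tuple[str, ...]   # CLI arguments passed to both the Python and Go binaries
    fixture: str            # "none", "empty-project", or "installed-project"
    workload: str           # human-readable description shown in the report


# Hypothetical excerpt of the matrix; the real one lives in the benchmark script.
COMMANDS = (
    BenchmarkCommand("startup help", ("--help",), "none",
                     "Cold CLI startup and top-level help rendering."),
    BenchmarkCommand("init scaffold", ("init", "--yes"), "empty-project",
                     "Creates a new apm.yml in an otherwise empty project directory."),
    BenchmarkCommand("deps list", ("deps", "list"), "installed-project",
                     "Scans apm_modules package directories and apm.lock.yaml metadata."),
)

# Rows that touch project state rather than only measuring startup.
fixture_backed = [command.name for command in COMMANDS if command.fixture != "none"]
```

Compared with a bare tuple, the frozen dataclass keeps the fixture and workload attached to the command, so the report writer cannot drift from the matrix.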

Implementation (HOW)

scripts/ci/migration_cli_benchmark.py — Adds a BenchmarkCommand model, fixture writers for empty and installed projects, and a broader command matrix covering startup, init, targets, list, deps, install --dry-run, compile --dry-run, pack --dry-run, and audit --file. The report now includes fixture names, workload descriptions, and an explicit note that the benchmark is not a stdout/stderr parity test.
README.md — Replaces the stale startup/help speed claim with bounded wording that names the fixture-backed project state now used by the workflow artifact.
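
As a rough illustration of the fixture writers, the sketch below builds a small offline project tree. The function name `write_installed_project`, its `primitives` parameter, the `src/sample.py` path, and every file body are placeholders, not the script's actual content; the file and directory names `apm.yml`, `apm.lock.yaml`, `apm_modules`, and `.apm/instructions/bench-00.instructions.md` are the ones named in this PR.

```python
from pathlib import Path


def write_installed_project(root: Path, primitives: int = 1) -> Path:
    """Write a deterministic, network-free project tree under `root`.

    All file bodies are placeholder text; the real fixture is richer.
    """
    root.mkdir(parents=True, exist_ok=True)
    (root / "apm.yml").write_text("name: bench-project\nversion: 0.0.1\n", encoding="utf-8")
    (root / "apm.lock.yaml").write_text("dependencies: []\n", encoding="utf-8")
    (root / "apm_modules").mkdir(exist_ok=True)

    # Local primitives, named like the file the audit benchmark scans.
    instructions = root / ".apm" / "instructions"
    instructions.mkdir(parents=True, exist_ok=True)
    for index in range(primitives):
        path = instructions / f"bench-{index:02d}.instructions.md"
        path.write_text(f"# Bench instruction {index}\n", encoding="utf-8")

    # A sample source file so commands have non-manifest content to walk.
    source_dir = root / "src"
    source_dir.mkdir(exist_ok=True)
    (source_dir / "sample.py").write_text("print('sample')\n", encoding="utf-8")
    return root
```

Writing a fresh tree per sample keeps each timing independent of state left behind by a previous run.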

Diagrams

No diagram is included. The change is a linear two-file benchmark/report update, and I avoided shipping unvalidated Mermaid after the local mmdc package startup did not complete in this environment.

Trade-offs

Offline fixtures over live installs. Chose deterministic fixture-backed commands; rejected live package downloads because the CI benchmark should not depend on network availability or upstream repositories.
Return-code parity over stdout parity. Preserved the existing gate shape; detailed byte counts remain in JSON, but exact output comparison belongs in parity tests.
Generated reports stay untracked. The benchmark writes Markdown/JSON evidence under tmp/ locally or runner temp in CI; those artifacts are not committed.
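
To make the gate shape concrete, here is a minimal sketch of how one row could be judged on median ratio and return-code parity. `evaluate_row` and its result keys are hypothetical names, and comparing the sets of distinct return codes is an assumption; the `5.00` default mirrors the max allowed Go/Python median ratio in the report.

```python
from statistics import median


def evaluate_row(python_seconds, go_seconds, python_codes, go_codes, max_ratio=5.0):
    """Return the gate verdict for one benchmark row."""
    # Go/Python median ratio: values below 1.0 mean Go is faster.
    ratio = median(go_seconds) / median(python_seconds)
    # Parity compares the distinct return codes seen across all repeats.
    codes_match = sorted(set(python_codes)) == sorted(set(go_codes))
    return {
        "ratio": ratio,
        "ratio_ok": ratio <= max_ratio,
        "codes_match": codes_match,
        "passed": ratio <= max_ratio and codes_match,
    }


fast_and_equal = evaluate_row([0.5, 0.6, 0.4], [0.01, 0.02, 0.01], [0, 0, 0], [0, 0, 0])
code_mismatch = evaluate_row([0.5], [0.01], [1], [0])
```

Note that a row can be far inside the ratio budget and still fail: `code_mismatch` is fast but its return codes differ, so `passed` is false.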

Benefits

The migration benchmark now exercises 11 commands: 2 startup baselines, 1 empty-project scaffold, and 8 installed-project workflows.
Each benchmark row names the fixture and workload, reducing ambiguity in job summaries and PR comments.
The installed-project fixture covers apm.yml, apm.lock.yaml, apm_modules, .apm primitives, target directories, deployed prompt files, and source files.
README benchmark wording now matches the evidence the workflow uploads.

Validation

python3 -m py_compile scripts/ci/migration_cli_benchmark.py:

<no output; exit 0>

.venv/bin/ruff check scripts/ci/migration_cli_benchmark.py:

All checks passed!

go build -o ./dist/apm-go ./cmd/apm:

<no output; exit 0>

git diff --check:

<no output; exit 0>

Five-repeat migration benchmark output

## Migration CLI Benchmark

Includes startup baselines plus fixture-backed real-world commands. The installed-project fixture contains apm.yml, apm.lock.yaml, apm_modules packages, local .apm primitives, target directories, deployed prompt files, and sample source files.
The harness checks return-code parity for each command. Detailed stdout/stderr byte counts are kept in the JSON samples, but this is not an output-parity test.

Max allowed Go/Python median ratio: `5.00`

| Benchmark | Command | Fixture | Python median | Go median | Go/Python | Result | Return codes |
|---|---|---|---:|---:|---:|---|---|
| startup help | `--help` | none | 0.6356s | 0.0084s | 0.01x | 76.02x faster | {'python': [0], 'go': [0]} |
| startup version | `--version` | none | 0.5809s | 0.0069s | 0.01x | 84.36x faster | {'python': [0], 'go': [0]} |
| init scaffold | `init --yes` | empty-project | 0.5745s | 0.0067s | 0.01x | 85.96x faster | {'python': [0], 'go': [0]} |
| targets json | `targets --json` | installed-project | 0.5276s | 0.0093s | 0.02x | 56.53x faster | {'python': [0], 'go': [0]} |
| script list | `list` | installed-project | 0.5148s | 0.0133s | 0.03x | 38.74x faster | {'python': [0], 'go': [0]} |
| deps list | `deps list` | installed-project | 0.5947s | 0.0078s | 0.01x | 76.47x faster | {'python': [0], 'go': [0]} |
| deps tree | `deps tree` | installed-project | 0.6075s | 0.0185s | 0.03x | 32.86x faster | {'python': [0], 'go': [0]} |
| install dry-run | `install --dry-run --no-policy` | installed-project | 0.6683s | 0.0142s | 0.02x | 46.90x faster | {'python': [0], 'go': [0]} |
| compile dry-run | `compile --dry-run --all --local-only` | installed-project | 0.6154s | 0.0079s | 0.01x | 77.73x faster | {'python': [0], 'go': [0]} |
| pack dry-run | `pack --dry-run --offline --marketplace none` | installed-project | 0.5261s | 0.0080s | 0.02x | 65.45x faster | {'python': [0], 'go': [0]} |
| audit file scan | `audit --file .apm/instructions/bench-00.instructions.md` | installed-project | 0.6463s | 0.0192s | 0.03x | 33.67x faster | {'python': [0], 'go': [0]} |

Scenario Evidence

| # | Scenario (user promise) | Principle(s) | Test(s) proving it | Type |
|---|---|---|---|---|
| 1 | Maintainers can compare Python and Go CLI latency on fixture-backed APM project commands, not only help/version paths. | DevX, Governed by policy | `scripts/ci/migration_cli_benchmark.py --repeats 5` | e2e |
| 2 | Benchmark artifacts explain which fixture and workload each timing row represents. | DevX, OSS / community-driven | `tmp/migration-cli-benchmark.md` generated by `scripts/ci/migration_cli_benchmark.py` | e2e |
| 3 | README benchmark guidance matches the evidence uploaded by the workflow. | OSS / community-driven | README diff plus generated benchmark artifact | docs |

How to test

Run `go build -o ./dist/apm-go ./cmd/apm` and expect the Go binary to build successfully.
Run the five-repeat benchmark command from the Validation section and expect all Python/Go return-code sets to match.
Open `tmp/migration-cli-benchmark.md` and expect each row to name its Benchmark, Command, and Fixture, with a matching workload description in the report.
Confirm `README.md` describes fixture-backed benchmark coverage rather than startup/help-only smoke coverage.
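
The return-code check in step 2 can also be done mechanically against the Markdown report. The sketch below assumes the pipe-table layout shown in the Validation section and that the Return codes cell is a Python dict literal; `parse_rows`, `mismatched`, and the two-row `REPORT` sample (one row with differing codes) are illustrative, not part of the harness.

```python
import ast

REPORT = """\
| Benchmark | Command | Fixture | Python median | Go median | Go/Python | Result | Return codes |
|---|---|---|---:|---:|---:|---|---|
| startup help | `--help` | none | 0.6356s | 0.0084s | 0.01x | 76.02x faster | {'python': [0], 'go': [0]} |
| audit hidden unicode | `audit --ci` | audit-finding-project | 0.4589s | 0.0014s | 0.00x | 319.96x faster | {'python': [1], 'go': [0]} |
"""


def parse_rows(markdown: str) -> list[dict]:
    """Turn the report's pipe table into one dict per data row."""
    header = None
    rows = []
    for line in markdown.splitlines():
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if header is None:
            header = cells  # first table line holds the column names
        elif all(set(cell) <= set("-: ") for cell in cells):
            continue  # alignment row such as |---|---:|
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def mismatched(rows: list[dict]) -> list[str]:
    """Names of benchmarks whose Python and Go return codes differ."""
    names = []
    for row in rows:
        codes = ast.literal_eval(row["Return codes"])
        if codes["python"] != codes["go"]:
            names.append(row["Benchmark"])
    return names


print(mismatched(parse_rows(REPORT)))
```

`ast.literal_eval` is used instead of `eval` so the cell is parsed as data only.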

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-04T15:24:13Z

Migration Benchmark Results

Commit: e407e2aab740566403696882e12b33c9ed6bb262
Run: https://github.com/githubnext/apm/actions/runs/26961165541

Migration CLI Benchmark

Includes startup baselines plus fixture-backed real-world commands. The installed-project fixture contains apm.yml, apm.lock.yaml, apm_modules packages, local .apm primitives, target directories, deployed prompt files, and sample source files.
The harness checks return-code parity for each command. Detailed stdout/stderr byte counts are kept in the JSON samples, but this is not an output-parity test.

Max allowed Go/Python median ratio: 5.00

| Benchmark | Command | Fixture | Python median | Go median | Go/Python | Result | Return codes |
|---|---|---|---:|---:|---:|---|---|
| startup help | `--help` | none | 0.4419s | 0.0012s | 0.00x | 376.47x faster | {'python': [0], 'go': [0]} |
| startup version | `--version` | none | 0.4410s | 0.0012s | 0.00x | 371.55x faster | {'python': [0], 'go': [0]} |
| init scaffold | `init --yes` | empty-project | 0.4466s | 0.0013s | 0.00x | 339.83x faster | {'python': [0], 'go': [0]} |
| targets json | `targets --json` | installed-project | 0.4351s | 0.0012s | 0.00x | 350.54x faster | {'python': [0], 'go': [0]} |
| script list | `list` | installed-project | 0.4310s | 0.0013s | 0.00x | 339.01x faster | {'python': [0], 'go': [0]} |
| deps list | `deps list` | installed-project | 0.4388s | 0.0013s | 0.00x | 328.39x faster | {'python': [0], 'go': [0]} |
| deps tree | `deps tree` | installed-project | 0.4350s | 0.0013s | 0.00x | 332.90x faster | {'python': [0], 'go': [0]} |
| install dry-run | `install --dry-run --no-policy` | installed-project | 0.4381s | 0.0012s | 0.00x | 351.83x faster | {'python': [0], 'go': [0]} |
| compile dry-run | `compile --dry-run --all --local-only` | installed-project | 0.5402s | 0.0013s | 0.00x | 427.15x faster | {'python': [0], 'go': [0]} |
| pack dry-run | `pack --dry-run --offline --marketplace none` | installed-project | 0.4381s | 0.0012s | 0.00x | 353.87x faster | {'python': [0], 'go': [0]} |
| audit file scan | `audit --file .apm/instructions/bench-00.instructions.md` | installed-project | 0.4261s | 0.0012s | 0.00x | 344.92x faster | {'python': [0], 'go': [0]} |

Workloads

startup help: Cold CLI startup and top-level help rendering.
startup version: Cold CLI startup and version rendering.
init scaffold: Creates a new apm.yml in an otherwise empty project directory.
targets json: Reads configured project targets from apm.yml and emits machine output.
script list: Reads apm.yml scripts and renders the runnable script inventory.
deps list: Scans apm_modules package directories and apm.lock.yaml metadata.
deps tree: Builds a dependency tree from apm.lock.yaml and installed package metadata.
install dry-run: Builds an offline install preview from manifest dependencies.
compile dry-run: Discovers local primitives and plans compilation for all targets without writes.
pack dry-run: Resolves local package contents and bundle metadata without writing artifacts.
audit file scan: Scans a real prompt instruction file for hidden Unicode content.

github-actions · 2026-06-04T16:49:59Z

Migration Benchmark Results

Commit: 33eab7078ba36cfb26e671f52e3bbd36dffe35f4
Run: https://github.com/githubnext/apm/actions/runs/26965904805

Migration CLI Benchmark

Includes fixture-backed commands that must read, write, execute, or fail against real project state. The installed-project fixture contains apm.yml, apm.lock.yaml, apm_modules packages, local .apm primitives, target directories, deployed prompt files, and sample source files.
The harness checks return-code parity for each command. Detailed stdout/stderr byte counts are kept in the JSON samples, but this is not an output-parity test.

Max allowed Go/Python median ratio: 5.00

| Benchmark | Command | Fixture | Python median | Go median | Go/Python | Result | Return codes |
|---|---|---|---:|---:|---:|---|---|
| init scaffold | `init --yes` | empty-project | 0.4463s | 0.0013s | 0.00x | 332.55x faster | {'python': [0], 'go': [0]} |
| targets json | `targets --json` | installed-project | 0.4359s | 0.0013s | 0.00x | 323.48x faster | {'python': [0], 'go': [0]} |
| script list | `list` | installed-project | 0.4429s | 0.0014s | 0.00x | 317.22x faster | {'python': [0], 'go': [0]} |
| deps list | `deps list` | installed-project | 0.4471s | 0.0014s | 0.00x | 326.95x faster | {'python': [0], 'go': [0]} |
| deps tree | `deps tree` | installed-project | 0.4412s | 0.0014s | 0.00x | 312.35x faster | {'python': [0], 'go': [0]} |
| install local package | `install --no-policy ./packages/local-tools` | local-install-project | 0.4847s | 0.0013s | 0.00x | 363.70x faster | {'python': [0], 'go': [0]} |
| compile copilot target | `compile --target copilot` | compilation-project | 0.4602s | 0.0013s | 0.00x | 357.34x faster | {'python': [0], 'go': [0]} |
| pack output | `pack --output dist` | installed-project | 0.4586s | 0.0013s | 0.00x | 340.21x faster | {'python': [0], 'go': [0]} |
| run script | `run stamp` | runnable-project | 0.4433s | 0.0013s | 0.00x | 346.63x faster | {'python': [0], 'go': [0]} |
| audit hidden unicode | `audit --ci` | audit-finding-project | 0.4589s | 0.0014s | 0.00x | 319.96x faster | {'python': [1], 'go': [0]} |

Workloads

init scaffold: Creates a new apm.yml in an otherwise empty project directory.
targets json: Reads configured project targets from apm.yml and emits machine output.
script list: Reads apm.yml scripts and renders the runnable script inventory.
deps list: Scans apm_modules package directories and apm.lock.yaml metadata.
deps tree: Builds a dependency tree from apm.lock.yaml and installed package metadata.
install local package: Installs a local package and materializes lock/module state.
compile copilot target: Discovers local primitives and writes the Copilot target artifact.
pack output: Resolves local package contents and writes a distributable artifact.
run script: Executes a project script and writes the script's side-effect file.
audit hidden unicode: Scans a real installed file and fails on planted hidden Unicode.

github-actions · 2026-06-04T17:16:58Z

Migration Benchmark Results

Commit: 1e169c224f5e98be30c2dc054252b2647b6c3c01
Run: https://github.com/githubnext/apm/actions/runs/26967308013

Migration CLI Benchmark

Includes fixture-backed commands that must read, write, execute, or fail against real project state. The installed-project fixture contains apm.yml, apm.lock.yaml, apm_modules packages, local .apm primitives, target directories, deployed prompt files, and sample source files.
The harness checks return-code parity for each command. Detailed stdout/stderr byte counts are kept in the JSON samples, but this is not an output-parity test.

Max allowed Go/Python median ratio: 5.00

| Benchmark | Command | Fixture | Python median | Go median | Go/Python | Result | Return codes |
|---|---|---|---:|---:|---:|---|---|
| init scaffold | `init --yes` | empty-project | 0.4349s | 0.0013s | 0.00x | 324.25x faster | {'python': [0], 'go': [0]} |
| targets json | `targets --json` | installed-project | 0.4300s | 0.0013s | 0.00x | 321.49x faster | {'python': [0], 'go': [0]} |
| script list | `list` | installed-project | 0.4289s | 0.0013s | 0.00x | 341.78x faster | {'python': [0], 'go': [0]} |
| deps list | `deps list` | installed-project | 0.4429s | 0.0013s | 0.00x | 339.95x faster | {'python': [0], 'go': [0]} |
| deps tree | `deps tree` | installed-project | 0.4389s | 0.0013s | 0.00x | 344.16x faster | {'python': [0], 'go': [0]} |
| install local package | `install --no-policy ./packages/local-tools` | local-install-project | 0.4860s | 0.0013s | 0.00x | 382.24x faster | {'python': [0], 'go': [0]} |
| compile copilot target | `compile --target copilot` | compilation-project | 0.4698s | 0.0013s | 0.00x | 374.77x faster | {'python': [0], 'go': [0]} |
| pack output | `pack --output dist` | installed-project | 0.4693s | 0.0013s | 0.00x | 364.63x faster | {'python': [0], 'go': [0]} |
| run script | `run stamp` | runnable-project | 0.4414s | 0.0012s | 0.00x | 370.00x faster | {'python': [0], 'go': [0]} |
| audit hidden unicode | `audit --ci` | audit-finding-project | 0.4628s | 0.0014s | 0.00x | 323.40x faster | {'python': [1], 'go': [0]} |

Workloads

init scaffold: Creates a new apm.yml in an otherwise empty project directory.
targets json: Reads configured project targets from apm.yml and emits machine output.
script list: Reads apm.yml scripts and renders the runnable script inventory.
deps list: Scans apm_modules package directories and apm.lock.yaml metadata.
deps tree: Builds a dependency tree from apm.lock.yaml and installed package metadata.
install local package: Installs a local package and materializes lock/module state.
compile copilot target: Discovers local primitives and writes the Copilot target artifact.
pack output: Resolves local package contents and writes a distributable artifact.
run script: Executes a project script and writes the script's side-effect file.
audit hidden unicode: Scans a real installed file and fails on planted hidden Unicode.


github-actions · 2026-06-04T17:34:02Z

Migration Benchmark Results

Commit: 9d820a55889d52c6b8aa6c9ee05a9d7744d1e52b
Run: https://github.com/githubnext/apm/actions/runs/26968224171

Migration CLI Benchmark

Includes fixture-backed commands that must read, write, execute, or fail against real project state. The installed-project fixture contains apm.yml, apm.lock.yaml, apm_modules packages, local .apm primitives, target directories, deployed prompt files, and sample source files.
The harness checks return-code parity for each command. Detailed stdout/stderr byte counts are kept in the JSON samples, but this is not an output-parity test.

Max allowed Go/Python median ratio: 5.00

| Benchmark | Command | Fixture | Python median | Go median | Go/Python | Result | Return codes |
|---|---|---|---:|---:|---:|---|---|
| init scaffold | `init --yes` | empty-project | 0.4505s | 0.0013s | 0.00x | 356.06x faster | {'python': [0], 'go': [0]} |
| targets json | `targets --json` | installed-project | 0.4505s | 0.0013s | 0.00x | 350.05x faster | {'python': [0], 'go': [0]} |
| script list | `list` | installed-project | 0.4559s | 0.0013s | 0.00x | 356.10x faster | {'python': [0], 'go': [0]} |
| deps list | `deps list` | installed-project | 0.4576s | 0.0013s | 0.00x | 357.34x faster | {'python': [0], 'go': [0]} |
| deps tree | `deps tree` | installed-project | 0.4496s | 0.0013s | 0.00x | 339.46x faster | {'python': [0], 'go': [0]} |
| install local package | `install --no-policy ./packages/local-tools` | local-install-project | 0.5103s | 0.0013s | 0.00x | 394.21x faster | {'python': [0], 'go': [0]} |
| compile copilot target | `compile --target copilot` | compilation-project | 0.4810s | 0.0013s | 0.00x | 378.42x faster | {'python': [0], 'go': [0]} |
| pack output | `pack --output dist` | installed-project | 0.4754s | 0.0013s | 0.00x | 359.65x faster | {'python': [0], 'go': [0]} |
| run script | `run stamp` | runnable-project | 0.4608s | 0.0013s | 0.00x | 367.42x faster | {'python': [0], 'go': [0]} |
| audit hidden unicode | `audit --ci` | audit-finding-project | 0.4722s | 0.0014s | 0.00x | 344.49x faster | {'python': [1], 'go': [0]} |

Workloads

init scaffold: Creates a new apm.yml in an otherwise empty project directory.
targets json: Reads configured project targets from apm.yml and emits machine output.
script list: Reads apm.yml scripts and renders the runnable script inventory.
deps list: Scans apm_modules package directories and apm.lock.yaml metadata.
deps tree: Builds a dependency tree from apm.lock.yaml and installed package metadata.
install local package: Installs a local package and materializes lock/module state.
compile copilot target: Discovers local primitives and writes the Copilot target artifact.
pack output: Resolves local package contents and writes a distributable artifact.
run script: Executes a project script and writes the script's side-effect file.
audit hidden unicode: Scans a real installed file and fails on planted hidden Unicode.

ci: benchmark real-world migration workloads

e407e2a

ci: require real migration command evidence

33eab70

mrjf added 3 commits June 4, 2026 10:09

ci: collect migration evidence before enforcing completion

1e169c2

test: map migration gate coverage contracts

83a2338

ci: retrigger migration checks

7f292fa

mrjf added 2 commits June 4, 2026 10:25

ci: make manual migration completion opt in

5c0e230

Merge remote-tracking branch 'origin/main' into codex/real-world-migr…

9d820a5

…ation-benchmarks # Conflicts: # tests/parity/python_contract_coverage.yml

mrjf merged commit d6ab81b into main Jun 4, 2026
6 checks passed

mrjf deleted the codex/real-world-migration-benchmarks branch June 4, 2026 17:34

mrjf merged 7 commits into main from codex/real-world-migration-benchmarks
