add(ci): benchmark real-world migration workloads #107
Migration Benchmark Results
Migration CLI Benchmark
Includes startup baselines plus fixture-backed real-world commands. The installed-project fixture contains apm.yml, apm.lock.yaml, apm_modules packages, local .apm primitives, target directories, deployed prompt files, and sample source files. Max allowed Go/Python median ratio:
Workloads
Migration Benchmark Results
Migration CLI Benchmark
Includes fixture-backed commands that must read, write, execute, or fail against real project state. The installed-project fixture contains apm.yml, apm.lock.yaml, apm_modules packages, local .apm primitives, target directories, deployed prompt files, and sample source files. Max allowed Go/Python median ratio:
Workloads
…ation-benchmarks # Conflicts: # tests/parity/python_contract_coverage.yml
add(ci): benchmark real-world migration workloads
TL;DR
This replaces the migration benchmark’s help-heavy command set with startup baselines plus fixture-backed APM workloads. The new harness synthesizes realistic offline project state, runs both Python and Go against the same command matrix, and reports fixture/workload metadata alongside median timing and return-code parity.
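As a rough illustration of the idea above, the sketch below times two CLIs against the same command and reports the median ratio and return-code parity. It is not the code in scripts/ci/migration_cli_benchmark.py: the BenchmarkCommand field names, the time_command and compare helpers, and the shape of the result row are all assumptions made for this example.

```python
import statistics
import subprocess
import time
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BenchmarkCommand:
    """One row of a command matrix. Field names are hypothetical."""

    name: str
    args: Tuple[str, ...]
    fixture: str
    workload: str


def time_command(
    cli: List[str],
    command: BenchmarkCommand,
    repeats: int,
    cwd: Optional[str] = None,
) -> Tuple[float, int]:
    """Run the command `repeats` times; return (median seconds, last return code)."""
    durations = []
    returncode = 0
    for _ in range(repeats):
        start = time.perf_counter()
        completed = subprocess.run(
            [*cli, *command.args],
            cwd=cwd,
            capture_output=True,  # output is captured but not compared
            check=False,  # a failing command is a valid workload
        )
        durations.append(time.perf_counter() - start)
        returncode = completed.returncode
    return statistics.median(durations), returncode


def compare(
    python_cli: List[str],
    go_cli: List[str],
    command: BenchmarkCommand,
    repeats: int = 5,
    cwd: Optional[str] = None,
) -> dict:
    """Time both CLIs on the same command and summarize the result."""
    py_median, py_rc = time_command(python_cli, command, repeats, cwd)
    go_median, go_rc = time_command(go_cli, command, repeats, cwd)
    ratio = go_median / py_median if py_median > 0 else float("inf")
    return {
        "benchmark": command.name,
        "command": " ".join(command.args),
        "fixture": command.fixture,
        "workload": command.workload,
        "go_python_median_ratio": ratio,
        "returncode_parity": py_rc == go_rc,
    }
```

Using the median rather than the mean keeps a single slow run (for example a cold filesystem cache) from dominating the reported ratio. Comparing only return codes matches the bounded scope stated in this description.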
Note
This remains intentionally bounded: it checks performance and return-code parity for offline fixtures, not full stdout parity or live network integration behavior.
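To make "offline fixtures" concrete, here is a minimal sketch of how an installed-project fixture could be synthesized on disk. The function name, the directory names under apm_modules and .apm, the target directory, and all file contents are placeholders invented for this example. They do not reproduce the real APM schema or the fixture writers in this PR.

```python
from pathlib import Path


def write_installed_project_fixture(root: Path) -> Path:
    """Create a small offline project tree under `root` and return it.

    Every path and file body below is a placeholder, not the real APM layout.
    """
    files = {
        "apm.yml": "name: benchmark-fixture\n",
        "apm.lock.yaml": "# placeholder lockfile\n",
        "apm_modules/example-package/apm.yml": "name: example-package\n",
        ".apm/example-primitive.md": "# placeholder primitive\n",
        "deployed/example.prompt.md": "# placeholder deployed prompt\n",
        "src/sample.py": "print('sample source file')\n",
    }
    for relative_path, content in files.items():
        path = root / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")

    # An empty directory standing in for a deployment target.
    (root / "targets" / "example-target").mkdir(parents=True, exist_ok=True)
    return root
```

Writing the fixture into a fresh temporary directory for each run (for example with tempfile.TemporaryDirectory) keeps the generated state out of the repository and gives both CLIs the same starting point.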
Problem (WHY)
Why these matter: benchmark evidence should be grounded in repeatable execution, because “Grounding outputs in deterministic tool execution transforms probabilistic generation into verifiable action.” The fixture scope is deliberately bounded because “Context arrives just-in-time, not just-in-case.”
Approach (WHAT)
Implementation (HOW)
scripts/ci/migration_cli_benchmark.py: Adds a BenchmarkCommand model, fixture writers for empty and installed projects, and a broader command matrix covering startup, init, targets, list, deps, install --dry-run, compile --dry-run, pack --dry-run, and audit --file. The report now includes fixture names, workload descriptions, and an explicit note that the benchmark is not stdout/stderr parity.
README.md: Replaces the stale startup/help speed claim with bounded wording that names the fixture-backed project state now used by the workflow artifact.
Diagrams
No diagram is included. The change is a linear two-file benchmark/report update, and I avoided shipping unvalidated Mermaid after the local mmdc package startup did not complete in this environment.
Trade-offs
tmp/ locally or runner temp in CI; those artifacts are not committed.
Benefits
apm.yml, apm.lock.yaml, apm_modules, .apm primitives, target directories, deployed prompt files, and source files.
Validation
python3 -m py_compile scripts/ci/migration_cli_benchmark.py:
.venv/bin/ruff check scripts/ci/migration_cli_benchmark.py:
go build -o ./dist/apm-go ./cmd/apm:
git diff --check:
Five-repeat migration benchmark output
Scenario Evidence
scripts/ci/migration_cli_benchmark.py --repeats 5
tmp/migration-cli-benchmark.md generated by scripts/ci/migration_cli_benchmark.py
How to test
go build -o ./dist/apm-go ./cmd/apm and expect the Go binary to build successfully.
tmp/migration-cli-benchmark.md and expect each row to include Benchmark, Command, Fixture, and Workloads context.
README.md describes fixture-backed benchmark coverage rather than startup/help-only smoke coverage.
Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com