fix: daily ET guardrail step never fails activation job; improve conclusion reporting#36497
Conversation
…prove conclusion reporting
- Wrap all GitHub API calls in check_daily_effective_workflow_guardrail.cjs in a
top-level try-catch so transient API errors never fail the activation job step.
On error the step logs a warning and exits cleanly with the default
daily_effective_workflow_exceeded=false output, letting the agent run rather than
producing a confusing activation failure.
- Enhance daily_effective_workflow_exceeded.md with progressive disclosure:
three <details> sections covering how to raise the daily limit, what the
guardrail is, and how to disable it entirely.
- Add {daily_effective_workflow_exceeded_context} placeholder to
agent_failure_comment.md so repeat-failure comments include the ET context
(previously the context was only present in new-issue bodies).
- Update check_daily_effective_workflow_guardrail.test.cjs with a test asserting
main() resolves (does not throw) when GitHub API calls fail.
- Update handle_agent_failure_daily_effective_workflow.test.cjs to assert the
three progressive-disclosure sections are present in the rendered template.
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR hardens the daily per-workflow effective-token (ET) guardrail step so transient GitHub API failures no longer fail the activation job, and improves the user-facing messaging when the guardrail is exceeded (including ensuring repeat-failure comments include the same guidance).
Changes:
- Wrap the guardrail’s GitHub API interactions in a top-level
try/catchso the activation job step exits cleanly on API errors and preserves the safe default (daily_effective_workflow_exceeded=false). - Expand the “daily ET guardrail exceeded” markdown into actionable, progressive-disclosure guidance with concrete frontmatter and CLI commands.
- Add coverage for the new markdown structure and for the guardrail’s error-recovery behavior; include the guardrail context placeholder in repeat-failure comments.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/check_daily_effective_workflow_guardrail.cjs | Adds top-level error recovery around GitHub API calls and documents the bypass-on-error contract. |
| actions/setup/md/daily_effective_workflow_exceeded.md | Replaces the single-paragraph message with <details> sections explaining raise/disable/what-it-is. |
| actions/setup/md/agent_failure_comment.md | Adds {daily_effective_workflow_exceeded_context} so repeat-failure comments include ET guidance. |
| actions/setup/js/handle_agent_failure_daily_effective_workflow.test.cjs | Updates assertions to cover the new progressive-disclosure headings/content. |
| actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs | Adds a test ensuring main() resolves (doesn’t throw) when GitHub API calls fail. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 5/5 changed files
- Comments generated: 1
| const workflowID = process.env.GH_AW_WORKFLOW_ID || ""; | ||
| const workflowName = process.env.GH_AW_WORKFLOW_NAME || workflowID || "workflow"; | ||
| const runUrl = process.env.GH_AW_RUN_URL || currentRun.data.html_url || ""; | ||
| const actorLogin = process.env.GITHUB_TRIGGERING_ACTOR || currentRun.data.triggering_actor?.login || currentRun.data.actor?.login || process.env.GITHUB_ACTOR || ""; |
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
🧪 Test Quality Sentinel completed test quality analysis. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #36497 does not have the 'implementation' label (has_implementation_label=false) and has 0 new lines of code in business logic directories (requires_adr_by_default_volume=false, threshold=100). Neither enforcement condition is met. |
🧪 Test Quality Sentinel Report✅ Test Quality Score: 91/100 — Excellent
📊 Metrics & Test Classification (10 tests analyzed)
Test Classification Details
Language SupportTests analyzed:
Verdict
📖 Understanding Test ClassificationsDesign Tests (High Value) verify what the system does:
Implementation Tests (Low Value) verify how the system does it:
Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.
|
There was a problem hiding this comment.
✅ Test Quality Sentinel: 91/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 10 new tests verify behavioral contracts, including the critical error-resilience contract for main() that directly corresponds to the fix in this PR.
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /diagnose and /tdd — approving with minor suggestions on test structure and observability.
📋 Key Themes & Highlights
Key Themes
- Test assertions in
tryblock: The new regression test puts itsexpect()calls inside thetry, so if the first assertion throws, subsequent ones are skipped silently. Minor structural fix recommended. - Missing "exceeded=true" regression test: The big refactor (wrapping all of
main()in a single outertry/catch) isn't validated from the happy path — a future logic bug could silently preventdaily_effective_workflow_exceededfrom being set to"true"without a test catching it. - Permanent vs. transient bypass: The outer
catchcorrectly swallows all errors by design, but persistent configuration errors (bad token, missing permission) will bypass the guardrail on every run with only a job-log warning as signal.
Positive Highlights
- ✅ Root cause correctly identified: absence of outer try/catch was the real issue, not any individual API call
- ✅ The safe-bypass contract (fail →
false, not fail → job failure) is clearly documented in both the JSDoc and inline comments - ✅ The
agent_failure_comment.mdone-liner fix is exactly right — no overengineering - ✅
daily_effective_workflow_exceeded.mdupgrade is excellent: progressive disclosure, concrete YAML snippets, caution callout for the disable path - ✅ Template regression tests updated in lockstep with the template changes
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.7M
|
|
||
| try { | ||
| // Should resolve without throwing even though the API calls throw | ||
| await expect(exports.main()).resolves.toBeUndefined(); |
There was a problem hiding this comment.
[/tdd] Assertions inside the try block mean that if the first expect throws, later assertions are silently skipped, reducing test confidence.
💡 Suggested refactor
Move the await expect(...) call outside the try/finally (or keep the try/finally only for cleanup), so Jest sees all assertion failures independently:
let threw = false;
try {
await exports.main();
} catch (err) {
threw = true;
} finally {
delete global.core;
delete global.github;
delete global.context;
delete process.env.GH_AW_MAX_DAILY_EFFECTIVE_TOKENS;
delete process.env.GH_AW_GITHUB_TOKEN;
}
expect(threw).toBe(false);
expect(coreOutputs["daily_effective_workflow_exceeded"]).toBe("false");
expect(coreWarnings.some(w => /unexpected error.*skipped/i.test(w))).toBe(true);This way all three assertions are always evaluated regardless of which one fails.
| delete process.env.GH_AW_GITHUB_TOKEN; | ||
| } | ||
| }); | ||
| }); |
There was a problem hiding this comment.
[/tdd] The refactor wraps the entire success path in the outer try/catch too, but there's no regression test verifying that a normal execution where the threshold is exceeded still sets daily_effective_workflow_exceeded to "true". A logic bug inside the try block (e.g., an early return or swallowed error) could silently prevent the output from being set without any test catching it.
💡 What to add
Add a companion test that mocks a successful API response returning runs whose total ET exceeds the threshold, and asserts:
it("main() sets daily_effective_workflow_exceeded=true when threshold is exceeded", async () => {
// ... mock APIs returning one run with runEffectiveTokens > threshold ...
await exports.main();
expect(coreOutputs["daily_effective_workflow_exceeded"]).toBe("true");
});This guards against the outer catch accidentally consuming an exception that would have set the output.
| core.info(`Daily workflow ET guardrail not exceeded (${totalEffectiveTokens}/${threshold}).`); | ||
| return; | ||
| core.warning(`Daily workflow ET guardrail exceeded for ${workflowName}: ${totalEffectiveTokens}/${threshold}.`); | ||
| } catch (error) { |
There was a problem hiding this comment.
[/diagnose] The outer catch treats all errors — including permanent ones like missing permissions or misconfigured tokens — the same as transient network blips. A persistent configuration error will silently bypass the guardrail on every run, with only a core.warning in the job log as the signal.
💡 Suggestion
The current design is explicitly documented as intentional (safe bypass > broken activation), so this is not a blocker. However, consider adding a core.error() call alongside the warning for non-retryable error classes, or emitting a core.summary line so the bypass is visible in the Actions run summary rather than only in the raw log:
} catch (error) {
core.warning(`Daily workflow ET guardrail encountered an unexpected error and will be skipped: ${getErrorMessage(error)}`);
// Make the bypass visible in the run summary so operators notice persistent failures
await core.summary.addRaw(`> [!WARNING]\n> ET guardrail skipped due to error: ${getErrorMessage(error)}`).write().catch(() => {});
}This preserves the safe-bypass contract while making a stuck guardrail discoverable without digging into logs.
There was a problem hiding this comment.
The design is sound — wrapping GitHub API calls in a try-catch so that transient failures never block the activation job is the right call, and the comments explaining the "safe bypass" intent are clear. The inline comments flag the most actionable issues; no blocking correctness bugs found.
### Review themes
Test mock completeness (medium) — mockCore is missing error/debug/notice stubs. A missing stub causes a TypeError that gets silently caught by the very try-catch under test, producing a warning that matches the regex and making the test pass vacuously. See inline comment on line 133.
Global state cleanup (medium) — The finally block uses delete rather than save-and-restore, which is fragile if beforeEach ever starts populating these globals. See inline comment on line 172.
Observability gap (low) — daily_effective_workflow_exceeded=false is ambiguous: it could mean "under budget" or "evaluation errored and was skipped". A dedicated daily_effective_workflow_guardrail_error output would make cost audits and dashboards significantly more reliable. See inline comment on catch block.
Coverage gap (low) — There is no test for the case where an exception is thrown after core.setOutput("daily_effective_workflow_exceeded", "true") has already been set (i.e., appendDailyEffectiveWorkflowSummary throws on the exceeded path). The behavior is actually correct — exceeded stays "true" and the warning fires — but this scenario is worth an explicit test since it represents the most operationally significant partial-state.
🔎 Code quality review by PR Code Quality Reviewer · sonnet46 327.1K
| // The step should catch the error and NOT rethrow it, keeping daily_effective_workflow_exceeded at "false". | ||
| const coreOutputs = {}; | ||
| const coreWarnings = []; | ||
| const mockCore = { |
There was a problem hiding this comment.
Incomplete mockCore stub may cause vacuously-passing tests. mockCore is missing error, debug, and notice methods. If any production code path inside the try block calls one of them, the TypeError: core.xxx is not a function will be caught by the outer catch and produce a warning like "Daily workflow ET guardrail encountered an unexpected error and will be skipped: core.error is not a function" — which matches the regex /unexpected error.*skipped/i. The test then passes while hiding the actual regression.
💡 Suggested fix
Add no-op stubs for the missing methods:
const mockCore = {
setOutput: (key, value) => { coreOutputs[key] = value; },
info: () => {},
warning: msg => coreWarnings.push(msg),
error: () => {}, // prevent silent swallow of TypeError
debug: () => {},
notice: () => {},
};This ensures that any accidental call to these methods causes a type-safe no-op rather than a silently-caught TypeError that fools the regex assertion.
| // A warning must be emitted describing the error | ||
| expect(coreWarnings.some(w => /unexpected error.*skipped/i.test(w))).toBe(true); | ||
| } finally { | ||
| delete global.core; |
There was a problem hiding this comment.
Global state deleted rather than restored — fragile isolation. delete global.core permanently removes the property rather than restoring whatever was there before. If a future beforeEach or test-helper ever pre-populates these globals, this finally block will silently trash them for subsequent tests, causing confusing cross-test contamination.
💡 Suggested fix
Save and restore:
const savedCore = global.core;
const savedGithub = global.github;
const savedContext = global.context;
global.core = mockCore;
global.github = mockGithub;
global.context = mockContext;
try {
// assertions...
} finally {
if (savedCore === undefined) delete global.core; else global.core = savedCore;
if (savedGithub === undefined) delete global.github; else global.github = savedGithub;
if (savedContext === undefined) delete global.context; else global.context = savedContext;
delete process.env.GH_AW_MAX_DAILY_EFFECTIVE_TOKENS;
delete process.env.GH_AW_GITHUB_TOKEN;
}Alternatively, use afterEach for cleanup so Vitest handles the lifecycle correctly.
| // Treat any unexpected error as a non-blocking skip so the step never fails the | ||
| // activation job. The output stays at the default "false", allowing the agent to | ||
| // run. The guardrail is effectively bypassed for this invocation. | ||
| core.warning(`Daily workflow ET guardrail encountered an unexpected error and will be skipped: ${getErrorMessage(error)}`); |
There was a problem hiding this comment.
Silent bypass is indistinguishable from a clean pass in workflow outputs. When the catch fires, daily_effective_workflow_exceeded stays at "false" — the same value it has when the guardrail genuinely evaluated to "not exceeded". Any downstream observability (dashboards, audits) that reads this output cannot tell whether the guardrail passed or failed silently. The only signal is a transient ::warning:: in the step log that is easily missed.
💡 Suggested improvement
Consider adding a dedicated output that makes the skip explicit:
} catch (error) {
core.warning(`Daily workflow ET guardrail encountered an unexpected error and will be skipped: ${getErrorMessage(error)}`);
core.setOutput("daily_effective_workflow_guardrail_error", "true"); // new
}Downstream steps and dashboards can then distinguish three states:
exceeded=true→ guardrail fired, block the agentexceeded=false, guardrail_error=false→ evaluated cleanly, under budgetexceeded=false, guardrail_error=true→ evaluation failed, agent allowed as safe fallback
This is especially important for post-incident cost audits where "did the guardrail skip silently?" is a key question.
|
@copilot review comments and review comments. apply copilot-review skill |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Fixed in |
The
check_daily_effective_workflow_guardrailstep had no top-level error handling, so any transient GitHub API failure would throw an unhandled exception, fail the activation job, and leavedaily_effective_workflow_exceeded=false— letting the agent run against a broken guardrail. Separately,agent_failure_comment.mdwas missing{daily_effective_workflow_exceeded_context}, so repeat-failure comments had no ET guidance, and the exceeded template lacked actionable help.Changes
check_daily_effective_workflow_guardrail.cjs— wrap all GitHub API calls after the initial guard checks in atry/catch; on error, emit a warning and return withdaily_effective_workflow_exceeded=falseso the step always exits cleanly. Also documents the bypass-on-error contract in the JSDoc.daily_effective_workflow_exceeded.md— replace the single-paragraph template with three<details>progressive-disclosure sections: How to raise the daily limit, What the guardrail is, and How to disable it, each with concrete frontmatter snippets and CLI commands.agent_failure_comment.md— add the missing{daily_effective_workflow_exceeded_context}placeholder after{stale_lock_file_failed_context}so repeat-failure comments include ET guidance, not just newly created issue bodies.Tests — add a
main()error-recovery test asserting the step resolves without throwing when API calls fail; update template assertions to cover the three new<details>headings.