fuzzing: Improve testcase isolation by draining the IPC staging queue#10839
Open
tmleman wants to merge 2 commits into
Pull request overview
This PR improves libFuzzer testcase isolation for the Zephyr POSIX simulator fuzz harness by ensuring staged IPC input is drained between libFuzzer calls and by explicitly aborting leftover staged state when the simulator tick budget is exhausted, reducing non-reproducible crashes due to cross-testcase state leakage.
Changes:
- Added IPC-layer helpers to reset/observe/abort staged fuzz input state between testcases.
- Updated `LLVMFuzzerTestOneInput()` to run the simulator in small time quanta, exiting early once staged input is drained, and aborting pending state when the time budget is exhausted.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/platform/posix/ipc.c | Adds testcase-isolation helper APIs that reset/observe/abort the IPC staging buffer state used by the fuzz interrupt handler. |
| src/platform/posix/fuzz.c | Implements a bounded "drain-or-abort" loop around nsi_exec_for() and calls the new IPC-layer helpers to isolate testcases. |
Comment on lines +38 to +58
/*
 * Testcase-isolation helpers used by the libFuzzer entry point in
 * fuzz.c. They keep ownership of the cross-call state in one module
 * so a new testcase never observes leftovers from a previous one that
 * failed to drain inside the simulator tick budget.
 */
void posix_fuzz_case_begin(void)
{
	fuzz_in_sz = 0;
}

bool posix_fuzz_case_pending(void)
{
	return posix_fuzz_sz != 0 || fuzz_in_sz != 0;
}

void posix_fuzz_case_abort(void)
{
	posix_fuzz_sz = 0;
	fuzz_in_sz = 0;
}
The libFuzzer entry point in fuzz.c stages each testcase by writing
posix_fuzz_buf/sz and raising the fuzz IRQ; fuzz_isr() then drains
those bytes into the static fuzz_in[] queue and feeds them into the
IPC layer one message at a time. Two pieces of state therefore
survive across LLVMFuzzerTestOneInput() calls:
* `posix_fuzz_sz` - the raw input length still to consume,
* `fuzz_in[] / _sz` - the per-call staging queue.
The fuzzer harness has no way to inspect either of them today, which
makes it impossible to tell whether a previous testcase fully
drained before the next one begins. That is the root cause of the
"not reproducible" crashes documented in
FUZZER_ISOLATION_RESEARCH.md.
Introduce three small helpers, kept in the module that owns the
state, with no callers yet:
  posix_fuzz_case_begin()   - drop the staging queue at the start of
                              a new testcase,
  posix_fuzz_case_pending() - true while either buffer still has
                              bytes to deliver,
  posix_fuzz_case_abort()   - wipe both buffers (used when a case
                              exceeds the simulator tick budget).
A follow-up commit wires these into LLVMFuzzerTestOneInput(). This
commit only adds code: there are no callers and no behaviour change,
and the build still emits the same object code for the existing
entry points.
Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
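The semantics of the three helpers described in the commit message above can be illustrated with a small standalone sketch. The helper bodies mirror the diff; the `size_t` types, the `static` linkage of the two state variables, and the `mock_leave_leftovers()` and `demo_helper_semantics()` functions are assumptions made for illustration and are not part of the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Stand-ins for the two pieces of cross-call state. The names follow
 * the PR; the types and linkage are assumptions made for this sketch.
 */
static size_t posix_fuzz_sz; /* raw input bytes not yet consumed */
static size_t fuzz_in_sz;    /* bytes still held in the staging queue */

void posix_fuzz_case_begin(void)
{
	fuzz_in_sz = 0;
}

bool posix_fuzz_case_pending(void)
{
	return posix_fuzz_sz != 0 || fuzz_in_sz != 0;
}

void posix_fuzz_case_abort(void)
{
	posix_fuzz_sz = 0;
	fuzz_in_sz = 0;
}

/*
 * Hypothetical helper: emulate a testcase that ran out of budget with
 * `raw` unread input bytes and `staged` bytes still queued.
 */
void mock_leave_leftovers(size_t raw, size_t staged)
{
	posix_fuzz_sz = raw;
	fuzz_in_sz = staged;
}

/* Walk through the helpers and check each state transition. */
void demo_helper_semantics(void)
{
	mock_leave_leftovers(16, 4);
	assert(posix_fuzz_case_pending());

	/* begin() drops only the staging queue; raw input is still pending. */
	posix_fuzz_case_begin();
	assert(posix_fuzz_case_pending());

	/* abort() clears both counters, so nothing is pending afterwards. */
	posix_fuzz_case_abort();
	assert(!posix_fuzz_case_pending());
}
```

The point the sketch makes is that, as written in the diff, `posix_fuzz_case_begin()` resets only the staging queue, while `posix_fuzz_case_abort()` is the only helper that also clears the raw input length.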
The libFuzzer harness used to stage the testcase bytes, raise the
fuzz IRQ, and then unconditionally run the native_sim scheduler for
CONFIG_ZEPHYR_POSIX_FUZZ_TICKS ticks before returning. That has two
problems for reproducibility:
* If the OS finishes draining the IPC much faster than the tick
budget (the common case), we still burn the full budget, which
slows exec/s without buying any coverage.
* If the OS does NOT finish within the budget (deep handlers, long
pipeline walks, large payloads), the staged input buffer plus
the per-call fuzz_in[] queue carry over into the next testcase.
That leaks state across cases and is the root cause of crashes
that disappear when replayed individually.
Split the budget into POSIX_FUZZ_DRAIN_QUANTA (=8) quanta and, after
each one, ask the IPC layer whether anything is still pending. Return
as soon as the queue is empty; if input is still pending once the
whole budget is spent, run the abort hook to wipe both the raw fuzz
buffer and the staged IPC payload before the next call. Together
with the hooks added in the previous commit, this guarantees that
LLVMFuzzerTestOneInput() observes a clean staging state on entry
regardless of what the previous case did.
No protocol or coverage change is intended; the goal is reproducible
crashes and slightly higher throughput on short inputs.
Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
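The drain-or-abort loop described in the commit message above might look roughly like the following standalone sketch. It is not the code from fuzz.c: the simulator is replaced by a mock that drains a fixed number of bytes per quantum, and `mock_exec_for()` (standing in for `nsi_exec_for()`), `mock_quanta_used()`, `run_case()`, `demo_drain_or_abort()`, the microsecond budget, and the even split of the budget are all assumptions made for illustration. Only `POSIX_FUZZ_DRAIN_QUANTA` (=8) and the names of the pending/abort hooks come from the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POSIX_FUZZ_DRAIN_QUANTA 8 /* value quoted in the commit message */

/* Mock IPC/simulator state; everything prefixed mock_ is hypothetical. */
static size_t mock_pending_bytes;
static size_t mock_bytes_per_quantum;
static unsigned int mock_quanta_run;

/* Mock versions of the IPC-layer hooks, backed by the state above. */
bool posix_fuzz_case_pending(void)
{
	return mock_pending_bytes != 0;
}

void posix_fuzz_case_abort(void)
{
	mock_pending_bytes = 0;
}

/* Stand-in for nsi_exec_for(): "runs" one quantum and drains some bytes. */
void mock_exec_for(uint64_t us)
{
	size_t n = mock_bytes_per_quantum;

	(void)us;
	if (n > mock_pending_bytes)
		n = mock_pending_bytes;
	mock_pending_bytes -= n;
	mock_quanta_run++;
}

unsigned int mock_quanta_used(void)
{
	return mock_quanta_run;
}

/*
 * Run one mock testcase. Returns true if the input drained within the
 * budget, false if leftover state had to be aborted.
 */
bool run_case(size_t input_len, size_t drain_rate, uint64_t budget_us)
{
	uint64_t quantum_us = budget_us / POSIX_FUZZ_DRAIN_QUANTA;

	mock_pending_bytes = input_len;
	mock_bytes_per_quantum = drain_rate;
	mock_quanta_run = 0;

	for (int i = 0; i < POSIX_FUZZ_DRAIN_QUANTA; i++) {
		mock_exec_for(quantum_us);
		if (!posix_fuzz_case_pending())
			return true; /* drained early: skip the rest of the budget */
	}

	/* Budget exhausted with input still pending: wipe it. */
	posix_fuzz_case_abort();
	return false;
}

void demo_drain_or_abort(void)
{
	/* Short input: drains in 2 of the 8 quanta. */
	assert(run_case(10, 5, 800));
	assert(mock_quanta_used() == 2);

	/* Long input: uses all 8 quanta, then the abort hook cleans up. */
	assert(!run_case(100, 5, 800));
	assert(mock_quanta_used() == POSIX_FUZZ_DRAIN_QUANTA);
	assert(!posix_fuzz_case_pending());
}
```

In this mock, a short input returns after two quanta instead of spending the full budget, and an input that cannot drain in time still leaves no pending state behind, which are the two properties the commit message is after.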
Improve fuzz testcase isolation by draining the IPC staging queue between libFuzzer calls and aborting stale state when the tick budget is exhausted, fixing non-reproducible crashes caused by inter-testcase state leakage.