fix(workflows): validate run_id in RunState.load before touching the …#2813
Open
Quratulain-bilal wants to merge 1 commit into
Open
fix(workflows): validate run_id in RunState.load before touching the …#2813Quratulain-bilal wants to merge 1 commit into
Quratulain-bilal wants to merge 1 commit into
Conversation
…filesystem
``RunState.load(run_id, project_root)`` interpolates ``run_id`` directly
into ``project_root / ".specify" / "workflows" / "runs" / run_id`` and
then calls ``state_path.exists()`` and ``json.load`` on the result. The
run_id is reachable from user input via ``specify workflow resume
<run_id>`` (CLI argument) and via ``SPECKIT_WORKFLOW_RUN_ID`` (env var
override on the engine's run path), so a value like ``../escape``
turns ``runs_dir`` into ``.specify/workflows/escape/`` and:
* ``state_path.exists()`` becomes a file-existence oracle for any
path the process can read.
* if a ``state.json`` exists at the traversed location (planted by
a malicious dependency, a misconfigured shared workspace, or an
older spec-kit version that happened to write there),
``json.load`` parses it and the workflow resumes under the
attacker-chosen ``workflow_id`` / step state.
* a subsequent ``state.save()`` then writes back to the traversed
location, persisting the corruption.
``RunState.__init__`` already validates ``run_id`` against
``r'^[a-zA-Z0-9][a-zA-Z0-9_-]*$'`` — but that check runs on
``state_data["run_id"]`` *after* ``load`` has already done the file
lookup, which is too late to prevent the disclosure.
This change extracts the pattern into a class-level constant
``_RUN_ID_PATTERN`` and a single ``_validate_run_id`` classmethod so
``__init__`` and ``load`` cannot drift, then calls the validator at the
top of ``load`` before any path is built. Mirrors the precedent in
``src/specify_cli/agents.py::_ensure_within_directory`` (used at line
437 of that file) which guards extension-install paths against the
same threat model.
Regression tests parametrize 9 traversal vectors (``../escape``,
``..``, ``../../etc/passwd``, ``foo/bar``, ``foo\bar``, ``.hidden``,
``-flag``, ``foo\x00bar``, empty) and plant a malicious ``state.json``
outside ``runs/`` so a missing guard would surface as a successful
load rather than the ambiguous ``FileNotFoundError``. A second test
asserts ``__init__`` and ``load`` reject the same representative
malformed ID, so future changes to one path can't silently drift from
the other.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens workflow run-state loading against path traversal by validating run_id before it is interpolated into the .specify/workflows/runs/<run_id>/... filesystem path, aligning behavior with the existing RunState.__init__ validation.
Changes:
- Centralized
run_idvalidation intoRunState._RUN_ID_PATTERN+RunState._validate_run_id()and reused it from both__init__andload. - Added early validation in
RunState.load()to prevent probing/reading files outside the intended runs directory. - Added tests covering path traversal and validation consistency.
Show a summary per file
| File | Description |
|---|---|
src/specify_cli/workflows/engine.py |
Adds a shared run-id validator and calls it before building the load path to prevent traversal. |
tests/test_workflows.py |
Adds parameterized tests for malicious run_id values and a consistency test for validation behavior. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 1
Comment on lines
+2476
to
+2491
| def test_init_and_load_share_validation(self): | ||
| """``__init__`` and ``load`` reject the same set of malformed IDs. | ||
|
|
||
| The two entry points must stay in sync — drift would let an ID | ||
| slip in via one path that the other would reject, producing | ||
| confusing crashes mid-workflow. | ||
| """ | ||
| from specify_cli.workflows.engine import RunState | ||
|
|
||
| # Sample a representative malformed ID — exhaustive coverage is | ||
| # in ``test_load_rejects_path_traversal``. | ||
| bad = "../escape" | ||
| with pytest.raises(ValueError, match="Invalid run_id"): | ||
| RunState(run_id=bad) | ||
| with pytest.raises(ValueError, match="Invalid run_id"): | ||
| RunState._validate_run_id(bad) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
RunState.load(run_id, project_root)interpolates the caller-suppliedrun_iddirectly into a filesystem path beforevalidating it:
RunState.__init__already validatesrun_idagainstr'^[a-zA-Z0-9][a-zA-Z0-9_-]*$', but only on the storedstate_data["run_id"]after the file lookup has already happened. Therun_idargument passed toloadis reachablefrom user input via the
specify workflow resume <run_id>CLI argument, so a value like../escapeletsstate_path.exists()probe arbitrary paths andjson.loadparse a state file from outside.specify/workflows/runs/.Reproducer
Impact
state_path.exists()reveals whether arbitrary paths the process can read exist.state.jsonplanted anywhere reachable fromruns_dir / <traversal>is parsed and theworkflow resumes under attacker-chosen
workflow_id/ step state.state.save()writes back to the traversed location, persisting the corruption acrossresume cycles.
Reachable from:
specify workflow resume <run_id>(CLI argument from__init__.py:4239)The change
r'^[a-zA-Z0-9][a-zA-Z0-9_-]*$'regex into a class-level_RUN_ID_PATTERNand a single_validate_run_idclassmethod so__init__andloadcannot drift._validate_run_id(run_id)at the top ofload— before the path is built. A malicious value raisesValueErrorcleanly, which the CLI already handles at__init__.py:4243to surface a friendly"Error: Invalid run_id ..."message.__init__refactored to call the same helper, so there is exactly one definition of "validrun_id" in the codebase.Precedent
src/specify_cli/agents.py:435-438(_ensure_within_directory) already guards extension-install paths against the samethreat model with
os.path.normpath+Path.is_relative_to. This change bringsRunStatein line with that precedent.Tests
tests/test_workflows.py::TestRunState:test_load_rejects_path_traversal— parametrized over 9 attack vectors:../escape,..,../../etc/passwd,foo/bar,foo\bar,.hidden,-flag,foo\x00bar, and"". Each test plants astate.jsonoutside the legitimateruns/directory so a missing guard would surface as a successful load rather than the ambiguousFileNotFoundError.test_init_and_load_share_validation— asserts__init__,load, and the helper all reject the same representativemalformed ID so future changes can't silently drift.
Notes for maintainers
because the threat model is local (requires either CLI argument control or env-var control on the target machine, neither
of which is a network-reachable attack surface) — but I'll defer to your preferred process.
run_id. The exact charset accepted before this PR is the exact charset acceptedafter.
SPECKIT_WORKFLOW_RUN_ID(env-var override) via__init__'s call to the new helper, so theenv-var override path is now consistently validated too.