Add `instances` option to target specific fleet nodes #3925
fededagos wants to merge 4 commits into
Introduce an `instances` run profile option that pins a run to specific existing fleet instances (nodes). Each value matches an instance by its name (e.g. `my-fleet-0`) or by its hostname/IP address. When set, `filter_instances` keeps only matching instances and the job assignment phase never provisions new capacity to satisfy a node selector, terminating with a no-capacity error instead.
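Here is a minimal sketch of the matching rule, using a simplified instance record. `Instance`, `matches`, and `select_instances` are hypothetical names and do not reproduce dstack's actual `filter_instances` code. A selector matches when it equals the instance name, its cloud hostname, or its SSH host. `None` leaves the pool unrestricted, while any explicit list restricts it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    # Hypothetical stand-in for a fleet instance record.
    name: str                        # e.g. "my-fleet-0"
    hostname: Optional[str] = None   # cloud hostname/IP, if any
    ssh_host: Optional[str] = None   # SSH fleet host, if any


def matches(instance: Instance, selector: str) -> bool:
    """True if the selector equals the instance name, hostname, or SSH host."""
    candidates = {instance.name, instance.hostname, instance.ssh_host}
    candidates.discard(None)
    return selector in candidates


def select_instances(
    instances: List[Instance], selectors: Optional[List[str]]
) -> List[Instance]:
    """Keep only instances matched by at least one selector.

    selectors=None means "no restriction"; an explicit list (even an empty
    one) restricts the result to matching instances.
    """
    if selectors is None:
        return list(instances)
    return [i for i in instances if any(matches(i, s) for s in selectors)]


fleet = [
    Instance("my-fleet-0", ssh_host="10.0.0.10"),
    Instance("my-fleet-1", ssh_host="10.0.0.11"),
]
pinned = select_instances(fleet, ["my-fleet-0", "10.0.0.11"])
print([i.name for i in pinned])  # ['my-fleet-0', 'my-fleet-1']
```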
Reject runs that target fewer instances than the number of nodes they require, surfaced during planning via `validate_run_spec_and_set_defaults`. Exclude new-capacity backend offers from the run plan when `instances` is set, since they are never provisioned and would otherwise mislead the `dstack apply`/`dstack offer` output.
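The count check and the plan filtering could be sketched as follows. `validate_instance_count`, `plan_offers`, and the `Offer` record are hypothetical stand-ins for the logic in `validate_run_spec_and_set_defaults` and the plan builder, and they do not reflect those functions' real signatures. Offers for existing instances are assumed to have been narrowed by the selector already, so this step only drops new-capacity offers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Offer:
    # Hypothetical plan offer: an existing fleet instance or new backend capacity.
    label: str
    is_existing_instance: bool


def validate_instance_count(nodes: int, instances: Optional[List[str]]) -> None:
    """Reject runs that target fewer instances than the nodes they need."""
    if instances is not None and nodes > len(instances):
        raise ValueError(
            f"run requires {nodes} nodes but only {len(instances)} instances are targeted"
        )


def plan_offers(offers: List[Offer], instances: Optional[List[str]]) -> List[Offer]:
    """Drop new-capacity offers when instances is set; they are never provisioned."""
    if instances is None:
        return list(offers)
    return [o for o in offers if o.is_existing_instance]


validate_instance_count(2, ["my-fleet-0", "my-fleet-1"])  # passes
offers = [Offer("my-fleet-0", True), Offer("new capacity", False)]
print([o.label for o in plan_offers(offers, ["my-fleet-0"])])  # ['my-fleet-0']
```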
Add a 'Targeting specific instances' section to the shared fleets snippet (dev environments, tasks, services) and a corresponding tip in the protips guide.
Handle an explicit empty `instances` list consistently across the assignment gate, plan output, and instance filtering by checking `is not None` instead of truthiness, so an empty list targets existing instances only (rather than silently allowing new-capacity provisioning and showing unusable offers). Add regression tests ensuring the instance selector is applied on the multinode and shared-instances filter paths.
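The two checks differ only for the empty list. The following is a small illustration with hypothetical helper names, not dstack code:

```python
from typing import List, Optional


def may_provision_buggy(instances: Optional[List[str]]) -> bool:
    # Truthiness check: an empty list is falsy, so it is treated like "unset".
    return not instances


def may_provision(instances: Optional[List[str]]) -> bool:
    # Only an unset selector (None) allows new capacity; [] still means
    # "existing instances only".
    return instances is None


for value in (None, [], ["my-fleet-0"]):
    print(repr(value), may_provision_buggy(value), may_provision(value))
# None True True
# [] True False
# ['my-fleet-0'] False False
```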
Adds an `instances` option to run configurations (dev environments, tasks, services) that pins a run to specific existing fleet instances (nodes). Each value matches an instance by its name (e.g. `my-fleet-0`) or by its hostname/IP address.

Why
On multi-node fleets (e.g. an 8-node SSH cluster) you sometimes need a run to land on a
particular node, not just any matching one. Our case: certain datasets are staged only on
certain nodes, so the job has to run where its data already lives. Other uses: pinning to a
node with specific local state, isolating a flaky node for debugging, or co-locating with a
prior run's cached artifacts. Today the only targeting knob is
`fleets`, which can't select within a fleet.
Behavior
`dstack` places the run on whichever matching node is available (offers are sorted by availability/price). A single entry pins the run to one node.
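Below is a simplified sketch of placement for a pinned run. It assumes each candidate carries a name, an availability flag, and a price. `Candidate`, `place`, and `NoCapacityError` are hypothetical names, and matching is by name only to keep the sketch short. The cheapest available targeted instance wins, which approximates the availability/price ordering. A pinned run with nothing available raises an error instead of falling back to new capacity.

```python
from dataclasses import dataclass
from typing import List, Optional


class NoCapacityError(Exception):
    """Hypothetical error raised when no targeted instance can take the job."""


@dataclass
class Candidate:
    name: str
    available: bool
    price: float


def place(candidates: List[Candidate], instances: Optional[List[str]]) -> str:
    """Return the chosen instance name, or "new-capacity" if provisioning is allowed."""
    if instances is not None:
        # Name-only matching here; hostname/IP matching is omitted for brevity.
        candidates = [c for c in candidates if c.name in instances]
    # Among available candidates, the cheapest one wins.
    available = sorted((c for c in candidates if c.available), key=lambda c: c.price)
    if available:
        return available[0].name
    if instances is not None:
        # A pinned run never falls back to provisioning new capacity.
        raise NoCapacityError("no targeted instance is available")
    return "new-capacity"


fleet = [
    Candidate("my-fleet-0", available=False, price=1.0),
    Candidate("my-fleet-1", available=True, price=2.0),
    Candidate("my-fleet-2", available=True, price=1.5),
]
print(place(fleet, ["my-fleet-0", "my-fleet-1"]))  # my-fleet-1 (my-fleet-0 is busy)
```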
When `instances` is set, `dstack` never provisions new capacity to satisfy the selector; if no listed instance is available, the run fails with a no-capacity error (use `retry` to wait for a busy node to free up).

Runs that list fewer instances than the number of nodes they require (`nodes > len(instances)`) are rejected.

New-capacity backend offers are omitted from the `dstack apply`/`dstack offer` output when `instances` is set, since they would never be provisioned.

Implementation
The selector is added to `ProfileParams` and flows through the existing reuse path: `filter_instances` keeps only matching instances (matched against name, cloud hostname, or SSH host), the assignment phase declines new-capacity provisioning, validation runs in `validate_run_spec_and_set_defaults`, and the plan builder drops unusable offers. No DB schema change is needed; the field rides inside the existing `run_spec`/`profile` JSON.

Testing
Tests cover instance matching (name, cloud hostname, or SSH host), assignment placing a job on the targeted node, the no-new-capacity gating, multinode validation (both directions), and the plan-output filtering. The full suite passes.
end-to-end (a `0.20.19` server rejects it as expected), and multinode validation fires.

Docs
Documented in the shared fleets snippet (dev environments, tasks, services) and the protips
guide; the configuration reference picks up the field automatically.
Compatibility
Backward compatible and optional (defaults to `None`); existing runs and fleets are unaffected. Both server and client need this version: the field is rejected by older servers and unknown to older CLIs.
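The following sketch shows why the change is backward compatible for stored runs but not for older servers. It uses two hypothetical dataclass models standing in for the profile parameters; real dstack models and their validation differ. A stored payload without the key loads with `instances` defaulting to `None`. A strict model that predates the field rejects a payload that sets it.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class NewProfile:
    # Hypothetical newer profile model: the optional field defaults to None.
    name: str
    instances: Optional[List[str]] = None


@dataclass
class OldProfile:
    # Hypothetical older profile model that predates the field.
    name: str


def load_strict(cls, raw: str):
    """Build a dataclass from JSON, rejecting keys the model does not declare."""
    data = json.loads(raw)
    unknown = set(data) - {f.name for f in fields(cls)}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return cls(**data)


# A run stored before the change has no "instances" key and still loads.
old_stored = '{"name": "old-run"}'
print(load_strict(NewProfile, old_stored).instances)  # None

# A payload that sets the field; an OldProfile-based parser would reject it.
new_payload = json.dumps(asdict(NewProfile(name="pinned", instances=["my-fleet-0"])))
```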
This PR was written primarily by Claude Code.