Add instances option to target specific fleet nodes #3925

Open
fededagos wants to merge 4 commits into dstackai:master from fededagos:feat/target-specific-fleet-instances

@fededagos

Adds an instances option to run configurations (dev environments, tasks, services) that
pins a run to specific existing fleet instances (nodes). Each value matches an instance by
its name (e.g. my-fleet-0) or by its hostname/IP address:

type: dev-environment
ide: vscode
instances:
  - my-fleet-3        # by name
  - 203.0.113.10      # or by IP

Why

On multi-node fleets (e.g. an 8-node SSH cluster) you sometimes need a run to land on a
particular node, not just any matching one. Our case: certain datasets are staged only on
certain nodes, so the job has to run where its data already lives. Other uses: pinning to a
node with specific local state, isolating a flaky node for debugging, or co-locating with a
prior run's cached artifacts. Today the only targeting knob is fleets:, which can't select
within a fleet.

Behavior

  • Allow-list semantics. List several instances and dstack places the run on whichever
    matching node is available (offers are sorted by availability/price). A single entry pins
    to one node.
  • Reuse-only. When instances is set, dstack never provisions new capacity to satisfy
    a selector; if no listed instance is available the run fails with a no-capacity error (use
    retry to wait for a busy node to free up).
  • Validation. A run is rejected up front if it lists fewer instances than the number of
    nodes it requires (nodes > len(instances)).
  • Plan output. New-capacity backend offers are hidden from dstack apply/dstack offer
    output when instances is set, since they would never be provisioned.
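
The allow-list, reuse-only, and validation rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `validate_instances`, `choose_existing`, and `NoCapacityError` are hypothetical names, instances are matched by name only, and `available` is assumed to be already sorted by availability/price.

```python
from typing import List, Optional


class NoCapacityError(Exception):
    """Hypothetical stand-in for the no-capacity error described above."""


def validate_instances(nodes: int, instances: Optional[List[str]]) -> None:
    # Reject up front when the run needs more nodes than the allow-list names.
    if instances is not None and nodes > len(instances):
        raise ValueError(
            f"run requires {nodes} nodes but only {len(instances)} instances are listed"
        )


def choose_existing(
    instances: Optional[List[str]], available: List[str]
) -> Optional[str]:
    """Pick an existing node; None means the caller may provision new capacity."""
    if instances is None:
        # No selector: any available node is acceptable.
        candidates = list(available)
    else:
        # Allow-list: keep only the listed nodes, preserving the offer order.
        wanted = {name.lower() for name in instances}
        candidates = [name for name in available if name.lower() in wanted]
    if candidates:
        return candidates[0]
    if instances is not None:
        # Reuse-only: never fall back to new capacity when a selector is set.
        raise NoCapacityError("no listed instance is available")
    return None
```

With `instances=["my-fleet-3", "my-fleet-1"]` and both nodes idle, the first one in offer order is chosen; if neither is available the sketch raises instead of returning `None`.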

Implementation

The selector is added to ProfileParams and flows through the existing reuse path:
filter_instances keeps only matching instances (matched against name, cloud hostname, or
SSH host), the assignment phase declines new-capacity provisioning, validation runs in
validate_run_spec_and_set_defaults, and the plan builder drops unusable offers. No DB
schema change — the field rides inside the existing run_spec/profile JSON.
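
As a rough picture of the matching step, here is a minimal sketch. The `Instance` dataclass, its field names, and the helpers `matches_selector` and `filter_by_selector` are assumptions made for illustration and are not the actual dstack models or the real `filter_instances` signature. Names are compared case-insensitively (as the tests below describe); exact comparison for hostnames and SSH hosts is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    # Hypothetical, simplified view of a fleet instance.
    name: str
    hostname: Optional[str] = None  # cloud hostname or IP address
    ssh_host: Optional[str] = None  # host of an SSH fleet node


def matches_selector(instance: Instance, selectors: List[str]) -> bool:
    # Names match case-insensitively.
    if instance.name.lower() in {s.lower() for s in selectors}:
        return True
    # Hostname and SSH host match exactly (an assumption in this sketch).
    hosts = (instance.hostname, instance.ssh_host)
    return any(h is not None and h in selectors for h in hosts)


def filter_by_selector(
    instances: List[Instance], selectors: Optional[List[str]]
) -> List[Instance]:
    # None means "no selector"; an empty list matches nothing.
    if selectors is None:
        return list(instances)
    return [i for i in instances if matches_selector(i, selectors)]
```

For example, `filter_by_selector(fleet, ["my-fleet-3", "203.0.113.10"])` keeps the node named `my-fleet-3` and any node whose hostname or SSH host is `203.0.113.10`.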

Testing

  • Unit/integration tests for selector matching (name, case-insensitive name, cloud hostname,
    SSH host), assignment placing a job on the targeted node, the no-new-capacity gating,
    multinode validation (both directions), and the plan-output filtering. Full suite green.
  • Live smoke test against a local server running this branch: the field is accepted
    end-to-end (a 0.20.19 server rejects it as expected), and multinode validation fires.

Docs

Documented in the shared fleets snippet (dev environments, tasks, services) and the protips
guide; the configuration reference picks up the field automatically.

Compatibility

Backward compatible and optional (defaults to None); existing runs/fleets are unaffected.
Both server and client need this version — the field is rejected by older servers and unknown
to older CLIs.

This PR was written primarily by Claude Code.

fededagos added 4 commits June 1, 2026 11:19
Introduce an `instances` run profile option that pins a run to specific
existing fleet instances (nodes). Each value matches an instance by its
name (e.g. `my-fleet-0`) or by its hostname/IP address.

When set, `filter_instances` keeps only matching instances and the job
assignment phase never provisions new capacity to satisfy a node
selector, terminating with a no-capacity error instead.

Reject runs that target fewer instances than the number of nodes they
require, surfaced during planning via `validate_run_spec_and_set_defaults`.

Exclude new-capacity backend offers from the run plan when `instances` is
set, since they are never provisioned and would otherwise mislead the
`dstack apply`/`dstack offer` output.

Add a 'Targeting specific instances' section to the shared fleets snippet
(dev environments, tasks, services) and a corresponding tip in the
protips guide.

Handle an explicit empty `instances` list consistently across the
assignment gate, plan output, and instance filtering by checking
`is not None` instead of truthiness, so an empty list targets existing
instances only (rather than silently allowing new-capacity provisioning
and showing unusable offers).

Add regression tests ensuring the instance selector is applied on the
multinode and shared-instances filter paths.
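
The difference between the truthiness check and the `is not None` check can be illustrated with a small sketch. Both function names are hypothetical and only model the "may new capacity be provisioned?" gate described in the last commit.

```python
from typing import List, Optional


def allows_new_capacity_truthy(instances: Optional[List[str]]) -> bool:
    # Truthiness check: an empty list is falsy, so it is treated like "no selector".
    return not instances


def allows_new_capacity(instances: Optional[List[str]]) -> bool:
    # Identity check: only an unset selector permits new-capacity provisioning.
    return instances is None
```

With `instances=[]`, the truthiness variant still permits provisioning, while the `is None` variant keeps the run restricted to existing instances.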