[Feature]: Support distributing service replicas across multiple fleets #3929

@NetanelK

Problem

When running a service with multiple replicas, it's common to want a guaranteed baseline (reserved/on-demand instances) with spot overflow for burst capacity. The natural approach is to create two fleets (one on-demand, one spot) and reference both in the service's fleets parameter.

However, dstack currently selects a single fleet for all replicas of a service. This makes it impossible to distribute replicas across fleets with different purchasing options.

Observed behavior

Setup:

  • Fleet A: nodes: 1, spot_policy: on-demand (with capacity reservation)
  • Fleet B: nodes: 0..1, spot_policy: spot
  • Service: replicas: 2, fleets: [fleet-a, fleet-b], spot_policy: auto

Result: failed_to_start_due_to_no_capacity — "Failed to use specified fleets"

When Fleet B is changed to nodes: 0..2, both replicas are provisioned on Fleet B (spot) and Fleet A (reserved) is ignored entirely.

Root cause (code analysis)

  1. Single fleet per run. _should_wait_for_run_fleet_assignment in jobs_submitted.py ensures only replica_num=0 selects the fleet. Once run_model.fleet_id is set, all subsequent replicas are constrained to the same fleet.

  2. Capacity check excludes undersized fleets. _run_can_fit_into_fleet in plan.py requires:

    fleet_available_capacity = fleet.nodes.max - occupied_instances
    if fleet_available_capacity < nodes_required_num:  # total min replicas
        return False  # fleet excluded entirely

    A fleet is excluded if its nodes.max can't accommodate ALL replicas.

  3. Price-based sort with no fleet priority. find_optimal_fleet_with_offers sorts by:

    sort_key=(
        not candidate.has_pool_capacity,      # idle instances first
        candidate.min_instance_offer_price,   # cheapest price
        min_backend_offer_price,
    )

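    Since spot offers are cheaper than on-demand/reserved ones, the spot fleet always wins this sort, and there is no way to express a preference for the reserved fleet.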
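To make the interaction of these three points concrete, here is a minimal standalone sketch that reproduces the observed behavior. It is not dstack code: the Fleet dataclass, select_single_fleet, and the prices are hypothetical stand-ins, and only the capacity check and the first two sort keys mirror the snippets quoted above. It also assumes no instances are occupied in either fleet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fleet:
    # Hypothetical stand-in for a fleet; not the real dstack model.
    name: str
    nodes_max: int
    occupied_instances: int
    min_offer_price: float  # made-up price, for illustration only
    has_pool_capacity: bool = False


def run_can_fit_into_fleet(fleet: Fleet, nodes_required_num: int) -> bool:
    # Mirrors the quoted check: one fleet must hold ALL required replicas.
    fleet_available_capacity = fleet.nodes_max - fleet.occupied_instances
    return fleet_available_capacity >= nodes_required_num


def select_single_fleet(fleets: list[Fleet], nodes_required_num: int) -> Optional[Fleet]:
    # Step 1: drop every fleet that cannot hold the whole run on its own.
    candidates = [f for f in fleets if run_can_fit_into_fleet(f, nodes_required_num)]
    if not candidates:
        return None  # corresponds to the "no capacity" failure
    # Step 2: idle instances first, then cheapest offer; no fleet priority.
    return min(candidates, key=lambda f: (not f.has_pool_capacity, f.min_offer_price))


fleet_a = Fleet("fleet-a", nodes_max=1, occupied_instances=0, min_offer_price=3.0)
fleet_b_small = Fleet("fleet-b", nodes_max=1, occupied_instances=0, min_offer_price=1.0)
fleet_b_large = Fleet("fleet-b", nodes_max=2, occupied_instances=0, min_offer_price=1.0)

# Fleet B with nodes: 0..1 -> neither fleet fits 2 replicas alone.
no_fit = select_single_fleet([fleet_a, fleet_b_small], nodes_required_num=2)
# Fleet B with nodes: 0..2 -> only Fleet B passes the check, Fleet A is dropped.
spot_only = select_single_fleet([fleet_a, fleet_b_large], nodes_required_num=2)
```

In the first case both fleets together have room for two replicas, but each is rejected individually. In the second case Fleet A never reaches the sort step, so the price ordering is not even the deciding factor.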

Desired behavior

Allow replicas of a single service to be distributed across multiple fleets. Example use case:

type: service
name: my-service
replicas: 3..10

fleets:
  - name: gpu-reserved
    min_replicas: 1  # always keep at least 1 replica here
  - name: gpu-spot
    # overflow capacity

spot_policy: auto

Possible solutions

  1. Per-fleet replica pinning — allow specifying min/max replicas per fleet in the fleets list
  2. Fleet priority/weight — ordered preference with fallback (try fleet A first, overflow to fleet B)
  3. Remove the single-fleet constraint — allow run_model.fleet_id to be per-job instead of per-run, and change _run_can_fit_into_fleet to check aggregate capacity across all candidate fleets
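As a rough illustration of how options 1 to 3 could combine, the sketch below assigns replicas across fleets in list order, honoring a per-fleet min_replicas and checking capacity in aggregate rather than per fleet. Everything here is hypothetical (FleetSpec, distribute_replicas, the available field); it shows only the allocation logic, not how it would plug into dstack's scheduler.

```python
from dataclasses import dataclass


@dataclass
class FleetSpec:
    # Hypothetical per-fleet entry; min_replicas follows the YAML proposed above.
    name: str
    available: int          # free node slots, i.e. nodes.max minus occupied
    min_replicas: int = 0


def distribute_replicas(fleets: list[FleetSpec], replicas: int) -> dict[str, int]:
    """Assign replicas to fleets in list order (earlier fleets are preferred)."""
    # Aggregate capacity check across all candidate fleets.
    if sum(f.available for f in fleets) < replicas:
        raise ValueError("not enough aggregate capacity for all replicas")

    assignment = {f.name: 0 for f in fleets}
    remaining = replicas

    # Pass 1: satisfy per-fleet minimums, capped by what the fleet can hold.
    for f in fleets:
        pinned = min(f.min_replicas, f.available, remaining)
        assignment[f.name] += pinned
        remaining -= pinned

    # Pass 2: place the rest in priority order, overflowing to later fleets.
    for f in fleets:
        extra = min(f.available - assignment[f.name], remaining)
        assignment[f.name] += extra
        remaining -= extra

    return assignment


# The setup from "Observed behavior": one slot in each fleet, two replicas.
observed_setup = distribute_replicas(
    [FleetSpec("fleet-a", available=1), FleetSpec("fleet-b", available=1)],
    replicas=2,
)

# The example from "Desired behavior": one reserved node, spot overflow.
desired_setup = distribute_replicas(
    [
        FleetSpec("gpu-reserved", available=1, min_replicas=1),
        FleetSpec("gpu-spot", available=9),
    ],
    replicas=3,
)
```

List order doubles as priority here, which keeps the config small. A separate weight field would be an alternative if ordering and pinning need to be decoupled.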

Workaround

Currently the only workaround is running two separate services (one per fleet) behind an external load balancer, which adds operational complexity and breaks the single-endpoint abstraction.

Would you like to help us implement this feature by sending a PR?

Yes
