Problem
When running a service with multiple replicas, it's common to want a guaranteed baseline (reserved/on-demand instances) with spot overflow for burst capacity. The natural approach is to create two fleets (one on-demand, one spot) and reference both in the service's fleets parameter.
However, dstack currently selects a single fleet for all replicas of a service. This makes it impossible to distribute replicas across fleets with different purchasing options.
Observed behavior
Setup:
- Fleet A: nodes: 1, spot_policy: on-demand (with capacity reservation)
- Fleet B: nodes: 0..1, spot_policy: spot
- Service: replicas: 2, fleets: [fleet-a, fleet-b], spot_policy: auto
Result: failed_to_start_due_to_no_capacity — "Failed to use specified fleets"
When Fleet B is changed to nodes: 0..2, both replicas are provisioned on Fleet B (spot) and Fleet A (reserved) is ignored entirely.
Root cause (code analysis)
- Single fleet per run: _should_wait_for_run_fleet_assignment in jobs_submitted.py ensures that only replica_num=0 selects the fleet. Once run_model.fleet_id is set, all subsequent replicas are constrained to the same fleet.
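- Capacity check excludes undersized fleets: _run_can_fit_into_fleet in plan.py requires:
      fleet_available_capacity = fleet.nodes.max - occupied_instances
      if fleet_available_capacity < nodes_required_num:  # total min replicas
          return False  # fleet excluded entirely
  A fleet is excluded if its remaining capacity (nodes.max minus occupied instances) can't accommodate ALL replicas. In the setup above, nodes.max is 1 for both fleets, so neither can hold 2 replicas on its own and both are excluded, which explains the no-capacity failure.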
- Price-based sort with no fleet priority: find_optimal_fleet_with_offers sorts by:
      sort_key=(
          not candidate.has_pool_capacity,  # idle instances first
          candidate.min_instance_offer_price,  # cheapest price
          min_backend_offer_price,
      )
  Since spot offers are typically cheaper, a spot fleet beats on-demand/reserved whenever the idle-capacity key ties, and there is no way to express a fleet preference.
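The interaction of these three points can be reproduced with a small, simplified model. This is only a sketch, not dstack's actual code: the Fleet dataclass, the function names, the prices, and the assumption that Fleet A already has one idle instance are all hypothetical and chosen to mirror the setup above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fleet:
    # Hypothetical, simplified stand-in for a dstack fleet.
    name: str
    nodes_max: int
    occupied: int = 0
    idle: int = 0                 # assumed number of idle instances
    min_offer_price: float = 0.0  # made-up price, for illustration only


def run_can_fit_into_fleet(fleet: Fleet, nodes_required: int) -> bool:
    # Mirrors the per-fleet check quoted above: the whole run must fit.
    return fleet.nodes_max - fleet.occupied >= nodes_required


def select_single_fleet(fleets: List[Fleet], nodes_required: int) -> Optional[Fleet]:
    # Drop every fleet that cannot hold all replicas by itself.
    candidates = [f for f in fleets if run_can_fit_into_fleet(f, nodes_required)]
    if not candidates:
        return None
    # Idle capacity first, then cheapest offer (two of the sort keys quoted above).
    return min(candidates, key=lambda f: (f.idle == 0, f.min_offer_price))


fleet_a = Fleet("fleet-a", nodes_max=1, idle=1, min_offer_price=3.0)
fleet_b = Fleet("fleet-b", nodes_max=1, min_offer_price=1.0)
fleet_b_wide = Fleet("fleet-b", nodes_max=2, min_offer_price=1.0)

# Fleet B with nodes: 0..1, so neither fleet fits 2 replicas.
print(select_single_fleet([fleet_a, fleet_b], 2))  # None
# Fleet B with nodes: 0..2, so Fleet A is still excluded and Fleet B takes both.
print(select_single_fleet([fleet_a, fleet_b_wide], 2).name)  # fleet-b
```

In this model Fleet A is never selected for a 2-replica service, even though it has an idle reserved instance, because the per-fleet capacity check removes it before sorting.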
Desired behavior
Allow replicas of a single service to be distributed across multiple fleets. Example use case:
type: service
name: my-service
replicas: 3..10
fleets:
  - name: gpu-reserved
    min_replicas: 1  # always keep at least 1 replica here
  - name: gpu-spot
    # overflow capacity
spot_policy: auto
Possible solutions
- Per-fleet replica pinning: allow specifying min/max replicas per fleet in the fleets list
- Fleet priority/weight: ordered preference with fallback (try fleet A first, overflow to fleet B)
- Remove the single-fleet constraint: allow run_model.fleet_id to be per-job instead of per-run, and change _run_can_fit_into_fleet to check aggregate capacity across all candidate fleets
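To make the options above more concrete, here is a rough sketch of how pinning, ordered fallback, and an aggregate capacity check could combine. It is not a proposal for the actual implementation: FleetSlot, distribute_replicas, and the available field are hypothetical names, and min_replicas follows the example config in this issue rather than any existing dstack setting.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FleetSlot:
    # Hypothetical per-fleet entry, loosely following the `fleets` example above.
    name: str
    available: int         # nodes.max minus occupied instances
    min_replicas: int = 0  # replicas pinned to this fleet


def distribute_replicas(slots: List[FleetSlot], replicas: int) -> Dict[str, int]:
    """Assign replicas to fleets in list order, honoring per-fleet minimums."""
    if sum(s.min_replicas for s in slots) > replicas:
        raise ValueError("pinned minimums exceed the requested replica count")
    # Aggregate check: capacity is summed across fleets, not tested per fleet.
    if sum(s.available for s in slots) < replicas:
        raise ValueError("not enough aggregate capacity")

    plan: Dict[str, int] = {}
    remaining = replicas
    # First pass: place the pinned minimums.
    for s in slots:
        if s.min_replicas > s.available:
            raise ValueError(f"{s.name} cannot hold its pinned replicas")
        plan[s.name] = s.min_replicas
        remaining -= s.min_replicas
    # Second pass: fill fleets in priority (list) order, overflowing to the next.
    for s in slots:
        extra = min(s.available - plan[s.name], remaining)
        plan[s.name] += extra
        remaining -= extra
    return plan


slots = [
    FleetSlot("gpu-reserved", available=1, min_replicas=1),
    FleetSlot("gpu-spot", available=9),
]
print(distribute_replicas(slots, 3))  # {'gpu-reserved': 1, 'gpu-spot': 2}
```

List order doubles as priority here, so the reserved fleet is filled first and the spot fleet only receives the overflow.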
Workaround
Currently the only workaround is running two separate services (one per fleet) behind an external load balancer, which adds operational complexity and breaks the single-endpoint abstraction.
Would you like to help us implement this feature by sending a PR?
Yes