[Feature]: Support in-place gateway recreation to preserve ALB/IP and avoid DNS changes #3924

@Outvoker

Problem

Summary

When a dstack gateway needs to be recreated (e.g., to refresh the underlying EC2 instance), the current behavior forces downstream DNS or CNAME changes because the ALB DNS name and/or the EC2 public IP changes.
We'd like to request support for in-place gateway recreation that preserves the externally visible endpoint.

Current Behavior

Gateway with cert (HTTPS)

  • Creating a gateway on AWS provisions an Application Load Balancer (ALB) and a target group pointing to a backing EC2 instance.
  • Recreating the gateway provisions a new ALB and a new EC2 instance.
  • The previous ALB does not appear to be cleaned up automatically.
  • Because the ALB DNS name changes, any CNAME pointing to the gateway must be updated.

Gateway without cert

  • Recreating the gateway provisions a new EC2 instance with a different public IP.
  • For wildcard-domain setups, this requires a fresh DNS change request and incurs downtime during propagation.

Solution

Requested Behavior

Gateway with cert

Support in-place recreation that:

  • Keeps the existing ALB.
  • Replaces only the EC2 instance behind it (update the target group to point to the new instance).
  • Cleans up the old EC2 instance after the new one is healthy.

Result: no CNAME change required, minimal/zero downtime.
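
The steps above could look roughly like the following sketch. It is only an illustration of the requested flow, not how dstack currently works: the function name `replace_gateway_instance` and its parameters are hypothetical, and `elbv2` and `ec2` are assumed to be boto3 clients (e.g. `boto3.client("elbv2")` and `boto3.client("ec2")`) for the gateway's region.

```python
def replace_gateway_instance(elbv2, ec2, target_group_arn,
                             old_instance_id, new_instance_id):
    """Swap the EC2 instance behind an existing target group.

    The ALB itself is never touched, so its DNS name stays the same.
    `elbv2` and `ec2` are assumed to be boto3 clients; all names here
    are hypothetical and not part of dstack.
    """
    new_target = [{"Id": new_instance_id}]
    old_target = [{"Id": old_instance_id}]

    # 1. Register the replacement instance alongside the old one.
    elbv2.register_targets(TargetGroupArn=target_group_arn, Targets=new_target)

    # 2. Block until the new instance passes the target group's health checks.
    elbv2.get_waiter("target_in_service").wait(
        TargetGroupArn=target_group_arn, Targets=new_target
    )

    # 3. Only then remove the old instance from the target group and wait
    #    for the deregistration to complete.
    elbv2.deregister_targets(TargetGroupArn=target_group_arn, Targets=old_target)
    elbv2.get_waiter("target_deregistered").wait(
        TargetGroupArn=target_group_arn, Targets=old_target
    )

    # 4. Clean up the old instance once it no longer receives traffic.
    ec2.terminate_instances(InstanceIds=[old_instance_id])
```

Registering the new target before deregistering the old one is what would keep downtime minimal: the ALB always has at least one healthy target.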

Gateway without cert

Support in-place restart/recreation that:

  • Preserves the existing public IP (e.g., via Elastic IP reassociation, or by reusing the instance and replacing only the OS/kernel).

Result: no DNS change required.
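
For the Elastic IP option, a minimal sketch is below. It assumes the gateway's public address is an Elastic IP with a known allocation ID, which is not necessarily how dstack provisions gateways today; `move_elastic_ip` is a hypothetical name and `ec2` is assumed to be a boto3 EC2 client.

```python
def move_elastic_ip(ec2, allocation_id, new_instance_id):
    """Point an existing Elastic IP at a replacement instance.

    `ec2` is assumed to be a boto3 EC2 client; the function name and
    parameters are hypothetical and not part of dstack.
    """
    # Wait until the replacement instance is running before moving the address.
    ec2.get_waiter("instance_running").wait(InstanceIds=[new_instance_id])

    # AllowReassociation=True lets the address move even though it is
    # still associated with the old instance.
    response = ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=new_instance_id,
        AllowReassociation=True,
    )
    return response["AssociationId"]
```

Because the public IP stays the same, wildcard DNS records pointing at it would not need to change.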

Motivation / Background

We have several dstack gateways that were created some time ago and are now running Linux kernel versions that no longer meet our internal security compliance policy. The policy requires that the kernel release date be within the last 180 days, with a 60-day remediation window once an asset is flagged.

To remediate, we need to refresh these gateways onto a current AMI/kernel. Today, doing so forces:

  1. A new ALB DNS name (with-cert case) — every consumer's CNAME must be updated.
  2. A new public IP (without-cert case) — every wildcard-domain DNS record must be updated, with downtime during propagation.

Because gateway recreation will be a recurring operational task (driven by quarterly patching cycles, not just one-time kernel upgrades), the DNS-change toil compounds significantly. In-place recreation would let us meet our patching SLA without coordinating downstream DNS changes each cycle.

Workaround

No response

Would you like to help us implement this feature by sending a PR?

No
