
perf: stabilize streaming budget across sampling steps (+ log noise cleanup) #1611

Open
fszontagh wants to merge 1 commit into leejet:master from fszontagh:perf/stable-streaming-budget


Summary

Two related changes: stabilize the streaming budget across sampling steps, and demote the per-step budget logs that the old behavior produced.

With --stream-layers active, effective_budget was recomputed from freshly measured free VRAM on every compute() call. Free VRAM oscillates from step to step (it dips while a partial-offload buffer and a compute buffer are live, and recovers once they are freed), so the clamped budget swung between values such as 8541 MB and 3354 MB on every step. The planner cache is keyed on the budget, so each change triggered a full re-merge of the base segments (150-230 ms per call on SDXL), and the chunk-K residency token could also change, forcing a reload of the resident set.
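The cache thrash described above can be illustrated with a minimal sketch. This is not the project's planner: `PlannerCacheSketch`, `plan_for`, and `count_rebuilds` are hypothetical names, and the only behavior modeled is the one stated here, namely that a plan is cached under the budget it was built for and rebuilt whenever the budget differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a planner whose cached plan is keyed on the budget.
// Names and structure are illustrative assumptions, not the project's code.
struct PlannerCacheSketch {
    bool has_plan = false;
    std::size_t cached_budget_mb = 0;
    int rebuilds = 0;  // number of expensive re-merges performed

    void plan_for(std::size_t budget_mb) {
        if (has_plan && budget_mb == cached_budget_mb) {
            return;  // cache hit: same budget as last time, nothing to redo
        }
        // Cache miss: the budget changed, so the plan is rebuilt from scratch.
        cached_budget_mb = budget_mb;
        has_plan = true;
        ++rebuilds;
    }
};

// Feeds one budget per sampling step through the sketch and reports how many
// times the plan had to be rebuilt.
inline int count_rebuilds(const std::vector<std::size_t>& budgets_mb) {
    PlannerCacheSketch planner;
    for (std::size_t b : budgets_mb) {
        planner.plan_for(b);
    }
    return planner.rebuilds;
}
```

In this sketch, a budget alternating between 8541 and 3354 over four steps rebuilds the plan four times, while a constant 8541 rebuilds it once.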

This PR:

  • Ratchets effective_budget to the highest value observed so far: if clamping against live free VRAM would produce a lower value, that clamp is reverted before the budget reaches the planner. The high-water mark is reset to 0 in free_params_buffer.
  • Adjusts log levels so per-step budget noise no longer floods INFO: "clamping streaming budget" and "graph cut budget merge took X ms" are now DEBUG, and "streaming budget = X MB" is logged at INFO only when the budget actually ratchets up.
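The ratchet in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `StreamingBudgetSketch`, `max_seen_mb`, and the `ratcheted_up` flag are hypothetical names. The flag stands in for the condition under which the INFO-level budget line would be emitted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical high-water-mark ratchet for a streaming budget (names assumed).
struct StreamingBudgetSketch {
    std::size_t max_seen_mb = 0;  // highest clamped budget observed so far

    // live_clamped_mb: budget after clamping to the currently measured free VRAM.
    // Returns the budget to hand to the planner. If ratcheted_up is non-null it
    // is set to true only when the high-water mark was raised by this call,
    // which is when an INFO-level log line would be warranted.
    std::size_t effective_budget(std::size_t live_clamped_mb,
                                 bool* ratcheted_up = nullptr) {
        const bool raised = live_clamped_mb > max_seen_mb;
        if (raised) {
            max_seen_mb = live_clamped_mb;
        }
        if (ratcheted_up != nullptr) {
            *ratcheted_up = raised;
        }
        // A lower live value never reaches the planner: the maximum is returned.
        return max_seen_mb;
    }

    // Mirrors the reset described for free_params_buffer: start over from 0.
    void reset() { max_seen_mb = 0; }
};
```

With this shape, a step that measures 3354 MB after an earlier 8541 MB still returns 8541 MB, so a budget-keyed cache sees the same key on every step until the mark is reset.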

Related Issue / Discussion

Follow-up to #1598.

Additional Information

SDXL 896x1152, 40 steps, dpm++2m karras, --offload-to-cpu --stream-layers --max-vram -1 on RTX 3060:

| | Before | After |
|---|---|---|
| generate_image | 1m 18s | 47s |

Side effect of the stable budget: the planner now merges SDXL into a single segment and keeps it across steps, so the UNet offload_params runs once per generation instead of once per step.

