
ci(linux): build fat package with GGML_BACKEND_DL + GGML_CPU_ALL_VARIANTS #5

Open
Geramy wants to merge 1 commit into lemonade from geramy/cpu-all-variants


Geramy commented May 6, 2026

What

Switch the Linux x86_64 `ubuntu-latest-cmake` (cpu) and `ubuntu-latest-rocm` (HIP) builds from the AVX2/FMA/F16C portable baseline (#3) to a fat-package build:

```
-DGGML_NATIVE=OFF
-DGGML_BACKEND_DL=ON
-DGGML_CPU_ALL_VARIANTS=ON
```

The build now produces:

  • `libstable-diffusion.so` — main shared library
  • `libggml-cpu-sandybridge.so` — AVX
  • `libggml-cpu-haswell.so` — AVX2 + FMA + F16C
  • `libggml-cpu-skylakex.so` — + AVX-512F
  • `libggml-cpu-icelake.so` — + AVX-512 VNNI
  • `libggml-cpu-alderlake.so` — AVX2 + AVX-VNNI (no AVX-512)
  • `libggml-cpu-x64.so` — no-SIMD fallback

At runtime, `ggml_backend_load_all_from_path` (already wired by upstream PR leejet#1448) dlopens each variant, scores it against the host CPU's feature set, and picks the highest-scoring match. The same zip works on a 2011-era Sandy Bridge laptop and a current AVX-512 server, without the runner-of-the-day AVX-512 lottery that crashed master-593.
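To make the selection rule concrete, here is a minimal Python sketch of "pick the highest-tier variant the host supports". It is not ggml's code: the real scoring lives in ggml's C/C++ backend loader. The `VARIANTS` table, the `pick_variant` and `read_host_flags` helpers, and the required feature sets (named after the Linux `/proc/cpuinfo` flags) are assumptions made for illustration.

```python
# Illustrative tiers, ordered from lowest to highest priority.
# Feature names follow the "flags" line of /proc/cpuinfo on Linux;
# the required sets are assumptions, not ggml's actual tables.
HASWELL = {"avx", "avx2", "fma", "f16c"}
VARIANTS = [
    ("x64", set()),
    ("sandybridge", {"avx"}),
    ("haswell", HASWELL),
    ("alderlake", HASWELL | {"avx_vnni"}),
    ("skylakex", HASWELL | {"avx512f"}),
    ("icelake", HASWELL | {"avx512f", "avx512_vnni"}),
]


def pick_variant(host_flags):
    """Return the last (highest-priority) variant whose features the host has."""
    host = set(host_flags)
    chosen = "x64"  # the no-SIMD fallback always loads
    for name, required in VARIANTS:
        if required <= host:
            chosen = name
    return chosen


def read_host_flags(path="/proc/cpuinfo"):
    """Best-effort read of the CPU feature flags; empty set if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


if __name__ == "__main__":
    print(pick_variant(read_host_flags()))
```

On a host without any AVX-512 flags this sketch never selects `skylakex` or `icelake`, which is the property that avoids the SIGILL.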

Tradeoff

Linux x86_64 zip: ~12 MB → ~50–80 MB. Acceptable IMO — Lemonade and similar consumers cache the extracted dir across model loads, so this is paid once at install. In exchange you stop choosing between portability and AVX-512 perf — you get both.

Why not Windows / macOS

  • Windows AVX2 build already pins `-DGGML_NATIVE=OFF -DGGML_AVX2=ON`. It could get the same fat-package treatment for symmetry, but that's a separate change and not required to fix the consumer-side AVX-512 SIGILL pattern.
  • macOS arm64: all Apple Silicon parts (M1+) share a uniform NEON+DOTPROD+i8mm+bf16 baseline, so `-march=native` does not introduce a portability problem there. Current Metal flags are already optimal.

Verification plan

  • Trigger a workflow_dispatch release on this branch (`create_release: true`).
  • Inspect the resulting `sd-{hash}-bin-Linux-Ubuntu-24.04-x86_64.zip` — it should contain `libstable-diffusion.so` plus several `libggml-cpu-*.so` files.
  • On an AVX-512-less host: `./sd-server -m model.safetensors` → uses haswell variant, no SIGILL.
  • On an AVX-512 host: same command → uses skylakex/icelake variant, perf matches a native-compiled build.
  • Bump `sd-cpp` pins on lemonade-sdk/lemonade PR #1777 to the new tag and confirm `Test ollama (ubuntu-latest)` passes.
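The zip-inspection step could be scripted along these lines. This is a sketch only: the expected file names are taken from the variant list in this description, and the `missing_files` and `check_zip` helpers are hypothetical names, not part of the repository.

```python
import posixpath
import sys
import zipfile

# Variant names as listed in this PR description.
EXPECTED_VARIANTS = ("x64", "sandybridge", "haswell", "skylakex", "icelake", "alderlake")


def missing_files(member_names):
    """Return the expected library file names that the archive listing lacks."""
    # Compare on basenames so a directory prefix inside the zip does not matter.
    present = {posixpath.basename(name) for name in member_names}
    expected = ["libstable-diffusion.so"]
    expected += [f"libggml-cpu-{variant}.so" for variant in EXPECTED_VARIANTS]
    return [name for name in expected if name not in present]


def check_zip(path):
    """Open the release zip and report which expected libraries are absent."""
    with zipfile.ZipFile(path) as archive:
        return missing_files(archive.namelist())


if __name__ == "__main__" and len(sys.argv) > 1:
    missing = check_zip(sys.argv[1])
    print("missing:", ", ".join(missing) if missing else "none")
    sys.exit(1 if missing else 0)
```

Run it with the path to the downloaded release zip as the only argument; a non-zero exit status means at least one expected `.so` is absent.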

Lineage

Replaces #3's portable AVX2 baseline. #3 fixed the SIGILL but left AVX-512-class hosts running AVX2 code; this gets full perf on those hosts.

Replace the AVX2/FMA/F16C portable baseline (#3) with a fat-package build
that produces one libstable-diffusion.so plus a libggml-cpu-*.so per CPU
variant: sandybridge, haswell, skylakex (AVX-512F), icelake (AVX-512 +
VNNI), alderlake (AVX2 + AVX-VNNI), and a pure-x64 fallback.

At runtime ggml dlopens the variants and picks the highest-tier one the
host CPU supports. AVX-512 hosts get AVX-512 perf; older boxes fall back
gracefully — no -march=native runner lottery, no SIGILL.

Tradeoff: zip grows from ~12 MB → ~50–80 MB. Acceptable for a one-time
download, especially since downstream consumers (Lemonade) cache the
extracted directory across model loads.

Applied to ubuntu-latest-cmake (CPU) and ubuntu-latest-rocm (HIP), since
the HIPBLAS build still uses ggml CPU ops for parts of the pipeline.

Windows AVX2 already pins GGML_NATIVE=OFF + AVX2 only, and macOS arm64
shares a uniform NEON+DOTPROD+i8mm+bf16 baseline across all Apple Silicon
generations, so neither needs the same treatment.

Upstream PR leejet#1448 (commit b8079e2) wired the
runtime backend discovery code into libstable-diffusion.so already; this
just enables the build flag that produces the variant .so files.

kenvandine pushed a commit that referenced this pull request Jun 1, 2026
ci: standardize CUDA artifact names and compression formats

kenvandine requested a review from Copilot June 1, 2026 12:44

Copilot AI left a comment


Pull request overview

Updates the Linux x86_64 CI build configuration to produce a “fat” CPU package by enabling ggml’s dynamic backend loading and building all CPU ISA variants, improving portability (avoiding SIGILL on non-AVX-512 hosts) while retaining high performance on newer CPUs.

Changes:

  • Switch ubuntu-latest-cmake from a fixed AVX2/FMA/F16C baseline to GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON (with GGML_NATIVE=OFF).
  • Apply the same fat-package approach to the ubuntu-latest-rocm (HIPBLAS) build.


Comment on lines +58 to +61
# x64). At runtime ggml dlopens whichever variant is highest-priority on
# the host CPU, so an AVX-512 host gets AVX-512 perf and an AVX-512-less
# host falls back to haswell — same zip, no -march=native runner
# lottery, no SIGILL.