Fix avx512vl_{128,256} correctness bugs + CI default-arch typo by DiamonDinoia · Pull Request #1357 · xtensor-stack/xsimd

DiamonDinoia · 2026-06-02T18:20:57Z

Fixes bugs on the k-mask avx512vl_{128,256} arches, plus the CI typo hiding them:

CI: $CXX_FLAGS → $CXXFLAGS in linux.yml; the avx512vl_256 job now builds with its intended default arch.
swizzle: 16-bit dynamic swizzle recursed infinitely (build failure); added a uint16_t base forwarding to avx2.
neq: _CMP_NEQ_OQ → _CMP_NEQ_UQ so neq(NaN, x) is true.
decr_if/incr_if: avx's self ± batch(mask.data) is wrong for k-mask bools; delegate to the select-based common path.

Covered by existing tests (test_decr_if/incr_if/neq/neq_nan, uint16 swizzle); now pass on both VL arches under SDE.

The avx_128, avx2_128 and avx512vl_256 matrix branches set CXXFLAGS="$CXX_FLAGS -DXSIMD_DEFAULT_ARCH=..." but $CXX_FLAGS is undefined at that point, so any inherited CXXFLAGS is wiped and the intended append never happens. Use $CXXFLAGS, matching the avx512vl_128 branch. This also means the avx512vl_256 job now actually compiles with its intended default arch (previously it silently did not).

Three correctness bugs on the avx512vl_128 / avx512vl_256 arches, which use k-mask batch_bool registers rather than vector masks: - 16-bit dynamic swizzle recursed infinitely. The sized-2 bridge bitwise_cast<uint16_t>(self) is a no-op for uint16_t, so it re-selected itself (gcc: 'always_inline ... inline-unit-growth limit reached', breaking the full avx512vl_256 test build). Add a uint16_t base that forwards to avx2, mirroring the existing uint8_t base. - neq(NaN, x) returned false: float/double neq used _CMP_NEQ_OQ (ordered, false on NaN). IEEE != needs _CMP_NEQ_UQ, as avx/avx512f already use. - decr_if/incr_if produced garbage (e.g. decr_if(1, all-true)=0x10000). They inherited avx's 'self +/- batch<T>(mask.data)', which assumes a vector batch_bool whose true lanes are all-ones; here mask.data is a k-mask bitfield, so the broadcast is wrong. Delegate to the select-based common implementation. Covered by existing tests (test_decr_if/test_incr_if/test_neq/ test_neq_nan in test_xsimd_api.cpp, and the uint16 swizzle in test_shuffle/test_batch_manip), which now pass on both VL arches under SDE; previously they could not even build (swizzle) or were never exercised because the VL_256 CI job lacked its default arch (see prior commit).

DiamonDinoia added 2 commits June 2, 2026 14:11

DiamonDinoia mentioned this pull request Jun 2, 2026

perf(avx512vl): native EVEX int64 shift, signed rotr, compress/expand #1358

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix avx512vl_{128,256} correctness bugs + CI default-arch typo#1357

Fix avx512vl_{128,256} correctness bugs + CI default-arch typo#1357
DiamonDinoia wants to merge 2 commits into
xtensor-stack:masterfrom
DiamonDinoia:fix/avx512vl-known-issues

DiamonDinoia commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DiamonDinoia commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant