Skip to content

Fix avx512vl_{128,256} correctness bugs + CI default-arch typo#1357

Open
DiamonDinoia wants to merge 2 commits into
xtensor-stack:masterfrom
DiamonDinoia:fix/avx512vl-known-issues
Open

Fix avx512vl_{128,256} correctness bugs + CI default-arch typo#1357
DiamonDinoia wants to merge 2 commits into
xtensor-stack:masterfrom
DiamonDinoia:fix/avx512vl-known-issues

Conversation

@DiamonDinoia
Copy link
Copy Markdown
Contributor

Fixes bugs on the k-mask avx512vl_{128,256} arches, plus the CI typo hiding them:

  • CI: $CXX_FLAGS$CXXFLAGS in linux.yml; the avx512vl_256 job now builds with its intended default arch.
  • swizzle: 16-bit dynamic swizzle recursed infinitely (build failure); added a uint16_t base forwarding to avx2.
  • neq: _CMP_NEQ_OQ_CMP_NEQ_UQ so neq(NaN, x) is true.
  • decr_if/incr_if: avx's self ± batch(mask.data) is wrong for k-mask bools; delegate to the select-based common path.

Covered by existing tests (test_decr_if/incr_if/neq/neq_nan, uint16 swizzle); now pass on both VL arches under SDE.

The avx_128, avx2_128 and avx512vl_256 matrix branches set
CXXFLAGS="$CXX_FLAGS -DXSIMD_DEFAULT_ARCH=..." but $CXX_FLAGS is
undefined at that point, so any inherited CXXFLAGS is wiped and the
intended append never happens. Use $CXXFLAGS, matching the avx512vl_128
branch. This also means the avx512vl_256 job now actually compiles with
its intended default arch (previously it silently did not).
Three correctness bugs on the avx512vl_128 / avx512vl_256 arches, which
use k-mask batch_bool registers rather than vector masks:

- 16-bit dynamic swizzle recursed infinitely. The sized-2 bridge
  bitwise_cast<uint16_t>(self) is a no-op for uint16_t, so it re-selected
  itself (gcc: 'always_inline ... inline-unit-growth limit reached',
  breaking the full avx512vl_256 test build). Add a uint16_t base that
  forwards to avx2, mirroring the existing uint8_t base.

- neq(NaN, x) returned false: float/double neq used _CMP_NEQ_OQ (ordered,
  false on NaN). IEEE != needs _CMP_NEQ_UQ, as avx/avx512f already use.

- decr_if/incr_if produced garbage (e.g. decr_if(1, all-true)=0x10000).
  They inherited avx's 'self +/- batch<T>(mask.data)', which assumes a
  vector batch_bool whose true lanes are all-ones; here mask.data is a
  k-mask bitfield, so the broadcast is wrong. Delegate to the
  select-based common implementation.

Covered by existing tests (test_decr_if/test_incr_if/test_neq/
test_neq_nan in test_xsimd_api.cpp, and the uint16 swizzle in
test_shuffle/test_batch_manip), which now pass on both VL arches under
SDE; previously they could not even build (swizzle) or were never
exercised because the VL_256 CI job lacked its default arch (see prior
commit).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant