Fix avx512vl_{128,256} correctness bugs + CI default-arch typo#1357
Open
DiamonDinoia wants to merge 2 commits into
Open
Fix avx512vl_{128,256} correctness bugs + CI default-arch typo#1357DiamonDinoia wants to merge 2 commits into
DiamonDinoia wants to merge 2 commits into
Conversation
The avx_128, avx2_128 and avx512vl_256 matrix branches set CXXFLAGS="$CXX_FLAGS -DXSIMD_DEFAULT_ARCH=..." but $CXX_FLAGS is undefined at that point, so any inherited CXXFLAGS is wiped and the intended append never happens. Use $CXXFLAGS, matching the avx512vl_128 branch. This also means the avx512vl_256 job now actually compiles with its intended default arch (previously it silently did not).
Three correctness bugs on the avx512vl_128 / avx512vl_256 arches, which use k-mask batch_bool registers rather than vector masks: - 16-bit dynamic swizzle recursed infinitely. The sized-2 bridge bitwise_cast<uint16_t>(self) is a no-op for uint16_t, so it re-selected itself (gcc: 'always_inline ... inline-unit-growth limit reached', breaking the full avx512vl_256 test build). Add a uint16_t base that forwards to avx2, mirroring the existing uint8_t base. - neq(NaN, x) returned false: float/double neq used _CMP_NEQ_OQ (ordered, false on NaN). IEEE != needs _CMP_NEQ_UQ, as avx/avx512f already use. - decr_if/incr_if produced garbage (e.g. decr_if(1, all-true)=0x10000). They inherited avx's 'self +/- batch<T>(mask.data)', which assumes a vector batch_bool whose true lanes are all-ones; here mask.data is a k-mask bitfield, so the broadcast is wrong. Delegate to the select-based common implementation. Covered by existing tests (test_decr_if/test_incr_if/test_neq/ test_neq_nan in test_xsimd_api.cpp, and the uint16 swizzle in test_shuffle/test_batch_manip), which now pass on both VL arches under SDE; previously they could not even build (swizzle) or were never exercised because the VL_256 CI job lacked its default arch (see prior commit).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes bugs on the k-mask
avx512vl_{128,256}arches, plus the CI typo hiding them:$CXX_FLAGS→$CXXFLAGSinlinux.yml; theavx512vl_256job now builds with its intended default arch.swizzlerecursed infinitely (build failure); added auint16_tbase forwarding to avx2._CMP_NEQ_OQ→_CMP_NEQ_UQsoneq(NaN, x)is true.self ± batch(mask.data)is wrong for k-mask bools; delegate to the select-basedcommonpath.Covered by existing tests (
test_decr_if/incr_if/neq/neq_nan, uint16swizzle); now pass on both VL arches under SDE.