Align grouped first_value/last_value FILTER handling with shared Some(true) semantics#22681
Open
kosiew wants to merge 4 commits into
Open
Align grouped first_value/last_value FILTER handling with shared Some(true) semantics#22681kosiew wants to merge 4 commits into
first_value/last_value FILTER handling with shared Some(true) semantics#22681kosiew wants to merge 4 commits into
Conversation
…on functionality - Updated `filter_to_validity` to public in `groups_accumulator/nulls.rs`. - Modified `first_last.rs` to use shared `filter_to_validity` for grouped first/last. - Enhanced handling to reject NULL filter rows even if the value bit is true. - Added tests for: - First update path - Last update path - First merge path
- Destructured group_indices loop by removing redundant variable assignment. - Added test helper: new_int64_first_last_group_acc(...) for improved test setup. - Added test helper: nullable_bool_filter(...) to facilitate testing with nullable booleans. - Replaced repeated test setup across tests with the newly created helpers for better maintainability.
- Changed valid to validity in nullable_bool_filter. - Added assert_group_acc_int64_result function. - Replaced repeated evaluate/downcast/assert blocks in three tests for better maintainability.
|
The same bug in #22666. Looks like this PR covers that one too. Might be worth a sqllogictest on top since the repro was SQL. Here's mine if you want it (without the fix, g=1 returns 10 instead of NULL): |
Contributor
Author
|
@Nagato-Yuzuru |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
Grouped
first_value/last_valueused a local filter check based onBooleanArray::value(idx). This reads only the value bit and does not enforce aggregateFILTERsemantics, where a row should pass only when the predicate isSome(true).DataFusion already has a shared helper,
filter_to_validity, that converts a nullable boolean filter into a validity bitmap implementing the correct semantics (Some(true)passes;Some(false)andNonedo not). Other grouped aggregate helper paths already use this helper.This change aligns grouped
first_value/last_valuewith the existing shared implementation and avoids semantic drift in nullableFILTERhandling.What changes are included in this PR?
Make
filter_to_validitypublicly accessible fromdatafusion-functions-aggregate-common.Update
FirstLastGroupsAccumulator::get_filtered_extreme_of_each_groupto:filter_to_validityFILTERpass/fail decisions from the resulting validity bitmap rather than directly readingBooleanArray::value(idx)Minor cleanup in the grouped accumulator loop by destructuring
group_idxdirectly.Add targeted grouped accumulator regression tests covering nullable filters where the underlying value bit is
truebut the filter validity isNULL:first_valuelast_valuemerge_batchAdd SQLLogicTest coverage for grouped
first_valueandlast_valuewith nullableFILTERpredicates, including:NULLTRUEpredicates participate in the aggregateAre these changes tested?
Yes.
Added Rust unit tests in
datafusion/functions-aggregate/src/first_last.rs:test_first_group_acc_rejects_null_filter_with_true_value_bittest_last_group_acc_rejects_null_filter_with_true_value_bittest_first_group_acc_merge_rejects_null_filter_with_true_value_bitAdded SQLLogicTest coverage in:
datafusion/sqllogictest/test_files/aggregate.sltValidation performed:
Are there any user-facing changes?
Yes.
This fixes aggregate
FILTERbehavior for groupedfirst_valueandlast_valuewhen nullable filter predicates are involved. Rows with aNULLfilter predicate are now consistently treated as filtered out, matching aggregateFILTERsemantics and the behavior already implemented by the shared helper.No SQL syntax or public user-facing APIs are changed. The only API change is that
filter_to_validityis made public within the aggregate-common crate to enable reuse by grouped aggregate implementations.LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.