[Enhancement](mmhash) Support mmhash3_u64_v2#61846
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 27009 ms |
TPC-DS: Total hot run time: 168322 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
There was a problem hiding this comment.
Pull request overview
Adds a new hash scalar function murmur_hash3_u64_v2 (unsigned 64-bit MurmurHash3 v2 exposed as LARGEINT) across FE/Nereids and BE, plus regression coverage to validate results and null handling.
Changes:
- Register
murmur_hash3_u64_v2in FE (builtin function list + Nereids expression/visitor support). - Implement and register the BE function variant returning
Int128to represent unsigned 64-bit results. - Extend regression tests and golden outputs for the new function (including table + multi-arg cases).
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| regression-test/suites/query_p0/sql_functions/hash_functions/test_hash_function.groovy | Adds query tests + validation helper and table coverage for murmur_hash3_u64_v2. |
| regression-test/data/query_p0/sql_functions/hash_functions/test_hash_function.out | Adds expected outputs for the new test cases. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/visitor/ScalarFunctionVisitor.java | Adds visitor hook for the new scalar function. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/MurmurHash3U64V2.java | Introduces Nereids scalar function definition/signatures for murmur_hash3_u64_v2. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/BuiltinScalarFunctions.java | Registers the new function as a builtin scalar. |
| be/src/exprs/function/function_hash.cpp | Adds BE implementation/registration for murmur_hash3_u64_v2 using Int128 output. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
regression-test/suites/query_p0/sql_functions/hash_functions/test_hash_function.groovy
Show resolved
Hide resolved
regression-test/suites/query_p0/sql_functions/hash_functions/test_hash_function.groovy
Show resolved
Hide resolved
There was a problem hiding this comment.
Code Review Summary for Enhancement Support mmhash3_u64_v2
This PR adds a new murmur_hash3_u64_v2 function that returns the unsigned 64-bit murmur hash value as a LARGEINT type. The approach reuses the existing MurmurHash3Impl template with TYPE_LARGEINT as the return type, storing the uint64_t hash result in a __int128_t column to preserve unsigned semantics.
Critical Checkpoint Conclusions
-
Goal / Does the code accomplish it? Yes. The function correctly returns the unsigned interpretation of the murmur_hash3_64_v2 hash value by storing
uint64_tinLARGEINT(__int128_t). Tests verifymurmur_hash3_u64_v2(x) == CAST(murmur_hash3_64_v2(x) AS LARGEINT) & 0xFFFFFFFFFFFFFFFF. -
Is the modification minimal and focused? Yes. The change is small and focused: 5 lines in BE, a new FE function class following the exact pattern of
MurmurHash364V2, and comprehensive tests. -
Concurrency? Not applicable. No concurrent data structures involved.
-
Lifecycle / static init? Not applicable.
-
Configuration items? None added.
-
Incompatible changes / rolling upgrades? No incompatible changes. This is purely additive (new function). Mixed-version clusters will simply not recognize
murmur_hash3_u64_v2on older nodes, which is expected behavior for new functions. -
Parallel code paths? The
static_cast<uint64_t>change to the seed parameter affects the sharedexecute()code path for all non-TYPE_INTinstantiations. For the existingTYPE_BIGINTcase,static_cast<uint64_t>(int64_t)is identical to the previous implicit conversion — no behavioral change. ForTYPE_LARGEINT, the cast correctly truncates to the lower 64 bits (upper bits are always zero since values come frommurmur_hash3_64()widened to__int128_t). -
Special conditional checks? The
get_name()template usesif constexprwith correct ordering:TYPE_INT→TYPE_LARGEINT→is_mmh64_v2→ default. All four instantiations resolve correctly. -
FE-BE type consistency? FE returns
LargeIntType, BE usesDataTypeInt128/ColumnVector<TYPE_LARGEINT>— these are consistent. -
Test coverage? Good coverage: null handling, empty string, single-arg, multi-arg (2 and 3 args), table-based tests, constant folding tests, and cross-validation against the masked
murmur_hash3_64_v2result. Expected output values verified correct (e.g.,2^64 + (-2648103510258542450) = 15798640563451009166). -
Observability? Not needed for a pure scalar function.
-
Transaction / persistence? Not applicable.
-
FE-BE variable passing? Not applicable.
-
Performance? Using
LARGEINT(__int128_t) doubles memory per hash value (16 bytes vs 8 bytes needed). This is a design trade-off since Doris lacks an unsigned 64-bit type, and usingLARGEINTis the standard approach. No performance anti-patterns. -
Other issues? One minor test standards issue: the test drops the table at the end of the test (line 95 of groovy), which goes against the coding standard of preserving tables for debugging. See inline comment.
Overall
The implementation is correct and well-tested. One minor test standards issue noted.
regression-test/suites/query_p0/sql_functions/hash_functions/test_hash_function.groovy
Show resolved
Hide resolved
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
doc : apache/doris-website#3499
Support function
murmur_hash3_u64_v2(str)Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)