Skip to content

HIVE-29601: ACID: Cleaner finds base directories valid with writeId above the cleaner highWaterMark#6470

Merged
kuczoram merged 2 commits into
apache:masterfrom
kuczoram:HIVE-29601
May 12, 2026
Merged

HIVE-29601: ACID: Cleaner finds base directories valid with writeId above the cleaner highWaterMark#6470
kuczoram merged 2 commits into
apache:masterfrom
kuczoram:HIVE-29601

Conversation

@kuczoram
Copy link
Copy Markdown
Contributor

@kuczoram kuczoram commented May 8, 2026

What changes were proposed in this pull request?

Fix the base directory validation in AcidUtils.isValidBase by removing the shortcut to accept base directories with writeId below the minOpenWriteId without checking against the cleaner's highWaterMark.

Why are the changes needed?

It can happen that the minOpenWriteId is set and greater than the highWaterMark. Without checking against the highWaterMark, base directories above the highWaterMark will be considered valid. Because of this the cleaner will delete base directories above its highWaterMark causing the following cleaner runs fail due to missing base directories. In special cases, the cleaner can pick a base directory belongs to an aborted insert overwrite as latest valid base. This will lead to data loss.

Does this PR introduce any user-facing change?

No

How was this patch tested?

The scenarios are covered in TestCleaner unit test. Also added unit test for AcidUtils.isValidBase

*/
@Override
public boolean isWriteIdAborted(long writeId) {
if (writeId > highWatermark) {
Copy link
Copy Markdown
Member

@deniskuzZ deniskuzZ May 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this another bug? how about isWriteIdRangeAborted, do we need similar check?

Copy link
Copy Markdown
Contributor Author

@kuczoram kuczoram May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fortunately isWriteIdRangeAborted already contains check against the highWaterMark

if (highWatermark < minWriteId) {
  return RangeResponse.NONE;
}

In isWriteIdAborted I think it is indeed another bug.

Copy link
Copy Markdown
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1

@sonarqubecloud
Copy link
Copy Markdown

@kuczoram kuczoram merged commit 3d96d70 into apache:master May 12, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants