Skip to content

25.3.8-fips: Make tests in CI a bit more green#1614

Open
mkmkme wants to merge 9 commits intoreleases/25.3.8-fipsfrom
mkmkme/fips/disable-broken-tests
Open

25.3.8-fips: Make tests in CI a bit more green#1614
mkmkme wants to merge 9 commits intoreleases/25.3.8-fipsfrom
mkmkme/fips/disable-broken-tests

Conversation

@mkmkme
Copy link
Copy Markdown
Collaborator

@mkmkme mkmkme commented Apr 2, 2026

Fix a config in one test, disable a couple of tests.

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

...

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

It requires PRQL that is disabled in FIPS.
@mkmkme mkmkme added fips Work related to Altinity FIPS releases fips-25.3 fips-25.3.8.30001 labels Apr 2, 2026
@mkmkme mkmkme changed the title Disable broken tests 25.3.8-fips: Disable broken tests Apr 3, 2026
mkmkme added 3 commits April 3, 2026 10:27
DeltaLake is not supported in FIPS.
We replaced aes_128_gcm_siv with aes_128_gcm, but it was not reflected
in the test configs. Let's fix it.
@mkmkme mkmkme marked this pull request as ready for review April 3, 2026 08:28
@mkmkme mkmkme changed the title 25.3.8-fips: Disable broken tests 25.3.8-fips: Make tests in CI a bit greener Apr 3, 2026
@mkmkme mkmkme changed the title 25.3.8-fips: Make tests in CI a bit greener 25.3.8-fips: Make tests in CI a bit more green Apr 3, 2026
@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Apr 3, 2026

It's quite odd test_disks_app_func/test.py::test_disks_app_func_rm_shared_recursive is failing in the CI semi-consistently. It's passing locally.

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Apr 3, 2026

03170_ecs_crash is something odd. Never seen it failing before. And these changes didn't touch it at all.

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Apr 3, 2026

test_dictionaries_all_layouts_separate_sources/test_mongo_uri.py::test_simple_ssl[flat-True] is an interesting failing test. It also fails on non-fips build from branch releases/25.3.8. So it's unrelated.

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Apr 3, 2026

test_disks_app_other_disk_types/test.py::test_disks_app_plain_rewritable also can't be reproduced locally, but for some reason is reproduced in CI

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Apr 3, 2026

test_disks_app_other_disk_types/test.py::test_disks_app_plain_rewritable also can't be reproduced locally, but for some reason is reproduced in CI

Okay it does fail on asan due to memory leak. Investigating

mkmkme and others added 2 commits April 4, 2026 10:41
AWS-LC FIPS 2.0.0 allocates a per-thread `fips_service_indicator_state`
struct (48 bytes + 24-byte pointer array) on the first FIPS-approved
crypto operation via `service_indicator_get()` →
`CRYPTO_set_thread_local()` → `OPENSSL_malloc()`. This state tracks
whether each cryptographic operation used a FIPS-approved algorithm, as
required by FIPS 140-3 compliance.

The memory is registered with a pthread TLS destructor (`OPENSSL_free`)
that fires when the thread exits. However, in ClickHouse the crypto
operations (e.g. SHA-256 for S3 request signing) run on GlobalThreadPool
worker threads. These worker threads outlive the application-level thread
pools (like `threadpool_writer`) because `ThreadFromGlobalPoolImpl`
submits jobs to the GlobalThreadPool rather than owning threads directly.
The GlobalThreadPool worker threads are only joined during static
destruction of `GlobalThreadPool::the_instance`, which races with
LeakSanitizer's atexit check.

This is a false positive: the memory IS freed when the worker threads
eventually exit, but LSAN runs its scan before GlobalThreadPool static
destruction completes.

This issue is FIPS-specific: the `FIPS_service_indicator_update_state`
code path only exists in AWS-LC FIPS builds. Non-FIPS builds (OpenSSL
3.2.1) do not have service indicator tracking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AWS-LC FIPS 2.0.0 allocates a per-thread `fips_service_indicator_state`
(48 + 24 bytes) on the first FIPS-approved crypto operation via
`service_indicator_get()`. This state is freed by a pthread TLS
destructor when the thread exits, but in ClickHouse the crypto
operations (e.g. SHA-256 for S3 request signing) run on GlobalThreadPool
worker threads whose lifetime extends beyond LeakSanitizer's check.

This causes LSAN to report a false positive in integration tests that
invoke `clickhouse disks` or other CLI tools performing S3 operations.

The fix:
- Add `tests/integration/helpers/lsan_suppressions.txt` with a
  suppression for `service_indicator_get` (placed in helpers/ because
  only `tests/integration/` is mounted into the test runner container)
- Configure `cluster.py` to set `LSAN_OPTIONS` with the suppressions
  file path and copy it into each instance's config directory

This issue is FIPS-specific: the `FIPS_service_indicator_update_state`
code path only exists in AWS-LC FIPS builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Apr 4, 2026

[gw8] [ 71%] PASSED test_dictionaries_all_layouts_separate_sources/test_mongo_uri.py::test_simple_ssl[flat-True] 
[gw8] [ 71%] ERROR test_dictionaries_all_layouts_separate_sources/test_mongo_uri.py::test_simple_ssl[flat-True] 

This seems to be failing due to a memory leak in BIO_do_handshake that is being fixed in #1616

mkmkme and others added 3 commits April 4, 2026 13:52
The previous commit (bf4af17) added LSAN suppressions for
integration tests via cluster.py, but stateless tests inherit
LSAN_OPTIONS from the Docker image's ENV directive which did not
include the suppressions file path.

The base Dockerfile already wrote the correct LSAN_OPTIONS (with
suppressions path) to /etc/environment (line 38) for services, but
the ENV on line 45 — used by non-login shells including the test
runner — omitted the suppressions= directive. This caused
clickhouse-disks and other CLI tools to abort when LSAN detected
the AWS-LC FIPS service_indicator_get allocation.

Fix both:
- docker/test/base/Dockerfile: add suppressions path to ENV
  LSAN_OPTIONS, matching the /etc/environment entry
- tests/config/lsan_suppressions.txt: add service_indicator_get
  suppression (this file is installed at
  /usr/share/clickhouse-test/config/lsan_suppressions.txt in the
  test Docker image)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The kitware apt key baked into the altinityinfra/test-util base image
has expired, causing `apt-get update` to fail with:

  GPG error: https://apt.kitware.com/ubuntu jammy InRelease:
  NO_PUBKEY 65ADECD7A7039392

Re-fetch the key before the first apt-get update, same approach as
53bcc4f and 4bc620d for the binary-builder image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set LSAN_OPTIONS with the suppressions file path in the stateless test
runner (tests/clickhouse-test) instead of modifying the Docker image.
This ensures clickhouse-disks and other CLI tools invoked from stateless
tests pick up the AWS-LC FIPS service_indicator_get suppression.

Also reverts the docker/test/base/Dockerfile changes from the previous
two commits (LSAN_OPTIONS ENV and kitware GPG key refresh) to avoid
triggering Docker image rebuilds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mkmkme mkmkme force-pushed the mkmkme/fips/disable-broken-tests branch from 37f10f6 to 50ffdb2 Compare April 4, 2026 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fips Work related to Altinity FIPS releases fips-25.3 fips-25.3.8.30001

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants