25.3.8-fips: Make tests in CI a bit more green #1614
mkmkme wants to merge 9 commits into releases/25.3.8-fips
It requires PRQL that is disabled in FIPS.
DeltaLake is not supported in FIPS.
We replaced aes_128_gcm_siv with aes_128_gcm, but it was not reflected in the test configs. Let's fix it.
It's quite odd.
Okay, it does fail on ASan due to a memory leak. Investigating.
AWS-LC FIPS 2.0.0 allocates a per-thread `fips_service_indicator_state` struct (48 bytes + 24-byte pointer array) on the first FIPS-approved crypto operation via `service_indicator_get()` → `CRYPTO_set_thread_local()` → `OPENSSL_malloc()`. This state tracks whether each cryptographic operation used a FIPS-approved algorithm, as required by FIPS 140-3 compliance.

The memory is registered with a pthread TLS destructor (`OPENSSL_free`) that fires when the thread exits. However, in ClickHouse the crypto operations (e.g. SHA-256 for S3 request signing) run on GlobalThreadPool worker threads. These worker threads outlive the application-level thread pools (like `threadpool_writer`) because `ThreadFromGlobalPoolImpl` submits jobs to the GlobalThreadPool rather than owning threads directly. The GlobalThreadPool worker threads are only joined during static destruction of `GlobalThreadPool::the_instance`, which races with LeakSanitizer's atexit check.

This is a false positive: the memory IS freed when the worker threads eventually exit, but LSAN runs its scan before GlobalThreadPool static destruction completes.

This issue is FIPS-specific: the `FIPS_service_indicator_update_state` code path only exists in AWS-LC FIPS builds. Non-FIPS builds (OpenSSL 3.2.1) do not have service indicator tracking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AWS-LC FIPS 2.0.0 allocates a per-thread `fips_service_indicator_state` (48 + 24 bytes) on the first FIPS-approved crypto operation via `service_indicator_get()`. This state is freed by a pthread TLS destructor when the thread exits, but in ClickHouse the crypto operations (e.g. SHA-256 for S3 request signing) run on GlobalThreadPool worker threads whose lifetime extends beyond LeakSanitizer's check. This causes LSAN to report a false positive in integration tests that invoke `clickhouse disks` or other CLI tools performing S3 operations.

The fix:
- Add `tests/integration/helpers/lsan_suppressions.txt` with a suppression for `service_indicator_get` (placed in helpers/ because only `tests/integration/` is mounted into the test runner container)
- Configure `cluster.py` to set `LSAN_OPTIONS` with the suppressions file path and copy it into each instance's config directory

This issue is FIPS-specific: the `FIPS_service_indicator_update_state` code path only exists in AWS-LC FIPS builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
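As a rough illustration of the two steps in that commit, the sketch below writes a one-line suppressions file and builds an `LSAN_OPTIONS` value that points at it. It is not the actual `cluster.py` change: the helper names `write_suppressions` and `lsan_options` are hypothetical, and the exact contents of the real suppressions file are assumed to be a single `leak:` rule for `service_indicator_get`.

```python
import os

# LeakSanitizer suppression rules use the form "leak:<pattern>".
# The pattern comes from the commit message; the exact file contents are an assumption.
SUPPRESSION = "leak:service_indicator_get\n"


def write_suppressions(directory: str) -> str:
    """Write the suppressions file into `directory` and return its path (hypothetical helper)."""
    path = os.path.join(directory, "lsan_suppressions.txt")
    with open(path, "w") as f:
        f.write(SUPPRESSION)
    return path


def lsan_options(suppressions_path: str, existing: str = "") -> str:
    """Return colon-separated LSAN_OPTIONS with exactly one suppressions= entry."""
    # Drop empty parts and any stale suppressions= entry before appending ours.
    parts = [p for p in existing.split(":") if p and not p.startswith("suppressions=")]
    parts.append(f"suppressions={suppressions_path}")
    return ":".join(parts)
```

Replacing any earlier `suppressions=` entry, rather than appending a second one, keeps the resulting value unambiguous when the environment already carries sanitizer options.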
This seems to be failing due to a memory leak in |
The previous commit (bf4af17) added LSAN suppressions for integration tests via cluster.py, but stateless tests inherit LSAN_OPTIONS from the Docker image's ENV directive which did not include the suppressions file path.

The base Dockerfile already wrote the correct LSAN_OPTIONS (with suppressions path) to /etc/environment (line 38) for services, but the ENV on line 45, which is used by non-login shells including the test runner, omitted the suppressions= directive. This caused clickhouse-disks and other CLI tools to abort when LSAN detected the AWS-LC FIPS service_indicator_get allocation.

Fix both:
- docker/test/base/Dockerfile: add suppressions path to ENV LSAN_OPTIONS, matching the /etc/environment entry
- tests/config/lsan_suppressions.txt: add service_indicator_get suppression (this file is installed at /usr/share/clickhouse-test/config/lsan_suppressions.txt in the test Docker image)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The kitware apt key baked into the altinityinfra/test-util base image has expired, causing `apt-get update` to fail with:

    GPG error: https://apt.kitware.com/ubuntu jammy InRelease: NO_PUBKEY 65ADECD7A7039392

Re-fetch the key before the first apt-get update, same approach as 53bcc4f and 4bc620d for the binary-builder image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set LSAN_OPTIONS with the suppressions file path in the stateless test runner (tests/clickhouse-test) instead of modifying the Docker image. This ensures clickhouse-disks and other CLI tools invoked from stateless tests pick up the AWS-LC FIPS service_indicator_get suppression.

Also reverts the docker/test/base/Dockerfile changes from the previous two commits (LSAN_OPTIONS ENV and kitware GPG key refresh) to avoid triggering Docker image rebuilds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
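One way a test runner can make child tools inherit the suppression is to extend the environment it passes to each subprocess. The sketch below shows that idea only; it is not the code in `tests/clickhouse-test`. The helpers `child_env` and `run_tool` are hypothetical, and using the installed path from the earlier commit message as the default is an assumption.

```python
import os
import subprocess
import sys

# Installed location named in the earlier commit message; using it here is an assumption.
SUPPRESSIONS_PATH = "/usr/share/clickhouse-test/config/lsan_suppressions.txt"


def child_env(base_env, suppressions_path=SUPPRESSIONS_PATH):
    """Copy `base_env` and add suppressions= to LSAN_OPTIONS if it is not already set."""
    env = dict(base_env)
    current = env.get("LSAN_OPTIONS", "")
    if "suppressions=" not in current:
        extra = f"suppressions={suppressions_path}"
        env["LSAN_OPTIONS"] = f"{current}:{extra}" if current else extra
    return env


def run_tool(argv):
    """Run a CLI tool so that it sees the extended LSAN_OPTIONS (hypothetical helper)."""
    return subprocess.run(
        argv, env=child_env(os.environ), capture_output=True, text=True, check=False
    )


if __name__ == "__main__":
    # A child Python process stands in for a tool such as clickhouse-disks.
    result = run_tool([sys.executable, "-c", "import os; print(os.environ['LSAN_OPTIONS'])"])
    print(result.stdout.strip())
```

Setting the variable in the runner rather than in the image means the change takes effect without rebuilding Docker images, which is the motivation the commit gives.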
37f10f6 to 50ffdb2
Fix a config in one test, disable a couple of tests.