test: Add diagnostic logging to investigate intermittent electrs failures in CI by joostjager · Pull Request #849 · lightningdevkit/ldk-node

joostjager · 2026-03-26T09:14:22Z

Force Esplora chain source and run on macOS to reproduce the fee rate estimation failure caused by electrsd exposing 0.0.0.0 as the Esplora URL. On macOS, connecting to 0.0.0.0 as a destination address results in ConnectionRefused.

AI tools were used in preparing this commit.
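A minimal sketch of the workaround this hypothesis would imply, assuming the address reported by electrsd arrives as a plain string such as `0.0.0.0:3002`, with or without a scheme. The helper name `connectable_addr` is hypothetical and is not part of electrsd or ldk-node. It only rewrites the unspecified address into loopback before the test connects to it.

```rust
/// Hypothetical helper: turn an unspecified bind address (0.0.0.0) into a
/// loopback address that is valid as a connection destination.
/// Any other input is returned unchanged.
fn connectable_addr(addr: &str) -> String {
    // Split off an optional scheme such as "http://".
    let (scheme, rest) = match addr.find("://") {
        Some(idx) => addr.split_at(idx + 3),
        None => ("", addr),
    };
    match rest.strip_prefix("0.0.0.0") {
        // Only rewrite when 0.0.0.0 is the whole host, i.e. it is followed
        // by a port, a path, or nothing at all.
        Some(tail) if tail.is_empty() || tail.starts_with(':') || tail.starts_with('/') => {
            format!("{scheme}127.0.0.1{tail}")
        }
        _ => addr.to_string(),
    }
}
```

With this in place a test could build its Esplora URL from `connectable_addr(..)` instead of using the reported address directly.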

ldk-reviews-bot · 2026-03-26T09:14:24Z

👋 Hi! I see this is a draft PR.
I'll wait to assign reviewers until you mark it as ready for review.
Just convert it out of draft status when you're ready for review!

tnull

"All checks have passed" 🙃

joostjager · 2026-03-26T09:44:25Z

Indeed, hypothesis failed.

Enable electrs stderr output in CI and log connection details at startup. Log errors that were previously silently discarded: the first block_headers_subscribe failure, generate_to_address failures, and ping errors across all polling helpers. This will help diagnose intermittent CI failures where electrs appears to crash or become unreachable mid-test. AI tools were used in preparing this commit.
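To illustrate the kind of change this commit describes, here is a minimal sketch of a polling helper that logs each error instead of dropping it. The name `poll_with_logging`, its signature, and the `String` error type are assumptions for illustration. The actual helpers in the test suite are not shown on this page.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical polling helper: calls `check` until it reports success or
/// the attempt budget is used up. Errors are written to stderr rather than
/// being silently discarded, so a crashed or unreachable server leaves a trace.
fn poll_with_logging<F>(label: &str, max_attempts: u32, delay: Duration, mut check: F) -> bool
where
    F: FnMut() -> Result<bool, String>,
{
    for attempt in 1..=max_attempts {
        match check() {
            Ok(true) => return true,
            // Condition not met yet: keep polling without logging.
            Ok(false) => {}
            // Previously the equivalent of `let _ = result;` would hide this.
            Err(e) => eprintln!("{label}: attempt {attempt}/{max_attempts} failed: {e}"),
        }
        sleep(delay);
    }
    false
}
```

Running the tests with `--nocapture`, as the stress-test job does, would make these stderr lines visible in the CI log.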

Run integration tests 10 times in a loop with --nocapture to maximize the chance of hitting the intermittent electrs crash and to capture the new diagnostic logging output. AI tools were used in preparing this commit.

Increase resource pressure to reproduce intermittent electrs failures. Each of the 3 shards runs 5 iterations with 3 concurrent cargo test processes, for 45 total test runs with up to 9 simultaneous processes. AI tools were used in preparing this commit.

Read LDK_NODE_TEST_BASE_PORT env var to offset the listening port range, avoiding collisions when multiple test processes run simultaneously. Assign base ports 20000, 21000, 22000 to the three concurrent processes in the stress-test CI job. AI tools were used in preparing this commit.
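A minimal sketch of reading the base port from the environment. The variable name `LDK_NODE_TEST_BASE_PORT` comes from the commit message. The fallback value of 20000 and both function names are assumptions for illustration.

```rust
/// Assumed fallback when the variable is unset or unparsable.
const DEFAULT_BASE_PORT: u16 = 20000;

/// Parse an optional raw value into a base port, falling back to the default.
/// Kept separate from the environment lookup so it can be tested directly.
fn base_port_from(value: Option<&str>) -> u16 {
    value
        .and_then(|v| v.trim().parse::<u16>().ok())
        .unwrap_or(DEFAULT_BASE_PORT)
}

/// Read the base port from LDK_NODE_TEST_BASE_PORT.
fn base_port_from_env() -> u16 {
    base_port_from(std::env::var("LDK_NODE_TEST_BASE_PORT").ok().as_deref())
}
```

Each concurrent process would then be launched with its own value, for example 20000, 21000, and 22000 as in the commit message.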

Ports from previous iterations may still be in TIME_WAIT when the next iteration starts. Offset the base port by both iteration and process index to ensure no overlap. AI tools were used in preparing this commit.
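The offset described here can be sketched as a small function. The stride of 1000 ports per slot used in the usage note is an assumption. The commit message only states that both the iteration and the process index feed into the base port. The function name and parameters are hypothetical.

```rust
/// Hypothetical base-port formula: every (iteration, process) pair gets its
/// own slot, and each slot owns `stride` consecutive ports.
/// Returns None if the process index is out of range or the result would
/// not fit into a u16 port number.
fn iteration_base_port(
    base: u16,
    iteration: u16,
    process: u16,
    processes: u16,
    stride: u16,
) -> Option<u16> {
    if process >= processes {
        return None;
    }
    // Slot index grows with every iteration, so ports left in TIME_WAIT by
    // an earlier iteration are never handed out again.
    let slot = iteration.checked_mul(processes)?.checked_add(process)?;
    base.checked_add(slot.checked_mul(stride)?)
}
```

For example, with a base of 20000, three processes, and a stride of 1000, iteration 1 and process 2 map to slot 5 and therefore to port 25000.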

Check whether the kernel OOM killer is responsible for electrs silently disappearing during tests. Dump relevant dmesg output after any test failure on ubuntu runners. AI tools were used in preparing this commit.

…ation Revert the deterministic port allocation approach from PR lightningdevkit#847 and instead use random ports with a retry loop around node.start(). This avoids collisions with ports allocated by electrsd/corepc_node via get_available_port(), which use the OS ephemeral port allocator and can land in any range. On InvalidSocketAddress, new random ports are selected and the node is rebuilt, up to 5 attempts. AI tools were used in preparing this commit.

When node.start() fails with InvalidSocketAddress and we retry with new random ports, also generate a fresh storage directory. Reusing the same directory causes the second build to fail with ReadFailed/Namespace not found since the first build already wrote data there. AI tools were used in preparing this commit.
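The retry behaviour from the two commits above can be sketched as follows. `StartError` and `Attempt` are stand-ins defined here for illustration and are not ldk-node types. The directory naming scheme is likewise an assumption. The point of the sketch is that every attempt gets both a new port and a new storage directory, and that only the address error triggers a retry.

```rust
/// Stand-in for the node's error type (assumption, not the ldk-node enum).
#[derive(Debug, PartialEq)]
enum StartError {
    InvalidSocketAddress,
    Other(String),
}

/// Parameters chosen for one start attempt.
#[derive(Debug)]
struct Attempt {
    port: u16,
    storage_dir: String,
}

/// Try to start up to `max_attempts` times. `pick_port` supplies a new port
/// and `start` builds and starts the node for the given attempt.
fn start_with_retry<P, S>(
    max_attempts: u32,
    mut pick_port: P,
    mut start: S,
) -> Result<Attempt, StartError>
where
    P: FnMut() -> u16,
    S: FnMut(&Attempt) -> Result<(), StartError>,
{
    let mut last_err = StartError::InvalidSocketAddress;
    for attempt in 0..max_attempts {
        // Fresh port AND fresh directory: reusing the directory of a failed
        // build is what the second commit above identifies as the problem.
        let candidate = Attempt {
            port: pick_port(),
            storage_dir: format!("ldk_node_test_{attempt}"),
        };
        match start(&candidate) {
            Ok(()) => return Ok(candidate),
            Err(StartError::InvalidSocketAddress) => {
                eprintln!("attempt {attempt}: port {} unusable, retrying", candidate.port);
                last_err = StartError::InvalidSocketAddress;
            }
            // Any other failure is not a port collision, so do not retry.
            Err(other) => return Err(other),
        }
    }
    Err(last_err)
}
```

A caller would pass 5 as `max_attempts` to match the limit stated in the commit message.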

Run lsof to identify what is using the port when node.start() fails with a binding error. This helps distinguish between collisions with electrsd/bitcoind, other test processes, or TIME_WAIT leftovers. AI tools were used in preparing this commit.

Read node_b's listening addresses from the node after setup instead of using the pre-retry variable, which may differ if setup_node retried with new ports. Simplify the stress test to run 1 process per shard with 10 iterations instead of 3 concurrent processes. The concurrent processes caused port collisions in code paths outside setup_node that don't have retry logic, which is noise unrelated to the electrs crash we're investigating. AI tools were used in preparing this commit.

Avoid intra-process port collisions between parallel tests by using an atomic counter that increments by 2 for each allocation. The base port is randomized once per process to reduce inter-process collisions. This eliminates the birthday-paradox collisions that occurred when every call independently picked a random port from the range. AI tools were used in preparing this commit.

fetch_update returns the previous value, so the first caller got port 0 instead of the random base. Use compare_exchange for one-time init followed by fetch_add, which correctly returns the base port to the first caller. AI tools were used in preparing this commit.
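The difference can be shown in a few lines. The buggy variant below is a reconstruction under the assumption that the counter started at 0 and was initialised inside the `fetch_update` closure. The original code is not shown on this page. Both function names are hypothetical.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

/// Reconstructed buggy variant: `fetch_update` stores the new value but
/// returns the PREVIOUS one, so the very first caller receives 0.
fn next_port_buggy(counter: &AtomicU16, random_base: u16) -> u16 {
    counter
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |cur| {
            Some(if cur == 0 { random_base + 2 } else { cur + 2 })
        })
        // The closure always returns Some, so this is always Ok(previous).
        .unwrap()
}

/// Fixed variant: one-time init via compare_exchange, then fetch_add.
fn next_port(counter: &AtomicU16, random_base: u16) -> u16 {
    // Only the first caller swaps 0 for the base. For everyone else the
    // exchange fails harmlessly because the counter is already initialised.
    let _ = counter.compare_exchange(0, random_base, Ordering::SeqCst, Ordering::SeqCst);
    // fetch_add also returns the previous value, which is now the base for
    // the first caller and base + 2, base + 4, ... afterwards.
    counter.fetch_add(2, Ordering::SeqCst)
}
```

Incrementing by 2 reserves a pair of ports per allocation, matching the atomic counter described in the previous commit.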

Restrict the random base port to 10000-30000, which is below the Linux ephemeral port range (32768-60999). This prevents collisions with OS-assigned ports used by electrsd and bitcoind. AI tools were used in preparing this commit.
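A minimal sketch of picking the base port inside that range using only the standard library. Using `RandomState` as the entropy source is an assumption made to keep the example dependency-free. The test suite may well use a dedicated random number crate instead. The function and constant names are hypothetical.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Inclusive lower and exclusive upper bound for the base port, chosen to
/// stay below the Linux ephemeral range that starts at 32768.
const BASE_MIN: u16 = 10_000;
const BASE_MAX: u16 = 30_000;

/// Map an arbitrary 64-bit seed into [BASE_MIN, BASE_MAX).
fn map_to_base_port(seed: u64) -> u16 {
    let span = u64::from(BASE_MAX - BASE_MIN);
    BASE_MIN + (seed % span) as u16
}

/// Pick a per-process base port. RandomState is randomly keyed by the
/// standard library, so hashing nothing still yields a varying value.
fn random_base_port() -> u16 {
    let seed = RandomState::new().build_hasher().finish();
    map_to_base_port(seed)
}
```

Note that the counter still grows by 2 per allocation from this base, so a process that allocates a very large number of ports could in principle climb into the ephemeral range.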

joostjager force-pushed the repro-fee-estimation branch from 443aa30 to 48a1bb2 · March 26, 2026 09:17

tnull reviewed Mar 26, 2026

joostjager closed this Mar 26, 2026

joostjager reopened this Mar 26, 2026

joostjager force-pushed the repro-fee-estimation branch from 48a1bb2 to 33927c3 · March 26, 2026 14:39

joostjager changed the title ~~ci: Reproduce Esplora 0.0.0.0 connection failure on macOS~~ test: Add diagnostic logging to investigate intermittent electrs failures in CI Mar 26, 2026

joostjager added 12 commits March 26, 2026 16:13

ci: Add stress-test job to reproduce intermittent electrs failures

163569b

ci: Use unique base ports per iteration to avoid TIME_WAIT collisions

fc11c64

ci: Dump dmesg OOM/kill messages on test failure

1bd1672

joostjager wants to merge 13 commits into lightningdevkit:main from joostjager:repro-fee-estimation

3 participants
