Skip to content

fix: deflake //rs/tests/consensus/tecdsa:tecdsa_key_rotation_test#9674

Closed
basvandijk wants to merge 1 commit intomasterfrom
ai/deflake-tecdsa_key_rotation_test-2026-03-30
Closed

fix: deflake //rs/tests/consensus/tecdsa:tecdsa_key_rotation_test#9674
basvandijk wants to merge 1 commit intomasterfrom
ai/deflake-tecdsa_key_rotation_test-2026-03-30

Conversation

@basvandijk
Copy link
Copy Markdown
Collaborator

@basvandijk basvandijk commented Mar 30, 2026

Problem

The test //rs/tests/consensus/tecdsa:tecdsa_key_rotation_test was flaky, timing out at 600s. The stash was stuck at 5 instead of draining to 0.

Root Cause

The test set max_parallel_pre_signature_transcripts_in_creation=0 to stop pre-signature creation, then expected the stash to drain to 0 after key rotation. However, this created a race condition:

  1. If a key rotation occurred before the config change propagated to all nodes, new pre-signatures with the rotated key transcript could be created and fill the stash.
  2. With max_parallel=0, no new pre-signatures would ever be delivered.
  3. The execution layer purges stale pre-signatures (Mechanism A) only when it receives a delivery with a different key transcript ID. Since no deliveries happened, the purge never fired.
  4. The stash remained at 5 indefinitely, causing the timeout.

Fix

  • Set max_parallel_pre_signature_transcripts_in_creation=1 (instead of 0) to keep creation active at a low rate
  • Set pre_signatures_to_create_in_advance=1 (instead of 5) to reduce the target stash size
  • Await a final stash size of 1 (instead of 0) after rotation

This ensures that after key rotation, new pre-signatures with the rotated key transcript are created and delivered, triggering the purge of stale entries. The test still validates that key rotation works correctly and the stash stabilizes at the configured target.

Validation

Both tecdsa_key_rotation_test and tecdsa_key_rotation_test_head_nns pass locally.

The test was flaky because setting max_parallel_pre_signature_transcripts_in_creation
to 0 could create a race condition: if a key rotation occurred before the config
change propagated, the execution layer's stash would contain pre-signatures with
the new key transcript, but no new pre-signatures would ever be delivered (since
creation was disabled). This prevented the stale-key-transcript purge mechanism
from firing, leaving the stash stuck at 5 instead of draining to 0.

Fix by setting max_parallel to 1 and pre_signatures_to_create_in_advance to 1.
This keeps creation active so new pre-signatures with the rotated key transcript
are delivered, triggering the purge of stale entries. The test now awaits a stash
size of 1 (matching the new target) instead of 0.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Deflakes the //rs/tests/consensus/tecdsa:tecdsa_key_rotation_test by avoiding a configuration state that can prevent post-rotation pre-signature deliveries (and therefore prevent stale-stash purging), which previously could cause the test to hang until timeout.

Changes:

  • Adjusts chain-key pre-signature config during the test to keep creation enabled at a low rate (max_parallel=1) and reduce the target stash size (pre_signatures_to_create_in_advance=1).
  • Updates the post-rotation assertion to expect a stash size of 1 instead of 0.
  • Updates inline comments to explain the rationale behind the new configuration.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -56,13 +56,15 @@ fn test(test_env: TestEnv) {
.await;
// Stash size should be 5 before the roation
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "roation" should be "rotation" (also consider updating the comment to reflect the current test intent, since the stash size is later reduced).

Suggested change
// Stash size should be 5 before the roation
// Stash size should initially be 5 before key rotation

Copilot uses AI. Check for mistakes.
Comment on lines 62 to +66
set_pre_signature_stash_size(
&governance,
app_subnet.subnet_id,
key_ids.as_slice(),
/* max_parallel_pre_signatures */ 0,
/* max_stash_size */ 5,
/* max_parallel_pre_signatures */ 1,
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After updating the chain key config via set_pre_signature_stash_size, the test proceeds without waiting for the new target stash size/config to take effect. This can make the test more timing-dependent (config propagation vs. key rotation). Consider awaiting the stash size to reach 1 immediately after the proposal (similar to pre_signature_stash_management_test.rs) before continuing with the rotation assertions.

Copilot uses AI. Check for mistakes.
@basvandijk basvandijk closed this Mar 31, 2026
@basvandijk basvandijk deleted the ai/deflake-tecdsa_key_rotation_test-2026-03-30 branch March 31, 2026 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants