
fix: pre-set ECR Lambda pull policy to prevent concurrent SetRepositoryPolicy race condition (#8190) #8945

Open
abhishektang wants to merge 3 commits into aws:develop from abhishektang:feat/resolve-image-repos-package

@abhishektang
Contributor

Problem

Fixes #8190.

When deploying a SAM application with multiple Lambda functions referencing the same or different ECR repositories, CloudFormation calls ecr:SetRepositoryPolicy concurrently — once per Lambda. Each call overwrites the existing policy rather than merging it, so whichever write lands last wins and earlier Lambdas lose access, resulting in a 403 on image pull.
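The last-write-wins behaviour described above can be illustrated with a small in-memory simulation. This is only a sketch: `FakeRepo`, `statement_for`, and `overwrite` are hypothetical names invented for the illustration, and nothing here calls the real ECR API.

```python
import json


class FakeRepo:
    """In-memory stand-in for one ECR repository's policy (not the real ECR API)."""

    def __init__(self):
        self.policy_text = None

    def set_policy(self, policy_text):
        # Like a full-document write: the previous policy is replaced, not merged.
        self.policy_text = policy_text


def statement_for(function_name):
    # Hypothetical per-function statement; the real statement shape may differ.
    return {
        "Sid": f"Allow-{function_name}",
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"],
    }


def overwrite(repo, function_name):
    # Each writer builds a policy containing only its own statement.
    policy = {"Version": "2012-10-17", "Statement": [statement_for(function_name)]}
    repo.set_policy(json.dumps(policy))


repo = FakeRepo()
for name in ("FunctionA", "FunctionB"):
    overwrite(repo, name)

surviving_sids = [s["Sid"] for s in json.loads(repo.policy_text)["Statement"]]
print(surviving_sids)  # ['Allow-FunctionB']
```

Only the statement from the final write survives, which is the situation in which the earlier function loses pull access.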

Solution

Before creating the changeset, SAM CLI now pre-sets a policy statement with the stable SID SAMCliLambdaECRAccess on every ECR repository referenced by the deployment (via ImageRepository / ImageRepositories). Because the SID is deterministic, a repeated call replaces the same statement instead of adding a new one, so the operation is idempotent and safe to run on every deploy.
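A minimal sketch of why a fixed SID makes the write idempotent is shown here. The function names and the statement body are assumptions made for illustration (the principal and actions follow the usual Lambda image-pull permissions); they are not taken from the PR's code.

```python
import copy

SID = "SAMCliLambdaECRAccess"


def lambda_pull_statement():
    # Assumed statement body; the PR's actual statement may differ.
    return {
        "Sid": SID,
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"],
    }


def upsert_statement(policy, statement):
    """Return a copy of `policy` with `statement` replacing any statement of the same Sid."""
    merged = copy.deepcopy(policy) if policy else {"Version": "2012-10-17", "Statement": []}
    others = [s for s in merged.get("Statement", []) if s.get("Sid") != statement["Sid"]]
    merged["Statement"] = others + [statement]
    return merged


existing = {
    "Version": "2012-10-17",
    "Statement": [{"Sid": "TeamCI", "Effect": "Allow", "Principal": "*", "Action": ["ecr:DescribeImages"]}],
}

once = upsert_statement(existing, lambda_pull_statement())
twice = upsert_statement(once, lambda_pull_statement())

sids = [s["Sid"] for s in twice["Statement"]]
print(sids)           # ['TeamCI', 'SAMCliLambdaECRAccess']
print(once == twice)  # True
```

Unrelated statements such as `TeamCI` are preserved, and applying the upsert a second time leaves the document unchanged.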

Three private helpers are added to samcli/commands/deploy/deploy_context.py:

| Helper | Responsibility |
| --- | --- |
| _extract_ecr_repo_name | Parse repo name from a full ECR URI |
| _ensure_ecr_lambda_pull_policy | Collect all unique repo names and call _upsert_ecr_lambda_policy for each |
| _upsert_ecr_lambda_policy | Idempotently set or merge the SAMCliLambdaECRAccess statement; retries on concurrent SetRepositoryPolicy conflicts; skips gracefully on AccessDeniedException |
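As an illustration of the first helper's job, the sketch below parses a repository name out of a private ECR URI. The regular expression and the name `extract_repo_name` are assumptions; the PR's `_extract_ecr_repo_name` may handle more or fewer URI forms.

```python
import re

# Assumed private-registry form: <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repo>[:tag|@digest]
_ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com(?:\.cn)?"
    r"/(?P<repo>[^:@]+)(?:[:@].+)?$"
)


def extract_repo_name(uri):
    """Return the repository name, or None if `uri` is not a private ECR URI."""
    match = _ECR_URI.match(uri)
    return match.group("repo") if match else None


tagged = extract_repo_name("123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app/fn:latest")
digested = extract_repo_name("123456789012.dkr.ecr.eu-west-1.amazonaws.com/repo@sha256:abc123")
not_ecr = extract_repo_name("docker.io/library/python:3.12")

print(tagged)    # my-app/fn
print(digested)  # repo
print(not_ecr)   # None
```

Returning `None` for non-ECR URIs lets a caller skip images that do not need a repository policy at all.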

_ensure_ecr_lambda_pull_policy is called from DeployContext.run() immediately before create_and_wait_for_changeset.
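The retry-and-skip behaviour attributed to _upsert_ecr_lambda_policy could look roughly like the following. This is a sketch under stated assumptions: `FakeClientError` is a stand-in that only mimics the `response["Error"]["Code"]` shape of a botocore `ClientError`, `set_policy_with_retry` and its parameters are hypothetical, and the sketch re-raises on unrecoverable errors where the PR describes raising ECRPolicySetError.

```python
import time


class FakeClientError(Exception):
    """Stand-in mimicking the error-code shape of botocore's ClientError."""

    def __init__(self, code):
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


def set_policy_with_retry(set_policy, max_attempts=3, base_delay=0.0):
    """Return True if the policy was set, False if skipped due to missing permissions."""
    for attempt in range(1, max_attempts + 1):
        try:
            set_policy()
            return True
        except FakeClientError as err:
            code = err.response["Error"]["Code"]
            if code == "AccessDeniedException":
                # Caller lacks permission: skip instead of failing the deploy.
                return False
            if code == "ResourceInUseException" and attempt < max_attempts:
                # Another writer is mid-update: back off and try again.
                time.sleep(base_delay * attempt)
                continue
            raise  # Unrecoverable, or retries exhausted.
    return False


calls = {"n": 0}


def flaky():
    # Fails twice with a conflict, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeClientError("ResourceInUseException")


def denied():
    raise FakeClientError("AccessDeniedException")


retried_ok = set_policy_with_retry(flaky)
skipped = set_policy_with_retry(denied)
print(retried_ok, calls["n"], skipped)  # True 3 False
```

Treating AccessDeniedException as a skip keeps deployments working for users whose credentials cannot modify repository policies.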

Changes

  • samcli/commands/deploy/deploy_context.py — ECR policy helpers + call site
  • samcli/commands/deploy/exceptions.py — new ECRPolicySetError(UserException)
  • tests/unit/commands/deploy/test_ecr_policy_helpers.py — 21 new unit tests covering all branches of the three helpers
  • tests/unit/commands/deploy/test_deploy_context.py — patch _ensure_ecr_lambda_pull_policy at class level to isolate existing deploy tests from ECR side-effects
  • tests/unit/commands/_utils/test_template.py — fix test_updates_imageuri_when_pointing_to_local_archive: replace fragile CWD-relative file creation (which caused a PermissionError on macOS) with a pathlib.Path.is_file mock
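The last bullet's approach, mocking pathlib.Path.is_file rather than creating a file relative to the current working directory, can be sketched as follows. `looks_like_local_archive` is a hypothetical function standing in for the code under test; it is not SAM CLI's actual implementation.

```python
from pathlib import Path
from unittest import mock


def looks_like_local_archive(image_uri):
    # Hypothetical code under test: treats the URI as a local archive if the file exists.
    return Path(image_uri).is_file()


# No file is created on disk; the filesystem check itself is patched.
with mock.patch.object(Path, "is_file", return_value=True):
    patched = looks_like_local_archive("does/not/exist/image.tar.gz")

# Outside the patch, the real check runs and the file is absent.
unpatched = looks_like_local_archive("does/not/exist/image.tar.gz")

print(patched, unpatched)  # True False
```

Because nothing is written to disk, the test no longer depends on the working directory being writable.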

Testing

pytest tests/unit/commands/deploy/test_deploy_context.py \
       tests/unit/commands/deploy/test_ecr_policy_helpers.py -v
# 35 passed

pytest --cov samcli --cov schema --cov-fail-under 94 tests/unit \
       --ignore=tests/unit/lib/cfn_language_extensions \
       --cov-config=.coveragerc_no_lang_ext
# 7479 passed, 0 failed, 94.05% coverage

Ruff and mypy also pass (mypy pre-existing errors are unrelated to this change).

Implements issue aws#3888 to auto-create ECR repositories during
packaging, matching sam deploy behavior. Enables package-once,
deploy-many CI/CD workflows with managed ECR repos.

- Add --resolve-image-repos CLI option to sam package
- Call sync_ecr_stack() to auto-create managed ECR repositories
- Add validation requiring --s3-bucket when flag is used
- Add conflict detection with --image-repositories
- Add unit tests for validation logic

Closes aws#3888
…ryPolicy race condition (aws#8190)

- Add _ensure_ecr_lambda_pull_policy() called before changeset creation to
  pre-set a stable SAMCliLambdaECRAccess SID on all referenced ECR repos.
- Add _upsert_ecr_lambda_policy() to idempotently set/merge the policy,
  handling AccessDeniedException gracefully and retrying on concurrent
  SetRepositoryPolicy conflicts (ResourceInUseException).
- Add ECRPolicySetError exception for unrecoverable policy failures.
- Add 21 unit tests in test_ecr_policy_helpers.py covering all branches.
- Patch _ensure_ecr_lambda_pull_policy in TestSamDeployCommand to isolate
  deploy-flow tests from ECR side-effects.
- Fix test_updates_imageuri_when_pointing_to_local_archive: replace
  fragile CWD-relative file creation with pathlib.Path.is_file mock.
@abhishektang abhishektang requested a review from a team as a code owner May 4, 2026 01:22
@github-actions github-actions Bot added the area/deploy, pr/external, and stage/needs-triage labels May 4, 2026
Development

Successfully merging this pull request may close these issues.

Bug: Docker image-based Lambda failures
