
ci: layered Python base images for cross-matrix dedup#9672

Closed
richiejp wants to merge 3 commits into master from ci/layered-base-images

Conversation

@richiejp
Collaborator

@richiejp richiejp commented May 5, 2026

Experiment to see if creating OS+vendor+language base images can massively reduce CI time and build time locally.


The 234-entry backend matrix runs the same apt-update + GPU SDK install +
Python toolchain bootstrap into N independent registry-cache tags. Factor
that shared work out into a tier-1+2 base image (lang × accel × ubuntu ×
cuda) built once per workflow run, then consumed by every backend that
matches its tuple via BASE_IMAGE_PREBUILT.
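The tuple-to-tag mapping above can be sketched as follows. Field names and the exact stem format are assumptions for illustration, not the real matrix schema:

```javascript
// Hypothetical sketch: map a matrix entry's (lang, accel, ubuntu, cuda)
// tuple to the tag-stem of the base image it consumes. Entries without a
// CUDA version (cpu, vulkan, ...) simply omit that component.
function baseStem(entry) {
  const parts = [entry.lang, entry.accel, entry.ubuntu];
  if (entry.cuda) parts.push(entry.cuda);
  return parts.join('-');
}

console.log(baseStem({ lang: 'python', accel: 'cuda', ubuntu: '2204', cuda: '12.4' }));
// -> "python-cuda-2204-12.4"
console.log(baseStem({ lang: 'golang', accel: 'cpu', ubuntu: '2204' }));
// -> "golang-cpu-2204"
```

Every backend entry whose tuple yields the same stem shares one base build instead of re-running the bootstrap.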

The matrix data moves to .github/backend-matrix.yaml so backend.yml can
switch to fromJSON without duplicating the matrix. scripts/changed-backends.js
reads the data file, derives the deduplicated bases-matrix, annotates each
Python entry with the right base-image-prebuilt ref, and runs a collision
check that fails loudly if a future matrix change makes two consumers want
incompatible bases under the same tag-stem.
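The dedup-plus-collision-check logic can be sketched like this. The function name, fields, and the hypothetical `sdk` field (standing in for any future base-relevant field not encoded in the stem) are illustrative assumptions, not the real changed-backends.js code:

```javascript
// Sketch: derive the deduplicated bases-matrix and fail loudly if two
// consumers want incompatible base specs under the same tag-stem.
function deriveBasesMatrix(entries) {
  const stems = new Map(); // tag-stem -> serialized base spec
  for (const e of entries) {
    const stem = [e.lang, e.accel, e.ubuntu, e.cuda].filter(Boolean).join('-');
    // The spec includes everything the base build consumes; 'sdk' is a
    // hypothetical future field that the stem does NOT encode.
    const spec = JSON.stringify({
      lang: e.lang, accel: e.accel, ubuntu: e.ubuntu, cuda: e.cuda, sdk: e.sdk,
    });
    const prev = stems.get(stem);
    if (prev !== undefined && prev !== spec) {
      throw new Error(`tag-stem collision on "${stem}": ${prev} vs ${spec}`);
    }
    stems.set(stem, spec);
  }
  // One base build per distinct stem.
  return [...stems.entries()].map(([stem, spec]) => ({ stem, ...JSON.parse(spec) }));
}
```

Duplicate tuples collapse to one base; two entries that share a stem but diverge on a field the stem does not capture abort the workflow instead of silently racing for the tag.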

PR builds tag with -pr<N> so end-to-end validation lives within one PR;
master builds tag without the suffix. The base-images registry cache
parallels the existing per-matrix-entry caches.

Adding a new (accel, cuda) flavour is a backend-matrix.yaml edit; adding
a new language tier is a Dockerfile.<lang> recipe + a slim of the
consumer Dockerfile (script auto-detects via .docker/bases/).

10 distinct bases derive from the current 234 entries, replacing the
inline bootstrap that previously ran into ~10 separate cache tags.

Assisted-by: Claude:opus-4-7-1m [Claude Code]

richiejp added a commit that referenced this pull request May 6, 2026
The previous tag scheme pushed to quay.io/go-skynet/localai-base, which
required a separate quay repo + a write-permission grant for the CI
robot. PR #9672 hit a 401 on push because that grant was missing — the
robot can log in but not write to localai-base.

ci-cache already exists, the robot already has write access (it writes
the buildkit cache there on every backend build), and OCI tags namespace
cleanly within a repo. So publish base images to
quay.io/go-skynet/ci-cache:base-image-<stem>[-pr<N>]. The `base-image-`
prefix doesn't collide with the existing tag prefixes:
  - cache<tag-suffix>           per-backend buildkit cache
  - cache-localai<tag-suffix>   root image buildkit cache
  - base-<stem>                 base image's own buildkit cache
  - base-image-<stem>           the published OCI image (new)

base_images.yml's compute_ref step and prebuiltRef() in
scripts/changed-backends.js are kept in lock-step. Local Makefile tags
are unchanged (they're just local docker labels with no remote
correlation).
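The ref computation kept in lock-step between the two places could look like this. The function name prebuiltRef comes from the commit message above; its signature is an assumption:

```javascript
// Sketch: compute the published base-image ref. PR builds append -pr<N>
// so a PR's end-to-end validation never clobbers the master tags.
function prebuiltRef(stem, prNumber) {
  const suffix = prNumber ? `-pr${prNumber}` : '';
  return `quay.io/go-skynet/ci-cache:base-image-${stem}${suffix}`;
}

console.log(prebuiltRef('python-cuda-2204', 9672));
// -> "quay.io/go-skynet/ci-cache:base-image-python-cuda-2204-pr9672"
```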

Assisted-by: Claude:opus-4-7-1m [Claude Code]
@richiejp richiejp force-pushed the ci/layered-base-images branch from 7481e52 to 76eab55 Compare May 6, 2026 15:08
richiejp added 3 commits May 6, 2026 16:10
The 234-entry backend matrix runs the same apt-update + GPU SDK install +
Python toolchain bootstrap into N independent registry-cache tags. Factor
that shared work out into a tier-1+2 base image (lang × accel × ubuntu ×
cuda) built once per workflow run, then consumed by every backend that
matches its tuple via BASE_IMAGE_PREBUILT.

The matrix data moves to .github/backend-matrix.yaml so backend.yml can
switch to fromJSON without duplicating the matrix. scripts/changed-backends.js
reads the data file, derives the deduplicated bases-matrix, annotates each
Python entry with the right base-image-prebuilt ref, and runs a collision
check that fails loudly if a future matrix change makes two consumers want
incompatible bases under the same tag-stem.

PR builds tag with -pr<N> so end-to-end validation lives within one PR;
master builds tag without the suffix. The base-images registry cache
parallels the existing per-matrix-entry caches.

Adding a new (accel, cuda) flavour is a backend-matrix.yaml edit; adding
a new language tier is a Dockerfile.<lang> recipe + a slim of the
consumer Dockerfile (script auto-detects via .docker/bases/).

10 distinct bases derive from the current 234 entries, replacing the
inline bootstrap that previously ran into ~10 separate cache tags.

Assisted-by: Claude:opus-4-7-1m [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Python's tier-1+2 base image (apt + GPU SDK + lang toolchain) was the only
lang previously factored. The remaining 82 matrix entries (62 golang +
9 llama-cpp + 9 turboquant + 1 ik-llama-cpp + 1 rust) still inlined the
same bootstrap into per-backend cache tags.

Add .docker/bases/Dockerfile.{golang,cpp,rust} mirroring Dockerfile.python's
GPU stack, with the lang-specific tail at the bottom (Go + protoc + grpc
tooling; protoc + cmake + GRPC; rustup + audio dev libs respectively).
Slim the five consumer Dockerfiles to FROM ${BASE_IMAGE_PREBUILT} + the
per-backend COPY/make.

The C++ trio (llama-cpp, ik-llama-cpp, turboquant) only differ in their
make targets, so langOf() in scripts/changed-backends.js remaps all three
Dockerfile suffixes to the shared 'cpp' base. That collapses 17 would-be
distinct bases to 8. langTriggerSelector and baseTriggerFiles are
extended so PRs touching the new recipes fan out canaries; the
.docker/bases/ auto-detection picks up the new langs without further
script changes.
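The langOf() remapping can be sketched as below; the three suffixes come from the commit message, the implementation shape is assumed:

```javascript
// Sketch: llama-cpp, ik-llama-cpp and turboquant differ only in their
// make targets, so all three Dockerfile suffixes map to the shared
// 'cpp' base; every other suffix is its own lang.
function langOf(dockerfileSuffix) {
  const cppAliases = new Set(['llama-cpp', 'ik-llama-cpp', 'turboquant']);
  return cppAliases.has(dockerfileSuffix) ? 'cpp' : dockerfileSuffix;
}
```

This remap is what collapses the 17 would-be distinct C++ bases down to 8.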

Makefile: add docker-build-{python,golang,cpp,rust}-base targets and a
local-base-tag/local-base-target macro pair so each backend's
docker-build-X chains through the right base. The previous python-only
prereq is now a generic per-lang dispatch.

Total distinct bases for the full 234-entry matrix: 29 (was 9 with only
python factored). The C++ base also absorbs the previously per-consumer
GRPC build stage, removing the dominant cost from the llama-cpp /
ik-llama-cpp / turboquant rebuild paths.

Assisted-by: Claude:opus-4-7-1m [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
The previous tag scheme pushed to quay.io/go-skynet/localai-base, which
required a separate quay repo + a write-permission grant for the CI
robot. PR #9672 hit a 401 on push because that grant was missing — the
robot can log in but not write to localai-base.

ci-cache already exists, the robot already has write access (it writes
the buildkit cache there on every backend build), and OCI tags namespace
cleanly within a repo. So publish base images to
quay.io/go-skynet/ci-cache:base-image-<stem>[-pr<N>]. The `base-image-`
prefix doesn't collide with the existing tag prefixes:
  - cache<tag-suffix>           per-backend buildkit cache
  - cache-localai<tag-suffix>   root image buildkit cache
  - base-<stem>                 base image's own buildkit cache
  - base-image-<stem>           the published OCI image (new)

base_images.yml's compute_ref step and prebuiltRef() in
scripts/changed-backends.js are kept in lock-step. Local Makefile tags
are unchanged (they're just local docker labels with no remote
correlation).

Assisted-by: Claude:opus-4-7-1m [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
@richiejp richiejp force-pushed the ci/layered-base-images branch from 76eab55 to 9d42a16 Compare May 6, 2026 20:51
@richiejp
Collaborator Author

richiejp commented May 7, 2026

This doesn't solve the problem. Most of the time is spent doing backend-specific work.


Per-run saving (steady state, after bases are cached):

  • ~17 cpp consumers × 5-18m saved each ≈ 150-200 min of CPU time across the matrix
  • golang/python/rust consumers: near-zero saving (their inlined apt+toolchain was small to begin with)

Cost we paid for it:

  • 29 bases × ~36m mean = 1035 min CPU time, once, when bases change
  • cpp-vulkan-2404 (5h17m) and cpp-cpu-2404 (4h33m) dominate — both because of QEMU multi-arch + GRPC compile

Wall-clock impact (what users see):

  • The cpp matrix is gated on its base. The slowest cpp base (vulkan, 5h17m) becomes the new critical path for vulkan consumers when the base changes. Master parallelizes the GRPC build inside each consumer, so its critical path is just the slowest single consumer (~5h25m for vulkan-llama-cpp). Roughly a wash on cpp-vulkan; small win elsewhere.
  • The 4 still-queued llama-cpp jobs are queued on bigger-runner — that's runner contention, nothing to do with this PR.

Bottom line: the saving is ~10-15% of cpp consumer time, paid for by a 17h one-time base build. It's a real but modest win, dominated by the GRPC step. The big ROI would come from moving more work into bases (e.g. baking the protogen-go step or backend-specific deps that don't change often), not from the layering itself.
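For what it's worth, the cost arithmetic quoted above checks out (numbers taken from this comment):

```javascript
// Sanity-check the one-time base-build cost figures quoted above.
const bases = 29;
const totalMin = 1035;              // total CPU minutes when bases change
const meanMin = totalMin / bases;   // ~35.7 min, quoted as "~36m mean"
const hours = totalMin / 60;        // 17.25 h, the "17h one-time base build"
```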

@richiejp richiejp closed this May 7, 2026
