fix: migrate from `*.list` to `*.md5sums` files for `dpkg` #9131
DmitriyLewen merged 6 commits into `aquasecurity:main`
```diff
 "PkgIdentifier": {
   "PURL": "pkg:deb/debian/libidn2-0@2.0.5-1?arch=amd64\u0026distro=debian-10.1",
-  "UID": "473f5eb9e3d4a2f2"
+  "UID": "24f9b08969c58720"
```
I investigated this case.

The `.list` file contains 2 entries:

```
➜ docker run -it --rm --platform=linux/amd64 debian:12 cat var/lib/dpkg/info/libidn2-0\:amd64.list | grep libidn2.so.0
/usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8
/usr/lib/x86_64-linux-gnu/libidn2.so.0
```

But the `.md5sums` file contains only one:

```
➜ docker run -it --rm --platform=linux/amd64 debian:12 cat var/lib/dpkg/info/libidn2-0\:amd64.md5sums | grep libidn2.so.0
c745ba8b8dfd28a2aa7efb3081ca5eed  usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8
```

`libidn2.so.0` is a symlink to the `libidn2.so.0.3.8` file:

```
➜ docker run -it --rm --platform=linux/amd64 debian:12 ls -hl /usr/lib/x86_64-linux-gnu | grep libidn2.so.0
lrwxrwxrwx 1 root root 16 Aug 28 2022 libidn2.so.0 -> libidn2.so.0.3.8
-rw-r--r-- 1 root root 195K Aug 28 2022 libidn2.so.0.3.8
```

That is why the `.md5sums` file doesn't have this entry. Trivy doesn't currently support symlinks (#5356), so this shouldn't be a problem.
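The comparison above can also be reproduced without Docker by diffing the two files directly. The Go sketch below uses a hypothetical helper, `missingFromMd5sums` (not part of Trivy), fed with the libidn2 excerpts from the output above. It assumes the `<md5>  <path>` line format with a path relative to the root. On a complete `.list` file the result would also include directory entries, which `.list` files record but `.md5sums` files do not.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// missingFromMd5sums is a hypothetical helper (not part of Trivy). It returns the
// entries of a dpkg *.list file that have no matching line in the *.md5sums file.
// *.list entries are absolute paths; *.md5sums lines look like "<md5>  <path>"
// where <path> has no leading slash.
func missingFromMd5sums(list, md5sums string) []string {
	hashed := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(md5sums))
	for sc.Scan() {
		if _, p, ok := strings.Cut(sc.Text(), "  "); ok {
			hashed["/"+p] = true // normalize to the absolute form used by *.list
		}
	}

	// Scanner errors are ignored here: the input is an in-memory string.
	var missing []string
	sc = bufio.NewScanner(strings.NewReader(list))
	for sc.Scan() {
		if p := sc.Text(); p != "" && !hashed[p] {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	// Excerpts from the libidn2-0 investigation (debian:12, amd64).
	list := "/usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8\n/usr/lib/x86_64-linux-gnu/libidn2.so.0\n"
	md5sums := "c745ba8b8dfd28a2aa7efb3081ca5eed  usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8\n"
	fmt.Println(missingFromMd5sums(list, md5sums)) // [/usr/lib/x86_64-linux-gnu/libidn2.so.0]
}
```

Only the symlink `libidn2.so.0` is reported, which matches the observation that `.md5sums` omits it.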
Can you write a comment in the source code somewhere so that we can recall this when we add support for symlinks?
Pull Request Overview
This PR refactors the dpkg analyzer to read *.md5sums files instead of legacy *.list files, improving compatibility with distroless images.
- Replace `.list` parsing logic with `.md5sums` parsing in code and tests
- Update `Required` and `isMd5SumsFile` to detect only `.md5sums` files
- Refresh testdata and golden outputs to use `tar.md5sums` and new UIDs
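A minimal sketch of the file-detection side of the change, assuming the two locations named in this PR: `var/lib/dpkg/info/` and, for distroless images, `var/lib/dpkg/status.d/`. The standalone `isMd5SumsFile(filePath string) bool` below is a simplified, hypothetical stand-in; the analyzer's real `Required` and `isMd5SumsFile` methods may use different signatures and path handling.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Directories assumed to hold per-package *.md5sums files:
// info/ on regular images, status.d/ on distroless images.
var md5sumsDirs = map[string]bool{
	"var/lib/dpkg/info":     true,
	"var/lib/dpkg/status.d": true,
}

// isMd5SumsFile reports whether filePath looks like a dpkg *.md5sums file.
// Hypothetical sketch; leading slashes are accepted and stripped.
func isMd5SumsFile(filePath string) bool {
	p := strings.TrimPrefix(path.Clean(filePath), "/")
	dir, name := path.Split(p)
	return md5sumsDirs[strings.TrimSuffix(dir, "/")] && strings.HasSuffix(name, ".md5sums")
}

func main() {
	for _, p := range []string{
		"var/lib/dpkg/info/tar.md5sums",      // regular image
		"/var/lib/dpkg/status.d/tar.md5sums", // distroless image
		"var/lib/dpkg/info/tar.list",         // legacy file, no longer used
	} {
		fmt.Println(p, isMd5SumsFile(p))
	}
}
```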
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/fanal/analyzer/pkg/dpkg/dpkg.go | Switch parsing from .list to .md5sums, implement parseDpkgMd5sums, update file-detection logic |
| pkg/fanal/analyzer/pkg/dpkg/dpkg_test.go | Adapt tests to .md5sums files, update expected installed files and test cases |
| pkg/fanal/analyzer/pkg/dpkg/testdata/tar.md5sums | Add new md5sums-format testdata |
| pkg/fanal/analyzer/pkg/dpkg/testdata/tar.list | Remove obsolete .list testdata |
| integration/testdata/debian-buster-ignore-unfixed.json.golden | Update golden UID for package identifiers |
Comments suppressed due to low confidence (2)
`pkg/fanal/analyzer/pkg/dpkg/dpkg.go:127`

- [nitpick] Consider renaming the variable `file` to something like `filePath` for clarity, since it represents the extracted file path from the md5sums line.

```go
_, file, ok := strings.Cut(current, "  ")
```
`pkg/fanal/analyzer/pkg/dpkg/dpkg.go:119`

- Add unit tests for malformed md5sums lines (e.g. missing delimiter) to verify that the parser returns the expected error.

```go
func (a dpkgAnalyzer) parseDpkgMd5sums(scanner *bufio.Scanner) ([]string, error) {
```
```go
_, file, ok := strings.Cut(current, "  ")
if !ok {
	return nil, xerrors.Errorf("invalid md5sums line format: %s", current)
}
```
nit: I was concerned that there might be cases where there are three spaces instead of two, or where tabs are used, so I thought it might be better to implement it in a way that wouldn’t be affected by such differences. However, if it’s guaranteed that it will always be two spaces, I think the current implementation is fine.
```diff
-_, file, ok := strings.Cut(current, "  ")
-if !ok {
-	return nil, xerrors.Errorf("invalid md5sums line format: %s", current)
-}
+ss := strings.Fields(current)
+if len(ss) != 2 {
+	return nil, xerrors.Errorf("invalid md5sums line format: %s", current)
+}
```
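The trade-off between the two approaches in this thread can be checked on a few lines. In this sketch, `pathByCut` mirrors the merged `strings.Cut` call and `pathByFields` mirrors the suggested `strings.Fields` call; both helper names are hypothetical. The three-space line and the path containing a space are illustrative inputs, not observed dpkg data.

```go
package main

import (
	"fmt"
	"strings"
)

// pathByCut mirrors the merged approach: split once on the two-space separator.
func pathByCut(line string) (string, bool) {
	_, p, ok := strings.Cut(line, "  ")
	return p, ok
}

// pathByFields mirrors the suggested approach: split on any run of whitespace
// and require exactly two fields.
func pathByFields(line string) (string, bool) {
	ss := strings.Fields(line)
	if len(ss) != 2 {
		return "", false
	}
	return ss[1], true
}

func main() {
	lines := []string{
		"c745ba8b8dfd28a2aa7efb3081ca5eed  usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8", // documented format
		"c745ba8b8dfd28a2aa7efb3081ca5eed   usr/share/doc/a.txt",                     // three spaces (illustrative)
		"c745ba8b8dfd28a2aa7efb3081ca5eed  usr/share/doc/my file.txt",                // space in path (illustrative)
	}
	for _, l := range lines {
		c, cok := pathByCut(l)
		f, fok := pathByFields(l)
		fmt.Printf("Cut=%q (%v)  Fields=%q (%v)\n", c, cok, f, fok)
	}
}
```

With three spaces, `strings.Cut` keeps a leading space in the path while `strings.Fields` handles it; with a space inside the path, `strings.Cut` returns the full path while the `len(ss) != 2` check rejects the line. Neither variant is strictly more tolerant than the other.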
Even old versions use this format (I checked Ubuntu 12.04).
The documentation also explicitly specifies two spaces: https://man7.org/linux/man-pages/man5/deb-md5sums.5.html
So I think we can leave it as is and fix it if users report problems (after analyzing their case first).
I thought it might be safer to make the change, since it would handle more cases, unless the current code has a clear advantage in readability or lines of code. But we can leave it as is; I don't insist on it.
knqyf263 left a comment:
Since this affects Trivy's output (especially for distroless images), it's not a refactoring. The prefix should be `feat` or `fix`.
Yes, thanks. Updated.
This pull request was referenced by a Renovate update of `mirror.gcr.io/aquasec/trivy` from `0.64.1` to `0.65.0`, whose release notes for [`v0.65.0`](https://github.com/aquasecurity/trivy/blob/HEAD/CHANGELOG.md#0650-2025-07-30) (2025-07-30) list it under Bug Fixes: "migrate from `*.list` to `*.md5sums` files for `dpkg` (#9131) (f224de3)".
Description
We currently use `*.list` files to detect files of `dpkg` packages.
But `distroless` images don't have these files (see #9046).
So we migrate to `**/info/*.md5sums` files (`**/status.d/*.md5sums` for distroless).

Example
before:
after:
Related issues
`/var/lib/dpkg/*/<package>.md5sums` to find list of system files #9046

Checklist