feat(license): improve work text licenses with custom classification#8888

Merged
knqyf263 merged 9 commits into aquasecurity:main from DmitriyLewen:echancement/text-licenses
May 22, 2025

Conversation

@DmitriyLewen
Contributor

Description

Users may want to add license texts to the config file,
so we need to check license texts against the custom classification.

This PR also adds the option to use glob patterns in the config file for text licenses.

Related issues

Related PRs

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change).

- use names from categories as glob patterns to detect category
- use text license to detect category
- split license text by `/` to match with glob pattern
- add tests
Comment thread pkg/licensing/scanner.go Outdated
func (s *Scanner) scanPartOfLicenseText(license string) (types.LicenseCategory, bool) {
for cat, names := range s.categories {
for _, name := range names {
match, err := filepath.Match(name, license)
Collaborator

@knqyf263 knqyf263 May 21, 2025

Using filepath.Match for matching license texts seems likely to cause unexpected issues. In fact, handling cases where slashes are included is necessary. I initially thought using just glob would be sufficient, but considering there’s no suitable standard library for it, using regular expressions seems like a better approach.

Suggested change
match, err := filepath.Match(name, license)
match, err := regexp.MatchString(name, license)
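
To illustrate the slash problem raised in this comment, here is a minimal sketch comparing `filepath.Match` with `regexp.MatchString` on text that contains `/`. The helper names `globMatches` and `regexMatches` and the sample strings are hypothetical and are not taken from the PR; the `false` result assumes a Unix-like system where the path separator is `/`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// globMatches reports whether the glob pattern matches the whole text.
// In filepath.Match, '*' never matches the path separator.
func globMatches(pattern, text string) bool {
	ok, err := filepath.Match(pattern, text)
	return err == nil && ok
}

// regexMatches reports whether the regular expression matches anywhere in text.
func regexMatches(pattern, text string) bool {
	ok, err := regexp.MatchString(pattern, text)
	return err == nil && ok
}

func main() {
	plain := "custom license text"
	withSlash := "see https://example.com/license for terms"

	fmt.Println(globMatches("*license*", plain))     // true
	fmt.Println(globMatches("*license*", withSlash)) // false on Unix-like systems: '*' stops at '/'
	fmt.Println(regexMatches("license", withSlash))  // true
}
```

License texts routinely contain URLs, so a matcher that treats `/` as a boundary is likely to miss them unless the text is split first.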

Contributor Author

I was thinking about using regexp.

We have many licenses in categories.
So we will use regex for each license in the category and repeat it for each license found.
I am worried about the resources used.

I think that in most cases users rely on our built-in categories, so using glob/regexp should be a rare case.
So I suggest starting with glob patterns.

but I don't insist.
If you are sure - I will update the PR
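
One common way to limit the cost described in this comment is to compile each pattern once and reuse it for every license found. The following is only a sketch of that idea, not what the PR does; `compilePatterns` and `matchesAny` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
)

// compilePatterns compiles every pattern once so that later matches
// do not pay the compilation cost again.
func compilePatterns(patterns []string) ([]*regexp.Regexp, error) {
	compiled := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		compiled = append(compiled, re)
	}
	return compiled, nil
}

// matchesAny reports whether any precompiled pattern matches the text.
func matchesAny(res []*regexp.Regexp, text string) bool {
	for _, re := range res {
		if re.MatchString(text) {
			return true
		}
	}
	return false
}

func main() {
	res, err := compilePatterns([]string{`Apache.*2`, `MIT License`})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(matchesAny(res, "Apache License, Version 2.0"))
	fmt.Println(matchesAny(res, "BSD 3-Clause"))
}
```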

Collaborator

I agree that glob is better, but I don't think using filepath.Match is a good idea. If you know a standard library for glob, I would be happy with that.

By the way, for text data, I think it's better to add a prefix in the configuration file to identify it as license text clearly. That way, license names/IDs can be skipped, resolving the performance issue.

Contributor Author

All glob pattern parsing libraries that don't split on / are old...
The others split strings on / (like doublestar, which we use in the iac package).

So it looks like we need to use your idea (regex + prefix).

I was thinking about using text:// prefix (as in text field).
But I'm worried that it might be confusing.

maybe regex://.
wdyt?

Collaborator

regex:// gives users the impression that they can use regex to match license names. So, I vote for text://. We can document it for clarity.
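
The approach agreed on in this thread (regular expressions, but only for entries marked with a `text://` prefix) could look roughly like the sketch below. This is an illustration under assumptions, not the merged implementation: the function `categorize`, the plain `map[string][]string` shape, and the sample categories are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// textPrefix marks a config entry as a license-text pattern.
const textPrefix = "text://"

// categorize returns the category of a license. isText tells whether
// license is a full license text rather than a license name or ID.
func categorize(categories map[string][]string, license string, isText bool) (string, bool) {
	for cat, entries := range categories {
		for _, entry := range entries {
			hasPrefix := strings.HasPrefix(entry, textPrefix)
			if isText {
				if !hasPrefix {
					continue // plain names/IDs are skipped, so no regex work is done for them
				}
				pattern := strings.TrimPrefix(entry, textPrefix)
				if ok, err := regexp.MatchString(pattern, license); err == nil && ok {
					return cat, true
				}
				continue
			}
			// License names and IDs are compared exactly, never as regex.
			if !hasPrefix && entry == license {
				return cat, true
			}
		}
	}
	return "", false
}

func main() {
	categories := map[string][]string{
		"forbidden": {"GPL-3.0", "text://(?i)my proprietary license"},
		"notice":    {"MIT"},
	}
	fmt.Println(categorize(categories, "MIT", false))
	fmt.Println(categorize(categories, "This is My Proprietary License, see a/b", true))
}
```

Because only prefixed entries are treated as patterns, the long lists of built-in license names never reach the regex engine, which is the performance point made earlier in the thread.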

Contributor Author

Okay, I will try to write a good description in the docs and hope that users will read it 😄

Collaborator

Since license text matching is a rather specialized use case, I expect that users requiring such depth of functionality in Trivy will refer to the documentation.

Contributor Author

Updated PR.
Take a look, when you have time.

Comment thread docs/docs/scanner/license.md Outdated
Comment thread pkg/licensing/scanner.go Outdated
DmitriyLewen and others added 2 commits May 22, 2025 16:08
Co-authored-by: Teppei Fukuda <knqyf263@gmail.com>
Co-authored-by: Teppei Fukuda <knqyf263@gmail.com>
@knqyf263 knqyf263 added this pull request to the merge queue May 22, 2025
Merged via the queue into aquasecurity:main with commit ee52230 May 22, 2025
17 checks passed
Development

Successfully merging this pull request may close these issues.

enhancement(license): improve work with custom classification of licenses from config file

2 participants