feat(license): improve work text licenses with custom classification#8888
Conversation
- use names from categories as glob patterns to detect category - use text license to detect category - split license text by `/` to match with glob pattern - add tests
| func (s *Scanner) scanPartOfLicenseText(license string) (types.LicenseCategory, bool) { | ||
| for cat, names := range s.categories { | ||
| for _, name := range names { | ||
| match, err := filepath.Match(name, license) |
There was a problem hiding this comment.
Using filepath.Match for matching license texts seems likely to cause unexpected issues. In fact, handling cases where slashes are included is necessary. I initially thought using just glob would be sufficient, but considering there’s no suitable standard library for it, using regular expressions seems like a better approach.
| match, err := filepath.Match(name, license) | |
| match, err := regexp.MatchString(name, license) |
There was a problem hiding this comment.
I was thinking about using regexp.
We have many licenses in categories.
So we will use regex for each license in the category and repeat it for each license found.
I am worried about the resources used.
I think that in most cases users use our categories, so using glob/regexp sgh should be a rare case.
So I suggest starting with glob patterns.
but I don't insist.
If you are sure - I will update the PR
There was a problem hiding this comment.
I agree that glob is better, but I don't think using filepath.Match is a good idea. If you know a standard library for glob, I would be happy with that.
By the way, for text data, I think it's better to add a prefix in the configuration file to identify it as license text clearly. That way, license names/IDs can be skipped, resolving the performance issue.
There was a problem hiding this comment.
all glob pattern parsing libraries without splitting on / are old...
others split strings on / (like doublestart which we use in iac package).
So it looks like we need to use your idea (regex + prefix).
I was thinking about using text:// prefix (as in text field).
But I'm worried that it might be confusing.
maybe regex://.
wdyt?
There was a problem hiding this comment.
regex:// gives users the impression that they can use regex to match license names. So, I vote for text://. We can document it for clarity.
There was a problem hiding this comment.
okat, i will try to write a good description in the docs and hope that users will read it 😄
There was a problem hiding this comment.
Since license text matching is a rather specialized use case, I expect that users requiring such depth of functionality in Trivy will refer to the documentation.
There was a problem hiding this comment.
Updated PR.
Take a look, when you have time.
Co-authored-by: Teppei Fukuda <knqyf263@gmail.com>
Co-authored-by: Teppei Fukuda <knqyf263@gmail.com>
Description
Users can want to add text licenses in config file.
So we need to check text licenses from classification.
Also this PR adds options to use
globpatterns in config files for text licenses.Related issues
Related PRs
Checklist