MDEV-30124: Add format validation for JSON Schema #4750
varundeepsaini wants to merge 1 commit into MariaDB:main
gkodinov left a comment:
Thank you for your contribution! This is a preliminary review.
In general I would strive to reuse as much of the existing infrastructure in the server as possible: charset handling, parsers for various data types, third-party libraries, etc.
The declarations of the formats are far from simple: some involve quoting, some escaping, etc.
I've jotted down some of the issues that I found at first glance. It looks like this could also benefit from some sort of standardized testing of the parsers. I'm sure test sets exist for most of them.
This is a GSoC project, as far as I remember. Let's change it to draft until at least the coding period starts.
@vuvova
JSON_SCHEMA_VALID() now validates the format keyword per JSON Schema Draft 2020-12 when the new session variable json_schema_format_validation is ON. By default it stays an annotation, matching the previous behaviour. The 18 Draft 2020-12 formats are supported: date-time, date, time, duration, email, idn-email, hostname, idn-hostname, ipv4, ipv6, uri, uri-reference, iri, iri-reference, uuid, json-pointer, relative-json-pointer, regex. Each format has a small syntactic validator in sql/json_schema.cc. IP addresses are parsed with inet_pton, regex with pcre2_compile, and the rest with hand-written checks that follow the relevant RFC grammar.
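As a rough illustration of the `inet_pton` approach mentioned above for `ipv4` and `ipv6`, here is a minimal sketch. The function names `is_valid_ipv4`, `is_valid_ipv6` and `ip_format_self_check` are hypothetical and are not taken from the patch; the actual code in sql/json_schema.cc may be structured differently, for example by working on length-delimited buffers rather than `std::string`.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cassert>
#include <string>

// Hypothetical helper (not from the patch): true if s is a dotted-quad
// IPv4 address that inet_pton() accepts.
bool is_valid_ipv4(const std::string &s)
{
  // inet_pton() takes a NUL-terminated string, so reject embedded NULs
  // explicitly; otherwise "1.2.3.4\0junk" would be silently truncated.
  if (s.find('\0') != std::string::npos)
    return false;
  struct in_addr addr;
  return inet_pton(AF_INET, s.c_str(), &addr) == 1;
}

// Hypothetical helper (not from the patch): same idea for IPv6.
bool is_valid_ipv6(const std::string &s)
{
  if (s.find('\0') != std::string::npos)
    return false;
  struct in6_addr addr;
  return inet_pton(AF_INET6, s.c_str(), &addr) == 1;
}

// Small usage example: a few accepted and rejected inputs.
void ip_format_self_check()
{
  assert(is_valid_ipv4("192.168.0.1"));
  assert(!is_valid_ipv4("256.1.1.1"));   // octet out of range
  assert(is_valid_ipv6("2001:db8::1"));
  assert(!is_valid_ipv6("1::2::3"));     // "::" may appear only once
}
```

Delegating to `inet_pton` keeps the address grammar out of the validator itself, which is in line with the review's suggestion to reuse existing parsers where possible.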
@varundeepsaini no problem, you can still work on this draft if you'd like. I just don't think we can promise anything until we know the status of your GSoC proposal, which depends on Google's decision.
Summary
Implements optional format validation for the JSON_SCHEMA_VALID() function per JSON Schema Draft 2020-12.
- New session variable json_schema_format_validation (default OFF)
- When OFF, the format keyword is treated as annotation only (existing behavior)
- When ON, validates strings against 18 format types: date-time, date, time, duration, email, idn-email, hostname, idn-hostname, ipv4, ipv6, uri, uri-reference, iri, iri-reference, uuid, json-pointer, relative-json-pointer, regex

Test plan
- json_schema_format_validation=ON
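
The ON/OFF switch described in the summary can be pictured with the following minimal sketch. Everything named here is hypothetical: `FormatSettings` stands in for the session variable, `check_format` for whatever dispatch the patch uses, and `is_valid_uuid` is an illustrative hand-written check for the 8-4-4-4-12 hexadecimal layout. Only `uuid` is wired up, and letting every other format name pass is an assumption of this sketch, not a statement about the patch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the session variable
// json_schema_format_validation (default OFF).
struct FormatSettings
{
  bool format_validation_on = false;
};

// Illustrative hand-written check: 36 characters, dashes at offsets
// 8, 13, 18 and 23, hexadecimal digits everywhere else.
bool is_valid_uuid(const std::string &s)
{
  if (s.size() != 36)
    return false;
  for (std::size_t i = 0; i < s.size(); ++i)
  {
    const bool dash_pos = (i == 8 || i == 13 || i == 18 || i == 23);
    if (dash_pos)
    {
      if (s[i] != '-')
        return false;
    }
    else if (!std::isxdigit(static_cast<unsigned char>(s[i])))
      return false;
  }
  return true;
}

// Hypothetical dispatch: with the switch OFF, "format" is an annotation
// and never fails validation; with it ON, the string is checked.
bool check_format(const FormatSettings &cfg, const std::string &format,
                  const std::string &value)
{
  if (!cfg.format_validation_on)
    return true;                 // annotation only (existing behavior)
  if (format == "uuid")
    return is_valid_uuid(value);
  return true;                   // assumption: formats not covered here pass
}

// Small usage example showing both modes.
void format_switch_self_check()
{
  FormatSettings off;            // default: OFF
  FormatSettings on;
  on.format_validation_on = true;
  assert(check_format(off, "uuid", "not-a-uuid"));
  assert(!check_format(on, "uuid", "not-a-uuid"));
  assert(check_format(on, "uuid", "123e4567-e89b-12d3-a456-426614174000"));
}
```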