MDEV-30124: Add format validation for JSON Schema #4750

Draft
varundeepsaini wants to merge 1 commit into MariaDB:main from varundeepsaini:MDEV-30124-json-schema-format-validation

@varundeepsaini

Summary

Implements optional format validation for the JSON_SCHEMA_VALID() function per JSON Schema Draft 2020-12.

  • Adds a new session variable json_schema_format_validation (default OFF)
  • When OFF, the format keyword is treated as annotation only (existing behavior)
  • When ON, validates strings against 18 format types: date-time, date, time, duration, email, idn-email, hostname, idn-hostname, ipv4, ipv6, uri, uri-reference, iri, iri-reference, uuid, json-pointer, relative-json-pointer, regex
  • Unknown format values always pass validation
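
The rules in this list can be read as a small gating function. The following is a minimal sketch of that logic only, not the code in the patch: `FormatValidator`, `format_registry` and `format_keyword_passes` are hypothetical names, the non-string rule is taken from the test plan, and the single registered `uuid` check is a placeholder (length only) to keep the example short.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical validator signature: true when the string matches the format.
using FormatValidator = std::function<bool(const std::string &)>;

// Illustrative registry (assumption). Only a placeholder "uuid" entry is
// registered, and it checks the length only, not the full UUID grammar.
inline const std::unordered_map<std::string, FormatValidator> &format_registry()
{
  static const std::unordered_map<std::string, FormatValidator> registry = {
    {"uuid", [](const std::string &s) { return s.size() == 36; }},
  };
  return registry;
}

// Mirrors the behaviour described in the summary:
//  - variable OFF      -> format is an annotation, always passes
//  - non-string value  -> always passes
//  - unknown format    -> always passes
//  - otherwise         -> the registered validator decides
inline bool format_keyword_passes(bool format_validation_on,
                                  bool value_is_string,
                                  const std::string &format_name,
                                  const std::string &value)
{
  if (!format_validation_on || !value_is_string)
    return true;
  auto it = format_registry().find(format_name);
  if (it == format_registry().end())
    return true;
  return it->second(value);
}
```

With the flag off, `format_keyword_passes(false, true, "uuid", "not-a-uuid")` passes; with the flag on, the same input is rejected, while an unregistered format name still passes.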

Test plan

  • Existing format annotation tests continue to pass unchanged
  • New tests cover all 18 formats with valid/invalid inputs when json_schema_format_validation=ON
  • Verified non-string types always pass regardless of format
  • Verified unknown format values are treated as annotation
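
The valid/invalid pairs mentioned in the test plan lend themselves to a table-driven check. Below is a rough sketch under stated assumptions: `is_valid_date`, `FormatCase` and `count_failures` are hypothetical helpers written for this illustration (a hand-written check for the `date` format as an RFC 3339 full-date), and none of them is taken from the patch or from the server's mysql-test suite.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical hand-written check for "date" (RFC 3339 full-date, YYYY-MM-DD).
inline bool is_valid_date(const std::string &s)
{
  if (s.size() != 10 || s[4] != '-' || s[7] != '-')
    return false;
  for (std::size_t i = 0; i < s.size(); i++)
    if (i != 4 && i != 7 && !std::isdigit(static_cast<unsigned char>(s[i])))
      return false;

  int year = std::stoi(s.substr(0, 4));
  int month = std::stoi(s.substr(5, 2));
  int day = std::stoi(s.substr(8, 2));
  if (month < 1 || month > 12 || day < 1)
    return false;

  // Day-of-month limit, with the Gregorian leap-year rule for February.
  static const int days_in_month[] = {31, 28, 31, 30, 31, 30,
                                      31, 31, 30, 31, 30, 31};
  bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
  int max_day = days_in_month[month - 1] + ((month == 2 && leap) ? 1 : 0);
  return day <= max_day;
}

// One row of the test table: an input and the expected verdict.
struct FormatCase
{
  std::string input;
  bool expected;
};

// Returns how many rows disagree with the validator.
inline int count_failures(const std::vector<FormatCase> &cases)
{
  int failures = 0;
  for (const auto &c : cases)
    if (is_valid_date(c.input) != c.expected)
      failures++;
  return failures;
}
```

A table such as `{{"2024-02-29", true}, {"2023-02-29", false}, {"2024-13-01", false}}` would then be expected to produce zero failures; the same shape works for any of the other formats by swapping the validator.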

@varundeepsaini varundeepsaini marked this pull request as draft March 7, 2026 05:22
@varundeepsaini varundeepsaini force-pushed the MDEV-30124-json-schema-format-validation branch 3 times, most recently from 293b984 to e5142f5 Compare March 7, 2026 08:22
@varundeepsaini varundeepsaini marked this pull request as ready for review March 7, 2026 09:34
@varundeepsaini varundeepsaini force-pushed the MDEV-30124-json-schema-format-validation branch 2 times, most recently from 5a6ab32 to 602a323 Compare March 7, 2026 16:45
@gkodinov gkodinov added the External Contribution label (All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.) Mar 9, 2026
Member

@gkodinov gkodinov left a comment

Thank you for your contribution! This is a preliminary review.

In general I would strive to reuse as much of the existing infrastructure in the server as possible: charset handling, parsers for various data types, third-party libraries, etc.

The declarations of the formats are far from simple: some have quoting, some escaping etc.

I've jotted down some of the issues that I found at first glance. It looks like this could benefit from some sort of standardized testing of the parsers too. I'm sure test sets exist for most of these formats.

Comment thread sql/json_schema.cc Outdated
Comment thread sql/json_schema.cc Outdated
Comment thread sql/json_schema.cc Outdated
Comment thread sql/json_schema.cc
Comment thread sql/json_schema.cc
@CLAassistant

CLAassistant commented Mar 30, 2026

CLA assistant check
All committers have signed the CLA.

@varundeepsaini varundeepsaini force-pushed the MDEV-30124-json-schema-format-validation branch from 602a323 to 1524d38 Compare April 1, 2026 11:50
@varundeepsaini varundeepsaini force-pushed the MDEV-30124-json-schema-format-validation branch from 1524d38 to b88e9c4 Compare April 21, 2026 09:53
@vuvova vuvova marked this pull request as draft April 21, 2026 09:54
@vuvova
Member

vuvova commented Apr 21, 2026

this is a GSoC project, as far as I remember, so let's change it to draft until at least the coding period starts

@varundeepsaini
Author

@vuvova
yeah my bad, this PR was raised before I drafted the proposal and got to know about MDEV-30219 (the broader ticket)

JSON_SCHEMA_VALID() now validates the format keyword per JSON Schema
Draft 2020-12 when the new session variable json_schema_format_validation
is ON. By default it stays an annotation, matching the previous behaviour.

The 18 Draft 2020-12 formats are supported: date-time, date, time,
duration, email, idn-email, hostname, idn-hostname, ipv4, ipv6, uri,
uri-reference, iri, iri-reference, uuid, json-pointer,
relative-json-pointer, regex.

Each format has a small syntactic validator in sql/json_schema.cc.
IP addresses are parsed with inet_pton, regex with pcre2_compile, and
the rest with hand-written checks that follow the relevant RFC grammar.
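
As an illustration of the `inet_pton` approach mentioned above, here is a minimal sketch for the `ipv4` and `ipv6` formats. It is not the code from the patch: the function names are hypothetical, and the embedded-NUL guard is an assumption added because `inet_pton` takes a C string and would otherwise stop reading at the first NUL byte.

```cpp
#include <arpa/inet.h>   // inet_pton (POSIX)
#include <netinet/in.h>  // in_addr, in6_addr
#include <string>

// Hypothetical helper: true when s is a dotted-quad IPv4 address.
inline bool is_valid_ipv4(const std::string &s)
{
  // Reject embedded NUL bytes, which c_str() would silently truncate at.
  if (s.find('\0') != std::string::npos)
    return false;
  in_addr addr;
  return inet_pton(AF_INET, s.c_str(), &addr) == 1;
}

// Hypothetical helper: true when s is an IPv6 address in text form.
inline bool is_valid_ipv6(const std::string &s)
{
  if (s.find('\0') != std::string::npos)
    return false;
  in6_addr addr;
  return inet_pton(AF_INET6, s.c_str(), &addr) == 1;
}
```

`inet_pton` returns 1 only on a successful parse, so comparing against 1 treats both "not a valid address" (0) and "unsupported address family" (-1) as a failed validation.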
@varundeepsaini varundeepsaini force-pushed the MDEV-30124-json-schema-format-validation branch from b88e9c4 to ceef2ab Compare April 21, 2026 10:29
@vuvova
Member

vuvova commented Apr 21, 2026

@varundeepsaini no problem, you can still work on this draft if you'd like. I just don't think we can promise anything until we know the status of your GSoC proposal, which depends on Google's decision.
