feat(Datastream): Add SQL Server (MSSQL) source support #3580
pabloqc wants to merge 6 commits into GoogleCloudPlatform:main
Add full SQL Server CDC support to the Datastream-to-BigQuery pipeline:
- SQL Server metadata extraction (schema, lsn, tx_id) in Avro and JSON paths
- SQL Server sort key definitions for BigQuery MERGE operations
- SQL Server schema discovery via Datastream API (table/column discovery, primary key extraction, SQL Server-to-BigQuery type conversion)
- Add _metadata_lsn to BigQuery default staging table schema
- Update template documentation to list SQL Server as supported source

Also fix a pre-existing bug where _metadata_lsn was incorrectly populated from the "database" field instead of the "lsn" field in the JSON format path for both PostgreSQL and SQL Server sources.
…thods

Add 16 new tests covering:
- convertSqlServerToBigQueryColumnType: all type mappings (STRING, BOOL, INT64, FLOAT64, BIGNUMERIC, BYTES, DATE, TIME, TIMESTAMP), case insensitivity, unknown type fallback, and TIMESTAMP pattern matching
- buildSqlServerRdbmsForTable: RDBMS structure and special characters
- getSqlServerPrimaryKeys: PK extraction with mixed, no, all, and null primary key scenarios using mocked discoverSqlServerTableSchema
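To make the type-conversion cases listed above concrete, here is a minimal sketch of a case-insensitive SQL Server-to-BigQuery type mapper with a pattern match for the DATETIME family and a fallback for unknown types. It is not the PR's convertSqlServerToBigQueryColumnType implementation: the class name SqlServerTypeSketch, the method name toBigQueryType, the exact set of SQL Server types handled, and the choice of STRING as the fallback are all assumptions made for illustration.

```java
import java.util.Locale;

/** Hypothetical sketch; the actual mapping in the PR may differ. */
public class SqlServerTypeSketch {

  /** Maps a SQL Server column type name to a BigQuery type name. */
  static String toBigQueryType(String sqlServerType) {
    // Normalize case so "bit", "Bit", and "BIT" are treated the same.
    String type = sqlServerType.toUpperCase(Locale.ROOT);
    switch (type) {
      case "BIT":
        return "BOOL";
      case "TINYINT":
      case "SMALLINT":
      case "INT":
      case "BIGINT":
        return "INT64";
      case "REAL":
      case "FLOAT":
        return "FLOAT64";
      case "DECIMAL":
      case "NUMERIC":
      case "MONEY":
      case "SMALLMONEY":
        return "BIGNUMERIC";
      case "BINARY":
      case "VARBINARY":
        return "BYTES";
      case "DATE":
        return "DATE";
      case "TIME":
        return "TIME";
      default:
        break;
    }
    // Pattern match: DATETIME, DATETIME2, SMALLDATETIME, DATETIMEOFFSET.
    if (type.contains("DATETIME")) {
      return "TIMESTAMP";
    }
    // Assumed fallback: character types and anything unrecognized become STRING.
    return "STRING";
  }
}
```

For example, `SqlServerTypeSketch.toBigQueryType("datetime2")` takes the pattern-match path, while an unrecognized name such as "geography" falls through to the assumed STRING default.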
Combine the postgresql and sqlserver conditions with || where both branches had identical logic (FormatDatastreamJsonToJson metadata extraction and DatastreamRow sort fields).
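As an illustration of the consolidated branch, the sketch below handles both source types in one condition and reads the LSN from the "lsn" field (the earlier bug read it from "database"). This is a simplified stand-in using plain maps, not the actual FormatDatastreamJsonToJson code: the class name DatastreamMetadataSketch, the method extractMetadata, and the source-type strings "postgresql" and "sqlserver" are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of shared metadata extraction for PostgreSQL and SQL Server. */
public class DatastreamMetadataSketch {

  /** Copies source metadata fields into _metadata_* output columns. */
  static Map<String, String> extractMetadata(
      String sourceType, Map<String, String> sourceMetadata) {
    Map<String, String> out = new LinkedHashMap<>();
    // Both sources share the same fields, so one combined condition suffices.
    if ("postgresql".equals(sourceType) || "sqlserver".equals(sourceType)) {
      out.put("_metadata_schema", sourceMetadata.get("schema"));
      // Read the LSN from "lsn"; reading "database" here was the bug.
      out.put("_metadata_lsn", sourceMetadata.get("lsn"));
      out.put("_metadata_tx_id", sourceMetadata.get("tx_id"));
    }
    return out;
  }
}
```

Merging the two branches keeps a fix like the "lsn" correction in one place, so it cannot be applied to one source and missed for the other.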
- Merge postgresql and sqlserver branches in FormatDatastreamRecordToJson (same consolidation already done for the JSON path)
- Fix incorrect docs claiming PostgreSQL is not supported
- Fix wrong class Javadoc in DataStreamClientTest
- Remove stale section comment in DataStreamClientTest
Auto-formatted with google-java-format via spotless:apply (Java 17).
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive support for SQL Server (MSSQL) as a source in the Datastream-to-BigQuery pipeline. It enables schema discovery, metadata extraction, and proper type conversion, so that SQL Server data can be replicated to BigQuery. Additionally, the changes include necessary bug fixes for existing metadata handling and documentation updates to clarify supported source databases.
This PR re-introduces SQL Server (MSSQL) source support for the Datastream-to-BigQuery pipeline, addressing the issues that led to the revert in #3579:
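Additionally, this PR fixes a pre-existing bug where _metadata_lsn was incorrectly populated from the "database" field instead of the "lsn" field in the JSON format path.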
cc: @dhercher
Codecov Report

❌ Your patch check has failed because the patch coverage (54.83%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@ Coverage Diff @@
## main #3580 +/- ##
============================================
+ Coverage 52.12% 52.35% +0.22%
- Complexity 5644 6142 +498
============================================
Files 1040 1053 +13
Lines 63118 63419 +301
Branches 6922 6957 +35
============================================
+ Hits 32903 33202 +299
+ Misses 27981 27967 -14
- Partials 2234 2250 +16
@dhercher could you please take a look? It looks like all checks are passing now.
Summary
- SQL Server metadata extraction (_metadata_schema, _metadata_lsn, _metadata_tx_id) in both Avro and JSON format paths
- SQL Server sort key definitions (_metadata_timestamp, _metadata_lsn) for BigQuery MERGE operations
- Add _metadata_lsn to BigQuery default staging table schema
- Fix a pre-existing bug where _metadata_lsn was incorrectly populated from the "database" field instead of the "lsn" field in the JSON format path for PostgreSQL

Test plan
- DataStreamClient SQL Server methods (type conversion, RDBMS building, primary key extraction)
- testProcessElement_sqlServer integration test for FormatDatastreamJsonToJson
- testSqlServerSortFields test for DatastreamRow
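
The sort-field behavior that a test like testSqlServerSortFields would exercise can be sketched roughly as follows. This is not the actual DatastreamRow code: the class name SortFieldsSketch, the method sortFields, the source-type strings, and the single-field fallback for other sources are assumptions. Only the pairing of _metadata_timestamp with _metadata_lsn for SQL Server, shared with PostgreSQL, comes from the PR text.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of per-source sort keys used to order changes for MERGE. */
public class SortFieldsSketch {

  /** Returns the metadata columns used to order change records for a source type. */
  static List<String> sortFields(String sourceType) {
    // SQL Server and PostgreSQL share the same ordering: timestamp, then LSN.
    if ("postgresql".equals(sourceType) || "sqlserver".equals(sourceType)) {
      return Arrays.asList("_metadata_timestamp", "_metadata_lsn");
    }
    // Placeholder for other sources; their real sort keys are not shown here.
    return Collections.singletonList("_metadata_timestamp");
  }
}
```

Ordering by timestamp first and LSN second lets a MERGE pick the latest change per primary key even when several changes share the same timestamp.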