
feat(Datastream): Add SQL Server (MSSQL) source support #3580

Open
pabloqc wants to merge 6 commits into GoogleCloudPlatform:main from pabloqc:feat/datastream-sqlserver-support


pabloqc commented Mar 27, 2026

Summary

  • Add full SQL Server (MSSQL) CDC source support to the Datastream-to-BigQuery pipeline
  • SQL Server metadata extraction (_metadata_schema, _metadata_lsn, _metadata_tx_id) in both Avro and JSON format paths
  • SQL Server sort key definitions (_metadata_timestamp, _metadata_lsn) for BigQuery MERGE operations
  • SQL Server schema discovery via Datastream API: table/column discovery, primary key extraction, and SQL Server-to-BigQuery type conversion
  • Add _metadata_lsn to BigQuery default staging table schema
  • Fix a pre-existing bug where _metadata_lsn was populated from the "database" field instead of the "lsn" field in the JSON format path for PostgreSQL
  • Fix incorrect documentation claiming PostgreSQL is not supported
  • Merge identical PostgreSQL/SQL Server branches where both share the same metadata logic
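
A minimal sketch of the kind of SQL Server-to-BigQuery type conversion listed above, assuming a case-insensitive lookup on the base type name with any length or precision suffix (such as "(7)" or "(max)") stripped first. The class name SqlServerTypeMapper, the exact mapping table, and the STRING fallback for unknown types are illustrative assumptions, not the PR's convertSqlServerToBigQueryColumnType implementation.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical SQL Server -> BigQuery type mapper; the table below is an assumption. */
final class SqlServerTypeMapper {

  private static final Map<String, String> TYPE_MAP = Map.ofEntries(
      // Character and other textual types.
      Map.entry("char", "STRING"),
      Map.entry("varchar", "STRING"),
      Map.entry("nchar", "STRING"),
      Map.entry("nvarchar", "STRING"),
      Map.entry("text", "STRING"),
      Map.entry("ntext", "STRING"),
      Map.entry("uniqueidentifier", "STRING"),
      Map.entry("xml", "STRING"),
      // Boolean and integer types.
      Map.entry("bit", "BOOL"),
      Map.entry("tinyint", "INT64"),
      Map.entry("smallint", "INT64"),
      Map.entry("int", "INT64"),
      Map.entry("bigint", "INT64"),
      // Approximate and exact numerics.
      Map.entry("real", "FLOAT64"),
      Map.entry("float", "FLOAT64"),
      Map.entry("decimal", "BIGNUMERIC"),
      Map.entry("numeric", "BIGNUMERIC"),
      Map.entry("money", "BIGNUMERIC"),
      Map.entry("smallmoney", "BIGNUMERIC"),
      // Binary types. In SQL Server, TIMESTAMP is a synonym for ROWVERSION (binary).
      Map.entry("binary", "BYTES"),
      Map.entry("varbinary", "BYTES"),
      Map.entry("image", "BYTES"),
      Map.entry("timestamp", "BYTES"),
      Map.entry("rowversion", "BYTES"),
      // Date and time types.
      Map.entry("date", "DATE"),
      Map.entry("time", "TIME"),
      Map.entry("datetime", "TIMESTAMP"),
      Map.entry("datetime2", "TIMESTAMP"),
      Map.entry("smalldatetime", "TIMESTAMP"),
      Map.entry("datetimeoffset", "TIMESTAMP"));

  private SqlServerTypeMapper() {}

  static String toBigQueryType(String sqlServerType) {
    if (sqlServerType == null) {
      return "STRING"; // assumed fallback
    }
    // Case-insensitive match on the base name: "DATETIME2(7)" -> "datetime2".
    String base = sqlServerType.trim().toLowerCase(Locale.ROOT);
    int paren = base.indexOf('(');
    if (paren >= 0) {
      base = base.substring(0, paren).trim();
    }
    // Unknown types (e.g. geography) fall back to STRING in this sketch.
    return TYPE_MAP.getOrDefault(base, "STRING");
  }
}
```

Mapping SQL Server's timestamp type to BYTES reflects that it is a binary row version rather than a date/time value; whether the PR treats it the same way is not stated here.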

Test plan

  • 16 new unit tests for DataStreamClient SQL Server methods (type conversion, RDBMS building, primary key extraction)
  • New testProcessElement_sqlServer integration test for FormatDatastreamJsonToJson
  • New testSqlServerSortFields test for DatastreamRow
  • All existing tests pass (no regressions)
  • Spotless formatting verified with Java 17
  • Checkstyle passes
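
The primary key extraction exercised by these tests can be sketched as a filter over discovered columns. This assumes each discovered column exposes a nullable Boolean primary-key flag; the Column record and primaryKeys method are hypothetical stand-ins for getSqlServerPrimaryKeys and the Datastream API objects. A null flag is treated as "not a key" so that the mixed, no, all, and null primary-key cases all behave predictably.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of extracting primary key column names from discovered columns. */
final class SqlServerPrimaryKeys {

  /** Stand-in for a discovered column; primaryKey may be null if the API omits it. */
  record Column(String name, Boolean primaryKey) {}

  private SqlServerPrimaryKeys() {}

  static List<String> primaryKeys(List<Column> columns) {
    List<String> keys = new ArrayList<>();
    for (Column column : columns) {
      // Boolean.TRUE.equals(...) treats a null flag as false instead of throwing an NPE.
      if (Boolean.TRUE.equals(column.primaryKey())) {
        keys.add(column.name());
      }
    }
    return keys; // preserves discovery order, which matters for composite keys
  }
}
```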

pabloqc added 5 commits March 27, 2026 21:21
Add full SQL Server CDC support to the Datastream-to-BigQuery pipeline:
- SQL Server metadata extraction (schema, lsn, tx_id) in Avro and JSON paths
- SQL Server sort key definitions for BigQuery MERGE operations
- SQL Server schema discovery via Datastream API (table/column discovery,
  primary key extraction, SQL Server-to-BigQuery type conversion)
- Add _metadata_lsn to BigQuery default staging table schema
- Update template documentation to list SQL Server as supported source

Also fix a pre-existing bug where _metadata_lsn was incorrectly populated
from the "database" field instead of the "lsn" field in the JSON format
path for both PostgreSQL and SQL Server sources.

…thods

Add 16 new tests covering:
- convertSqlServerToBigQueryColumnType: all type mappings (STRING, BOOL,
  INT64, FLOAT64, BIGNUMERIC, BYTES, DATE, TIME, TIMESTAMP), case
  insensitivity, unknown type fallback, and TIMESTAMP pattern matching
- buildSqlServerRdbmsForTable: RDBMS structure and special characters
- getSqlServerPrimaryKeys: PK extraction with mixed, no, all, and null
  primary key scenarios using mocked discoverSqlServerTableSchema

Combine the postgresql and sqlserver conditions with || where both
branches had identical logic (FormatDatastreamJsonToJson metadata
extraction and DatastreamRow sort fields).

- Merge postgresql and sqlserver branches in FormatDatastreamRecordToJson
  (same consolidation already done for the JSON path)
- Fix incorrect docs claiming PostgreSQL is not supported
- Fix wrong class Javadoc in DataStreamClientTest
- Remove stale section comment in DataStreamClientTest

Auto-formatted with google-java-format via spotless:apply (Java 17).
pabloqc requested review from a team as code owners March 27, 2026 20:22
pabloqc requested review from darshan-sj and rohitwali March 27, 2026 20:22
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive support for SQL Server (MSSQL) as a source in the Datastream-to-BigQuery pipeline. It enables schema discovery, metadata extraction, and type conversion so that SQL Server change data can be replicated to BigQuery. The changes also fix a bug in the existing metadata handling and update the documentation to list the supported source databases.

Highlights

  • SQL Server CDC Support: Added full SQL Server (MSSQL) CDC source support to the Datastream-to-BigQuery pipeline, including metadata extraction and schema discovery.
  • Metadata and Sorting: Implemented SQL Server metadata handling (_metadata_schema, _metadata_lsn, _metadata_tx_id) and defined sort keys for BigQuery MERGE operations.
  • Bug Fixes and Refactoring: Fixed a bug where _metadata_lsn was incorrectly populated for PostgreSQL and merged shared metadata logic between PostgreSQL and SQL Server.
  • Documentation: Updated documentation to correctly reflect that PostgreSQL and SQL Server are supported sources.
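
The sort keys decide which change wins when several events for the same primary key reach staging. A minimal sketch, assuming events are ordered by _metadata_timestamp with _metadata_lsn as the tie-breaker, and assuming LSNs arrive as fixed-width, consistently cased hex strings so that string order matches log order. ChangeEvent and LatestChangeSelector are hypothetical names; the pipeline applies these keys inside the BigQuery MERGE rather than in Java code like this.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration of "latest change wins" using the SQL Server sort keys. */
final class LatestChangeSelector {

  /** Simplified change event: primary key value, the two sort keys, and a payload. */
  record ChangeEvent(String key, long metadataTimestamp, String metadataLsn, String payload) {}

  // Order by _metadata_timestamp, then break ties with _metadata_lsn.
  // ASSUMPTION: LSNs are fixed-width hex strings, so lexicographic order equals LSN order.
  static final Comparator<ChangeEvent> SORT_ORDER =
      Comparator.comparingLong(ChangeEvent::metadataTimestamp)
          .thenComparing(ChangeEvent::metadataLsn);

  private LatestChangeSelector() {}

  /** Returns the greatest event by SORT_ORDER among events that share one primary key. */
  static Optional<ChangeEvent> latest(List<ChangeEvent> eventsForKey) {
    return eventsForKey.stream().max(SORT_ORDER);
  }
}
```

The LSN tie-breaker matters because several changes committed in the same transaction can share a source timestamp; without it, the winner among them would be arbitrary.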

pabloqc commented Mar 27, 2026

This PR re-introduces SQL Server (MSSQL) source support for the Datastream-to-BigQuery pipeline, addressing the issues that led to the revert in #3579:

  • Spotless: Verified passing locally with Java 17
  • Labels: Could a maintainer please add the "addition" and "Datastream" labels? (I don't have permission to set them as an external contributor.)
  • Test coverage: Added 16 new unit tests for DataStreamClient, plus tests for FormatDatastreamJsonToJson and DatastreamRow (the original PR had only 1.69% patch coverage)

Additionally, this PR fixes a pre-existing bug where _metadata_lsn was incorrectly reading from the "database" field instead of "lsn" in the JSON format path for PostgreSQL, and corrects the documentation that claimed PostgreSQL was not supported.
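
A minimal sketch of the corrected JSON-path logic with the merged postgresql || sqlserver branch. It assumes the event's source metadata has already been parsed into a Map and that the relevant keys are named schema, lsn, and tx_id; only "lsn" and "database" are named in the PR text, so the other key names, the class SourceMetadataSketch, and its extract method are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of copying CDC metadata into _metadata_* output fields. */
final class SourceMetadataSketch {

  private SourceMetadataSketch() {}

  static Map<String, Object> extract(String sourceType, Map<String, Object> sourceMetadata) {
    Map<String, Object> out = new LinkedHashMap<>();
    // PostgreSQL and SQL Server share identical metadata logic, so one branch serves both.
    if ("postgresql".equals(sourceType) || "sqlserver".equals(sourceType)) {
      out.put("_metadata_schema", sourceMetadata.get("schema"));
      // The bug fix: the log sequence number comes from "lsn", not from "database".
      out.put("_metadata_lsn", sourceMetadata.get("lsn"));
      out.put("_metadata_tx_id", sourceMetadata.get("tx_id"));
    }
    return out;
  }
}
```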

cc @Abacn @damccorm

pabloqc commented Mar 30, 2026

cc: @dhercher

codecov bot commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 54.83871% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.35%. Comparing base (2f91d55) to head (0c56c94).
⚠️ Report is 59 commits behind head on main.

Files with missing lines | Patch % | Lines
...teleport/v2/datastream/utils/DataStreamClient.java | 58.82% | 21 Missing ⚠️
...tream/transforms/FormatDatastreamRecordToJson.java | 33.33% | 3 Missing and 1 partial ⚠️
...eleport/v2/cdc/mappers/BigQueryDefaultSchemas.java | 0.00% | 1 Missing ⚠️
...astream/transforms/FormatDatastreamJsonToJson.java | 50.00% | 1 Missing ⚠️
...d/teleport/v2/datastream/values/DatastreamRow.java | 50.00% | 0 Missing and 1 partial ⚠️

❌ Your patch check has failed because the patch coverage (54.83%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
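
The patch figure is consistent with Codecov's coverage ratio, in which partially covered lines count against coverage. Summing the per-file table gives 26 missing and 2 partial lines (the 28 lines reported as missing coverage) plus 34 hit lines, for a 62-line patch; the project totals in the coverage diff check out the same way. The arithmetic below is a worked check derived only from numbers in this report.

```latex
\begin{aligned}
\text{coverage} &= \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}} \\[4pt]
\text{patch} &= \frac{34}{34 + 26 + 2} = \frac{34}{62} \approx 54.84\% \\[4pt]
\text{hits needed for } 80\% &= \lceil 0.80 \times 62 \rceil = \lceil 49.6 \rceil = 50
  \quad (\text{16 more covered lines, same 62-line patch}) \\[4pt]
\text{project (head)} &= \frac{33202}{33202 + 27967 + 2250} = \frac{33202}{63419} \approx 52.35\%
\end{aligned}
```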

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3580      +/-   ##
============================================
+ Coverage     52.12%   52.35%   +0.22%     
- Complexity     5644     6142     +498     
============================================
  Files          1040     1053      +13     
  Lines         63118    63419     +301     
  Branches       6922     6957      +35     
============================================
+ Hits          32903    33202     +299     
+ Misses        27981    27967      -14     
- Partials       2234     2250      +16     
Components | Coverage Δ
spanner-templates | 72.15% <ø> (+0.05%) ⬆️
spanner-import-export | 68.89% <ø> (+0.15%) ⬆️
spanner-live-forward-migration | 80.36% <ø> (-0.02%) ⬇️
spanner-live-reverse-replication | 77.82% <ø> (+0.02%) ⬆️
spanner-bulk-migration | 89.18% <ø> (-0.01%) ⬇️
gcs-spanner-dv | 85.32% <ø> (-0.03%) ⬇️
Files with missing lines | Coverage Δ
...ud/teleport/v2/templates/DataStreamToBigQuery.java | 0.00% <ø> (ø)
...eleport/v2/cdc/mappers/BigQueryDefaultSchemas.java | 0.00% <0.00%> (ø)
...astream/transforms/FormatDatastreamJsonToJson.java | 55.38% <50.00%> (ø)
...d/teleport/v2/datastream/values/DatastreamRow.java | 38.57% <50.00%> (ø)
...tream/transforms/FormatDatastreamRecordToJson.java | 51.42% <33.33%> (-1.15%) ⬇️
...teleport/v2/datastream/utils/DataStreamClient.java | 20.49% <58.82%> (+10.63%) ⬆️

... and 24 files with indirect coverage changes


damccorm added the "addition" label (New feature or request) Mar 31, 2026
damccorm commented

@dhercher could you please take a look? It looks like all checks are passing now.

rohitwali removed their request for review April 2, 2026 08:46
pabloqc requested a review from dhercher April 7, 2026 09:58

Labels

addition (New feature or request), size/XS


3 participants