feat: add BigLake Iceberg support for BigQuery analytics plugin #4749

Closed
caohy1988 wants to merge 2 commits into google:main from caohy1988:feat/biglake-iceberg-support


Conversation

@caohy1988

Summary

  • Adds biglake_storage_uri config option to BigQueryLoggerConfig that enables BigLake managed Iceberg table creation
  • Automatically replaces JSON schema fields with STRING (BigLake Iceberg does not support JSON type)
  • Sets BigLakeConfiguration (connection_id, storage_uri, file_format=PARQUET, table_format=ICEBERG) on table creation
  • Validates that connection_id is provided when biglake_storage_uri is set
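
The JSON-to-STRING replacement described above can be sketched roughly as follows. This is an illustration only, not the plugin's actual code: the function name mirrors the `_replace_json_with_string` helper mentioned in the changelog, but the simplified dict-based field representation (instead of `bigquery.SchemaField` objects) is an assumption.

```python
def replace_json_with_string(fields):
  """Recursively rewrite JSON-typed fields to STRING.

  `fields` is a list of dicts with "name", "type", and an optional
  nested "fields" list (for RECORD types) -- a simplified stand-in
  for real BigQuery schema field objects.
  """
  out = []
  for field in fields:
    new = dict(field)  # copy so the original schema is untouched
    if new["type"] == "JSON":
      # BigLake managed Iceberg tables have no JSON type, so the
      # payload is stored as a serialized STRING instead.
      new["type"] = "STRING"
    if new.get("fields"):
      new["fields"] = replace_json_with_string(new["fields"])
    out.append(new)
  return out


schema = [
    {"name": "event_id", "type": "STRING"},
    {"name": "payload", "type": "JSON"},
    {"name": "meta", "type": "RECORD",
     "fields": [{"name": "extras", "type": "JSON"}]},
]
biglake_schema = replace_json_with_string(schema)
```

The recursion matters for RECORD fields: a JSON column nested inside a struct must also be rewritten, or table creation would still fail.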

Test plan

  • 11 new tests in TestBigLakeIceberg covering config, property, validation, schema transformation, Arrow metadata, and table creation
  • All 197 existing tests pass
  • Autoformatting applied

🤖 Generated with Claude Code

Add `biglake_storage_uri` config option that, when set alongside
`connection_id`, automatically creates BigLake managed Iceberg tables
and replaces JSON schema fields with STRING (since BigLake Iceberg
does not support JSON type).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@google-cla

google-cla bot commented Mar 7, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the BigQuery analytics plugin by integrating support for BigLake managed Iceberg tables. This allows users to leverage BigLake's capabilities for external table management, providing a more flexible and scalable data storage solution. The changes include new configuration options, automatic schema adjustments to ensure compatibility with Iceberg's data types, and robust validation to guide proper usage.

Highlights

  • BigLake Iceberg Configuration: Introduced a new biglake_storage_uri option in BigQueryLoggerConfig to enable BigLake managed Iceberg table creation.
  • Schema Transformation: Implemented automatic conversion of JSON schema fields to STRING type, as BigLake Iceberg does not support JSON.
  • BigLake Table Creation: Configured BigLakeConfiguration (including connection_id, storage_uri, file_format=PARQUET, table_format=ICEBERG) during BigQuery table creation when BigLake is enabled.
  • Input Validation: Added validation to ensure that a connection_id is provided when biglake_storage_uri is specified.
  • Test Coverage: Added 11 new tests specifically for BigLake Iceberg functionality, covering configuration, properties, validation, schema transformation, Arrow metadata, and table creation.
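
A minimal sketch of how the validation and `is_biglake` property described above might fit together. The class and field names follow the summary, but this is a simplified stand-in, not the plugin's actual implementation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BigQueryLoggerConfig:
  """Simplified stand-in for the plugin's config object."""
  connection_id: Optional[str] = None
  biglake_storage_uri: Optional[str] = None  # e.g. "gs://bucket/prefix"


class BigQueryAgentAnalyticsPlugin:
  def __init__(self, config: BigQueryLoggerConfig):
    # Fail fast: a BigLake Iceberg table cannot be created without a
    # BigQuery connection, so reject the combination at construction
    # time rather than at first table creation.
    if config.biglake_storage_uri and not config.connection_id:
      raise ValueError(
          "connection_id is required when biglake_storage_uri is set."
      )
    self.config = config

  @property
  def is_biglake(self) -> bool:
    return bool(self.config.biglake_storage_uri)
```

Centralizing the check in the constructor (the "fail-fast" approach) is what the review comment below argues for, since any later `is_biglake` branch can then assume `connection_id` is present.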


Changelog
  • src/google/adk/plugins/bigquery_agent_analytics_plugin.py
    • Added biglake_storage_uri field to BigQueryLoggerConfig.
    • Implemented _replace_json_with_string function to convert JSON schema fields to STRING.
    • Modified _get_events_schema to conditionally apply JSON to STRING conversion based on a new biglake parameter.
    • Added a validation check in the __init__ method to enforce connection_id presence when biglake_storage_uri is set.
    • Introduced an is_biglake property to determine if BigLake functionality is enabled.
    • Updated _lazy_setup to pass the is_biglake flag to _get_events_schema.
    • Enhanced _ensure_schema_exists to set BigLakeConfiguration properties for BigLake tables.
  • tests/unittests/plugins/test_bigquery_agent_analytics_plugin.py
    • Created TestBigLakeIceberg class to house new tests.
    • Added tests for biglake_storage_uri configuration and default value.
    • Included tests for the is_biglake property's behavior.
    • Added a test to verify connection_id is required with biglake_storage_uri.
    • Implemented tests to confirm JSON fields are replaced with STRING in BigLake schemas.
    • Added a test to ensure Arrow schema metadata does not contain JSON types for BigLake.
    • Included a test to verify BigLakeConfiguration is correctly set during table creation.
    • Added tests to confirm non-BigLake schemas remain unchanged and for the _replace_json_with_string helper.
Activity
  • No human activity has been recorded for this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@adk-bot added the `services` label ([Component] This issue is related to runtime services, e.g. sessions, memory, artifacts, etc.) on Mar 7, 2026
@adk-bot
Collaborator

adk-bot commented Mar 7, 2026

Response from ADK Triaging Agent

Hello @caohy1988, thank you for your contribution!

Before we can merge this PR, could you please sign our Contributor License Agreement (CLA)? You can find more information at https://cla.developers.google.com/.

Also, for a new feature of this scope, please create a GitHub issue that describes the feature and associate it with this pull request.

This will help us to track the new feature and review your PR more efficiently. Thanks!


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for BigLake managed Iceberg tables to the BigQuery analytics plugin. The changes are well-implemented, including the addition of the biglake_storage_uri configuration, schema transformation to handle JSON types, and setting the BigLakeConfiguration on table creation. The new tests are comprehensive and cover the new functionality thoroughly. I have one suggestion to remove a redundant validation check to improve code maintainability.

Comment on lines +2193 to +2198

    if not self.config.connection_id:
      raise ValueError(
          "connection_id is required for BigLake Iceberg tables."
          " Set it in BigQueryLoggerConfig."
      )


Severity: medium

This validation check for connection_id is redundant. The __init__ method of BigQueryAgentAnalyticsPlugin already performs an identical check on lines 1905-1908. If self.is_biglake is true, it implies self.config.biglake_storage_uri is set, and the constructor would have already ensured self.config.connection_id is also set.

Removing this duplicate check will improve maintainability by centralizing the validation logic in the constructor, adhering to the "fail-fast" principle. The suggested change also removes the now-unnecessary blank line.

1. Normalize connection_id to full resource path for BigLakeConfiguration
   (projects/{project}/locations/{loc}/connections/{name}).
2. Skip time partitioning for BigLake Iceberg by default (preview feature);
   add biglake_time_partitioning opt-in flag.
3. Document Storage Write API latency caveat for Iceberg metadata refresh
   (~90 min for open-source engine visibility).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
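
The connection_id normalization described in point 1 of the commit above could be sketched as follows. This is illustrative only, not the PR's actual code: the function name and the handling of the dotted `location.connection` shorthand are assumptions.

```python
def normalize_connection_id(connection_id, project, location):
  """Expand a bare connection name to the full BigQuery resource path.

  BigLakeConfiguration expects
  "projects/{project}/locations/{location}/connections/{name}",
  but users often configure only the short connection name.
  """
  if connection_id.startswith("projects/"):
    return connection_id  # already fully qualified
  if "." in connection_id:
    # Assumed "us.my-conn" style shorthand: location embedded in the id.
    loc, _, name = connection_id.partition(".")
    return f"projects/{project}/locations/{loc}/connections/{name}"
  return (
      f"projects/{project}/locations/{location}/connections/{connection_id}"
  )
```

Normalizing once, before building the table configuration, keeps the rest of the table-creation path free of string-format special cases.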
@caohy1988 caohy1988 closed this Mar 7, 2026

Labels

services [Component] This issue is related to runtime services, e.g. sessions, memory, artifacts, etc


2 participants