
[WIP] Antalya 26.1; Remote initiator improvements 2 #1608

Open
ianton-ru wants to merge 3 commits into antalya-26.1 from feature/antalya-26.1/remote_initiator_improvements_2


@ianton-ru

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Allow object_storage_remote_initiator to be used without object_storage_cluster being set on the initial node.

Documentation entry for user-facing changes

Solves #1607
A query with object_storage_remote_initiator and object_storage_remote_initiator_cluster can now be executed when the object_storage_cluster setting is defined only on the remote node.

SELECT * FROM datalake.`iceberg.table` SETTINGS object_storage_remote_initiator=1, object_storage_remote_initiator_cluster='swarm_external_name'

is converted to

SELECT * FROM remote('random_node_from_swarm_external_name', iceberg(...))

The remote node random_node_from_swarm_external_name then executes

SELECT * FROM iceberg(...)

either locally or as a cluster table function, depending on object_storage_cluster, which now only needs to be defined on this node.
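The flow above can be pictured as two small steps: the initial node wraps the table function into remote(...), and the chosen node then decides on its own whether to fan out. This is a minimal illustration, not ClickHouse code; wrapInRemote, RemoteNodeSettings, ExecutionMode and chooseExecutionMode are hypothetical names, and picking a host from swarm_external_name is not modeled.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the query the initial node sends, mirroring
// the textual rewrite shown in the description.
std::string wrapInRemote(const std::string & host, const std::string & table_function)
{
    assert(!host.empty()); // a node must already have been picked from the remote-initiator cluster
    return "SELECT * FROM remote('" + host + "', " + table_function + ")";
}

// Hypothetical stand-in for the settings visible on the remote node.
struct RemoteNodeSettings
{
    std::string object_storage_cluster; // empty when not configured on this node
};

enum class ExecutionMode
{
    Local,   // the table function is read by this node alone
    Cluster  // the table function is executed as a cluster table function
};

// Illustrative only: the decision uses the remote node's own settings,
// so the initial node does not need object_storage_cluster at all.
ExecutionMode chooseExecutionMode(const RemoteNodeSettings & settings)
{
    return settings.object_storage_cluster.empty() ? ExecutionMode::Local : ExecutionMode::Cluster;
}
```

Keeping the decision on the remote side is what lets the initial node omit object_storage_cluster entirely.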

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@ianton-ru
Author

@codex review

@ianton-ru changed the title from "Antalya 26.1; Remote initiator improvements 2" to "[WIP] Antalya 26.1; Remote initiator improvements 2" on Apr 1, 2026
@github-actions

github-actions bot commented Apr 1, 2026

Workflow [PR], commit [b1efb77]


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a03617679

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

const auto & settings = context->getSettingsRef();

- is_cluster_function = !settings[Setting::object_storage_cluster].value.empty() && typename Base::Configuration().isClusterSupported();
+ is_cluster_function = typename Base::Configuration().isClusterSupported();

P1: Preserve simple-mode fallback when no cluster is configured

Assigning is_cluster_function from Base::Configuration().isClusterSupported() unconditionally forces s3(...)/iceberg(...) fallback functions into cluster mode even when object_storage_cluster is empty. In the CREATE TABLE ... AS <table_function> path, arguments are parsed before validateUseToCreateTable() is called, so this change now makes validateUseToCreateTable() throw for previously valid non-cluster statements like CREATE TABLE t AS s3(...). Please keep cluster mode gated by an actual configured cluster name (or a narrower remote-initiator condition) to avoid regressing non-cluster table-function creation.
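A minimal sketch of the gating this comment asks to keep, assuming only the two inputs it names: the configured object_storage_cluster value and isClusterSupported(). TableFunctionSettings, isClusterFunction and canUseToCreateTable are hypothetical; the last one is a simplified stand-in for the rejection validateUseToCreateTable() applies in cluster mode.

```cpp
#include <cassert>
#include <string>

// Hypothetical reduction of the settings that matter for this decision.
struct TableFunctionSettings
{
    std::string object_storage_cluster; // empty means "no cluster configured"
};

// Cluster mode only when the storage supports it AND a cluster name is set.
bool isClusterFunction(const TableFunctionSettings & settings, bool cluster_supported)
{
    return !settings.object_storage_cluster.empty() && cluster_supported;
}

// Simplified model of the CREATE TABLE ... AS <table_function> check:
// cluster-mode table functions are rejected there.
bool canUseToCreateTable(bool is_cluster_function)
{
    return !is_cluster_function;
}
```

With the unconditional assignment, isClusterFunction would collapse to just cluster_supported, so a plain CREATE TABLE t AS s3(...) would be rejected.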


@ianton-ru
Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ed463afe38

Comment on lines +369 to 370
if (make_cluster_function)
{

P1: Set cluster_name_in_settings on non-cluster rewrite

When make_cluster_function is false, this branch now skips setting cluster_name_in_settings, but updateQueryToSendIfNeeded() still uses that flag to decide whether the first argument is a cluster name (extractDynamicStorageType(..., !cluster_name_in_settings)). For engine-to-function rewrites in the new remote-initiator-without-local-cluster flow, this misclassifies the first real argument as a cluster name, so named-collection-based data lake queries can lose storage_type detection and fall back to S3 semantics on the remote side.
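To see why the flag matters, here is a hypothetical model of the argument handling described above: when cluster_name_in_settings is false, the first argument is treated as a cluster name and skipped. dataArguments is an illustrative name, not the signature of extractDynamicStorageType, and the argument values in the check are made up.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: when the cluster name is NOT carried in settings,
// the first argument is assumed to be the cluster name (as in a
// *Cluster('cluster', ...) call) and is dropped before the remaining
// arguments are inspected for storage_type.
std::vector<std::string> dataArguments(const std::vector<std::string> & args, bool cluster_name_in_settings)
{
    if (cluster_name_in_settings || args.empty())
        return args;
    return std::vector<std::string>(args.begin() + 1, args.end());
}
```

If the non-cluster rewrite leaves the flag unset, the first real argument (here a named collection) is discarded as if it were a cluster name.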


Comment on lines +118 to +119
is_cluster_function = (!settings[Setting::object_storage_cluster].value.empty() || !settings[Setting::object_storage_remote_initiator_cluster].value.empty())
&& typename Base::Configuration().isClusterSupported();

P2: Gate cluster fallback on remote initiator being enabled

This condition now flips to cluster-mode parsing whenever object_storage_remote_initiator_cluster is set, even if object_storage_remote_initiator is disabled. That changes plain s3/iceberg/... table-function behavior globally for sessions that carry only the cluster-name setting (for example from a profile), and can trigger cluster-only restrictions such as validateUseToCreateTable() rejecting create-table usage unexpectedly.
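One way to express the narrower condition this comment suggests, sketched with hypothetical names (ClusterSettings, isClusterFunctionGated) that only mirror the three settings mentioned: object_storage_remote_initiator_cluster counts only when object_storage_remote_initiator is enabled.

```cpp
#include <cassert>
#include <string>

// Hypothetical reduction of the three settings named in the review.
struct ClusterSettings
{
    std::string object_storage_cluster;
    bool object_storage_remote_initiator = false;
    std::string object_storage_remote_initiator_cluster;
};

// The remote-initiator cluster name only enables cluster-mode parsing
// when the remote initiator itself is switched on.
bool isClusterFunctionGated(const ClusterSettings & s, bool cluster_supported)
{
    const bool local_cluster = !s.object_storage_cluster.empty();
    const bool remote_cluster = s.object_storage_remote_initiator
        && !s.object_storage_remote_initiator_cluster.empty();
    return (local_cluster || remote_cluster) && cluster_supported;
}
```

A profile that only carries the cluster name then leaves plain table functions in simple mode.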


@ianton-ru force-pushed the feature/antalya-26.1/remote_initiator_improvements_2 branch from ed463af to c60f71f on April 1, 2026 14:27
@ianton-ru
Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. What shall we delve into next?

@ianton-ru force-pushed the feature/antalya-26.1/remote_initiator_improvements_2 branch from c60f71f to b1efb77 on April 1, 2026 15:42
@ianton-ru
Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b1efb77d2a

const auto & settings = context->getSettingsRef();
ASTPtr query_to_send = query_info.query;

if (!isClusterSupported() || cluster_name_from_settings.empty())

P2: Exclude non-cluster storages from remote-initiator rewrite

The new if (!isClusterSupported() || cluster_name_from_settings.empty()) branch now routes object_storage_remote_initiator=1 queries through updateQueryToSendIfNeeded(..., make_cluster_function=false) even when the storage explicitly reports isClusterSupported()==false. For datalake configs like StorageLocalPaimonConfiguration (used by PaimonLocal), StorageObjectStorageCluster::updateQueryForDistributedEngineIfNeeded cannot map the engine name (PaimonLocal) to a table function and throws a logical error before the remote call is built. Previously these queries fell back to pure local read; now they fail whenever remote initiator is enabled.
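The ordering this comment suggests can be sketched as a decision function that checks cluster support before considering the remote-initiator rewrite. ReadPlan and chooseReadPlan are hypothetical, and every case where a cluster name is set is collapsed into a single outcome.

```cpp
#include <cassert>
#include <string>

enum class ReadPlan
{
    LocalRead,       // plain local read (previous behavior for e.g. PaimonLocal)
    RemoteRewrite,   // wrap into remote(...) without a cluster function
    ClusterFunction  // cluster path, not modeled further here
};

// Storages without cluster support never enter the rewrite, so an engine
// with no matching table function is never asked for one.
ReadPlan chooseReadPlan(bool remote_initiator, bool cluster_supported, const std::string & cluster_name_from_settings)
{
    if (!cluster_supported)
        return ReadPlan::LocalRead;
    if (cluster_name_from_settings.empty())
        return remote_initiator ? ReadPlan::RemoteRewrite : ReadPlan::LocalRead;
    return ReadPlan::ClusterFunction;
}
```

Checking cluster support first restores the local-read fallback while keeping the new remote-initiator path for storages that can be rewritten.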

