CASSANALYTICS-168 Need the ability to broadcast and reconstruct subclasses on executors by skoppu22 · Pull Request #205 · apache/cassandra-analytics

skoppu22 · 2026-05-08T12:29:19Z

After the BulkWriterConfig broadcast refactor (f960685), the bulk writer's context, cluster, and config subclasses can no longer be instantiated on executors. For any job whose driver-side context was instantiated from a subclass, the executor silently instantiates the base class implementations instead. This PR adds the ability to broadcast and reconstruct subclasses on executors.
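To illustrate the failure mode, here is a minimal sketch and not the project's code. It assumes the broadcast config exposes a factory method that a subclass can override. Every type name below (`WriterConfig`, `CustomWriterConfig`, `WriterContext`, and so on) is hypothetical, and plain Java serialization stands in for the Spark broadcast.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BroadcastSubclassSketch
{
    // Hypothetical stand-in for a bulk writer context; not the project's real type
    interface WriterContext
    {
        String describe();
    }

    static class BaseWriterContext implements WriterContext
    {
        public String describe() { return "base"; }
    }

    static class CustomWriterContext implements WriterContext
    {
        public String describe() { return "custom"; }
    }

    // Hypothetical broadcastable config; the factory method is the extension point
    static class WriterConfig implements Serializable
    {
        private static final long serialVersionUID = 1L;

        WriterContext toWriterContext() { return new BaseWriterContext(); }
    }

    static class CustomWriterConfig extends WriterConfig
    {
        private static final long serialVersionUID = 1L;

        @Override
        WriterContext toWriterContext() { return new CustomWriterContext(); }
    }

    // Simulates shipping the config from driver to executor via Java serialization
    static WriterConfig roundTrip(WriterConfig config)
    {
        try
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes))
            {
                out.writeObject(config);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray())))
            {
                return (WriterConfig) in.readObject();
            }
        }
        catch (IOException | ClassNotFoundException e)
        {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    // Failure mode: the executor hard-codes the base implementation
    static WriterContext reconstructIgnoringSubclass(WriterConfig config)
    {
        return new BaseWriterContext();
    }

    // Extensible variant: the executor delegates to the config's runtime type
    static WriterContext reconstructViaConfig(WriterConfig config)
    {
        return config.toWriterContext();
    }

    public static void main(String[] args)
    {
        WriterConfig onExecutor = roundTrip(new CustomWriterConfig());
        System.out.println(reconstructIgnoringSubclass(onExecutor).describe()); // prints "base"
        System.out.println(reconstructViaConfig(onExecutor).describe());        // prints "custom"
    }
}
```

The serialized object keeps its runtime class, so delegating reconstruction to the config itself is enough in this sketch for the subclass to survive the trip.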

Circle CI link: https://app.circleci.com/pipelines/github/skoppu22/cassandra-analytics/116/workflows/6b53f2bd-017f-4dbe-917f-8d6794dfd24b

jmckenzie-dev · 2026-05-08T16:05:23Z

+
+        // Extract only broadcast-safe cluster metadata
+
+        // ClusterInfo has transient fields (CassandraContext, token mappings) that are not serializable


Currently the distinction between what is transient and what is not is implicit, derived only from the wording in this method. How can we instead make it clear within ClusterInfo itself which fields are serializable and which are not? Brainstorming:

javadoc comments

use of @Serializable and @Serial style annotations (somewhat overloading them and using them differently from their formal usage, but they would annotate the intent)

adding our own @Immutable style annotation, or otherwise marking the fields final, or pushing them toward being final if appropriate

Having the serializability state of these fields denoted here in comments is brittle and runs a real risk of drift; changes in ClusterInfo could easily break these contracts in the future without another maintainer realizing it.

Removed these comments. In CassandraClusterInfo, grouped the fields by serializable state and added comments.
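A minimal sketch of what grouping fields by serializable state could look like, assuming the class is shipped with Java serialization. `ClusterInfoSketch` and its field names are hypothetical and do not reflect the actual fields of CassandraClusterInfo. The `transient` keyword makes the contract enforceable by the language instead of living only in a comment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class FieldGroupingSketch
{
    // Hypothetical stand-in; field names are illustrative only
    static class ClusterInfoSketch implements Serializable
    {
        private static final long serialVersionUID = 1L;

        // ---- Serializable state: safe to broadcast to executors ----
        private final String clusterId;
        private final int replicationFactor;

        // ---- Transient state: driver-local, skipped by serialization ----
        private transient Object liveConnection;

        ClusterInfoSketch(String clusterId, int replicationFactor, Object liveConnection)
        {
            this.clusterId = clusterId;
            this.replicationFactor = replicationFactor;
            this.liveConnection = liveConnection;
        }

        String clusterId() { return clusterId; }

        int replicationFactor() { return replicationFactor; }

        Object liveConnection() { return liveConnection; }
    }

    // Simulates a broadcast by serializing and deserializing the object
    static ClusterInfoSketch roundTrip(ClusterInfoSketch info)
    {
        try
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes))
            {
                out.writeObject(info);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray())))
            {
                return (ClusterInfoSketch) in.readObject();
            }
        }
        catch (IOException | ClassNotFoundException e)
        {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args)
    {
        // new Object() is not Serializable; it is tolerated only because the field is transient
        ClusterInfoSketch copy = roundTrip(new ClusterInfoSketch("cluster-1", 3, new Object()));
        System.out.println(copy.clusterId());              // prints "cluster-1"
        System.out.println(copy.liveConnection() == null); // prints "true"
    }
}
```

If someone later adds a non-serializable field to the serializable group without marking it `transient`, the round trip fails with an exception, so a test like this can catch the drift the review comment warns about.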

jmckenzie-dev · 2026-05-08T16:27:49Z

+    public BulkWriterContext toBulkWriterContext()
+    {
+        BulkSparkConf conf = getConf();
+        if (conf.isCoordinatedWriteConfigured())


stylistic nit: you could rewrite this as:

return conf.isCoordinatedWriteConfigured() ? new CassandraCoordinatedBulkWriterContext(this) : new CassandraBulkWriterContext(this);

Whether or not you think that's more clear is another story entirely. :)

jmckenzie-dev · 2026-05-08T16:28:44Z

+
+        // Extract only broadcast-safe cluster metadata
+
+        // ClusterInfo has transient fields (CassandraContext, token mappings) that are not serializable


Same as above - how can we make this more explicit near the source of the data and its serializability (and reflect the downstream expectation of that serializability) instead of having that information and expectation only reflected here?

jmckenzie-dev · 2026-05-08T16:29:30Z

+
+        BulkWriterContext context = customConfig.toBulkWriterContext();
+        assertThat(context).isNotNull();
+        // The OSS default would return CassandraBulkWriterContext or CassandraCoordinatedBulkWriterContext,


"The OSS default": this is the OSS project. Is this comment carried over from another context, and does it need refining here?

Reworded this.

yifan-c · 2026-05-08T18:35:00Z

        return new CassandraClusterInfoGroup(clusterInfos);
    }

-    @VisibleForTesting // ONLY FOR TESTING


Why remove the annotation? I think it is still only used by test code.

skoppu22 added 2 commits May 7, 2026 19:28

Enable extensibility for bulk writer broadcast/reconstruction

b037b60

remove import

3a6ca2a

jmckenzie-dev reviewed May 8, 2026


resolve comments

bbf1f42

skoppu22 commented May 8, 2026


Comment thread ...ion-tests/src/test/java/org/apache/cassandra/analytics/BulkReaderMultiDCConsistencyTest.java

add comment

190b988

jmckenzie-dev approved these changes May 8, 2026


yifan-c reviewed May 8, 2026

skoppu22 wants to merge 4 commits into apache:trunk from skoppu22:yif/config-context-extensible
