
CASSANALYTICS-168 Need the ability to broadcast and reconstruct subclasses on executors #205

Open
skoppu22 wants to merge 4 commits into apache:trunk from skoppu22:yif/config-context-extensible

Conversation

@skoppu22
Contributor

@skoppu22 skoppu22 commented May 8, 2026

After the BulkWriterConfig broadcast refactor f960685, the bulk writer's context, cluster, and config subclasses cannot be instantiated on executors. For any job whose driver-side context was instantiated from a subclass, the executor silently instantiates the base class implementations instead. Hence, we need the ability to broadcast and reconstruct subclasses on executors.
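A minimal sketch of the general technique, assuming the driver records the concrete class name alongside the broadcast data and the executor rebuilds the object reflectively. All names here (BaseWriterContext, CustomWriterContext, BroadcastPayload, capture, reconstruct) are hypothetical and do not reflect this PR's actual implementation. It also assumes the subclass is on the executor classpath and exposes a public (String) constructor.

```java
import java.io.Serializable;

public class SubclassBroadcastSketch
{
    // Hypothetical stand-in for a writer context; subclasses may change behaviour.
    public static class BaseWriterContext
    {
        protected final String jobId;

        public BaseWriterContext(String jobId)
        {
            this.jobId = jobId;
        }

        public String describe()
        {
            return "base:" + jobId;
        }
    }

    // Hypothetical subclass instantiated on the driver by a downstream project.
    public static class CustomWriterContext extends BaseWriterContext
    {
        public CustomWriterContext(String jobId)
        {
            super(jobId);
        }

        @Override
        public String describe()
        {
            return "custom:" + jobId;
        }
    }

    // Broadcast payload: plain serializable data plus the concrete class name.
    public static final class BroadcastPayload implements Serializable
    {
        private static final long serialVersionUID = 1L;
        final String contextClassName;
        final String jobId;

        BroadcastPayload(String contextClassName, String jobId)
        {
            this.contextClassName = contextClassName;
            this.jobId = jobId;
        }
    }

    // Driver side: record the runtime class, not just the declared base type.
    public static BroadcastPayload capture(BaseWriterContext context)
    {
        return new BroadcastPayload(context.getClass().getName(), context.jobId);
    }

    // Executor side: load the recorded class and call its public (String) constructor.
    public static BaseWriterContext reconstruct(BroadcastPayload payload)
    {
        try
        {
            Class<? extends BaseWriterContext> type = Class.forName(payload.contextClassName)
                                                           .asSubclass(BaseWriterContext.class);
            return type.getConstructor(String.class).newInstance(payload.jobId);
        }
        catch (ReflectiveOperationException e)
        {
            throw new IllegalStateException("Cannot reconstruct " + payload.contextClassName, e);
        }
    }

    public static void main(String[] args)
    {
        BroadcastPayload payload = capture(new CustomWriterContext("job-1"));
        System.out.println(reconstruct(payload).describe()); // prints "custom:job-1"
    }
}
```

If the executor only knows the declared base type, falling back to base class implementations, as described above, is the likely outcome; storing getClass().getName() preserves the runtime type across the broadcast.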

CircleCI link: https://app.circleci.com/pipelines/github/skoppu22/cassandra-analytics/116/workflows/6b53f2bd-017f-4dbe-917f-8d6794dfd24b


// Extract only broadcast-safe cluster metadata

// ClusterInfo has transient fields (CassandraContext, token mappings) that are not serializable
Contributor

Currently, the distinction between what is transient and what is not is implicit, inferred only from the wording in this method. How can we instead make it clear within ClusterInfo which fields are serializable and which are not? Brainstorming:

  • Javadoc comments
  • Using @Serializable / @Serial markers (somewhat overloading them beyond their formal usage, but it would annotate the intent)
  • Adding our own @Immutable-style annotation, or otherwise marking the fields final (or making them final where appropriate)

Having the serializability of these fields documented only in comments here is brittle and runs a real risk of drift: changes in ClusterInfo could easily break these contracts in the future without another maintainer realizing it.
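One way to make the custom-annotation option enforceable rather than purely descriptive is a small marker annotation plus a reflective check that a unit test could run, failing whenever a field is neither transient nor explicitly marked as broadcast-safe. BroadcastSafe, ExampleClusterInfo, and unmarkedFields are hypothetical names used for illustration, not existing project APIs.

```java
import java.io.Serializable;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class SerializabilityContractSketch
{
    // Hypothetical marker: the field is intentionally part of the broadcast payload.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface BroadcastSafe
    {
    }

    // Hypothetical cluster-info shape; not the real CassandraClusterInfo.
    public static class ExampleClusterInfo implements Serializable
    {
        private static final long serialVersionUID = 1L;

        @BroadcastSafe
        private final String clusterId = "cluster-a";

        // Driver-only state, explicitly excluded from serialization.
        private transient Object cassandraContext;

        // A newly added field whose serializability was never decided.
        private Object tokenMapping;
    }

    // Returns instance fields that are neither transient nor marked @BroadcastSafe.
    public static List<String> unmarkedFields(Class<?> type)
    {
        List<String> unmarked = new ArrayList<>();
        for (Field field : type.getDeclaredFields())
        {
            int mods = field.getModifiers();
            // Static, transient, and compiler-generated fields are out of scope.
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || field.isSynthetic())
            {
                continue;
            }
            if (!field.isAnnotationPresent(BroadcastSafe.class))
            {
                unmarked.add(field.getName());
            }
        }
        return unmarked;
    }

    public static void main(String[] args)
    {
        System.out.println(unmarkedFields(ExampleClusterInfo.class)); // prints [tokenMapping]
    }
}
```

A test asserting that unmarkedFields(...) is empty would flag the next unclassified field at build time instead of relying on a maintainer to notice a comment.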

Contributor Author

Removed these comments. In CassandraClusterInfo, I grouped the fields by serializability and added comments.
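A sketch of what grouping fields by serializability can look like, using a hypothetical ExampleClusterInfo (not the real CassandraClusterInfo) and plain Java serialization as a stand-in for a Spark broadcast. It shows that transient fields come back null after deserialization, so accessors should rebuild them rather than assume they survived.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class GroupedFieldsSketch
{
    // Hypothetical cluster-info class with fields grouped by serializability.
    public static class ExampleClusterInfo implements Serializable
    {
        private static final long serialVersionUID = 1L;

        // ---- Serializable state: included in the broadcast ----
        private final String clusterId;
        private final String localDc;

        // ---- Transient state: driver-only, null after deserialization ----
        private transient StringBuilder sessionState;

        public ExampleClusterInfo(String clusterId, String localDc)
        {
            this.clusterId = clusterId;
            this.localDc = localDc;
            this.sessionState = new StringBuilder("driver");
        }

        public String clusterId()
        {
            return clusterId;
        }

        public String localDc()
        {
            return localDc;
        }

        // Rebuilds transient state lazily instead of assuming it survived.
        public String sessionState()
        {
            if (sessionState == null)
            {
                sessionState = new StringBuilder("rebuilt");
            }
            return sessionState.toString();
        }
    }

    // Java serialization round trip, standing in for a Spark broadcast.
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T roundTrip(T value)
    {
        try
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes))
            {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray())))
            {
                return (T) in.readObject();
            }
        }
        catch (IOException | ClassNotFoundException e)
        {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }

    public static void main(String[] args)
    {
        ExampleClusterInfo original = new ExampleClusterInfo("cluster-a", "dc1");
        ExampleClusterInfo copy = roundTrip(original);
        // prints "driver -> cluster-a/rebuilt"
        System.out.println(original.sessionState() + " -> " + copy.clusterId() + "/" + copy.sessionState());
    }
}
```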

public BulkWriterContext toBulkWriterContext()
{
    BulkSparkConf conf = getConf();
    if (conf.isCoordinatedWriteConfigured())
Contributor

stylistic nit: you could rewrite this as:

        return conf.isCoordinatedWriteConfigured() ?
            new CassandraCoordinatedBulkWriterContext(this) :
            new CassandraBulkWriterContext(this);

Whether or not you think that's more clear is another story entirely. :)
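For comparison, a self-contained version of the suggested ternary, using hypothetical stand-ins for the config and the two context classes (the real code passes this to each context constructor, which is omitted here). Both forms select the same class, and because the method stays overridable, a config subclass can still return its own context type.

```java
public class ContextFactorySketch
{
    // Hypothetical stand-ins for the classes named in the review comment.
    interface BulkWriterContext
    {
    }

    static final class CassandraBulkWriterContext implements BulkWriterContext
    {
    }

    static final class CassandraCoordinatedBulkWriterContext implements BulkWriterContext
    {
    }

    static class WriterConfig
    {
        private final boolean coordinatedWriteConfigured;

        WriterConfig(boolean coordinatedWriteConfigured)
        {
            this.coordinatedWriteConfigured = coordinatedWriteConfigured;
        }

        boolean isCoordinatedWriteConfigured()
        {
            return coordinatedWriteConfigured;
        }

        // Ternary form from the review; picks the same class an if/else would.
        // Left non-final so a subclass can override it with its own context type.
        BulkWriterContext toBulkWriterContext()
        {
            return isCoordinatedWriteConfigured()
                   ? new CassandraCoordinatedBulkWriterContext()
                   : new CassandraBulkWriterContext();
        }
    }

    public static void main(String[] args)
    {
        // prints "CassandraCoordinatedBulkWriterContext"
        System.out.println(new WriterConfig(true).toBulkWriterContext().getClass().getSimpleName());
        // prints "CassandraBulkWriterContext"
        System.out.println(new WriterConfig(false).toBulkWriterContext().getClass().getSimpleName());
    }
}
```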


// Extract only broadcast-safe cluster metadata

// ClusterInfo has transient fields (CassandraContext, token mappings) that are not serializable
Contributor

Same as above: how can we make this more explicit near the source of the data and its serializability (and reflect the downstream expectation of that serializability), instead of having that information and expectation reflected only here?


BulkWriterContext context = customConfig.toBulkWriterContext();
assertThat(context).isNotNull();
// The OSS default would return CassandraBulkWriterContext or CassandraCoordinatedBulkWriterContext,
Contributor

"The OSS default": this is the OSS project. Is this comment from another context, and does it need to be refined here?

Contributor Author

Reworded this

return new CassandraClusterInfoGroup(clusterInfos);
}

@VisibleForTesting // ONLY FOR TESTING
Contributor

Why remove the annotation? I think it is still used only by test code.

Labels

None yet

Projects

None yet

3 participants