Steps to reproduce:
```sql
CREATE TABLE ref_table (key int);
SELECT create_reference_table('ref_table');

SET citus.enable_ddl_propagation TO OFF;
DROP TABLE ref_table CASCADE; -- citus_drop_all_shards doesn't drop shards and metadata

SELECT 1 FROM master_add_node('localhost', 9700, groupId => 0); -- a new node without the ref table

CREATE TABLE citus_local_table(id int, other_column int);
SELECT citus_add_local_table_to_metadata('citus_local_table'); -- triggers EnsureReferenceTablesExistOnAllNodes

-- ref_table has placements on both workers, but not on the coordinator
-- EnsureReferenceTablesExistOnAllNodes tries to replicate it to the coordinator
```
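After the `DROP TABLE` above, the stale state can be observed directly from the coordinator. This is a sketch against Citus's metadata tables (`pg_dist_shard`, `pg_dist_placement`); it only illustrates that the shard and placement rows for the dropped table survive:

```sql
-- The shell table is gone, but shard and placement rows for ref_table remain.
-- groupid identifies the worker group each placement supposedly lives on.
SELECT shardid, groupid
FROM pg_dist_shard
JOIN pg_dist_placement USING (shardid);
```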
`DROP TABLE` triggers some other drop functions:
- `master_remove_distributed_table_metadata_from_workers`
- `citus_drop_all_shards`
- `master_remove_partition_metadata`

but these functions return early without dropping anything, because `enable_ddl_propagation` is set to off.

The actual table is dropped on the coordinator, but the shards remain on the workers. `pg_dist_partition`, `pg_dist_shard` and `pg_dist_placement` are not touched either. So the metadata still says that we have a reference table, with shard placements on both workers.
When `master_add_node` is called, it introduces a new node which doesn't have the reference table on it.

`citus_add_local_table_to_metadata` calls `EnsureReferenceTablesExistOnAllNodes`, which fetches all reference table records from `pg_dist_partition`, including the dropped one. Then, because `pg_dist_placement` shows no placement for the reference table on the new node, it tries to replicate the reference table there by calling `citus_copy_shard_placement`.

`citus_copy_shard_placement` crashes while recreating the table on the target node, because it can't find the name/schema for the given relationId: that table doesn't exist anymore.
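The dangling entry is visible from psql as well. This sketch relies on the PostgreSQL behavior that a `regclass` value whose OID no longer resolves to a relation is printed as the bare OID, which is essentially the same name/schema lookup that fails inside `citus_copy_shard_placement`:

```sql
-- pg_dist_partition.logicalrelid is of type regclass. For the dropped table
-- the OID no longer has a pg_class entry, so it prints as a raw number
-- instead of schema.name.
SELECT logicalrelid, partmethod, repmodel
FROM pg_dist_partition;
```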
In short, I think we have two separate issues here:

- `DROP TABLE` doesn't drop the shards and the metadata when DDL propagation is off. We can maybe leave the shards on the workers, but we should delete the `pg_dist_` entries for the table, as we actually don't have that reference/distributed table anymore. This would prevent the crash, since there would be no entry in `pg_dist_partition` for the table. The shards left on the workers can either be cleaned up right away, ignoring the GUC, or we can create cleanup records for them so that they are dropped once the GUC is set back to on. We can also consider removing the GUC in the future.
- We should add some checks & errors to the `citus_copy_shard_placement` logic so that it never crashes for cases like this but gracefully errors out.
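Until the first issue is fixed, a manual workaround for the stale metadata might look like the following. This is a hedged sketch only: it deletes rows directly from Citus catalog tables, assumes superuser access, and ignores locking and metadata syncing that a proper fix inside Citus would need:

```sql
-- Hypothetical manual cleanup: remove pg_dist_* rows whose logicalrelid no
-- longer resolves to an existing relation (i.e. the table was dropped but the
-- metadata survived because enable_ddl_propagation was off).
BEGIN;
DELETE FROM pg_dist_placement
 WHERE shardid IN (SELECT shardid FROM pg_dist_shard
                   WHERE logicalrelid NOT IN (SELECT oid FROM pg_class));
DELETE FROM pg_dist_shard
 WHERE logicalrelid NOT IN (SELECT oid FROM pg_class);
DELETE FROM pg_dist_partition
 WHERE logicalrelid NOT IN (SELECT oid FROM pg_class);
COMMIT;
```

The orphaned shards on the workers would still need to be dropped there separately.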