-
Notifications
You must be signed in to change notification settings - Fork 206
[Bug] Matview Up-go-date maintenance feature is broken. #582
Description
Cloudberry Database version
Matview up-go-date maintenance feature in CloudBerry is subsystem designed to answer, whether matview data is up-to-dat or not (refresh needed).
On current HEAD, it is broken. Multiple ways.
There are three ways to fool gp_matview_aux that I have discovered. Maybe there are more.
- Table inheritance.
- Rewrite rules
- CTE
The latter two of them leads to incorrect query execution results
rewrite:
db2=# create table tt1(i int, j int);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'i' as the Cloudberry Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
db2=# create table tt2(i int, j int);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'i' as the Cloudberry Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
db2=# create materialized view mv1 as select * from ^C
db2=# insert into tt2 values(1,2),(2,2), (3,2),(1,4);
INSERT 0 4
db2=# create materialized view mv1 as select j from tt2 where i = 1;
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'j' as the Cloudberry Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 2
db2=# create rule r1 as on insert TO tt1 do also insert into tt2 values(1,1);
CREATE RULE
db2=# analyze mv1 ;
ANALYZE
db2=# analyze tt2;
ANALYZE
db2=# select j from tt2 where i = 1;
j
---
2
4
(2 rows)
db2=# table tt2;
i | j
---+---
2 | 2
3 | 2
1 | 2
1 | 4
(4 rows)
db2=# insert into tt1 values(1,1);
INSERT 0 1
db2=# table tt2;
i | j
---+---
2 | 2
3 | 2
1 | 2
1 | 4
1 | 1
(5 rows)
db2=# select j from tt2 where i = 1;
j
---
2
4
1
(3 rows)
db2=# set enable_answer_query_using_materialized_views to true;
SET
db2=# select j from tt2 where i = 1;
j
---
2
4
(2 rows)
db2=#
CTE:
db3=# set enable_answer_query_using_materialized_views to false;
SET
db3=# select count(1) from tt1 where i = 1;
count
-------
12
(1 row)
db3=# set enable_answer_query_using_materialized_views to true;
SET
db3=# select count(1) from tt1 where i = 1;
count
-------
12
(1 row)
db3=# with mod1 as (insert into tt1 values(1,1) returning *) select * from mod1;
i | j
---+---
1 | 1
(1 row)
db3=# select count(1) from tt1 where i = 1;
count
-------
12
(1 row)
db3=# set enable_answer_query_using_materialized_views to false;
SET
db3=# select count(1) from tt1 where i = 1;
count
-------
13
(1 row)
db3=#
I did not reproduce any incorrect result with table inheritance. This is bacause of has_subclass() check inside answer_query_using_materialized_views function.
The problem is here https://github.com/cloudberrydb/cloudberrydb/blob/main/src/backend/executor/execMain.c#L1047
We cannot rely on neither es_processed (rewrite rules case) nor (operation != CMD_SELECT) check (CTE case).
Rewrite rules case can be fixed via relhasrules check. This will disable AQUMV if relation once has rules. Including SELECT rules.
I dont have any fix for CTE for know. Maybe ill come up with something later.
Also, gp_matview_aux should be fixed. In says
db4=# create table tt1(i int, j int);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'i' as the Cloudberry Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
db4=# insert into tt1 select * from generate_series(1,10)a, generate_series(1,10)b;
INSERT 0 100
db4=# create table tt2 (i int) inherits(tt1);
NOTICE: table has parent, setting distribution columns to match parent table
NOTICE: merging column "i" with inherited definition
CREATE TABLE
db4=# insert into tt2 values(1,1);
INSERT 0 1
db4=# create materialized view m1 as select * from tt1 where i = 1;
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'i' as the Cloudberry Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 11
db4=# drop table tt2;
DROP TABLE
db4=# select ^C
db4=# table gp_matview_aux ;
mvoid | mvname | datastatus
-------+--------+------------
57503 | m1 | u
(1 rows)
It says datastatus = 'u' which is 100% false.
Also, why gp_matview_tables is shared catalog relation?
What happened
What you think should happen instead
How to reproduce
Operating System
Anything else
No response
Are you willing to submit PR?
- Yes, I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct.