Skip to content

[Bug] Matview Up-go-date maintenance feature is broken.  #582

@reshke

Description

@reshke

Cloudberry Database version

Matview up-go-date maintenance feature in CloudBerry is subsystem designed to answer, whether matview data is up-to-dat or not (refresh needed).

On current HEAD, it is broken. Multiple ways.

There are three ways to fool gp_matview_aux that I have discovered. Maybe there are more.

  1. Table inheritance.
  2. Rewrite rules
  3. CTE

The latter two of them leads to incorrect query execution results

rewrite:


db2=# create table tt1(i int, j int);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'i' as the Cloudberry Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
db2=# create table tt2(i int, j int);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'i' as the Cloudberry Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
db2=# create materialized view mv1 as select * from ^C
db2=# insert into tt2 values(1,2),(2,2), (3,2),(1,4);
INSERT 0 4
db2=# create materialized view mv1 as select j from tt2 where i = 1;
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'j' as the Cloudberry Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 2
db2=# create rule r1 as on insert TO tt1 do also insert into tt2 values(1,1);
CREATE RULE
db2=# analyze mv1 ;
ANALYZE
db2=# analyze tt2;
ANALYZE
db2=# select j from tt2 where i = 1;
 j
---
 2
 4
(2 rows)

db2=# table tt2;
 i | j
---+---
 2 | 2
 3 | 2
 1 | 2
 1 | 4
(4 rows)

db2=# insert into tt1 values(1,1);
INSERT 0 1
db2=# table tt2;
 i | j
---+---
 2 | 2
 3 | 2
 1 | 2
 1 | 4
 1 | 1
(5 rows)

db2=# select j from tt2 where i = 1;
 j
---
 2
 4
 1
(3 rows)
db2=# set enable_answer_query_using_materialized_views to true;
SET
db2=# select j from tt2 where i = 1;
 j
---
 2
 4
(2 rows)

db2=#

CTE:


db3=# set enable_answer_query_using_materialized_views to false;
SET
db3=# select count(1) from tt1 where i = 1;
 count
-------
    12
(1 row)

db3=# set enable_answer_query_using_materialized_views to true;
SET
db3=# select count(1) from tt1 where i = 1;
 count
-------
    12
(1 row)

db3=# with mod1 as (insert into tt1 values(1,1) returning *) select * from mod1;
 i | j
---+---
 1 | 1
(1 row)

db3=# select count(1) from tt1 where i = 1;
 count
-------
    12
(1 row)

db3=# set enable_answer_query_using_materialized_views to false;
SET
db3=# select count(1) from tt1 where i = 1;
 count
-------
    13
(1 row)

db3=#

I did not reproduce any incorrect result with table inheritance. This is bacause of has_subclass() check inside answer_query_using_materialized_views function.

The problem is here https://github.com/cloudberrydb/cloudberrydb/blob/main/src/backend/executor/execMain.c#L1047
We cannot rely on neither es_processed (rewrite rules case) nor (operation != CMD_SELECT) check (CTE case).

Rewrite rules case can be fixed via relhasrules check. This will disable AQUMV if relation once has rules. Including SELECT rules.

I dont have any fix for CTE for know. Maybe ill come up with something later.

Also, gp_matview_aux should be fixed. In says



db4=# create table tt1(i int, j int);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'i' as the Cloudberry Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
db4=# insert into tt1 select * from generate_series(1,10)a, generate_series(1,10)b;
INSERT 0 100
db4=# create table tt2 (i int) inherits(tt1);
NOTICE:  table has parent, setting distribution columns to match parent table
NOTICE:  merging column "i" with inherited definition
CREATE TABLE
db4=# insert into tt2 values(1,1);
INSERT 0 1
db4=# create materialized view m1 as select * from tt1 where i = 1;
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'i' as the Cloudberry Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 11
db4=# drop table tt2;
DROP TABLE
db4=# select ^C

db4=# table gp_matview_aux ;
 mvoid | mvname | datastatus
-------+--------+------------
 57503 | m1     | u
(1 rows)

It says datastatus = 'u' which is 100% false.

Also, why gp_matview_tables is shared catalog relation?

What happened

What you think should happen instead

How to reproduce

Operating System

Anything else

No response

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

Labels

type: BugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions