What happened?
The expression:
t = ibis.table([("a", "string"), ("x", "int64")], name="t")
expr = t.filter((_.a == "hello").any().over(group_by=_.x))
incorrectly produces the SQL:
SELECT
*
FROM `t` AS `t0`
QUALIFY
LOGICAL_OR(`t0`.`a` = 'hello') OVER (
PARTITION BY `t0`.`x`
ORDER BY NULL ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
This is incorrect and produces inconsistent results without any error, at least in bigquery. This is because ordering by null creates an undefined sort resulting in the ordering of the current row being pseudorandom when combined with the frame. This means the logical_or and logical_and expressions consider a different number of rows each time.
The correct expression should be:
SELECT
*
FROM `t` AS `t0`
QUALIFY
LOGICAL_OR(`t0`.`a` = 'hello') OVER (PARTITION BY `t0`.`x`)
I have proposed a fix in #11930, but I suspect further unittests will probably be required across a range of backends.
What version of ibis are you using?
11.0.0
What backend(s) are you using, if any?
Bigquery
Relevant log output
Code of Conduct
What happened?
The expression:
incorrectly produces the SQL:
This is incorrect and produces inconsistent results without any error, at least in bigquery. This is because ordering by null creates an undefined sort resulting in the ordering of the current row being pseudorandom when combined with the frame. This means the
logical_orandlogical_andexpressions consider a different number of rows each time.The correct expression should be:
I have proposed a fix in #11930, but I suspect further unittests will probably be required across a range of backends.
What version of ibis are you using?
11.0.0
What backend(s) are you using, if any?
Bigquery
Relevant log output
Code of Conduct