Skip to content

bug: any and all operators incorrectly apply default frame and sort order with window function #11931

@ssabdb

Description

@ssabdb

What happened?

The expression:

t = ibis.table([("a", "string"), ("x", "int64")], name="t")
expr = t.filter((_.a == "hello").any().over(group_by=_.x))

incorrectly produces the SQL:

SELECT
  *
FROM `t` AS `t0`
QUALIFY
  LOGICAL_OR(`t0`.`a` = 'hello') OVER (
PARTITION BY `t0`.`x` 
ORDER BY NULL ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

This is incorrect and produces inconsistent results without any error, at least in bigquery. This is because ordering by null creates an undefined sort resulting in the ordering of the current row being pseudorandom when combined with the frame. This means the logical_or and logical_and expressions consider a different number of rows each time.

The correct expression should be:

SELECT
  *
FROM `t` AS `t0`
QUALIFY
  LOGICAL_OR(`t0`.`a` = 'hello') OVER (PARTITION BY `t0`.`x`)

I have proposed a fix in #11930, but I suspect further unittests will probably be required across a range of backends.

What version of ibis are you using?

11.0.0

What backend(s) are you using, if any?

Bigquery

Relevant log output

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugIncorrect behavior inside of ibis

    Type

    No type

    Projects

    Status

    backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions