You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Cloud Datastream to BigQuery (DataStream_to_BigQuery / job label: cloud_datastream_to_bigquery)
Template Version
2025-11-11-00_rc00
What happened?
Expected behavior
When using MySQL Datastream in GTID mode, _metadata_log_file and _metadata_log_position should be populated in BigQuery — the same as in binlog mode — so that the order of records can be determined for the MERGE operation.
Actual behavior
When MySQL Datastream operates in GTID mode, the Avro source_metadata schema differs from binlog mode. The template hardcodes "log_file" and "log_position" as the field names to look up (L150-151), but in GTID mode Avro, these fields are named "binlog_file" and "binlog_position". Because getSourceMetadata() returns null when the field is absent from the schema (L243-L251), both _metadata_log_file and _metadata_log_position are written as null in BigQuery.
In other words, the ordering metadata is present in GTID-mode Avro, but the template does not extract it because it only supports the binlog-mode field names.
Impact
_metadata_log_file and _metadata_log_position are always null for records ingested via GTID mode
When multiple CDC changes for the same key occur within the same timestamp boundary, the template cannot use binlog ordering to deterministically rank them, which can break deduplication and upsert correctness in the generated MERGE logic.
Suggested fix
When source_metadata.log_file / source_metadata.log_position are absent, the template should also check for:
source_metadata.binlog_file
source_metadata.binlog_position
and map them to:
_metadata_log_file
_metadata_log_position
This would preserve ordering semantics for MySQL GTID-mode Datastream sources.
Note
Note: The related issue where GTID mode triggers the Oracle default sort branch (_metadata_scn) can be mitigated by setting the datastreamSourceType=mysql override parameter.
However, even with that workaround applied, _metadata_log_file and _metadata_log_position remain null. This report focuses on that remaining problem.
Related Template(s)
Cloud Datastream to BigQuery (
DataStream_to_BigQuery/ job label:cloud_datastream_to_bigquery)Template Version
2025-11-11-00_rc00
What happened?
Expected behavior
When using MySQL Datastream in GTID mode,
_metadata_log_fileand_metadata_log_positionshould be populated in BigQuery — the same as in binlog mode — so that the order of records can be determined for the MERGE operation.Actual behavior
When MySQL Datastream operates in GTID mode, the Avro
source_metadataschema differs from binlog mode. The template hardcodes"log_file"and"log_position"as the field names to look up (L150-151), but in GTID mode Avro, these fields are named"binlog_file"and"binlog_position". BecausegetSourceMetadata()returnsnullwhen the field is absent from the schema (L243-L251), both_metadata_log_fileand_metadata_log_positionare written as null in BigQuery.Binlog mode Avro:
{ "read_method": "mysql-cdc-binlog", "source_metadata": { "log_file": "mysql-bin-changelog.072901", "log_position": 33822 } }GTID mode Avro:
{ "read_method": "cdc", "source_metadata": { "gtid": "92e3e612-66e2-11f0-a5a7-0e004f023b83:22073", "binlog_file": "mysql-bin-changelog.073728", "binlog_position": 147862 } }In other words, the ordering metadata is present in GTID-mode Avro, but the template does not extract it because it only supports the binlog-mode field names.
Impact
_metadata_log_fileand_metadata_log_positionare always null for records ingested via GTID modeMERGElogic.Suggested fix
When
source_metadata.log_file/source_metadata.log_positionare absent, the template should also check for:source_metadata.binlog_filesource_metadata.binlog_positionand map them to:
_metadata_log_file_metadata_log_positionThis would preserve ordering semantics for MySQL GTID-mode Datastream sources.
Note
Note: The related issue where GTID mode triggers the Oracle default sort branch (_metadata_scn) can be mitigated by setting the
datastreamSourceType=mysqloverride parameter.However, even with that workaround applied, _metadata_log_file and _metadata_log_position remain null. This report focuses on that remaining problem.
Relevant log output