ROW vs STATEMENT vs MIXED Binary Log Formats for Archiving and PITR Automation

Binary log format is not a passive replication tuning knob; it is the contract that decides whether an archived transaction can be replayed byte-for-byte or only approximately. When you automate point-in-time recovery (PITR) and long-term binary log retention, the choice between ROW, STATEMENT, and MIXED directly governs three things that cannot be renegotiated after an incident: replay determinism, archive storage economics, and the auditability required for compliance. The naive approach — treat binlog_format as a static value set once in my.cnf and never checked again — fails precisely at the moment it matters, because a session can override the global format, a version upgrade can shift the default, and a single MIXED file can carry both statement and row events. An archival control plane must therefore treat format as runtime state to be detected and validated on every cycle, not configuration to be assumed. This guide builds that pipeline on the event model and lifecycle established in MySQL Binary Log Architecture & GTID Fundamentals.

Visual Overview

The format decision drives everything that follows: PITR automation forces the choice to ROW, and every downstream stage (detection, routing, verification, purge) inherits the guarantees that choice provides.

Core Concept & Prerequisites

Each format serializes a data modification differently, and that serialization is the only thing your archive actually holds:

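STATEMENT logs the original SQL text verbatim. Archives stay compact, but replay determinism is fragile: UUID(), RAND(), SYSDATE(), non-deterministic UDFs, LIMIT without ORDER BY in UPDATE/DELETE/INSERT ... SELECT, and references to most system variables can all replay differently against a different clock, seed, host, or row order. NOW() and CURRENT_TIMESTAMP() are not on that list, because each Query event records the statement's timestamp. The event also carries session context such as sql_mode, character set, and time zone, but nothing in the log pins down server-side function behaviour or the physical row order a statement saw, so faithful replay of a statement-format archive is never guaranteed.
ROW logs the before-and-after column images of every changed row. Replay writes those images back exactly, regardless of non-deterministic functions or row order, which is the entire point of PITR. Row events are not immune to schema drift, though: they address columns by position, so replay assumes the target table's column layout matches the source. The cost is a larger footprint, especially for bulk UPDATE/DELETE. binlog_row_image (available since MySQL 5.6) mitigates this: MINIMAL logs only the columns needed to identify each row (the primary key, when one exists) plus the changed columns, and NOBLOB omits BLOB/TEXT columns the change does not need, shrinking payloads without sacrificing replay accuracy.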
MIXED defaults to statement logging and switches to row logging when the server detects an unsafe construct. It reads as a compromise, but for an archiver it is the hardest case: a single file can contain both event types, so the reader needs two deserialization paths and cannot assume a stable format boundary across a retention window.
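The unsafe constructs listed for STATEMENT can be screened for before a statement-format archive is trusted. A minimal sketch, assuming you only have the statement text from a Query event: `unsafe_reasons` and its pattern table are hypothetical and cover only a subset of the constructs named above, not the full rule set MySQL applies when it raises an unsafe-statement warning.

```python
import re

# Hypothetical pre-screen: an illustrative subset of constructs that MySQL
# treats as unsafe for statement logging. This is a text scan, not the
# server's own classifier, which applies many more rules.
UNSAFE_CALLS: dict[str, re.Pattern[str]] = {
    "UUID()": re.compile(r"\bUUID(?:_SHORT)?\s*\(", re.IGNORECASE),
    "RAND()": re.compile(r"\bRAND\s*\(", re.IGNORECASE),
    "SYSDATE()": re.compile(r"\bSYSDATE\s*\(", re.IGNORECASE),
}
_DML = re.compile(r"^\s*(?:INSERT|REPLACE|UPDATE|DELETE)\b", re.IGNORECASE)
_LIMIT = re.compile(r"\bLIMIT\b", re.IGNORECASE)
_ORDER_BY = re.compile(r"\bORDER\s+BY\b", re.IGNORECASE)


def unsafe_reasons(sql: str) -> list[str]:
    """Return heuristic reasons this statement may replay non-deterministically."""
    reasons = [name for name, pattern in UNSAFE_CALLS.items() if pattern.search(sql)]
    # LIMIT without ORDER BY in DML picks an unspecified subset of rows,
    # so a replay may touch different rows than the original did.
    if _DML.search(sql) and _LIMIT.search(sql) and not _ORDER_BY.search(sql):
        reasons.append("LIMIT without ORDER BY")
    return reasons


# NOW() is deliberately absent: the binary log records each statement's
# timestamp, so NOW() replays with its original value.
```

The scan is deliberately conservative in what it claims: an empty result means "none of these patterns matched", not "safe to replay".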

For PITR automation, ROW is the standard, and it is the complementary gate to a gap-free GTID Tracking & Enforcement pipeline: GTIDs prove no transaction is missing, while ROW proves each surviving transaction replays identically. A contiguous GTID set built from statement-format events can still lose fidelity. The two enforcement layers cover different failure modes and neither substitutes for the other.
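The independence of the two gates can be made concrete. A minimal sketch, assuming untagged GTID sets of the `uuid:1-5:8-10` form and a format value already recorded for the segment; `parse_gtid_set`, `missing_intervals`, and `pitr_gate` are hypothetical helpers, not MySQL functions (the server-side tool for GTID set arithmetic is `GTID_SUBTRACT()`).

```python
# Sketch of the two complementary PITR gates. Assumes untagged GTID sets
# such as "uuid:1-5:8-10,uuid2:1-3" (MySQL 8.3+ tagged GTIDs not handled).


def parse_gtid_set(text: str) -> dict[str, list[tuple[int, int]]]:
    """Parse a GTID set into {source_uuid: sorted [(start, end), ...]}."""
    result: dict[str, list[tuple[int, int]]] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        spans = result.setdefault(uuid.lower(), [])
        for r in ranges:
            start, _, end = r.partition("-")
            spans.append((int(start), int(end or start)))
        spans.sort()
    return result


def missing_intervals(
    gtid_set: str, uuid: str, first: int, last: int
) -> list[tuple[int, int]]:
    """Gap gate: sub-ranges of uuid:first-last that gtid_set does not contain."""
    gaps: list[tuple[int, int]] = []
    cursor = first
    for start, end in parse_gtid_set(gtid_set).get(uuid.lower(), []):
        if end < cursor:
            continue  # interval lies entirely before what is left to cover
        if start > last:
            break
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = end + 1
        if cursor > last:
            break
    if cursor <= last:
        gaps.append((cursor, last))
    return gaps


def pitr_gate(binlog_format: str, gaps: list[tuple[int, int]]) -> bool:
    """Both gates must pass: nothing missing AND the segment is row-formatted."""
    return not gaps and binlog_format.upper() == "ROW"
```

A complete GTID range with a MIXED label fails `pitr_gate` just as surely as a ROW segment with a gap, which is the point of keeping the checks separate.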

Version and environment constraints:

MySQL 5.7.7 and later (including 8.0) default to binlog_format=ROW; MySQL 5.6 and earlier default to STATEMENT, and many legacy topologies were explicitly configured for MIXED, which is why they still emit heterogeneous streams your pipeline must tolerate. binlog_format is deprecated as of MySQL 8.0.34 and slated for removal as MySQL standardizes on row logging, so any new automation should target ROW and treat other values as legacy edge cases to detect and quarantine.
gtid_mode=ON and enforce_gtid_consistency=ON are assumed; the latter rejects some constructs that are only unsafe under statement logging, so it interacts with format choice.
Python 3.10+ for the automation layer (structural pattern matching and the walrus operator are used below), with mysql-connector-python 8.0+ for pooled connections and tenacity for retry orchestration — both already standard in this codebase.
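These version constraints matter when the pipeline meets an unfamiliar server: comparing its live format with that version's default shows whether the format was chosen or merely inherited. A minimal sketch, assuming Oracle MySQL version strings as returned by `SELECT VERSION()`; `parse_version` and `format_expectations` are hypothetical helpers, and MariaDB or other forks, which use different defaults, are not handled.

```python
# Hypothetical helper: expected default binlog_format and deprecation status
# for an Oracle MySQL version string. Forks are out of scope.


def parse_version(version: str) -> tuple[int, int, int]:
    """'8.0.36-log' -> (8, 0, 36); a missing patch number counts as 0."""
    core = version.split("-", 1)[0]
    parts = [int(p) for p in core.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def format_expectations(version: str) -> dict[str, object]:
    v = parse_version(version)
    return {
        # STATEMENT was the default before 5.7.7; ROW from 5.7.7 onward.
        "default_format": "ROW" if v >= (5, 7, 7) else "STATEMENT",
        # The binlog_format variable itself is deprecated from 8.0.34.
        "binlog_format_deprecated": v >= (8, 0, 34),
    }
```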

Server-side tuning that shapes what these formats emit — row image granularity and transaction dependency tracking — is covered in configuring binlog_format for minimal replication overhead; this page is concerned with detecting, validating, and routing whatever the server actually produces.

Production-Grade Python Implementation

A correct pipeline interrogates the live format before it transfers, snapshots, or compresses anything. It must read both global and session scope, correlate the result with the GTID frontier, validate against policy, and route idempotently. The module below does this with a frozen dataclass, connection pooling, exponential-backoff retries on transient errors, structured logging, and an atomic, idempotent copy. It targets Python 3.10+ and mysql-connector-python 8.0+.

import hashlib
import logging
import os
import shutil
from dataclasses import dataclass
from pathlib import Path

from mysql.connector import errors, pooling
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
)
logger = logging.getLogger("binlog_format_router")

ALLOWED_FORMATS = frozenset({"ROW"})  # PITR policy: row logging only

# Retry only transient failures (dropped connections, restarts, a briefly
# exhausted pool); access-denied and SQL errors surface immediately.
TRANSIENT_ERRORS = (errors.InterfaceError, errors.OperationalError, errors.PoolError)


class ArchivePolicyError(Exception):
    """Detected format violates the archival policy; route to quarantine."""


@dataclass(frozen=True, slots=True)
class FormatSnapshot:
    """A point-in-time view of the server's logging format and GTID frontier."""
    global_format: str
    session_format: str
    gtid_executed: str

    @property
    def is_uniform(self) -> bool:
        """True when the detecting session logs in the same format as the global."""
        return self.global_format == self.session_format

    @property
    def is_pitr_safe(self) -> bool:
        return self.global_format in ALLOWED_FORMATS and self.is_uniform


_POOL: pooling.MySQLConnectionPool | None = None


def _get_connection(conn_params: dict):
    """Create the pool lazily from real connection parameters.

    A pool built without connection arguments holds no connections, so
    get_connection() on it always fails. conn_params is read only on the
    first call; later calls reuse the same pool.
    """
    global _POOL
    if _POOL is None:
        _POOL = pooling.MySQLConnectionPool(
            pool_name="fmt_pool", pool_size=4, autocommit=True, **conn_params
        )
    return _POOL.get_connection()


@retry(
    retry=retry_if_exception_type(TRANSIENT_ERRORS),
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=0.5, max=8),
    reraise=True,
)
def detect_format(conn_params: dict) -> FormatSnapshot:
    """Read global + session binlog_format and the GTID frontier in one query."""
    conn = _get_connection(conn_params)
    try:
        # Buffered so the single result row is fully consumed before close().
        cur = conn.cursor(buffered=True, dictionary=True)
        try:
            # Session scope can override global; read both in one statement so
            # the formats and the GTID frontier describe the same moment.
            cur.execute(
                "SELECT @@GLOBAL.binlog_format  AS global_fmt, "
                "       @@SESSION.binlog_format AS session_fmt, "
                "       @@GLOBAL.gtid_executed  AS gtid_executed"
            )
            row = cur.fetchone()
        finally:
            cur.close()
        snap = FormatSnapshot(
            global_format=row["global_fmt"],
            session_format=row["session_fmt"],
            gtid_executed=row["gtid_executed"].replace("\n", ""),
        )
        logger.info(
            "format.detected global=%s session=%s uniform=%s",
            snap.global_format, snap.session_format, snap.is_uniform,
        )
        return snap
    finally:
        conn.close()  # returns the pooled connection to the pool


def validate_policy(snap: FormatSnapshot) -> None:
    """Fail closed: anything but a uniform ROW server is a policy violation."""
    if not snap.is_pitr_safe:
        raise ArchivePolicyError(
            f"non-PITR-safe format: global={snap.global_format} "
            f"session={snap.session_format}"
        )


def sha256(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as fh:
        while block := fh.read(chunk):  # walrus: stream, never load whole file
            h.update(block)
    return h.hexdigest()


def route_archive(
    snap: FormatSnapshot,
    source: Path,
    archive_root: Path,
    *,
    dry_run: bool = False,
) -> Path | None:
    """Idempotent, format-partitioned, atomic archival copy."""
    match snap.global_format:
        case "ROW":
            dest_dir = archive_root / "row"
        case "MIXED":
            dest_dir = archive_root / "hybrid"  # dual-parser recovery path
        case _:
            dest_dir = archive_root / "quarantine"

    target = dest_dir / source.name
    if dry_run:
        logger.info("format.dry_run source=%s -> target=%s", source, target)
        return None

    # Idempotency: a byte-identical target is a completed transfer.
    if target.exists() and sha256(target) == sha256(source):
        logger.info("format.skip already_archived target=%s", target)
        return target

    dest_dir.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".partial")
    with source.open("rb") as src, tmp.open("wb") as dst:
        shutil.copyfileobj(src, dst, 1 << 20)  # stream in 1 MiB chunks
        dst.flush()
        os.fsync(dst.fileno())  # make the bytes durable before publishing
    os.replace(tmp, target)  # atomic rename on POSIX: no reader sees a partial file
    logger.info("format.routed target=%s format=%s", target, snap.global_format)
    return target

Two design choices are load-bearing. First, detection never trusts the global value alone, because SET SESSION binlog_format can quietly log a subset of transactions in a different format from the one the global setting implies. Reading @@SESSION in the same query confirms that the detector's own connection matches the global, but session variables are per connection: another client's override is visible only per thread, in performance_schema.variables_by_thread, so a PITR-critical check should sample that table as well. Second, a MIXED file is not rejected outright but routed to a hybrid path tagged for dual-parser recovery, while unknown values go to quarantine; the pipeline never silently drops or misroutes a binary log.

Configuration Reference

The variables below determine what each format emits and how faithfully it can be replayed. Recommended values target PITR automation on MySQL 8.0+.

Variable	Type	Default (8.0)	Recommended	PITR impact
`binlog_format`	enum	`ROW`	`ROW`	Row images are the only format that guarantees deterministic replay; deprecated in 8.0.34+, standardizing on ROW.
`binlog_row_image`	enum	`FULL`	`MINIMAL` (or `FULL` if downstream needs all columns)	`MINIMAL` logs PK + changed columns only, shrinking archives sharply; `FULL` eases forensic diffing at higher cost.
`binlog_row_metadata`	enum	`MINIMAL`	`FULL`	`FULL` embeds column names/types so an archived row event stays self-describing across schema drift.
`binlog_row_value_options`	set	`''` (empty)	`''` for PITR	`PARTIAL_JSON` logs JSON diffs; empty keeps full values so replay is unambiguous.
`binlog_rows_query_log_events`	bool	`OFF`	`ON`	Attaches the original SQL as a comment to row events — invaluable when auditing a row-format archive.
`binlog_checksum`	enum	`CRC32`	`CRC32`	Per-event checksum lets `mysqlbinlog --verify-binlog-checksum` detect corruption before replay.
`log_bin_trust_function_creators`	bool	`OFF`	`OFF`	Keeping it off forces deterministic-function discipline that statement/mixed formats depend on.

Apply the row-image and metadata changes dynamically and confirm they took at global scope:

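-- MySQL 8.0+: enforce PITR-safe row logging without a restart. SET PERSIST
-- applies the value at global scope now and also records it in
-- mysqld-auto.cnf, so it survives a restart (SET GLOBAL alone does not).
SET PERSIST binlog_format        = 'ROW';
SET PERSIST binlog_row_image     = 'MINIMAL';
SET PERSIST binlog_row_metadata  = 'FULL';
SET PERSIST binlog_rows_query_log_events = ON;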

-- Verify the running server, not the config file.
SELECT @@GLOBAL.binlog_format, @@GLOBAL.binlog_row_image,
       @@GLOBAL.binlog_row_metadata;

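Note that a global-scope change takes effect only for sessions that connect after it; existing sessions keep their inherited value until they reconnect. The global value therefore overstates what in-flight sessions are logging, which is why detection must not trust the global alone: the values held by already-open sessions are visible per thread in performance_schema.variables_by_thread.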

Validation & Verification Gates

Format detection is a gate, not a formality. Before an archived segment is trusted for recovery, it must clear a fixed sequence:

Format uniformity. Confirm @@GLOBAL.binlog_format is ROW and that no connected session reports a different binlog_format in performance_schema.variables_by_thread. A divergence means some in-flight transactions may be serialized differently than the global setting implies.
Checksum integrity. Run mysqlbinlog --verify-binlog-checksum over the segment; any failure is a hard stop, because a corrupt row image replays as corruption, not as an error.
Format scan of the physical file. For any file routed to hybrid, parse for Format_desc, Query (with BEGIN/DML text), and Write_rows/Update_rows/Delete_rows events to confirm which parser each GTID block needs.
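Dry-run replay. Stream mysqlbinlog --include-gtids=<range> into a throwaway staging server and check for syntax errors, unsupported DDL, and, for statement-format survivors, unsafe-statement warnings. The staging server's gtid_executed must not already contain the range: mysqlbinlog output sets GTID_NEXT for each transaction, and the server silently skips any GTID it has already executed, so an overlapping replay passes without testing anything.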
Manifest reconciliation. Confirm the format tag recorded in the archive manifest matches what the physical scan found, so no ROW window is silently trusting a MIXED file.
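The sequence above is fail-closed: a later gate never runs against a segment an earlier gate rejected. A minimal sketch of that ordering, with each check left as a hypothetical zero-argument callable (for example, one that runs `mysqlbinlog --verify-binlog-checksum` and reports whether it raised an error); `GateResult`, `run_gates`, and `segment_trusted` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool
    detail: str = ""


# Each gate wraps one check: uniformity, checksum, physical scan,
# dry-run replay, or manifest reconciliation.
Gate = Callable[[], GateResult]


def run_gates(gates: list[Gate]) -> list[GateResult]:
    """Run gates in order and stop at the first failure (fail closed)."""
    results: list[GateResult] = []
    for gate in gates:
        result = gate()
        results.append(result)
        if not result.passed:
            break  # later gates never run against an untrusted segment
    return results


def segment_trusted(results: list[GateResult], expected: int) -> bool:
    """Trusted only if every expected gate ran and passed."""
    return len(results) == expected and all(r.passed for r in results)
```

Checking `len(results) == expected` guards against a misconfigured gate list that silently skips a step.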

Inspecting the raw event stream is the ground truth for the format scan and the dry-run replay:

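# Decode without executing; -v -v expands row images to pseudo-SQL prefixed
# with "###". The filter keeps decoded row events, bare DML statement text,
# and event positions (a heuristic: statements that open with a comment are missed).
mysqlbinlog --verify-binlog-checksum --base64-output=DECODE-ROWS -v -v \
  --include-gtids='3E11FA47-71CA-11E1-9E33-C80AA9429562:1-100' \
  /var/lib/mysql/binlog.000042 \
  | grep -E -i '^### (INSERT|UPDATE|DELETE)|^(INSERT|UPDATE|DELETE|REPLACE)\b|^# at'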

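In the decoded stream, ### INSERT/UPDATE/DELETE lines are row events, and bare INSERT/UPDATE/DELETE text is a statement-format event. Do not count BEGIN, COMMIT, or DDL: those are logged as statements under every format, including ROW. A file with only ### lines is row-formatted, one with only bare DML text is statement-formatted, and one with both is MIXED and belongs on the hybrid recovery path.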

Error Handling & Failure Modes

Format mismatches surface as a specific family of MySQL errors and warnings. Mapping them to root cause keeps the pipeline from either crashing blindly or silently proceeding.

from dataclasses import dataclass


@dataclass(frozen=True, slots=True)
class Remediation:
    cause: str
    action: str
    fatal: bool


def classify_format_error(errno: int) -> Remediation:
    """Map a binlog-format error or warning number to cause and remediation."""
    match errno:
        case 1592:  # ER_BINLOG_UNSAFE_STATEMENT (warning)
            return Remediation(
                cause="Unsafe statement logged in STATEMENT format "
                      "(SYSDATE(), UUID(), LIMIT without ORDER BY, etc.).",
                action="Switch the source to binlog_format=ROW; the archived "
                       "event may replay non-deterministically.",
                fatal=False,
            )
        case 1662:  # ER_BINLOG_ROW_MODE_AND_STMT_ENGINE
            return Remediation(
                cause="binlog_format=ROW but a table uses an engine limited "
                      "to statement-based logging.",
                action="Migrate the table to InnoDB; do not downgrade the "
                       "global format to accommodate it.",
                fatal=True,
            )
        case 1665:  # ER_BINLOG_STMT_MODE_AND_ROW_ENGINE
            return Remediation(
                cause="binlog_format=STATEMENT but a table uses an engine "
                      "limited to row-based logging.",
                action="Set binlog_format=ROW for the session issuing the "
                       "write (MIXED also works but is not PITR-safe).",
                fatal=True,
            )
        case 1666:  # ER_BINLOG_ROW_INJECTION_AND_STMT_MODE
            return Remediation(
                cause="A row event (replica applier or a BINLOG statement "
                      "from mysqlbinlog replay) ran in a session with "
                      "binlog_format=STATEMENT.",
                action="Run the applier or replay session with ROW; a "
                       "mixed-format topology cannot guarantee deterministic apply.",
                fatal=True,
            )
        case _:
            return Remediation(
                cause=f"Unmapped binlog-format error {errno}.",
                action="Capture SHOW VARIABLES LIKE 'binlog_format' on source "
                       "and target; halt the pipeline.",
                fatal=True,
            )

The warning-level 1592 is the quiet killer of statement and mixed archives: MySQL logs the transaction and moves on, so the corruption stays latent until a PITR replays it against a different clock or row order and produces a plausible result instead of an identical one. The 1662/1665/1666 family is the server refusing to serialize a write it cannot log faithfully. Those errors are working as intended, and the fix is to move the workload (or the table's engine) onto a row-capable configuration, never to relax the format. A subtler failure is the mid-file format shift: a SET GLOBAL change (picked up only by new sessions) or a session override flips the format partway through a binary log, so a file the manifest labels ROW actually contains statement events. A version upgrade changes the format only at a file boundary, because the restart rotates the log, but it can still leave one retention window spanning two formats. The physical scan in the verification gates is what catches this before recovery trusts the label.

Observability & Alerting

You cannot enforce a format you do not measure on every cycle. Instrument the live format, the share of unsafe events, and the write pressure that predicts archive lag.

-- MySQL 8.0+: current logging format and row-image granularity.
SELECT @@GLOBAL.binlog_format   AS fmt,
       @@GLOBAL.binlog_row_image AS row_image;

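-- MySQL 8.0+: unsafe-statement warnings (1592) raised since startup or the
-- last TRUNCATE of this table; a rising value means STATEMENT/MIXED sources
-- are emitting non-deterministic events.
SELECT ERROR_NUMBER, ERROR_NAME, SUM_ERROR_RAISED, LAST_SEEN
FROM performance_schema.events_errors_summary_global_by_error
WHERE ERROR_NUMBER = 1592;

-- Binlog cache pressure: Binlog_cache_disk_use / Binlog_cache_use is the
-- share of transactions whose events spilled to a temporary file.
SHOW GLOBAL STATUS LIKE 'Binlog%cache%use';

-- MySQL 8.0+: binary-log write throughput, which predicts when the
-- archiver will start falling behind the retention purge clock.
-- Filter on the instrument, not the file name: 8.0's default name
-- "binlog.000042" does not match a '%bin.%' pattern.
SELECT FILE_NAME, COUNT_WRITE, SUM_NUMBER_OF_BYTES_WRITE
FROM performance_schema.file_summary_by_instance
WHERE EVENT_NAME = 'wait/io/file/sql/binlog'
ORDER BY SUM_NUMBER_OF_BYTES_WRITE DESC;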

Emit each detection result as a structured log record with stable field names — event, global_format, session_format, is_uniform, gtid_frontier, verdict — so alerts query fields rather than scrape message text. Recommended thresholds:

Format drift — page immediately when global_format != ROW or when is_uniform is false on a server the policy marks PITR-critical. This is the alarm that fires before a non-deterministic event is archived.
Unsafe-statement rate — a rising count of 1592 warnings on a STATEMENT/MIXED source is a warning routed to the owning application team, because it signals a workload that cannot be recovered faithfully in its current format.
Row-image cost — if FULL row images inflate archive size past the storage budget, alert to reconsider MINIMAL; the size/forensic trade-off is a tuning decision, not an incident.
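The field list and thresholds above translate into one record per detection cycle. A minimal sketch; the verdict tiers (`page`, `warn`, `ticket`, `ok`) and the rule that drift on a non-critical server only warns are assumptions layered on those thresholds, and `alert_verdict` and `detection_record` are hypothetical helpers.

```python
import json
import logging

logger = logging.getLogger("binlog_format_router")


def alert_verdict(rec: dict, pitr_critical: bool, unsafe_delta: int, over_budget: bool) -> str:
    """Map one detection record to an alert tier."""
    drift = rec["global_format"] != "ROW" or not rec["is_uniform"]
    if drift:
        return "page" if pitr_critical else "warn"
    if unsafe_delta > 0:
        return "warn"    # new 1592 warnings: route to the owning application team
    if over_budget:
        return "ticket"  # row-image cost: a tuning decision, not an incident
    return "ok"


def detection_record(
    global_format: str,
    session_format: str,
    gtid_frontier: str,
    *,
    pitr_critical: bool,
    unsafe_delta: int = 0,
    over_budget: bool = False,
) -> dict:
    """Build and log a record with stable field names for alert queries."""
    rec = {
        "event": "format.detected",
        "global_format": global_format,
        "session_format": session_format,
        "is_uniform": global_format == session_format,
        "gtid_frontier": gtid_frontier,
    }
    rec["verdict"] = alert_verdict(rec, pitr_critical, unsafe_delta, over_budget)
    logger.info(json.dumps(rec, sort_keys=True))
    return rec
```

Logging the record as JSON keeps every field queryable, so alert rules match on `verdict` rather than scraping message text.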

Least privilege holds throughout: the detection account needs only REPLICATION CLIENT for SHOW BINARY LOG STATUS (SHOW MASTER STATUS before MySQL 8.2) and SELECT on the relevant system views and performance_schema tables, never SUPER. The full privilege model, encryption in transit and at rest, and audit hooks live in security & access frameworks. When a format gate hard-stops and no clean recovery path remains, degrade deterministically per fallback routing strategies rather than forcing an unsafe replay, and cross-reference binlog retention boundaries so a quarantined MIXED file is never purged before its recovery path is resolved.

Frequently Asked Questions

Why is ROW the standard for PITR when it produces the largest archives?

Because PITR requires a recovery to be identical to the original, not merely plausible. ROW logs the exact before-and-after column images, so replay is deterministic regardless of SYSDATE(), UUID(), RAND(), triggers, or row-order dependencies. The size penalty is real, but it is addressed at the source with binlog_row_image=MINIMAL and at rest with compression, never by downgrading to STATEMENT, which trades bytes for the one guarantee recovery cannot do without.

Can a single binary log file contain more than one format?

Yes, and it is the reason MIXED is the hardest case to archive. Under MIXED, MySQL logs statement events by default and switches to row events for each statement it judges unsafe, so one file can hold both. A runtime format change can also put both formats into one file under any setting: a SET GLOBAL that new sessions pick up while existing sessions keep their old value, or a single session's override. A version upgrade, by contrast, changes the format at a file boundary, because the restart rotates the log. This is why the pipeline scans the physical event stream rather than trusting a single file-level label, and routes such files to a dual-parser recovery path.

How do I detect the real format when a session can override the global?

Read both scopes in one query, SELECT @@GLOBAL.binlog_format, @@SESSION.binlog_format, but remember that @@SESSION describes only the connection running the query. SET SESSION binlog_format lets any other connection log its transactions differently from the global setting, and a SET GLOBAL change only affects sessions that connect afterward, so already-open sessions must be checked per thread in performance_schema.variables_by_thread. Trusting the global alone can silently archive statement-format transactions inside a nominally ROW window; checking every session's value is what closes that gap.

Does binlog_format still matter if I am upgrading to MySQL 8.0.34 or later?

More than ever, but the decision narrows. binlog_format is deprecated from 8.0.34 and MySQL is standardizing on row logging, so STATEMENT and MIXED are legacy states your automation should detect and quarantine rather than support long-term. Treat any non-ROW source as technical debt to migrate, and keep the detection gate in place through the upgrade so a transitional default flip cannot slip an unexpected format into the archive.

Related

MySQL Binary Log Architecture & GTID Fundamentals: the event model and gtid_executed/gtid_purged lifecycle this format pipeline builds on.
GTID Tracking & Enforcement: the complementary gate that proves no transaction is missing while ROW proves each replays identically.
Configuring binlog_format for minimal replication overhead: server-side row-image and dependency-tracking tuning that shapes what these formats emit.
Binlog retention boundaries: computing safe purge windows so a quarantined MIXED file is never deleted before its recovery path resolves.
Fallback routing strategies: deterministic degradation when a format gate hard-stops.
