Async Processing & Queue Management for Binary Log Archiving and PITR Automation

Decoupling binary log ingestion from object storage uploads is what separates a resilient archiving pipeline from a fragile cron script that stalls the primary. A synchronous uploader that blocks on every PUT amplifies replication lag during transient network partitions, holds file descriptors open across cloud API timeouts, and turns a single throttled request into a stalled binlog_expire_logs_seconds purge cycle — a direct threat to your Point-in-Time Recovery (PITR) window. This page defines the queue architecture that absorbs bursty rotation, enforces strict per-instance ordering, and delivers deterministic retry semantics, all as a component of the broader Automated Binlog Archiving to Object Storage pipeline. Naive approaches fail because they conflate detection with transport: the moment the process that watches for rotated logs also performs the upload, backpressure from storage propagates straight back into MySQL’s I/O path.

Core Concept & Prerequisites

The operational intent is to isolate the MySQL data plane from the cloud storage control plane using a broker as the system of record for in-flight work. A lightweight producer daemon watches the binary log directory with inotify (or high-resolution polling as a fallback), detects freshly rotated segments, and publishes one task per closed file. A pool of workers consumes those tasks, applies the Compression & Encryption Workflows transform, streams the payload to storage, and persists the metadata a recovery run needs to target a precise transaction boundary. Detection is triggered by Rotation Scheduling & Cron Automation; that scheduler must only wake the producer, never invoke an upload directly, or the decoupling collapses.
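The polling fallback mentioned above can be sketched without any broker dependency. This is a minimal illustration, assuming the producer can read the binary log index file, which lists one log per line with the currently open log last; the function name `closed_segments` and the `published` set are hypothetical and not part of the pipeline's real API.

```python
from pathlib import Path


def closed_segments(index_file: Path, published: set[str]) -> list[str]:
    """Return rotated (closed) binlog names not yet published, in log order.

    The index file lists logs oldest first; the final entry is the log
    MySQL is still writing, so it is never returned as a task candidate.
    """
    names = [
        Path(line.strip()).name          # index entries may carry a ./ prefix
        for line in index_file.read_text().splitlines()
        if line.strip()
    ]
    closed = names[:-1]                   # drop the active segment
    return [name for name in closed if name not in published]
```

Each returned name would become one task payload; the producer records it in `published` only after the broker has accepted the message, so a crash between detection and publish results in a re-publish rather than a silent gap.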

Prerequisites for the pipeline described here:

MySQL 8.0.4+ — required for binlog_expire_logs_seconds (the expire_logs_days variable is deprecated) and for binlog_transaction_compression (MySQL 8.0.20+), which reduces the payload the queue must move.
GTID mode enabled — gtid_mode=ON and enforce_gtid_consistency=ON so every archived segment carries a resolvable gtid_executed range. A gap-free GTID Tracking & Enforcement pipeline is what makes queue ordering meaningful; without it, out-of-order uploads silently break the recovery chain.
A broker with per-key ordering — Redis Streams (consumer groups), RabbitMQ (consistent-hash exchange), or Apache Kafka (partition key) all work, provided you can pin one MySQL instance’s stream to a single consumer.
Python 3.10+ with celery and redis for task orchestration, boto3 for the S3/GCS-compatible transport, mysql-connector-python for pooled manifest writes, and tenacity for retrying the metadata commit independently of the upload retry.

Binlog recovery relies on absolute sequence integrity: replaying mysql-bin.000042 before mysql-bin.000041 corrupts the chain and makes timestamp targeting mathematically impossible. To guarantee strict FIFO per instance, queues are partitioned by server_uuid. Partitioning by hostname is a common mistake — a host that is re-provisioned or fails over keeps the same name but starts a fresh binlog sequence, so server_uuid is the only stable ordering key.
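A small sketch of what partitioning by server_uuid can look like on the producer side. The queue naming scheme (`binlog.<server_uuid>`) and the helper names are assumptions chosen for illustration; only the idea of one queue per server_uuid and strictly increasing segment numbers comes from the text above.

```python
import re
from typing import Optional

_SEQUENCE = re.compile(r"\.(\d+)$")


def queue_for(server_uuid: str) -> str:
    """Map one MySQL instance to one dedicated, ordered queue (hypothetical scheme)."""
    return f"binlog.{server_uuid.lower()}"


def sequence_of(binlog_name: str) -> int:
    """Extract the numeric suffix, e.g. 'mysql-bin.000042' -> 42."""
    match = _SEQUENCE.search(binlog_name)
    if match is None:
        raise ValueError(f"not a binlog segment name: {binlog_name!r}")
    return int(match.group(1))


def is_next_in_sequence(previous: Optional[str], candidate: str) -> bool:
    """True when candidate directly follows previous (or nothing was sent yet)."""
    if previous is None:
        return True
    return sequence_of(candidate) == sequence_of(previous) + 1
```

With Celery, the queue name would be passed at publish time, for example `upload_binlog.apply_async(args=[payload], queue=queue_for(server_uuid))`. A producer that sees `is_next_in_sequence` return False has detected a skipped segment and should alert instead of publishing past the hole.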

Production-Grade Python Implementation

The worker layer is implemented with Celery, which provides mature acknowledgement semantics, per-task retry policies, and broker abstraction. The module below is complete and runnable: it uses a typed dataclass for the archive task, task_acks_late so a crashed worker’s message is redelivered rather than lost, SHA-256 verification of the uploaded object, a pooled mysql-connector-python writer for the manifest, tenacity for the metadata commit, and structured JSON logging so a single segment can be traced from rotation to finalization.

#!/usr/bin/env python3
"""Async binlog archiving worker — Celery task + pooled manifest writer.
Targets: MySQL 8.0.4+, Python 3.10+.
Requires: celery, redis, boto3, mysql-connector-python, tenacity.
"""
from __future__ import annotations

import hashlib
import json
import logging
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Any

import boto3
from botocore.exceptions import ClientError
from celery import Celery
from mysql.connector import pooling, Error as MySQLError
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type


# ---- structured logging -------------------------------------------------
class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "event": record.getMessage()}
        if isinstance(record.args, dict):
            payload |= record.args
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("binlog_archiver")
logger.addHandler(handler)
logger.setLevel(logging.INFO)


# ---- broker configuration ----------------------------------------------
app = Celery(
    "binlog_queue",
    broker="redis://redis-broker:6379/0",
    backend="redis://redis-broker:6379/1",
)
app.conf.update(
    task_serializer="json",
    accept_content=["json"],
    task_acks_late=True,               # redeliver on worker crash, don't drop
    task_reject_on_worker_lost=True,
    worker_prefetch_multiplier=1,      # no read-ahead; strict order also needs one consumer per queue
)

# Pooled connections to the manifest database (recovery metadata store).
MANIFEST_POOL = pooling.MySQLConnectionPool(
    pool_name="manifest_pool",
    pool_size=8,
    host="10.0.1.20",
    database="pitr_meta",
    user="binlog_archiver",
    password="__from_secret_manager__",
)


@dataclass(slots=True, frozen=True)
class ArchiveTask:
    server_uuid: str
    binlog_name: str
    binlog_path: str
    bucket: str
    gtid_range: str
    dry_run: bool = False

    @property
    def object_key(self) -> str:
        return f"mysql-binlogs/{self.server_uuid}/{self.binlog_name}"


def compute_sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(1 << 16):   # walrus: 64 KiB streaming read
            digest.update(chunk)
    return digest.hexdigest()


@retry(
    retry=retry_if_exception_type(MySQLError),
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, max=30),
    reraise=True,
)
def record_manifest(task: ArchiveTask, checksum: str) -> None:
    """Persist the recovery manifest row. Retried independently of the upload
    so a healthy S3 object is never orphaned by a transient metadata blip."""
    conn = MANIFEST_POOL.get_connection()
    try:
        cur = conn.cursor()
        cur.execute(
            """
            INSERT INTO binlog_manifest
                (server_uuid, binlog_name, object_key, sha256, gtid_range)
            VALUES (%s, %s, %s, %s, %s)
            ON DUPLICATE KEY UPDATE sha256 = VALUES(sha256)
            """,
            (task.server_uuid, task.binlog_name, task.object_key,
             checksum, task.gtid_range),
        )
        conn.commit()
    finally:
        conn.close()   # returns the connection to the pool


@app.task(bind=True, max_retries=6, acks_late=True)
def upload_binlog(self, payload: dict[str, Any]) -> dict[str, Any]:
    task = ArchiveTask(**payload)
    path = Path(task.binlog_path)

    if not path.exists():
        logger.warning("binlog_missing", {"file": str(path)})
        return {"status": "skipped", "reason": "file_missing"}

    checksum = compute_sha256(path)

    if task.dry_run:
        logger.info("dry_run", {"key": task.object_key, "sha256": checksum})
        return {"status": "dry_run", "sha256": checksum, **asdict(task)}

    try:
        s3 = boto3.client("s3")
        s3.upload_file(
            str(path), task.bucket, task.object_key,
            ExtraArgs={
                "ServerSideEncryption": "aws:kms",
                "Metadata": {"sha256": checksum,
                             "server_uuid": task.server_uuid,
                             "gtid_range": task.gtid_range},
            },
        )
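        # Confirm the object landed with the expected digest metadata and size.
        # This does not re-hash the stored bytes: it catches truncated or
        # misrouted uploads, not bit-level corruption inside the object store.
        head = s3.head_object(Bucket=task.bucket, Key=task.object_key)
        if (head["Metadata"].get("sha256") != checksum
                or head["ContentLength"] != path.stat().st_size):
            raise ValueError("checksum_mismatch_after_upload")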

        record_manifest(task, checksum)
        logger.info("archived", {"key": task.object_key, "sha256": checksum})
        return {"status": "success", "sha256": checksum, "key": task.object_key}

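    except (boto3.exceptions.S3UploadFailedError, ClientError, ValueError) as exc:
        # upload_file wraps transfer errors in S3UploadFailedError, while
        # head_object raises ClientError; both, and a failed check, are retried.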
        backoff = 2 ** self.request.retries          # 1, 2, 4, 8, 16, 32 s
        logger.error("upload_failed", {"key": task.object_key,
                                       "attempt": self.request.retries,
                                       "error": str(exc)})
        raise self.retry(exc=exc, countdown=backoff)

Two design choices deserve emphasis. First, worker_prefetch_multiplier=1 only stops a worker from reserving tasks ahead of the one it is running. Sequential processing additionally requires that each per-server_uuid queue is consumed by exactly one worker process (for example, a dedicated worker started with --concurrency=1); without both, a fast worker can finish segment N+1 before a slow worker finishes N. Second, the manifest write is retried by tenacity separately from the S3 upload’s Celery retry, so a momentary database hiccup never leaves a verified object in storage with no recovery-index row pointing at it. Deeper broker tuning, prefetch trade-offs, and canvas workflows live in Using Celery for Async Binlog Upload Processing.

Configuration Reference

These server-side variables govern how fast segments arrive at the queue and how large each payload is. The queue must be tuned to the rotation rate they imply — if max_binlog_size is small and write volume is high, the producer emits tasks faster and worker concurrency must scale to match.

Variable	Type	Default	Recommended	PITR impact
`binlog_expire_logs_seconds`	integer (s)	`2592000` (30d)	`≥ 259200` (3d)	Local floor: the queue must fully drain a segment before this window purges it, or that segment is lost from the recovery chain.
`max_binlog_size`	integer (bytes)	`1073741824` (1 GiB)	`104857600`–`536870912`	Smaller segments rotate more often → more, smaller tasks → finer PITR granularity but higher queue throughput.
`sync_binlog`	integer	`1`	`1`	`1` guarantees each transaction is durable before it can be archived; any other value risks queuing a segment MySQL has not crash-safely persisted.
`binlog_transaction_compression`	boolean	`OFF`	`ON`	Compresses events at the source (MySQL 8.0.20+), shrinking the bytes each worker uploads and the storage the recovery chain occupies.
`binlog_row_image`	enum	`FULL`	`MINIMAL` or `FULL`	Smaller row images shrink each segment; use `FULL` only where downstream CDC consumers need it.
`gtid_mode`	enum	`OFF`	`ON`	Enables the `gtid_executed` range stamped into each manifest row, which the recovery orchestrator diffs to detect gaps.
`enforce_gtid_consistency`	enum	`OFF`	`ON`	Rejects GTID-unsafe statements at write time, preventing un-replayable events from ever entering an archived segment.

-- MySQL 8.0.20+ : rotation + durability settings that feed the queue
SET PERSIST binlog_expire_logs_seconds = 259200;   -- 3-day local floor
SET PERSIST max_binlog_size            = 268435456; -- 256 MiB segments
SET PERSIST sync_binlog                = 1;
SET PERSIST binlog_transaction_compression = ON;

SET PERSIST writes to mysqld-auto.cnf so the settings survive restart without editing the base config file — important for automation that must not race with a configuration-management tool rewriting my.cnf.
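The relationship between max_binlog_size, write volume, and worker capacity described at the top of this section reduces to simple arithmetic. The sketch below is illustrative only: the function name and the throughput figures in the usage note are assumptions, and real segments can exceed max_binlog_size slightly because a transaction is never split across files.

```python
def rotation_budget(write_bytes_per_sec: float,
                    max_binlog_size: int,
                    upload_bytes_per_sec: float) -> dict[str, float]:
    """Estimate how hard one instance drives its queue.

    utilisation is the fraction of time a single worker spends uploading;
    a value near or above 1.0 means the queue cannot drain at this rate.
    """
    seconds_per_segment = max_binlog_size / write_bytes_per_sec
    upload_seconds = max_binlog_size / upload_bytes_per_sec
    return {
        "segments_per_hour": 3600 / seconds_per_segment,
        "upload_seconds_per_segment": upload_seconds,
        "utilisation": upload_seconds / seconds_per_segment,
    }
```

For example, 256 MiB segments written at 8 MiB/s rotate every 32 seconds (112.5 segments per hour); at an assumed 64 MiB/s of upload throughput each segment takes 4 seconds to ship, a utilisation of 0.125 that leaves ample headroom for retries.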

Validation & Verification Gates

An archived object is not “safe” until it has passed the same gates a recovery run will depend on. The pipeline enforces four:

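Checksum verification. The worker computes SHA-256 before upload, stores it as object metadata, and reads it back via head_object (shown above). A mismatch raises immediately and the task retries rather than committing a corrupt manifest row. This round-trip confirms that the object exists under the expected key with the expected digest attached; it does not re-hash the stored bytes, so enable the storage provider's own content checksums where end-to-end integrity must be proven.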

GTID set diffing. After a batch drains, compare the union of archived gtid_range values against the server’s executed set to prove there are no holes:

-- MySQL 8.0+ : the authoritative executed set, live
SELECT @@GLOBAL.gtid_executed AS server_executed;
-- Compare against SELECT GROUP_CONCAT(gtid_range) FROM binlog_manifest
-- using GTID_SUBTRACT(server_executed, archived_union); an empty result
-- means every committed transaction is archived.
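The same diff can be computed client-side, which avoids depending on GROUP_CONCAT (whose output is truncated at group_concat_max_len). This is a sketch under stated assumptions: it handles the classic untagged `uuid:interval[:interval]` text format only, and the function names are hypothetical rather than part of any MySQL client library.

```python
Interval = tuple[int, int]
GtidSet = dict[str, list[Interval]]


def _merge(intervals: list[Interval]) -> list[Interval]:
    """Sort and coalesce overlapping or adjacent inclusive intervals."""
    merged: list[Interval] = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def parse_gtid_set(text: str) -> GtidSet:
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: [(1, 5), (7, 7)], ...}."""
    raw: GtidSet = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        for rng in ranges:
            lo, _, hi = rng.partition("-")
            raw.setdefault(uuid.lower(), []).append((int(lo), int(hi or lo)))
    return {uuid: _merge(iv) for uuid, iv in raw.items()}


def gtid_subtract(executed: GtidSet, archived: GtidSet) -> GtidSet:
    """Return the transactions in executed that archived does not cover."""
    missing: GtidSet = {}
    for uuid, intervals in executed.items():
        cover = archived.get(uuid, [])
        holes: list[Interval] = []
        for lo, hi in intervals:
            cursor = lo
            for c_lo, c_hi in cover:
                if c_hi < cursor:
                    continue              # covering interval lies before us
                if c_lo > hi:
                    break                 # covering interval lies after us
                if c_lo > cursor:
                    holes.append((cursor, c_lo - 1))
                cursor = max(cursor, c_hi + 1)
                if cursor > hi:
                    break
            if cursor <= hi:
                holes.append((cursor, hi))
        if holes:
            missing[uuid] = holes
    return missing
```

On a live server the result is rarely empty, because transactions in the still-open binary log have not been archived yet; what matters is that every reported hole sits at the tail of the executed set rather than in the middle of it.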

Dry-run replay. Before promoting a new worker build, run tasks with dry_run=True; the worker computes checksums and resolves object keys without transmitting bytes, so you can validate IAM policy, routing, and key structure with zero side effects.
Manifest reconciliation. Periodically list the bucket prefix for a server_uuid and diff object names against binlog_manifest rows. An object with no row (or a row with no object) indicates a partial failure that must be reconciled before it silently widens the recovery gap. Reconciliation pairs naturally with Base Backup Integration for PITR, which pins the starting GTID position the archived chain must contiguously extend.
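Manifest reconciliation is a two-way set difference. The sketch below assumes the caller has already collected both key lists (for instance with boto3's `list_objects_v2` paginator for the bucket prefix and a `SELECT object_key FROM binlog_manifest` query for the rows); the `ReconcileReport` type and `reconcile` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ReconcileReport:
    orphan_objects: list[str]    # in the bucket, no manifest row
    missing_objects: list[str]   # manifest row, nothing in the bucket


def reconcile(bucket_keys: Iterable[str],
              manifest_keys: Iterable[str]) -> ReconcileReport:
    """Diff stored object keys against manifest rows for one server_uuid."""
    bucket, manifest = set(bucket_keys), set(manifest_keys)
    return ReconcileReport(
        orphan_objects=sorted(bucket - manifest),
        missing_objects=sorted(manifest - bucket),
    )
```

An orphan object usually means the upload succeeded but the manifest write never committed, and can be repaired by re-running the task; a missing object is the more serious case, since the manifest promises a segment that recovery will not find.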

Error Handling & Failure Modes

Transient cloud API failures are the common case. The worker's exponential backoff (2 ** retries) spaces attempts out; capping the delay and adding random jitter to each countdown keeps a throttled endpoint from triggering a thundering-herd retry storm when many workers fail at once. Persistent failures are the dangerous case, because they can silently truncate the recovery chain. Map each failure to a defined action:

ERROR 1236 (HY000): Could not find first log file name in binary log index file — the producer references a segment MySQL has already purged. Root cause: the queue fell behind binlog_expire_logs_seconds and a log rotated out before its task ran. Action: alert immediately (the recovery chain now has a hole), widen the local retention floor, and scale worker concurrency. This is the single most important failure to detect early.
ERROR 3546 (HY000): @@GLOBAL.GTID_PURGED cannot be changed, raised during a recovery replay when the base backup’s gtid_purged does not line up with the archived chain’s starting GTID. Root cause: base-backup and archiving retention drifted out of alignment. Action: reconcile using Timestamp Targeting Strategies to locate the correct starting segment.
ERROR 1786 (HY000): Statement violates GTID consistency — a GTID-unsafe statement reached the log. This is prevented upstream by enforce_gtid_consistency=ON; if it appears in an archived segment, that segment may not replay cleanly.
Broker redelivery / duplicate tasks — a failover or visibility-timeout expiry re-queues a task already in flight. The ON DUPLICATE KEY UPDATE manifest write plus checksum idempotency makes redelivery harmless; a second upload of an identical, verified object is a no-op, not corruption.
Retry exhaustion. After max_retries the task must be routed to a dead-letter queue. Celery does not do this on its own with a Redis broker, so implement it explicitly, for example in the task's on_failure hook. A DLQ consumer persists the failed payload and pages on-call. Never let a task disappear on final failure; a dropped task is an undetected gap. Provider-specific throttling patterns are covered in Error Handling & Retry Logic.
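The retry and dead-letter policy in the list above can be expressed as one small decision function. This is a sketch, not the worker's actual code: the 60-second cap, the "equal jitter" formula, and the name `next_action` are assumptions introduced here for illustration.

```python
import random
from typing import Callable


def next_action(retries: int,
                max_retries: int = 6,
                cap: float = 60.0,
                rng: Callable[[], float] = random.random) -> tuple[str, float]:
    """Decide what to do after a failed attempt.

    retries is the number of retries already performed (Celery exposes this
    as self.request.retries). Returns ("retry", countdown_seconds) or
    ("dead_letter", 0.0) once the retry budget is spent.
    """
    if retries >= max_retries:
        return ("dead_letter", 0.0)
    ceiling = min(cap, 2.0 ** retries)            # capped exponential backoff
    countdown = ceiling * (0.5 + 0.5 * rng())     # jitter within [ceiling/2, ceiling]
    return ("retry", countdown)
```

In a Celery task the "retry" branch would feed `countdown` into `self.retry(exc=exc, countdown=countdown)`, and the "dead_letter" branch would publish the original payload to a separate queue before the task is marked failed.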

Backpressure is the pressure-relief valve tying these together: when queue depth crosses a threshold (XLEN on a Redis Stream, or the RabbitMQ management API), the producer throttles inotify publishing and switches to batched polling so worker memory never exhausts under a rotation burst.
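A minimal sketch of that mode switch follows. The two thresholds and the mode names are hypothetical, and the gap between the high and low watermarks (hysteresis) is an addition made here so the producer does not flap between modes when depth hovers around a single threshold.

```python
def publish_mode(depth: int,
                 current: str,
                 high_watermark: int = 500,
                 low_watermark: int = 100) -> str:
    """Choose the producer's publishing mode from the current queue depth.

    Switch to batched polling once depth reaches the high watermark, and
    return to inotify-driven publishing only after the queue has drained
    to the low watermark. In between, keep whatever mode is active.
    """
    if depth >= high_watermark:
        return "batched_polling"
    if depth <= low_watermark:
        return "inotify"
    return current
```

With a Redis Stream the depth would come from the client's `xlen` call on the instance's stream; the producer evaluates `publish_mode` on each loop iteration and stores the result as its new current mode.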

Observability & Alerting

Treat archiving lag as a first-class SLI: the gap between the newest rotated segment and the newest verified manifest row is your real, measured RPO exposure. Expose it, and the queue internals, as Prometheus metrics:

binlog_queue_depth{server_uuid} — pending tasks per instance; sustained growth means workers cannot keep pace with rotation.
binlog_archive_lag_seconds{server_uuid} — wall-clock age of the oldest un-archived rotated segment. Alert when this approaches binlog_expire_logs_seconds — that is the moment the recovery chain is about to lose a segment.
binlog_upload_duration_seconds — histogram; a rising p99 flags storage-side throttling before it becomes a stall.
binlog_checksum_mismatch_total and binlog_dlq_messages_total — any non-zero rate is a page.
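The lag metric is worth pinning down precisely, because it is the one compared against binlog_expire_logs_seconds. The sketch below assumes the exporter knows each closed segment's rotation time (for example the file's mtime) and the set of segment names that have a verified manifest row; the function name is hypothetical.

```python
def archive_lag_seconds(rotated_at: dict[str, float],
                        archived: set[str],
                        now: float) -> float:
    """Age, in seconds, of the oldest rotated segment with no manifest row.

    rotated_at maps binlog name -> epoch seconds at which it was closed.
    Returns 0.0 when every rotated segment has been archived.
    """
    pending = [ts for name, ts in rotated_at.items() if name not in archived]
    if not pending:
        return 0.0
    return max(0.0, now - min(pending))
```

The returned value would be set on the `binlog_archive_lag_seconds{server_uuid}` gauge at each scrape interval; using the oldest pending segment rather than the newest means a single stuck task keeps the metric climbing even while later segments continue to flow.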

MySQL’s own view of binlog and GTID progress comes from performance_schema:

-- MySQL 8.0+ : authoritative binlog + GTID position for lag correlation
SELECT * FROM performance_schema.log_status\G
-- Reports the current binary log file, position, and gtid_executed in a
-- single consistent snapshot — join this against the newest manifest row
-- to compute true archiving lag rather than trusting the watcher alone.

Emit structured log fields (server_uuid, binlog_name, object_key, sha256, attempt) as JSON, as the worker above does, so a single segment can be followed from filesystem rotation through broker ingestion, worker processing, and storage finalization. Correlating that trace with binlog_archive_lag_seconds is what turns a queue depth spike into an actionable root cause instead of a mystery. Retention thresholds themselves should be governed against replication lag as described in Binlog Retention Boundaries.

Frequently Asked Questions

Why partition queues by server_uuid instead of hostname?

server_uuid identifies exactly one data directory: MySQL generates it on first start and stores it in auto.cnf, so a rebuilt machine gets a new UUID along with its new binlog sequence, even when it reuses the old hostname. Partitioning by hostname would let two unrelated log sequences share one ordered stream, breaking the FIFO guarantee that PITR replay depends on. The UUID is the only key that maps one-to-one to a single, continuous binary log history.

How does the queue guarantee strict ordering if workers run concurrently?

Concurrency scales across instances, not within one. Each server_uuid maps to a single queue/partition consumed by exactly one worker process, and worker_prefetch_multiplier=1 stops that worker from reserving later segments while one is still running, so only one task per instance is in flight at a time. You can run dozens of workers to archive dozens of instances in parallel, but any single instance’s segments are processed strictly in sequence. Combined with task_acks_late, a crashed worker’s segment is redelivered without advancing the sequence pointer.

What happens to a task after it exhausts its retries?

It routes to a dead-letter queue rather than being dropped. A DLQ consumer persists the failed payload’s metadata and pages on-call, because a silently discarded task is an undetected hole in the recovery chain. Once the root cause (bad IAM role, purged segment, corrupt header) is fixed, the payload is replayed from the DLQ. The DLQ depth metric should always alert when non-zero.

Can idempotency be enforced without a metadata database?

For a single worker, an in-process dictionary keyed by (server_uuid, binlog_name) suffices, but it does not survive restarts or coordinate across a worker pool. In production, use the manifest table (or a Redis hash) as the shared idempotency registry: the ON DUPLICATE KEY UPDATE write plus SHA-256 verification means a redelivered task re-uploads an identical, already-verified object as a harmless no-op instead of corrupting state.

Related

Using Celery for Async Binlog Upload Processing — broker tuning, prefetch, and canvas workflows for the worker layer.
Rotation Scheduling & Cron Automation — how the producer daemon is triggered without invoking uploads directly.
Error Handling & Retry Logic — exponential backoff, circuit breakers, and dead-letter routing in depth.
AWS S3 & GCS Sync Pipelines — the multi-cloud transport layer the workers upload through.
Compression & Encryption Workflows — the transform stage applied before each upload.

