Configuring synchronous PRAGMA for Crash Safety

An unattended edge device loses power mid-write and comes back with a corrupt database, or a battery-backed controller that never loses power crawls at 20 commits a second because every one of them blocks on an fsync(). Both are the same misconfiguration seen from opposite ends: PRAGMA synchronous set to a level that does not match the storage controller’s flush semantics. This page fixes that single knob for crash safety on constrained hardware — choosing the right level, applying it deterministically, and proving it holds after a real power cut. It sits under the PRAGMA Optimization Guide within the WAL Optimization & Concurrency Tuning discipline; the broader initialization order, checkpoint cadence, and journaling choices are handled on their own pages, and here the goal is narrow: pick a durability barrier that survives power loss without paying fsync() on every commit.

The trap is that a wrong level fails silently. A commit reports success to the application while the underlying fdatasync() never reached the block device, so the database looks healthy right up until the moment power is pulled at exactly the wrong point in a checkpoint.

Diagnosis

Confirm you actually have a synchronous problem — and which of the two variants — before changing anything. The setting is an integer, 0=OFF, 1=NORMAL, 2=FULL, 3=EXTRA, and the first step is reading the value that is genuinely in force on the connection your writers use, not the one you think you set:

import sqlite3

conn = sqlite3.connect("/var/lib/app/data.db")
sync = conn.execute("PRAGMA synchronous;").fetchone()[0]      # 0=OFF 1=NORMAL 2=FULL 3=EXTRA
mode = conn.execute("PRAGMA journal_mode;").fetchone()[0]
print(f"synchronous={sync}  journal_mode={mode}")

Read the result against your symptom:

synchronous=0 (OFF) with corruption after power loss. This is the dangerous variant. OFF removes the durability barrier entirely — SQLite hands pages to the OS cache and never forces them down, so a power cut can leave the main database and its journal in an inconsistent state that a subsequent PRAGMA integrity_check; reports as malformed. If you see OFF on a device that can lose power, you have found the corruption source.
synchronous=2 (FULL) with commit latency spikes. The safe-but-slow variant. FULL forces an fsync() at every commit; on QLC NAND, low-end SD cards, or an eMMC with weak firmware, each flush can stall 15–100× longer than the write itself, and under a write burst those stalls back up every queued writer (and, under a rollback journal, readers too). Commit latency that tracks storage fsync cost rather than data size is the signature.
synchronous=1 (NORMAL) but journal_mode is not wal. NORMAL only carries its crash-safety guarantee in WAL mode. Under the legacy rollback journal, NORMAL can leave a window where a power loss during a commit corrupts the database. If journal_mode reads delete or truncate, the level is right but the prerequisite is missing.
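
The three readings above can be folded into a small triage helper. This is a minimal sketch, not part of the original page: the function names and the label strings are hypothetical, chosen only to mirror the variants described.

```python
import sqlite3
from typing import Tuple


def read_sync_state(conn: sqlite3.Connection) -> Tuple[int, str]:
    """Return (synchronous, journal_mode) as actually in force on this connection."""
    sync = int(conn.execute("PRAGMA synchronous;").fetchone()[0])
    mode = str(conn.execute("PRAGMA journal_mode;").fetchone()[0]).lower()
    return sync, mode


def classify_sync_state(sync: int, mode: str) -> str:
    """Map a reading onto one of the diagnosis variants (labels are illustrative)."""
    if sync == 0:
        return "unsafe-off"            # no durability barrier at all
    if sync == 1 and mode.lower() != "wal":
        return "normal-without-wal"    # right level, missing prerequisite
    if sync == 1:
        return "target"                # NORMAL under WAL
    return "safe-but-slow"             # FULL or EXTRA: flush on every commit
```

Call read_sync_state on the same connection object your writers use; a reading taken on a separate diagnostic connection only reports that connection's own level.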

The correct target for almost every edge, IoT, desktop, and Python-automation workload is synchronous=NORMAL paired with WAL — durable across application crashes, non-corrupting across power loss, and free of a per-commit flush. NORMAL’s one accepted cost is that the handful of transactions committed since the last checkpoint may roll back on power loss; the database itself never corrupts.

Solution

Apply the level once, at connection open, immediately after WAL mode is established and before the first write, then read it back and assert. Setting PRAGMA synchronous returns no row, so the only way to know it took is to query it again — a variable holding the value you intended proves nothing about the connection. The factory below configures a crash-safe connection and nothing tangential; every durable value carries its trade-off inline:

import sqlite3
import logging
from contextlib import contextmanager

logger = logging.getLogger("crash_safe")

@contextmanager
def crash_safe_connection(db_path: str, timeout: float = 30.0):
    """Yield a connection pinned to synchronous=NORMAL under WAL, verified by read-back."""
    conn = None
    try:
        # isolation_level=None -> autocommit: the sqlite3 module never opens an implicit
        # BEGIN, so the PRAGMAs below run outside a transaction. SQLite refuses to change
        # journal_mode to WAL or the synchronous level inside one.
        conn = sqlite3.connect(db_path, timeout=timeout, isolation_level=None)

        # WAL is the prerequisite: synchronous=NORMAL is only crash-safe under WAL.
        # This is a persistent header flag; re-asserting it costs nothing and documents intent.
        mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
        if mode.lower() != "wal":
            raise RuntimeError(f"expected WAL, got {mode!r} -- NORMAL is unsafe without it")

        # NORMAL: no fsync() per commit, fsync deferred to checkpoint time. Non-corrupting
        # across power loss; only commits since the last checkpoint are at risk. The right
        # balance for eMMC / SD / industrial NVMe. (0=OFF 1=NORMAL 2=FULL 3=EXTRA)
        conn.execute("PRAGMA synchronous=NORMAL;")

        # Read the level BACK from SQLite -- not from a variable -- and assert it stuck.
        sync = conn.execute("PRAGMA synchronous;").fetchone()[0]
        if int(sync) != 1:
            raise RuntimeError(f"synchronous not applied: wanted 1 (NORMAL), got {sync}")

        logger.info("crash-safe connection verified: synchronous=NORMAL, journal_mode=WAL")
        yield conn
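    except Exception:
        # This handler also sees exceptions raised in the caller's with-body, not only init errors.
        logger.exception("crash-safe connection failed during init or use")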
        raise
    finally:
        if conn:
            conn.close()

Two deployment-specific variants apply the same settings. For SQLite CLI or migration scripts, run the baseline before any write so that WAL mode and the level are in place before the first transaction:

PRAGMA journal_mode = WAL;         -- persistent header flag; enables NORMAL's crash safety
PRAGMA synchronous  = NORMAL;      -- no per-commit fsync; non-corrupting on power loss
PRAGMA wal_checkpoint(TRUNCATE);   -- reset -wal size so growth starts from a known baseline

For embedded C / bare-metal (RTOS, microcontrollers, custom desktop frameworks), apply the PRAGMAs before any sqlite3_prepare_v2() or sqlite3_exec() touches user data, and check every return code:

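#include <stdio.h>
#include <sqlite3.h>

/* SQLITE_OK from the journal_mode PRAGMA only means the statement ran; it does not
 * prove WAL was adopted. Read the mode back if the VFS may not support WAL. */
int configure_crash_safety(sqlite3 *db) {
    char *err = NULL;
    int rc;

    rc = sqlite3_exec(db, "PRAGMA journal_mode=WAL;", NULL, NULL, &err);
    if (rc != SQLITE_OK) goto fail;          /* WAL is the prerequisite for NORMAL */

    rc = sqlite3_exec(db, "PRAGMA synchronous=NORMAL;", NULL, NULL, &err);
    if (rc != SQLITE_OK) goto fail;          /* no per-commit fsync; power-loss safe */

    return SQLITE_OK;
fail:
    /* err can be NULL on some failures; fall back to the connection's message */
    fprintf(stderr, "PRAGMA configuration failed: %s\n", err ? err : sqlite3_errmsg(db));
    sqlite3_free(err);                       /* sqlite3_free(NULL) is a harmless no-op */
    return rc;
}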

Only raise the level to FULL when a regulatory or safety requirement mandates that every acknowledged commit be durable through power loss, and only after you have measured the per-commit fsync() cost on the real storage. Never ship OFF on any device that can lose power; reserve it for ephemeral scratch or cache databases whose contents are disposable.
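
Measuring the per-commit cost can be as simple as timing single-row autocommit inserts at each level. The following is a rough sketch under stated assumptions: measure_commit_latency, the probe table, and the 256-byte payload are all hypothetical, and the numbers only mean something when db_path sits on the real target storage rather than a tmpfs or a development SSD.

```python
import os
import sqlite3
import statistics
import tempfile
import time

VALID_LEVELS = {"OFF", "NORMAL", "FULL", "EXTRA"}


def measure_commit_latency(db_path: str, level: str, commits: int = 50) -> float:
    """Return the median latency in milliseconds of one-row commits at `level`."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown synchronous level: {level!r}")
    conn = sqlite3.connect(db_path, isolation_level=None)  # autocommit mode
    try:
        conn.execute("PRAGMA journal_mode=WAL;").fetchone()
        conn.execute(f"PRAGMA synchronous={level};")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS probe (id INTEGER PRIMARY KEY, payload BLOB)"
        )
        samples = []
        for _ in range(commits):
            start = time.perf_counter()
            # In autocommit mode each INSERT is its own transaction, so this
            # times exactly one commit, including any flush the level forces.
            conn.execute("INSERT INTO probe (payload) VALUES (?)", (b"x" * 256,))
            samples.append((time.perf_counter() - start) * 1000.0)
        return statistics.median(samples)
    finally:
        conn.close()


if __name__ == "__main__":
    # Replace the temporary directory with a path on the device's real storage.
    with tempfile.TemporaryDirectory() as scratch:
        for lvl in ("NORMAL", "FULL"):
            ms = measure_commit_latency(os.path.join(scratch, f"{lvl}.db"), lvl)
            print(f"synchronous={lvl}: median commit {ms:.3f} ms")
```

A large gap between the two medians is the cost FULL would add to every acknowledged commit; a small gap suggests the storage either flushes cheaply or does not honor the flush at all.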

Verification

Three checks, cheapest first.

The read-back is already inside the factory: a startup line reading crash-safe connection verified: synchronous=NORMAL, journal_mode=WAL proves the level reached this connection. Treat its absence as a failed deployment, and assert it explicitly in CI image builds:

with crash_safe_connection("/var/lib/app/data.db") as conn:
    assert conn.execute("PRAGMA synchronous;").fetchone()[0] == 1   # 1 == NORMAL
    assert conn.execute("PRAGMA journal_mode;").fetchone()[0].lower() == "wal"

Next, exercise recovery under an actual crash rather than trusting the level name. Kill the process mid-transaction and confirm the database recovers clean; WAL recovery should restore a consistent state with no manual intervention:

# Terminal 1: start a writer looping INSERTs inside crash_safe_connection
# Terminal 2: sever it mid-write, then check integrity on reopen
kill -9 "$(pgrep -f writer.py)"
sqlite3 /var/lib/app/data.db "PRAGMA integrity_check;"   # must print: ok

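A result of ok after a kill -9 shows that WAL recovery holds across an application crash. It does not by itself exercise the synchronous level: the OS page cache survives a killed process, so even OFF usually passes this test. To cover power loss, repeat the same check after physically cutting power to the device mid-write. A result reporting a malformed database or *** in database main *** after that test means you are still on OFF or running NORMAL without WAL; go back to Diagnosis. Finally, watch commit latency under load: with NORMAL it should track write size, not storage fsync cost. Latency that spikes on flash is the tell that a FULL level leaked in from somewhere.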

Failure Modes & Gotchas

A pooled connection inherits an unsafe level from a previous tenant. synchronous is per-connection state: a fresh handle starts at the compile-time default, and a recycled handle keeps whatever its last user set. A pool that recycles a connection last used with synchronous=OFF for a bulk import can therefore hand that same unsafe handle to a writer that assumes NORMAL. The tuned write path and the production write path become different connections, and the read-back assertion never runs on the live one. Enforce the level inside the pool’s connection-acquisition hook, not a one-time init script, and read it back with PRAGMA synchronous; before handing out each handle. This is the same recycled-handle discipline covered in connection pooling strategies.
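
One way to picture an acquisition hook is a pool that re-applies and reads back the level on every checkout. This is an illustrative sketch only: VerifiedPool and enforce_normal are hypothetical names, and a production pool would also need sizing, health checks, and handling for handles returned with an open transaction (SQLite rejects a synchronous change inside one).

```python
import queue
import sqlite3
from contextlib import contextmanager

NORMAL = 1  # PRAGMA synchronous: 0=OFF 1=NORMAL 2=FULL 3=EXTRA


def enforce_normal(conn: sqlite3.Connection) -> None:
    """Re-apply synchronous=NORMAL and confirm it by read-back on this exact handle."""
    conn.execute("PRAGMA synchronous=NORMAL;")
    level = int(conn.execute("PRAGMA synchronous;").fetchone()[0])
    if level != NORMAL:
        raise RuntimeError(f"synchronous not applied: wanted {NORMAL}, got {level}")


class VerifiedPool:
    """Hypothetical minimal pool that verifies the level on every checkout."""

    def __init__(self, db_path: str, size: int = 2) -> None:
        self._idle: "queue.Queue[sqlite3.Connection]" = queue.Queue()
        for _ in range(size):
            conn = sqlite3.connect(
                db_path, isolation_level=None, check_same_thread=False
            )
            conn.execute("PRAGMA journal_mode=WAL;").fetchone()
            self._idle.put(conn)

    @contextmanager
    def acquire(self, timeout: float = 5.0):
        conn = self._idle.get(timeout=timeout)
        try:
            enforce_normal(conn)  # runs on every checkout, not once at startup
            yield conn
        finally:
            self._idle.put(conn)  # returned as-is; the next checkout re-verifies

    def close(self) -> None:
        while not self._idle.empty():
            self._idle.get_nowait().close()
```

Because the check lives in acquire, a handle that a bulk import left at OFF is corrected and verified before any other writer can touch it.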

NORMAL defers durability to a checkpoint that never fires. Because NORMAL only flushes the main database at checkpoint time, a workload that out-writes its checkpoint capacity (common when background workers queue transactions without ever triggering one, as in async execution patterns) grows the -wal file until it fills the disk, and the “safe” commits sit unpersisted in the log. journal_size_limit does not stop this: it only truncates the file after a checkpoint completes and does not cap growth while writes outpace checkpoints. Do not answer this by raising synchronous; keep NORMAL and bound the log with explicit wal_checkpoint(PASSIVE) calls during idle windows, sizing them against your checkpoint frequency tuning thresholds.
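
An idle-window checkpoint can be sketched as below. PRAGMA wal_checkpoint returns one row of three integers (a busy flag, the number of frames in the WAL, and the number of frames checkpointed); the helper names and the 1000-frame default threshold are assumptions for illustration, not values from this page.

```python
import sqlite3
from typing import Tuple


def passive_checkpoint(conn: sqlite3.Connection) -> Tuple[int, int, int]:
    """Run a non-blocking checkpoint; return (busy, wal_frames, checkpointed_frames)."""
    busy, wal_frames, done_frames = conn.execute(
        "PRAGMA wal_checkpoint(PASSIVE);"
    ).fetchone()
    return busy, wal_frames, done_frames


def wal_backlog_exceeds(conn: sqlite3.Connection, max_backlog_frames: int = 1000) -> bool:
    """Checkpoint passively, then report whether too many frames remain unmoved."""
    busy, wal_frames, done_frames = passive_checkpoint(conn)
    # PASSIVE never waits for readers or writers, so it may leave frames behind;
    # the gap between the two counters is the backlog still sitting in the -wal file.
    return busy != 0 or (wal_frames - done_frames) > max_backlog_frames
```

Calling wal_backlog_exceeds from whatever idle hook the application already has gives a cheap signal for when the log is outrunning its checkpoints.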

FULL on wear-sensitive flash trades throughput and lifespan for durability you may not need. Forcing an fsync() at every commit on QLC NAND or a low-end SD card can push per-commit stalls past 500 ms and multiply erase cycles, shortening device life. If a compliance rule genuinely requires FULL, cushion it: batch commits to amortize the flush where transactional boundaries allow (see threshold tuning for high-write workloads), pair reads with memory-mapped I/O to keep the read path off the flushed path, and confirm the controller firmware honors fsync for real power-loss recovery — a controller that lies about flushing makes FULL’s guarantee fiction anyway.
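
Batching commits under FULL might look like the following sketch, which groups rows into explicit transactions so that one flush covers many inserts. The readings table, its columns, and the batch size of 100 are hypothetical; the connection is assumed to be opened with isolation_level=None so that BEGIN and COMMIT are under the caller's control.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[float, float]  # (timestamp, value) for the hypothetical readings table


def _commit_batch(conn: sqlite3.Connection, batch: List[Row]) -> None:
    """Write one batch inside a single explicit transaction (one commit, one flush)."""
    conn.execute("BEGIN IMMEDIATE;")
    try:
        conn.executemany("INSERT INTO readings (ts, value) VALUES (?, ?)", batch)
        conn.execute("COMMIT;")
    except Exception:
        conn.execute("ROLLBACK;")
        raise


def insert_batched(conn: sqlite3.Connection, rows: Iterable[Row], batch_size: int = 100) -> int:
    """Insert rows in groups of batch_size; return the number of commits issued."""
    commits = 0
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            _commit_batch(conn, batch)
            commits += 1
            batch = []
    if batch:  # flush the final partial batch
        _commit_batch(conn, batch)
        commits += 1
    return commits
```

The trade-off is that a power loss discards the whole in-flight batch, so the batch size should stay within what the compliance rule tolerates as unacknowledged work.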

Related Pages

PRAGMA Optimization Guide: the parent reference, covering the full connection-scoped PRAGMA baseline and initialization order this level fits into.
Optimizing wal_autocheckpoint for Continuous Logging: where NORMAL’s deferred flush meets checkpoint cadence on an append-only logger.
Handling WAL File Bloat on Constrained Storage: bounding the -wal file that NORMAL leaves the checkpoint to drain.
