Backup & Recovery

The default overlay ships a pg_dump-based backup CronJob that runs every 6 hours, writes a gzipped logical dump to a backup PVC, and (optionally) syncs that PVC to an offsite remote via rclone.

This is intentionally simple — no operators, no S3 SDKs in the cluster, no WAL archiving. If you need point-in-time recovery (PITR) or block-level base backups, see Upgrade path: barmanObjectStore below.

Architecture

CNPG cluster (autocom-db-r service)
        │
        ▼  pg_dump --format=plain | gzip
   /backups/autocom-YYYYMMDD-HHMMSS.sql.gz   (PVC: backup-pvc, 20Gi)
        │
        ▼  rclone sync (optional, every 6h, +15min after the dump)
   <remote>:autocom-backups/<env>/           (B2, R2, S3, GDrive — anything rclone speaks)

Layer         What                                               RPO           Schedule
Local dump    pg_dump to backup-pvc                              up to 6h      every 6h on the hour
Offsite sync  rclone to your remote                              up to 6h 15m  every 6h at :15
Retention     dumps older than LOCAL_RETENTION_DAYS are deleted  -             each run

RPO (Recovery Point Objective): up to 6 hours with the default schedule. If you need a tighter RPO, shorten the CronJob schedule.

What ships out of the box

Everything below is part of k8s/overlays/default/backup.yaml and applies on the same kubectl apply -k you use for the rest of the stack:

  • PersistentVolumeClaim/backup-pvc (20 GiB)
  • ConfigMap/backup-config: LOCAL_RETENTION_DAYS=14, RCLONE_REMOTE and RCLONE_PATH (both empty by default)
  • Secret/backup-offsite: placeholder for your rclone.conf
  • CronJob/postgres-backup-local: runs pg_dump and writes /backups/autocom-<stamp>.sql.gz
  • CronJob/postgres-backup-offsite: runs rclone sync /backups <remote>:<path> (no-op if RCLONE_REMOTE is empty)
  • NetworkPolicy/allow-postgres-backup: egress to Postgres on 5432, DNS, and 443 (rclone)

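The local dump uses the autocom-db-app secret (created during CNPG setup) and connects to the autocom-db-r service. CNPG's -r service spreads connections across the cluster's instances instead of pinning them to the primary, so dumps are usually served by a replica. If you need to exclude the primary entirely, point PGHOST at autocom-db-ro.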
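As a rough sketch of what one run of the local job amounts to, the functions below name a dump, write it, and prune old ones. The flags are inferred from this page (plain format, --clean --if-exists, no owner info, find -mtime retention). The function names, the .partial temp suffix, and the BACKUP_DIR variable are illustrative assumptions, not the contents of backup.yaml.

```shell
#!/bin/sh
# Illustrative sketch only: the names and the temp-file step are assumptions.
BACKUP_DIR="${BACKUP_DIR:-/backups}"
LOCAL_RETENTION_DAYS="${LOCAL_RETENTION_DAYS:-14}"

# Build a name like autocom-20260501-080635.sql.gz from the current UTC time.
dump_name() {
  printf 'autocom-%s.sql.gz\n' "$(date -u +%Y%m%d-%H%M%S)"
}

# Dump to a temporary name and rename on success, so a failed run never
# leaves a file that looks like a finished dump. Connection settings are
# assumed to come from the PG* environment variables. pipefail makes a
# pg_dump failure visible through the pipe (bash and busybox ash support it).
run_dump() {
  out="$BACKUP_DIR/$(dump_name)"
  ( set -o pipefail
    pg_dump --format=plain --clean --if-exists --no-owner | gzip > "$out.partial"
  ) && mv "$out.partial" "$out"
}

# Delete dumps in directory $1 whose age exceeds $2 days.
prune_old_dumps() {
  find "$1" -maxdepth 1 -type f -name 'autocom-*.sql.gz' -mtime +"$2" -delete
}
```

A run would then be `run_dump && prune_old_dumps "$BACKUP_DIR" "$LOCAL_RETENTION_DAYS"`, so old dumps are pruned only after a new one has landed.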

Manual / on-demand dump

Run an immediate backup outside the schedule:

# Keep the generated name so the wait targets exactly this job
JOB=manual-$(date -u +%Y%m%d-%H%M%S)
kubectl -n autocom-k8s create job --from=cronjob/postgres-backup-local "$JOB"
kubectl -n autocom-k8s wait --for=condition=complete --timeout=180s "job/$JOB"

Inspect what's on the PVC:

kubectl -n autocom-k8s run --rm -i ls-backups --image=alpine --restart=Never \
  --overrides='{"spec":{"containers":[{"name":"x","image":"alpine","command":["ls","-lh","/backups"],"volumeMounts":[{"name":"b","mountPath":"/backups"}]}],"volumes":[{"name":"b","persistentVolumeClaim":{"claimName":"backup-pvc"}}]}}'

Restore

The default backup is a logical (pg_dump --format=plain) gzipped SQL file. Restore it into the running CNPG cluster by streaming through psql:

# 1. Pick the dump
DUMP=autocom-20260501-080635.sql.gz

# 2. Stream from the backup PVC into the primary.
#    Connection settings go inside --overrides: with the default merge override
#    type the containers list is replaced wholesale, so --env flags passed to
#    kubectl run would be dropped. The password is read from the secret directly.
kubectl -n autocom-k8s run --rm -i restore-helper \
  --image=postgres:16-alpine --restart=Never \
  --overrides='{"spec":{"containers":[{"name":"x","image":"postgres:16-alpine","env":[{"name":"PGHOST","value":"autocom-db-rw"},{"name":"PGUSER","value":"postgres"},{"name":"PGDATABASE","value":"autocom"},{"name":"PGPASSWORD","valueFrom":{"secretKeyRef":{"name":"autocom-db-superuser","key":"password"}}}],"command":["sh","-c","gunzip -c /backups/'"$DUMP"' | psql"],"volumeMounts":[{"name":"b","mountPath":"/backups"}]}],"volumes":[{"name":"b","persistentVolumeClaim":{"claimName":"backup-pvc"}}]}}'

The dump is taken with --clean --if-exists, so it drops and recreates each object — running it on a non-empty database will overwrite existing data.
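Because a restore overwrites live data, it can be worth checking that a dump is intact before streaming it in. The sketch below tests the gzip stream and then looks for the trailer comment that pg_dump writes at the end of a complete plain-format dump. The verify_dump name is hypothetical, and the check says nothing about whether the data itself is what you want.

```shell
#!/bin/sh
# Return 0 only if $1 is a valid gzip stream AND its last lines contain the
# "PostgreSQL database dump complete" trailer written by pg_dump.
verify_dump() {
  gzip -t "$1" 2>/dev/null || return 1
  gunzip -c "$1" | tail -n 10 | grep -q 'PostgreSQL database dump complete'
}
```

Run it wherever the file is readable, for example inside a helper pod that mounts backup-pvc: `verify_dump "/backups/$DUMP" && echo ok`.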

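After restore, fix table and sequence ownership. The dump carries no owner information, so restored objects end up owned by the role that ran the restore (postgres here):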

kubectl -n autocom-k8s exec autocom-db-1 -c postgres -- \
  psql -U postgres -d autocom -c "
    DO \$\$ DECLARE r record; BEGIN
      FOR r IN SELECT 'ALTER TABLE \"' || tablename || '\" OWNER TO autocom;' AS sql FROM pg_tables WHERE schemaname='public' LOOP EXECUTE r.sql; END LOOP;
      FOR r IN SELECT 'ALTER SEQUENCE \"' || sequencename || '\" OWNER TO autocom;' AS sql FROM pg_sequences WHERE schemaname='public' LOOP EXECUTE r.sql; END LOOP;
    END \$\$;
  "

Enabling offsite sync

The offsite CronJob no-ops until you give it an rclone config. To enable Backblaze B2 (cheapest egress for this use case):

# 1. Create rclone.conf locally (or copy from your laptop)
cat > /tmp/rclone.conf <<EOF
[b2]
type = b2
account = <your-b2-key-id>
key = <your-b2-app-key>
EOF

# 2. Replace the placeholder secret
kubectl -n autocom-k8s create secret generic backup-offsite \
  --from-file=rclone.conf=/tmp/rclone.conf \
  --dry-run=client -o yaml | kubectl apply -f -

# 3. Tell the CronJob where to push
kubectl -n autocom-k8s patch cm backup-config --type=merge -p '{
  "data": {
    "RCLONE_REMOTE": "b2",
    "RCLONE_PATH":   "autocom-backups/vps-arm"
  }
}'

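Replace b2 with the name of any remote defined in your rclone.conf (R2, S3, Google Drive, etc.); any rclone backend works. The CronJob runs rclone sync --checksum, and sync makes the remote mirror the PVC, so dumps removed locally by retention are removed remotely too.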
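The no-op behaviour and the sync target can be pictured roughly as follows. This is a sketch of the logic described on this page, not the script shipped in backup.yaml. The function names and the BACKUP_DIR variable are assumptions; RCLONE_REMOTE and RCLONE_PATH are the ConfigMap keys.

```shell
#!/bin/sh
# Print "<remote>:<path>", or return 1 when offsite sync is not configured.
offsite_target() {
  [ -n "${RCLONE_REMOTE:-}" ] || return 1
  printf '%s:%s\n' "$RCLONE_REMOTE" "${RCLONE_PATH:-}"
}

# Mirror the backup directory to the remote, or skip quietly when
# RCLONE_REMOTE is empty. "sync" also deletes remote files that no longer
# exist locally, which is why local retention carries over to the remote.
sync_offsite() {
  target=$(offsite_target) || {
    echo "RCLONE_REMOTE is empty, skipping offsite sync"
    return 0
  }
  rclone sync --checksum "${BACKUP_DIR:-/backups}" "$target"
}
```

With the values from step 3, offsite_target prints b2:autocom-backups/vps-arm. To pull a dump back after losing the PVC, `rclone copy "$(offsite_target)" /backups` goes in the other direction.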

Tuning

Setting           Where                                                              Default             Notes
Backup schedule   CronJob/postgres-backup-local .spec.schedule                       0 */6 * * *         every 6h on the hour (UTC)
Offsite schedule  CronJob/postgres-backup-offsite .spec.schedule                     15 */6 * * *        15 min after each local dump
Local retention   ConfigMap/backup-config LOCAL_RETENTION_DAYS                       14                  deleted by find -mtime +N -delete
Backup PVC size   PersistentVolumeClaim/backup-pvc .spec.resources.requests.storage  20Gi                bump if dumps grow
Source endpoint   hard-coded in CronJob env (PGHOST=autocom-db-r)                    replicas-preferred  keeps load off the primary
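As an example of changing the two schedules together, the sketch below builds a merge patch for .spec.schedule and applies it to both CronJobs, keeping the offsite run 15 minutes behind the local one. The function names are hypothetical. A later kubectl apply -k will reset a live patch to whatever backup.yaml says, so the lasting change belongs in the overlay.

```shell
#!/bin/sh
# Build a JSON merge patch that sets a CronJob's schedule to $1.
schedule_patch() {
  printf '{"spec":{"schedule":"%s"}}\n' "$1"
}

# Run the local dump every $1 hours on the hour, and the offsite sync
# 15 minutes after it.
set_backup_interval_hours() {
  hours="$1"
  kubectl -n autocom-k8s patch cronjob postgres-backup-local \
    --type=merge -p "$(schedule_patch "0 */$hours * * *")"
  kubectl -n autocom-k8s patch cronjob postgres-backup-offsite \
    --type=merge -p "$(schedule_patch "15 */$hours * * *")"
}
```

For instance, `set_backup_interval_hours 2` would bring the local RPO down to about two hours, at the cost of three times as many dumps on the PVC for the same retention window.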

Disaster recovery scenarios

Database corruption / accidental drop

  1. Pick the most recent good autocom-*.sql.gz from the backup PVC (or your offsite remote).
  2. Restore via the recipe above.
  3. Reissue any writes since the dump from app logs / event store if you have one.
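For step 1, the timestamped file names sort chronologically as plain strings, so picking the newest dump from a listing can be sketched as below. The latest_dump name is hypothetical. "Newest" is not the same as "good", so if the incident happened before the last dump ran, step back to an earlier file.

```shell
#!/bin/sh
# Read file names on stdin and print the newest autocom-YYYYMMDD-HHMMSS.sql.gz.
# Anything that does not match the naming scheme is ignored.
latest_dump() {
  grep -E '^autocom-[0-9]{8}-[0-9]{6}\.sql\.gz$' | sort | tail -n 1
}
```

Feed it any listing, one name per line: `ls /backups | latest_dump` inside a pod that mounts backup-pvc, or `rclone lsf b2:autocom-backups/vps-arm | latest_dump` against the offsite copy.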

Whole CNPG cluster lost

  1. Recreate the cluster (kubectl apply -k k8s/overlays/default) and the bootstrap secrets (see Database: CloudNativePG).
  2. Wait for the cluster to reach READY 3/3.
  3. Restore the latest dump.

Whole node lost (single-node k3s)

The local-path PVCs live on the node — they're gone with the node. Restore from your offsite copy. This is the case offsite sync exists for; if you haven't enabled it yet, do it now.

Upgrade path: barmanObjectStore

When you outgrow logical dumps and want continuous WAL archiving + PITR, switch the cluster to CNPG's native barmanObjectStore backups:

# In k8s/overlays/default/postgres.yaml, add to the Cluster spec:
spec:
  backup:
    barmanObjectStore:
      destinationPath: s3://autocom-backups/
      endpointURL: https://<r2-or-s3-endpoint>
      s3Credentials:
        accessKeyId:     { name: backup-s3-creds, key: ACCESS_KEY }
        secretAccessKey: { name: backup-s3-creds, key: SECRET_KEY }
    retentionPolicy: "30d"

# And a ScheduledBackup CRD for daily base backups:
---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: autocom-db-daily
spec:
  schedule: "0 0 2 * * *"   # 02:00 daily (six fields, seconds first)
  cluster: { name: autocom-db }
  method: barmanObjectStore

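That cuts RPO from hours to minutes or less: WAL segments are archived as they fill, and CNPG's default archive_timeout of 5 minutes bounds the gap on an idle database. It also lets you bootstrap a brand-new cluster from any point in time using bootstrap.recovery.recoveryTarget.targetTime. The pg_dump CronJob can stay (belt-and-braces) or be removed once barman is verified.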
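A point-in-time restore with this setup creates a new Cluster that recovers from the object store. The sketch below prints such a manifest for a given target time. The new cluster name (autocom-db-restore), the instance count, and the storage size are placeholder assumptions, and the object store settings simply repeat the example above, including the endpoint placeholder that must be filled in before applying.

```shell
#!/bin/sh
# Print a Cluster manifest that recovers from the barman object store up to
# the timestamp in $1 (for example "2026-05-01 08:00:00+00").
# autocom-db-restore, instances and storage size are placeholders.
render_pitr_cluster() {
  target_time="$1"
  cat <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: autocom-db-restore
spec:
  instances: 3
  storage:
    size: 20Gi
  bootstrap:
    recovery:
      source: autocom-db
      recoveryTarget:
        targetTime: "$target_time"
  externalClusters:
    - name: autocom-db
      barmanObjectStore:
        destinationPath: s3://autocom-backups/
        endpointURL: https://<r2-or-s3-endpoint>
        s3Credentials:
          accessKeyId:     { name: backup-s3-creds, key: ACCESS_KEY }
          secretAccessKey: { name: backup-s3-creds, key: SECRET_KEY }
EOF
}
```

Review the output first, then pipe it to `kubectl -n autocom-k8s apply -f -`. The externalClusters entry is named after the original cluster (autocom-db) so that the recovery reads that cluster's backups and WAL from the bucket.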