Backup & Recovery
The default overlay ships a pg_dump-based backup CronJob that runs every 6 hours, writes a gzipped logical dump to a backup PVC, and (optionally) syncs that PVC to an offsite remote via rclone.
This is intentionally simple — no operators, no S3 SDKs in the cluster, no WAL archiving. If you need point-in-time recovery (PITR) or block-level base backups, see Upgrade path: barmanObjectStore below.
Architecture
CNPG cluster (autocom-db-r service)
        │
        ▼  pg_dump --format=plain | gzip
/backups/autocom-YYYYMMDD-HHMMSS.sql.gz  (PVC: backup-pvc, 20Gi)
        │
        ▼  rclone sync (optional, every 6h, +15min after the dump)
<remote>:autocom-backups/<env>/  (B2, R2, S3, GDrive: anything rclone speaks)
| Layer | What | RPO | Schedule |
|---|---|---|---|
| Local dump | pg_dump to backup-pvc | up to 6h | every 6h on the hour |
| Offsite sync | rclone to your remote | up to 6h 15m | every 6h at :15 |
| Retention | dumps older than LOCAL_RETENTION_DAYS are deleted | n/a | each run |
RPO (Recovery Point Objective): up to 6 hours with the default schedule. If you need a tighter RPO, shorten the CronJob schedule.
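As a minimal sketch of tightening the schedule, the helper below builds a JSON merge patch for a CronJob's `.spec.schedule`. The function name `schedule_patch` is hypothetical and not part of the overlay. A live `kubectl patch` is reverted by the next `kubectl apply -k`, so the same value would also need to go into the overlay to persist.

```shell
# schedule_patch is a hypothetical helper, not something the overlay ships.
# It prints a JSON merge patch that sets a CronJob schedule to
# "minute MINUTE of every HOURS-th hour".
schedule_patch() {
  hours=$1        # interval in hours, e.g. 2
  minute=${2:-0}  # minute offset: 0 for the local dump, 15 for the offsite sync
  printf '{"spec":{"schedule":"%s */%s * * *"}}\n' "$minute" "$hours"
}

# Example (requires cluster access): dump every 2 hours, sync 15 minutes later.
#   kubectl -n autocom-k8s patch cronjob postgres-backup-local \
#     --type=merge -p "$(schedule_patch 2 0)"
#   kubectl -n autocom-k8s patch cronjob postgres-backup-offsite \
#     --type=merge -p "$(schedule_patch 2 15)"
```

Keeping the offsite job on the same interval, offset by a few minutes, preserves the "sync shortly after each dump" relationship from the table above.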
What ships out of the box
Everything below is part of k8s/overlays/default/backup.yaml and applies on the same kubectl apply -k you use for the rest of the stack:
- PersistentVolumeClaim/backup-pvc (20 GiB)
- ConfigMap/backup-config: LOCAL_RETENTION_DAYS=14, RCLONE_REMOTE / RCLONE_PATH (empty by default)
- Secret/backup-offsite: placeholder for your rclone.conf
- CronJob/postgres-backup-local: runs pg_dump → /backups/autocom-<stamp>.sql.gz
- CronJob/postgres-backup-offsite: runs rclone sync /backups <remote>:<path> (no-op if RCLONE_REMOTE is empty)
- NetworkPolicy/allow-postgres-backup: egress to Postgres on 5432, DNS, and 443 (rclone)
The local dump uses the autocom-db-app secret (created during CNPG setup) and connects to the autocom-db-r service, which prefers replicas — so dumps don't load the primary.
Manual / on-demand dump
Run an immediate backup outside the schedule:
# Capture the generated name so the wait targets exactly this job
JOB=manual-$(date -u +%Y%m%d-%H%M%S)
kubectl -n autocom-k8s create job --from=cronjob/postgres-backup-local "$JOB"
kubectl -n autocom-k8s wait --for=condition=complete --timeout=180s "job/$JOB"
Inspect what's on the PVC:
kubectl -n autocom-k8s run --rm -i ls-backups --image=alpine --restart=Never \
--overrides='{"spec":{"containers":[{"name":"x","image":"alpine","command":["ls","-lh","/backups"],"volumeMounts":[{"name":"b","mountPath":"/backups"}]}],"volumes":[{"name":"b","persistentVolumeClaim":{"claimName":"backup-pvc"}}]}}'
Restore
The default backup is a logical (pg_dump --format=plain) gzipped SQL file. Restore it into the running CNPG cluster by streaming through psql:
# 1. Pick the dump
DUMP=autocom-20260501-080635.sql.gz
# 2. Stream from the backup PVC into the primary
kubectl -n autocom-k8s run --rm -i restore-helper \
--image=postgres:16-alpine --restart=Never \
--env="PGHOST=autocom-db-rw" \
--env="PGUSER=postgres" \
--env="PGPASSWORD=$(kubectl -n autocom-k8s get secret autocom-db-superuser -o jsonpath='{.data.password}' | base64 -d)" \
--env="PGDATABASE=autocom" \
--overrides='{"spec":{"containers":[{"name":"x","image":"postgres:16-alpine","command":["sh","-c","gunzip -c /backups/'"$DUMP"' | psql"],"volumeMounts":[{"name":"b","mountPath":"/backups"}]}],"volumes":[{"name":"b","persistentVolumeClaim":{"claimName":"backup-pvc"}}]}}'
The dump is taken with --clean --if-exists, so it drops and recreates each object — running it on a non-empty database will overwrite existing data.
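Because the restore drops objects before recreating them, a truncated or corrupt dump could leave the database partially restored. The sketch below is one way to sanity-check a dump first. The function name `verify_dump` is hypothetical, and the second check assumes the dump ends with the trailer comment that pg_dump normally writes at the end of a plain-format dump. Run it wherever the file is readable, for example in a helper pod that mounts backup-pvc or against a copy pulled from the offsite remote.

```shell
# verify_dump is a hypothetical helper; it reads the dump but never touches the database.
verify_dump() {
  dump=$1
  # 1. The gzip stream must be intact (catches partial writes and transfers).
  gzip -t "$dump" 2>/dev/null || { echo "corrupt gzip: $dump" >&2; return 1; }
  # 2. Assumption: a complete plain-format dump ends with pg_dump's trailer comment.
  if gunzip -c "$dump" | tail -n 10 | grep 'PostgreSQL database dump complete' >/dev/null; then
    echo "ok: $dump"
  else
    echo "missing pg_dump trailer: $dump" >&2
    return 1
  fi
}

# Example:
#   verify_dump /backups/autocom-20260501-080635.sql.gz
```

A passing check only shows the file is complete, not that its contents are what you expect; restoring into a scratch database remains the stronger test.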
After the restore, fix table and sequence ownership (the dump carries no owner information, so restored objects belong to the restoring role, postgres):
kubectl -n autocom-k8s exec autocom-db-1 -c postgres -- \
psql -U postgres -d autocom -c "
DO \$\$ DECLARE r record; BEGIN
FOR r IN SELECT 'ALTER TABLE \"' || tablename || '\" OWNER TO autocom;' AS sql FROM pg_tables WHERE schemaname='public' LOOP EXECUTE r.sql; END LOOP;
FOR r IN SELECT 'ALTER SEQUENCE \"' || sequencename || '\" OWNER TO autocom;' AS sql FROM pg_sequences WHERE schemaname='public' LOOP EXECUTE r.sql; END LOOP;
END \$\$;
"
Enabling offsite sync
The offsite CronJob no-ops until you give it an rclone config. To enable Backblaze B2 (cheapest egress for this use case):
# 1. Create rclone.conf locally (or copy from your laptop)
cat > /tmp/rclone.conf <<EOF
[b2]
type = b2
account = <your-b2-key-id>
key = <your-b2-app-key>
EOF
# 2. Replace the placeholder secret
kubectl -n autocom-k8s create secret generic backup-offsite \
--from-file=rclone.conf=/tmp/rclone.conf \
--dry-run=client -o yaml | kubectl apply -f -
# 3. Tell the CronJob where to push
kubectl -n autocom-k8s patch cm backup-config --type=merge -p '{
"data": {
"RCLONE_REMOTE": "b2",
"RCLONE_PATH": "autocom-backups/vps-arm"
}
}'
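Replace b2 with r2, s3, gdrive, etc.; any rclone backend works. The CronJob uses rclone sync --checksum. Because sync mirrors the source, dumps deleted locally by retention are deleted remotely too, and --checksum makes rclone compare files by checksum and size rather than modification time and size.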
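Since sync deletes remote files that no longer exist locally, it can be worth previewing a run before relying on it. The sketch below reuses the RCLONE_REMOTE and RCLONE_PATH variable names from backup-config. The functions `offsite_dest` and `preview_offsite_sync` are hypothetical, and the empty-remote guard only imitates the documented no-op behaviour; the actual CronJob script may differ.

```shell
# offsite_dest is a hypothetical helper: it prints "<remote>:<path>",
# or fails when no remote is configured.
offsite_dest() {
  if [ -z "${RCLONE_REMOTE:-}" ]; then
    echo "RCLONE_REMOTE is empty; offsite sync disabled" >&2
    return 1
  fi
  printf '%s:%s\n' "$RCLONE_REMOTE" "${RCLONE_PATH:-}"
}

# Show what a sync would copy or delete without changing the remote.
preview_offsite_sync() {
  dest=$(offsite_dest) || return 0   # nothing configured: do nothing
  rclone sync --dry-run --checksum /backups "$dest"
}

# Example, run somewhere with rclone, the rclone.conf, and /backups available:
#   RCLONE_REMOTE=b2 RCLONE_PATH=autocom-backups/vps-arm preview_offsite_sync
```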
Tuning
| Setting | Where | Default | Notes |
|---|---|---|---|
| Backup schedule | CronJob/postgres-backup-local .spec.schedule | 0 */6 * * * | every 6h on the hour (UTC) |
| Offsite schedule | CronJob/postgres-backup-offsite .spec.schedule | 15 */6 * * * | 15 min after each local dump |
| Local retention | ConfigMap/backup-config LOCAL_RETENTION_DAYS | 14 | deleted by find -mtime +N -delete |
| Backup PVC size | PersistentVolumeClaim/backup-pvc .spec.resources.requests.storage | 20Gi | bump if dumps grow |
| Source endpoint | hard-coded in CronJob env (PGHOST=autocom-db-r) | replicas-preferred | keeps load off the primary |
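To make the retention rule concrete, here is a minimal sketch of the pruning step. The function name `prune_old_dumps` and the file-name filter are assumptions; the CronJob's own script may differ. Note that find counts age in whole days, so `-mtime +14` matches a file only once it is at least 15 full days old.

```shell
# prune_old_dumps is a hypothetical helper mirroring "find -mtime +N -delete".
prune_old_dumps() {
  dir=$1
  days=${2:-14}   # same default as LOCAL_RETENTION_DAYS
  # -mtime +N ignores fractional days: a file matches when its age,
  # rounded down to whole days, is greater than N.
  find "$dir" -type f -name 'autocom-*.sql.gz' -mtime +"$days" -delete
}

# Example: apply the default 14-day retention to the backup volume.
#   prune_old_dumps /backups 14
```

Swapping `-delete` for `-print` gives a dry run when you are testing a shorter retention.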
Disaster recovery scenarios
Database corruption / accidental drop
- Pick the most recent good autocom-*.sql.gz from the backup PVC (or your offsite remote).
- Restore via the recipe above.
- Replay any writes made since the dump from app logs or an event store, if you have one.
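The first step can be sketched as a small helper that walks the dumps newest-first and returns the first one whose gzip stream is intact. The function name `latest_good_dump` is hypothetical, and "good" here only means the file decompresses cleanly; a dump taken after the corruption happened will still pass.

```shell
# latest_good_dump is a hypothetical helper.
# File names embed a YYYYMMDD-HHMMSS stamp, so a reverse name sort is newest-first.
latest_good_dump() {
  dir=$1
  for f in $(ls "$dir"/autocom-*.sql.gz 2>/dev/null | sort -r); do
    if gzip -t "$f" 2>/dev/null; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  echo "no intact dump found in $dir" >&2
  return 1
}

# Example: pick the file name to use as DUMP in the restore recipe.
#   DUMP=$(basename "$(latest_good_dump /backups)")
```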
Whole CNPG cluster lost
- Recreate the cluster (kubectl apply -k k8s/overlays/default) and the bootstrap secrets (see Database: CloudNativePG).
- Wait for the cluster to reach READY 3/3.
- Restore the latest dump.
Whole node lost (single-node k3s)
The local-path PVCs live on the node — they're gone with the node. Restore from your offsite copy. This is the case offsite sync exists for; if you haven't enabled it yet, do it now.
Upgrade path: barmanObjectStore
When you outgrow logical dumps and want continuous WAL archiving + PITR, switch the cluster to CNPG's native barmanObjectStore backups:
# In k8s/overlays/default/postgres.yaml, add to the Cluster spec:
spec:
  backup:
    barmanObjectStore:
      destinationPath: s3://autocom-backups/
      endpointURL: https://<r2-or-s3-endpoint>
      s3Credentials:
        accessKeyId: { name: backup-s3-creds, key: ACCESS_KEY }
        secretAccessKey: { name: backup-s3-creds, key: SECRET_KEY }
    retentionPolicy: "30d"
# And a ScheduledBackup resource for daily base backups:
---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: autocom-db-daily
spec:
  schedule: "0 0 2 * * *"  # 02:00 daily (six fields; the first is seconds)
  cluster: { name: autocom-db }
  method: barmanObjectStore
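That cuts RPO from hours to the WAL archiving interval (bounded by archive_timeout, which CNPG sets to 5 minutes by default) and lets you bootstrap a brand-new cluster at any point in time within the retention window using bootstrap.recovery.recoveryTarget.targetTime. The pg_dump CronJob can stay (belt-and-braces) or be removed once barman is verified.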
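As a rough sketch of what a point-in-time recovery could look like, the helper below writes a Cluster manifest that bootstraps from the object store up to a given timestamp. The function name `write_pitr_manifest`, the new cluster name autocom-db-pitr, the instance count, and the storage size are all assumptions for illustration. The bucket, endpoint placeholder, and secret names are copied from the example above.

```shell
# write_pitr_manifest is a hypothetical helper; adjust names and sizes to your cluster.
write_pitr_manifest() {
  target_time=$1                 # e.g. "2026-05-01 08:00:00+00"
  out=${2:-pitr-cluster.yaml}
  cat > "$out" <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: autocom-db-pitr          # assumed name for the recovered cluster
spec:
  instances: 3                   # assumed; match your real cluster
  storage:
    size: 20Gi                   # assumed; match your real cluster
  bootstrap:
    recovery:
      source: autocom-db
      recoveryTarget:
        targetTime: "$target_time"
  externalClusters:
    - name: autocom-db           # the original cluster's name, as archived in the bucket
      barmanObjectStore:
        destinationPath: s3://autocom-backups/
        endpointURL: https://<r2-or-s3-endpoint>
        s3Credentials:
          accessKeyId: { name: backup-s3-creds, key: ACCESS_KEY }
          secretAccessKey: { name: backup-s3-creds, key: SECRET_KEY }
EOF
}

# Example: generate the manifest, review it, then apply it.
#   write_pitr_manifest "2026-05-01 08:00:00+00" pitr-cluster.yaml
#   kubectl -n autocom-k8s apply -f pitr-cluster.yaml
```

Recovering into a separately named cluster leaves the damaged one untouched until the restored data has been checked.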