Backup & Recovery

AutoCom uses CloudNativePG's built-in backup system with WAL archiving for continuous protection and point-in-time recovery (PITR).

Architecture

CloudNativePG Primary → WAL archiving → MinIO (S3-compatible)
                      → Daily base backup → MinIO

Three layers of protection:

Layer           What                          RPO        Schedule
WAL archiving   Continuous transaction logs   ~seconds   Automatic
Base backup     Full database snapshot        Daily      2 AM daily
Retention       Old backups cleaned up        30 days    Automatic

RPO (Recovery Point Objective): seconds. Because WAL segments are archived continuously, you can restore to any point in time within the retention window.
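Before attempting a PITR, it helps to confirm the desired target falls inside the recoverability window. A minimal sketch, assuming GNU date (use gdate from coreutils on macOS); both timestamps are example values:

```shell
# Check whether a desired PITR target falls inside the recoverability window.
# firstRecoverabilityPoint would normally come from:
#   kubectl get cluster autocom-db -n autocom -o jsonpath='{.status.firstRecoverabilityPoint}'
first="2026-03-15T02:00:00Z"    # example value
target="2026-04-11T20:00:00Z"   # desired restore point
# Compare as epoch seconds
if [ "$(date -ud "$target" +%s)" -ge "$(date -ud "$first" +%s)" ]; then
  echo "target is recoverable"
else
  echo "target predates the oldest backup"
fi
```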

Backup Storage

Backups are stored in MinIO (S3-compatible), deployed as a K8s pod with a 10Gi PVC.

For production, replace MinIO with:

  • Cloudflare R2: free egress, $0.015/GB
  • AWS S3: standard, reliable
  • Google GCS: if on GCP

Just change the endpointURL and credentials in the cluster YAML.

Manual Backup

Trigger an immediate backup (note the unquoted EOF delimiter: quoting it as 'EOF' would prevent $(date) from expanding, producing an invalid object name):

cat <<EOF | kubectl apply -f -
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: manual-backup-$(date +%Y%m%d-%H%M)
  namespace: autocom
spec:
  cluster:
    name: autocom-db
  method: barmanObjectStore
EOF
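Because $(date) expands into the object name, the result must satisfy Kubernetes naming rules (a lowercase RFC 1123 label). A quick local sanity check, assuming a POSIX shell:

```shell
# Generate the timestamped name and verify it is a valid Kubernetes object name:
# lowercase alphanumerics and '-', at most 63 characters (RFC 1123 label rules)
name="manual-backup-$(date +%Y%m%d-%H%M)"
if echo "$name" | grep -Eq '^[a-z0-9]([a-z0-9-]*[a-z0-9])?$' && [ "${#name}" -le 63 ]; then
  echo "valid: $name"
fi
```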

Check status:

kubectl get backup -n autocom

Restore from Backup

Full restore (new cluster from backup)

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: autocom-db-restored
  namespace: autocom
spec:
  instances: 3

  bootstrap:
    recovery:
      source: autocom-db
      recoveryTarget:
        targetTime: "2026-04-11T20:00:00Z"  # Point in time to restore to

  externalClusters:
    - name: autocom-db
      barmanObjectStore:
        destinationPath: s3://autocom-backups/
        endpointURL: http://minio:9000
        s3Credentials:
          accessKeyId:
            name: minio-creds
            key: ACCESS_KEY
          secretAccessKey:
            name: minio-creds
            key: SECRET_KEY

  storage:
    size: 5Gi

This creates a new cluster restored to the exact timestamp specified.

Restore to latest

Omit recoveryTarget to restore to the latest available point:

  bootstrap:
    recovery:
      source: autocom-db

Switch application to restored cluster

After restore, update the configmap:

kubectl patch configmap autocom-config -n autocom \
  --type merge -p '{"data":{"DB_HOST":"autocom-db-restored-rw","DB_READ_HOST":"autocom-db-restored-ro"}}'
kubectl rollout restart deployment/api -n autocom
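The inline patch JSON above is easy to mis-quote. One option is to build it with jq (assuming jq is installed) and hand the result to kubectl:

```shell
# Build the merge-patch body with jq instead of hand-escaped JSON
patch=$(jq -cn '{data:{DB_HOST:"autocom-db-restored-rw",DB_READ_HOST:"autocom-db-restored-ro"}}')
echo "$patch"
# Then apply it (requires cluster access):
# kubectl patch configmap autocom-config -n autocom --type merge -p "$patch"
```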

Monitoring Backups

# List all backups
kubectl get backup -n autocom

# Check scheduled backup status
kubectl get scheduledbackup -n autocom

# Check backup size in MinIO
kubectl exec -n autocom deployment/minio -- mc du local/autocom-backups/

# Check the earliest recoverable point (non-empty once WAL archiving is working)
kubectl get cluster autocom-db -n autocom -o jsonpath='{.status.firstRecoverabilityPoint}'
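With a daily 2 AM schedule, the newest backup should never be much more than a day old. A freshness check can be sketched as follows (GNU date assumed; the stoppedAt value is hardcoded here but would come from the Backup's status):

```shell
# Warn if the newest base backup is older than 25 h (daily schedule + 1 h slack).
# In practice, fetch the timestamp from the most recent Backup, e.g.:
#   kubectl get backup <name> -n autocom -o jsonpath='{.status.stoppedAt}'
stoppedAt="$(date -ud '3 hours ago' +%Y-%m-%dT%H:%M:%SZ)"   # example value
age=$(( $(date -u +%s) - $(date -ud "$stoppedAt" +%s) ))
if [ "$age" -gt $((25 * 3600)) ]; then
  echo "WARNING: last backup is $((age / 3600)) h old"
else
  echo "backup freshness OK"
fi
```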

Disaster Recovery Scenarios

Database corruption

  1. Identify the last known good time
  2. Create a PITR restore to that timestamp
  3. Switch the app to the restored cluster
  4. Verify data integrity

Accidental data deletion

  1. Check firstRecoverabilityPoint — can you go back far enough?
  2. PITR restore to just before the deletion
  3. Export the missing data from restored cluster
  4. Import into the live cluster

Complete node failure

  1. CloudNativePG auto-promotes a replica (no action needed)
  2. A new replica is created automatically
  3. If all nodes fail, restore from the latest backup in MinIO

MinIO failure

Backups in MinIO's PVC survive pod restarts. If the PVC is lost:

  • Existing database is unaffected (it's on separate PVCs)
  • WAL archiving pauses until MinIO is restored
  • Deploy a new MinIO and configure the same bucket

Configuration Reference

Setting           Value            Description
retentionPolicy   30d              Keep backups for 30 days
archive_timeout   300              Force WAL archive every 5 minutes (even if not full)
schedule          0 0 2 * * *      Daily backup at 2 AM
Storage           10Gi MinIO PVC   Backup storage volume
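These settings live in two manifests. A sketch of where each one goes (the ScheduledBackup name is an assumption; the rest matches the resources above):

```yaml
# Daily base backup at 2 AM (CNPG uses a six-field cron expression: seconds first)
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: autocom-db-daily   # assumed name
  namespace: autocom
spec:
  schedule: "0 0 2 * * *"
  cluster:
    name: autocom-db
---
# Cluster excerpt: retention and forced WAL archiving
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: autocom-db
  namespace: autocom
spec:
  backup:
    retentionPolicy: "30d"
  postgresql:
    parameters:
      archive_timeout: "300"   # seconds; force a WAL switch every 5 minutes
```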

Switching to Cloud Storage

Replace MinIO with S3:

backup:
  barmanObjectStore:
    destinationPath: s3://your-bucket/autocom/
    # Remove endpointURL for real S3
    s3Credentials:
      accessKeyId:
        name: aws-creds
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: aws-creds
        key: SECRET_ACCESS_KEY

Replace with Cloudflare R2:

backup:
  barmanObjectStore:
    destinationPath: s3://your-bucket/autocom/
    endpointURL: https://YOUR_ACCOUNT_ID.r2.cloudflarestorage.com
    s3Credentials:
      accessKeyId:
        name: r2-creds
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: r2-creds
        key: SECRET_ACCESS_KEY