Disaster Recovery Plan: K0nsult CNC Infrastructure

Internal

RPO (Recovery Point Objective): 24h (daily Fly.io PostgreSQL backup)
RTO (Recovery Time Objective): 1h (redeploy from the git repository)
Redundancy: 2x (two machines in the same region)
Backup frequency: daily (Fly.io automatic snapshots)

Backup Strategy

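K0nsult relies on Fly.io's built-in backup infrastructure for all persistent data: the storage volumes of the PostgreSQL cluster are snapshotted automatically once a day. Because snapshots are daily, a restore can lose up to 24 hours of writes, which is where the 24h RPO comes from.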
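The 24h RPO only holds if a recent snapshot actually exists. A minimal Bash sketch of that check, assuming the newest snapshot's creation time has already been converted to Unix epoch seconds (for example from the output of fly volumes snapshots list <volume-id>; that extraction step is not shown). The function name within_rpo is hypothetical.

```bash
# Bash: source this file, then call within_rpo.
# Assumption: SNAPSHOT_EPOCH was extracted beforehand from the snapshot listing.

# within_rpo SNAPSHOT_EPOCH NOW_EPOCH RPO_HOURS
# Returns 0 if the snapshot is at most RPO_HOURS old, 1 otherwise.
within_rpo() {
  local snapshot_epoch=$1 now_epoch=$2 rpo_hours=$3
  local age_seconds=$(( now_epoch - snapshot_epoch ))
  # (( ... )) exits 0 when the comparison is true, 1 when it is false.
  (( age_seconds <= rpo_hours * 3600 ))
}
```

For example: within_rpo "$snapshot_epoch" "$(date +%s)" 24 || echo "newest snapshot is older than the RPO". Since snapshots run once a day, the newest one can be close to 24 hours old just before the next run, so a small margin above 24 may avoid false alarms.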

Failover Architecture

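K0nsult runs on 2 machines in the same Fly.io region. If one machine fails, Fly.io automatically routes traffic to the surviving instance. Because both machines share one region, this protects against a single-machine failure but not against a regional outage.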
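Failover only works while both machines are actually running. A rough Bash sketch for confirming that, assuming fly machines list -a k0nsult --json prints one JSON object per machine with a top-level "state" field such as "started" or "stopped". The grep is a plain text match rather than a JSON parser, and count_started and require_redundancy are hypothetical helper names.

```bash
# Bash: source this file, then pipe machine-list JSON into the helpers.
# Assumption: each machine object carries a "state" field; verify against your flyctl output.

# count_started: read JSON on stdin, print how many machines report "started".
count_started() {
  # '|| true' keeps the pipeline quiet when grep finds no match.
  { grep -Eo '"state": *"started"' || true; } | wc -l | tr -d ' '
}

# require_redundancy MIN: read JSON on stdin, fail if fewer than MIN machines are started.
require_redundancy() {
  local min=$1 started
  started=$(count_started)
  if (( started < min )); then
    echo "only ${started} machine(s) started; expected at least ${min}" >&2
    return 1
  fi
}
```

Usage: fly machines list -a k0nsult --json | require_redundancy 2. If jq is installed, a real JSON query is more robust than the grep match.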

Recovery Steps

Follow these steps in order when a service disruption is detected:

1. Check Fly.io status

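Determine whether the issue is platform-wide (check the Fly.io status page at https://status.flyio.net) or specific to the application (check machine state with the command below).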

fly status -a k0nsult

2. Restart PostgreSQL

If the database is unresponsive, restart the Postgres cluster.

fly postgres restart -a k0nsult-db

3. Redeploy application

If the app container is corrupt or misconfigured, redeploy it. fly deploy builds from the current working directory, so run it from a clean checkout of the latest known-good git commit.

fly deploy -a k0nsult

4. Verify health endpoint

Confirm the application is responding correctly after recovery.

curl https://k0nsult.fly.dev/health

Expected response: {"ok":true,"status":"operational"}
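The four recovery steps can be wrapped as Bash functions so they are run one at a time, in order, with a preview mode. This is a sketch: the app names and health URL come from this plan, while the function names, the run helper and the DRY_RUN variable are hypothetical. Steps 2 and 3 stay separate because the plan only calls for them when the database or the app is actually broken.

```bash
# Bash: source this file, then call the step functions as needed.
# With DRY_RUN=1 every command is printed instead of executed.

HEALTH_URL="https://k0nsult.fly.dev/health"

# run: print the command in dry-run mode, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# health_ok BODY: succeed if BODY contains the expected health fields.
health_ok() {
  [[ "$1" == *'"ok":true'* && "$1" == *'"status":"operational"'* ]]
}

step1_status()     { run fly status -a k0nsult; }
step2_restart_db() { run fly postgres restart -a k0nsult-db; }
step3_redeploy()   { run fly deploy -a k0nsult; }

step4_verify() {
  local body
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    run curl -fsS "$HEALTH_URL"
    return 0
  fi
  # -f makes curl fail on HTTP errors; -sS hides progress but still shows errors.
  body=$(curl -fsS "$HEALTH_URL") || return 1
  if health_ok "$body"; then
    echo "healthy: $body"
  else
    echo "unexpected health response: $body" >&2
    return 1
  fi
}
```

For example, DRY_RUN=1 step3_redeploy previews the redeploy command, and step4_verify confirms recovery. Matching on the two fields rather than the exact string tolerates harmless whitespace differences in the JSON.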

Important: If the database needs to be restored from a snapshot, use fly postgres restore and expect up to 24 hours of data loss (RPO). Coordinate with 0n40i4 before restoring.
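One way to restore from a volume snapshot is to create a new Postgres cluster from it instead of overwriting the running one, so the current data stays available until the switch is agreed with 0n40i4. The sketch below assumes the database runs on Fly Postgres in the k0nsult-db app and that the installed flyctl supports fly volumes snapshots list and fly postgres create --snapshot-id. The snapshot ID and new app name are placeholders, and list_snapshots, restore_db and DRY_RUN are hypothetical helpers.

```bash
# Bash: source this file. DRY_RUN=1 prints commands instead of running them.
# Assumption: flyctl supports the snapshot commands used below; check `fly postgres create --help`.

run() { if [[ "${DRY_RUN:-0}" == "1" ]]; then echo "+ $*"; else "$@"; fi; }

# list_snapshots VOLUME_ID: show the daily snapshots of one data volume.
list_snapshots() { run fly volumes snapshots list "$1"; }

# restore_db SNAPSHOT_ID NEW_DB_APP: create a new cluster from that snapshot.
restore_db() {
  local snapshot_id=${1:-} new_app=${2:-}
  if [[ -z "$snapshot_id" || -z "$new_app" ]]; then
    echo "usage: restore_db SNAPSHOT_ID NEW_DB_APP" >&2
    return 2
  fi
  run fly postgres create --name "$new_app" --snapshot-id "$snapshot_id"
  # Pointing k0nsult at the new cluster is a separate, coordinated step.
}
```

The volume ID to pass to list_snapshots can be found with fly volumes list -a k0nsult-db.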

Contact Escalation

If automated recovery fails or manual intervention is needed, contact the following in order:

1. 0n40i4 (Tomasz Obara): System Owner, primary contact
2. Roxkon (Konrad Rycerz): Infrastructure, secondary contact

Failure Scenarios

Single machine failure

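Fly.io automatically routes traffic to the remaining machine; no manual action is needed. Check the logs (fly logs -a k0nsult) to find the root cause, and bring the failed machine back so the app is not left running without redundancy.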

Database connection failure

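Step 1: Check the database app's status with fly status -a k0nsult-db. Step 2: Restart Postgres with fly postgres restart -a k0nsult-db. Step 3: If the failure persists, confirm that the connection-string secret is still set with fly secrets list -a k0nsult. flyctl shows only secret names and digests, not values, so re-set the secret if its value may be wrong.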
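Step 3 can be scripted as a presence check on the secrets listing. This Bash sketch assumes the secret is named DATABASE_URL (the default name that fly postgres attach sets) and that fly secrets list -a k0nsult prints a header row followed by one secret per line with the name in the first column. has_secret is a hypothetical helper.

```bash
# Bash: source this file, then pipe `fly secrets list` output into has_secret.
# Assumption: header row first, secret name in column 1 of each following row.

# has_secret NAME: succeed if NAME appears as a secret name on stdin.
has_secret() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}
```

Usage: fly secrets list -a k0nsult | has_secret DATABASE_URL || echo "DATABASE_URL is missing". This only proves the secret exists; a wrong value still requires re-setting it.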

Full application crash

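Run fly deploy -a k0nsult to redeploy from git. If the latest commit is broken, either check out a known-good commit and deploy it, or redeploy a previously built image (fly releases --image -a k0nsult lists the images of earlier releases): fly deploy -a k0nsult --image registry.fly.io/k0nsult:sha-xxxxxxx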
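Redeploying an existing image skips the build, which is usually the fastest way back to a known-good state. A Bash sketch with a hypothetical guard that only accepts images from this app's registry path, to avoid deploying another app's image by mistake; rollback_to and DRY_RUN are illustrative names, and the tag itself must come from an earlier release.

```bash
# Bash: source this file. DRY_RUN=1 prints the deploy command instead of running it.

run() { if [[ "${DRY_RUN:-0}" == "1" ]]; then echo "+ $*"; else "$@"; fi; }

# rollback_to IMAGE: redeploy a previously built image of the k0nsult app.
rollback_to() {
  local image=${1:-}
  # Unquoted right-hand side is a glob: any tag under registry.fly.io/k0nsult is accepted.
  if [[ "$image" != registry.fly.io/k0nsult:* ]]; then
    echo "refusing: expected registry.fly.io/k0nsult:<tag>, got '${image}'" >&2
    return 2
  fi
  run fly deploy -a k0nsult --image "$image"
}
```

Run it with DRY_RUN=1 first to confirm the exact image reference before deploying.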

Data corruption / accidental deletion

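Contact 0n40i4 immediately, then restore from the most recent Fly.io Postgres snapshot taken before the corruption or deletion occurred. Maximum data loss is 24 hours if the latest snapshot is clean; if the problem predates it, an older snapshot is needed and the loss grows accordingly.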
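When corruption is discovered some time after it happened, the newest snapshot may already contain the bad data. A Bash sketch for picking the right one, assuming the snapshot list has been reduced to lines of the form "SNAPSHOT_ID EPOCH_SECONDS" (that conversion is not shown) and the incident time is known as epoch seconds. pick_snapshot is a hypothetical helper.

```bash
# Bash: source this file, then pipe "ID EPOCH" lines into pick_snapshot.

# pick_snapshot INCIDENT_EPOCH: print the ID of the newest snapshot taken
# strictly before the incident; exit 1 if there is none.
pick_snapshot() {
  awk -v t="$1" '
    $2 < t && (id == "" || $2 > best) { best = $2; id = $1 }
    END { if (id == "") exit 1; print id }
  '
}
```

Choosing by incident time rather than "latest" trades extra data loss for a snapshot that is known to be clean.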

DNS / certificate failure

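Fly.io issues and renews certificates automatically via Let's Encrypt. k0nsult.fly.dev is covered by Fly.io's shared *.fly.dev certificate, so a certificate error on that hostname points to a platform issue (check https://status.flyio.net) and fly certs add is not needed for it. For a custom domain, inspect the certificate with fly certs show <hostname> -a k0nsult and verify its DNS records with fly certs check <hostname> -a k0nsult.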
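To see what clients actually receive, independent of flyctl, the served certificate can be checked with the openssl CLI alone. A Bash sketch; served_cert and cert_valid_for are hypothetical helper names, and the 14-day threshold in the usage example is an arbitrary choice.

```bash
# Bash: source this file. Requires only the openssl command-line tool.

# served_cert HOST: print the TLS handshake output, including the certificate, for HOST:443.
served_cert() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null
}

# cert_valid_for DAYS: read a PEM certificate on stdin; succeed if it is
# still valid DAYS days from now (openssl x509 -checkend exits 1 otherwise).
cert_valid_for() {
  openssl x509 -noout -checkend $(( $1 * 86400 )) >/dev/null
}
```

Usage: served_cert k0nsult.fly.dev | cert_valid_for 14 || echo "certificate expires within 14 days or could not be fetched". A connection failure also makes the check fail, so investigate both possibilities.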

DR Testing Schedule