turingpi/TODO-cloudnativepg-migration.md

# TODO: Migrate PostgreSQL to CloudNativePG

**Status:** planned, not started
**Target:** [CloudNativePG](https://cloudnative-pg.io/) (CNPG operator)
**Created:** 2026-05-31

## Why

Both Postgres instances run **Bitnami** images, which Bitnami froze in
Aug 2025 (free images moved to the `bitnamilegacy` archive and stopped
getting updates / security patches):

| Instance | Namespace | Version | Image | How deployed | Consumers |
|---|---|---|---|---|---|
| `pgsql` | `default` | PG **16.2** | `docker.io/bitnami/postgresql:16.2.0-debian-12-r8` (non-legacy, tag effectively gone from the registry) | standalone helm release `pgsql` (chart `postgresql-15.1.2`) | lidarr, radarr, sonarr, prowlarr, alhfmf |
| `gitea-postgresql` | `gitea` | PG **17.6** | `bitnamilegacy/postgresql:17.6.0-debian-12-r4` | **bundled subchart** inside the `gitea` helm release | gitea only |

CloudNativePG is the chosen replacement: actively maintained, operator-managed
HA/backups, first-class `pg_dump`-based import for migration, and not tied to
Bitnami.

> ⚠️ `pgsql` is on the **non-legacy** `bitnami/` path whose `16.2.0` tag is no
> longer pullable — if that pod is ever rescheduled to a node without the image
> cached, it will **fail to start**. Treat instance A as the higher priority.

## Databases to migrate

- **pgsql (PG16):** `alhfmf`, `lidarr_db`, `lidarr_db_log`, `radarr_db`,
  `radarr_db_log`, `sonarr_db`, `sonarr_db_log`, `prowlarr_db`, `prowlarr_db_log`
  (confirm full list at cutover with `\l`).
- **gitea (PG17):** `gitea` (owner role `gitea`).

---

## 0. Prerequisites

- [ ] Install the CNPG operator (pin a recent stable version):
  ```bash
  helm repo add cnpg https://cloudnative-pg.github.io/charts
  helm repo update cnpg
  helm upgrade --install cnpg cnpg/cloudnative-pg \
    -n cnpg-system --create-namespace --wait
  kubectl get deploy -n cnpg-system   # operator Running
  ```
- [ ] Decide on storage: reuse `nfs-client` StorageClass, **or** prefer local
      storage for the DB PVCs (Postgres on NFS is workable but not ideal;
      CNPG defaults to RWO). Pick a `storageClass` + size per cluster below.
- [ ] Decide CNPG topology: homelab can run `instances: 1` (no HA) to keep it
      light on the Pi nodes; bump to 2–3 later if desired.
- [ ] Take a manual backup of both instances first (safety net):
  ```bash
  kubectl exec -n default pgsql-postgresql-0 -- \
    bash -c 'PGPASSWORD=$POSTGRES_PASSWORD pg_dumpall -U postgres' > pgsql-all.sql
  kubectl exec -n gitea gitea-postgresql-0 -- \
    bash -c 'PGPASSWORD=$POSTGRES_PASSWORD pg_dumpall -U postgres' > gitea-all.sql
  ```

---

## Instance A — `pgsql` (media *arr + alhfmf), PG16

CNPG can pull data straight from the old server at bootstrap via
`initdb.import` (it runs `pg_dump`/`pg_restore` for you). Use **monolith** type
to copy all listed databases + roles in one shot.

- [ ] Keep the old `pgsql` release **running** during migration (read source).
- [ ] Create a secret with the old server's superuser creds for the importer:
  ```bash
  # password: kubectl get secret pgsql-postgresql -n default -o jsonpath='{.data.postgres-password}' | base64 -d
  kubectl create secret generic pg16-source-superuser -n default \
    --from-literal=username=postgres --from-literal=password='<OLD_PW>'
  ```
- [ ] Apply a CNPG `Cluster` with import bootstrap (sketch — tune names/size):
  ```yaml
  apiVersion: postgresql.cnpg.io/v1
  kind: Cluster
  metadata:
    name: pg16
    namespace: default
  spec:
    instances: 1
    imageName: ghcr.io/cloudnative-pg/postgresql:16   # match major 16
    storage:
      size: 10Gi
      storageClass: nfs-client            # or local SC — see prereqs
    bootstrap:
      initdb:
        import:
          type: monolith
          databases: ["alhfmf","lidarr_db","lidarr_db_log","radarr_db","radarr_db_log","sonarr_db","sonarr_db_log","prowlarr_db","prowlarr_db_log"]
          roles: ["*"]
          source:
            externalCluster: old-pgsql
    externalClusters:
      - name: old-pgsql
        connectionParameters:
          host: pgsql-postgresql.default.svc.cluster.local
          user: postgres
          dbname: postgres
        password:
          name: pg16-source-superuser
          key: password
  ```
- [ ] Wait for import to finish: `kubectl get cluster pg16 -n default -w`
      (phase → `Cluster in healthy state`). Check logs of the bootstrap job.
- [ ] Verify row counts / `\dt` in a couple of DBs vs. the old server.
- [ ] **Cutover:** the new service is `pg16-rw.default.svc.cluster.local:5432`.
      Update each consumer's DB host:
  - [ ] lidarr  — `custom_helm_charts/lidarr` / `helm-values/lidarr_values.yaml`
  - [ ] radarr  — `helm-values/radarr_values.yaml`
  - [ ] sonarr  — `helm-values/sonarr_values.yaml`
  - [ ] prowlarr — `custom_helm_charts/prowlarr` / `helm-values/prowlarr_values.yml`
  - [ ] alhfmf  — `alhfmf` release (find its DB env/secret)
      (the *arr apps store DB host/creds in their `config.xml`/Postgres env —
      confirm where each one is configured before flipping.)
      Commit + push so Argo redeploys; verify each app reconnects.
- [ ] Soak for a few days.
- [ ] Decommission old: `helm uninstall pgsql -n default`, delete its PVC
      (`data-pgsql-postgresql-0`) and `resources/pgsql_persistent_volume.yml`,
      remove `non_argo_values/pgsql_values.yaml`, drop `pg16-source-superuser`.

---

## Instance B — `gitea` DB, PG17

Trickier because Postgres is a **subchart of the gitea release**, so it must be
externalized from the gitea chart.

- [ ] Stand up a CNPG `Cluster` `pg17` in the `gitea` ns, PG major **17**
      (`imageName: ghcr.io/cloudnative-pg/postgresql:17`), same import approach:
      `databases: ["gitea"]`, `roles: ["*"]`, source =
      `gitea-postgresql.gitea.svc.cluster.local`, superuser secret from
      `kubectl get secret gitea-postgresql -n gitea -o jsonpath='{.data.postgres-password}'`.
- [ ] **Quiesce gitea during final cutover** (scale gitea deploy to 0) so the
      source DB is static, then run/refresh the import (or do a final `pg_dump`
      of `gitea` and restore into `pg17`) to avoid losing writes.
- [ ] Edit `non_argo_values/gitea_values.yaml`:
  - set `postgresql.enabled: false` (drop the bundled subchart)
  - point `gitea.config.database` at the CNPG service:
    ```yaml
    gitea:
      config:
        database:
          DB_TYPE: postgres
          HOST: pg17-rw.gitea.svc.cluster.local:5432
          NAME: gitea
          USER: gitea
          PASSWD: <from CNPG-managed secret pg17-app>
    ```
    (CNPG creates an app user/secret; either reuse the migrated `gitea` role or
    wire gitea to the `pg17-app` secret.)
- [ ] `helm upgrade gitea ...` (Recreate strategy already set). Bring gitea back
      up; verify login + that the GitOps repo (`admin/turingpi.git`) is intact —
      **Argo depends on this**, so validate before moving on.
- [ ] Decommission: once `postgresql.enabled: false` is live, the old
      `gitea-postgresql` StatefulSet is removed by the chart; delete leftover
      PVC `data-gitea-postgresql-0` after confirming the new DB is good.

---

## Rollback

- Old instances stay intact until the explicit decommission steps — to roll
  back, just point the consumer/gitea config back at the original
  `*-postgresql` service. Keep the `pg_dumpall` files until both soaks pass.

## Open questions / decisions

- [ ] NFS vs local storage for DB PVCs (perf + RWO behavior on node loss).
- [ ] HA (`instances: 1` vs `3`) given Pi resource limits.
- [ ] Backups: configure CNPG scheduled backups (Barman to NFS or object store)
      as part of this — replaces whatever (if any) backup the Bitnami charts did.
- [ ] Confirm exact per-app DB connection config for the 5 consumers of `pgsql`
      before cutover (where each *arr stores its Postgres host/creds).