Files
turingpi/TODO-cloudnativepg-migration.md
gilgamezh d6ee993a60 docs: add CloudNativePG migration TODO for postgresql
Plan to move both Bitnami postgres instances (pgsql PG16 in default,
gitea-postgresql PG17 bundled in gitea) to CloudNativePG, since Bitnami
images are frozen (bitnamilegacy). Not executed -- planning doc only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 11:15:54 +02:00

175 lines
7.7 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# TODO: Migrate PostgreSQL to CloudNativePG
**Status:** planned, not started
**Target:** [CloudNativePG](https://cloudnative-pg.io/) (CNPG operator)
**Created:** 2026-05-31
## Why
Both Postgres instances run **Bitnami** images, which Bitnami froze in
Aug 2025 (free images moved to the `bitnamilegacy` archive and stopped
getting updates / security patches):
| Instance | Namespace | Version | Image | How deployed | Consumers |
|---|---|---|---|---|---|
| `pgsql` | `default` | PG **16.2** | `docker.io/bitnami/postgresql:16.2.0-debian-12-r8` (non-legacy, tag effectively gone from the registry) | standalone helm release `pgsql` (chart `postgresql-15.1.2`) | lidarr, radarr, sonarr, prowlarr, alhfmf |
| `gitea-postgresql` | `gitea` | PG **17.6** | `bitnamilegacy/postgresql:17.6.0-debian-12-r4` | **bundled subchart** inside the `gitea` helm release | gitea only |
CloudNativePG is the chosen replacement: actively maintained, operator-managed
HA/backups, first-class `pg_dump`-based import for migration, and not tied to
Bitnami.
> ⚠️ `pgsql` is on the **non-legacy** `bitnami/` path whose `16.2.0` tag is no
> longer pullable — if that pod is ever rescheduled to a node without the image
> cached, it will **fail to start**. Treat instance A as the higher priority.
## Databases to migrate
- **pgsql (PG16):** `alhfmf`, `lidarr_db`, `lidarr_db_log`, `radarr_db`,
`radarr_db_log`, `sonarr_db`, `sonarr_db_log`, `prowlarr_db`, `prowlarr_db_log`
(confirm full list at cutover with `\l`).
- **gitea (PG17):** `gitea` (owner role `gitea`).
---
## 0. Prerequisites
- [ ] Install the CNPG operator (pin a recent stable version):
```bash
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm repo update cnpg
helm upgrade --install cnpg cnpg/cloudnative-pg \
-n cnpg-system --create-namespace --wait
kubectl get deploy -n cnpg-system # operator Running
```
- [ ] Decide on storage: reuse `nfs-client` StorageClass, **or** prefer local
storage for the DB PVCs (Postgres on NFS is workable but not ideal;
CNPG defaults to RWO). Pick a `storageClass` + size per cluster below.
- [ ] Decide CNPG topology: homelab can run `instances: 1` (no HA) to keep it
light on the Pi nodes; bump to 23 later if desired.
- [ ] Take a manual backup of both instances first (safety net):
```bash
kubectl exec -n default pgsql-postgresql-0 -- \
bash -c 'PGPASSWORD=$POSTGRES_PASSWORD pg_dumpall -U postgres' > pgsql-all.sql
kubectl exec -n gitea gitea-postgresql-0 -- \
bash -c 'PGPASSWORD=$POSTGRES_PASSWORD pg_dumpall -U postgres' > gitea-all.sql
```
---
## Instance A — `pgsql` (media *arr + alhfmf), PG16
CNPG can pull data straight from the old server at bootstrap via
`initdb.import` (it runs `pg_dump`/`pg_restore` for you). Use **monolith** type
to copy all listed databases + roles in one shot.
- [ ] Keep the old `pgsql` release **running** during migration (read source).
- [ ] Create a secret with the old server's superuser creds for the importer:
```bash
# password: kubectl get secret pgsql-postgresql -n default -o jsonpath='{.data.postgres-password}' | base64 -d
kubectl create secret generic pg16-source-superuser -n default \
--from-literal=username=postgres --from-literal=password='<OLD_PW>'
```
- [ ] Apply a CNPG `Cluster` with import bootstrap (sketch — tune names/size):
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: pg16
namespace: default
spec:
instances: 1
imageName: ghcr.io/cloudnative-pg/postgresql:16 # match major 16
storage:
size: 10Gi
storageClass: nfs-client # or local SC — see prereqs
bootstrap:
initdb:
import:
type: monolith
databases: ["alhfmf","lidarr_db","lidarr_db_log","radarr_db","radarr_db_log","sonarr_db","sonarr_db_log","prowlarr_db","prowlarr_db_log"]
roles: ["*"]
source:
externalCluster: old-pgsql
externalClusters:
- name: old-pgsql
connectionParameters:
host: pgsql-postgresql.default.svc.cluster.local
user: postgres
dbname: postgres
password:
name: pg16-source-superuser
key: password
```
- [ ] Wait for import to finish: `kubectl get cluster pg16 -n default -w`
(phase → `Cluster in healthy state`). Check logs of the bootstrap job.
- [ ] Verify row counts / `\dt` in a couple of DBs vs. the old server.
- [ ] **Cutover:** the new service is `pg16-rw.default.svc.cluster.local:5432`.
Update each consumer's DB host:
- [ ] lidarr — `custom_helm_charts/lidarr` / `helm-values/lidarr_values.yaml`
- [ ] radarr — `helm-values/radarr_values.yaml`
- [ ] sonarr — `helm-values/sonarr_values.yaml`
- [ ] prowlarr — `custom_helm_charts/prowlarr` / `helm-values/prowlarr_values.yml`
- [ ] alhfmf — `alhfmf` release (find its DB env/secret)
(the *arr apps store DB host/creds in their `config.xml`/Postgres env —
confirm where each one is configured before flipping.)
Commit + push so Argo redeploys; verify each app reconnects.
- [ ] Soak for a few days.
- [ ] Decommission old: `helm uninstall pgsql -n default`, delete its PVC
(`data-pgsql-postgresql-0`) and `resources/pgsql_persistent_volume.yml`,
remove `non_argo_values/pgsql_values.yaml`, drop `pg16-source-superuser`.
---
## Instance B — `gitea` DB, PG17
Trickier because Postgres is a **subchart of the gitea release**, so it must be
externalized from the gitea chart.
- [ ] Stand up a CNPG `Cluster` `pg17` in the `gitea` ns, PG major **17**
(`imageName: ghcr.io/cloudnative-pg/postgresql:17`), same import approach:
`databases: ["gitea"]`, `roles: ["*"]`, source =
`gitea-postgresql.gitea.svc.cluster.local`, superuser secret from
`kubectl get secret gitea-postgresql -n gitea -o jsonpath='{.data.postgres-password}'`.
- [ ] **Quiesce gitea during final cutover** (scale gitea deploy to 0) so the
source DB is static, then run/refresh the import (or do a final `pg_dump`
of `gitea` and restore into `pg17`) to avoid losing writes.
- [ ] Edit `non_argo_values/gitea_values.yaml`:
- set `postgresql.enabled: false` (drop the bundled subchart)
- point `gitea.config.database` at the CNPG service:
```yaml
gitea:
config:
database:
DB_TYPE: postgres
HOST: pg17-rw.gitea.svc.cluster.local:5432
NAME: gitea
USER: gitea
PASSWD: <from CNPG-managed secret pg17-app>
```
(CNPG creates an app user/secret; either reuse the migrated `gitea` role or
wire gitea to the `pg17-app` secret.)
- [ ] `helm upgrade gitea ...` (Recreate strategy already set). Bring gitea back
up; verify login + that the GitOps repo (`admin/turingpi.git`) is intact —
**Argo depends on this**, so validate before moving on.
- [ ] Decommission: once `postgresql.enabled: false` is live, the old
`gitea-postgresql` StatefulSet is removed by the chart; delete leftover
PVC `data-gitea-postgresql-0` after confirming the new DB is good.
---
## Rollback
- Old instances stay intact until the explicit decommission steps — to roll
back, just point the consumer/gitea config back at the original
`*-postgresql` service. Keep the `pg_dumpall` files until both soaks pass.
## Open questions / decisions
- [ ] NFS vs local storage for DB PVCs (perf + RWO behavior on node loss).
- [ ] HA (`instances: 1` vs `3`) given Pi resource limits.
- [ ] Backups: configure CNPG scheduled backups (Barman to NFS or object store)
as part of this — replaces whatever (if any) backup the Bitnami charts did.
- [ ] Confirm exact per-app DB connection config for the 5 consumers of `pgsql`
before cutover (where each *arr stores its Postgres host/creds).