feat(maintainerr): deploy for watched-movie cleanup (Plex -> Radarr)

Rule-based deletion of watched movies from Radarr (with files), driven by Maintainerr. Raw manifests + directory-type Argo Application (no Helm). Config on shared plex-data NFS PVC (subPath configs/maintainerr); Recreate strategy since it uses SQLite on RWX NFS. ClusterIP only, no ingress; access via kubectl/k9s port-forward.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
9 changed files with 345 additions and 2 deletions
@@ -0,0 +1,39 @@
+# turingpi
+
+Home k3s cluster on a Turing Pi (arm64 nodes), GitOps-managed by ArgoCD.
+
+## Deploying changes
+
+- Apps are ArgoCD `Application`s in `applications/*.yaml`, pointing at an **internal Gitea**
+  repo (`gitea-http.gitea.svc.cluster.local`), `targetRevision: HEAD`.
+- The git remote is `gitea` (`git@192.168.222.26:admin/turingpi.git`); working branch is **`master`**.
+- To deploy: **commit and push to `gitea/master`**. Apps have `syncPolicy.automated` with
+  `selfHeal: true`, so direct `kubectl patch`/`edit` is reverted — changes must go through git.
+- Argo polls Gitea every ~3 min. Force a sync with:
+  `kubectl -n argocd annotate application <name> argocd.argoproj.io/refresh=hard --overwrite`
+- Helm values: `helm-values/<app>_values.yaml`. Custom charts: `custom_helm_charts/<app>/`.
+
+## Gotchas
+
+### qbittorrent + gluetun (AirVPN) — DNS / restart loop
+
+AirVPN blocks outbound **DNS-over-TLS (tcp/853)** to force its own resolver. Gluetun's default
+`DOT=on` resolver (127.0.0.1) therefore never gets answers, **all DNS fails**, and the VPN
+startup healthcheck (`lookup cloudflare.com`) times out — gluetun restarts the VPN every ~6s in a
+permanent loop. The pod still shows `2/2 Running` with 0 restarts, so it looks healthy while
+having no usable network.
+
+The gluetun sidecar in `helm-values/qbittorrent_values.yaml` **must** keep:
+
+```yaml
+- name: DOT
+  value: "off"
+- name: DNS_ADDRESS
+  value: "10.128.0.1"   # AirVPN's pushed resolver, reached over the tunnel — no DNS leak
+```
+
+Diagnose: gluetun logs repeat `restarting VPN ... lookup ... i/o timeout`. Confirm with
+`ping 8.8.8.8` (works) and `nslookup x 10.128.0.1` (works) but `curl 1.1.1.1:853` (times out).
+
+Note: the `7e0a38d` "pin gluetun to v3.41.1" commit message falsely claimed v3.41.1 fixed this
+DNS timeout. It did not — don't trust that claim.
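The diagnosis step in the gotcha above (gluetun logs repeating `lookup ... i/o timeout`) can be sketched as a small shell helper. This is only an illustration: the helper name `count_dns_timeouts` is hypothetical, the exact log line layout is assumed from the description above, and the pod and container names in the usage comment are placeholders to fill in.

```bash
#!/usr/bin/env bash
# Count log lines that look like the DNS timeout described in the gotcha.
# Reads log text on stdin and prints the number of matching lines.
count_dns_timeouts() {
  # grep -c exits non-zero when nothing matches; "|| true" keeps the
  # helper usable under "set -e" while still printing 0.
  grep -c 'lookup .*i/o timeout' || true
}

# Usage (needs cluster access; pod and container names are placeholders):
#   kubectl logs <qbittorrent-pod> -c <gluetun-container> --since=5m | count_dns_timeouts
```

A count that keeps growing while the pod shows `2/2 Running` would be consistent with the restart loop; a count of 0 suggests the resolver settings are in effect.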
@@ -0,0 +1,174 @@
+# TODO: Migrate PostgreSQL to CloudNativePG
+
+**Status:** planned, not started
+**Target:** [CloudNativePG](https://cloudnative-pg.io/) (CNPG operator)
+**Created:** 2026-05-31
+
+## Why
+
+Both Postgres instances run **Bitnami** images, which Bitnami froze in
+Aug 2025 (free images moved to the `bitnamilegacy` archive and stopped
+getting updates / security patches):
+
+| Instance | Namespace | Version | Image | How deployed | Consumers |
+|---|---|---|---|---|---|
+| `pgsql` | `default` | PG **16.2** | `docker.io/bitnami/postgresql:16.2.0-debian-12-r8` (non-legacy, tag effectively gone from the registry) | standalone helm release `pgsql` (chart `postgresql-15.1.2`) | lidarr, radarr, sonarr, prowlarr, alhfmf |
+| `gitea-postgresql` | `gitea` | PG **17.6** | `bitnamilegacy/postgresql:17.6.0-debian-12-r4` | **bundled subchart** inside the `gitea` helm release | gitea only |
+
+CloudNativePG is the chosen replacement: actively maintained, operator-managed
+HA/backups, first-class `pg_dump`-based import for migration, and not tied to
+Bitnami.
+
+> ⚠️ `pgsql` is on the **non-legacy** `bitnami/` path whose `16.2.0` tag is no
+> longer pullable — if that pod is ever rescheduled to a node without the image
+> cached, it will **fail to start**. Treat instance A as the higher priority.
+
+## Databases to migrate
+
+- **pgsql (PG16):** `alhfmf`, `lidarr_db`, `lidarr_db_log`, `radarr_db`,
+  `radarr_db_log`, `sonarr_db`, `sonarr_db_log`, `prowlarr_db`, `prowlarr_db_log`
+  (confirm full list at cutover with `\l`).
+- **gitea (PG17):** `gitea` (owner role `gitea`).
+
+---
+
+## 0. Prerequisites
+
+- [ ] Install the CNPG operator (pin a recent stable version):
+  ```bash
+  helm repo add cnpg https://cloudnative-pg.github.io/charts
+  helm repo update cnpg
+  helm upgrade --install cnpg cnpg/cloudnative-pg \
+    -n cnpg-system --create-namespace --wait
+  kubectl get deploy -n cnpg-system   # operator Deployment should show READY 1/1
+  ```
+- [ ] Decide on storage: reuse `nfs-client` StorageClass, **or** prefer local
+      storage for the DB PVCs (Postgres on NFS is workable but not ideal;
+      CNPG defaults to RWO). Pick a `storageClass` + size per cluster below.
+- [ ] Decide CNPG topology: homelab can run `instances: 1` (no HA) to keep it
+      light on the Pi nodes; bump to 2–3 later if desired.
+- [ ] Take a manual backup of both instances first (safety net):
+  ```bash
+  kubectl exec -n default pgsql-postgresql-0 -- \
+    bash -c 'PGPASSWORD=$POSTGRES_PASSWORD pg_dumpall -U postgres' > pgsql-all.sql
+  kubectl exec -n gitea gitea-postgresql-0 -- \
+    bash -c 'PGPASSWORD=$POSTGRES_PASSWORD pg_dumpall -U postgres' > gitea-all.sql
+  ```
+
+---
+
+## Instance A — `pgsql` (media *arr + alhfmf), PG16
+
+CNPG can pull data straight from the old server at bootstrap via
+`initdb.import` (it runs `pg_dump`/`pg_restore` for you). Use **monolith** type
+to copy all listed databases + roles in one shot.
+
+- [ ] Keep the old `pgsql` release **running** during migration (the import reads from it).
+- [ ] Create a secret with the old server's superuser creds for the importer:
+  ```bash
+  # password: kubectl get secret pgsql-postgresql -n default -o jsonpath='{.data.postgres-password}' | base64 -d
+  kubectl create secret generic pg16-source-superuser -n default \
+    --from-literal=username=postgres --from-literal=password='<OLD_PW>'
+  ```
+- [ ] Apply a CNPG `Cluster` with import bootstrap (sketch — tune names/size):
+  ```yaml
+  apiVersion: postgresql.cnpg.io/v1
+  kind: Cluster
+  metadata:
+    name: pg16
+    namespace: default
+  spec:
+    instances: 1
+    imageName: ghcr.io/cloudnative-pg/postgresql:16   # match major 16
+    storage:
+      size: 10Gi
+      storageClass: nfs-client            # or local SC — see prereqs
+    bootstrap:
+      initdb:
+        import:
+          type: monolith
+          databases: ["alhfmf","lidarr_db","lidarr_db_log","radarr_db","radarr_db_log","sonarr_db","sonarr_db_log","prowlarr_db","prowlarr_db_log"]
+          roles: ["*"]
+          source:
+            externalCluster: old-pgsql
+    externalClusters:
+      - name: old-pgsql
+        connectionParameters:
+          host: pgsql-postgresql.default.svc.cluster.local
+          user: postgres
+          dbname: postgres
+        password:
+          name: pg16-source-superuser
+          key: password
+  ```
+- [ ] Wait for import to finish: `kubectl get cluster pg16 -n default -w`
+      (phase → `Cluster in healthy state`). Check logs of the bootstrap job.
+- [ ] Verify row counts / `\dt` in a couple of DBs vs. the old server.
+- [ ] **Cutover:** the new service is `pg16-rw.default.svc.cluster.local:5432`.
+      Update each consumer's DB host:
+  - [ ] lidarr  — `custom_helm_charts/lidarr` / `helm-values/lidarr_values.yaml`
+  - [ ] radarr  — `helm-values/radarr_values.yaml`
+  - [ ] sonarr  — `helm-values/sonarr_values.yaml`
+  - [ ] prowlarr — `custom_helm_charts/prowlarr` / `helm-values/prowlarr_values.yml`
+  - [ ] alhfmf  — `alhfmf` release (find its DB env/secret)
+      (the *arr apps store DB host/creds in their `config.xml`/Postgres env —
+      confirm where each one is configured before flipping.)
+      Commit + push so Argo redeploys; verify each app reconnects.
+- [ ] Soak for a few days.
+- [ ] Decommission old: `helm uninstall pgsql -n default`, delete its PVC
+      (`data-pgsql-postgresql-0`) and `resources/pgsql_persistent_volume.yml`,
+      remove `non_argo_values/pgsql_values.yaml`, drop `pg16-source-superuser`.
+
+---
+
+## Instance B — `gitea` DB, PG17
+
+Trickier because Postgres is a **subchart of the gitea release**, so it must be
+externalized from the gitea chart.
+
+- [ ] Stand up a CNPG `Cluster` `pg17` in the `gitea` ns, PG major **17**
+      (`imageName: ghcr.io/cloudnative-pg/postgresql:17`), same import approach:
+      `databases: ["gitea"]`, `roles: ["*"]`, source =
+      `gitea-postgresql.gitea.svc.cluster.local`, superuser secret from
+      `kubectl get secret gitea-postgresql -n gitea -o jsonpath='{.data.postgres-password}'`.
+- [ ] **Quiesce gitea during final cutover** (scale gitea deploy to 0) so the
+      source DB is static, then run/refresh the import (or do a final `pg_dump`
+      of `gitea` and restore into `pg17`) to avoid losing writes.
+- [ ] Edit `non_argo_values/gitea_values.yaml`:
+  - set `postgresql.enabled: false` (drop the bundled subchart)
+  - point `gitea.config.database` at the CNPG service:
+    ```yaml
+    gitea:
+      config:
+        database:
+          DB_TYPE: postgres
+          HOST: pg17-rw.gitea.svc.cluster.local:5432
+          NAME: gitea
+          USER: gitea
+          PASSWD: <from CNPG-managed secret pg17-app>
+    ```
+    (CNPG creates an app user/secret; either reuse the migrated `gitea` role or
+    wire gitea to the `pg17-app` secret.)
+- [ ] `helm upgrade gitea ...` (Recreate strategy already set). Bring gitea back
+      up; verify login + that the GitOps repo (`admin/turingpi.git`) is intact —
+      **Argo depends on this**, so validate before moving on.
+- [ ] Decommission: once `postgresql.enabled: false` is live, the old
+      `gitea-postgresql` StatefulSet is removed by the chart; delete leftover
+      PVC `data-gitea-postgresql-0` after confirming the new DB is good.
+
+---
+
+## Rollback
+
+- Old instances stay intact until the explicit decommission steps — to roll
+  back, just point the consumer/gitea config back at the original
+  `*-postgresql` service. Keep the `pg_dumpall` files until both soaks pass.
+
+## Open questions / decisions
+
+- [ ] NFS vs local storage for DB PVCs (perf + RWO behavior on node loss).
+- [ ] HA (`instances: 1` vs `3`) given Pi resource limits.
+- [ ] Backups: configure CNPG scheduled backups (Barman to NFS or object store)
+      as part of this — replaces whatever (if any) backup the Bitnami charts did.
+- [ ] Confirm exact per-app DB connection config for the 5 consumers of `pgsql`
+      before cutover (where each *arr stores its Postgres host/creds).
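For the Instance A verification step ("verify row counts / `\dt` ... vs. the old server"), a rough sketch of a per-database comparison might look like the following. Several things here are assumptions rather than facts from the plan: the new CNPG pod is assumed to be named `pg16-1`, the apps are assumed to keep their tables in the `public` schema, `psql` is assumed to be usable in both pods, and the helper names are hypothetical. It compares the number of tables only, not row counts.

```bash
#!/usr/bin/env bash
# SQL that counts tables in the public schema (schema name is an assumption).
COUNT_SQL="select count(*) from information_schema.tables where table_schema = 'public'"

# Table count on the old Bitnami server. The database name and the SQL are
# passed as positional arguments ($0 and $1) to the remote bash, which
# avoids nested quoting.
old_count() {
  kubectl exec -n default pgsql-postgresql-0 -- \
    bash -c 'PGPASSWORD=$POSTGRES_PASSWORD psql -U postgres -d "$0" -Atc "$1"' \
    "$1" "$COUNT_SQL"
}

# Table count on the new CNPG cluster (pod name pg16-1 is an assumption).
new_count() {
  kubectl exec -n default pg16-1 -- psql -d "$1" -Atc "$COUNT_SQL"
}

# Compare one database; prints a one-line verdict, returns 1 on mismatch.
compare_db() {
  old="$(old_count "$1")"
  new="$(new_count "$1")"
  if [ "$old" = "$new" ]; then
    echo "$1: OK ($old tables)"
  else
    echo "$1: MISMATCH old=$old new=$new"
    return 1
  fi
}

# Usage (needs cluster access):
#   for db in radarr_db sonarr_db; do compare_db "$db"; done
```

Matching table counts are a cheap first signal only; spot-checking row counts in a few large tables is still worthwhile before cutover.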
@@ -0,0 +1,23 @@
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: maintainerr
+  namespace: argocd
+spec:
+  project: default
+  source:
+    repoURL: http://gitea-http.gitea.svc.cluster.local:3000/admin/turingpi.git
+    targetRevision: HEAD
+    path: manifests/maintainerr
+    directory:
+      recurse: false
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: default
+  syncPolicy:
+    automated:
+      prune: true
+      selfHeal: true
+    syncOptions:
+    - CreateNamespace=true
+    - ServerSideApply=true
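Once this Application is pushed, Argo picks it up on its next poll of Gitea (roughly every 3 minutes per the deploy notes). A minimal sketch of forcing the refresh sooner, wrapping the annotate command from those notes; the function name `argo_refresh` is hypothetical.

```bash
#!/usr/bin/env bash
# Ask Argo CD to hard-refresh one Application instead of waiting for the poll.
argo_refresh() {
  # Abort with a usage message if no application name was given.
  app="${1:?usage: argo_refresh <application-name>}"
  kubectl -n argocd annotate application "$app" \
    argocd.argoproj.io/refresh=hard --overwrite
}

# Usage (needs cluster access):
#   argo_refresh maintainerr
```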
@@ -1,7 +1,7 @@
 claimToken: "claim-E_NxQDtUMMVsLCBFvybK"
 image:
  repository: linuxserver/plex
-  tag: "1.43.1.10611-1e34174b1-ls305"
+  tag: "1.43.2.10687-563d026ea-ls307"
  pullPolicy: Always
 kubePlex:
  enabled: false # kubePlex (transcoder job) is disabled because not available on ARM. The transcoding will be performed by the main Plex instance instead of a separate Job.
@@ -4,7 +4,7 @@ replicaCount: 1
 qbittorrent:
  image:
    repository: lscr.io/linuxserver/qbittorrent
-    tag: "5.1.4-r3-ls453"
+    tag: "5.2.1_v2.0.12-ls459"
    pullPolicy: Always
  env:
    - name: PUID
@@ -76,6 +76,14 @@ gluetun:
      value: "10.160.17.207/32,fd7d:76ee:e68f:a993:61d7:a5fe:f834:90e1/128"
    - name: SERVER_COUNTRIES
      value: "Netherlands"
+    # AirVPN blocks outbound DNS-over-TLS (tcp/853), so gluetun's default
+    # DoT resolver never gets answers and the startup healthcheck loops
+    # forever on "lookup cloudflare.com: i/o timeout". Use AirVPN's pushed
+    # plaintext resolver instead (reached over the tunnel, no DNS leak).
+    - name: DOT
+      value: "off"
+    - name: DNS_ADDRESS
+      value: "10.128.0.1"
    - name: FIREWALL_INPUT_PORTS
      value: "8080"
    - name: FIREWALL_VPN_INPUT_PORTS
@@ -0,0 +1,63 @@
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: maintainerr
+  namespace: default
+  labels:
+    app: maintainerr
+spec:
+  replicas: 1
+  # SQLite on the shared NFS PVC: never run two writers at once.
+  strategy:
+    type: Recreate
+  selector:
+    matchLabels:
+      app: maintainerr
+  template:
+    metadata:
+      labels:
+        app: maintainerr
+    spec:
+      securityContext:
+        runAsUser: 1000
+        runAsGroup: 1000
+        fsGroup: 1000
+      containers:
+        - name: maintainerr
+          image: ghcr.io/maintainerr/maintainerr:latest
+          imagePullPolicy: Always
+          ports:
+            - name: http
+              containerPort: 6246
+              protocol: TCP
+          env:
+            - name: TZ
+              value: "Europe/Amsterdam"
+          volumeMounts:
+            - name: plex-data
+              mountPath: /opt/data
+              subPath: configs/maintainerr
+          readinessProbe:
+            httpGet:
+              path: /api/app/status
+              port: http
+            initialDelaySeconds: 20
+            periodSeconds: 15
+          livenessProbe:
+            httpGet:
+              path: /api/app/status
+              port: http
+            initialDelaySeconds: 60
+            periodSeconds: 30
+          resources:
+            requests:
+              memory: "256Mi"
+              cpu: "100m"
+            limits:
+              memory: "512Mi"
+              cpu: "1000m"
+      volumes:
+        - name: plex-data
+          persistentVolumeClaim:
+            claimName: plex-data
@@ -0,0 +1,17 @@
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: maintainerr
+  namespace: default
+  labels:
+    app: maintainerr
+spec:
+  type: ClusterIP
+  ports:
+    - port: 6246
+      targetPort: http
+      protocol: TCP
+      name: http
+  selector:
+    app: maintainerr
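Since the Service is ClusterIP with no ingress, the UI is reached through a port-forward, as the commit message says. A small sketch of that, using the Service name, namespace, and port from the manifests above; the function name `maintainerr_forward` and the optional local-port argument are illustrative additions.

```bash
#!/usr/bin/env bash
# Forward a local port to the maintainerr Service (service port 6246).
# The optional first argument picks the local port; it defaults to 6246.
maintainerr_forward() {
  local_port="${1:-6246}"
  kubectl -n default port-forward svc/maintainerr "${local_port}:6246"
}

# Usage (needs cluster access; runs until interrupted):
#   maintainerr_forward        # then open http://localhost:6246
#   maintainerr_forward 8080   # then open http://localhost:8080
```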
@@ -4,6 +4,12 @@
 # Single replica for homelab
 replicaCount: 1

+# Gitea data lives on a single RWO NFS PVC, so two pods can't run at once.
+# The chart default (RollingUpdate maxSurge 100%/maxUnavailable 0) surges a
+# second pod and deadlocks on upgrade -- use Recreate instead.
+strategy:
+  type: Recreate
+
 # Service configuration - LoadBalancer for direct access
 service:
  http:
@@ -0,0 +1,13 @@
+# MetalLB configuration for TuringPi K3s cluster
+#
+# L2 (ARP) mode only: the cluster advertises LoadBalancer IPs via the default
+# IPAddressPool + L2Advertisement (resources/metallb.yml). There are no BGP
+# peers, so FRR is not needed. Disable both the legacy embedded FRR and the
+# newer frr-k8s daemonset -- their images cause DiskPressure on the small Pi
+# nodes and add no value without BGP.
+speaker:
+  frr:
+    enabled: false
+
+frrk8s:
+  enabled: false
Author	SHA1	Message	Date
gilgamezh	79a28a674a	feat(maintainerr): deploy for watched-movie cleanup (Plex -> Radarr) Rule-based deletion of watched movies from Radarr (with files), driven by Maintainerr. Raw manifests + directory-type Argo Application (no Helm). Config on shared plex-data NFS PVC (subPath configs/maintainerr); Recreate strategy since it uses SQLite on RWX NFS. ClusterIP only, no ingress — access via kubectl/k9s port-forward. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 21:34:44 +02:00
gilgamezh	d6ee993a60	docs: add CloudNativePG migration TODO for postgresql Plan to move both Bitnami postgres instances (pgsql PG16 in default, gitea-postgresql PG17 bundled in gitea) to CloudNativePG, since Bitnami images are frozen (bitnamilegacy). Not executed -- planning doc only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 11:15:54 +02:00
gilgamezh	2c68a21d0b	ops(metallb): upgrade 0.13.12 -> 0.16.1, pin native L2 (no FRR) 0.16.1 chart defaults frr.enabled=false but frrk8s.enabled=true, which deploys a heavy frr-k8s daemonset. With no BGP peers (pure L2/ARP), FRR is unnecessary and its images caused DiskPressure on the Pi nodes, evicting a speaker and stalling the rollout. Disable both frr and frrk8s for a single-container L2 speaker. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 11:13:59 +02:00
gilgamezh	1b3f34a432	build: upgrade qbittorrent 5.1.4-r3-ls453 -> 5.2.1_v2.0.12-ls459 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 10:59:16 +02:00
gilgamezh	261aebfd10	ops(gitea): Recreate strategy to avoid RWO upgrade deadlock Bumped gitea helm chart 12.4.0->12.6.0 (app 1.24.6->1.26.1). The chart default RollingUpdate (maxSurge 100%/maxUnavailable 0) surges a second pod that can't mount the single RWO NFS PVC, deadlocking 'helm upgrade --wait'. Recreate avoids it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 10:59:03 +02:00
gilgamezh	9b24978342	docs: add CLAUDE.md (GitOps flow + AirVPN/gluetun DNS gotcha) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 10:20:16 +02:00
gilgamezh	1a91b72464	fix(qbittorrent): use AirVPN plaintext DNS, disable gluetun DoT AirVPN blocks outbound DNS-over-TLS (tcp/853), so gluetun's default DoT resolver at 127.0.0.1 never gets answers. The startup healthcheck's "lookup cloudflare.com" then times out and the VPN restarts every ~6s in a permanent loop, leaving qbittorrent with no working DNS. Verified inside the pod netns: tunnel egress works (ping 8.8.8.8 18ms), AirVPN's pushed resolver 10.128.0.1 resolves fine, but tcp/853 to both 1.1.1.1 and 8.8.8.8 times out. Set DOT=off and DNS_ADDRESS=10.128.0.1 so gluetun points resolv.conf at AirVPN's pushed DNS, reached over the tunnel (no DNS leak, no port 853). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 10:17:11 +02:00
argocd-image-updater	ac637adaf4	build: automatic update of plex updates image linuxserver/plex tag '1.43.1.10611-1e34174b1-ls306' to '1.43.2.10687-563d026ea-ls307'	2026-05-19 19:33:41 +00:00
argocd-image-updater	6082e6fc14	build: automatic update of plex updates image linuxserver/plex tag '1.43.1.10611-1e34174b1-ls305' to '1.43.1.10611-1e34174b1-ls306'	2026-05-18 12:53:50 +00:00