Storage Applications
This document covers the storage-related applications and services running in the cluster.
Storage Stack Overview
graph TD
  subgraph "External Infrastructure"
    CEPH[Proxmox Ceph Cluster]
  end
  subgraph "Kubernetes Storage Layer"
    ROOK[Rook Ceph Operator]
    CSI[Ceph CSI Drivers]
    VOLSYNC[VolSync]
  end
  subgraph "Storage Consumers"
    APPS[Applications]
    BACKUPS[Backup Repositories]
  end
  CEPH --> ROOK
  ROOK --> CSI
  CSI --> APPS
  VOLSYNC --> BACKUPS
  CSI --> VOLSYNC
Core Components
Rook Ceph Operator
Namespace: rook-ceph
Type: Helm Release
Purpose: Manages connection to external Ceph cluster and provides CSI drivers
The Rook operator is the bridge between Kubernetes and the external Ceph cluster. It:
- Manages CSI driver deployments
- Maintains connection to Ceph monitors
- Handles authentication and secrets
- Provides CephFS filesystem access
Configuration: kubernetes/apps/rook-ceph/rook-ceph-operator/app/helmrelease.yaml
Current Setup:
- CephFS Driver: Enabled ✅
- RBD Driver: Disabled (Phase 2)
- Connection Mode: External cluster
- Network: Public network 10.150.0.0/24
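These toggles correspond to Helm values along the following lines in the operator HelmRelease (key names follow the upstream rook-ceph chart; this is an illustrative sketch, not a copy of the actual file):
csi:
  enableCephfsDriver: true
  enableRbdDriver: false  # re-enable in Phase 2
# Monitor endpoints and the public network are supplied by the external
# CephCluster resource and the rook-ceph-mon-endpoints ConfigMap, not by
# the operator chart values.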
Key Resources:
# Check operator status
kubectl -n rook-ceph get pods -l app=rook-ceph-operator
# View operator logs
kubectl -n rook-ceph logs -l app=rook-ceph-operator -f
# Check CephCluster resource
kubectl -n rook-ceph get cephcluster
Rook Ceph Cluster Configuration
Namespace: rook-ceph
Type: CephCluster Custom Resource
Purpose: Defines external Ceph cluster connection
Configuration: kubernetes/apps/rook-ceph/rook-ceph-cluster/app/cluster-external.yaml
This resource tells Rook how to connect to the external Ceph cluster:
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  external:
    enable: true
  dataDirHostPath: /var/lib/rook
  cephVersion:
    image: quay.io/ceph/ceph:v18
Monitor Configuration: Defined in ConfigMap rook-ceph-mon-endpoints
- Contains Ceph monitor IP addresses
- Critical for cluster connectivity
- Automatically referenced by CSI drivers
Authentication: Stored in Secret rook-ceph-mon
- Contains client.kubernetes Ceph credentials
- Encrypted with SOPS
- Referenced by all CSI operations
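To verify both the monitor ConfigMap and the credentials secret described above:
# Show the monitor endpoints Rook is using
kubectl -n rook-ceph get configmap rook-ceph-mon-endpoints -o jsonpath='{.data}'
# Confirm the credentials secret exists (avoid printing its contents)
kubectl -n rook-ceph get secret rook-ceph-mon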
Ceph CSI Drivers
Namespace: rook-ceph
Type: DaemonSet (nodes) + Deployment (provisioner)
Purpose: Enable Kubernetes to mount CephFS volumes
Components:
- csi-cephfsplugin (DaemonSet)
  - Runs on every node
  - Mounts CephFS volumes to pods
  - Handles node-level operations
- csi-cephfsplugin-provisioner (Deployment)
  - Creates/deletes CephFS subvolumes
  - Handles dynamic provisioning
  - Manages volume expansion
Monitoring:
# Check CSI pods
kubectl -n rook-ceph get pods -l app=csi-cephfsplugin
# View CSI driver logs
kubectl -n rook-ceph logs -l app=csi-cephfsplugin -c csi-cephfsplugin
# Check provisioner
kubectl -n rook-ceph get pods -l app=csi-cephfsplugin-provisioner
Storage Classes
Configuration: kubernetes/apps/rook-ceph/rook-ceph-cluster/app/storageclasses.yaml
cephfs-shared (Default)
Primary storage class for all dynamic provisioning:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs-shared
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: cephfs
  pool: cephfs_data
allowVolumeExpansion: true
reclaimPolicy: Delete
Usage: Default for all PVCs without explicit storageClassName
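For example, a PVC like the following (name and namespace are placeholders) lands on cephfs-shared without naming it:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  # storageClassName omitted -> default class cephfs-shared is used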
cephfs-static
For mounting pre-existing CephFS directories:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs-static
provisioner: rook-ceph.cephfs.csi.ceph.com
# Used with manually created PVs pointing to existing paths
Usage: Requires manual PV creation; see the examples below
VolSync
Namespace: storage
Type: Helm Release
Purpose: Backup and recovery for Persistent Volume Claims
VolSync provides automated backup of all stateful applications using Restic.
Configuration: kubernetes/apps/storage/volsync/app/helmrelease.yaml
Backup Repository: CephFS-backed PVC
- Location: volsync-cephfs-pvc (5Ti)
- Path: /repository/{APP}/ for each application
- Previous: NFS on vault.manor (migrated to CephFS)
How It Works:
- Applications create ReplicationSource resources
- VolSync creates backup pods with mover containers
- Mover mounts both application PVC and repository PVC
- Restic backs up data to repository
- Retention policies keep configured snapshot count
Backup Pattern:
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: my-app
  namespace: my-namespace
spec:
  sourcePVC: my-app-data
  trigger:
    schedule: "0 * * * *" # Hourly
  restic:
    repository: my-app-restic-secret
    retain:
      hourly: 24
      daily: 7
      weekly: 4
Common Operations:
# Manual backup trigger
task volsync:snapshot NS=<namespace> APP=<app>
# List snapshots
task volsync:run NS=<namespace> REPO=<app> -- snapshots
# Unlock repository (if locked)
task volsync:unlock-local NS=<namespace> APP=<app>
# Restore to new PVC
task volsync:restore NS=<namespace> APP=<app>
Repository PVC Configuration: kubernetes/apps/storage/volsync/app/volsync-cephfs-pv.yaml
Static PV Examples
Media Storage
Large media library mounted from pre-existing CephFS path:
Location: kubernetes/apps/media/storage/app/media-cephfs-pv.yaml
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: media-cephfs-pv
spec:
  capacity:
    storage: 100Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: cephfs-static
  csi:
    driver: rook-ceph.cephfs.csi.ceph.com
    nodeStageSecretRef:
      name: rook-csi-cephfs-static
      namespace: rook-ceph
    volumeAttributes:
      clusterID: rook-ceph
      fsName: cephfs
      staticVolume: "true"
      rootPath: /truenas/media
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media-cephfs-pvc
  namespace: media
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Ti
  storageClassName: cephfs-static
  volumeName: media-cephfs-pv
Usage: Mounted by Plex, Sonarr, Radarr, etc. for media library access
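Any pod in the media namespace mounts the claim like a normal PVC; a minimal illustrative pod (image, names, and mount path are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: media-reader-example
  namespace: media
spec:
  containers:
    - name: app
      image: busybox  # placeholder image
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: media
          mountPath: /media
  volumes:
    - name: media
      persistentVolumeClaim:
        claimName: media-cephfs-pvc  # shared static claim defined above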
Minio Object Storage
Minio data stored on CephFS:
Location: kubernetes/apps/storage/minio/app/minio-cephfs-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: minio-cephfs-pv
spec:
  capacity:
    storage: 10Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: cephfs-static
  csi:
    driver: rook-ceph.cephfs.csi.ceph.com
    nodeStageSecretRef:
      name: rook-csi-cephfs-static
      namespace: rook-ceph
    volumeAttributes:
      clusterID: rook-ceph
      fsName: cephfs
      staticVolume: "true"
      rootPath: /truenas/minio
Paperless-ngx Document Storage
Document management system storage:
Location: kubernetes/apps/selfhosted/paperless-ngx/app/paperless-cephfs-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: paperless-cephfs-pv
spec:
  capacity:
    storage: 5Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: cephfs-static
  csi:
    driver: rook-ceph.cephfs.csi.ceph.com
    nodeStageSecretRef:
      name: rook-csi-cephfs-static
      namespace: rook-ceph
    volumeAttributes:
      clusterID: rook-ceph
      fsName: cephfs
      staticVolume: "true"
      rootPath: /truenas/paperless
Storage Operations
Creating a New Static PV
Step 1: Create directory in CephFS (on Proxmox Ceph node)
# SSH to a Proxmox node with Ceph access
mkdir -p /mnt/cephfs/truenas/my-app
chmod 777 /mnt/cephfs/truenas/my-app # Or appropriate permissions
Step 2: Create PV manifest
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-app-cephfs-pv
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: cephfs-static
  csi:
    driver: rook-ceph.cephfs.csi.ceph.com
    nodeStageSecretRef:
      name: rook-csi-cephfs-static
      namespace: rook-ceph
    volumeAttributes:
      clusterID: rook-ceph
      fsName: cephfs
      staticVolume: "true"
      rootPath: /truenas/my-app
Step 3: Create PVC manifest
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-cephfs-pvc
  namespace: my-namespace
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Ti
  storageClassName: cephfs-static
  volumeName: my-app-cephfs-pv
Step 4: Apply and verify
kubectl apply -f pv.yaml
kubectl apply -f pvc.yaml
kubectl get pv my-app-cephfs-pv
kubectl get pvc -n my-namespace my-app-cephfs-pvc
Expanding a PVC
CephFS supports online volume expansion:
# Edit PVC to increase size
kubectl patch pvc my-pvc -n my-namespace -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
# Verify expansion
kubectl get pvc -n my-namespace my-pvc -w
Note: Size can only increase, not decrease
Troubleshooting Mount Issues
PVC stuck in Pending:
# Check PVC events
kubectl describe pvc -n <namespace> <pvc-name>
# Check CSI driver logs
kubectl -n rook-ceph logs -l app=csi-cephfsplugin -c csi-cephfsplugin --tail=100
# Verify storage class exists
kubectl get sc cephfs-shared
Pod can't mount volume:
# Check pod events
kubectl describe pod -n <namespace> <pod-name>
# Verify Ceph cluster connectivity
kubectl -n rook-ceph get cephcluster
# Check Ceph health
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
# Verify CephFS is available
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph fs status
Slow I/O performance:
# Check MDS performance
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph fs status
# Check OSD performance
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd perf
# Identify slow operations
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph health detail
Monitoring and Alerts
Key Metrics
Monitor these via Prometheus/Grafana:
- Storage Capacity
  - Ceph cluster utilization
  - Individual PVC usage
  - Growth trends
- Performance
  - CSI operation latency
  - MDS cache hit ratio
  - OSD I/O rates
- Reliability
  - VolSync backup success rate
  - Ceph health status
  - CSI driver availability
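A quick way to sanity-check CSI driver availability without Grafana:
# CSIDriver object registered by the CephFS plugin
kubectl get csidriver rook-ceph.cephfs.csi.ceph.com
# Node plugin and provisioner pods should all be Running
kubectl -n rook-ceph get pods -l app=csi-cephfsplugin -o wide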
Useful Queries
Check all PVCs by size:
kubectl get pvc -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,SIZE:.spec.resources.requests.storage,STORAGECLASS:.spec.storageClassName --sort-by=.spec.resources.requests.storage
Find PVCs using old storage classes:
kubectl get pvc -A -o json | jq -r '.items[] | select(.spec.storageClassName == "nfs-csi" or .spec.storageClassName == "mayastor-etcd-localpv") | "\(.metadata.namespace)/\(.metadata.name) - \(.spec.storageClassName)"'
Check Ceph cluster capacity:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph df
Monitor VolSync backups:
# Check all ReplicationSources
kubectl get replicationsource -A
# Check specific backup status
kubectl get replicationsource -n <namespace> <app> -o jsonpath='{.status.lastSyncTime}'
Backup and Recovery
VolSync Backup Workflow
- Application creates ReplicationSource
- VolSync creates a backup pod on the configured schedule (hourly in the pattern above)
- Restic backs up PVC to repository
- Snapshots retained per retention policy
- Status updated in ReplicationSource
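While a sync is running, the mover appears as a Job in the application namespace (typically named volsync-src-<app>, though the exact naming depends on the VolSync version):
# Watch mover jobs and pods during a sync
kubectl -n <namespace> get jobs,pods | grep volsync
# Inspect the latest sync result on the ReplicationSource
kubectl -n <namespace> describe replicationsource <app>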
Restore Procedures
Restore to original PVC:
# Scale down application
kubectl scale deployment -n <namespace> <app> --replicas=0
# Run restore
task volsync:restore NS=<namespace> APP=<app>
# Scale up application
kubectl scale deployment -n <namespace> <app> --replicas=1
Restore to new PVC:
- Create ReplicationDestination pointing to new PVC
- VolSync will restore data from repository
- Update application to use new PVC
- Verify data integrity
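The ReplicationDestination from step 1 looks roughly like this (names and the target PVC are placeholders; check the VolSync documentation for the full set of restic options):
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: my-app-restore
  namespace: my-namespace
spec:
  trigger:
    manual: restore-once  # run a single restore
  restic:
    repository: my-app-restic-secret  # same repository secret the source uses
    destinationPVC: my-app-data-restored  # pre-created target PVC
    copyMethod: Direct  # write straight into the target PVC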
Disaster Recovery
Complete cluster rebuild:
- Deploy new Kubernetes cluster
- Install Rook with same external Ceph connection
- Recreate storage classes
- Deploy VolSync
- Restore all applications from backups
CephFS corruption:
- Check Ceph health and repair if possible
- If unrecoverable, restore from VolSync backups
- VolSync repository is on CephFS, so ensure repository is intact
- Consider external backup of VolSync repository
Security Considerations
Ceph Authentication
- Client Key: client.kubernetes Ceph user
- Permissions: Limited to CephFS pools only
- Storage: SOPS-encrypted in the rook-ceph-mon secret
- Rotation: Should be rotated periodically
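Before rotating, the client's current capabilities can be reviewed from the toolbox (read-only; changes nothing):
# Show caps granted to the Kubernetes client
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.kubernetes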
PVC Access Control
- Namespace Isolation: PVCs are namespace-scoped
- RBAC: Control who can create/delete PVCs
- Pod Security: Pods must have appropriate security context
- Network Policies: Limit which pods can access storage
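As an illustration of the RBAC point (role and namespace names are placeholders), a namespaced Role that permits PVC management can be bound only to the teams that need it:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pvc-editor
  namespace: my-namespace
rules:
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "create", "delete"]
Bind it with a RoleBinding to the relevant group or ServiceAccount; everyone else keeps read-only access.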
Backup Security
- VolSync Repository: Protected by Kubernetes RBAC
- Restic Encryption: Repository encryption with per-app keys
- Snapshot Access: Controlled via ReplicationSource ownership
Future Enhancements (Phase 2)
RBD Block Storage
When Mayastor hardware is repurposed:
- Enable RBD driver in Rook operator
- Create RBD pools on the Ceph cluster:
  - ssd-db - Critical workloads
  - rook-pvc-pool - General purpose
  - media-bulk - Erasure-coded bulk storage
- Deploy RBD storage classes (see the sketch below)
- Migrate workloads based on performance requirements
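A likely shape for one of those Phase 2 storage classes (parameters follow the standard Ceph CSI RBD options; the pool name matches the list above, everything else is a placeholder and secret references are omitted for brevity):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd-fast
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: rook-pvc-pool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
allowVolumeExpansion: true
reclaimPolicy: Delete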
Planned Improvements
- Ceph dashboard integration
- Advanced monitoring dashboards
- Automated capacity alerts
- Storage QoS policies
- Cross-cluster replication
References
- Rook Operator: kubernetes/apps/rook-ceph/rook-ceph-operator/
- Cluster Config: kubernetes/apps/rook-ceph/rook-ceph-cluster/
- Storage Classes: kubernetes/apps/rook-ceph/rook-ceph-cluster/app/storageclasses.yaml
- VolSync: kubernetes/apps/storage/volsync/
- Architecture: docs/src/architecture/storage.md