Storage Architecture
Storage Overview
The Dapper Cluster uses Rook Ceph as its primary storage solution, providing unified storage for all Kubernetes workloads. The external Ceph cluster runs on Proxmox hosts and is connected to the Kubernetes cluster via Rook's external cluster mode.
```mermaid
graph TD
    subgraph "External Ceph Cluster"
        MON[Ceph Monitors]
        OSD[Ceph OSDs]
        MDS[Ceph MDS - CephFS]
    end
    subgraph "Kubernetes Cluster"
        ROOK[Rook Operator]
        CSI[Ceph CSI Drivers]
        SC[Storage Classes]
    end
    subgraph "Applications"
        APPS[Application Pods]
        PVC[Persistent Volume Claims]
    end
    MON --> ROOK
    ROOK --> CSI
    CSI --> SC
    SC --> PVC
    PVC --> APPS
    CSI --> MDS
    CSI --> OSD
```
Storage Architecture Decision
Why Rook Ceph?
The cluster migrated from OpenEBS Mayastor and various NFS backends to Rook Ceph for several key reasons:
- Unified Storage Platform: Single storage solution for all workload types
- External Cluster Design: Leverages existing Proxmox Ceph cluster infrastructure
- High Performance: Direct Ceph integration without NFS overhead
- Scalability: Native Ceph scalability for growing storage needs
- Feature Rich: Snapshots, cloning, expansion, and advanced storage features
- ReadWriteMany Support: CephFS provides shared filesystem access
- Production Proven: Mature, widely-adopted storage solution
Migration History
- Previous: OpenEBS Mayastor (block storage) + Unraid NFS backends (shared storage)
- Current: Rook Ceph with CephFS and RBD (unified storage platform)
- In Progress: Decommissioning Unraid servers (tower/tower-2) in favor of Ceph
Current Storage Classes
CephFS Shared Storage (Default)
Storage Class: `cephfs-shared`
Primary storage class for all workloads requiring dynamic provisioning.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs-shared
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: cephfs
  pool: cephfs_data
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
```
Characteristics:
- Access Mode: ReadWriteMany (RWX) - multiple pods can read and write simultaneously
- Use Cases:
  - Applications requiring shared storage
  - Media applications
  - Backup repositories (VolSync)
  - Configuration storage
  - General application storage
- Performance: Good for most workloads, with some shared-filesystem overhead
- Default: Yes - all PVCs without an explicit storageClassName use this class
CephFS Static Storage
Storage Class: `cephfs-static`
Used for pre-existing CephFS paths that need to be mounted into Kubernetes.
Characteristics:
- Access Mode: ReadWriteMany (RWX)
- Use Cases:
  - Mounting existing data directories (e.g., `/truenas/*` paths)
  - Large media libraries
  - Shared configuration repositories
  - Data migration scenarios
- Provisioning: Manual - requires creating both PV and PVC
- Pattern: See "Static PV Pattern" section below
Example: Media storage at `/truenas/media`
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: media-cephfs-pv
spec:
  capacity:
    storage: 100Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: cephfs-static
  csi:
    driver: rook-ceph.cephfs.csi.ceph.com
    volumeHandle: media-cephfs-pv # required by the CSI PV spec; any cluster-unique string works
    nodeStageSecretRef:
      name: rook-csi-cephfs-static
      namespace: rook-ceph
    volumeAttributes:
      clusterID: rook-ceph
      fsName: cephfs
      staticVolume: "true"
      rootPath: /truenas/media
```
RBD Block Storage
Storage Classes: `ceph-rbd`, `ceph-bulk`
High-performance block storage using Ceph RADOS Block Devices.
Characteristics:
- Access Mode: ReadWriteOnce (RWO) - Single pod exclusive access
- Performance: Superior to CephFS for block workloads (databases, etc.)
- Thin Provisioning: Efficient storage allocation
- Features: Snapshots, clones, fast resizing
Use Cases:
- PostgreSQL and other databases
- Stateful applications requiring block storage
- Applications needing high IOPS
- Workloads migrating from OpenEBS Mayastor
Storage Classes:
- `ceph-rbd`: General-purpose RBD storage
- `ceph-bulk`: Erasure-coded pool for large, less-critical data
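As a point of reference, a PVC for a database requesting RBD-backed block storage could look like the sketch below; the name, namespace, and size are illustrative placeholders rather than values from this cluster:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data   # hypothetical database volume
  namespace: database   # hypothetical namespace
spec:
  accessModes:
    - ReadWriteOnce     # RBD volumes are single-writer block devices
  storageClassName: ceph-rbd # explicit class; omitting it would fall back to cephfs-shared
  resources:
    requests:
      storage: 20Gi
```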
Legacy Unraid NFS Storage (Being Decommissioned)
Storage Class: `used-nfs` (no storage class for the static tower/tower-2 PVs)
Legacy NFS storage from Unraid servers, currently being migrated to Ceph.
Servers:
- `tower.manor` - Primary Unraid server (100Ti NFS) - decommissioning
- `tower-2.manor` - Secondary Unraid server (100Ti NFS) - decommissioning
Current Status:
- Some media applications still use a hybrid approach during migration
- Active data migration to CephFS is in progress
- The Unraid servers will be fully retired once the migration is complete
Migration Plan: All workloads are being moved to Ceph (CephFS or RBD as appropriate)
Storage Provisioning Patterns
Dynamic Provisioning (Default)
For most applications, simply create a PVC and Kubernetes will automatically provision storage:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
  namespace: my-namespace
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  # No storageClassName specified = uses default (cephfs-shared)
```
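To make the wiring explicit, a workload then references the claim by name in its pod spec. The Deployment below is a hypothetical example, not an actual manifest from this repository:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: ghcr.io/example/my-app:latest # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data                 # where the CephFS volume appears in the container
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: my-app-data             # the PVC created above
```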
Static PV Pattern
For mounting pre-existing CephFS paths:
Step 1: Create PersistentVolume
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-static-pv
spec:
  capacity:
    storage: 5Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: cephfs-static
  csi:
    driver: rook-ceph.cephfs.csi.ceph.com
    volumeHandle: my-static-pv # required by the CSI PV spec; any cluster-unique string works
    nodeStageSecretRef:
      name: rook-csi-cephfs-static
      namespace: rook-ceph
    volumeAttributes:
      clusterID: rook-ceph
      fsName: cephfs
      staticVolume: "true"
      rootPath: /truenas/my-data # Pre-existing path in CephFS
```
Step 2: Create matching PersistentVolumeClaim
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-static-pvc
  namespace: my-namespace
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Ti
  storageClassName: cephfs-static
  volumeName: my-static-pv
```
Current Static PVs in Use:
- `media-cephfs-pv` → `/truenas/media` (100Ti)
- `minio-cephfs-pv` → `/truenas/minio` (10Ti)
- `paperless-cephfs-pv` → `/truenas/paperless` (5Ti)
Storage Decision Matrix
| Workload Type | Storage Class | Access Mode | Rationale |
|---|---|---|---|
| Databases (PostgreSQL, etc.) | `ceph-rbd` | RWO | Best performance for block storage workloads |
| Media Libraries | `cephfs-static` or `cephfs-shared` | RWX | Shared access for media servers |
| Media Downloads | `cephfs-shared` | RWX | Multi-pod write access |
| Application Data (single pod) | `ceph-rbd` | RWO | High-performance block storage |
| Application Data (multi-pod) | `cephfs-shared` | RWX | Concurrent access required |
| Backup Repositories | `cephfs-shared` | RWX | VolSync requires RWX |
| Shared Config | `cephfs-shared` | RWX | Multiple pods need access |
| Bulk Storage | `ceph-bulk` or `cephfs-static` | RWO/RWX | Large datasets, erasure coding |
| Legacy Apps (during migration) | `used-nfs` | RWX | Temporary until Unraid decommissioning is complete |
Backup Strategy
VolSync with CephFS
All persistent data is backed up using VolSync, which now uses CephFS for its repository storage:
- Backup Frequency: Hourly snapshots via ReplicationSource
- Repository Storage: CephFS PVC (migrated from NFS)
- Backend: Restic repositories on CephFS
- Retention: Configurable per-application
- Recovery: Supports restore to same or different PVC
VolSync Repository Location: `/repository/{APP}` on CephFS
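For illustration, an hourly ReplicationSource for a hypothetical application `my-app` might look roughly like the sketch below; the schedule, retention values, and the `my-app-volsync-secret` Secret (holding the restic repository settings) are assumptions rather than values taken from this cluster:

```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: my-app
  namespace: my-namespace
spec:
  sourcePVC: my-app-data              # PVC to back up
  trigger:
    schedule: "0 * * * *"             # hourly snapshots
  restic:
    repository: my-app-volsync-secret # Secret with the restic repository and password
    copyMethod: Snapshot              # back up from a point-in-time copy of the source PVC
    pruneIntervalDays: 7
    retain:
      hourly: 24
      daily: 7
    storageClassName: cephfs-shared   # class used for the temporary point-in-time copy
    accessModes:
      - ReadWriteMany
```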
Network Configuration
Ceph Networks
The external Ceph cluster uses two networks:
- Public Network: `10.150.0.0/24`
  - Client connections from Kubernetes
  - Ceph monitor communication
  - Used by CSI drivers
- Cluster Network: `10.200.0.0/24`
  - OSD-to-OSD replication
  - Not directly accessed by Kubernetes
Connection Method
Kubernetes connects to Ceph via:
- Rook Operator: Manages connection to external cluster
- CSI Drivers: `rook-ceph.cephfs.csi.ceph.com` for CephFS volumes and `rook-ceph.rbd.csi.ceph.com` for RBD volumes
- Mon Endpoints: ConfigMap with Ceph monitor addresses
- Authentication: Ceph client.kubernetes credentials
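A minimal sketch of what an external-mode CephCluster resource looks like is shown below, assuming Rook's standard external-cluster fields; the actual manifest under `kubernetes/apps/rook-ceph` may differ:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  external:
    enable: true                  # consume the existing Proxmox-hosted Ceph cluster
  dataDirHostPath: /var/lib/rook  # where mon endpoints and keyrings are cached on each node
  crashCollector:
    disable: true                 # no local Ceph daemons to collect crashes from
```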
Performance Characteristics
CephFS Performance
- Sequential Read: Excellent (limited by network, ~10 Gbps)
- Sequential Write: Very Good (COW overhead, CRUSH rebalancing)
- Random I/O: Good (shared filesystem overhead)
- Concurrent Access: Excellent (native RWX support)
- Metadata Operations: Good (dedicated MDS servers)
Optimization Tips
- Use RWO when possible: Even on CephFS, specify RWO if no sharing is needed (see the sketch after this list)
- Size appropriately: CephFS handles small and large files well
- Monitor MDS health: CephFS performance depends on MDS responsiveness
- Enable client caching: Default CSI settings enable attribute caching
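As a sketch of the first tip, a single-writer volume can still use CephFS while documenting its access intent; the names below are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-config        # hypothetical single-writer volume
  namespace: my-namespace
spec:
  accessModes:
    - ReadWriteOnce       # only one pod mounts this, even though CephFS supports RWX
  storageClassName: cephfs-shared
  resources:
    requests:
      storage: 1Gi
```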
Storage Operations
Common Operations
Expand a PVC:
```bash
kubectl patch pvc my-pvc -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
```
Check storage usage:
```bash
kubectl get pvc -A
kubectl exec -it <pod> -- df -h
```
Monitor Ceph cluster health:
```bash
kubectl -n rook-ceph get cephcluster
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
```
List CephFS mounts:
```bash
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph fs status
```
Troubleshooting
PVC stuck in Pending:
```bash
kubectl describe pvc <pvc-name>
kubectl -n rook-ceph logs -l app=rook-ceph-operator
```
Slow performance:
```bash
# Check Ceph cluster health
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph health detail

# Check MDS status
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph fs status

# Check OSD performance
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd perf
```
Mount issues:
```bash
# Check CSI driver logs
kubectl -n rook-ceph logs -l app=csi-cephfsplugin

# Verify connection to monitors
kubectl -n rook-ceph get configmap rook-ceph-mon-endpoints -o yaml
```
Current Migration Status
Completed
- ✅ RBD storage classes implemented and available
- ✅ CephFS as default storage class
- ✅ VolSync migrated to CephFS backend
- ✅ Static PV pattern established for existing data
- ✅ Migrated from OpenEBS Mayastor to Ceph RBD
In Progress
- 🔄 Decommissioning Unraid NFS servers (tower/tower-2)
- 🔄 Migrating remaining media workloads from NFS to CephFS
- 🔄 Consolidating all storage onto Ceph platform
Future Enhancements
- 📋 Additional RBD pool with SSD backing for critical workloads
- 📋 Erasure coding optimization for bulk media storage
- 📋 Advanced snapshot scheduling and retention policies
- 📋 Ceph performance tuning and optimization
Best Practices
Storage Selection
- Databases and single-pod apps: Use `ceph-rbd` for best performance
- Shared storage needs: Use `cephfs-shared` for RWX access
- Use static PVs for existing data: Don't duplicate large datasets
- Specify requests accurately: Helps with capacity planning
- Choose appropriate access modes: RWO for RBD, RWX for CephFS
Capacity Planning
- Monitor Ceph cluster capacity: `kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph df`
- Set appropriate PVC sizes: CephFS supports expansion
- Plan for growth: Ceph cluster can scale by adding OSDs
- Regular capacity reviews: Check usage trends
Data Protection
- Enable VolSync: For all stateful applications
- Test restores regularly: Ensure backup viability
- Monitor backup success: Check ReplicationSource status
- Retain snapshots appropriately: Balance storage cost vs recovery needs
Security
- Use namespace isolation: PVCs are namespace-scoped
- Limit access with RBAC: Control who can create PVCs
- Monitor access patterns: Unusual I/O may indicate issues
- Rotate Ceph credentials: Periodically update client keys
Monitoring and Observability
Key Metrics
Monitor these metrics via Prometheus/Grafana:
- Ceph cluster health status
- OSD utilization and performance
- MDS cache hit rates
- PVC capacity usage
- CSI operation latencies
- VolSync backup success rates
Alerts
Critical alerts configured:
- Ceph cluster health warnings
- High OSD utilization (>80%)
- MDS performance degradation
- PVC approaching capacity
- VolSync backup failures
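As an illustration only, alerts along these lines could be expressed as a PrometheusRule; the rule names, namespace, and thresholds below are assumptions (the actual alerts may come from the Rook/Ceph chart defaults):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: storage-alerts      # hypothetical rule name
  namespace: observability  # hypothetical namespace
spec:
  groups:
    - name: storage
      rules:
        - alert: CephClusterWarning
          expr: ceph_health_status == 1   # ceph-mgr exporter: 0=OK, 1=WARN, 2=ERR
          for: 15m
          labels:
            severity: warning
        - alert: PVCNearlyFull
          expr: kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.9
          for: 30m
          labels:
            severity: warning
```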
References
- Rook Documentation: rook.io/docs
- Ceph Documentation: docs.ceph.com
- Local Setup: `kubernetes/apps/rook-ceph/README.md`
- Storage Classes: `kubernetes/apps/rook-ceph/rook-ceph-cluster/app/storageclasses.yaml`