One of the most common maintenance tasks in Ceph is to replace the disk of an OSD. If a disk is already in a failed state, you can go ahead and run through the steps in Destroy OSDs. Ceph will recreate the copies of the data that were held by that OSD on the remaining OSDs if possible. This rebalancing will start as soon as an OSD failure is detected or an OSD is actively stopped.
Note: With the default size/min_size (3/2) of a pool, recovery only starts when ‘size + 1’ nodes are available. The reason for this is that the Ceph object balancer CRUSH defaults to a full node as ‘failure domain’.
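As a general example, you can follow the recovery and rebalancing progress and verify the replication settings of a pool with the Ceph CLI; replace <pool_name> with the name of one of your pools:
ceph -s
ceph osd pool get <pool_name> size
ceph osd pool get <pool_name> min_size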
To replace a functioning disk from the GUI, go through the steps in Destroy OSDs. The only addition is to wait until the cluster shows HEALTH_OK before stopping the OSD to destroy it.
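For example, you can confirm the cluster state on the command line with:
ceph health
Wait until this reports HEALTH_OK before continuing.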
On the command line, use the following commands:
ceph osd out osd.<id>
You can check whether the OSD can be safely removed with the following command:
ceph osd safe-to-destroy osd.<id>
Once the above check tells you that it is safe to remove the OSD, you can continue with the following commands:
sed -i 's/reef/squid/' /etc/apt/sources.list.d/ceph.list
ceph osd set noout
apt update
apt full-upgrade -y
After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.
systemctl restart ceph-mon.target
Restart the manager daemons on all nodes:
systemctl restart ceph-mgr.target
Restart the OSD daemons on all nodes:
systemctl restart ceph-osd.target
Disallow pre-Squid OSDs and enable all new Squid-only functionality:
ceph osd require-osd-release squid
Upgrade all CephFS MDS daemons:
Disable standby_replay
ceph fs set <fs_name> allow_standby_replay false
Reduce the number of ranks to 1 (if you plan to restore it later, first take notes of the original number of MDS daemons). This is only necessary if you use more than one MDS per CephFS:
ceph status
ceph fs get <fs_name> | grep max_mds
ceph fs set <fs_name> max_mds 1
With a rank higher than 1, you will see more than one MDS active for that CephFS.
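For example, the active and standby MDS daemons of a file system, including their ranks, can also be inspected with:
ceph fs status <fs_name>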
Wait for the cluster to deactivate any non-zero ranks by periodically checking the status of Ceph:
ceph status
Take all standby MDS daemons offline on the appropriate hosts with:
systemctl stop ceph-mds.target
Confirm that only one MDS is online and is on rank 0 for your FS:
ceph status
Upgrade the last remaining MDS daemon by restarting the daemon:
systemctl restart ceph-mds.target
Restart all standby MDS daemons that were taken offline:
systemctl start ceph-mds.target
Restore the original value of max_mds for the volume:
ceph fs set <fs_name> max_mds <original_max_mds>
Unset the 'noout' flag
Once the upgrade process is finished, don't forget to unset the noout flag.
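This can be done with:
ceph osd unset noout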