One of the most common maintenance tasks in Ceph is to replace the disk of an OSD. If a disk is already in a failed state, you can go ahead and run through the steps in Destroy OSDs. Ceph will recreate the copies of the data that were held by that OSD on the remaining OSDs if possible. This rebalancing will start as soon as an OSD failure is detected or an OSD is actively stopped.
Note: With the default size/min_size (3/2) of a pool, recovery only starts when ‘size + 1’ nodes are available. The reason for this is that the Ceph object balancer CRUSH defaults to a full node as ‘failure domain’.
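As a general example, you can follow the recovery and rebalancing progress and verify the replication settings of a pool with the Ceph CLI; replace <pool_name> with the name of one of your pools:
ceph -s
ceph osd pool get <pool_name> size
ceph osd pool get <pool_name> min_size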
To replace a functioning disk from the GUI, go through the steps in Destroy OSDs. The only addition is to wait until the cluster shows HEALTH_OK before stopping the OSD to destroy it.
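For example, you can confirm the cluster state on the command line with:
ceph health
Wait until this reports HEALTH_OK before continuing.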
On the command line, use the following commands:
ceph osd out osd.<id>
You can check whether the OSD can be safely removed with the following command:
ceph osd safe-to-destroy osd.<id>
Once the above check tells you that it is safe to remove the OSD, you can continue with the following commands:
sed -i 's/reef/squid/' /etc/apt/sources.list.d/ceph.list
ceph osd set noout
apt update
apt full-upgrade -y
After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.
systemctl restart ceph-mon.target
Restart the manager daemons on all nodes:
systemctl restart ceph-mgr.target
Restart the OSD daemons on all nodes:
systemctl restart ceph-osd.target
Disallow pre-Squid OSDs and enable all new Squid-only functionality:
ceph osd require-osd-release squid
Upgrade all CephFS MDS daemons:
Disable standby_replay
ceph fs set <fs_name> allow_standby_replay false
Reduce the number of ranks to 1 (if you plan to restore it later, first take notes of the original number of MDS daemons). This is only necessary if you use more than one MDS per CephFS:
ceph status
ceph fs get <fs_name> | grep max_mds
ceph fs set <fs_name> max_mds 1
With a rank higher than 1, you will see more than one MDS active for that CephFS.
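For example, the active and standby MDS daemons of a file system, including their ranks, can also be inspected with:
ceph fs status <fs_name>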
Wait for the cluster to deactivate any non-zero ranks by periodically checking the status of Ceph:
ceph status
Take all standby MDS daemons offline on the appropriate hosts with:
systemctl stop ceph-mds.target
Confirm that only one MDS is online and is on rank 0 for your FS:
ceph status
Upgrade the last remaining MDS daemon by restarting the daemon:
systemctl restart ceph-mds.target
Restart all standby MDS daemons that were taken offline:
systemctl start ceph-mds.target
Restore the original value of max_mds for the volume:
ceph fs set <fs_name> max_mds <original_max_mds>
Unset the 'noout' flag
Once the upgrade process is finished, don't forget to unset the noout flag.
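This can be done with:
ceph osd unset noout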