Cinder/Specs/NVMEMDHealingAgent

init
Parameters: host_uuid, host_nqn

Get the host UUID and NQN, then schedule the main method to run every X seconds.
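A minimal sketch of the init/scheduling step, assuming a plain `threading.Timer` re-arm loop (a production agent would more likely use its service framework's periodic-task support; the class and method names here are illustrative, not part of the spec):

```python
import threading


class HealingAgent:
    """Holds the host identity and runs main() every `interval` seconds."""

    def __init__(self, host_uuid, host_nqn, interval=30):
        self.host_uuid = host_uuid
        self.host_nqn = host_nqn
        self.interval = interval
        self._timer = None

    def start(self):
        # Run one iteration, then re-arm the timer for the next one.
        self.main()
        self._timer = threading.Timer(self.interval, self.start)
        self._timer.daemon = True
        self._timer.start()

    def stop(self):
        if self._timer is not None:
            self._timer.cancel()

    def main(self):
        # hostprobe / monitor_host / self_healing sub-methods run here,
        # as described in the sections of this spec.
        pass
```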

main method
This will be scheduled to run every X seconds on the connecting host, with the following sub-methods:

hostprobe
Call the storage provisioner's /hostprobe API with the stored info: host_nqn, host_uuid, host_name, client_type, duration, version
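The hostprobe call could be sketched as below. The payload field names mirror the spec; the exact wire format, endpoint URL layout, and defaults (`client_type`, `duration`, the version string) are assumptions for illustration:

```python
import json
import urllib.request

AGENT_VERSION = "1.0"  # illustrative version string


def build_hostprobe_payload(host_nqn, host_uuid, host_name,
                            client_type="agent", duration=60,
                            version=AGENT_VERSION):
    """Assemble the /hostprobe request body from the stored host info."""
    return {
        "host_nqn": host_nqn,
        "host_uuid": host_uuid,
        "host_name": host_name,
        "client_type": client_type,
        "duration": duration,
        "version": version,
    }


def hostprobe(provisioner_url, payload):
    """POST the payload to the provisioner's /hostprobe endpoint."""
    req = urllib.request.Request(
        provisioner_url.rstrip("/") + "/hostprobe",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```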

monitor_host
1. Query the storage provisioner for metadata on all volumes belonging to this host (by UUID)
2. Inspect all KS volume NVMe connections / hook into their events
3. Inspect every KS replicated volume's host MD for its legs' states

Call the self_healing method (spec below) with the provisioner metadata plus the inspected host volume device info:
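The monitor_host flow above could be orchestrated roughly as follows. The `provisioner` object and the two inspection callables are placeholders for whatever the real agent uses to query the provisioner, the NVMe connections, and the MD state:

```python
def monitor_host(host_uuid, provisioner, inspect_connections, inspect_md):
    """Pull per-volume metadata for this host, inspect the local NVMe
    connections and MD leg states, and pair both views for self_healing.

    All names here are illustrative sketches, not a defined API.
    """
    pairs = []
    for meta in provisioner.list_volumes(host_uuid=host_uuid):
        observed = {
            "connections": inspect_connections(meta["id"]),
            "legs": inspect_md(meta["id"]),
        }
        # self_healing would be called with (meta, observed) here.
        pairs.append((meta, observed))
    return pairs
```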

self_healing
If the storage provisioner metadata shows a different set of legs for the volume than what was inspected on the host, reconcile the volume's MD state:
1. Connect to the targets of new replicas if not already connected
2. Remove replica legs from the MD that the provisioner says are no longer part of the volume
3. Re-assemble the MD with the provisioner's replica info for the volume
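The three reconciliation steps can be sketched as a pure set comparison that emits the actions to perform (replica identifiers and action names are illustrative):

```python
def reconcile_legs(provisioner_legs, host_legs):
    """Compare the provisioner's replica set with the legs seen in the
    host MD and return the ordered actions needed to reconcile them.

    provisioner_legs / host_legs: sets of replica identifiers (e.g. NQNs).
    """
    to_connect = provisioner_legs - host_legs  # new replicas: connect to targets
    to_remove = host_legs - provisioner_legs   # stale legs: drop from the MD
    actions = []
    for leg in sorted(to_connect):
        actions.append(("connect", leg))
    for leg in sorted(to_remove):
        actions.append(("remove", leg))
    if actions:
        # Finally, re-assemble the MD from the provisioner's replica list.
        actions.append(("reassemble", tuple(sorted(provisioner_legs))))
    return actions
```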

Active self healing:
If the host MD shows one of its legs as failed, but metadata from the storage provisioner says it is supposed to be available, report the failed/missing leg to the provisioner (and vice versa for an available leg that the provisioner says is supposed to be failed/missing).
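Both directions of that check reduce to finding legs whose host-observed state disagrees with the provisioner's view; a small sketch (state labels and the dict-based interface are assumptions):

```python
def find_state_mismatches(host_states, prov_states):
    """host_states / prov_states: dicts mapping leg id -> "available"
    or "failed". Returns (leg, host_state) pairs to report back to the
    provisioner, covering both the failed-but-expected-available case
    and its inverse.
    """
    mismatches = []
    for leg in host_states.keys() & prov_states.keys():
        if host_states[leg] != prov_states[leg]:
            mismatches.append((leg, host_states[leg]))
    return sorted(mismatches)
```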

If the volume has maxDownTime > 0, the provisioner reports a leg as missing for more than maxDownTime, and the volume is not being migrated, try to replace the leg:
1. Call the provisioner's add_replica (with the node's host UUID / topology)
2. Publish the replica and connect to it
3. If successful, call the provisioner's delete_replica for the missing leg
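The guard conditions and the replace sequence could look like this sketch, where `provisioner` is any object exposing add_replica/delete_replica; the field names, the connect step stub, and the call signatures are all illustrative assumptions:

```python
def maybe_replace_leg(volume, leg, now, provisioner):
    """Replace `leg` if it has been missing longer than the volume's
    maxDownTime and the volume is not being migrated.

    volume: dict with "maxDownTime", optional "migrating", "id", "host_uuid".
    leg: dict with "missing_since" (same clock as `now`) and "id".
    Returns True if a replacement was performed.
    """
    if volume["maxDownTime"] <= 0:
        return False
    if volume.get("migrating"):
        return False
    if now - leg["missing_since"] <= volume["maxDownTime"]:
        return False
    # 1. create a replacement replica placed for this host's topology
    new_replica = provisioner.add_replica(
        volume["id"], host_uuid=volume["host_uuid"])
    # 2. publish the replica and connect to it (stubbed in this sketch)
    # connect_to_target(new_replica)
    # 3. drop the missing leg only after the replacement succeeded
    provisioner.delete_replica(volume["id"], leg["id"])
    return True
```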

monitor_host also reports to the provisioner any of the events below when detected:

Target connect / disconnect
Replicated volume degraded / healed
Replicated volume started / finished sync
NVMe session established / closed

(This is for monitoring/telemetry purposes)
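The event set above could be modeled as an enum with a small helper that queues records for delivery; the enum values, record layout, and queue interface are illustrative, since the spec does not define a wire encoding:

```python
from enum import Enum


class HostEvent(Enum):
    """Events monitor_host reports to the provisioner for telemetry."""
    TARGET_CONNECTED = "target_connected"
    TARGET_DISCONNECTED = "target_disconnected"
    VOLUME_DEGRADED = "volume_degraded"
    VOLUME_HEALED = "volume_healed"
    SYNC_STARTED = "sync_started"
    SYNC_FINISHED = "sync_finished"
    NVME_SESSION_ESTABLISHED = "nvme_session_established"
    NVME_SESSION_CLOSED = "nvme_session_closed"


def report_event(queue, event, volume_id=None):
    """Append an event record to `queue` for later delivery to the
    provisioner, and return the record."""
    record = {"event": event.value, "volume_id": volume_id}
    queue.append(record)
    return record
```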