Fencing Instances of an Unreachable Host

Abstract

When an OpenStack controller determines that the connection to a physical host is broken, it can restart some or all of that host's instances on other hosts. The new instance (the instance restarted on another host) takes over the identity of the obsolete instance (the instance on the unreachable host), so it has the same attached volumes and the same IP and MAC addresses. OpenStack supports this remote restart operation through the Nova API command "evacuate" (referred to in this document as remote restart).

Note that a remote restart may be performed whenever the OpenStack controller decides that the host's connectivity is broken. This decision implies neither that the host is disconnected from its entire environment nor that it is disconnected permanently. When the perceived disconnection is due to a transient or partial failure, the remote restart might lead to two identical instances running at the same time, in dangerous conflict. For example, the obsolete instance may access the application storage and corrupt data, create an IP address conflict, or communicate with other nodes in a way that disrupts the new instance's communications or creates inconsistent state.
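The conflict can be illustrated with a small toy model. This is a minimal sketch, not Nova code: the classes `Identity` and `Host` and the functions `naive_remote_restart` and `conflicting_identities` are hypothetical. The controller restarts an instance on another host without fencing, while the original host, which is only cut off from the controller, keeps its copy running.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Identity:
    """What a remote-restarted instance inherits (hypothetical model)."""
    ip: str
    mac: str
    volumes: Tuple[str, ...]


@dataclass
class Host:
    name: str
    running: Dict[str, Identity] = field(default_factory=dict)  # instance name -> identity
    reachable_from_controller: bool = True  # the controller's view, not the truth


def naive_remote_restart(name: str, failed: Host, target: Host) -> Identity:
    """Restart `name` on `target` with the same identity, without fencing `failed`."""
    if failed.reachable_from_controller:
        raise ValueError(f"{failed.name} still looks healthy to the controller")
    identity = failed.running[name]
    # Nothing stops the copy on `failed`: it keeps running alongside the new one.
    target.running[name] = identity
    return identity


def conflicting_identities(hosts: Iterable[Host]) -> Set[Identity]:
    """Return the identities that are live on more than one host."""
    owners: Dict[Identity, Set[str]] = {}
    for host in hosts:
        for identity in host.running.values():
            owners.setdefault(identity, set()).add(host.name)
    return {identity for identity, names in owners.items() if len(names) > 1}


# A partial failure: the controller loses host-a, but host-a is still alive.
vm = Identity(ip="10.0.0.5", mac="fa:16:3e:00:00:01", volumes=("vol-1",))
host_a = Host("host-a", running={"web": vm})
host_b = Host("host-b")
host_a.reachable_from_controller = False

naive_remote_restart("web", host_a, host_b)
print(conflicting_identities([host_a, host_b]) == {vm})  # True: same IP, MAC and volume on two hosts
```

The model deliberately separates the controller's belief (`reachable_from_controller`) from what is actually running, which is exactly the gap that allows two copies of the same identity to coexist.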

To perform a remote restart safely, the obsolete instance must first be fenced, i.e., shut down or isolated.
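The ordering requirement (fence first, then restart) can be expressed as a guard. In this minimal sketch, `safe_remote_restart`, `FenceResult`, and `FencingError` are hypothetical names, and the `fence` and `restart` callables stand in for a real fencing mechanism and for the Nova "evacuate" call; no actual OpenStack API is used.

```python
from enum import Enum
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class FenceResult(Enum):
    CONFIRMED = "confirmed"  # the obsolete instances are known to be off or isolated
    UNCONFIRMED = "unconfirmed"  # fencing failed or its outcome is unknown


class FencingError(RuntimeError):
    """Raised when remote restart is refused because fencing was not confirmed."""


def safe_remote_restart(
    host: str,
    instances: Sequence[str],
    fence: Callable[[str], FenceResult],
    restart: Callable[[str], T],
) -> List[T]:
    """Remote-restart `instances` of `host` only after `host` is confirmed fenced.

    `fence` and `restart` are hypothetical stand-ins for a fencing mechanism and
    for the Nova "evacuate" call.
    """
    if fence(host) is not FenceResult.CONFIRMED:
        # Fail closed: without confirmed fencing, restarting could create duplicates.
        raise FencingError(f"could not fence {host}; refusing to remote restart")
    return [restart(instance) for instance in instances]


restarted = safe_remote_restart(
    "host-a",
    ["web", "db"],
    fence=lambda host: FenceResult.CONFIRMED,
    restart=lambda name: f"{name}@host-b",
)
print(restarted)  # ['web@host-b', 'db@host-b']

try:
    safe_remote_restart("host-a", ["web"], lambda host: FenceResult.UNCONFIRMED, str)
except FencingError as err:
    print(err)  # could not fence host-a; refusing to remote restart
```

The sketch fails closed: if fencing cannot be confirmed, no instance is restarted, so an unconfirmed fence never leads to two running copies.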

The following table lists three fencing approaches. They address the case in which not only the instances but also their host are unreachable.

Approach	Initiated by	Method
Power fencing	OpenStack Controller	Shut down the instances by a power off or a hard/cold reboot of the host
Resource fencing	OpenStack Controller	Isolate the instances from the application storage and from the data network
Self fencing	Nova Compute service on the host	Shut down the instances
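The three approaches differ mainly in who initiates them. The following minimal sketch models that difference; everything in it is an assumption for illustration only. `FencingApproach`, `fence_from_controller`, `self_fence`, `power_off`, and `isolate` are hypothetical. Trying power fencing before resource fencing is an arbitrary choice. The time threshold in `self_fence` is an assumed rule for when the Nova Compute service would shut down its own instances, not documented Nova behavior.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class FencingApproach:
    name: str
    initiated_by: str  # "controller" or "host", as in the table above
    fence: Callable[[str], bool]  # returns True only if fencing is confirmed


def fence_from_controller(host: str, approaches: Sequence[FencingApproach]) -> Optional[str]:
    """Try the controller-initiated approaches in order; return the first that succeeds."""
    for approach in approaches:
        if approach.initiated_by != "controller":
            continue  # self fencing runs on the host itself, not from the controller
        if approach.fence(host):
            return approach.name
    return None


def self_fence(seconds_without_controller: float, threshold: float,
               shut_down_instances: Callable[[], None]) -> bool:
    """Host-side rule (hypothetical): shut down all local instances once the
    controller has been unreachable for longer than `threshold` seconds."""
    if seconds_without_controller > threshold:
        shut_down_instances()
        return True
    return False


def power_off(host: str) -> bool:
    # Hypothetical out-of-band power off; here it fails, e.g. management network down.
    return False


def isolate(host: str) -> bool:
    # Hypothetical storage detach plus data-network isolation; here it succeeds.
    return True


approaches = [
    FencingApproach("power fencing", "controller", power_off),
    FencingApproach("resource fencing", "controller", isolate),
    FencingApproach("self fencing", "host", lambda host: True),
]
print(fence_from_controller("host-a", approaches))  # resource fencing

stopped = []
self_fence(45, 30, lambda: stopped.append("all instances"))
print(stopped)  # ['all instances']
```

Because self fencing happens on the unreachable host, the controller cannot invoke it directly; in this sketch the controller simply skips it, and the host applies its own rule independently.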