
OpsGuide-Maintenance-Complete

Revision as of 19:53, 18 August 2017 by Cmorgan2 (talk | contribs) (trial content to see how the conversion is going)

Handling a Complete Failure

A common way to recover from a full system failure, such as a power outage in a data center, is to assign each service a priority and restore the services in that order. table_example_priority shows an example.

Use this example priority list to ensure that user-affecting services are restored as soon as possible, but not before a stable environment is in place. Although each step appears as a single line item, each one requires significant work. For example, immediately after starting the database, check its integrity. Likewise, after starting the nova services, verify that the state of each hypervisor matches the database, and fix any mismatches.
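The approach described above, restoring services in priority order and verifying each one before moving on, can be sketched in Python. This is only an illustration under stated assumptions, not a tool from this guide: the `Service` class, the `restore_in_order` function, the numeric priorities, and the placeholder start and verify callables are all hypothetical. In a real deployment each callable would wrap whatever start-up and integrity checks your site uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Service:
    """Hypothetical description of one line item in a priority list."""
    name: str
    priority: int                 # lower number = restored earlier
    start: Callable[[], None]     # placeholder for the real start-up action
    verify: Callable[[], bool]    # placeholder for the real post-start check


def restore_in_order(services: List[Service]) -> List[str]:
    """Start services by ascending priority, verifying each before the next.

    Stops at the first failed verification so that lower-priority services
    are never started on top of an unstable environment.
    """
    restored = []
    for svc in sorted(services, key=lambda s: s.priority):
        svc.start()
        if not svc.verify():
            raise RuntimeError(
                f"{svc.name} failed verification; halting the restore"
            )
        restored.append(svc.name)
    return restored


# Illustrative entries only; the names and priorities are assumptions.
log = []
demo = [
    Service("nova services", 20,
            lambda: log.append("start nova services"), lambda: True),
    Service("database", 10,
            lambda: log.append("start database"), lambda: True),
]
print(restore_in_order(demo))
```

Halting on a failed check, rather than continuing, reflects the goal of not restoring user-facing services before a stable environment is in place. Whether to halt or to continue with a warning is a site-specific decision.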