Trove-Guest-Agent-Upgrades

Revision as of 22:56, 4 March 2014 by Amcrn (talk | contribs)

Introduction

This article describes the design for Guest Agent upgrades in Trove. Currently, Guest Agent upgrades are implemented through external deployment tools that push new code to each guest instance. Usually, the same deployment tools that upgrade the control plane also handle Guest Agent upgrades. This can create a bottleneck in the deployment infrastructure.

Goals

  1. Version the RPC API and tie it to the API version (see nova for examples)
    • This helps prevent backward-incompatible changes between the Trove API and the guest, but it is not necessarily a prerequisite for upgrades
  2. Implement a notification based upgrade path for guest agent
  3. Allow for different upgrade strategies (Swift, Jenkins, local disk, rsync, etc.)
  4. Avoid upgrading during times when guest agents are doing other work (e.g. backups, resize, restart)
    • This doesn't seem to be a concern given that the agent can only handle a single message at a time
  5. Reduce overall downtime during upgrade cycles
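
Goal 1 can be illustrated with a small compatibility check. This is only a minimal sketch, assuming a "major.minor" RPC version scheme; the function name is_rpc_compatible and the exact rule are hypothetical and not taken from Trove or nova.

```python
def is_rpc_compatible(server_version, requested_version):
    """Return True if a server at `server_version` can handle a message
    that requires `requested_version`.

    Assumed rule (hypothetical): major versions must match exactly, and
    the server's minor version must be at least the requested minor.
    """
    server_major, server_minor = (int(p) for p in server_version.split("."))
    req_major, req_minor = (int(p) for p in requested_version.split("."))
    if server_major != req_major:
        # A major bump signals a backward-incompatible change.
        return False
    return server_minor >= req_minor
```

Under this rule, a guest that implements 1.2 accepts messages that require 1.0, while a guest at 1.0 rejects messages that require 1.2.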

Description

Configuration

New properties will be added to the trove configs to allow:

  • Enabling/Disabling Guest Agent Upgrades
guest_automatic_updates: False
  • Specifying an upgrade strategy
guest_upgrade_strategy: swift
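
As a rough illustration, the two properties could be read as shown here. This sketch uses Python's standard configparser rather than Trove's own configuration machinery, and the [DEFAULT] section, the load_upgrade_settings helper, and the fallback values are assumptions made for the example.

```python
import configparser

# Hypothetical config fragment using the two proposed property names.
SAMPLE_CONF = """
[DEFAULT]
guest_automatic_updates: False
guest_upgrade_strategy: swift
"""


def load_upgrade_settings(text):
    """Parse the upgrade-related properties from config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    defaults = parser["DEFAULT"]
    return {
        # Assumed fallback: upgrades stay disabled unless switched on.
        "enabled": defaults.getboolean("guest_automatic_updates", fallback=False),
        # No strategy is assumed when the property is absent.
        "strategy": defaults.get("guest_upgrade_strategy", fallback=None),
    }


settings = load_upgrade_settings(SAMPLE_CONF)
```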

Affected Trove Components

  • python-troveclient (optional)
  • trove admin API
  • guest agent

Schema Changes

Upgrades.png

Workflow

1. An external process (outside of Trove) will create an upgrade package or artifact
* It's possible the admin may want to trigger an automatic backup at this point

Workflow0.png

2. An Admin user will notify a Guest Agent that an upgrade is available through the Trove Management API
3. The Guest Agent will process the RPC message created by the API call and handle the upgrade accordingly

Workflow1.png

4. The Guest Agent will download the package from the location specified in the RPC message

Workflow2.png

Guest Agent Message Handling

1. Guest agent will handle the message for upgrading to a particular version
2. Simple validation on the message will occur after the message is parsed 
    a. Check the strategy type, check to see if guest_automatic_updates is enabled
    b. Check the location of the package in the message and whether it exists
3. Record the upgrade 'event' in the Trove upgrades database table
4. Execute or process the message
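
The validation in step 2 might look roughly like the following. This is a sketch under stated assumptions: the strategy names in KNOWN_STRATEGIES, the validate_upgrade_message function, and the location_exists callback are all hypothetical, and the existence check is passed in so that each strategy can supply its own.

```python
# Hypothetical strategy identifiers; the real set would come from Trove.
KNOWN_STRATEGIES = {"swift", "jenkins", "local_disk", "rsync"}


def validate_upgrade_message(message, automatic_updates_enabled, location_exists):
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    if not automatic_updates_enabled:
        errors.append("guest_automatic_updates is disabled")
    if message.get("strategy") not in KNOWN_STRATEGIES:
        errors.append("unknown strategy: %r" % message.get("strategy"))
    location = message.get("location")
    if not location:
        errors.append("missing location")
    elif not location_exists(location):
        errors.append("package not found at %s" % location)
    return errors
```

Returning a list of errors rather than raising on the first one would let the agent record every problem alongside the upgrade event.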

Message handling.png

Guest Agent Process Upgrade Message

1. Choose the correct strategy to process the upgrade
2. Download the file from the given location (retry n times before failing)
3. Decrypt the package
4. Validate the package
    a. Check size, version, checksum, format, etc.
5. Install the package (pip install)
6. Restart (retry n times before failing)
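
The "retry n times" and checksum parts of the steps above could be sketched as follows. The names UpgradeFailed, retry, and checksum_matches are hypothetical, SHA-256 is an assumed choice of digest, and the sketch deliberately omits any delay between attempts.

```python
import hashlib


class UpgradeFailed(Exception):
    """Raised when a step still fails after all retry attempts."""


def retry(action, attempts):
    """Call `action` up to `attempts` times and return its first result.

    Raises UpgradeFailed if every attempt raises an exception.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as exc:  # broad on purpose: any failure is retried
            last_error = exc
    raise UpgradeFailed("gave up after %d attempts: %s" % (attempts, last_error))


def checksum_matches(package_bytes, expected_sha256):
    """Compare the package's SHA-256 hex digest with the expected value."""
    return hashlib.sha256(package_bytes).hexdigest() == expected_sha256
```

The same retry helper could wrap both the download and the restart step, since both are described as retried before failing.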

Ga processing.png

* On startup, update the status of the upgrade to SUCCESS
** If startup fails, try to install the last known working version and record the status as FAILED
    The last known working version could be tracked as a config value that gets updated, or as a file that is written to disk on the instance
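
The startup behaviour described above might be expressed as the sketch below. The finish_upgrade function and its start_agent and install callbacks are hypothetical stand-ins, and how the last known working version is stored is left open, as in the text.

```python
def finish_upgrade(start_agent, install, new_version, last_good_version):
    """Return a (status, running_version) pair for an upgrade attempt.

    `start_agent(version)` is assumed to raise if the agent fails to start;
    `install(version)` is assumed to reinstall the given version.
    """
    try:
        start_agent(new_version)
        return ("SUCCESS", new_version)
    except Exception:
        # Fall back to the last known working version and record FAILED.
        install(last_good_version)
        start_agent(last_good_version)
        return ("FAILED", last_good_version)
```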

Trove Management REST API

Create a notification request to upgrade a Trove guest agent

Relative URL:  /v1.0/{admin_tenant_id}/mgmt/upgrades

HTTP Method: POST

HTTP Headers:
  Accept: application/json
  Content-Type: application/json 
  User-Agent: python-troveclient 
  X-Auth-Project-Id: tenant_name 
  X-Auth-Token: HPAuth10_xxxx 

HTTP Post Body
{
    "instance_id": "<UUID>",
    "instance_version": "v1.0.1",
    "strategy": "swift",
    "location": "http://swift/tenant/container/trove-guestagent-v1.0.1.tar.gz"
}
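
For illustration, a client could assemble this body with the standard json module, which guarantees well-formed JSON. The helper name build_upgrade_request is hypothetical; the field names and sample values are the ones from the request body above.

```python
import json


def build_upgrade_request(instance_id, instance_version, strategy, location):
    """Serialize the upgrade notification body as a JSON string."""
    body = {
        "instance_id": instance_id,
        "instance_version": instance_version,
        "strategy": strategy,
        "location": location,
    }
    return json.dumps(body, indent=4)


payload = build_upgrade_request(
    "<UUID>",
    "v1.0.1",
    "swift",
    "http://swift/tenant/container/trove-guestagent-v1.0.1.tar.gz",
)
```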

Trove RPC API

unpacked context

{
   ...
    "is_admin": True,
    "tenant": "<SANITIZED>",
    "method": "upgrade",
    "instance_id": "<UUID>",
    "instance_version": "v1.0.1",
    "strategy": "swift",
    "location": "http://swift/tenant/container/trove-guestagent-v1.0.1.tar.gz"
}
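
One way the guest agent might route such a context to a strategy handler is sketched below. The dispatch function, the STRATEGIES registry, and the placeholder upgrade_from_swift handler are hypothetical, and the is_admin check is an assumption based on the upgrade being triggered through the management API.

```python
def upgrade_from_swift(location):
    """Placeholder handler: a real one would download and install."""
    return "swift:" + location


# Hypothetical registry mapping strategy names to handlers.
STRATEGIES = {"swift": upgrade_from_swift}


def dispatch(context):
    """Route an unpacked RPC context to the matching strategy handler."""
    if context.get("method") != "upgrade":
        raise ValueError("unsupported method: %r" % context.get("method"))
    if not context.get("is_admin"):
        # Assumption: only admin-originated messages may trigger upgrades.
        raise PermissionError("upgrade requires an admin context")
    try:
        handler = STRATEGIES[context["strategy"]]
    except KeyError:
        raise ValueError("no handler for strategy %r" % context.get("strategy"))
    return handler(context["location"])
```

A registry of this kind would let providers add their own strategies without changing the dispatch logic.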

Versioning and Package Validation

  • The Guest Agent will be responsible for validating the package before upgrading

Scenarios

  1. What happens when the Guest Agent is in a non-upgradeable state? (backup/restore, resize, restart, error)
    • The message should remain in the queue until the next time the Guest Agent checks and its state is 'Running'
  2. What happens when an upgrade fails, and how is that fed back to Trove?
    • Record it as FAILED in the Trove database; the admin will have to query for it.
  3. Can we rollback or install a previous version?
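
Scenario 1 could be modelled as in the sketch below, which leaves upgrade messages queued until the agent reports a running state. The drain_pending function, the "RUNNING" state string, and the in-memory deque are assumptions for illustration; the real agent would consume from its message queue.

```python
from collections import deque


def drain_pending(pending, current_state, process):
    """Process queued upgrade messages only while the agent is RUNNING.

    `pending` is a deque of messages, `current_state()` returns the agent's
    state as a string, and `process(message)` performs the upgrade.
    Returns the list of messages that were processed.
    """
    processed = []
    while pending and current_state() == "RUNNING":
        message = pending.popleft()
        process(message)
        processed.append(message)
    return processed
```

Messages that arrive during a backup or resize simply stay in the deque and are picked up on a later check.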

Feedback/Discussion

  • amcrn (talk) 22:56, 4 March 2014 (UTC): "New properties will be added to the trove configs to allow Enabling/Disabling Guest Agent Upgrades and Specifying an upgrade strategy". Is it necessary to add a CONF switch for either of these? If the cloud admin doesn't want to initiate a guestagent upgrade via API/RPC, then as long as they don't issue the request, all is well. Or are you suggesting that users can initiate their own upgrades (i.e. this operation isn't limited to the admin role)? As for the upgrade strategy (e.g. "swift"), this is already in the message payload in your examples, so why the need for the CONF? Is the idea to mimic the 'datastore_registry_ext' concept to allow providers to write and add their own strategies?
  • amcrn (talk) 22:56, 4 March 2014 (UTC): Is there a short list of packaging schemes everyone believes we should support? In Austin it looked like some were ok with a simple "pip install", whereas others had strict requirements on package signing, crypto, etc.
  • amcrn (talk) 22:56, 4 March 2014 (UTC): It seems this is inferred, but since it's not explicitly mentioned I'll ask: There is no introduction of a new INSTANCE state (e.g. ACTIVE, BACKUP), correct? To take it a step further then, the idea would be that the user sees ACTIVE, despite a possible in-flight upgrade? In the case of an upgrade failure, would the user still see ACTIVE but we'd have FAILED recorded in the upgrade table?
  • amcrn (talk) 22:56, 4 March 2014 (UTC): Every upgrade attempt will be a new record in the upgrades table, or will each instance have a dedicated row? I would much prefer the former.
  • amcrn (talk) 22:56, 4 March 2014 (UTC): What is the purpose of deleted/deleted_at for upgrades? In the scenario of a botched upgrade, a new upgrade request should be fired, incurring a new row to be inserted.
  • amcrn (talk) 22:56, 4 March 2014 (UTC): As for whether a rollback to a previous install should be supported: that's likely contingent upon the packaging schemes supported.
  • amcrn (talk) 22:56, 4 March 2014 (UTC): Can you elaborate on the instance_version logic?