
Trove-Guest-Agent-Upgrades

Revision as of 22:47, 4 March 2014 by Dan Nguyen

Introduction

This article describes the design for Guest Agent upgrades in Trove. Currently, guest agent upgrades are performed by external deployment tools that push new code to each guest instance. Usually the same deployment tools that upgrade the control plane also handle guest agent upgrades, which can create a bottleneck on the deployment infrastructure.

Goals

  1. Version the RPC API and tie it to the API version (see Nova for examples)
    • This helps prevent backward-incompatible changes between the Trove API and the guest, but it is not strictly a prerequisite for upgrades
  2. Implement a notification-based upgrade path for the guest agent
  3. Allow for different upgrade strategies (Swift, Jenkins, local disk, rsync, etc.)
  4. Avoid upgrading while guest agents are doing other work (e.g. backups, resize, restart)
    • This may not be a concern, since the agent handles only one message at a time
  5. Reduce overall downtime during upgrade cycles
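Goal 1 points to Nova's RPC versioning. As a rough illustration, the sketch below shows a simplified form of the compatibility rule used by Nova-style versioned RPC APIs, restricted to "major.minor" strings: a server can handle a call when the major versions match and its minor version is at least the one the caller asks for. The function name and version numbers are illustrative only and are not part of Trove.

```python
def rpc_version_is_compatible(server_version: str, requested_version: str) -> bool:
    """Return True if a server at `server_version` can handle a call pinned to `requested_version`.

    Simplified "major.minor" rule: the majors must match (a major bump signals a
    backward-incompatible change), and the server's minor version must be at
    least the requested one (minor bumps only add backward-compatible methods).
    """
    s_major, s_minor = (int(part) for part in server_version.split("."))
    r_major, r_minor = (int(part) for part in requested_version.split("."))
    return s_major == r_major and s_minor >= r_minor


# A 1.2 guest agent can serve a caller pinned to 1.1, but not one pinned to 1.3 or 2.0.
print(rpc_version_is_compatible("1.2", "1.1"))  # True
```

Because minor bumps may only add backward-compatible methods, an older guest agent can keep serving calls pinned to an older version while the control plane is upgraded ahead of it.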

Description

Configuration

New properties will be added to the Trove configuration files to allow:

  • Enabling/disabling guest agent upgrades
      guest_automatic_updates: False
  • Specifying an upgrade strategy
      guest_upgrade_strategy: swift
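A minimal sketch of how an agent might read the two options, assuming they live in the [DEFAULT] section of an INI-style file. Trove reads its real configuration through oslo.config; Python's standard configparser is used here only because it runs without dependencies (it accepts the "key: value" form shown above). The strategy identifiers in KNOWN_STRATEGIES and the "swift" fallback are assumptions.

```python
import configparser

# Config fragment in the same "key: value" form as above; the section is assumed.
SAMPLE_CONF = """
[DEFAULT]
guest_automatic_updates: False
guest_upgrade_strategy: swift
"""

# Hypothetical identifiers for the strategies named in Goal 3.
KNOWN_STRATEGIES = {"swift", "jenkins", "local", "rsync"}


def load_upgrade_settings(text: str) -> tuple[bool, str]:
    """Return (upgrades_enabled, strategy) parsed from an INI-style config string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["DEFAULT"]
    # Upgrades stay off unless explicitly enabled.
    enabled = section.getboolean("guest_automatic_updates", fallback=False)
    strategy = section.get("guest_upgrade_strategy", fallback="swift")
    if strategy not in KNOWN_STRATEGIES:
        raise ValueError(f"unknown upgrade strategy: {strategy}")
    return enabled, strategy


print(load_upgrade_settings(SAMPLE_CONF))  # (False, 'swift')
```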

Affected Trove Components

  • python-troveclient (optional)
  • trove admin API
  • guest agent

Schema Changes

Upgrades.png

Workflow

1. An external process (outside of Trove) will create an upgrade package or artifact
    • The admin may want to trigger an automatic backup at this point

Workflow0.png

2. An Admin user will notify a Guest Agent that an upgrade is available through the Trove Management API
3. The Guest Agent will process the RPC message created by the API call and handle the upgrade accordingly

Workflow1.png

4. The Guest Agent will download the package from the location specified in the RPC message

Workflow2.png

Guest Agent Message Handling

1. The guest agent receives a message requesting an upgrade to a particular version
2. After the message is parsed, simple validation occurs:
    a. Check the strategy type and whether guest_automatic_updates is enabled
    b. Check the package location in the message and whether the package exists there
3. Record the upgrade 'event' in the Trove upgrades database table
4. Process the upgrade message

Message handling.png
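Steps 2 and 3 could look roughly like the following sketch. An in-memory SQLite table stands in for the Trove upgrades table; the column names and the initial "PENDING" status are hypothetical, and the existence check in step 2b is passed in as a callable because it depends on the strategy (for example, a HEAD request against Swift).

```python
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import Callable

# Hypothetical stand-in for the upgrades table; the real columns are the ones
# in the schema diagram, not necessarily these.
SCHEMA = """
CREATE TABLE IF NOT EXISTS upgrades (
    id TEXT PRIMARY KEY,
    instance_id TEXT NOT NULL,
    instance_version TEXT NOT NULL,
    strategy TEXT NOT NULL,
    location TEXT NOT NULL,
    status TEXT NOT NULL,
    created_at TEXT NOT NULL
)
"""

REQUIRED_FIELDS = ("instance_id", "instance_version", "strategy", "location")


def validate_message(msg: dict, updates_enabled: bool, known_strategies: set,
                     package_exists: Callable[[str], bool]) -> None:
    """Step 2: raise ValueError if the upgrade message must be rejected."""
    missing = [field for field in REQUIRED_FIELDS if not msg.get(field)]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # 2a. guest_automatic_updates flag and strategy type
    if not updates_enabled:
        raise ValueError("guest_automatic_updates is disabled")
    if msg["strategy"] not in known_strategies:
        raise ValueError(f"unsupported strategy: {msg['strategy']}")
    # 2b. the package must exist at the given location (strategy-specific check)
    if not package_exists(msg["location"]):
        raise ValueError(f"package not found at {msg['location']}")


def record_upgrade_event(conn: sqlite3.Connection, msg: dict, status: str = "PENDING") -> str:
    """Step 3: insert an upgrade event row and return its id."""
    event_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO upgrades VALUES (?, ?, ?, ?, ?, ?, ?)",
        (event_id, msg["instance_id"], msg["instance_version"], msg["strategy"],
         msg["location"], status, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return event_id


# Usage: an in-memory database and a location check that always succeeds.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
message = {
    "instance_id": "00000000-0000-0000-0000-000000000000",
    "instance_version": "v1.0.1",
    "strategy": "swift",
    "location": "http://swift/tenant/container/trove-guestagent-v1.0.1.tar.gz",
}
validate_message(message, True, {"swift"}, lambda location: True)
event_id = record_upgrade_event(conn, message)
```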

Guest Agent Process Upgrade Message

1. Choose the correct strategy to process the upgrade
2. Download the file from the given location (retry n times before failing)
3. Decrypt the package
4. Validate the package
    a. Check size, version, checksum, format, etc.
5. Install the package (pip install)
6. Restart (retry n times before failing)

Ga processing.png
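The processing steps can be sketched as a strategy registry plus a retry helper. Assumptions: the registry keys (swift, local), the expected_sha256 argument (the message format in this design does not carry a checksum), the 100 MiB size limit, and the tar format check, which follows the .tar.gz name in the example location. Decryption (step 3) and the restart (step 6) are left out because the design does not say how they are done.

```python
import hashlib
import os
import shutil
import subprocess
import sys
import tarfile
import time
import urllib.request


def retry(func, attempts=3, delay=1.0):
    """Call func() up to `attempts` times; re-raise the last error if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)


def download_url(location, dest):
    """Fetch an http://, https:// or file:// URL into `dest` (Swift auth is omitted)."""
    with urllib.request.urlopen(location) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


# Step 1: hypothetical strategy registry keyed by the message's "strategy" field.
STRATEGIES = {"swift": download_url, "local": download_url}


def sha256_of(path):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_package(path, expected_sha256, max_bytes):
    """Step 4: size, checksum and format checks (the version check is separate)."""
    size = os.path.getsize(path)
    if size == 0 or size > max_bytes:
        raise ValueError(f"unexpected package size: {size} bytes")
    if sha256_of(path) != expected_sha256:
        raise ValueError("checksum mismatch")
    if not tarfile.is_tarfile(path):
        raise ValueError("package is not a tar archive")


def install_package(path):
    """Step 5: pip install into the interpreter that runs the agent."""
    subprocess.run([sys.executable, "-m", "pip", "install", "--upgrade", path], check=True)


def process_upgrade(msg, dest, expected_sha256, max_bytes=100 * 1024 * 1024):
    download = STRATEGIES[msg["strategy"]]
    retry(lambda: download(msg["location"], dest))  # step 2
    validate_package(dest, expected_sha256, max_bytes)  # step 4
    install_package(dest)  # step 5
```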

  • On startup, update the status of the upgrade to SUCCESS
    • If startup fails, try to install the last known working version and record the status as FAILED
    • The last known working version could be tracked as a config value that gets updated, or as a file written to disk on the instance
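One way to implement the file-on-disk option is a small JSON state file. The path and key names below are hypothetical, reporting the status back to the Trove database is not shown, and on_startup_failure would have to be called from something outside the failed agent, such as a wrapper script.

```python
import json
import os
import tempfile

# Hypothetical location of the state file on the guest instance.
STATE_FILE = "/var/lib/trove/guest_upgrade_state.json"


def read_state(path=STATE_FILE):
    """Return the saved state, or an empty dict if nothing was written yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def write_state(state, path=STATE_FILE):
    """Write through a temp file in the same directory, then atomically replace."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def on_startup_success(running_version, path=STATE_FILE):
    """Agent came up: mark the upgrade SUCCESS and remember this version as good."""
    state = read_state(path)
    state.update(status="SUCCESS", last_known_good=running_version)
    write_state(state, path)


def on_startup_failure(path=STATE_FILE):
    """Agent failed to start: mark FAILED and return the version to reinstall (or None)."""
    state = read_state(path)
    state["status"] = "FAILED"
    write_state(state, path)
    return state.get("last_known_good")
```

Writing through a temporary file and os.replace means a crash during the write leaves either the old or the new state on disk, never a half-written file.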

Trove Management REST API

Create a notification request to upgrade a Trove guest agent

Relative URL:  /v1.0/{admin_tenant_id}/mgmt/upgrades

HTTP Method: POST

HTTP Headers:
  Accept: application/json
  Content-Type: application/json 
  User-Agent: python-troveclient 
  X-Auth-Project-Id: tenant_name 
  X-Auth-Token: HPAuth10_xxxx 

HTTP POST Body:
{
    "instance_id": "<UUID>",
    "instance_version": "v1.0.1",
    "strategy": "swift",
    "location": "http://swift/tenant/container/trove-guestagent-v1.0.1.tar.gz"
}
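A minimal client-side sketch that builds this request with Python's standard library. The endpoint, tenant ID, and token are placeholders, the User-Agent and X-Auth-Project-Id headers from the example are omitted for brevity, and no python-troveclient method for this call is assumed.

```python
import json
import urllib.request


def build_upgrade_request(endpoint, admin_tenant_id, token, instance_id,
                          version, strategy, location):
    """Build (but do not send) the POST /v1.0/{admin_tenant_id}/mgmt/upgrades request."""
    body = {
        "instance_id": instance_id,
        "instance_version": version,
        "strategy": strategy,
        "location": location,
    }
    url = f"{endpoint.rstrip('/')}/v1.0/{admin_tenant_id}/mgmt/upgrades"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "X-Auth-Token": token,
        },
        method="POST",
    )


# Usage with placeholder values; the endpoint is hypothetical.
req = build_upgrade_request(
    "http://trove-api.example:8779", "ADMIN_TENANT_ID", "TOKEN",
    "00000000-0000-0000-0000-000000000000", "v1.0.1", "swift",
    "http://swift/tenant/container/trove-guestagent-v1.0.1.tar.gz",
)
# To send it: with urllib.request.urlopen(req) as resp: print(resp.status)
```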

Trove RPC API

Unpacked context:

{
   ...
    "is_admin": True,
    "tenant": "<SANITIZED>",
    "method": "upgrade",
    "instance_id": "<UUID>",
    "instance_version": "v1.0.1",
    "strategy": "swift",
    "location": "http://swift/tenant/container/trove-guestagent-v1.0.1.tar.gz"
}
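On the guest side, the message has to be routed to an upgrade handler. In a real deployment the RPC library performs this dispatch; the sketch below only illustrates the idea on the flattened context shown above. GuestUpgradeManager, ALLOWED_METHODS, and the admin check are assumptions, not existing Trove code.

```python
# Only these methods may be invoked by name from a message (assumed allowlist).
ALLOWED_METHODS = {"upgrade"}
# Argument names taken from the context above.
UPGRADE_ARGS = ("instance_id", "instance_version", "strategy", "location")


class GuestUpgradeManager:
    """Hypothetical guest-side manager; only the upgrade entry point is sketched."""

    def upgrade(self, instance_id, instance_version, strategy, location):
        # A real implementation would validate, record and process the upgrade.
        return f"upgrade {instance_id} to {instance_version} via {strategy} from {location}"


def dispatch(manager, context):
    """Route an unpacked RPC context to the matching manager method."""
    if not context.get("is_admin"):
        raise PermissionError("upgrade requests must come from an admin context")
    method_name = context.get("method")
    if method_name not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method_name}")
    # Pick only the expected arguments; other context keys are ignored.
    kwargs = {name: context[name] for name in UPGRADE_ARGS}
    return getattr(manager, method_name)(**kwargs)
```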

Versioning and Package Validation

  • The Guest Agent will be responsible for validating the package before upgrading
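A sketch of the version part of that validation, assuming versions always take the vN.N.N form used in the examples and that package file names follow the trove-guestagent-<version>.tar.gz pattern of the example location. The allow_downgrade flag is a hypothetical hook for the rollback question under Scenarios.

```python
import re

# Accepts versions such as "v1.0.1" or "1.0.1", as in the examples above.
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(text):
    """Turn "v1.0.1" into (1, 0, 1) so versions compare numerically."""
    match = _VERSION_RE.match(text)
    if not match:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(group) for group in match.groups())


def check_upgrade_version(installed, requested, allow_downgrade=False):
    """Reject re-installs and (unless allowed) downgrades; return the parsed target version."""
    current, target = parse_version(installed), parse_version(requested)
    if target == current:
        raise ValueError(f"{requested} is already installed")
    if target < current and not allow_downgrade:
        raise ValueError(f"{requested} is older than the installed {installed}")
    return target


def version_matches_location(version, location):
    """Check the package file name against the assumed naming pattern."""
    return location.rsplit("/", 1)[-1] == f"trove-guestagent-{version}.tar.gz"
```

Parsing into integer tuples matters: compared as strings, "v1.0.10" would sort before "v1.0.9".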

Scenarios

  1. What happens when the Guest Agent is in a non-upgradeable state (backup/restore, resize, restart, error)?
    • The message should remain in the queue until the next time the Guest Agent checks and its state is 'Running'
  2. What happens when an upgrade fails, and how is that fed back to Trove?
    • Record it as FAILED in the Trove database; an admin will have to query for it
  3. Can we roll back or install a previous version?
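Scenario 1 can be sketched as a loop that stops draining the queue as soon as the agent is not RUNNING. With a real message broker, leaving a message in the queue would usually mean not acknowledging or requeueing it; an in-memory deque stands in for the queue here, and the state names are assumptions.

```python
from collections import deque
from typing import Callable


def drain_upgrades(queue: deque, get_state: Callable[[], str],
                   handle: Callable[[dict], None]) -> list:
    """Handle queued upgrade messages only while the agent reports RUNNING.

    If the agent is busy (backup/restore, resize, restart) or in error, the
    remaining messages stay queued for the next check.
    """
    processed = []
    while queue:
        if get_state() != "RUNNING":
            break
        message = queue.popleft()
        handle(message)
        processed.append(message)
    return processed
```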