TroveArchitecture

This page describes the Trove components and how they interoperate.

Components

Trove is a shared-nothing messaging system, like Nova. Its components communicate over a message bus and can run on different servers. It behaves very similarly to Nova: you send a request over HTTP, the request is translated into a message and sent over the message bus, and the resulting actions happen asynchronously. It is currently composed of the following major components.

  • API Server
  • Message Bus
  • Task Manager
  • Guest Agent
  • Conductor

API Server

The API provides command and control of the guest and of the provisioned datastore. The API endpoints are basic HTTP web services that handle authentication, authorization, and basic command and control functions related to the datastore. There is a concept of extensions for the API depending on the datastore, but this concept needs better engineering around it.

The API Server currently communicates with two systems. It talks to the Task Manager to handle complex, asynchronous tasks. It also talks directly to the Guest Agent to handle simple tasks, such as retrieving a list of MySQL users; these simple tasks are all synchronous. The API does not do any heavy lifting. Its job is to take requests, turn them into messages, validate them, and forward them to the Task Manager or Guest Agent.
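
The split between asynchronous and synchronous forwarding can be sketched roughly as follows. This is a minimal illustration, not Trove's code: the operation names, build_message, route, and the in-process queue standing in for the message bus are all assumptions made for the example.

```python
import queue

# Hypothetical operation names, chosen for illustration only.
ASYNC_TASKS = {"create_instance", "resize_flavor"}  # forwarded to the Task Manager
SYNC_TASKS = {"list_users"}                         # answered by the Guest Agent


def build_message(method, **args):
    """Validate a request and turn it into a bus message."""
    if method not in ASYNC_TASKS | SYNC_TASKS:
        raise ValueError(f"unsupported operation: {method}")
    return {"method": method, "args": args}


def route(message, taskmanager_queue, guest_call):
    """Forward a message: cast to the Task Manager or call the Guest Agent."""
    if message["method"] in ASYNC_TASKS:
        taskmanager_queue.put(message)  # fire and forget, nothing to return
        return None
    return guest_call(message)          # block until the guest replies
```

Here guest_call is any callable that performs a blocking request to the guest; the API itself only validates and forwards.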

Message Bus

  • Copied from Nova architecture

A messaging queue brokers the interaction between API endpoints, the Task Manager, and the Guest Agent. Communication to and from the cloud controller is by HTTP requests through multiple API endpoints.

A typical message passing event begins with the API server receiving a request from a user. The API server authenticates the user and ensures that the user is permitted to issue the subject command. Availability of objects implicated in the request is evaluated and, if available, the request is routed to the queuing engine for the relevant workers. Workers continually listen to the queue based on their role. When such listening produces a work request, the worker takes assignment of the task and begins its execution. Upon completion, a response is dispatched to the queue which is received by the API server and relayed to the originating user. Database entries are queried, added, or removed as necessary throughout the process.
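
As a rough sketch of the worker side of this flow, the example below uses Python's standard queue and threading modules in place of a real message broker. The worker function, the "add" task, and the None sentinel are assumptions made for illustration only.

```python
import queue
import threading


def worker(requests, responses, handlers):
    """Listen on a request queue, run each task, and post the result."""
    while True:
        message = requests.get()      # blocks until a work request arrives
        if message is None:           # sentinel value: stop listening
            break
        handler = handlers[message["method"]]
        responses.put(handler(**message["args"]))


requests, responses = queue.Queue(), queue.Queue()
handlers = {"add": lambda a, b: a + b}  # hypothetical task for the demo

thread = threading.Thread(target=worker, args=(requests, responses, handlers))
thread.start()
requests.put({"method": "add", "args": {"a": 2, "b": 3}})
result = responses.get(timeout=5)       # the reply relayed back to the caller
requests.put(None)
thread.join()
```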

Task Manager

The Task Manager service does the heavy lifting: provisioning instances, managing the lifecycle of instances, and performing operations on them. It takes messages from the API Server, responds with an acknowledgement, and begins the corresponding tasks. Two examples of complex tasks are resizing a database flavor and creating an instance. Both require HTTP calls to OpenStack services, polling those services until the instance becomes active, and sending messages to the Guest Agent. The Task Manager handles the flow of processes as they occur across multiple, distributed systems.
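
The polling step might look something like the sketch below. It assumes a caller-supplied get_status callable that wraps the HTTP call to the relevant OpenStack service; the function name, the timeout handling, and the "ERROR" status are illustrative assumptions rather than Trove's actual behaviour.

```python
import time


def wait_until_active(get_status, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll get_status() until it returns "ACTIVE" or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "ACTIVE":
            return True
        if status == "ERROR":  # assumed failure status, for illustration
            raise RuntimeError("instance entered ERROR state")
        sleep(interval)        # wait before asking the service again
    return False               # timed out without reaching ACTIVE
```

Because the loop keeps its progress only in local variables, a sketch like this also shows why an operation is lost if the node running it goes offline part-way through.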

The Task Manager is stateful. It instruments complex flows within its system. Operations are known to fail if a Task Manager node goes offline during stateful processing. The Task Flow system will eventually be implemented for long-running tasks.

Guest Agent

The Guest Agent is a service that runs within the guest instance and is responsible for managing and performing operations on the datastore itself. It is in charge of bringing a datastore online, which can be a complicated task. Heat is going to be the default provisioning and instrumentation engine for Trove in the future, which should lessen the work of bringing a datastore online. The Guest Agent also sends heartbeat messages to the API via Conductor.

Each datastore implementation has a Guest Agent implementation in charge of the tasks specific to that datastore. For instance, a Redis guest agent will behave differently from a MySQL guest agent. Every implementation must fulfill a contract for basic actions such as create and resize.
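
One way to picture such a contract is an abstract base class that each datastore agent must implement. The class names, method signatures, and returned strings below are hypothetical and do not reflect Trove's actual guest agent interface.

```python
from abc import ABC, abstractmethod


class GuestAgent(ABC):
    """Hypothetical contract that every datastore agent must fulfill."""

    @abstractmethod
    def create(self, name):
        """Bring up a datastore called `name`."""

    @abstractmethod
    def resize(self, name, size_gb):
        """Resize the datastore called `name`."""


class MySqlAgent(GuestAgent):
    def create(self, name):
        return f"mysql: create {name}"

    def resize(self, name, size_gb):
        return f"mysql: resize {name} to {size_gb} GB"


class RedisAgent(GuestAgent):
    def create(self, name):
        return f"redis: create {name}"

    def resize(self, name, size_gb):
        return f"redis: resize {name} to {size_gb} GB"
```

A subclass that omits one of the abstract methods cannot be instantiated, so a missing action surfaces as soon as the agent is constructed rather than when the action is first requested.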

There has been much refactoring in the guest agent in the last cycle (Havana). It is more extensible for multiple implementations. Just as API extensions exist, the guest should have its own set of extensions that differ across datastore implementations. This functionality is still lacking in the present guest implementation.

Conductor

Conductor is a service that runs on the host and is responsible for receiving messages from guest instances to update information on the host, for example instance statuses and the current status of a backup. With Conductor, guest instances do not need a direct connection to the host's database. Conductor listens for RPC messages on the message bus and performs the relevant operation.

Conductor is similar to the guest agent in that it is a service that listens to a RabbitMQ topic. The difference is that Conductor lives on the host, not the guest. Guest agents communicate with Conductor by putting messages on the topic defined in the config as conductor_queue. By default this is "trove-conductor".

  • Entry point - Trove/bin/trove-conductor
  • Runs as RpcService configured by Trove/etc/trove/trove-conductor.conf.sample which defines trove.conductor.manager.Manager as the manager. This is the entry point for requests arriving on the queue.
  • As with the guest agent above, requests are pushed to the MQ from another component using _cast() (asynchronous), generally in the form {"method": "<method_name>", "args": {<arguments>}}
  • Actual database update work is done by trove/conductor/manager.py
  • The "heartbeat" method updates the status of an instance. The guest agent uses it to report that an instance has changed from NEW to BUILDING to ACTIVE and so on.
  • The "update_backup" method changes the details of a backup, including its current status, size of the backup, type, and checksum.
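
The message form described above lends itself to dispatch by method name. The sketch below is a simplified stand-in and not the real trove.conductor.manager.Manager: it stores state in plain dicts instead of a database, and the argument names (instance_id, status, backup_id) and the dispatch helper are assumptions made for illustration.

```python
class Manager:
    """Simplified stand-in for the conductor manager; storage is plain dicts."""

    def __init__(self):
        self.instances = {}
        self.backups = {}

    def heartbeat(self, instance_id, status):
        # Record the latest status reported by the guest agent.
        self.instances[instance_id] = status

    def update_backup(self, backup_id, **fields):
        # Merge the reported details into the stored backup record.
        self.backups.setdefault(backup_id, {}).update(fields)


def dispatch(manager, message):
    """Invoke the manager method named in a bus message."""
    name = message["method"]
    if name.startswith("_") or not callable(getattr(manager, name, None)):
        raise ValueError(f"unknown method: {name}")
    return getattr(manager, name)(**message["args"])


manager = Manager()
dispatch(manager, {"method": "heartbeat",
                   "args": {"instance_id": "i-1", "status": "BUILDING"}})
dispatch(manager, {"method": "heartbeat",
                   "args": {"instance_id": "i-1", "status": "ACTIVE"}})
```

After the two heartbeat messages, manager.instances holds only the most recent status for the instance, which mirrors how a guest reports its progression through states.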