Cloud Gateway
Introduction
RightScale already has a multi-cloud gateway that closely aligns with the Cloud Gateway project goals, and proposes to submit it to OpenStack as part of the Cloud Gateway project. The following sections describe the existing Gateway's architecture so that it can be submitted as is, for the developer community to consume and enhance to better meet OpenStack's needs.
Note: The Cloud Gateway project assumes that OpenStack is one of many cloud types available to users. Indeed, users will likely use many clouds at any given time, both public and private.
In short, a cloud gateway is a common interface used to manage multiple clouds, as Figure 1 below illustrates.
Figure 1: General diagram depicting the abstraction between the management layer and multiple clouds.
With a common interface to many clouds through the Cloud Gateway, users should be able to design their cloud application environments once and use them on any cloud type. Ultimately, the Cloud Gateway should evolve into an interoperable cloud standard that facilitates exactly this use case.
Gateway Overview
The Gateway is composed of the following major functional subsystems:
- A data store that stores the state of all resources in the cloud.
- A polling subsystem that continuously polls the cloud for all resources, compares the responses with the state in the data store, updates the store and generates notifications to API clients for all discovered differences.
- An API handler that receives requests from API clients (such as the RightScale dashboard) and either responds based on the contents of the data store or translates the requests and forwards them to the cloud. The translation can be abstracted into plug-ins, which make it easy to add support for additional clouds.
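The plug-in abstraction in the API handler can be sketched as follows, written in Python for illustration (the Gateway itself is Ruby). `CloudPlugin`, `Ec2LikePlugin`, and `ApiHandler` are hypothetical names, and the action mapping only illustrates EC2-style call names; it is not the Gateway's actual translation table.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class CloudPlugin(ABC):
    """Hypothetical interface: one subclass per supported cloud type."""

    @abstractmethod
    def translate(self, action: str, params: Dict[str, Any]) -> Dict[str, Any]:
        """Map a generic Gateway request onto this cloud's native API call."""


class Ec2LikePlugin(CloudPlugin):
    # Illustrative mapping from generic actions to EC2-style action names.
    ACTIONS = {
        "ListInstances": "DescribeInstances",
        "CreateInstance": "RunInstances",
        "DestroyInstance": "TerminateInstances",
    }

    def translate(self, action: str, params: Dict[str, Any]) -> Dict[str, Any]:
        return {"Action": self.ACTIONS[action], **params}


class ApiHandler:
    def __init__(self) -> None:
        self._plugins: Dict[str, CloudPlugin] = {}

    def register(self, cloud_type: str, plugin: CloudPlugin) -> None:
        # Supporting a new cloud only requires registering another plug-in.
        self._plugins[cloud_type] = plugin

    def forward(self, cloud_type: str, action: str, params: Dict[str, Any]) -> Dict[str, Any]:
        plugin = self._plugins.get(cloud_type)
        if plugin is None:
            raise LookupError(f"no plug-in registered for cloud type {cloud_type!r}")
        return plugin.translate(action, params)
```

The handler itself stays cloud-agnostic; everything cloud-specific lives behind `translate`.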
The implementation is in Ruby and uses MySQL as its data store:
- MySQL was chosen as the data store for ease of development. The schema is very simple, with very few relations, which makes it amenable to a NoSQL implementation.
- The API handler is written as a Rack application, allowing it to be plugged into a variety of app servers.
- An HTTP app server such as Thin can be used, but we also have a custom AMQP-based app server that receives HTTP requests encapsulated in AMQP messages and sends back similarly encapsulated HTTP responses.
- The polling subsystem runs as a separate process (or multiple processes) and uses asynchronous I/O to issue large numbers of concurrent requests to the cloud.
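The concurrent polling described in the last item can be sketched with Python's asyncio as a stand-in for the Gateway's Ruby asynchronous I/O. `poll_all`, `list_resources`, and the default cap of 50 in-flight requests are illustrative assumptions; the semaphore simply bounds how many cloud requests are outstanding at once.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def poll_all(
    accounts: List[str],
    list_resources: Callable[[str], Awaitable[list]],  # hypothetical per-account listing call
    max_in_flight: int = 50,  # illustrative cap, not a Gateway setting
) -> Dict[str, list]:
    """Issue one listing request per account concurrently, capped at max_in_flight."""
    limiter = asyncio.Semaphore(max_in_flight)

    async def poll_one(account: str):
        async with limiter:  # at most max_in_flight requests outstanding
            return account, await list_resources(account)

    results = await asyncio.gather(*(poll_one(a) for a in accounts))
    return dict(results)
```

A single process can keep many requests in flight this way without one thread per request.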
In terms of persistence, the Gateway (GW) follows these principles:
- Account information is non-recoverable: if the GW database is wiped, all GW clients must re-register every account on which they want to operate (see the access control description below for more details).
- Information about all resources is fully recoverable if the GW database is wiped. Some intermediate state information may be lost, but the GW can rebuild its state from the information provided by the cloud, and communication with clients can resume.
- The underlying goals are resiliency and replication/fail-over. The account information is relatively small and slow-changing, and it can be re-created by clients.
The protocol is simple in principle:
- The client sends requests to the GW to manipulate resources; these requests are loosely patterned after EC2 API requests.
- The GW sends resource updates back to the client as responses. Each update is a copy of the resource representation with its id, timestamp, and generic and cloud-specific attributes. For OpenStack, the generic and cloud-specific attributes would be one and the same.
- The GW generates resource updates as resources change state and either publishes them to an AMQP server, from which a client can pull them as desired, or saves them locally to build a list of events. RightScale uses the AMQP approach because it provides immediate and asynchronous notification of events.
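A minimal sketch of a resource update and the two delivery modes described above, assuming JSON as the wire format. `ResourceUpdate`, its field names, and `UpdateSink` are hypothetical; the `publish` callable stands in for whatever AMQP client call would actually send the message.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ResourceUpdate:
    # Hypothetical field names for: id, timestamp, generic and cloud-specific attributes.
    id: str
    resource_type: str
    timestamp: float
    generic: Dict[str, Any]
    cloud_specific: Dict[str, Any] = field(default_factory=dict)


class UpdateSink:
    """Either publish each update (e.g. to AMQP) or keep a local event list."""

    def __init__(self, publish: Optional[Callable[[bytes], None]] = None):
        self._publish = publish  # assumed stand-in for an AMQP publish call
        self.events: List[ResourceUpdate] = []

    def emit(self, update: ResourceUpdate) -> None:
        if self._publish is not None:
            self._publish(json.dumps(asdict(update)).encode("utf-8"))
        else:
            self.events.append(update)  # clients can later request this event list
```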
In terms of access control, the GW is designed to support multiple clients operating on overlapping clouds and accounts. Each client must authenticate itself and must send the GW the cloud credentials for every account it intends to operate on. Thus, if two clients want to operate on account A, both need to prove to the GW that they are authorized to do so by sending it the credentials for A. The GW returns a cryptographic token to each client, which the client then uses to make requests.
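The token flow can be sketched as follows, assuming a hypothetical `verify_with_cloud` hook (in the spirit of CheckCredentials) that confirms a client's credentials for an account. `TokenIssuer` and its methods are illustrative names; tokens here are random values from Python's `secrets` module mapped to a (client, account) grant, which is one possible design rather than necessarily the Gateway's.

```python
import secrets
from typing import Callable, Dict, Tuple


class TokenIssuer:
    """Hypothetical sketch: issue one token per (client, account) registration."""

    def __init__(self, verify_with_cloud: Callable[[str, str], bool]):
        # Hypothetical hook, e.g. backed by a CheckCredentials-style call to the cloud.
        self._verify = verify_with_cloud
        self._grants: Dict[str, Tuple[str, str]] = {}  # token -> (client_id, account)

    def register(self, client_id: str, account: str, credentials: str) -> str:
        """Every client must present valid credentials for the account itself."""
        if not self._verify(account, credentials):
            raise PermissionError(f"credentials rejected for account {account!r}")
        token = secrets.token_urlsafe(32)  # unguessable random token
        self._grants[token] = (client_id, account)
        return token

    def authorize(self, token: str, account: str) -> bool:
        """A request is allowed only if its token was issued for that account."""
        grant = self._grants.get(token)
        return grant is not None and grant[1] == account
```

Because the grants live only in GW state, wiping that state forces clients to re-register, which matches the non-recoverable account information described earlier.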
Gateway API
The GW currently supports the following cloud resource types, which are generic representations of common cloud resources. Each resource is specifically designed to abstract a common cloud concept across multiple types of clouds:
- Datacenter (represents regions, availability zones, etc.)
- Image
- Instance
- VolumeSnapshot (represents EBS snapshots and similar functionality in other cloud platforms)
- Volume (represents EBS volumes and similar functionality in other cloud platforms)
- Subnet
For each of these resources, the Gateway supports the following API calls (when applicable):
- List
- Create
- Destroy
- Edit
Furthermore, the Gateway can check credentials to validate whether it can connect to and authenticate with the cloud using the provided credentials (CheckCredentials API call).
In addition to the resource API listed above, the GW provides calls for events and resource state synchronization, which support the functionality described below.
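One way to name the resource calls is to combine an operation with a resource type, in the spirit of EC2-style action names. The `ResourceType` and `Operation` enums and the `List<Type>s` / `Create<Type>` naming scheme are assumptions for illustration; which combinations a given cloud actually supports is left to the plug-in.

```python
from enum import Enum
from typing import Tuple


class ResourceType(Enum):
    DATACENTER = "Datacenter"
    IMAGE = "Image"
    INSTANCE = "Instance"
    VOLUME_SNAPSHOT = "VolumeSnapshot"
    VOLUME = "Volume"
    SUBNET = "Subnet"


class Operation(Enum):
    LIST = "List"
    CREATE = "Create"
    DESTROY = "Destroy"
    EDIT = "Edit"


def _noun(op: Operation, rtype: ResourceType) -> str:
    # Hypothetical convention: List calls use the plural noun.
    return rtype.value + "s" if op is Operation.LIST else rtype.value


def action_name(op: Operation, rtype: ResourceType) -> str:
    """Build a request name such as 'ListInstances' or 'CreateVolume'."""
    return op.value + _noun(op, rtype)


def parse_action(name: str) -> Tuple[Operation, ResourceType]:
    """Invert action_name; CheckCredentials is handled separately by the caller."""
    for op in Operation:
        if name.startswith(op.value):
            for rtype in ResourceType:
                if name[len(op.value):] == _noun(op, rtype):
                    return op, rtype
    raise ValueError(f"unknown action {name!r}")
```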
Polling Subsystem
The polling subsystem pulls information on each of the resources identified above via the cloud API and stores it locally in the data store. Status requests from API clients can then be satisfied from this local store. In another possible configuration, detected changes are sent to API clients via AMQP.
The GW polls all resources periodically to detect state transitions and maintains this data in the data store for API client requests. Most resources have a state field, and the GW detects when its value changes. Resources that do not need a state field are typically multi-use resources that either exist or do not exist; SshKey is an example.
Polling is inherently expensive. To avoid a follow-up describe call for each resource, the cloud's list calls must return the full details of every listed resource.
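The diff step performed on each polling cycle can be sketched as below, assuming each detailed listing covers one resource type for one account and that resources are keyed by their cloud ID. `diff_and_update` and the notification tuples are hypothetical; because the listing already includes full details, no per-resource describe call is needed.

```python
from typing import Any, Dict, List, Tuple

Resource = Dict[str, Any]
Notification = Tuple[str, str]  # (kind, resource id)


def diff_and_update(store: Dict[str, Resource], listing: List[Resource]) -> List[Notification]:
    """Reconcile one detailed listing with the stored state and report differences."""
    notifications: List[Notification] = []
    seen = set()
    for resource in listing:  # full details are already present: no describe calls
        rid = resource["id"]
        seen.add(rid)
        previous = store.get(rid)
        if previous is None:
            notifications.append(("created", rid))
        elif previous != resource:  # e.g. a state transition such as pending -> running
            notifications.append(("changed", rid))
        store[rid] = resource
    for rid in [r for r in store if r not in seen]:  # missing from the listing: gone
        del store[rid]
        notifications.append(("deleted", rid))
    return notifications
```

Existence-only resources such as SshKey are covered by the created/deleted branches even though they have no state field.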
Gateway API Handler
As mentioned above, the GW API Handler responds to requests from API clients either by pulling data from the data store or by translating the requests and forwarding them to the respective cloud. These requests are buffered through the Advanced Message Queuing Protocol (AMQP). Currently, the entire HTTP request is serialized to JSON and sent as the payload of an AMQP message; AMQP headers are used only to provide routing information. The GW API Handler routes each response using the message id and reply-to headers of the corresponding request.
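A sketch of the request/response encapsulation, assuming JSON payloads and representing AMQP message properties as plain dictionaries rather than calls into a specific AMQP client library. The field names and the use of `correlation_id` to echo the request's message id are assumptions.

```python
import json
import uuid
from typing import Dict, Tuple

Envelope = Tuple[Dict[str, str], bytes]  # (AMQP message properties, body)


def wrap_request(method: str, path: str, headers: Dict[str, str], body: str, reply_to: str) -> Envelope:
    """Serialize the entire HTTP request to JSON; properties carry only routing info."""
    properties = {"message_id": str(uuid.uuid4()), "reply_to": reply_to}
    payload = {"method": method, "path": path, "headers": headers, "body": body}
    return properties, json.dumps(payload).encode("utf-8")


def wrap_response(request_properties: Dict[str, str], status: int,
                  headers: Dict[str, str], body: str) -> Envelope:
    """Address the response to the request's reply_to queue, tagged with its message id."""
    properties = {
        "routing_key": request_properties["reply_to"],
        "correlation_id": request_properties["message_id"],  # assumption: echo the id here
    }
    payload = {"status": status, "headers": headers, "body": body}
    return properties, json.dumps(payload).encode("utf-8")
```

Keeping the HTTP semantics inside the payload lets the same Rack-style handler serve both plain HTTP and AMQP transports.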
When a resource is created, the cloud API should return its unique resource ID, which prevents race conditions between the creation of the resource and its discovery by polling. To handle requests efficiently, the GW expects each resource and its state to be well defined within the specific cloud and attributed to an account.
For shared and published resources, the GW needs to know the exact owner so that it can determine which operations are allowed. The data store contains a single "global" record for each published resource, but per-account records for each individually shared resource. Furthermore, the GW API Handler supports both strong authentication on every request and encrypted communication at the protocol (HTTP or AMQP) level. While encryption is preferred, cloud APIs may use unencrypted communication if the security risks are well controlled.
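The record layout for published and shared resources can be sketched as a keying rule. The visibility constants and `record_key` are hypothetical; the point is only that a published resource maps to one global record, while a shared resource yields one record per account.

```python
from typing import Optional, Tuple

# Hypothetical visibility values; the text distinguishes published and shared resources.
PUBLISHED, SHARED, PRIVATE = "published", "shared", "private"


def record_key(resource_id: str, visibility: str, account: str) -> Tuple[str, Optional[str]]:
    """Published: one global record (account None). Otherwise: one record per account."""
    if visibility == PUBLISHED:
        return (resource_id, None)
    return (resource_id, account)
```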
The GW assumes the cloud is synchronized to world time and rejects communication with API endpoints whose clocks are too far out of sync; the current threshold is 5 minutes.
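A sketch of the clock check, assuming the cloud endpoint's time is taken from the HTTP `Date` header of its responses; `check_clock_skew` is a hypothetical helper built on the standard library's `email.utils.parsedate_to_datetime`.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MAX_SKEW = timedelta(minutes=5)  # threshold stated in the text


def check_clock_skew(date_header: str, now: Optional[datetime] = None) -> timedelta:
    """Reject an endpoint whose Date header is more than MAX_SKEW from local UTC time."""
    remote = parsedate_to_datetime(date_header)
    if remote.tzinfo is None:  # '-0000' dates parse as naive; treat them as UTC
        remote = remote.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    skew = abs(now - remote)
    if skew > MAX_SKEW:  # exactly 5 minutes is still accepted
        raise ValueError(f"endpoint clock is off by {skew}, limit is {MAX_SKEW}")
    return skew
```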