
Limited conductor API


This wiki page was created specifically for the design summit session at http://junodesignsummit.sched.org/event/2cc6b4e6c2569394855f2699f8f078b7#.U2J4ufldWrQ . It is not intended to be a blueprint, although it can be a base for a future BP.

Unprivileged compute node is a big topic; this page focuses mostly on DB access.

Scenario and Issue

Currently a compute node has unlimited DB access through the conductor, either directly via the conductor API or through NovaObject. With this privilege, a compute node can modify any DB table, e.g. delete an instance hosted by another compute node, change a service's status, etc.

This is a security issue if the compute node service runs in the same environment as the server, considering that an escaped server can control the whole host environment.

Goal

The goal is to enhance the conductor, or NovaObject, to enforce access control for compute nodes: a compute node should only be able to access the relevant DB tables/rows.

For example, a compute node should only be able to update its single corresponding row in the compute_node table, or update an instance table row only if the instance is hosted on that compute node.
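As a very rough illustration only (not actual Nova/conductor code; the db.update() helper and the exact table/column names are assumptions made for this example), the intended restriction amounts to scoping every write by the caller's host:

 # Illustrative sketch only, not actual Nova/conductor code.  The
 # db.update() helper and the table/column names are assumptions.
 
 def update_compute_node(db, caller_host, values):
     # A compute node may only touch the row describing itself.
     return db.update('compute_node',
                      where={'host': caller_host},
                      values=values)
 
 def update_instance(db, caller_host, instance_uuid, values):
     # An instance row may only be updated if the instance is hosted
     # on the calling compute node.
     return db.update('instance',
                      where={'uuid': instance_uuid, 'host': caller_host},
                      values=values)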


Design

To enforce access control on compute nodes, the conductor must be able to identify and authenticate the caller, and grant authorization based on the caller information.

Identification

Currently some identification information is provided in the API context, like user_id and tenant_id, when an operation is invoked on behalf of a user request. For actions initiated by a Nova service itself, e.g. through a periodic task, an admin context is used, which only specifies 'is_admin' as True and carries no user_id/tenant_id information.
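For reference, a simplified sketch of the two kinds of context described above (the fields are trimmed down and represented as a plain dict; the real Nova request context carries more information than shown):

 # Simplified sketch of the two context flavours mentioned above; the
 # real Nova request context carries more fields than shown here.
 
 # Context built from a user request: it identifies the initiator.
 user_ctx = {'user_id': 'some-user', 'tenant_id': 'some-tenant',
             'is_admin': False}
 
 # Admin context used by periodic tasks: it only says "is_admin",
 # with no user_id/tenant_id to identify who is really calling.
 admin_ctx = {'user_id': None, 'tenant_id': None, 'is_admin': True}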

The trusted messaging proposal adds the RPC caller information to the message metadata as topic/hostname; however, we have to consider:

  • Passing Identification to rpc Layer

In the current implementation, such information is kept in the message envelope and is dropped after message decoding and verification, so it is not passed to the RPC layer.

Thus we suggest extending the RPC context to keep track of this information, e.g. by adding an 'oslo-rpc-caller' field to the context (see the sketch after this list).

  • Node/Host Identification

In the current implementation, the message is identified by topic/host. It is not clear whether we also need to consider the node: at least in the current implementation, one compute service (i.e. one host) can have multiple nodes. Without distinguishing node from host, all nodes behind the same host would have the same privilege. This is possibly bad for vCenter.

  • What's the information in the Identification

In the current implementation, the identification includes the topic and host name. We need to check whether, in some situations, we need more information like the node id, to avoid extra DB access.
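Putting the three points above together, a minimal sketch of what the proposed 'oslo-rpc-caller' field could carry is shown below. The field name and layout follow this page's suggestion only; attach_caller() and its arguments are hypothetical, not an existing oslo API:

 # Sketch of the proposed 'oslo-rpc-caller' context field.  The field
 # name and layout follow this page's suggestion, not any existing
 # oslo API; attach_caller() and its arguments are hypothetical.
 
 def attach_caller(context, metadata):
     """Copy the verified caller identity from the message metadata
     into the RPC context instead of dropping it after decoding."""
     context['oslo-rpc-caller'] = {
         'topic': metadata.get('topic'),   # e.g. 'compute'
         'host': metadata.get('host'),     # the sending service's host
         # Open question above: also record the node, so that multiple
         # nodes behind one host (e.g. vCenter) can be told apart.
         'node': metadata.get('node'),
     }
     return context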

Authentication

I think we can depend on trusted messaging for the authentication.

Authorization

We need a mechanism to define the privilege boundary for the compute node. A request can be expressed as a tuple (initiator, caller, objects, operation), i.e. a compute node, as the caller, delegates for an initiator, like a tenant/user, to perform some operation on some objects. When the context is an admin context, the initiator is usually the caller itself (is this correct? will an admin context be passed among services?).
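A minimal sketch of this request model is below; all names (Request, build_request()) are illustrative, not existing Nova interfaces, and the admin-context handling encodes the open question above rather than answering it:

 # Sketch of the request model described above; all names here are
 # illustrative, not existing Nova interfaces.  The context is treated
 # as a plain dict, as in the earlier sketches.
 
 from collections import namedtuple
 
 # A caller (e.g. a compute node) delegates for an initiator
 # (tenant/user) to perform an operation on some objects.
 Request = namedtuple('Request', ['initiator', 'caller', 'objects', 'operation'])
 
 def build_request(context, objects, operation):
     caller = context.get('oslo-rpc-caller')  # from the identification step
     if context.get('is_admin') and not context.get('user_id'):
         # Admin context (e.g. periodic task): treat the caller itself
         # as the initiator, pending the open question above.
         initiator = caller
     else:
         initiator = (context.get('user_id'), context.get('tenant_id'))
     return Request(initiator, caller, objects, operation)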

  1. Whitelist-based authorization:

Unless explicitly whitelisted, DB access, i.e. an object remote operation, is disallowed if the caller is a compute node.

  2. Implementation:

  • Which layer should the check live in? The DB layer, or the conductor API?
  • The objects that can be handled by compute nodes should be very limited: instance/migration/compute_node/service.
  • Should there be a central policy, or a per-object policy? Should the check be function-based, or just parameter-based?
  • How about the conductor API? Will compute nodes be able to access the conductor API, or only the object interface? If only the object interface, how do we prevent API calls from compute nodes?
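To make these questions more concrete, below is one possible shape of a central, whitelist-style policy for the limited object set, combined with a parameter-based check for instances. This is only a sketch under the assumptions of this page (the listed object names and operations are examples, not a proposal; caller and obj are treated as plain dicts):

 # One possible shape for a central, whitelist-style policy (sketch
 # only; the object names and operations listed are examples, not a
 # proposal).  Anything not listed is rejected for compute callers.
 
 COMPUTE_WHITELIST = {
     'Instance':    {'save', 'refresh'},
     'Migration':   {'create', 'save'},
     'ComputeNode': {'create', 'save'},
     'Service':     {'save'},
 }
 
 def check_compute_call(caller, obj_name, operation, obj=None):
     # Function-based check: is this operation whitelisted at all?
     if operation not in COMPUTE_WHITELIST.get(obj_name, set()):
         return False
     # Parameter-based check: an instance may only be touched by the
     # compute node that currently hosts it.
     if obj_name == 'Instance' and obj is not None:
         return obj.get('host') == caller.get('host')
     return True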


How about the scheduler DB?