Limited conductor API

This wiki page was created for the design summit session at http://junodesignsummit.sched.org/event/2cc6b4e6c2569394855f2699f8f078b7#.U2J4ufldWrQ . It is intended as a basis for discussion, although it can be developed into a blueprint (BP) in the future.

The unprivileged compute node is a big topic; this page focuses mostly on DB access.

Scenario and Issue

Currently a compute node has unlimited access to the DB through the conductor, either directly using the conductor API or through NovaObject. With this privilege, a compute node can modify any DB table: for example, it can delete any instance hosted by another compute node, change the service status, etc.

This is a security issue if the compute node service runs in the same environment as the server, since a server that has escaped its isolation can control the whole host environment.

Goal

The goal is to enhance the conductor with access control for compute nodes, so that a compute node can only access the relevant DB tables/rows.

For example, a compute node can only update its single corresponding row in the compute_node table, and can only update a row in the instance table if the instance is hosted on that compute node.

Reference

Trusted messaging (https://wiki.openstack.org/wiki/MessageSecurity ) is a requirement for this discussion. https://blueprints.launchpad.net/oslo.messaging/+spec/trusted-messaging is the corresponding BP and https://review.openstack.org/#/c/37913/ gives an implementation.

Design

To achieve access control for the compute node, the conductor should be able to identify and authenticate the caller, and grant authorization based on the caller information.

Identification

Currently some identification information is provided in the API context, like user_id and tenant_id, when an operation is invoked based on a user request. For actions initiated by a nova service, such as through a periodic task, an admin context is used, which only specifies 'is_admin' as True and carries no user_id/tenant_id information.

The trusted messaging proposal adds the RPC caller information to the message metadata as topic/hostname. However, we have to consider the following:

Passing Identification to the RPC Layer

In the current implementation, such information is kept in the message envelope and is dropped after message decoding and verification, so it is not passed to the RPC layer.

We suggest extending the RPC context to keep track of this information, by adding an 'oslo-rpc-caller' field to the context.
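A minimal sketch of how the verified caller could be carried into the context, assuming the context is a plain dict and the envelope metadata has already been decoded and verified by the messaging layer. The function name attach_caller and the metadata keys "topic" and "host" are hypothetical and are not part of oslo.messaging; only the 'oslo-rpc-caller' field name comes from the suggestion above.

```python
CALLER_KEY = "oslo-rpc-caller"


def attach_caller(context, verified_metadata):
    """Return a copy of the context dict with the verified caller attached.

    `verified_metadata` is assumed to come from the message envelope after
    signature verification (hypothetical keys: "topic" and "host").
    """
    new_context = dict(context)
    # Always overwrite the field: a value already present in the context was
    # supplied by the sender and must not be trusted.
    new_context[CALLER_KEY] = {
        "topic": verified_metadata["topic"],
        "host": verified_metadata["host"],
    }
    return new_context
```

The design point is that the field is filled in on the receiving side from verified data only, so a compute node cannot claim another identity by setting the field itself.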

Node/Host Identification

In the current implementation, the message is identified by topic/host. It is not clear whether we need to consider the node as well. At least in the current implementation, one compute service (i.e. one host) can have multiple nodes. Without distinguishing between node and host, all nodes behind the same host have the same privilege. This is possibly bad for vCenter.

What Information Is in the Identification

In the current implementation, the identification includes the topic and host name. We need to check whether some situations require more information, like the node id, to avoid extra DB access.
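To make the node/host question concrete, here is a small sketch of an identification that optionally carries a node. The CallerId tuple and the same_scope helper are hypothetical names used only for illustration; the current identification has just topic and host.

```python
from collections import namedtuple

# node is None when the messaging layer only knows topic and host.
CallerId = namedtuple("CallerId", ["topic", "host", "node"])


def same_scope(caller, target_host, target_node):
    """Return True if the target row falls within the caller's scope."""
    if caller.host != target_host:
        return False
    if caller.node is None:
        # Host-only identification: every node behind this host is in scope.
        return True
    return caller.node == target_node
```

With a host-only CallerId, same_scope accepts every node behind the host, which is the shared-privilege situation described above. Carrying the node in the identification lets the check be done without an extra DB lookup.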

Authentication

I think we can depend on trusted messaging for authentication.

Authorization

We need a mechanism to define the privilege boundary for the compute node. I think a request can be expressed as (initiator, caller, objects, operation), i.e. a compute node, as the caller, acts as a delegate for an initiator, like a tenant/user, to perform some operation on some objects, like updating an instance status. A special case arises in situations like a periodic task, where a compute node initiates a DB access using get_admin_context(), with the initiator not identified and is_admin set to True.
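The (initiator, caller, objects, operation) tuple could be modelled roughly as follows. This is only a sketch: Request, build_request and is_service_initiated are hypothetical names, and the context is assumed to be a plain dict with the user_id/tenant_id/is_admin keys mentioned earlier.

```python
from collections import namedtuple

Request = namedtuple("Request", ["initiator", "caller", "objects", "operation"])


def build_request(context, caller, objects, operation):
    """Build a Request from a context dict and the verified caller."""
    user_id = context.get("user_id")
    tenant_id = context.get("tenant_id")
    # An admin context from a periodic task carries neither id, so the
    # initiator is left unidentified (None).
    initiator = (tenant_id, user_id) if (user_id or tenant_id) else None
    return Request(initiator, caller, tuple(objects), operation)


def is_service_initiated(request):
    """Return True for the special case with no identified initiator."""
    return request.initiator is None
```

Keeping the initiator and the caller as separate fields lets the authorization rule treat "a user asked for this through node X" differently from "node X asked for this on its own".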

Whitelist-based authorization:

Unless explicitly allowed, DB access, i.e. a remote object operation, is disallowed by default if the caller is a compute node.
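A default-deny whitelist for compute callers might look like the sketch below. The entries in COMPUTE_WHITELIST are illustrative placeholders, not a proposed policy, and is_allowed is a hypothetical helper.

```python
# Illustrative (object name, method) pairs a compute caller may invoke.
COMPUTE_WHITELIST = frozenset([
    ("Instance", "save"),
    ("Instance", "refresh"),
    ("ComputeNode", "save"),
    ("Migration", "save"),
])


def is_allowed(caller_topic, objname, method):
    """Return True if the remote object operation may proceed."""
    if caller_topic != "compute":
        # This sketch restricts compute callers only.
        return True
    # Anything not listed is denied by default.
    return (objname, method) in COMPUTE_WHITELIST
```

For example, is_allowed("compute", "Flavor", "destroy") returns False because the pair is not listed, while the same call from another topic is not restricted by this sketch.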

Implementation:

Which layer should this happen in? The authorization can happen in either the DB layer or the conductor layer. Personally I prefer the conductor layer. If the object changes are finished, I think we only need to do it in object_action() in the conductor manager.
The allowed operations. IMHO, a compute node should have limited access to objects: only instances hosted by the node, migrations in which the node is involved, or its own compute_node/service records.
How about the conductor API? Will the compute node be able to access the conductor API, or can it only go through the object interface? If only through the object interface, how do we prevent conductor API calls from the compute node?
If the scheduler is split out, access to the scheduler DB will also be limited.
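The ownership rules listed above could be checked at the conductor layer roughly as follows. This is a sketch under stated assumptions: check_object_action is a hypothetical hook meant to run at the start of object_action(), objects are represented as plain dicts of fields, the caller is a dict with "topic" and "host", and the field names "host", "source_compute" and "dest_compute" are assumed for illustration.

```python
class Forbidden(Exception):
    """Raised when a compute caller touches a row it does not own."""


def _is_owned(caller_host, objname, fields):
    """Return True if the row described by `fields` belongs to caller_host."""
    if objname in ("Instance", "ComputeNode", "Service"):
        return fields.get("host") == caller_host
    if objname == "Migration":
        # A migration involves two hosts; either side may update it.
        return caller_host in (fields.get("source_compute"),
                               fields.get("dest_compute"))
    # Unknown object types are denied by default.
    return False


def check_object_action(caller, objname, fields):
    """Hypothetical hook: raise Forbidden for out-of-scope compute callers."""
    if caller is None or caller.get("topic") != "compute":
        return  # this sketch restricts compute callers only
    host = caller.get("host")
    if not host or not _is_owned(host, objname, fields):
        raise Forbidden("%s may not modify this %s" % (host, objname))
```

Doing the check in one place at the conductor layer keeps the DB layer unchanged, at the cost of leaving direct conductor API calls uncovered, which is the open question raised above.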