
Ceph Native Driver

Related blueprint: https://blueprints.launchpad.net/manila/+spec/cephfs-driver
Work-in-progress code: https://github.com/jcsp/manila/tree/ceph
Mailing list discussion: https://openstack.nimeyo.com/59997/openstack-dev-manila-cephfs-native-driver

In this context, native means that guests will access the shares using the Ceph network protocol directly, rather than through a gateway such as NFS or CIFS. This driver would be the basis for any subsequent driver that enables NFS access to CephFS.

Protocol and authentication

CephFS has its own network protocol. To access shares, guests must use one of the CephFS clients (either the FUSE client or the kernel client), which means the correct packages must be installed on guests. It also requires an extension to Manila to handle the new protocol -- this is similar to the change made for the GlusterFS native driver.

CephFS has its own authentication system. See https://wiki.openstack.org/wiki/Manila/design/manila-auth-access-keys for the proposal to expose this in a friendly way via Manila. The driver does not use IP addresses or X.509 certificates: access is granted to user IDs, which are implicitly created in Ceph by the driver. When access is granted to "alice", that user is created if it does not already exist.

In the absence of the mechanism described in the auth-access-keys blueprint, the CephFS driver can still work with a less elegant workflow, in which users must use the Ceph CLI in addition to the Manila API to handle their authentication needs. Because the auth-access-keys implementation relies on the driver returning a key from allow_access() (and the existing API ignores the return value), the CephFS driver can return its key from allow_access() regardless; the key will simply be ignored if the auth-access-keys code is not present.
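
As a rough sketch of that second point, an allow_access() along the following lines could always return the key. The class and the _get_or_create_cephx_user() helper are hypothetical names used only to illustrate the shape of the code, not the actual implementation:

    from manila import exception  # assumed import path in the Manila tree

    class CephFSNativeDriverSketch(object):
        """Illustrative sketch only, not the actual driver class."""

        def allow_access(self, context, share, access, share_server=None):
            # Only the new "cephx" access type makes sense for a native
            # CephFS share.
            if access['access_type'] != 'cephx':
                raise exception.InvalidShareAccess(
                    reason="only 'cephx' access type is supported")

            ceph_auth_id = access['access_to']

            # Hypothetical helper: create the Ceph user if it does not
            # already exist and authorize it for this share's directory,
            # returning the user's secret key.
            auth_key = self._get_or_create_cephx_user(share, ceph_auth_id)

            # With auth-access-keys in place, Manila can surface this key to
            # the API consumer; without it, the return value is ignored.
            return auth_key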

Features

The Ceph driver will initially implement snapshots and consistency groups. It will not implement replication.

Out-of-driver changes

New access_type "cephx" allowed in the API code that validates access control requests.

New protocol "CEPHFS" added to SUPPORTED_SHARE_PROTOCOLS.
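
As an illustrative sketch of these two changes (the file location and the existing entries in each list are assumptions, shown only to make the shape of the change concrete):

    # manila/common/constants.py (illustrative; existing entries may differ)
    SUPPORTED_SHARE_PROTOCOLS = (
        'NFS', 'CIFS', 'GLUSTERFS', 'HDFS',
        'CEPHFS',  # new: native CephFS protocol
    )

    # Access-type validation in the API layer (illustrative names)
    SUPPORTED_ACCESS_TYPES = ('ip', 'user', 'cert', 'cephx')

    def validate_access_type(access_type):
        if access_type not in SUPPORTED_ACCESS_TYPES:
            raise ValueError("Unsupported access type: %s" % access_type)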

Driver Implementation details

A new Python interface to Ceph has been created for Manila and Manila-like systems, called ceph_volume_client. This will be in Ceph master soon (as of this writing, December 3 2015), and will be included in the Jewel stable release of Ceph in spring 2016. This driver will require a Ceph cluster version >= Jewel, and will not work at all with any older versions of Ceph.

The majority of the implementation is inside Ceph, such that the Manila driver is a fairly lightweight wrapper only a few hundred lines long.
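
A minimal sketch of such a wrapper, assuming the ceph_volume_client interface roughly as it appears in the work-in-progress code (CephFSVolumeClient with connect()/create_volume()/authorize()/delete_volume(), and a VolumePath naming a share's directory); the exact constructor arguments and signatures are assumptions:

    # Sketch only: names and signatures are assumptions based on the
    # work-in-progress ceph_volume_client module, not a definitive API.
    from ceph_volume_client import CephFSVolumeClient, VolumePath

    class CephFSShareHelperSketch(object):
        def __init__(self, auth_id, conf_path):
            # Connect to the Ceph cluster as a privileged Manila identity.
            self._vc = CephFSVolumeClient(auth_id=auth_id, conf_path=conf_path)
            self._vc.connect()

        def create_share(self, share_id, size_bytes):
            # A Manila share maps to a CephFS directory ("volume") whose size
            # is enforced with a CephFS quota. data_isolated=True would give
            # the share its own data pool instead of the shared default pool.
            path = VolumePath(None, share_id)
            self._vc.create_volume(path, size=size_bytes)
            return path

        def allow_access(self, share_id, ceph_auth_id):
            # authorize() implicitly creates the cephx user if needed,
            # restricts it to the share's directory, and returns its key.
            result = self._vc.authorize(VolumePath(None, share_id), ceph_auth_id)
            return result['auth_key']

        def delete_share(self, share_id):
            path = VolumePath(None, share_id)
            self._vc.delete_volume(path)   # move the directory to trash
            self._vc.purge_volume(path)    # remove the data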

Internally to Ceph, Manila shares are just directories, whose size constraints are implemented using quotas. These look like separate filesystems to clients, because each client mounts the named subdirectory for its share. Clients are restricted to their subdirectory by Ceph authorization capabilities: any client with a Manila-issued auth identity that tries to mount a directory other than a properly configured share will be rejected on the server side.
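
For illustration only, the capabilities attached to a Manila-issued identity might look roughly like the following keyring entry (the exact path layout and pool names depend on the deployment and on how ceph_volume_client arranges share directories; the angle-bracketed values are placeholders):

    [client.alice]
        caps mon = "allow r"
        caps mds = "allow rw path=/volumes/<share-directory>"
        caps osd = "allow rw pool=<cephfs-data-pool>"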

Security and caveats

Because the driver may feed back access keys using the auth-access-keys mechanism, it must be careful to prevent scenarios like:

* A Manila API consumer requests access for the "admin" user and receives the Ceph admin key in response.
* Tenant A authorises "alice" to access Share 1, and Tenant B also authorises "alice" to access Share 2; because both grants apply to the same Ceph identity, Tenant B receives a key that also gives access to Share 1.

This will be handled inside the driver.

By default, multiple shares will share the same Ceph data pool. This relies on well-behaved clients not poking around in other shares' data. The medium-term solution will be to implement division of shares by RADOS namespaces. The short-term solution is to offer an optional "data_isolated" flag on share types: this creates a physically separate data pool for each share, at the cost of additional system resources.

Because share sizes are set using quotas, and Ceph quotas are currently enforced client-side, we rely on well-behaved clients not to write more data than they are permitted to. Currently only the ceph-fuse client respects quotas: users mounting their Manila share with the kernel client will find that they can write as much data as they like. Changes in Ceph will be necessary to resolve this.
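
For background, a CephFS quota is just an extended attribute (ceph.quota.max_bytes) on the share's directory, which a cooperating client reads and enforces on its own. A rough illustration, assuming the share is mounted at a hypothetical path and the caller has permission to set the attribute:

    import os

    # Hypothetical mount point of a CephFS share on a client host.
    share_dir = '/mnt/manila-share'

    # The share size lives as a quota xattr on the directory. ceph-fuse
    # reads this and refuses writes beyond it; the kernel client currently
    # ignores it, which is the caveat described above.
    os.setxattr(share_dir, 'ceph.quota.max_bytes', b'1073741824')  # 1 GiB

    print(os.getxattr(share_dir, 'ceph.quota.max_bytes'))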