Puppet/ceph-blueprint

Overview
This document is intended to capture requirements for a single puppet-ceph module.

Very much like vswitch, Ceph is not used exclusively in the context of OpenStack. There is however a significant community relying on puppet to deploy Ceph in the context of OpenStack, as is shown by the number of existing efforts. Having a puppet-ceph module under the umbrella of the stackforge infrastructure helps federate these efforts while providing a workflow that will improve the overall quality of the module.


 * gerrit review page
 * puppet-ceph at launchpad

Roadmap
Almost every component of this module deserves a discussion, and it would take a long time to agree on everything before getting something useful. The following list sets the order in which each component is going to be implemented. Each step must be a usable puppet module, unit tested and including integration tests.


 * conf ( OK )
 * key
 * mon ( OK )
 * osd ( OK )
 * pool
 * rbd

I want to try this module, heard of ceph, want to see it in action
I want to run it on my own laptop, all in one. The ceph::conf class will create a configuration file with no authentication enabled, on my localhost. The ceph::mon resource configures and runs a monitor to which two ceph::osd daemons will connect to provide disk storage, using directories in /srv on the laptop.

/node/ {
  class { 'ceph::conf':
    auth_enable => false,
    mon_host    => 'localhost',
  }
  ceph::mon { $hostname: }
  ceph::osd { '/srv/osd1': }
  ceph::osd { '/srv/osd2': }
}


 * install puppet,
 * paste this in site.pp and replace /node/ with the name of your current node,
 * puppet apply site.pp,
 * run ceph -s: it will connect to the monitor and report that the cluster is HEALTH_OK

I want to run benchmarks on three new machines

 * There are four machines: node1 runs a MON and an OSD, node2 and node3 run OSDs, and the fourth machine is the client from which the user runs commands.
 * install puppetmaster and create site.pp with:

/ceph-default/ {
  class { 'ceph::conf':
    auth_enable => false,
    mon_host    => 'node1',
  }
}

/node1/ inherits ceph-default {
  ceph::mon { $hostname: }
  ceph::osd { 'discover': }
}

/node2/, /node3/ inherits ceph-default {
  ceph::osd { 'discover': }
}

/client/ inherits ceph-default {
  class { 'ceph::client': }
}


 * ssh to the client machine
 * run rados bench
 * interpret the results

I want to operate a production cluster
$admin_key = 'AQCTg71RsNIHORAAW+O6FCMZWBjmVfMIPk3MhQ=='
$mon_key = 'AQDesGZSsC7KJBAAw+W/Z4eGSQGAIbxWjxjvfw=='
$bootstrap_osd_key = 'AQABsWZSgEDmJhAAkAGSOOAJwrMHrM5Pz5On1A=='

/ceph-default/ {
  class { 'ceph::conf':
    mon_host => 'mon1.a.tld,mon2.a.tld,mon3.a.tld',
  }
}

/mon[123]/ inherits ceph-default {
  ceph::mon { $hostname:
    key => $mon_key,
  }
  ceph::key { 'client.admin':
    secret   => $admin_key,
    caps_mon => '*',
    caps_osd => '*',
    inject   => true,
  }
  ceph::key { 'client.bootstrap-osd':
    secret   => $bootstrap_osd_key,
    caps_mon => 'profile bootstrap-osd',
    inject   => true,
  }
}

/osd*/ inherits ceph-default {
  ceph::osd { 'discover': }
  ceph::key { 'client.bootstrap-osd':
    keyring => '/var/lib/ceph/bootstrap-osd/ceph.keyring',
    secret  => $bootstrap_osd_key,
  }
}

/client/ inherits ceph-default {
  ceph::key { 'client.admin':
    keyring => '/etc/ceph/ceph.client.admin.keyring',
    secret  => $admin_key,
  }
  class { 'ceph::client': }
}


 * the osd* nodes only contain disks that are used for OSDs; using the discover option to automatically provision new disks as part of the cluster is acceptable because there is no risk of destroying unrelated data.
 * when hardware is decommissioned, all its disks can be placed in another machine and the OSDs will automatically be re-inserted into the cluster, even if an external journal is used.

I want to spawn a cluster configured with a puppetmaster as part of a continuous integration effort
Leveraging vagrant, vagrant-openstack and OpenStack:
 * Ceph is used as a backend storage for various use cases
 * There are tests to make sure the Ceph cluster was instantiated properly
 * There are tests to make sure various other infrastructure components (or products) can use the Ceph cluster

No complex cross host orchestration
All cross host orchestration should be assumed to be managed outside of Puppet. Provided that its dependencies have already been configured and are known, each component should support being added without having to run Puppet on more than one node.

For example


 * cinder-volume instances should be configured to join a Ceph cluster simply by running Puppet on that node
 * OSD instances should be configured to join a cluster simply by running puppet agent on a node and targeting that role.

The Puppet implementation should only be concerned with


 * what components need to be defined (where these are implemented as classes)
 * what data is required for those components (where that data is passed in as class parameters)

Supporting versions
The supported Operating System versions must be covered by integration tests on the actual operating system. Although it is fairly easy to add support for an Operating System, it is prone to regressions if not tested. The per Operating System support strategy mimics the way the OpenStack modules do it.

The supported versions of the components that deal with the environment in which Ceph is used ( OpenStack, Cloudstack, Ganeti etc. ) are handled by each component on a case by case basis. There probably is too much heterogeneity to set a rule.

Provide sensible defaults
If the high level components ( osd + mon + mds + rgw for instance ) are included without any parameter, the result must be a functional Ceph cluster.

Architected to leverage Ceph to its full potential
This means talking to the MON when configuring or modifying the cluster, using ceph-disk as a low level tool to create the storage required for an OSD, and creating a minimal /etc/ceph/ceph.conf to allow a client to connect to the Ceph cluster. The MON exposes a very rich API ( either via the ceph cli or a REST API ) and it offers great flexibility to the system administrator. It is unlikely that the first versions of the puppet module will capture all of it, but it should be architected to allow the casual contributor to add a new feature or a new variation without the need to work around architectural limitations.

The ceph-deploy utility is developed as part of the ceph project to help people get up to speed as quickly as possible for tests and POCs. Alfredo Deza made a compelling argument against using ceph-deploy as a helper for a puppet module: it is designed to hide some of the flexibility Ceph offers for the sake of simplicity, an inconvenience that is incompatible with the goal of a puppet module designed to accommodate all use cases.

Keeping keys / secrets out of Puppet
Some environments, like ours at CERN, use shared puppet masters between different services/teams. It therefore must be possible to prevent all keys from appearing in the puppet db anywhere. This means that we shouldn't be required to add an admin key or any other key to boot up a cluster, and those keys should not be exported within puppet or published by any fact on the machines to be shared among hosts.

We accomplish this using a Kerberos (k5) authenticated scp Exec to copy in the admin keyrings, etc.

define k5remotefile ($source, $keytab = '/etc/krb5.keytab', $principal = "host/${::hostname}.foo.bar@FOO.BAR") {
  $cmd = "/usr/bin/kinit -k -t ${keytab} ${principal} && /usr/bin/scp -p -o StrictHostKeyChecking=no ${source} ${name} && /usr/bin/kdestroy"
  exec { $cmd:
    creates => $name,
  }
  file { $name:
    ensure  => file,
    replace => false,
    require => Exec[$cmd],
  }
}

Prefer cli over REST
The ceph cli is preferred because the rest-api requires the installation of an additional daemon.

Module versioning
Create a branch for each Ceph release ( stable/cuttlefish, stable/dumpling etc. ) and follow the same pattern as the OpenStack modules

Support Ceph versions from cuttlefish
Do not support Ceph versions released before cuttlefish

Support scenario based deployment
Support scenario based deployments. When a resource is defined, a corresponding class is declared to wrap it, relying on create_resources to call a list of resources. Such a wrapper must be kept lightweight as it will eventually be unnecessary. Example:

# file manifests/osd.pp
class ceph::osd($instance_hash) {
  create_resources('ceph::osd::instance', $instance_hash)
}

# file manifests/osd/instance.pp
define ceph::osd::instance($param1, ...) {
  # logic goes here
}
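The wrapper could then be invoked with a hash of instances, for example ( a hypothetical sketch; the instance names are placeholders ):

```puppet
# each key of instance_hash becomes one ceph::osd::instance resource
class { 'ceph::osd':
  instance_hash => {
    '/srv/osd1' => {},
    '/srv/osd2' => {},
  },
}
```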

Integration tests
All scenarios can probably be covered with 2 virtual machines, 2 interfaces and one disk attached to one of the machines. A number of scenarios can be based on a single machine, using directories instead of disks and a single interface.

 * use https://github.com/puppetlabs/rspec-system-puppet and check that it can be used with the vagrant openstack backend https://github.com/cloudbau/vagrant-openstack-plugin
 * use openstack by running a script like this with a dedicated tenant to prevent breakage ( see http://ci.openstack.org/third_party.html )

export OS_PASSWORD=admin_pass
export OS_AUTH_URL=http://127.0.0.1:5000/v2.0/
export OS_USERNAME=admin
export OS_TENANT_NAME=openstack

ssh -p 29418 review.example.com gerrit stream-events | while read event ; do
  if event is commit ; then
    git clone puppet-ceph from gerrit
    cd puppet-ceph
    bundle exec rake spec:system # https://github.com/puppetlabs/rspec-system-puppet#run-spec-tests
    if fail ; then
      ssh -p 29418 review.example.com \
        gerrit review -m '"Test failed"' --verified=-1 c0ff33
    fi
  fi
done

Puppet user components
This section outlines the roles as well as the configuration components that are visible to the puppet user. They must be understandable by a system administrator willing to deploy Ceph for the first time.

conf
A class wrapper around the ceph_config provider ( derived from ini_setting ). The benefit of having a wrapper is that it enables injection of parameters. ceph::conf does not provide any defaults: it relies on the defaults provided by ceph itself and the user is expected to use the ceph configuration documentation as a reference.

Although the key separator can be either a space or an underscore, only underscore is allowed, to help with consistency.
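As a sketch of how the interface described below could be used ( the osd hash is a hypothetical example of injecting a section into ceph_config ):

```puppet
# minimal sketch, assuming the proposed ceph::conf interface
class { 'ceph::conf':
  auth_enable => true,
  mon_host    => 'mon1.a.tld,mon2.a.tld,mon3.a.tld',
  osd         => { 'osd_journal_size' => '1024' },
}
```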


 * proposed name: ceph::conf
 * purpose: keeps and writes configuration options for the top level sections of the ceph config. This includes these sections:
 * [global]
 * [mon]
 * [osd]
 * [mds]
 * interface: key / value is passed directly to ceph_config. If the argument is a hash, it is injected into ceph_config.
 * auth_enable - true or false, enables/disables cephx, defaults to true ( this is implemented in ceph_config )
 * If auth_enable is true, set the following in the [global] section of the conf file:

auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
auth_supported = cephx


 * If auth_enable is false, set the following in the [global] section of the conf file:

auth_cluster_required = none
auth_service_required = none
auth_client_required = none
auth_supported = none


 * It should support disabling or enabling cephx when the values change. If it does not support updating, it must fail when changed on an existing Ceph cluster.

Using an inifile child provider ( such as cinder_config ), a setting would look like

ceph_config { 'global/fsid': value => $fsid }

And create /etc/ceph/ceph.conf such as:

[global]
fsid = 918340183294812038

Improvements to be implemented later:
 * If a key/value pair is modified in the *mon*, *osd* or *mds* sections, all daemons are notified of the change with ceph tell {daemon}.* ....

osd

 * proposed name: ceph::osd
 * purpose: configures a ceph OSD using the ceph-disk helper and updates the /etc/ceph/ceph.conf file with [osd.X] sections matching the osds found in /var/lib/ceph/osd
 * interface:
 * directory/disk - a disk or a directory to be used as a storage for the OSD.
 * bootstrap-osd - the bootstrap-osd secret key (optional if cephx = none )
 * dmcrypt - options needed to encrypt disks (optional)

The generated [osd.X] section must contain the host and disk so that the rc script runs the osd daemon at boot time.

If the directory/disk is set to discover, ceph-disk list is used to find unknown disks or partitions. All unknown disks are prepared with ceph-disk prepare. That effectively allows someone to say: use whatever disks are not in use for ceph and leave the rest alone. An operator would only have to add a new disk and wait for the next puppet agent pass to have it integrated in the cluster. If a disk is removed, the OSD is not launched at boot time and there is nothing to do.
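For instance, both modes could be declared as follows ( a sketch assuming the proposed interface; /dev/sdb is a placeholder ):

```puppet
# hypothetical: consume every disk not already in use by ceph
ceph::osd { 'discover': }
# or target an explicit disk ( /dev/sdb is a placeholder ) or a directory
# ceph::osd { '/dev/sdb': }
# ceph::osd { '/srv/osd1': }
```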

Support ceph-disk suppress

Here is what should happen on a node with at least one OSD:

 * common to all OSDs on the same node:
 * the /etc/ceph/ceph.conf file is set up with the IPs of the monitors
 * the /var/lib/ceph/bootstrap-osd/{cluster}.bootstrap-osd.keyring file contains a user/key that is used to create an OSD. The bootstrap-osd user key is usually the same for all OSDs. For instance:

[client.bootstrap-osd]
    key = AQCUg71RYEi7DxAAxlyC1KExxSnNJgim6lmuGA==

 * The user bootstrap-osd is registered with this key and with caps to bootstrap an OSD:

$ ceph auth list
...
client.bootstrap-osd
    key: AQCUg71RYEi7DxAAxlyC1KExxSnNJgim6lmuGA==
    caps: [mon] allow profile bootstrap-osd
 * for each OSD
 * in the same way as ceph-deploy, prepare the disk by calling ceph-disk prepare, which will set the magic partition uuid and trigger udev rules that run ceph osd create. When udev settles, the new osd is integrated into the cluster and uses its own key, which is created, registered with the MON and stored locally as a side effect of --mkkey. The osd daemon is also run as a side effect of udev detecting the disk and calling /etc/init/ceph-osd.conf. ceph-disk contains a high level description of the process
 * dmcrypt is also handled by the udev logic ( details ??? keys ??? )

At boot time the /var/lib/ceph/osd directory is explored to discover all OSDs that need to be started. Operating systems for which the same logic is not implemented will need an additional script run at boot time to perform the same exploration until the default script is updated to add this capability.

mds

 * proposed name: ceph::mds
 * purpose: configures a ceph MDS, sets up /etc/ceph/ceph.conf with the MONs' IPs, declares the MDS to the cluster via the MON, and optionally sets the key to allow the MDS to connect to the MONs
 * interface:
 * monitor_ips - list of ip addresses used to connect to the monitor servers
 * key - the secret key for the id user
 * id - the id of the user
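Under the proposed interface, a declaration might look like ( a sketch; the addresses are placeholders and the key reuses an example value from above ):

```puppet
# hypothetical sketch of the proposed ceph::mds define
ceph::mds { 'a':
  monitor_ips => ['10.0.0.1', '10.0.0.2', '10.0.0.3'],
  id          => 'mds.a',
  key         => 'AQDesGZSsC7KJBAAw+W/Z4eGSQGAIbxWjxjvfw==',
}
```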

mons class
A wrapper that calls create_resources on mon resources as defined below.

mon define
Creates the hierarchy and keyring supporting a mon, runs the daemon.

The ceph configuration file must be created via ceph::conf before ceph::mon is called. It must contain at least:

[global]
mon_initial_members = idA,idB,idC
mon_host = A.tld,B.tld,C.tld

because the list of mon_initial_members protects against the creation of multiple quorums when multiple mons are deployed in parallel, or

[global]
mon_host = A.tld

if there is just one monitor. Immediately after ceph_mon, the caller is expected to inject admin keys and bootstrap keys for the mds and osd via ceph auth. If the monitor(s) cannot be reached or if there is no quorum yet, it will hang until a quorum is formed. There is no need for ceph_mon to check for the quorum: the ceph client waits until it happens.


 * proposed name: ceph_mon define
 * purpose: configures a ceph MON
 * interface:
 * cluster - the name of the cluster (optional defaults to ceph and implies /etc/ceph/$cluster.conf)
 * id - the id of the mon (required)
 * public_addr - the ip address of the mon, which must resolve the same as one of mon_host (required)
 * authentication_type - auth mode can be either none or cephx (optional defaults to cephx)
 * key - the mon. user key (optional defaults to undef)
 * keyring - the path of the temporary keyring (optional defaults to undef)

 * add a [mon.$id] section via ceph_config with:

[mon.$id]
public_addr = $public_addr
mon_data = /var/lib/ceph/mon/$cluster-$id
 * if auth == cephx:
 * keyring and key are mutually exclusive
 * if the mon. key is specified it needs to be set by the user to be a valid ceph key. The documentation should contain an example key and explanations about how to create an auth key. The key is written to a temporary keyring file that is given in argument to ceph-mon --keyring tmpfile --mkfs and deleted afterwards ( it is copied in the mon file tree ).
 * if the keyring is specified it is expected to exist on the node and is used as an argument to ceph-mon --keyring $keyring --mkfs. The puppetmaster does not have full control over the creation of this temporary keyring, which is required in setups where the puppetmaster is not trusted with secrets. ( see the CERN requirements above ).
 * writes the keyring
 * the directory in which to create the mon is determined via ceph-conf
 * run ceph-mon --cluster $cluster --id $id --mkfs --mon-data $mon_data --public-addr $public_addr --keyring /tmp/monkeyring.tmp
 * runs the mon daemon
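Putting the interface above together, a hypothetical declaration ( address and key are placeholders; the resource title is assumed to provide the mon id ):

```puppet
# minimal sketch of the proposed ceph_mon define
ceph_mon { 'a':
  cluster             => 'ceph',
  public_addr         => '10.0.0.1',
  authentication_type => 'cephx',
  key                 => 'AQDesGZSsC7KJBAAw+W/Z4eGSQGAIbxWjxjvfw==',
}
```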

See mon configuration reference

rbd

 * proposed name: ceph::rbd
 * purpose: maps and mounts a rbd image, taking care of dependencies (packages, rbd kernel module, /etc/ceph/rbdmap, fstab)
 * interface:
 * name - the name of the image
 * pool - the pool in which the image is
 * mount_point - where the image will be mounted
 * key - the secret key for the id user
 * id - the id of the user
 * David Moreau Simard (talk) Should ceph::client be a dependency ?
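Assuming the interface above, a declaration could look like ( a sketch; pool, mount point and key are placeholders ):

```puppet
# hypothetical sketch of the proposed ceph::rbd define
ceph::rbd { 'vm-images':
  pool        => 'rbd',
  mount_point => '/mnt/vm-images',
  id          => 'admin',
  key         => 'AQCTg71RsNIHORAAW+O6FCMZWBjmVfMIPk3MhQ==',
}
```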

cephfs

 * proposed name: ceph::cephfs
 * purpose: mounts a cephfs filesystem, taking care of dependencies (e.g, fstab, packages)
 * interface:
 * Lots - See http://ceph.com/docs/next/man/8/mount.ceph/
 * David Moreau Simard (talk) Should ceph::client be a dependency ?

Implementor components
These components are dependencies of the Puppet user components and can be used by other components. They should form a library of components where the code common to at least two independent components ( think OpenStack and Cloudstack ) is included.

ceph
The top level class found in init.pp


 * proposed name: ceph
 * purpose: Should ultimately be a small class that takes care of installing/configuring the common dependencies of each class.
 * interface:

params

 * proposed name: ceph::params
 * purpose: A class that is used to store variables, likely defaults and/or constants, to be used in various classes
 * interface:
 * None ?

repository
Inspired by openstack::repo.


 * proposed name: ceph::repo
 * purpose: use puppetlabs/apt to configure the official ceph repository so we can install ceph packages
 * interface:
 * release: target ceph release (cuttlefish, dumpling, etc)
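For example ( a sketch assuming the proposed class ):

```puppet
# pin the node to the official dumpling repository
class { 'ceph::repo':
  release => 'dumpling',
}
```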

ceph client implementation

 * proposed name: ceph::client
 * purpose: setup /etc/ceph/ceph.conf to connect to the Ceph cluster and install the ceph cli
 * interface:
 * monitor_ips - list of ip addresses used to connect to the monitor servers
 * client_id - name of the client to find the correct id for key
 * keypath - path to the clients key file
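A minimal sketch, assuming the proposed interface ( addresses and path are placeholders ):

```puppet
class { 'ceph::client':
  monitor_ips => ['10.0.0.1', '10.0.0.2', '10.0.0.3'],
  client_id   => 'admin',
  keypath     => '/etc/ceph/ceph.client.admin.keyring',
}
```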

key
Keyring management and authentication. It would be a class to create keys for new users (e.g. a user that can create RBDs or use the Objectstore) which may require special access rights. It would also be used by other classes like ceph::mon or ceph::osd to place e.g. the shared 'client.admin' or 'mon.' keys.


 * proposed name: ceph::key
 * purpose: handles ceph keys (cephx), generates keys, creates keyring files, inject keys into or delete keys from the cluster/keyring via ceph and ceph-authtool tools.
 * interface:
 * secret - key secret
 * keyring_path - path to the keyring
 * cap_mon/cap_osd/cap_mds - cephx capabilities
 * user/group/mode: settings for the keyring file if needed
 * inject - options to inject a key into the cluster
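For example, creating a key for a user allowed to manage RBD images in an images pool might look like ( a sketch assuming the interface above; the secret reuses an example value from this page ):

```puppet
ceph::key { 'client.images':
  secret       => 'AQABsWZSgEDmJhAAkAGSOOAJwrMHrM5Pz5On1A==',
  keyring_path => '/etc/ceph/ceph.client.images.keyring',
  cap_mon      => 'allow r',
  cap_osd      => 'allow rwx pool=images',
}
```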

See key.pp for an example implementation of this semantic.

pool

 * proposed name: ceph::pool
 * purpose: manage operations on the pools in the cluster such as: create/delete pools, set PG/PGP number
 * interface:
 * pool_name - name of the pool
 * create - whether to create a new pool
 * delete - whether to delete an existing pool
 * pg_num - number of Placement Groups (PGs) for a pool; if the pool already exists this may increase the number of PGs if the current value is lower
 * pgp_num - same as pg_num
 * replica_level - increase or decrease the replica level of a pool
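A hypothetical declaration, assuming the proposed interface:

```puppet
# create a 'volumes' pool with 128 placement groups, replicated 3 times
ceph::pool { 'volumes':
  create        => true,
  pg_num        => 128,
  pgp_num       => 128,
  replica_level => 3,
}
```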

OpenStack components
ceph specific configuration for cinder/glance (already provided by the puppet-cinder and puppet-glance modules in the volume/rbd and backend/rbd classes). RGW Keystone is noted below.


 * --xarses (talk) RGW keystone should be included in the ceph module as RGW is the consumer of the keystone service. Unlike cinder/glance where they are consumers of ceph.

RadosGW components
The RadosGW is developed as an integral part of Ceph. It is however not required to deploy a cluster and should be treated as any other client application of the cluster.

rgw

 * proposed name: ceph::rgw
 * purpose: configures a ceph radosgw, sets up /etc/ceph/ceph.conf with the MONs' IPs, optionally sets the key to allow the radosgw to connect to the MONs
 * interface:
 * monitor_ips - list of ip addresses used to connect to the monitor servers
 * key - the secret key for the id user
 * id - the id of the user
 * rgw_data - the path where the radosgw data should be stored
 * fcgi_file - path to the fcgi file e.g. /var/www/s3gw.fcgi
 * Danny Al-Gaaf (talk) the monitor_ips are not needed: IMO ceph::conf should provide this information to all others
 * --xarses (talk) agree with Danny; we are missing:


 * user - the user to run radosgw as, as well as to own its files
 * host - hostname for this ini section
 * keyring_path - path to key file
 * log_file - where to write logs to
 * rgw_dns_name - dns name (may include wildcard ) to use with s3 api calls
 * rgw_socket_path - path to socket file
 * rgw_print_continue - (bool) whether to send 100-continue responses to the client


 * --xarses (talk) also we should include apache magic here to set up the vhost and script server, in which case we should also support a *port* param.

rgw keystone

 * proposed name: ceph::rgw::keystone
 * purpose: extends radosgw configuration to be able to retrieve auth from keystone tokens and setup keystone endpoint
 * interface:
 * rgw_keystone_url - the internal or admin url for keystone
 * rgw_keystone_admin_token - the admin token for keystone
 * rgw_keystone_accepted_roles - which roles should we accept from keystone
 * rgw_keystone_token_cache_size - how many tokens to keep cached, not useful if not using PKI as every token is checked
 * rgw_keystone_revocation_interval - interval to check for expired tokens, not useful if not using PKI tokens (if not, set to high value)
 * use_pki - (bool) to determine if keystone is using token_format = PKI and if so do PKI signing parts
 * nss_db_path - path to the NSS db files used to verify keystone PKI tokens
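A sketch assuming the proposed interface ( the URL and token are placeholders ):

```puppet
class { 'ceph::rgw::keystone':
  rgw_keystone_url            => 'http://keystone.example.com:35357',
  rgw_keystone_admin_token    => 'ADMIN_TOKEN',
  rgw_keystone_accepted_roles => 'Member,admin',
  use_pki                     => true,
  nss_db_path                 => '/var/lib/ceph/nss',
}
```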

rgw_user

 * proposed name: ceph::rgw_user
 * purpose: create/remove users and Swift users for the RadosGW S3/Swift API
 * interface:
 * user - username
 * key - secret key (could get generated if needed)
 * swift_user - username for the Swift API user
 * swift_key - secret key for the Swift API user
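A minimal sketch, assuming the proposed define ( the key is omitted so it would be generated ):

```puppet
# create an S3 user together with a Swift subuser
ceph::rgw_user { 'testuser':
  swift_user => 'testuser:swift',
}
```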

Related tools and implementations

 * deploy ceph : ceph-deploy

for test / POC purposes https://github.com/ceph/ceph-deploy

maintainer: Alfredo Deza


 * deploy ceph with puppet : puppet-cephdeploy

relies on ceph-deploy https://github.com/dontalton/puppet-cephdeploy/

maintainer: Don Talton


 * deploy ceph with puppet : puppet-ceph

developed in 2012 but still useful, upstream https://github.com/enovance/puppet-ceph

maintainer: community

fork of puppet-ceph, updated recently https://github.com/TelekomCloud/puppet-ceph/tree/rc/eisbrecher
handling of secrets https://github.com/TelekomCloud/puppet-secret https://github.com/TelekomCloud/puppet-ceph/blob/rc/eisbrecher/examples/example-site.pp
maintainer: Deutsche Telekom AG (DTAG)

another fork of puppet-ceph, uses disks by-path links, includes rgw support and removes secrets/keys from puppet https://github.com/cernceph/puppet-ceph
maintainer: CERN


 * ceph + openstack : ceph docs

manual integration http://ceph.com/docs/next/rbd/rbd-openstack/
maintainer: John Wilkins + Josh Durgin


 * ceph + openstack with puppet : stackforge

https://github.com/stackforge/puppet-glance/blob/stable/grizzly/manifests/backend/rbd.pp https://github.com/stackforge/puppet-cinder/blob/stable/grizzly/manifests/volume/rbd.pp

maintainer: community


 * ceph + openstack with puppet : COI

targeting Cisco use case https://github.com/CiscoSystems/puppet-coe/tree/grizzly/manifests/ceph http://docwiki.cisco.com/wiki/OpenStack:Ceph-COI-Installation

maintainer : Don Talton + Robert Starmer


 * ceph + openstack with puppet : mirantis

in the context of Fuel https://github.com/Mirantis/fuel/tree/master/deployment/puppet/ceph https://github.com/Mirantis/fuel/blob/master/deployment/puppet/cinder/manifests/volume/ceph.pp https://github.com/Mirantis/fuel/blob/master/deployment/puppet/glance/manifests/backend/ceph.pp

maintainer : Andrew Woodward


 * openstack with puppet : openstack-installer

data driven approach to deploy OpenStack https://github.com/CiscoSystems/openstack-installer/

maintainer: Robert Starmer + Dan Bode