Puppet/ceph-blueprint

Latest revision as of 21:17, 2 April 2015

This document is intended to capture requirements for a single puppet-ceph module.

Much like vswitch, Ceph is not used exclusively in the context of OpenStack. There is, however, a significant community relying on Puppet to deploy Ceph in the context of OpenStack, as is shown by the existing efforts. Having a puppet-ceph module under the umbrella of the stackforge infrastructure helps federate these efforts while providing a workflow that will improve the overall quality of the module.


Almost every component of this module deserves a discussion, and it would take a long time to agree on everything before getting something useful. The following list sets the order in which each module is going to be implemented. Each step must be a usable puppet module, unit tested and including integration tests.

User Stories

I want to try this module, heard of ceph, want to see it in action

I want to run it on my own laptop, all in one. The ceph::conf class will create a configuration file with no authentication enabled, on my localhost. The ceph::mon resource configures and runs a monitor to which two ceph::osd daemons will connect to provide disk storage, using directories in /srv on the laptop.

   node /node/ {
     class { 'ceph::conf':
       auth_enable => false,
       mon_host    => 'localhost',
     }
     ceph::mon { $hostname: }
     ceph::osd { '/srv/osd1': }
     ceph::osd { '/srv/osd2': }
   }
  • install puppet,
  • paste this in site.pp and replace /node/ with the name of your current node,
  • puppet apply site.pp,
  • type ceph -s : it will connect to the monitor and report that the cluster is HEALTH_OK

I want to run benchmarks on three new machines

  • There are four machines: node1 runs a MON and an OSD, node2 and node3 run OSDs, and one machine is the client from which the user runs commands.
  • install puppetmaster and create site.pp with:
   node /ceph-default/ {
     class { 'ceph::conf':
       auth_enable => false,
       mon_host    => 'node1',
     }
   }
   node /node1/ inherits ceph-default {
     ceph::mon { $hostname: }
     ceph::osd { 'discover': }
   }
   node /node2/, /node3/ inherits ceph-default {
     ceph::osd { 'discover': }
   }
   node /client/ inherits ceph-default {
     class { 'ceph::client': }
   }
  • ssh client
  • rados bench
  • interpret the results

I want to operate a production cluster

  $admin_key = 'AQCTg71RsNIHORAAW+O6FCMZWBjmVfMIPk3MhQ=='
  $mon_key = 'AQDesGZSsC7KJBAAw+W/Z4eGSQGAIbxWjxjvfw=='
  $bootstrap_osd_key = 'AQABsWZSgEDmJhAAkAGSOOAJwrMHrM5Pz5On1A=='

  node /ceph-default/ {
    class { 'ceph::conf':
      mon_host => 'mon1.a.tld,mon2.a.tld,mon3.a.tld',
    }
  }
  node /mon[123]/ inherits ceph-default {
    ceph::mon { $hostname: key => $mon_key }
    ceph::key { 'client.admin':
      secret   => $admin_key,
      caps_mon => '*',
      caps_osd => '*',
      inject   => true,
    }
    ceph::key { 'client.bootstrap-osd':
      secret   => $bootstrap_osd_key,
      caps_mon => 'profile bootstrap-osd',
      inject   => true,
    }
  }
  node /osd*/ inherits ceph-default {
    ceph::osd { 'discover': }
    ceph::key { 'client.bootstrap-osd':
      keyring => '/var/lib/ceph/bootstrap-osd/ceph.keyring',
      secret  => $bootstrap_osd_key,
    }
  }
  node /client/ inherits ceph-default {
    ceph::key { 'client.admin':
      keyring => '/etc/ceph/ceph.client.admin.keyring',
      secret  => $admin_key,
    }
    class { 'ceph::client': }
  }
  • the osd* nodes only contain disks that are used for OSD and using the discover option to automatically use new disks and provision them as part of the cluster is acceptable, there is no risk of destroying unrelated data.
  • when hardware is decommissioned, all its disks can be placed in other machines and the OSDs will automatically be re-inserted in the cluster, even if an external journal is used

I want to spawn a cluster configured with a puppetmaster as part of a continuous integration effort

Leveraging vagrant, vagrant-openstack, openstack

  • Ceph is used as a backend storage for various use cases
  • There are tests to make sure the Ceph cluster was instantiated properly
  • There are tests to make sure various other infrastructure components (or products) can use the Ceph cluster


High level requirements

No complex cross host orchestration

All cross host orchestration should be assumed to be managed outside of Puppet. Provided that its dependencies have already been configured and are known, each component should support being added without having to run Puppet on more than one node.

For example

  • cinder-volume instances should be configured to join a Ceph cluster simply by running Puppet on that node
  • OSD instances should be configured to join a cluster simply by running puppet agent on a node and targeting that role.

The Puppet implementation should only be concerned with:

  • what components need to be defined (where these are implemented as classes)
  • what data is required for those components (where that data is passed in as class parameters)

Supporting versions

The Operating System versions supported must be covered by integration tests on the actual operating system. Although it is fairly easy to add support for an Operating System, doing so is prone to regressions if not tested. The per-Operating-System support strategy mimics the way the OpenStack modules do it.

The supported versions of the components that deal with the environment in which Ceph is used ( OpenStack, Cloudstack, Ganeti etc. ) are handled by each component on a case by case basis. There probably is too much heterogeneity to set a rule.

Provide sensible defaults

If the high level components ( osd + mon + mds + rgw for instance ) are included without any parameter, the result must be a functional Ceph cluster.

Architected to leverage Ceph to its full potential

This means talking to the MON when configuring or modifying the cluster, using ceph-disk as a low level tool to create the storage required for an OSD, and creating a minimal /etc/ceph/ceph.conf to allow a client to connect to the Ceph cluster. The MON exposes a very rich API ( either via the ceph cli or a REST API ) and offers great flexibility to the system administrator. It is unlikely that the first versions of the puppet module will capture all of it, but it should be architected to allow the casual contributor to add a new feature or a new variation without the need to work around architectural limitations.

The ceph-deploy utility is developed as part of the Ceph project to help people get up to speed as quickly as possible for tests and POCs. Alfredo Deza made a compelling argument against using ceph-deploy as a helper for a puppet module: it is designed to hide some of the flexibility Ceph offers for the sake of simplicity, an inconvenience that is incompatible with the goal of a puppet module designed to accommodate all use cases.

Keeping keys / secrets out of Puppet

Some environments, like ours at CERN, use shared puppet masters between different services/teams. It must therefore be possible to prevent all keys from appearing anywhere in the puppet db. This means that we shouldn't be required to add an admin key or any other key to boot up a cluster, and those keys should not be exported within puppet or published by any fact on the machines to be shared among hosts.

We accomplish this using a krb5-authenticated scp Exec to copy in the admin keyrings, etc.

   define k5remotefile ($source, $keytab = '/etc/krb5.keytab', $principal = "host/${::hostname}.foo.bar@FOO.BAR") {
     $cmd = "/usr/bin/kinit -k -t ${keytab} ${principal} && /usr/bin/scp -p -o StrictHostKeyChecking=no ${source} ${name} && kdestroy"
     exec { $cmd:
       creates => $name,
     }
     file { $name:
       ensure  => file,
       replace => false,
       require => Exec[$cmd],
     }
   }

Prefer cli over REST

The ceph cli is preferred because the rest-api requires the installation of an additional daemon.

Module versioning

Create a branch for each Ceph release ( stable/cuttlefish, stable/dumpling etc. ) and follow the same pattern as the OpenStack modules

Support Ceph versions from cuttlefish

Do not support Ceph versions released before cuttlefish

Support scenario based deployment

Support scenario based deployments. When a resource is defined, a corresponding class is declared to wrap it and rely on create_resources to call a list of resources. Such a wrapper must be kept lightweight as it will eventually be unnecessary. Example:

   # file manifests/osd.pp
   class ceph::osd($instance_hash) {
     create_resources('ceph::osd::instance', $instance_hash)
   }
   # file manifests/osd/instance.pp
   define ceph::osd::instance($param1, ...) {
     # logic goes here
   }

Integration tests

All scenarios can probably be covered with 2 virtual machines, 2 interfaces and one disk attached to one of the machines. A number of scenarios can be based on a single machine, using directories instead of disks and a single interface.

export OS_PASSWORD=admin_pass
export OS_AUTH_URL=
export OS_USERNAME=admin
export OS_TENANT_NAME=openstack

ssh -p 29418 review.example.com gerrit stream-events |
 while read event ; do
   if event is commit ; then
      git clone puppet-ceph from gerrit
      cd puppet-ceph
      bundle exec rake spec:system # https://github.com/puppetlabs/rspec-system-puppet#run-spec-tests
      if fail ; then
        ssh -p 29418 review.example.com \
          gerrit review -m '"Test failed"' --verified=-1 c0ff33
      fi
   fi
 done

Puppet user components

This section outlines the roles as well as the configuration components that are visible to the puppet user. They must be understandable to a system administrator willing to deploy Ceph for the first time.


A class wrapper around the ceph_config provider ( derived from ini_setting ). The benefit of having a wrapper is that it enables injection of parameters. ceph::conf does not provide any defaults: it relies on the defaults provided by Ceph itself, and the user is expected to use the Ceph configuration documentation as a reference.

Although the key separator can be either a space or an underscore, only underscore is allowed, to help with consistency.

  • proposed name: ceph::conf
  • purpose: keeps and writes config and their options for the top level sections of the ceph config. This includes these sections:
    • [global]
    • [mon]
    • [osd]
    • [mds]
  • interface: key / value is passed directly to ceph_config. If the argument is a hash, it is injected into ceph_config.
  • auth_enable - true or false, enables/disables cephx, defaults to true ( this is implemented in ceph_config )
If auth_enable is true, set the following in the [global] section of the conf file:
       auth_cluster_required = cephx
       auth_service_required = cephx
       auth_client_required = cephx
       auth_supported = cephx
If auth_enable is false, set the following in the [global] section of the conf file:
       auth_cluster_required = none
       auth_service_required = none
       auth_client_required = none
       auth_supported = none
It should support disabling or enabling cephx when the value changes. If it does not support updating, it must fail when changed on an existing Ceph cluster.

Using an inifile child provider ( such as cinder_config ), a setting would look like:

   ceph_conf {
     'GLOBAL/fsid': value => $fsid;
   }

And create /etc/ceph/ceph.conf such as:

       [global]
       fsid = 918340183294812038

Improvements to be implemented later:

  • If a key/value pair is modified in the *mon*, *osd* or *mds* sections, all daemons are notified of the change with ceph {daemon} tell * ....


  • proposed name: ceph::osd
  • purpose: configures a ceph OSD using the ceph-disk helper and updates the /etc/ceph/ceph.conf file with [osd.X] sections matching the OSDs found in /var/lib/ceph/osd
  • interface:
    • directory/disk - a disk or a directory to be used as a storage for the OSD.
    • bootstrap-osd - the bootstrap-osd secret key (optional if cephx = none )
    • dmcrypt - options needed to encrypt disks (optional)

The generated [osd.X] section must contain the host and disk so that the rc script runs the osd daemon at boot time.

If the directory/disk is set to discover, ceph-disk list is used to find unknown disks or partitions. All unknown disks are prepared with ceph-disk prepare. That effectively allows someone to say: use whatever disks are not in use for ceph and leave the rest alone. An operator would only have to add a new disk and wait for the next puppet agent pass to have it integrated in the cluster. If a disk is removed, the OSD is not launched at boot time and there is nothing to do.
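Putting the interface together, a declaration could look like the following sketch ( the parameter names mirror the interface above, but the exact syntax is illustrative, not a final API ):

   ceph::osd { '/dev/sdb':
     bootstrap-osd => $bootstrap_osd_key,  # optional if cephx = none
   }
   # or prepare every unknown disk found by ceph-disk list:
   ceph::osd { 'discover': }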

Support ceph-disk suppress

Here is what should happen on a node with at least one OSD:

    • The bootstrap-osd keyring contains the key:
       key = AQCUg71RYEi7DxAAxlyC1KExxSnNJgim6lmuGA==
    • The user bootstrap-osd exists with this key, with caps to bootstrap an OSD:
   $ ceph auth list
       key: AQCUg71RYEi7DxAAxlyC1KExxSnNJgim6lmuGA==
       caps: [mon] allow profile bootstrap-osd

At boot time the /var/lib/ceph/osd directory is explored to discover all OSDs that need to be started. Operating systems for which the same logic is not implemented will need an additional script run at boot time to perform the same exploration until the default script is updated to add this capability.


  • proposed name: ceph::mds
  • purpose: configures a ceph MDS, sets up /etc/ceph/ceph.conf with the MONs' IPs, declares the MDS to the cluster via the MON, and optionally sets the key to allow the MDS to connect to the MONs
  • interface:
    • monitor_ips - list of ip addresses used to connect to the monitor servers
    • key - the secret key for the id user
    • id - the id of the user
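A sketch of this interface ( the values are illustrative placeholders, not a confirmed API ):

   ceph::mds { 'a':
     monitor_ips => ['mon1.a.tld', 'mon2.a.tld', 'mon3.a.tld'],
     id          => 'mds.a',
     key         => $mds_key,
   }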

mons class

A wrapper that calls create_resources on resources of type mon, as defined below.

mon define

Creates the hierarchy and keyring supporting a mon, runs the daemon.

The ceph configuration file must be created via ceph::conf before ceph::mon is called. It must contain at least:

   mon_initial_members = idA,idB,idC
   mon_host = A.tld,B.tld,C.tld

because the list of mon_initial_members protects against the creation of multiple quorums when multiple mons are deployed in parallel or

   mon_host = A.tld

if there is just one monitor.

Immediately after ceph_mon, the caller is expected to inject admin keys and bootstrap keys for the mds and osd via ceph auth. If the monitor(s) cannot be reached or if there is no quorum yet, it will hang until a quorum is formed. There is no need for ceph_mon to check for the quorum, the ceph client waits until it happens.

  • proposed name: ceph_mon define
  • purpose: configures a ceph MON
  • interface:
    • cluster - the name of the cluster (optional defaults to ceph and implies /etc/ceph/$cluster.conf)
    • id - the id of the mon (required)
    • public_addr - the ip address of the mon, which must resolve the same as one of the mon_host entries (required)
    • authentication_type - auth mode can be either none or cephx (optional defaults to cephx)
    • key - the mon. user key (optional defaults to undef)
    • keyring - the path of the temporary keyring (optional defaults to undef)
  • add a [mon.$id] section via ceph_config with
    public_addr = $public_addr
    mon_data = /var/lib/ceph/mon/$cluster-$id
  • if auth == cephx:
    • keyring and key are mutually exclusive
    • if the mon. key is specified it needs to be set by the user to be a valid ceph key. The documentation should contain an example key and explanations about how to create an auth key. The key is written to a temporary keyring file that is given in argument to ceph-mon --keyring tmpfile --mkfs and deleted afterwards ( it is copied in the mon file tree ).
    • if the keyring is specified it is expected to exist on the node and is used as an argument to ceph-mon --keyring $keyring --mkfs. The puppetmaster does not have full control over the creation of this temporary keyring, which is required in setups where the puppetmaster is not trusted with secrets. ( see the CERN requirements above ).
    • writes the keyring
  • the directory in which to create the mon is determined via ceph-conf
  • run ceph-mon --cluster $cluster --id $id --mkfs --mon-data $mon_data --public-addr $public_addr --keyring /tmp/monkeyring.tmp
  • runs the mon daemon

See mon configuration reference
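Combining the parameters above, a declaration might look like this sketch ( it assumes the mon. key has been generated beforehand; the syntax is not final ):

   ceph_mon { 'a':
     id                  => 'a',
     public_addr         => '10.0.0.1',
     authentication_type => 'cephx',
     key                 => $mon_key,
   }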


  • proposed name: ceph::rbd
  • purpose: maps and mounts a rbd image, taking care of dependencies (packages, rbd kernel module, /etc/ceph/rbdmap, fstab)
  • interface:
    • name - the name of the image
    • pool - the pool in which the image is
    • mount_point - where the image will be mounted
    • key - the secret key for the id user
    • id - the id of the user
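To illustrate the intended semantics, a sketch ( parameter names from the interface above; the pool and mount point are illustrative ):

   ceph::rbd { 'vol1':
     pool        => 'rbd',
     mount_point => '/mnt/vol1',
     id          => 'admin',
     key         => $admin_key,
   }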
David Moreau Simard (talk) Should ceph::client be a dependency ?



Implementor components

These components are dependencies of the Puppet user components and can be used by other components. They should form a library of components where code common to at least two independent components ( think OpenStack and Cloudstack ) is included.


The top level class found in init.pp

  • proposed name: ceph
  • purpose: Should ultimately be a small class that takes care of installing/configuring the common dependencies of each class.
  • interface:
    •  ?


  • proposed name: ceph::params
  • purpose: A class that is used to store variables, likely defaults and/or constants, to be used in various classes
  • interface:
    • None ?


Inspired by openstack::repo.

  • proposed name: ceph::repo
  • purpose: use puppetlabs/apt to configure the official ceph repository so we can install ceph packages
  • interface:
    • release: target ceph release (cuttlefish, dumpling, etc)
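For example, a node could opt into a given release with this sketch of the proposed interface:

   class { 'ceph::repo':
     release => 'dumpling',
   }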

ceph client implementation

  • proposed name: ceph::client
  • purpose: sets up /etc/ceph/ceph.conf to connect to the Ceph cluster and installs the ceph cli
  • interface:
    • monitor_ips - list of ip addresses used to connect to the monitor servers
    • client_id - the id of the client, used to find the correct key
    • keypath - path to the clients key file
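A sketch of the proposed interface ( the hostnames and paths are illustrative ):

   class { 'ceph::client':
     monitor_ips => ['mon1.a.tld', 'mon2.a.tld', 'mon3.a.tld'],
     client_id   => 'admin',
     keypath     => '/etc/ceph/ceph.client.admin.keyring',
   }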


Keyring management and authentication. This would be a class to create keys for new users (e.g. a user that can create RBDs or use the Objectstore), which may require special access rights. It would also be used by the other classes like ceph::mon or ceph::osd to place e.g. the shared 'client.admin' or 'mon.' keys.

  • proposed name: ceph::key
  • purpose: handles ceph keys (cephx), generates keys, creates keyring files, inject keys into or delete keys from the cluster/keyring via ceph and ceph-authtool tools.
  • interface:
    • secret - key secret
    • keyring_path - path to the keyring
    • cap_mon/cap_osd/cap_mds - cephx capabilities
    • user/group/mode: settings for the keyring file if needed
    • inject - options to inject a key into the cluster

See key.pp for an example implementation of this semantic.


  • proposed name: ceph::pool
  • purpose: manage operations on the pools in the cluster such as: create/delete pools, set PG/PGP number
  • interface:
    • pool_name - name of the pool
    • create - whether to create a new pool
    • delete - whether to delete an existing pool
    • pg_num - number of Placement Groups (PGs) for a pool, if the pool already exists this may increase the number of PGs if the current value is lower
    • pgp_num - same as for pg_num
    • replica_level - increase or decrease the replica level of a pool
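For instance, creating a pool could look like this sketch of the proposed interface ( the pool name and numbers are illustrative ):

   ceph::pool { 'volumes':
     create        => true,
     pg_num        => 128,
     pgp_num       => 128,
     replica_level => 3,
   }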

OpenStack components

ceph specific configuration for cinder/glance (already provided by the puppet-cinder and puppet-glance modules in the volume/rbd and backend/rbd classes). RGW Keystone is noted below.

--xarses (talk) RGW keystone should be included in the ceph module as RGW is the consumer of the keystone service. Unlike cinder/glance where they are consumers of ceph.

RadosGW components

The RadosGW is developed as an integral part of Ceph. It is, however, not required to deploy a cluster and should be treated as any other client application of the cluster.


  • proposed name: ceph::rgw
  • purpose: configures a ceph radosgw, sets up /etc/ceph/ceph.conf with the MONs' IPs, and optionally sets the key to allow the radosgw to connect to the MONs
  • interface:
    • monitor_ips - list of ip addresses used to connect to the monitor servers
    • key - the secret key for the id user
    • id - the id of the user
    • rgw_data - the path where the radosgw data should be stored
    • fcgi_file - path to the fcgi file e.g. /var/www/s3gw.fcgi
Danny Al-Gaaf (talk) the monitor_ips are not needed: IMO ceph::conf should provide this information to all others
--xarses (talk) agree with Danny; we are missing:
    • user - the user to run radosgw as, as well as to own its files
    • host - hostname for this ini section
    • keyring_path - path to key file
    • log_file - where to write logs to
    • rgw_dns_name - dns name (may include wildcard ) to use with s3 api calls
    • rgw_socket_path - path to socket file
    • rgw_print_continue - (bool) if we are going to send 100 codes to the client
--xarses (talk) also we should include apache magic here to setup vhost and script-server. in which case we should also support *port* param.

rgw keystone

  • proposed name: ceph::rgw::keystone
  • purpose: extends radosgw configuration to be able to retrieve auth from keystone tokens and setup keystone endpoint
  • interface:
    • rgw_keystone_url - the internal or admin url for keystone
    • rgw_keystone_admin_token - the admin token for keystone
    • rgw_keystone_accepted_roles - which roles should we accept from keystone
    • rgw_keystone_token_cache_size - how many tokens to keep cached, not useful if not using PKI as every token is checked
    • rgw_keystone_revocation_interval - interval to check for expired tokens, not useful if not using PKI tokens (if not, set to high value)
    • use_pki - (bool) to determine if keystone is using token_format = PKI and if so do PKI signing parts
    • nss_db_path - path to NSS < - > keystone tokens db files
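A sketch of how these parameters could fit together ( the URL, roles and paths are illustrative placeholders ):

   class { 'ceph::rgw::keystone':
     rgw_keystone_url            => 'http://keystone.a.tld:35357',
     rgw_keystone_admin_token    => $keystone_admin_token,
     rgw_keystone_accepted_roles => 'Member,admin',
     use_pki                     => true,
     nss_db_path                 => '/var/lib/ceph/nss',
   }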


  • proposed name: ceph::rgw_user
  • purpose: create/remove users and Swift users for the RadosGW S3/Swift API
  • interface:
    • user - username
    • key - secret key (could get generated if needed)
    • swift_user - username for the Swift API user
    • swift_key - secret key for the Swift API user
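A sketch of the proposed interface ( usernames illustrative; per the purpose above, the key could be generated if omitted ):

   ceph::rgw_user { 'johndoe':
     key        => $johndoe_key,
     swift_user => 'johndoe:swift',
     swift_key  => $swift_key,
   }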

Related tools and implementations

  • deploy ceph : ceph-deploy
 for test / POC purposes
 maintainer: Alfredo Deza
  • deploy ceph with puppet : puppet-cephdeploy
 relies on ceph-deploy
 maintainer: Don Talton
  • deploy ceph with puppet : puppet-ceph
 developed in 2012 but still useful, upstream
 maintainer: community

 fork of puppet-ceph, updated recently
 handling of secrets
 maintainer: Deutsche Telekom AG (DTAG)

 another fork of puppet-ceph, uses disks by-path links, includes rgw support and removes secrets/keys from puppet
 maintainer: CERN
  • ceph + openstack : ceph docs
 manual integration
  maintainer: John Wilkins + Josh Durgin
  • ceph + openstack with puppet : stackforge
 maintainer: community
  • ceph + openstack with puppet : COI
 targeting Cisco use case
 maintainer : Don Talton + Robert Starmer
  • ceph + openstack with puppet : mirantis
 in the context of Fuel
 maintainer : Andrew Woodward
  • openstack with puppet : openstack-installer
 data driven approach to deploy OpenStack
 maintainer: Robert Starmer + Dan Bode