Puppet/ceph-blueprint

Latest revision as of 21:17, 2 April 2015

Overview

This document is intended to capture requirements for a single puppet-ceph module.

Much like vswitch, Ceph is not used exclusively in the context of OpenStack. There is, however, a significant community relying on Puppet to deploy Ceph in the context of OpenStack, as shown by the number of existing efforts (see Related tools and implementations). Having a puppet-ceph module under the umbrella of the stackforge infrastructure helps federate these efforts while providing a workflow that will improve the overall quality of the module.

Roadmap

Almost every component of this module deserves discussion, and it would take a long time to agree on everything before getting something useful. The following list sets the order in which the components are going to be implemented. Each step must be a usable Puppet module, unit tested and including integration tests.

conf ( OK )
key
mon ( OK )
osd ( OK )
pool
rbd

User Stories

I want to try this module: I have heard of Ceph and want to see it in action

I want to run it on my own laptop, all in one. The ceph::conf class creates a configuration file with no authentication enabled on my localhost. The ceph::mon resource configures and runs a monitor, to which two ceph::osd daemons connect to provide disk storage, using directories in /srv on the laptop.

   node /node/ {
     class { 'ceph::conf':
       auth_enable => false,
       mon_host    => 'localhost',
     }
     ceph::mon { $hostname: }
     ceph::osd { '/srv/osd1': }
     ceph::osd { '/srv/osd2': }
   }

Install Puppet.
Paste this into site.pp and replace /node/ with the name of your current node.
Run puppet apply site.pp.
Run ceph -s: it connects to the monitor and reports that the cluster is HEALTH_OK.

I want to run benchmarks on three new machines

There are four machines: three OSD nodes, one of which (node1) also runs the MON, and one client machine from which the user runs commands.
Install a puppetmaster and create site.pp with:

   node 'ceph-default' {
     class { 'ceph::conf':
       auth_enable => false,
       mon_host    => 'node1',
     }
   }

   node /node1/ inherits 'ceph-default' {
     ceph::mon { $hostname: }
     ceph::osd { 'discover': }
   }

   node /node2/, /node3/ inherits 'ceph-default' {
     ceph::osd { 'discover': }
   }

   node /client/ inherits 'ceph-default' {
     class { 'ceph::client': }
   }

ssh to the client
run rados bench
interpret the results

I want to operate a production cluster

  $admin_key         = 'AQCTg71RsNIHORAAW+O6FCMZWBjmVfMIPk3MhQ=='
  $mon_key           = 'AQDesGZSsC7KJBAAw+W/Z4eGSQGAIbxWjxjvfw=='
  $bootstrap_osd_key = 'AQABsWZSgEDmJhAAkAGSOOAJwrMHrM5Pz5On1A=='

  node 'ceph-default' {
    class { 'ceph::conf':
      mon_initial_members => 'mon1,mon2,mon3',
      mon_host            => 'mon1.a.tld,mon2.a.tld,mon3.a.tld',
    }
  }

  node /^mon[123]/ inherits 'ceph-default' {
    ceph::mon { $hostname: key => $mon_key }
    ceph::key { 'client.admin':
      secret  => $admin_key,
      cap_mon => 'allow *',
      cap_osd => 'allow *',
      inject  => true,
    }
    ceph::key { 'client.bootstrap-osd':
      secret  => $bootstrap_osd_key,
      cap_mon => 'allow profile bootstrap-osd',
      inject  => true,
    }
  }

  node /^osd/ inherits 'ceph-default' {
    ceph::osd { 'discover': }
    ceph::key { 'client.bootstrap-osd':
      keyring_path => '/var/lib/ceph/bootstrap-osd/ceph.keyring',
      secret       => $bootstrap_osd_key,
    }
  }

  node /client/ inherits 'ceph-default' {
    ceph::key { 'client.admin':
      keyring_path => '/etc/ceph/ceph.client.admin.keyring',
      secret       => $admin_key,
    }
    class { 'ceph::client': }
  }

The osd* nodes only contain disks that are used for OSDs, so using the discover option to automatically provision new disks as part of the cluster is acceptable: there is no risk of destroying unrelated data.
When a machine is decommissioned, all its disks can be moved to another machine and the OSDs will automatically be re-inserted into the cluster, even if an external journal is used.

I want to spawn a cluster configured with a puppetmaster as part of a continuous integration effort

Leveraging vagrant, vagrant-openstack, openstack

Ceph is used as backend storage for various use cases
There are tests to make sure the Ceph cluster was instantiated properly
There are tests to make sure various other infrastructure components (or products) can use the Ceph cluster

Requirements

High level requirements

No complex cross host orchestration

All cross host orchestration should be assumed to be managed outside of Puppet. Provided that its dependencies have already been configured and are known, each component should support being added without having to run Puppet on more than one node.

For example

cinder-volume instances should be configured to join a Ceph cluster simply by running Puppet on that node
OSD instances should be configured to join a cluster simply by running puppet agent on a node and targeting that role.

The Puppet implementation should only be concerned with:

what components need to be defined (where these are implemented as classes)
what data is required for those components (where that data is passed in as class parameters)

Supporting versions

Each supported Operating System version must be covered by integration tests run on that actual operating system. Although it is fairly easy to add support for an Operating System, it is prone to regressions if not tested. The per Operating System support strategy mimics the way the OpenStack modules do it.

The supported versions of the components that deal with the environment in which Ceph is used (OpenStack, Cloudstack, Ganeti etc.) are handled by each component on a case by case basis. There is probably too much heterogeneity to set a single rule.

Provide sensible defaults

If the high level components ( osd + mon + mds + rgw for instance ) are included without any parameter, the result must be a functional Ceph cluster.

Architected to leverage Ceph to its full potential

This means talking to the MON when configuring or modifying the cluster, using ceph-disk as a low level tool to create the storage required for an OSD, and creating a minimal /etc/ceph/ceph.conf that allows a client to connect to the Ceph cluster. The MON exposes a very rich API (either via the ceph CLI or a REST API) and offers great flexibility to the system administrator. The first versions of the Puppet module are unlikely to capture all of it, but the module should be architected so that a casual contributor can add a new feature or a new variation without having to work around architectural limitations.

The ceph-deploy utility is developed as part of the Ceph project to help people get up to speed as quickly as possible for tests and POCs. Alfredo Deza made a compelling argument against using ceph-deploy as a helper for a Puppet module: it is designed to hide some of the flexibility Ceph offers for the sake of simplicity, which is incompatible with the goal of a Puppet module designed to accommodate all use cases.

Keeping keys / secrets out of Puppet

Some environments, like ours at CERN, use puppet masters shared between different services/teams. It must therefore be possible to keep all keys out of the puppet db entirely. This means that we shouldn’t be required to add an admin key or any other key to boot up a cluster, and those keys should not be exported within Puppet or published by any fact on the machines to be shared among hosts.

We accomplish this using a Kerberos (k5) authenticated scp Exec to copy in the admin keyrings, etc.:

   define k5remotefile ($source, $keytab = '/etc/krb5.keytab', $principal = "host/${::hostname}.foo.bar@FOO.BAR") {

     # kinit, scp and kdestroy are chained with &&, so the command must run in a shell
     $cmd = "/usr/bin/kinit -k -t ${keytab} ${principal} && /usr/bin/scp -p -o StrictHostKeyChecking=no ${source} ${name} && /usr/bin/kdestroy"

     exec { $cmd:
       provider => shell,
       creates  => $name,
     }

     # keep the copied file, never overwrite it once it exists
     file { $name:
       ensure  => file,
       replace => false,
       require => Exec[$cmd],
     }

   }

Prefer cli over REST

The ceph cli is preferred because the rest-api requires the installation of an additional daemon.

Module versioning

Create a branch for each Ceph release ( stable/cuttlefish, stable/dumpling etc. ) and follow the same pattern as the OpenStack modules

Support Ceph versions from cuttlefish

Do not support Ceph versions released before cuttlefish

Support scenario based deployment

Support scenario based deployments. When a resource is defined, a corresponding class is declared to wrap it and relies on create_resources to declare a list of resources. Such a wrapper must be kept lightweight as it will eventually be unnecessary. Example:

   # file manifests/osd.pp
   class ceph::osd($instance_hash) {
     create_resources('ceph::osd::instance', $instance_hash)
   }

   # file manifests/osd/instance.pp
   define ceph::osd::instance($param1, ...) {
     # logic goes here
   }
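To make the wrapper pattern concrete, here is a minimal Python model of what create_resources does with the hash handed to such a wrapper class. Each key becomes a resource title, each value becomes that resource's parameters, and an optional defaults hash is overridden by explicit parameters. This is a sketch of the semantics only, not Puppet's implementation. The function name expand_resources and the sample dmcrypt data are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Params = Dict[str, Any]


def expand_resources(
    resource_type: str,
    instances: Dict[str, Params],
    defaults: Optional[Params] = None,
) -> List[Tuple[str, str, Params]]:
    """Model of create_resources(type, hash, defaults): one resource per hash key."""
    declared = []
    for title in sorted(instances):  # sorted only to make the output deterministic
        params = dict(defaults or {})  # copy so the defaults hash is never mutated
        params.update(instances[title])  # explicit parameters win over defaults
        declared.append((resource_type, title, params))
    return declared


# Hypothetical data, e.g. looked up from Hiera by the ceph::osd wrapper class.
osds = {'/srv/osd1': {}, '/dev/sdb': {'dmcrypt': True}}
for resource in expand_resources('ceph::osd::instance', osds, {'dmcrypt': False}):
    print(resource)
```

Keeping the wrapper this thin means it can be dropped later without changing the define it wraps.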

Integration tests

All scenarios can probably be covered with 2 virtual machines, 2 interfaces and one disk attached to one of the machines. A number of scenarios can be based on a single machine, using directories instead of disks and a single interface.

use https://github.com/puppetlabs/rspec-system-puppet and check that it can be used with the vagrant openstack backend https://github.com/cloudbau/vagrant-openstack-plugin
use openstack by running a script like this with a dedicated tenant to prevent breakage ( see http://ci.openstack.org/third_party.html )

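export OS_PASSWORD=admin_pass
export OS_AUTH_URL=http://127.0.0.1:5000/v2.0/
export OS_USERNAME=admin
export OS_TENANT_NAME=openstack

# react to every new patch set uploaded to Gerrit (jq is assumed to be installed to parse the JSON events)
ssh -p 29418 review.example.com gerrit stream-events |
 while read -r event ; do
   if [ "$(echo "$event" | jq -r .type)" = patchset-created ] ; then
      ref=$(echo "$event" | jq -r .patchSet.ref)
      revision=$(echo "$event" | jq -r .patchSet.revision)
      # stdin is redirected so that git/ssh do not swallow the event stream;
      # the stackforge/puppet-ceph project path is an assumption
      (
        rm -rf puppet-ceph &&
        git clone ssh://review.example.com:29418/stackforge/puppet-ceph &&
        cd puppet-ceph &&
        git fetch origin "$ref" && git checkout FETCH_HEAD &&
        bundle exec rake spec:system # https://github.com/puppetlabs/rspec-system-puppet#run-spec-tests
      ) < /dev/null || ssh -n -p 29418 review.example.com \
          gerrit review -m '"Test failed"' --verified=-1 "$revision"
    fi
 done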
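The event filtering in the loop above can also be sketched in Python, which is easier to unit test than a shell pipeline. Gerrit's stream-events emits one JSON object per line. This sketch assumes the patchset-created event type and its change.project and patchSet.revision fields. The project name stackforge/puppet-ceph and the helper name are assumptions.

```python
import json
from typing import Iterable, Iterator


def revisions_to_test(lines: Iterable[str], project: str) -> Iterator[str]:
    """Yield the commit SHA of every new patch set uploaded for `project`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON event
        if event.get('type') != 'patchset-created':
            continue  # comments, merges, etc. do not need a test run
        if event.get('change', {}).get('project') != project:
            continue
        yield event['patchSet']['revision']


sample = [
    '{"type": "comment-added", "change": {"project": "stackforge/puppet-ceph"}}',
    '{"type": "patchset-created", "change": {"project": "stackforge/puppet-ceph"},'
    ' "patchSet": {"revision": "c0ff33", "ref": "refs/changes/01/1/1"}}',
    '{"type": "patchset-created", "change": {"project": "other/repo"},'
    ' "patchSet": {"revision": "deadbeef"}}',
]
print(list(revisions_to_test(sample, 'stackforge/puppet-ceph')))  # ['c0ff33']
```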

Puppet user components

This section outlines the roles as well as the configuration components that are visible to the Puppet user. They must be understandable by a system administrator deploying Ceph for the first time.

conf

A class wrapper around the ceph_config provider (derived from ini_setting). The benefit of having a wrapper is that it enables injection of parameters. ceph::conf does not provide any defaults: it relies on the defaults provided by Ceph itself, and the user is expected to use the ceph conf documentation as a reference.

Although the key separator can either be space or underscore, only underscore is allowed to help with consistency.

proposed name: ceph::conf
purpose: keeps and writes the configuration options of the top level sections of the ceph config. This includes these sections:
- [global]
- [mon]
- [osd]
- [mds]
interface: key / value is passed directly to ceph_config. If the argument is a hash, it is injected into ceph_config.
auth_enable - true or false, enables/disables cephx, defaults to true ( this is implemented in ceph_config )

If auth_enable is true, set the following in the [global] section of the conf file:

       auth_cluster_required = cephx
       auth_service_required = cephx
       auth_client_required = cephx
       auth_supported = cephx

If auth_enable is false, set the following in the [global] section of the conf file:

       auth_cluster_required = none
       auth_service_required = none
       auth_client_required = none
       auth_supported = none

It should support disabling or enabling cephx when the values change. If it does not support updating, it must fail when changed on an existing Ceph cluster.

Using an inifile child provider (such as cinder_config), a setting would look like:

   ceph_config {
     'global/fsid': value => $fsid;
   }

and create /etc/ceph/ceph.conf such as:

   [global]
       fsid = 066f558c-6789-4a93-aaf1-5af1ba01a3ad
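How a set of 'section/setting' titles ends up in /etc/ceph/ceph.conf can be modelled with a short Python sketch. It also enforces the underscore-only rule for key separators stated above. This is a hypothetical model of the ceph_config behaviour, not the provider itself, and render_ceph_conf is an illustrative name.

```python
from typing import Dict, List


def render_ceph_conf(settings: Dict[str, str]) -> str:
    """Render {'section/setting': value} pairs into ceph.conf text."""
    sections: Dict[str, List[str]] = {}  # insertion order = order of first use
    for title, value in settings.items():
        section, _, setting = title.partition('/')
        if not section or not setting:
            raise ValueError(f'expected "section/setting", got {title!r}')
        if ' ' in setting:
            raise ValueError(f'use underscores, not spaces: {setting!r}')
        sections.setdefault(section, []).append(f'    {setting} = {value}')
    blocks = [f'[{name}]\n' + '\n'.join(lines) for name, lines in sections.items()]
    return '\n\n'.join(blocks) + '\n'


print(render_ceph_conf({
    'global/fsid': '066f558c-6789-4a93-aaf1-5af1ba01a3ad',
    'global/mon_host': 'mon1.a.tld',
    'osd/osd_journal_size': '1024',
}))
```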

Improvements to be implemented later:

If a key/value pair is modified in the *mon*, *osd* or *mds* sections, all daemons are notified of the change with ceph tell {daemon}.* injectargs ...

osd

proposed name: ceph::osd
purpose: configures a ceph OSD using the ceph-disk helper and updates the /etc/ceph/ceph.conf file with [osd.X] sections matching the OSDs found in /var/lib/ceph/osd
interface:
- directory/disk - a disk or a directory to be used as a storage for the OSD.
- bootstrap-osd - the bootstrap-osd secret key (optional if cephx = none )
- dmcrypt - options needed to encrypt disks (optional)

The generated [osd.X] section must contain the host and disk so that the init script runs the OSD daemon at boot time.

If the directory/disk is set to discover, ceph-disk list is used to find unknown disks or partitions. All unknown disks are prepared with ceph-disk prepare. That effectively allows someone to say: use whatever disks are not in use for Ceph and leave the rest alone. An operator would only have to add a new disk and wait for the next Puppet agent run to have it integrated in the cluster. If a disk is removed, the OSD is not launched at boot time and there is nothing to do.
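The discover policy boils down to this: prepare every device that ceph-disk does not already know about, and never touch the rest. The Python sketch below assumes a simplified, hypothetical listing with one '<device> <description>' line per device, where devices nobody uses are described as 'other, unknown'. The real ceph-disk list output has more cases (partitions, mount points, journals), so a production implementation would need a more careful parser. The helper name is illustrative.

```python
from typing import Iterable, List


def disks_to_prepare(listing: Iterable[str]) -> List[str]:
    """Return devices whose description marks them as unused (assumed format)."""
    candidates = []
    for line in listing:
        device, _, description = line.strip().partition(' ')
        if not device.startswith('/dev/'):
            continue
        if description.strip() == 'other, unknown':
            candidates.append(device)  # would be handed to `ceph-disk prepare`
    return candidates


listing = [
    '/dev/sda :',
    '/dev/sda1 other, ext4, mounted on /',
    '/dev/sdb other, unknown',
    '/dev/sdc1 ceph data, active, cluster ceph, osd.0',
]
print(disks_to_prepare(listing))  # ['/dev/sdb']
```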

Support ceph-disk suppress

Here is what should happen on a node with at least one OSD

common to all OSDs on the same node:
- the /etc/ceph/ceph.conf file is set up with the IPs of the monitors
- the /var/lib/ceph/bootstrap-osd/{cluster}.keyring file contains a user/key that is used to create an OSD. The bootstrap-osd user key is usually the same for all OSDs. For instance:

   [client.bootstrap-osd]
       key = AQCUg71RYEi7DxAAxlyC1KExxSnNJgim6lmuGA==

- the bootstrap-osd user is registered in the cluster with this key and the caps needed to bootstrap an OSD:

   $ ceph auth list
   ...
   client.bootstrap-osd
       key: AQCUg71RYEi7DxAAxlyC1KExxSnNJgim6lmuGA==
       caps: [mon] allow profile bootstrap-osd
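The bootstrap-osd keyring shown above is a small INI-style file. A hypothetical Python helper that writes it could look like the sketch below. It restricts the file to its owner, but the 0600 mode is an assumption and this blueprint does not specify it.

```python
import os
import tempfile


def write_keyring(path: str, entity: str, secret: str) -> None:
    """Write a single-entity keyring file, created with owner-only permissions."""
    content = f'[{entity}]\n\tkey = {secret}\n'
    # 0o600 is an assumed mode; it only applies when the file is created
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, 'w') as keyring:
        keyring.write(content)


path = os.path.join(tempfile.mkdtemp(), 'ceph.keyring')
write_keyring(path, 'client.bootstrap-osd', 'AQCUg71RYEi7DxAAxlyC1KExxSnNJgim6lmuGA==')
with open(path) as f:
    print(f.read())
```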

for each OSD
- in the same way ceph-deploy prepares a disk, call ceph-disk-prepare, which sets a magic partition type UUID; this triggers udev rules that eventually run ceph osd create. When udev settles, the new OSD is integrated into the cluster and uses its own key, which is created, registered with the MON and stored locally as a side effect of --mkkey. The OSD daemon is also started as a side effect of udev detecting the disk and calling /etc/init/ceph-osd.conf. ceph-disk contains a high level description of the process.
- dmcrypt is also handled by the udev logic (open questions: details, key handling)

At boot time the /var/lib/ceph/osd directory is explored to discover all OSDs that need to be started. Operating systems for which the same logic is not implemented will need an additional script run at boot time to perform the same exploration until the default script is updated to add this capability.
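The boot-time exploration of /var/lib/ceph/osd can be sketched as follows. The sketch assumes the '{cluster}-{id}' directory naming used by ceph-disk (for example ceph-0). The helper name and the temporary directory used in the example are illustrative only.

```python
import os
import tempfile
from typing import List


def osd_ids(osd_root: str = '/var/lib/ceph/osd', cluster: str = 'ceph') -> List[int]:
    """Return the ids of the OSDs of `cluster` found as directories under osd_root."""
    prefix = cluster + '-'
    ids = []
    for name in os.listdir(osd_root):
        suffix = name[len(prefix):]
        if name.startswith(prefix) and suffix.isdigit():
            if os.path.isdir(os.path.join(osd_root, name)):
                ids.append(int(suffix))
    return sorted(ids)


root = tempfile.mkdtemp()
for name in ('ceph-0', 'ceph-12', 'other-3', 'ceph-tmp'):
    os.mkdir(os.path.join(root, name))
print(osd_ids(root))  # [0, 12]
```

Each id found this way would then be started, for instance through the init script of the distribution.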

mds

proposed name: ceph::mds
purpose: configures a ceph MDS, sets up /etc/ceph/ceph.conf with the MONs' IPs, declares the MDS to the cluster via the MON, and optionally sets the key allowing the MDS to connect to the MONs
interface:
- monitor_ips - list of ip addresses used to connect to the monitor servers
- key - the secret key for the id user
- id - the id of the user

mons class

A wrapper class that calls create_resources for resources of type mon, as defined below.

mon define

Creates the directory hierarchy and keyring supporting a mon, and runs the daemon.

The ceph configuration file must be created via ceph::conf before ceph::mon is called. It must contain at least:

   [global]
   mon_initial_members = idA,idB,idC
   mon_host = A.tld,B.tld,C.tld

because the list of mon_initial_members protects against the creation of multiple quorums when multiple mons are deployed in parallel or

   [global]
   mon_host = A.tld

if there is just one monitor.
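The two accepted shapes of the [global] section can be checked before a monitor is created. The Python sketch below uses hypothetical names. It verifies that mon_host is set, that mon_initial_members is present when several monitors are listed, and that the monitor's public_addr appears in mon_host, as the public_addr entry of the interface requires. It compares strings only; a real check would resolve host names.

```python
from typing import Dict, List


def check_mon_config(global_section: Dict[str, str], public_addr: str) -> List[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []
    hosts = [h.strip() for h in global_section.get('mon_host', '').split(',') if h.strip()]
    if not hosts:
        problems.append('mon_host is missing')
    if len(hosts) > 1 and not global_section.get('mon_initial_members'):
        problems.append('mon_initial_members is required with several monitors')
    if hosts and public_addr not in hosts:
        problems.append(f'{public_addr} is not listed in mon_host')
    return problems


print(check_mon_config({'mon_host': 'A.tld,B.tld,C.tld'}, 'B.tld'))
```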

Immediately after ceph_mon, the caller is expected to inject admin keys and bootstrap keys for the mds and osd via ceph auth. If the monitor(s) cannot be reached or if there is no quorum yet, it will hang until a quorum is formed. There is no need for ceph_mon to check for the quorum, the ceph client waits until it happens.

proposed name: ceph_mon define
purpose: configures a ceph MON
interface:
- cluster - the name of the cluster (optional defaults to ceph and implies /etc/ceph/$cluster.conf)
- id - the id of the mon (required)
- public_addr - the IP address of the mon, which must resolve to the same address as one of the mon_host entries (required)
- authentication_type - auth mode can be either none or cephx (optional defaults to cephx)
- key - the mon. user key (optional defaults to undef)
- keyring - the path of the temporary keyring (optional defaults to undef)

add a [mon.$id] section via ceph_config with

    [mon.$id]
    public_addr = $public_addr
    mon_data = /var/lib/ceph/mon/$cluster-$id

if auth == cephx:
- keyring and key are mutually exclusive
- if the mon. key is specified it needs to be set by the user to be a valid ceph key. The documentation should contain an example key and explanations about how to create an auth key. The key is written to a temporary keyring file that is given in argument to ceph-mon --keyring tmpfile --mkfs and deleted afterwards ( it is copied in the mon file tree ).
- if the keyring is specified it is expected to exist on the node and is used as an argument to ceph-mon --keyring $keyring --mkfs. The puppetmaster does not have full control over the creation of this temporary keyring, which is required in setups where the puppetmaster is not trusted with secrets. ( see the CERN requirements above ).
- writes the keyring
the directory in which to create the mon is determined via ceph-conf
run ceph-mon --cluster $cluster --id $id --mkfs --mon-data $mon_data --public-addr $public_addr --keyring /tmp/monkeyring.tmp
runs the mon daemon
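The mkfs step above can be expressed as an argument vector. This Python sketch assumes the defaults stated in the interface (cluster ceph, mon_data under /var/lib/ceph/mon/$cluster-$id) and the flags quoted in the steps above. The function name is hypothetical. The keyring argument is either the user-supplied keyring or the temporary file holding the mon. key. The sketch simply omits --keyring when none is given, and whether that is sufficient without cephx is an assumption.

```python
from typing import List, Optional


def ceph_mon_mkfs_args(mon_id: str, public_addr: str, keyring: Optional[str] = None,
                       cluster: str = 'ceph') -> List[str]:
    """Build the ceph-mon --mkfs command line used to initialise a monitor."""
    mon_data = f'/var/lib/ceph/mon/{cluster}-{mon_id}'
    args = ['ceph-mon', '--cluster', cluster, '--id', mon_id, '--mkfs',
            '--mon-data', mon_data, '--public-addr', public_addr]
    if keyring is not None:  # only added when a keyring is supplied
        args += ['--keyring', keyring]
    return args


print(' '.join(ceph_mon_mkfs_args('a', '192.168.0.10', '/tmp/monkeyring.tmp')))
```

Building a list rather than a single string avoids shell quoting issues when the arguments are passed to an exec.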

See mon configuration reference

rbd

proposed name: ceph::rbd
purpose: maps and mounts an RBD image, taking care of dependencies (packages, rbd kernel module, /etc/ceph/rbdmap, fstab)
interface:
- name - the name of the image
- pool - the pool in which the image is
- mount_point - where the image will be mounted
- key - the secret key for the id user
- id - the id of the user

David Moreau Simard (talk) Should ceph::client be a dependency ?

cephfs

proposed name: ceph::cephfs
purpose: mounts a cephfs filesystem, taking care of dependencies (e.g, fstab, packages)
interface:
- Lots - See http://ceph.com/docs/next/man/8/mount.ceph/

David Moreau Simard (talk) Should ceph::client be a dependency ?

Implementor components

These components are dependencies of the Puppet user components and can be used by other components. They should form a library containing the code common to at least two independent components (think OpenStack and Cloudstack).

ceph

The top level class found in init.pp

proposed name: ceph
purpose: should ultimately be a small class that takes care of installing/configuring the dependencies common to all classes.
interface:
- ?

params

proposed name: ceph::params
purpose: A class that is used to store variables, likely defaults and/or constants, to be used in various classes
interface:
- None ?

repository

Inspired by openstack::repo.

proposed name: ceph::repo
purpose: use puppetlabs/apt to configure the official ceph repository so we can install ceph packages
interface:
- release: target ceph release (cuttlefish, dumpling, etc)

ceph client implementation

proposed name: ceph::client
purpose: sets up /etc/ceph/ceph.conf to connect to the Ceph cluster and installs the ceph CLI
interface:
- monitor_ips - list of ip addresses used to connect to the monitor servers
- client_id - name of the client, used to find the matching key
- keypath - path to the clients key file

key

Keyring management, authentication. It would be a class to create keys for new users (e.g. a user that can create RBDs or use the Objectstore) which may require special access rights. But would also be used by the other classes like ceph::mon or ceph::osd to place e.g. the shared 'client.admin' or 'mon.' keys.

proposed name: ceph::key
purpose: handles ceph keys (cephx), generates keys, creates keyring files, inject keys into or delete keys from the cluster/keyring via ceph and ceph-authtool tools.
interface:
- secret - key secret
- keyring_path - path to the keyring
- cap_mon/cap_osd/cap_mds - cephx capabilities
- user/group/mode: settings for the keyring file if needed
- inject - options to inject a key into the cluster

See key.pp for an example implementation of this semantic.
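Generating a secret, which ceph::key is expected to do, does not require a running cluster. A cephx secret is the base64 encoding of a small little-endian header (key type 1 for AES, creation time in seconds and nanoseconds, secret length) followed by 16 random bytes. The Python sketch below follows the format produced by ceph-authtool --gen-key, as commonly reproduced outside Ceph. Treat it as a convenience for tests and documentation examples, and prefer ceph-authtool itself in production. The function name is hypothetical.

```python
import base64
import os
import struct
import time


def generate_cephx_secret() -> str:
    """Return a base64 cephx secret: 12-byte header + 16 random bytes."""
    secret = os.urandom(16)
    # u16 type (1 = AES), u32 created seconds, u32 created nanoseconds, u16 length
    header = struct.pack('<HIIH', 1, int(time.time()), 0, len(secret))
    return base64.b64encode(header + secret).decode('ascii')


key = generate_cephx_secret()
print(key)  # 40 characters starting with 'AQ', like the keys used in this page
```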

pool

proposed name: ceph::pool
purpose: manage operations on the pools in the cluster such as: create/delete pools, set PG/PGP number
interface:
- pool_name - name of the pool
- create - whether to create a new pool
- delete - whether to delete an existing pool
- pg_num - number of Placement Groups (PGs) for a pool, if the pool already exists this may increase the number of PGs if the current value is lower
- pgp_num - same as for pg_num
- replica_level - increase or decrease the replica level of a pool
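The pool interface mostly translates into ceph osd pool commands. The Python sketch below plans them for one pool, given its current state (None when the pool does not exist yet). It assumes the current values are read elsewhere, treats create as implied when the pool is missing, only ever raises pg_num/pgp_num as the interface describes, and maps replica_level onto the pool size setting. These mappings and the function name are assumptions for illustration.

```python
from typing import Dict, List, Optional


def plan_pool(name: str, current: Optional[Dict[str, int]], pg_num: int,
              pgp_num: Optional[int] = None, size: Optional[int] = None,
              delete: bool = False) -> List[List[str]]:
    """Return the ceph CLI commands needed to reach the requested pool state."""
    pgp_num = pgp_num or pg_num  # pgp_num follows pg_num unless given
    if delete:
        if current is None:
            return []
        return [['ceph', 'osd', 'pool', 'delete', name, name,
                 '--yes-i-really-really-mean-it']]
    cmds: List[List[str]] = []
    if current is None:
        cmds.append(['ceph', 'osd', 'pool', 'create', name, str(pg_num), str(pgp_num)])
        current = {'pg_num': pg_num, 'pgp_num': pgp_num}
    for setting, wanted in (('pg_num', pg_num), ('pgp_num', pgp_num)):
        if wanted > current[setting]:  # PG counts are only ever increased
            cmds.append(['ceph', 'osd', 'pool', 'set', name, setting, str(wanted)])
    if size is not None and size != current.get('size'):
        cmds.append(['ceph', 'osd', 'pool', 'set', name, 'size', str(size)])
    return cmds


for cmd in plan_pool('volumes', {'pg_num': 64, 'pgp_num': 64, 'size': 2}, 128, size=3):
    print(' '.join(cmd))
```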

OpenStack components

Ceph specific configuration for cinder/glance is already provided by the puppet-cinder and puppet-glance modules in the volume/rbd and backend/rbd classes. RGW Keystone integration is covered below.

--xarses (talk) RGW keystone should be included in the ceph module as RGW is the consumer of the keystone service. Unlike cinder/glance where they are consumers of ceph.

RadosGW components

The RadosGW is developed as an integral part of Ceph. It is however not required to deploy a cluster and should be treated like any other client application of the cluster.

rgw

proposed name: ceph::rgw
purpose: configures a ceph radosgw, sets up /etc/ceph/ceph.conf with the MONs' IPs, and optionally sets the key allowing the radosgw to connect to the MONs
interface:
- monitor_ips - list of ip addresses used to connect to the monitor servers
- key - the secret key for the id user
- id - the id of the user
- rgw_data - the path where the radosgw data should be stored
- fcgi_file - path to the fcgi file e.g. /var/www/s3gw.fcgi

Danny Al-Gaaf (talk) the monitor_ips are not needed: IMO ceph::conf should provide this information to all other components

--xarses (talk) agree with Danny; we are missing:

- user - user to run radosgw as, which also owns its files
- host - hostname for this ini section
- keyring_path - path to key file
- log_file - where to write logs to
- rgw_dns_name - dns name (may include wildcard ) to use with s3 api calls
- rgw_socket_path - path to socket file
- rgw_print_continue - (bool) if we are going to send 100 codes to the client

--xarses (talk) We should also include the Apache magic here to set up the vhost and script server, in which case we should also support a *port* param.

rgw keystone

proposed name: ceph::rgw::keystone
purpose: extends radosgw configuration to be able to retrieve auth from keystone tokens and setup keystone endpoint
interface:
- rgw_keystone_url - the internal or admin url for keystone
- rgw_keystone_admin_token - the admin token for keystone
- rgw_keystone_accepted_roles - which roles should we accept from keystone
- rgw_keystone_token_cache_size - how many tokens to keep cached, not useful if not using PKI as every token is checked
- rgw_keystone_revocation_interval - interval to check for expired tokens, not useful if not using PKI tokens (if not, set to high value)
- use_pki - (bool) to determine if keystone is using token_format = PKI and if so do PKI signing parts
- nss_db_path - path to the NSS db files used to verify keystone PKI tokens
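Most ceph::rgw::keystone parameters map one to one onto rgw_keystone_* settings in the radosgw section of ceph.conf. The Python sketch below builds that mapping. For non-PKI deployments, it follows the advice above and pushes the revocation interval to a high value. The one-year value and the helper name are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Union

ONE_YEAR = 365 * 24 * 3600  # hypothetical "high value" for non-PKI setups


def rgw_keystone_settings(url: str, admin_token: str, accepted_roles: List[str],
                          use_pki: bool, token_cache_size: int,
                          revocation_interval: int,
                          nss_db_path: Optional[str] = None) -> Dict[str, Union[str, int]]:
    """Map ceph::rgw::keystone parameters onto radosgw configuration settings."""
    settings: Dict[str, Union[str, int]] = {
        'rgw_keystone_url': url,
        'rgw_keystone_admin_token': admin_token,
        'rgw_keystone_accepted_roles': ', '.join(accepted_roles),
        'rgw_keystone_token_cache_size': token_cache_size,
        'rgw_keystone_revocation_interval': revocation_interval,
    }
    if use_pki:
        if not nss_db_path:
            raise ValueError('nss_db_path is required when keystone uses PKI tokens')
        settings['nss_db_path'] = nss_db_path
    else:
        # Without PKI every token is checked against keystone anyway, so
        # polling the revocation list is pointless: push it far out.
        settings['rgw_keystone_revocation_interval'] = ONE_YEAR
    return settings


print(rgw_keystone_settings('http://keystone:35357', 'secret', ['Member', 'admin'],
                            use_pki=False, token_cache_size=500, revocation_interval=600))
```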

rgw_user

proposed name: ceph::rgw_user
purpose: create/remove users and Swift users for the RadosGW S3/Swift API
interface:
- user - username
- key - secret key (could get generated if needed)
- swift_user - username for the Swift API user
- swift_key - secret key for the Swift API user
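ceph::rgw_user would most likely drive radosgw-admin. The Python sketch below builds the command lines for an S3 user plus an optional Swift subuser. The '<user>:<swift_user>' subuser naming and the full access level follow common radosgw documentation examples but are assumptions here, as is the helper name. Reusing the user name as the display name is also an assumption.

```python
from typing import List, Optional


def rgw_user_commands(user: str, key: Optional[str] = None,
                      swift_user: Optional[str] = None,
                      swift_key: Optional[str] = None) -> List[List[str]]:
    """Build radosgw-admin invocations creating an S3 user and optional Swift subuser."""
    create = ['radosgw-admin', 'user', 'create', f'--uid={user}', f'--display-name={user}']
    if key:
        create.append(f'--secret={key}')  # assumption: keys are generated when omitted
    cmds = [create]
    if swift_user:
        subuser = f'{user}:{swift_user}'
        cmds.append(['radosgw-admin', 'subuser', 'create', f'--uid={user}',
                     f'--subuser={subuser}', '--access=full'])
        cmds.append(['radosgw-admin', 'key', 'create', f'--subuser={subuser}',
                     '--key-type=swift',
                     f'--secret={swift_key}' if swift_key else '--gen-secret'])
    return cmds


for cmd in rgw_user_commands('johndoe', swift_user='swift'):
    print(' '.join(cmd))
```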

Related tools and implementations

deploy ceph : ceph-deploy

 for test / POC purposes
 https://github.com/ceph/ceph-deploy

 maintainer: Alfredo Deza

deploy ceph with puppet : puppet-cephdeploy

 relies on ceph-deploy
 https://github.com/dontalton/puppet-cephdeploy/

 maintainer: Don Talton

deploy ceph with puppet : puppet-ceph

 developed in 2012 but still useful, upstream
 https://github.com/enovance/puppet-ceph

 maintainer: community

 fork of puppet-ceph, updated recently
 https://github.com/TelekomCloud/puppet-ceph/tree/rc/eisbrecher
 handling of secrets
 https://github.com/TelekomCloud/puppet-secret
 https://github.com/TelekomCloud/puppet-ceph/blob/rc/eisbrecher/examples/example-site.pp
 maintainer: Deutsche Telekom AG (DTAG)

 another fork of puppet-ceph, uses disks by-path links, includes rgw support and removes secrets/keys from puppet
 https://github.com/cernceph/puppet-ceph
 maintainer: CERN

ceph + openstack : ceph docs

 manual integration
 http://ceph.com/docs/next/rbd/rbd-openstack/
 maintainer: John Wilkins + Josh Durgin

ceph + openstack with puppet : stackforge

 https://github.com/stackforge/puppet-glance/blob/stable/grizzly/manifests/backend/rbd.pp
 https://github.com/stackforge/puppet-cinder/blob/stable/grizzly/manifests/volume/rbd.pp

 maintainer: community

ceph + openstack with puppet : COI

 targeting Cisco use case
 https://github.com/CiscoSystems/puppet-coe/tree/grizzly/manifests/ceph
 http://docwiki.cisco.com/wiki/OpenStack:Ceph-COI-Installation

 maintainer : Don Talton + Robert Starmer

ceph + openstack with puppet : mirantis

 in the context of Fuel
 https://github.com/Mirantis/fuel/tree/master/deployment/puppet/ceph
 https://github.com/Mirantis/fuel/blob/master/deployment/puppet/cinder/manifests/volume/ceph.pp
 https://github.com/Mirantis/fuel/blob/master/deployment/puppet/glance/manifests/backend/ceph.pp

 maintainer : Andrew Woodward

openstack with puppet : openstack-installer

 data driven approach to deploy OpenStack
 https://github.com/CiscoSystems/openstack-installer/

 maintainer: Robert Starmer + Dan Bode

Previous revision as of 17:06, 30 March 2014 by Dachary (Roadmap); latest revision as of 21:17, 2 April 2015 by Mgagne (page moved from Puppet-openstack/ceph-blueprint to Puppet/ceph-blueprint).