TripleO/TripleOCloud

Revision as of 03:13, 7 October 2013 by Lifeless (talk | contribs) (access rules)

Draft

This is a draft document about an experiment we're considering! It describes an intent, not reality.

Overview

TripleO developers run a production-ready cloud as part of ensuring they are delivering something usable. We treat this as a production CD environment: the Kanban boards for the folk working on it are at https://trello.com/b/0jIoMrdo/tripleo and https://trello.com/tripleo. We iterate and evolve in response to OpenStack as a whole, while improving the deployment mechanisms and code. Note: Trello is just one Kanban tool for recording the operational status. It is an arbitrary choice that lets us experiment with the setup, and is subject to change if this works.

tripleo-cd-admins team

Direct access to the hardware needs to be kept to a balanced team: too few members and they become a bottleneck, too many and it becomes a security / reliability concern. Any TripleO ATC can apply to be a member, and existing members will vote, with the PTL having a veto (but not an override). Membership of tripleo-cd-admins determines all access that is equivalent to 'run arbitrary code outside of kvm/Xen'. Keeping this set decoupled and manually reviewed is part of the security policy on the machines, and we can't trivially change it: the machines were provisioned with the expectation of non-HP staff administering them, but that set of people is expected to be known quantities.

The current list of members is maintained in the incubator tree [1]. Any TripleO ATC can request access by submitting a review adding themselves, but access won't take effect immediately, as nova keypairs are not synced out automatically. Only existing tripleo-cd-admins members should approve changes to the tripleo-cd-admins list.

access rules

We expect http://ci.openstack.org/sysadmin.html#ssh-access (or some to-be-defined subset) to be followed: this is a production, operational environment.

https://docs.google.com/spreadsheet/ccc?key=0AlLkXwa7a4bpdERqN0p5RjNMQUJJeDdhZ05fRVUxUnc&usp=sharing has the undercloud admin credentials; please use them only in emergencies, as normal operation should use dedicated user codes. Access to the spreadsheet is given when joining the team.

Workflow

Basic principles:

  • unblock bottlenecks first, then unblock everyone else.
  • folk are still self directed - it's open source - but clear visibility of work needed and its relevance to the product *right now* is crucial info for people to make good choices. (Similarly, Mark McLoughlin was asking about what bugs to work on, which is a symptom of me/us failing to provide clear visibility.)
  • clear communication about TripleO and plans / strategy and priority. (Datacentre ops? Continuous deployment story?)

Reviews are king

Since we want to minimise the number of folk with root access to the cluster, we need to ensure that most/all changes can be done entirely via code review or non-root-equivalent APIs. Latency in review will drive WIP higher.

Roadmap

The TripleO roadmap should live in the roadmap Kanban board: it's where new initiatives will be drawn down from. At any point in time there should be just one conceptual goal on the roadmap, at least until we've eliminated stalls and driven inventory down.

TripleO WIP

The current TripleO team (== folk working on TripleO, regardless of organisation) track what is being worked on concurrently in the TripleO Kanban board. The following guidelines will help the team understand what things are progressing and what things are stalled at any point in time:

  • any one individual should only ever be working on one card: if one yak shaves, one should clearly show what's being worked on.
  • multiple people can be working on the same card: dogpiling and collaboration are fine - and a good thing.
  • lane priority is right-to-left: moving the cards that are closest to the right side (done) is more useful than moving cards that are further away: the goal is to reduce work in progress and deliver rapidly.
  • cards can be any size/scope, but the ideal, similar to code review, is that they should be a single conceptual goal: there may be multiple steps in a single card.
  • Anyone can split/combine cards as needed to make the board communicate our WIP more usefully.
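
The right-to-left lane priority above can be sketched in a few lines of Python. This is only an illustration: the lane names and the `Card` type are hypothetical, since the page does not list the lanes on the real board, and nothing here talks to Trello.

```python
from dataclasses import dataclass

# Hypothetical lane names, ordered left to right; the real board may differ.
LANES = ["todo", "in progress", "review", "deploying", "done"]


@dataclass
class Card:
    title: str
    lane: str


def next_card_to_move(cards):
    """Return the unfinished card closest to the right-hand (done) lane, or None."""
    unfinished = [c for c in cards if c.lane != LANES[-1]]
    if not unfinished:
        return None
    # max() returns the first card on ties, so board order breaks ties.
    return max(unfinished, key=lambda c: LANES.index(c.lane))
```

Given a board with cards in "todo" and "review", this picks the "review" card, which matches the goal of reducing work in progress before starting anything new.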

Bugs

TripleO still uses bugs: issues in the code, and things that users would reasonably like to achieve that involve code changes, are best represented as bugs. Kanban cards can and should link to bugs. We don't require a bug for every code change, but if there is a code change that you aren't going to do right now and which should be captured, a bug is the right place.

use bugs for things which need code changes.

Blueprints

TripleO still uses blueprints: they are the approval mechanism for new items on the roadmap. Design work for them should still be done in etherpads.

use blueprints to get new items into the roadmap.

etherpads

TripleO still uses etherpads: they are great for collaborative editing of plans/designs, and tracking work being done on a card in a lightweight accessible fashion.

use etherpads for status of and design for cards

Seed VM host

The seed VM host is a manually installed machine, reachable over ssh. Until we've done an audit for prior data, access to the machine is restricted to HP employees who are TripleO ATCs. (It's also got some weird networking shit, avoid it!)

Seed

The seed is a VM provisioned via boot-seed-vm on the Seed VM host, reachable over ssh from other machines in the cloud. Access is available to any tripleo-cd-admins member.

Undercloud

The API endpoint for the undercloud that deploys the cloud is: https://cd-undercloud.tripleo.org:13000/v2.0. SSH credentials are available from Robert Collins (and are limited to tripleo-cd-admins).

API Access

API access to the Undercloud is available only to tripleo-cd-admins, as API access permits deploying arbitrary code to the physical machines. Credentials are set up when joining tripleo-cd-admins, and removed when leaving the team.
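
As a rough sketch of what using those credentials looks like, the snippet below builds (but does not send) a token request against the undercloud endpoint. It assumes the endpoint is a Keystone v2.0 identity endpoint, which the /v2.0 suffix suggests but this page does not state; the user, password and tenant values in the usage note are placeholders.

```python
import json
import urllib.request

# Endpoint from this page; treating it as Keystone v2.0 is an assumption.
UNDERCLOUD_AUTH_URL = "https://cd-undercloud.tripleo.org:13000/v2.0"


def build_token_request(username, password, tenant_name,
                        auth_url=UNDERCLOUD_AUTH_URL):
    """Build a Keystone v2.0 POST /tokens request without sending it."""
    body = {
        "auth": {
            "tenantName": tenant_name,
            "passwordCredentials": {
                "username": username,
                "password": password,
            },
        }
    }
    return urllib.request.Request(
        auth_url.rstrip("/") + "/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )
```

Calling `build_token_request("alice", "secret", "demo")` returns a request object that could be passed to `urllib.request.urlopen`; in practice the usual OpenStack command line clients do the same thing from the `OS_*` environment variables.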

The undercloud can deploy machines to the range 10.10.16.171 - 10.10.16.188 today (limited due to hardware availability).
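
For a sense of how much capacity that is, here is a small sketch that enumerates the range. It assumes the two addresses are inclusive bounds of one contiguous range, which the page implies but does not spell out; the function names are made up for this example.

```python
import ipaddress

# Bounds taken from this page; inclusive on both ends is an assumption.
FIRST = ipaddress.IPv4Address("10.10.16.171")
LAST = ipaddress.IPv4Address("10.10.16.188")


def deployable_addresses(first=FIRST, last=LAST):
    """List every address from first to last, inclusive."""
    return [ipaddress.IPv4Address(n) for n in range(int(first), int(last) + 1)]


def in_deploy_range(address, first=FIRST, last=LAST):
    """Check whether a dotted-quad string falls inside the deployable range."""
    return first <= ipaddress.IPv4Address(address) <= last
```

Under that assumption the range covers 18 addresses, so `len(deployable_addresses())` is 18 and `in_deploy_range("10.10.16.189")` is false.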

Updates to software

We can't deploy the undercloud automatically yet, so it's basically static.

To update e.g. nova:

  cd /opt/stack/nova
  git review -d <somereviewwitheverythingwewantinit>
  /opt/stack/venvs/nova/bin/pip install .
  os-collect-config --one --force
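
The same manual steps can be written down as data, which makes it easier to review them before running anything on the undercloud. The sketch below only builds and prints the commands. Generalising the nova paths to other projects is an assumption, and the review number used in the usage note is a placeholder.

```python
import shlex


def update_commands(project, review):
    """Return (working directory, argv) pairs for the manual update steps."""
    src = f"/opt/stack/{project}"                 # assumed layout, as for nova
    pip = f"/opt/stack/venvs/{project}/bin/pip"   # per-project virtualenv
    return [
        (src, ["git", "review", "-d", review]),   # fetch the review to deploy
        (src, [pip, "install", "."]),             # install it into the venv
        (src, ["os-collect-config", "--one", "--force"]),  # re-apply config
    ]


def render(steps):
    """Render the steps as shell lines for a human to check or paste."""
    return [f"cd {shlex.quote(cwd)} && {shlex.join(argv)}" for cwd, argv in steps]
```

For example, `render(update_commands("nova", "12345"))` gives three lines starting with `cd /opt/stack/nova && git review -d 12345`. Running them is deliberately left to the operator.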

Hardware configuration

The machines have 24 cores, 96G of RAM and 2x2TB hard disks with a hardware RAID controller. There is a 10G Mellanox dual-port card as well, which shows up as eth2: this is the only wired-up network port. As far as we can tell the BIOS defaults to not enabling or net-booting the 10G card, so it needs to be both enabled (via pci devices) and have net boot turned on for it in the system BIOS setup.

Current adhoc variance in the undercloud

Overcloud

The overcloud is deployed per commit, and open access is available to any TripleO ATC: they may use the cloud as desired - having users on the cloud helps validate that the cloud is usable! File all bugs on the tripleo bug tracker. Access may also be granted to people who are not TripleO ATCs at the TripleO PTL's discretion.

API Access

Contact anyone in tripleo-cd-admins for API credentials to the overcloud.

SSH Access

This is granted via membership in tripleo-cd-admins, as bare metal access permits running anything at all in that environment. The technical implementation is that SSH keys are provisioned via the 'admin/default' key-pair in the undercloud (which is manually synced with tripleo-cd-admins).