TripleO/TripleOCloud
Draft
This is a draft document about an experiment we're considering! It describes an intent, not reality.
Overview
TripleO developers run a production-ready cloud as part of ensuring they are delivering something usable. We treat this as a production CD environment: the Kanban boards for the folk working on it are at https://trello.com/b/0jIoMrdo/tripleo and https://trello.com/tripleo. We iterate and evolve in response to OpenStack as a whole, while improving the deployment mechanisms and code. Note: Trello is just *a* kanban to store the operational status: an arbitrary choice to let us experiment with the setup, and subject to change if this works.
tripleo-cd-admins team
Direct access to the hardware needs to be kept to a balanced team: too few members and they become a bottleneck, too many and it becomes a security/reliability concern. Any TripleO ATC can apply to be a member, and existing members will vote, with the PTL having a veto (but not an override). Membership of the tripleo-cd-admins team is the determining factor for all access that is equivalent to 'run arbitrary code outside of KVM/Xen'. Having this set be decoupled and manually reviewed is part of the security policy on the machines, which we can't trivially change: the machines were provisioned with the expectation of non-HP staff administering them, but those staff are expected to be known quantities.
The current list of members is maintained in the incubator tree [1]. Any TripleO ATC can request access by submitting a review adding themselves, but access won't take effect immediately, as nova keypairs are not synced out automatically. Only existing tripleo-cd-admins should approve changes to the tripleo-cd-admins list.
Access rules
We expect http://ci.openstack.org/sysadmin.html#ssh-access (or some to-be-defined subset) to be followed: this is a production, operational environment.
https://docs.google.com/spreadsheet/ccc?key=0AlLkXwa7a4bpdERqN0p5RjNMQUJJeDdhZ05fRVUxUnc&usp=sharing has the undercloud admin credentials; please only use in emergencies - normal operation should use dedicated user codes. Access to the spreadsheet is given when joining the team.
Workflow
Basic principles:
- unblock bottlenecks first, then unblock everyone else.
- folk are still self-directed (it's open source), but clear visibility of the work needed and its relevance to the product *right now* is crucial info for people to make good choices. (Similarly, Mark McLoughlin was asking about what bugs to work on, which is a symptom of me/us failing to provide clear visibility.)
- clear communication about TripleO and plans / strategy and priority (Datacentre ops? Continuous deployment story?)
Reviews are king
Since we want to minimise the number of folk with root access to the cluster, we need to ensure that most/all changes can be done entirely via code review or non-root-equivalent APIs. Latency in review will drive WIP higher.
Roadmap
The TripleO roadmap should live in the roadmap Kanban board: it's where new initiatives will be drawn down from. At any point in time there should be just one conceptual goal on the roadmap, at least until we've eliminated stalls and driven inventory down.
TripleO WIP
The current TripleO team (== folk working on TripleO, regardless of organisation) tracks what is being worked on concurrently in the TripleO Kanban board. The following guidelines will help the team understand what things are progressing and what things are stalled at any point in time:
- any one individual should only ever be working on one card: if one yak shaves, one should clearly show what's being worked on.
- multiple people can be working on the same card: dogpiling and collaboration are fine - and a good thing
- lane priority is right-to-left: moving the cards that are closest to the right side (done) is more useful than moving cards that are further away: the goal is to reduce work in progress and deliver rapidly.
- cards can be any size/scope, but the ideal, similar to code review, is that they should be a single conceptual goal: there may be multiple steps in a single card.
- anyone can split/combine cards as needed to make the board communicate our WIP more usefully.
Bugs
TripleO still uses bugs: issues in the code, and things that users would reasonably like to achieve that involve code changes, are best represented as bugs. Kanban cards can and should link to bugs. We don't require a bug for every code change, but if there is a code change that you aren't going to do right now but which should be captured, a bug is the right place.
Use bugs for things which need code changes.
Blueprints
TripleO still uses blueprints: they are the approval mechanism for new items on the roadmap. Design work for them should still be done in etherpads.
Use blueprints to get new items into the roadmap.
Etherpads
TripleO still uses etherpads: they are great for collaborative editing of plans/designs, and tracking work being done on a card in a lightweight accessible fashion.
Use etherpads for the status of, and design for, cards.
Seed VM host
The seed VM host is a manually installed machine, reachable over SSH. Until we've done an audit for prior data, access to the machine is restricted to HP employees who are TripleO ATCs. (It's also got some weird networking shit, avoid it!)
Seed
The seed is a VM provisioned via boot-seed-vm on the Seed VM host, reachable over SSH from other machines in the cloud. Access is available to any tripleo-cd-admins member.
Undercloud
The API endpoint for the undercloud that deploys the cloud is: https://cd-undercloud.tripleo.org:13000/v2.0. SSH credentials are available from Robert Collins (access is limited to tripleo-cd-admins).
API Access
API access to the undercloud is available to tripleo-cd-admins, as API access permits deploying arbitrary code to the physical machines. Credentials are set up when joining tripleo-cd-admins, and removed when leaving the team.
The undercloud can deploy machines to the range 10.10.16.171 to 10.10.16.188 today (limited due to hardware availability).
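As a rough illustration of how large that range is, the sketch below enumerates it with Python's standard ipaddress module. It assumes both bounds are inclusive, which the text does not state; the function names are hypothetical and not part of any TripleO tooling.

```python
import ipaddress

# Bounds quoted in the text; treating both ends as inclusive is an assumption.
FIRST = ipaddress.IPv4Address("10.10.16.171")
LAST = ipaddress.IPv4Address("10.10.16.188")


def deployable_addresses(first=FIRST, last=LAST):
    """Return every address from first to last, inclusive."""
    return [ipaddress.IPv4Address(n) for n in range(int(first), int(last) + 1)]


def in_deploy_range(address, first=FIRST, last=LAST):
    """Check whether an address falls inside the inclusive range."""
    return first <= ipaddress.IPv4Address(address) <= last


addresses = deployable_addresses()
print(len(addresses))  # 18
```

Under that assumption the undercloud has room for 18 machines today.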
Updates to software
We can't deploy the undercloud automatically yet, so it's basically static.
To update e.g. nova:
    cd /opt/stack/nova
    git review -d <somereviewwitheverythingwewantinit>
    /opt/stack/venvs/nova/bin/pip install .
    os-collect-config --one --force
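If the same manual steps are needed for other services, they could be generated rather than retyped. The sketch below only builds and prints the command lines; it runs nothing. It assumes every project follows the nova layout shown above (source in /opt/stack/<project>, virtualenv in /opt/stack/venvs/<project>), which the text confirms only for nova. The helper names and the REVIEW_NUMBER placeholder are hypothetical.

```python
import shlex


def update_commands(project, review):
    """Build the manual update steps for one project.

    Assumes the nova layout from the text applies to any project:
    source in /opt/stack/<project>, venv in /opt/stack/venvs/<project>.
    Returns (working directory, argv) pairs in the order they should run.
    """
    source_dir = f"/opt/stack/{project}"
    pip = f"/opt/stack/venvs/{project}/bin/pip"
    return [
        (source_dir, ["git", "review", "-d", str(review)]),
        (source_dir, [pip, "install", "."]),
        (source_dir, ["os-collect-config", "--one", "--force"]),
    ]


def render(steps):
    """Render each step as a shell line, quoting arguments safely."""
    return [f"cd {shlex.quote(cwd)} && {shlex.join(argv)}" for cwd, argv in steps]


# REVIEW_NUMBER stands in for the review that has everything we want in it.
for line in render(update_commands("nova", "REVIEW_NUMBER")):
    print(line)
```

Printing the commands instead of executing them keeps a human in the loop, which matches the manual nature of undercloud updates described here.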
Hardware configuration
The machines have 24 cores, 96G of RAM and 2x2TB hard disks with a hardware RAID controller. There is a 10G Mellanox dual-port card as well, which shows up as eth2 - this is the only wired-up network port. As far as we can tell the BIOS defaults to not enabling/net-booting the 10G card, so it needs to be both enabled (via PCI devices) and have net boot turned on for it in the system BIOS setup.
Note that you'll need to watch a machine try to boot to get its MAC: assume unenrolled machines have the wrong MAC in machine-information-tab.txt. Use 'F12' right after the smart array scrolls by to force a network boot so that you don't have a bad enrollment in nova. The Mellanox nic0 address (e.g. 78:e7:d1:03:00:23:94:cd) contains the first three and last three octets of the nic1 address, which is the one we want (78:e7:d1:23:94:cd). Alternatively, configure PXE booting with 'set system1/bootconfig1/bootsource5 bootorder=1' and then watch for DHCP requests in tcpdump.
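The nic0-to-nic1 relationship described above can be sketched in a few lines of Python. This assumes the nic0 address is always eight colon-separated octets, as in the single example given; the function name is hypothetical.

```python
def nic1_mac_from_nic0(nic0_address):
    """Derive the nic1 MAC by keeping the first and last three octets of nic0.

    Assumes nic0 is eight colon-separated hex octets, as in the example
    from the text; anything else is rejected rather than guessed at.
    """
    octets = nic0_address.strip().lower().split(":")
    if len(octets) != 8:
        raise ValueError(f"expected 8 octets, got {len(octets)}")
    for octet in octets:
        if len(octet) != 2:
            raise ValueError(f"bad octet: {octet!r}")
        int(octet, 16)  # raises ValueError if the octet is not hexadecimal
    return ":".join(octets[:3] + octets[-3:])


print(nic1_mac_from_nic0("78:e7:d1:03:00:23:94:cd"))  # 78:e7:d1:23:94:cd
```

Treat the result as a candidate only: watching the machine boot remains the way to confirm the MAC before enrolling it.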
Current ad hoc variance in the undercloud
- Nova is running https://review.openstack.org/#/c/49658/ to let deploys happen
- We are manually running the tripleo-cd logic from the undercloud control plane
Overcloud
The overcloud is deployed per commit, and open access is available to any TripleO ATC: they may use the cloud as desired - having users on the cloud helps validate that the cloud is usable! File all bugs on the TripleO bug tracker. Access may also be granted to people who are not TripleO ATCs at the TripleO PTL's discretion.
API Access
Contact anyone in tripleo-cd-admins for API credentials to the overcloud.
SSH Access
This is granted via membership in tripleo-cd-admins, as bare metal access permits running anything at all in that environment. The technical implementation is that SSH keys are provisioned via the 'admin/default' key-pair in the undercloud (which is manually synced with tripleo-cd-admins).