Octavia/Non-arbitrary Decisions

= Octavia Non-Arbitrary Decisions =

In the course of writing code for the Octavia project, developers will often encounter several different (but often equally viable) ways to accomplish the same thing. Most of the time, developers should feel free to choose arbitrarily, as long as they follow the best practices set forth in the HACKING guide for the project.

However, some choices that seem arbitrary at first glance turn out not to be arbitrary once more is understood about the problem. (That is to say, reasons emerge which make two roughly equivalent but not identical ways of accomplishing the same thing... less equivalent.) Evaluating these choices often sparks debate on the mailing list, in the IRC channel, in Gerrit review comments, etc.

To avoid rehashing the same discussions, we have created this page to document non-arbitrary decisions made in the course of creating and maintaining Octavia. This does not mean these decisions can never be re-evaluated. It means we should re-evaluate them only when new information adds to the previous discussion, or when the circumstances or assumptions behind the decision have changed. Before objecting to something, consult this page to make sure you are not repeating a discussion we have already had (to save everyone some time).

If a decision sparks a debate on the mailing list or elsewhere that takes more than a few minutes to resolve, please document it here and save your fellow teammates some time!

Now, on to the decisions:

Description
For the Octavia Nova nodes that perform the actual load balancing, the initial reference implementation will use haproxy. But the question remains: should we run one haproxy instance per listening service (listener), or one haproxy instance per loadbalancer (VIP)?

Why does this matter?

 * This decision affects how individual haproxy configuration files are created and manipulated, how statistics and status are evaluated, and the kind of experience visible to the end user.
 * This decision is unlikely to be altered later, both because it represents a paradigm around service delivery and because it affects enough subtle areas of code that changing it at a later date is likely to be troublesome.
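To make the effect on configuration files concrete, here is a minimal Python sketch of the two models. It is not Octavia code: the `Listener` and `LoadBalancer` classes, the file names, and the addresses are hypothetical, and the rendered output is not a complete production configuration. As a simplification, it treats the per-loadbalancer model as having a single shared 'defaults' section, which is how the arguments below describe it.

```python
from dataclasses import dataclass, field


@dataclass
class Listener:
    """Hypothetical model of one listening service on a load balancer."""
    name: str
    port: int
    members: list = field(default_factory=list)   # back-end "ip:port" strings
    defaults: dict = field(default_factory=dict)  # per-listener 'defaults' overrides


@dataclass
class LoadBalancer:
    """Hypothetical model of a load balancer (one VIP, many listeners)."""
    name: str
    vip: str
    listeners: list = field(default_factory=list)
    defaults: dict = field(default_factory=dict)  # shared 'defaults' keywords


def _defaults_section(settings):
    # Render a haproxy 'defaults' section from keyword/value pairs.
    return ["defaults"] + [f"    {key} {value}" for key, value in settings.items()]


def _proxy_sections(vip, listener):
    # Render one frontend and its backend for a single listener.
    lines = [
        f"frontend {listener.name}",
        f"    bind {vip}:{listener.port}",
        f"    default_backend {listener.name}_pool",
        f"backend {listener.name}_pool",
    ]
    lines += [f"    server member{i} {addr}" for i, addr in enumerate(listener.members)]
    return lines


def render_per_loadbalancer(lb):
    """One file (one haproxy process) per load balancer: listeners share 'defaults'."""
    lines = _defaults_section(lb.defaults)
    for listener in lb.listeners:
        lines += _proxy_sections(lb.vip, listener)
    return {f"{lb.name}.cfg": "\n".join(lines) + "\n"}


def render_per_listener(lb):
    """One file (one haproxy process) per listener: each gets its own 'defaults'."""
    files = {}
    for listener in lb.listeners:
        settings = {**lb.defaults, **listener.defaults}
        lines = _defaults_section(settings) + _proxy_sections(lb.vip, listener)
        files[f"{lb.name}-{listener.name}.cfg"] = "\n".join(lines) + "\n"
    return files


# Example data (all values are made up for illustration).
lb = LoadBalancer(
    name="lb1",
    vip="203.0.113.10",
    defaults={"mode": "http", "timeout client": "50s"},
    listeners=[
        Listener("web", 80, ["10.0.0.11:8080", "10.0.0.12:8080"]),
        Listener("stream", 8443, ["10.0.0.21:9000"],
                 defaults={"mode": "tcp", "timeout client": "5m"}),
    ],
)

print(sorted(render_per_loadbalancer(lb)))
print(sorted(render_per_listener(lb)))
```

With the example data, the first model yields a single file containing both frontends, while the second yields one file per listener, and only the second carries the "stream" listener's own mode and client timeout.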

IN FAVOR of one haproxy process per loadbalancer

 * Reduced memory overhead
 * Reduced CPU overhead
 * Decreased load balancer build time
 * Reduced network traffic and health monitoring overhead
 * Allows sharing of back-end pools
 * Allows single, unified log (simpler log aggregation, too)
 * "Shared fate" of all listeners on a single load balancer (customers expect this)
 * Fewer TCP ports in use
 * Single haproxy process per octavia VM is a "simpler" set-up, and we can utilize standard OS init scripts for process management.

Counterpoints to the above

 * Reduction in memory and CPU overhead may actually be insignificant; benchmarks should be run.
 * Increased load balancer build time is insignificant (milliseconds for a process that will exist for months to years)
 * Reduction in network traffic due to fewer configuration files is also insignificant
 * Our models do not allow for sharing of back-end pools anyway. And if/when they do, sharing pools between listeners will be a rare edge case.
 * Multiple processes can also use a unified log. (In fact, multiple processes allows for more flexibility here.) Log aggregation is equivalently easy with multiple haproxy processes.
 * "Shared fate" is otherwise known as "no fault isolation" which is actually a bad thing.
 * Same number of TCP ports will be used for either solution
 * We were never talking about a single haproxy process per Octavia VM. We were talking about a single haproxy process per loadbalancer, and an Octavia VM may have many loadbalancers, so we can't use the default OS init scripts anyway. There is also no reason we couldn't write OS init scripts (i.e. systemd process management) for each listener as well.

IN FAVOR of one haproxy process per listener

 * Single process per listener is more flexible than single process per loadbalancer because:
 ** Multiple log files can be used (and multiple log verbosity levels) if desired
 ** haproxy keywords that belong in 'defaults' section can differ from listener to listener (example: keepalive configuration and timeouts)
 * Simpler haproxy configuration templates
 * Fault isolation between listeners
 * Reduced service interruptions due to normal configuration changes
 * Simpler to parse usage / stats data per listener
 * Simpler to parse operational status per listener
 * SLAs are equivalent between models
 * Troubleshooting is simpler with one haproxy process per listener because:
 ** Operator can easily see resource usage of individual listeners with standard OS tools
 ** Operator can start / stop / etc. single listener without affecting other listeners on same loadbalancer
 ** Operator can alter global logging configuration for one listener without affecting other listeners
 ** Same for any other parameters in 'defaults' section of configuration
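The operational points above (starting, stopping, or reconfiguring one listener without touching its neighbours) can be sketched as follows. This is an illustration under stated assumptions, not how Octavia manages processes: the directory layout, the base path, and the helper names are hypothetical. The haproxy options used (`-f`, `-D`, `-p`, `-sf`) are standard command-line flags, and the functions only build the argument lists rather than running anything.

```python
from pathlib import Path


def listener_paths(base_dir, listener_id):
    """Return (config, pidfile) paths for one listener (hypothetical layout)."""
    listener_dir = Path(base_dir) / listener_id
    return listener_dir / "haproxy.cfg", listener_dir / "haproxy.pid"


def start_command(base_dir, listener_id):
    """Build the argv that starts a dedicated haproxy process for one listener."""
    cfg, pid = listener_paths(base_dir, listener_id)
    # -f: configuration file, -D: run as a daemon, -p: write PIDs to this file
    return ["haproxy", "-f", str(cfg), "-D", "-p", str(pid)]


def reload_command(base_dir, listener_id, old_pids):
    """Build the argv that reloads one listener's configuration.

    -sf asks the listed old processes to finish their existing connections
    and then exit, so only this listener's process is replaced.
    """
    return start_command(base_dir, listener_id) + ["-sf"] + [str(p) for p in old_pids]


# Example usage; the base directory and listener IDs are made up.
print(start_command("/var/lib/octavia", "listener-a"))
print(reload_command("/var/lib/octavia", "listener-a", [101, 102]))
```

Because each listener has its own configuration file and PID file, a reload for "listener-a" never references another listener's process, which is the fault-isolation property argued for above.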

Counterpoints to the above

 * None offered

Where did the discussion happen?

 * Mailing list: http://lists.openstack.org/pipermail/openstack-dev/2014-August/043596.html
 * Weekly IRC meeting: http://eavesdrop.openstack.org/meetings/octavia/2014/octavia.2014-08-27-20.00.log.txt

Additional research notes

 * Benchmarks (run by German Eichberger @ HP): https://etherpad.openstack.org/p/Octavia_LBaaS_Benchmarks
 * Benchmark results showed no significant performance or resource usage difference between running 1 haproxy per listener versus 1 haproxy per loadbalancer in testing.

Decision reached
Vote in IRC meeting on 2014-08-27 was IN FAVOR of one haproxy process per listener.

Rescinded
Due to the memory overhead and the shared fate between the HAProxy processes, we have rescinded this decision and moved to a single-process model. See: https://storyboard.openstack.org/#!/story/2005412

Description
In the original Octavia design documents, the "thing which does the actual load balancing" was referred to as the "Octavia VM." At some point, someone pointed out that this thing might not actually be a virtual machine because the role might be filled by a container. This sparked discussion in IRC and elsewhere, and it became clear we needed to come up with another name.

Why does this matter?

 * Code depends on the name we choose here, and changing it is not trivial once a significant amount of code uses it.
 * Using common terminology is extremely important to ensure we all understand each other.
 * While what we call this thing is mostly arbitrary, what matters more is that we pick a name and stick with it.

Summary of primary arguments from each side of debate
There were many name ideas suggested, and no clear "sides" of this debate. But concerns raised by various parties include:
 * Don't want to use something too commonly used
 * Try to avoid names with preconceived meanings, especially meanings in the OpenStack or Python coding world ("instance" is a terrible name, for example)
 * Name should represent the idea of a virtual machine / container / host / appliance in some way
 * Name should have a plural or group representation for when we group these things together (e.g. "bee / swarm")

Where did the discussion happen?

 * IRC: (see discussion between 2014-09-01 and 2014-09-05): http://eavesdrop.openstack.org/irclogs/%23openstack-lbaas/
 * Octavia IRC meeting: http://eavesdrop.openstack.org/meetings/octavia/2014/octavia.2014-09-03-20.00.log.html
 * Voting etherpad: https://etherpad.openstack.org/p/octavia-backend-name

Decision reached
We voted via etherpad on 2014-09-03 and 2014-09-04, and the name that won was: amphora