Octavia Non-Arbitrary Decisions

In the course of writing code for the Octavia project, developers will often encounter situations where there are several different (but often equally viable) ways to accomplish the same thing. Most of the time, developers should feel free to make whatever choice they like, arbitrarily, as long as they follow the best practices set forth in the HACKING guide for the project.

However, it is also often the case that some choices which seem arbitrary at first glance turn out not to be arbitrary once more is understood about the problem. (That is to say, reasons emerge which make two roughly equivalent but not identical ways of accomplishing the same thing... less equivalent.) Evaluation of these choices often sparks debate on the mailing list, IRC channel, Gerrit review comments, etc.

In order to avoid rehashing the same discussions, we have created this page to document non-arbitrary decisions made in the course of creating and maintaining Octavia. This does not mean that these decisions can never be re-evaluated, but rather that we shouldn't re-evaluate them unless new information is available which adds to the previous discussion, or unless the circumstances or assumptions behind the decision have changed. Basically, consult this page to make sure the thing you object to isn't just a rehash of a discussion we have already had (to save everyone some time).

If a decision sparks a debate on the mailing list or elsewhere that takes you more than a few minutes to resolve, please document it here and save your fellow team-mates some time!

Now, on to the decisions:

One haproxy process per listener, or one haproxy process per loadbalancer? (Rescinded)

Description

For the Octavia nova nodes which perform the actual load balancing functions, the initial reference implementation will use haproxy. But the question remains: Should we use one haproxy instance per listening service, or one haproxy instance per loadbalancer (VIP)?

Why does this matter?

This decision will affect a lot of things having to do with the specific ways in which individual haproxy configuration files are created and manipulated, how statistics and status are evaluated, and the kind of experience visible to the end user.
This decision is unlikely to be altered later, both because it represents a paradigm around service delivery and because it affects enough subtle areas of code that changing it at a later date is likely to be troublesome.

Summary of primary arguments from each side of debate

IN FAVOR of one haproxy process per loadbalancer

Reduced memory overhead
Reduced CPU overhead
Decreased load balancer build time
Reduced network traffic and health monitoring overhead
Allows sharing of back-end pools
Allows single, unified log (simpler log aggregation, too)
"Shared fate" of all listeners on a single load balancer (customers expect this)
Fewer TCP ports in use
Single haproxy process per octavia VM is a "simpler" set-up, and we can utilize standard OS init scripts for process management.

Counterpoints to the above

Reduction in memory and CPU overhead may actually be insignificant; benchmarks should be run.
Increased load balancer build time is insignificant (milliseconds for a process that will exist for months to years)
Reduction in network traffic due to fewer configuration files is also insignificant
Our models do not allow for sharing of back-end pools anyway. And if/when they do, sharing pools between listeners will be a rare edge case.
Multiple processes can also use a unified log. (In fact, multiple processes allow for more flexibility here.) Log aggregation is equally easy with multiple haproxy processes.
"Shared fate" is otherwise known as "no fault isolation" which is actually a bad thing.
Same number of TCP ports will be used for either solution
We were never talking about a single haproxy process per Octavia VM. We were talking about a single haproxy process per loadbalancer (and an Octavia VM may have many loadbalancers), meaning we can't use the default OS init scripts anyway. There is also no reason we couldn't write OS init scripts (i.e. systemd process management) for each listener as well.
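
As a rough illustration of the last counterpoint, the sketch below builds the command line an init script or process manager might use to start or gracefully reload one haproxy process per listener. It is not taken from the Octavia code base: the directory, listener names, and function names are hypothetical. The only haproxy options used are -f (configuration file), -p (pid file), -D (daemon mode), and -sf (start, then ask the listed old pids to finish and exit).

```python
from pathlib import PurePosixPath

# Hypothetical location for per-listener files; not an Octavia path.
BASE_DIR = PurePosixPath("/var/lib/example-haproxy")


def start_command(base_dir, listener_id):
    """Build the argv that starts one haproxy process for one listener."""
    cfg = base_dir / f"{listener_id}.cfg"
    pid = base_dir / f"{listener_id}.pid"
    return ["haproxy", "-f", str(cfg), "-p", str(pid), "-D"]


def reload_command(base_dir, listener_id, old_pid):
    """Build the argv that starts a new process and retires the old one.

    -sf takes a list of pids, so it has to come last on the command line.
    """
    return start_command(base_dir, listener_id) + ["-sf", str(old_pid)]


# Each listener gets its own command, so one can be restarted or
# reloaded without touching the others on the same loadbalancer.
commands = {
    listener_id: start_command(BASE_DIR, listener_id)
    for listener_id in ("listener-a", "listener-b")
}
```

The same shape works for one process per loadbalancer by keying the files on a loadbalancer ID instead, which is why process management alone does not favor either model.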

IN FAVOR of one haproxy process per listener

Single process per listener is more flexible than single process per loadbalancer because:
- Multiple log files can be used (and multiple log verbosity levels) if desired
- haproxy keywords that belong in 'defaults' section can differ from listener to listener (example: keepalive configuration and timeouts)
Simpler haproxy configuration templates
Fault isolation between listeners
Reduced service interruptions due to normal configuration changes
Simpler to parse usage / stats data per listener
Simpler to parse operational status per listener
SLAs are equivalent between models
Troubleshooting is simpler with one haproxy process per listener because:
- Operator can easily see resource usage of individual listeners with standard OS tools
- Operator can start / stop / etc. single listener without affecting other listeners on same loadbalancer
- Operator can alter global logging configuration for one listener without affecting other listeners
- Same for any other parameters in 'defaults' section of configuration
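
To make the flexibility argument concrete, here is a minimal sketch of rendering one standalone haproxy configuration per listener, so that each listener carries its own 'defaults' section. The Listener class, its field names, and the example values are assumptions made for illustration; this is not the template Octavia uses.

```python
from dataclasses import dataclass


@dataclass
class Listener:
    """Hypothetical listener model; field names are illustrative only."""
    listener_id: str
    port: int
    client_timeout_ms: int
    keepalive: bool
    members: tuple  # (host, port) pairs for the back-end pool


def render_listener_config(listener):
    """Render a standalone haproxy configuration for a single listener."""
    lines = [
        "defaults",
        "    mode http",
        "    timeout connect 5000ms",
        f"    timeout client {listener.client_timeout_ms}ms",
        "    timeout server 50000ms",
        # Keepalive behaviour is decided per listener, not per loadbalancer.
        "    option http-keep-alive" if listener.keepalive
        else "    option httpclose",
        "",
        f"frontend {listener.listener_id}",
        f"    bind *:{listener.port}",
        f"    default_backend {listener.listener_id}_pool",
        "",
        f"backend {listener.listener_id}_pool",
    ]
    for index, (host, port) in enumerate(listener.members):
        lines.append(f"    server member{index} {host}:{port} check")
    return "\n".join(lines) + "\n"


# Two listeners on the same loadbalancer with different 'defaults'.
listeners = [
    Listener("listener-http", 80, 30000, True, (("10.0.0.10", 8080),)),
    Listener("listener-api", 8443, 5000, False, (("10.0.0.20", 9000),)),
]
configs = {l.listener_id: render_listener_config(l) for l in listeners}
```

Because each configuration is rendered in isolation, changing a timeout for one listener rewrites only that listener's file and affects only that listener's process.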

Counterpoints to the above

None offered

Where did the discussion happen?

Mailing list: http://lists.openstack.org/pipermail/openstack-dev/2014-August/043596.html
Weekly IRC meeting: http://eavesdrop.openstack.org/meetings/octavia/2014/octavia.2014-08-27-20.00.log.txt

Additional research notes

Benchmarks (run by German Eichberger @ HP): https://etherpad.openstack.org/p/Octavia_LBaaS_Benchmarks
- Benchmark results showed no significant performance or resource usage difference between running 1 haproxy per listener versus 1 haproxy per loadbalancer in testing.

Decision reached

Vote in IRC meeting on 2014-08-27 was IN FAVOR of one haproxy process per listener.

Rescinded

Due to the memory overhead and shared fate between the HAProxy processes, we have rescinded this decision and have now moved to a single process model. See: https://storyboard.openstack.org/#!/story/2005412

What should we call the back-end VM / container / machine / appliance / thingy?

Description

In the original Octavia design documents, the "thing which does the actual load balancing" was referred to as the "Octavia VM." At some point, someone pointed out that this thing might not actually be a virtual machine because the role might be filled by a container. This sparked discussion in IRC and elsewhere, and it became clear we needed to come up with another name.

Why does this matter?

Code depends on the name we choose here and it's not trivial to change it once a significant amount of code depends on the name we choose.
Using common terminology is extremely important to ensure we all understand each other.
While what we call this thing is mostly arbitrary, it is more important that we pick a name and stick with it.

Summary of primary arguments from each side of debate

There were many name ideas suggested, and no clear "sides" of this debate. But concerns raised by various parties include:

Don't want to use something too commonly used
Try to avoid names with preconceived meanings, especially meanings in the OpenStack or Python coding world. ("instance" is a terrible name, for example.)
Name should represent the idea of a virtual machine / container / host / appliance in some way
Name should have a plural or group representation for when we group these things together (ex. "bee / swarm")

Where did the discussion happen?

IRC: (see discussion between 2014-09-01 and 2014-09-05): http://eavesdrop.openstack.org/irclogs/%23openstack-lbaas/
Octavia IRC meeting: http://eavesdrop.openstack.org/meetings/octavia/2014/octavia.2014-09-03-20.00.log.html
Voting etherpad: https://etherpad.openstack.org/p/octavia-backend-name

Decision reached

We voted via etherpad on 2014-09-03 and 2014-09-04, and the name that won was: amphora
