NovaNetNeutronParity

[ out of date, updates for Juno in progress ]

"One document, one mission." - author's note

Parity-related efforts in Icehouse were primarily analysis, documentation, and tracking of related blueprints, bugs and reviews. Endeavors for Juno need to be very specific in order to get further. There are five areas that need serious focus:


 * quality and performance
 * gate performance
 * interoperability
 * test coverage
 * migration paths

Pragmatically, it is pointless to discuss this without considering resourcing. Five areas requiring serious focus imply five principal resources at minimum. Additionally, a core should either be one of these resources or, perhaps optimally, be "in the loop". Besides facilitating reviews, a core may be better positioned to intervene when a patch or effort risks breaking parity-related functionality. Ideally, persons with a parity focus (or who at least keep it in mind) should be actively involved with all related efforts (e.g. DVR, HA, etc.). In short, parity is a sort of cross-cutting concern, and it needs sufficient representation across the relevant efforts to be achieved. Under-resourcing risks any practical progress requiring multiple cycles.

While an overall objective of deprecating nova-networking may be an OpenStack goal, this is a distraction. The focus is to provide a superior option to migrate to for OpenStack users who need more flexibility, features, performance and scalability.

Quality and Performance
Specifically, the quality and performance of the open source components of Neutron have to be beyond reasonable reproach.


 * Functionality must be reliable with well understood code paths and state transitions.
 * Configuration and use should be, if not precisely intuitive, at least sensible in the eyes of the user. A simple deployment should not be unduly complex to configure. Complexity should mount in direct correlation to configuration.
 * Operations should have reasonable completion times that do not depend unduly on the number of concurrent operations (e.g. while "batching" is acceptable for some things, delay by design in provisioning is unacceptable).
 * Test coverage: the review queue for tests needs constant attention!
 * Real world perspective and vigilance: a reasonable subjective understanding of real world use cases! Besides extending and amplifying the tempest test suite, the persons working on parity need to really know how neutron behaves in realistic deployments. Even simple ones. Some things are not reasonable to put in tempest, but are still reasonable tests. How long does it take to assign floating IPs? What's the impact of multiple concurrent spawns? Can you break things by just doing a bunch of stuff using the client concurrently? This is just a matter of really knowing what the product is doing.
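
Questions like the ones in the list above (how long an operation takes, and whether that changes under concurrency) can be explored with a small timing harness. What follows is a minimal sketch and not part of any existing test suite: time_concurrently, summarize and fake_operation are hypothetical names, and the placeholder operation only sleeps. In a real deployment it would be swapped for a call that, say, associates a floating IP or boots an instance through whatever client is in use.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_concurrently(operation, count, max_workers=8):
    """Call operation(index) `count` times on a thread pool.

    Returns the wall-clock duration of each call, in seconds,
    in the same order as the indexes 0..count-1.
    """

    def timed_call(index):
        start = time.monotonic()
        operation(index)
        return time.monotonic() - start

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(timed_call, range(count)))


def summarize(durations):
    """Return count, min, median and max for a non-empty list of durations."""
    if not durations:
        raise ValueError("no durations to summarize")
    ordered = sorted(durations)
    return {
        "count": len(ordered),
        "min": ordered[0],
        # Upper median when the number of samples is even.
        "median": ordered[len(ordered) // 2],
        "max": ordered[-1],
    }


def fake_operation(index):
    # Placeholder (assumption): stands in for a real client call such as
    # assigning a floating IP. Replace with the operation under study.
    time.sleep(0.01)


if __name__ == "__main__":
    # Compare the same workload at different levels of concurrency.
    for workers in (1, 4, 16):
        durations = time_concurrently(fake_operation, count=16, max_workers=workers)
        print(workers, summarize(durations))
```

If per-call times grow sharply as max_workers rises, that suggests completion time is tied to the number of concurrent operations, which is the behavior the list above warns against.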

Gate Performance
Each parity team member should almost appear to be an infra team member. Where the gate is wrong, the gate needs to be made right. This is not just a matter of making the tests right, but of making sure that the infrastructure has what it needs to do what is needed! Poor performance of neutron as a whole in the gate is a parity blocker.

Interoperability
Nova is both the client and the server with respect to neutron. The workings of these interactions cannot be ignored and have to be extremely well understood. This implies some straddling of the teams and getting to know the related goals and objectives of the active nova efforts. Active representation is essential.

(There are people who are currently straddling both, but we can use more!)

Test Coverage
mlavalle, anteaya, and rossella_s (and probably others) have been spearheading this effort to date. These efforts need to be considered as directly relevant.

Migration Paths
Migration paths overlap with both interoperability and quality/performance. This is effectively a cross-cutting concern that partly defines what aspects of quality and interoperability are specific to parity. That is, if it will never affect a migrating user, it is not likely a parity concern. Realistically, there are infinite possibilities for deployments, so it is impractical to empirically define all migration paths. Parity principals should be familiar with the nova network manager types, multi-host, L3 functionality, etc. so that they can conceptualize how a user may migrate. This is not a "nice to have". It is essential and also part of why overall neutron project awareness is important. If a design decision is made that explicitly and deliberately hinders a basic migration path, then it needs to be mitigated.

Roles of Cores and Principals
Vigilance on the review queues is essential. Support for patches that enhance migration potential, performance, etc. must be given in a timely fashion, as must intervention in deviations. Core involvement facilitates proper emphasis where required. REMEMBER, however, that anyone can -1. In short, the role of the principal is to be actively involved in at least one related activity, safeguarding migration paths, stability, etc. and otherwise contributing to closing perceived feature and functionality gaps that are migration obstacles. Principals attend the biweekly parity meetings and remain in close contact with the other principals in the interim. Situations will arise where principals must work in closely directed efforts to rapidly overcome particular obstacles (guerrilla tactics? hero sprints?). A principal need not be core, but the demands on overall knowledge and the complexity of some of the problem areas do imply a certain amount of experience.

For the purposes of continuity and progress, a principal should be in a position to commit to the parity effort for a minimum of a complete cycle. Ownership of a blueprint exclusive to parity implicitly assigns the role of principal and all of the associated obligations.

Principals should be vigilant and vocal about perceived or anticipated risks that affect parity goals.

As most of us are employed to work on OpenStack, the priorities of the employer may sometimes conflict with commitments made as principal. This is a "fact of life" that needs to be accepted as a risk and mitigated. If a priority change occurs that will impact involvement, the team should do what it can to mitigate it, honestly and critically assess the resulting impact, communicate it upwards to the neutron community as a whole and, if possible, onboard a "backup".

Approach to Organization
It is more important to be active in the principal areas than to have frequent formal meetings. Most principal areas are addressed under other guises weekly at the IRC meeting. A meeting every two weeks for the principals is worthwhile to start, accelerating or spreading out as the overall situation demands. Interoperability should continue as a weekly item in the neutron team meetings. Individuals who wish to contribute but are unable to commit to a protracted effort for a cycle are still valuable, and all contributions are welcome. If a contributor steps in, the principals must effectively sponsor the effort, providing timely feedback, review support, etc. to make the most of these contributions. This implies a relationship between the number of non-principal and principal contributors, but it is unlikely that this would become a limitation beyond that of the core/non-core contributor ratio in the neutron project as a whole.

For a practical period established by the principals, a principal must be selected to "own" the parity effort as a whole. The period may be a cycle, a milestone or some other reasonable practical interval. The role of the "owner" is to coordinate the meetings and foster communication. The "owner" should also be vigilant about risks pertaining to organization and meeting objectives. The "owner" is the principal point of contact with respect to the effort for the PTL and other team members. The "owner" is not directly responsible for the entire body of work. If the current "owner", for any reason whatsoever, feels unable to continue in that role, it should be immediately transferred to an acting owner until another principal accepts the responsibility. The portability of the "mantle" of owner effectively limits the scope and weight of that responsibility. That is, the ability of a current owner to continue in that role should not pose an undue risk to the success of the effort.

A more hierarchical approach has been considered, but poses certain organizational problems. Considering that parity-related efforts overlap with other efforts (IPv6 support, performance, etc.), in a hierarchical model a principal could effectively be working within two hierarchies. This can lead to conflict and loss of productivity, not to mention loss of interest. Also, a hierarchy implies a certain responsibility for the end result. Again, where there are multiple parallel efforts with different ultimate goals, this effectively marks a person as responsible for something they do not have direct control over.

Obviously a "non-organization" has also been attempted, where the objectives were outlined, analysis provided, priority stated, etc. This is too unstructured and simply results in no discernible progress where popular priorities lie elsewhere. The notion of principals, a defined minimal critical mass (the minimal number of principals required to pull it off) and consideration of how to work within efforts with greater general buy-in from the rest of the community is in direct response to this. If the minimal number of principals cannot be obtained for a cycle, it is a sort of Darwinian indication that there is insufficient interest within the community to achieve the goals. Working through existing efforts somewhat increases the potential for obtaining this minimal number (a principal could in fact be the owner of a related effort) and at the same time avoids going too far in the other direction of having an autonomous team of parity-gorillas running rampant across parallel efforts, generally disrupting efforts and creating a lot of conflict.

Minimal Number of Principals/Low Numbers of Contributors
Operating with the bare minimum of principals and few contributors (like 0) comes with a fair degree of risk and is not the most effective way to get things done. In this case, it may be necessary to coalesce the available principals around particular well defined milestones and execute a sprint (now is the time for parity-gorilla mode). These need to be well defined and scoped to fit a reasonable, sustainable sprint duration. While getting together has its merits, it can be difficult for some organizations to assign a clear $ value to the ROI on such a thing. Google hangouts, IRC and free phone conferencing are all reasonable tools to employ in these situations with good results. Of course, sprints can and should be used where warranted even if there are lots of contributors. As long as the scope is well defined and the objectives are reasonable, this can be employed as frequently as necessary.

Moving Forward and Reporting Status
Although we have spent several cycles trying to get here, time has always been critical to the parity effort, and we are a few cycles past due. This detail informs the general state of urgency.

Reporting parity is more a statement of what is broken than of what is not. A tag denoting an issue as network-parity related should be incorporated into the blueprints and launchpad bug system.

Two new blueprints need to be written (or resurrected as the case may be) with a realistic and proper design and plan, including testing:
 * multi-host
 * simplified all-in-one with "external access" (wording?)

Both of these issues are largely concerned with resolving remaining migration path issues that are completely unaddressed within nova.

Additional blueprints are required:
 * Provide Flat Network Migration Path
 * Provide FlatDHCP Network Migration Path
 * Provide VLAN Migration Path

These blueprints are the muster points for new code and bugs that directly impact the respective migration path. In addition to these specific migration paths:


 * Real World Migrations from Nova Network

This last blueprint may be defined as a parent or the dependencies may be defined such that the last cannot reasonably be completed without the others also being completed.

The blueprints for the specific areas are important for several reasons:
 * they bring the specific efforts of the principals "in process" and allow the activities to be planned for and reported as logical groupings in a given cycle
 * they provide an opportunity for the neutron project as a whole to agree on priorities and relevance of the particular efforts

These blueprints cannot be placeholders, but must be properly defined bodies of work. If it is not possible to define them so that they are first class blueprints that may be approved (or not) in a regular fashion, then more work is required in their definition. Dependencies on existing blueprints and bugs should be indicated where relevant. These blueprints should ideally be in a completed state for the spring summit in 2014. Plans and efforts for mitigating risks based on dependencies should be addressed as soon as possible.

Coordination
The experience with Icehouse with respect to parity is that it seemed like everyone was working on something that was parity related, but was not actually for the sake of parity. This is partly the motivation behind organizing parity-related efforts as a sort of "team member with ulterior motives" or infiltrator across the relevant teams. It also renders coordination with the related teams moot, or at the very least innate. The challenge in this approach will be for the principals to maintain focus on parity, in a sense inverting the coordination problem. However, this is more a team-dynamics risk for the parity team itself than an organizational behavior issue. Effective mitigation will therefore be dependent on the principals that make up the parity team and cannot realistically be divined before it forms.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>

What is beyond this point is historical and needs revision.

Background

 * Early Parity Discussion Document

Tasks

 * Icehouse Summit QA Neutron session etherpad
 * Icehouse Summit on testing with multiple nodes
 * Icehouse Summit on negative testing

Tasks
bzs... don't you wish you could suck content from gerrit.

Performance
Default implementation (openvswitch)

Configuration
e.g. How long does it take for a floating IP to take effect

Scalability

 * node counts
 * network counts
 * tenant counts
 * dhcp agents
 * l3 agents
 * multiple processes
 * metrics

HA Options

 * answer to the multi-host (fault isolation) question
 * real/better HA

API Integration
NovaNetNeutronRecipes

The FlatNetworkManager (thanks rkukura for spelling this out!): (There are interop bugs related to security groups that may break this!)


 * flat networks

(would this work for FlatDHCPNetworkManager as well?) (the aforementioned security group related issue would not affect the flat dhcp scenario the same way)

Multi-Host

 * Icehouse summit Distributed Router (possible multi-host approach)

Background

 * Early Parity Discussion Document

Tasks

 * Recipes

Background

 * Icehouse Summit QA Neutron session etherpad
 * Icehouse Summit on testing with multiple nodes
 * Icehouse Summit on negative testing