Obsolete:UnifiedServiceArchitecture
- Launchpad Entry: NovaSpec:unified-service-architecture
- Created: 2010-10-25
- Contributors: termie
Summary
The approaches to launching a long-running service in Nova are currently disparate, relying on a variety of underlying platforms and libraries and creating an abundance of integration and adaptation issues.
Currently in use are:
- Twisted's Application/Service, along with `twistd` (requiring Nova to maintain a custom version of Twisted)
- Twisted's Application/Service using alternate custom approaches
- Twisted's Web
- eventlet's wsgi
- the python-daemon library
This proposal suggests standardizing on eventlet alone and providing a common toolset for all services. It also suggests standardizing the interface that services expose to the system, so that they can be managed directly by common supervisor tools.
Release Note
As of this release, the dependencies on Twisted and python-daemon have been removed. All services are now based on eventlet and are expected to be managed by a supervisor process (e.g. init.d, upstart or daemontools) on production systems.
Additionally, for use in testing and development, `bin/nova-combined` has been added to support running the entire nova system in a single process.
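As a rough illustration of what a single-process runner such as `bin/nova-combined` does, the sketch below starts several service objects concurrently in one process and waits for all of them. The `FakeService` class and the `run_combined` helper are hypothetical and do not reproduce the actual script. The sketch uses the standard `threading` module so it runs without extra dependencies; after `eventlet.monkey_patch()`, those threads would be backed by greenthreads.

```python
import threading


class FakeService:
    """Hypothetical service whose start() runs a short, blocking main loop."""

    def __init__(self, name, iterations=3):
        self.name = name
        self.iterations = iterations
        self.handled = 0

    def start(self):
        # A real service would block here, listening to the queue or for web requests.
        for _ in range(self.iterations):
            self.handled += 1


def run_combined(services):
    """Run every service's main loop concurrently inside one process."""
    threads = [threading.Thread(target=svc.start, name=svc.name) for svc in services]
    for thread in threads:
        thread.start()
    # Return only after every main loop has exited.
    for thread in threads:
        thread.join()
    return services


if __name__ == "__main__":
    for svc in run_combined([FakeService("compute"), FakeService("network"), FakeService("api")]):
        print(svc.name, svc.handled)
```

Because each service keeps its own main loop, the combined runner needs no knowledge of what the services do; it only starts them and waits.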
Rationale
eventlet vs Twisted
Multiple underlying platforms have led to considerable duplication of effort, and many sections of code have already fallen into legacy status despite the project's youth.
Developers frequently run into hurdles caused by the complexity of performing even simple operations with Twisted's patterns. Twisted should therefore be removed in favor of eventlet, whose patterns more consistently match what developers expect to write for simple tasks.
For more complex tasks there will still be no substitute for deep understanding of the platform, but eventlet is expected to lower the learning curve significantly.
Additionally, while Twisted's twistd tool and current usage of it provided many useful features during initial development (uid management, logging, pidfiles and daemonization to name a few), those features are either better handled by a supervisor (uid management, daemonization, pidfiles), or easily available in eventlet (logging) without the overhead or hacks required to use twistd.
Nova also maintains a custom version of Twisted in order to use patches to Twisted's core that have not made it into released packages, which wastes time for developers and for end users who must install this custom package.
A quick investigation into the scope of such a change found that, in almost all cases, code could be switched to eventlet simply by removing the boilerplate that Twisted requires.
supervisors vs daemonization
There are a variety of powerful, common and reliable supervisor systems in use. All of them handle initialization, setting the uid, dealing with output, and starting, stopping and restarting services in standard ways that are already well integrated with the underlying operating system, so there is no need for Nova to maintain its own tools or depend on libraries to handle the same use cases.
Providing a simple wrapper for use in testing and development is all Nova needs to do.
User stories
Assumptions
- Twisted is not providing any desired and used features that cannot be implemented reasonably quickly in eventlet.
- eventlet's patterns for handling non-blocking calls are intuitively closer than Twisted's to what most developers expect to write in the vast majority of cases.
Design
There are two fundamental types of services in Nova, System services and Web services; however, the code required to launch either can be roughly the same.
This design walks through the various stages at which a developer interacts with the service layer. Overall it is very similar to the majority of the current modern services and simply requires that all services conform to it.
The Binary
As is already the case for the majority of System services, the code required to 'run' the service in a bin/nova-* script can be the same in all cases, with the appropriate flag looked up to determine which Manager class the Service wrapper will use.
Web services will use a similar pattern but with a simplified Service wrapper that is only meant to deal with WSGI.
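A minimal sketch of such a shared entry point, assuming the flags are available as a dictionary keyed by `<servicename>_manager` whose values are dotted import paths. The helpers `import_class`, `service_name_from_binary` and `load_manager`, and the flag value used in the demo, are hypothetical; Nova's own flag machinery is not shown.

```python
import importlib
import os.path


def import_class(dotted_path):
    """Import a class from a dotted path such as 'package.module.ClassName'."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def service_name_from_binary(binary_path):
    """Derive the service name from the script name, e.g. bin/nova-compute -> compute."""
    binary = os.path.basename(binary_path)
    return binary[len("nova-"):] if binary.startswith("nova-") else binary


def load_manager(binary_path, flags):
    """Look up the '<servicename>_manager' flag and instantiate that Manager class."""
    name = service_name_from_binary(binary_path)
    manager_cls = import_class(flags["%s_manager" % name])
    return manager_cls()


if __name__ == "__main__":
    # Hypothetical flag value: any importable class works for the demonstration.
    flags = {"compute_manager": "collections.OrderedDict"}
    print(type(load_manager("bin/nova-compute", flags)).__name__)
```

Since every script derives the service name from its own file name, each bin/nova-* script could contain exactly the same code.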
The Service Wrappers
These will remain largely as they exist now and follow the form:
- <classmethod> Service.create(): Determine service to run based on given flags and binary, and return the Service instance.
- Service.start(): Instantiate the manager class, start its main loop (which for a System service listens to the queue and for a Web service listens for web requests), and set up the monitoring interfaces (currently a heartbeat mechanism).
- Service.stop(): Disconnect from queue and attempt a clean shutdown of the service.
- Proxying messages from the queue to the Manager.
- Signal handling to detect hard and soft shutdowns.
In most cases the developer will have no interaction with the Service wrapper; all interaction takes place at the Manager level and by setting a flag of the form `<servicename>_manager`.
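The Service wrapper interface listed above can be sketched roughly as follows. This is a hypothetical illustration, not Nova's implementation: the message queue is modeled with the standard `queue.Queue`, messages are assumed to be `(method_name, kwargs)` pairs, the heartbeat only records a timestamp, and `DemoManager` stands in for a real Manager. Threads are used so the sketch runs without eventlet installed.

```python
import queue
import threading
import time


class DemoManager:
    """Hypothetical Manager whose methods are invoked by name from queue messages."""

    def __init__(self):
        self.calls = []

    def ping(self, value):
        self.calls.append(value)
        return value


class Service:
    """Sketch of a wrapper exposing create(), start() and stop() around a Manager."""

    def __init__(self, manager_cls, report_interval=1.0):
        self.manager_cls = manager_cls
        self.report_interval = report_interval
        self.inbox = queue.Queue()  # stand-in for the message queue
        self.manager = None
        self.last_heartbeat = None
        self._stopping = threading.Event()
        self._threads = []

    @classmethod
    def create(cls, manager_cls, **kwargs):
        # A real create() would pick manager_cls from flags and the binary name.
        return cls(manager_cls, **kwargs)

    def start(self):
        self.manager = self.manager_cls()
        for target in (self._consume, self._heartbeat):
            thread = threading.Thread(target=target, daemon=True)
            thread.start()
            self._threads.append(thread)

    def stop(self):
        # Clean shutdown: messages queued before stop() are still delivered,
        # then both loops exit and are joined.
        self._stopping.set()
        self.inbox.put(None)
        for thread in self._threads:
            thread.join()

    def _consume(self):
        # Proxy each (method_name, kwargs) message to the Manager until the sentinel.
        while True:
            message = self.inbox.get()
            if message is None:
                break
            method, kwargs = message
            getattr(self.manager, method)(**kwargs)

    def _heartbeat(self):
        # Record liveness every report_interval seconds until stop() is called.
        while not self._stopping.wait(self.report_interval):
            self.last_heartbeat = time.time()


if __name__ == "__main__":
    service = Service.create(DemoManager, report_interval=0.1)
    service.start()
    service.inbox.put(("ping", {"value": 42}))
    time.sleep(0.3)
    service.stop()
    print(service.manager.calls, service.last_heartbeat is not None)
```

The Manager in this sketch never sees the queue or the heartbeat, which matches the intent that developers work only at the Manager level.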
The Manager
The Manager is essentially free-form, with a minimal interface consisting of a couple of optional hooks that the Service can use to let the Manager register periodic tasks when the Manager is loaded in a Service context.
A System Manager consists of methods triggered by the queue in the Service context or by another Manager outside of the Service context.
A Web Manager consists of WSGI handlers triggered off of web requests.
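A hypothetical sketch of the two kinds of Manager described above. The hook name `periodic_tasks` and the method `run_instance` are assumptions made for illustration; the Web Manager is written as a plain WSGI application (PEP 3333 calling convention).

```python
class ExampleSystemManager:
    """Hypothetical System Manager: plain methods with no knowledge of the Service."""

    def __init__(self):
        self.ticks = 0
        self.instances = {}

    def periodic_tasks(self):
        # Optional hook: a Service may call this on a fixed interval.
        self.ticks += 1

    def run_instance(self, instance_id):
        # Triggered by a queue message in a Service context, or called directly
        # by another Manager outside of one.
        self.instances[instance_id] = "running"
        return self.instances[instance_id]


def example_web_manager(environ, start_response):
    """Hypothetical Web Manager: a WSGI application called once per request."""
    body = ("path=%s" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    manager = ExampleSystemManager()
    manager.periodic_tasks()
    print(manager.run_instance("i-00000001"), manager.ticks)
    print(example_web_manager({"PATH_INFO": "/servers"}, lambda status, headers: None))
```

Because neither Manager depends on the Service, both can be exercised directly in unit tests.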
Soft Shutdowns
Some functionality involves calling methods that may take a long time to complete. It is suggested that the Service layer track calls taking place in the Manager layer and wait until they have completed before exiting.
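One possible way for the Service layer to track in-flight Manager calls and wait for them during a soft shutdown is sketched below using the standard `threading` module. The `CallTracker` class is hypothetical. Treating SIGTERM, which supervisors commonly send to stop a process, as a request for a soft shutdown is an assumption about how the signal handling could be wired, not a documented Nova convention.

```python
import signal
import threading
import time


class CallTracker:
    """Hypothetical tracker of Manager calls in progress, used for soft shutdowns."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0
        self.shutting_down = False

    def run(self, func, *args, **kwargs):
        # Refuse new work once a soft shutdown has begun.
        with self._cond:
            if self.shutting_down:
                raise RuntimeError("service is shutting down")
            self._active += 1
        try:
            return func(*args, **kwargs)
        finally:
            with self._cond:
                self._active -= 1
                self._cond.notify_all()

    def soft_shutdown(self, timeout=None):
        """Stop accepting calls and wait for active ones; True if all finished in time."""
        with self._cond:
            self.shutting_down = True
            return self._cond.wait_for(lambda: self._active == 0, timeout)


def install_sigterm_handler(shutdown_requested):
    # Hypothetical wiring: the handler only sets an Event; the main loop then calls
    # soft_shutdown(), so no waiting happens inside the signal handler itself.
    signal.signal(signal.SIGTERM, lambda signum, frame: shutdown_requested.set())


if __name__ == "__main__":
    tracker = CallTracker()
    worker = threading.Thread(target=tracker.run, args=(time.sleep, 0.5))
    worker.start()
    time.sleep(0.1)
    print("all calls finished:", tracker.soft_shutdown(timeout=5))
    worker.join()
```

The timeout gives a natural boundary between soft and hard shutdowns: if `soft_shutdown` returns False, the Service can choose to exit anyway.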
Idempotent Restarts
Being managed by a supervisor has the benefit that the process is restarted should it die for any reason. The Service and Manager layers should do their best to recover from any failures by ensuring that the system is in a sane state when restarting.
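A sketch of what ensuring a sane state on restart might look like: at startup, a Manager reconciles records that a crash left in a transitional state. The state names and the `recover_on_startup` function are hypothetical. The important property is idempotence. Running recovery twice gives the same result as running it once, so repeated restarts by the supervisor are harmless.

```python
# Hypothetical mapping from states a crash can leave behind to stable states.
TRANSITIONAL_TO_STABLE = {
    "spawning": "error",         # creation was interrupted; flag for cleanup
    "terminating": "terminated",  # deletion was already under way
}


def recover_on_startup(records):
    """Return new records with transitional states replaced by stable ones.

    Stable states map to themselves, so calling this repeatedly changes nothing
    after the first call, and the input records are not modified.
    """
    recovered = []
    for record in records:
        state = record["state"]
        recovered.append(dict(record, state=TRANSITIONAL_TO_STABLE.get(state, state)))
    return recovered


if __name__ == "__main__":
    rows = [{"id": 1, "state": "spawning"}, {"id": 2, "state": "running"}]
    print(recover_on_startup(rows))
```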
Implementation
UI Changes
The UI in this case is the command-line.
Various twistd- and daemonization-related flags, for example --guid and --pidfile, will be removed, as that functionality will be expected from the supervisor.
Code Changes
In almost all cases the move from Twisted to eventlet involves simply removing `yield` and `@defer.inlineCallbacks` wherever they are present, along with replacing `defer.returnValue(x)` with `return x`.
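For example, a typical inlineCallbacks method and its converted form might look like the following. The method, class and field names are hypothetical. The Twisted original appears only as a comment because running it requires Twisted. The converted version is ordinary sequential Python: with eventlet, cooperative yielding happens inside green I/O calls rather than at explicit `yield` points.

```python
# Before (Twisted), shown for comparison only:
#
#     from twisted.internet import defer
#
#     @defer.inlineCallbacks
#     def get_instance_state(self, instance_id):
#         instance = yield self.db.instance_get(instance_id)
#         state = yield self.driver.get_info(instance["name"])
#         defer.returnValue(state)
#
# After (eventlet): drop the decorator and the yields, and return directly.


class FakeDB:
    """Hypothetical stand-in for the database layer."""

    def instance_get(self, instance_id):
        return {"id": instance_id, "name": "instance-%08x" % instance_id}


class FakeDriver:
    """Hypothetical stand-in for a virtualization driver."""

    def get_info(self, name):
        return {"name": name, "state": "running"}


class ComputeManager:
    def __init__(self, db, driver):
        self.db = db
        self.driver = driver

    def get_instance_state(self, instance_id):
        instance = self.db.instance_get(instance_id)
        state = self.driver.get_info(instance["name"])
        return state
```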
In a few cases twisted.internet.task.LoopingCall is used; it is easily replaced with an eventlet version (see nova.utils.LoopingCall in the related branch).
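A minimal LoopingCall-style helper is sketched below with the standard `threading` module so that it runs as-is; after `eventlet.monkey_patch()`, the thread and `Event.wait` would be green. This is not the `nova.utils.LoopingCall` from the related branch, whose exact interface is not reproduced here. The `start(interval, now=True)` and `stop()` names mirror Twisted's LoopingCall; `wait()` is an addition made for this sketch.

```python
import threading


class LoopingCall:
    """Call func(*args, **kwargs) every `interval` seconds until stop() is called."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs
        self._stopped = threading.Event()
        self._thread = None

    def start(self, interval, now=True):
        def loop():
            # If now is False, wait one interval before the first call.
            if not now and self._stopped.wait(interval):
                return
            while not self._stopped.is_set():
                self.func(*self.args, **self.kwargs)
                # Event.wait returns True as soon as stop() is called.
                if self._stopped.wait(interval):
                    break

        self._thread = threading.Thread(target=loop, daemon=True)
        self._thread.start()
        return self

    def stop(self):
        self._stopped.set()

    def wait(self):
        # Block until the loop has exited after stop().
        if self._thread is not None:
            self._thread.join()
```

Waiting on an Event instead of sleeping lets `stop()` take effect immediately rather than after the current interval expires.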
The larger pieces of work will be:
- Replacing the nova.process.ProcessPool pieces, mainly because their return signatures and error conditions are fairly varied; eventlet's subprocess module should handle the basic functionality fine.
- Moving Objectstore off of twisted.web and onto an eventlet/wsgi based system. Note, however, that Objectstore is very nearly deprecated, so this may not be required.
- Some test code uses the SkipTest feature of Twisted's Trial, for which eventlet has no specific equivalent; however, the feature is largely superfluous and could be handled conditionally if `nosetests` were being used.
- Test code in nova.test will generally have to be mildly adapted so that it no longer expects Deferreds.
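A hypothetical replacement for the ProcessPool-style helpers: one blocking function with a single return signature, `(stdout, stderr)`, and a single error condition, an exception on a non-zero exit status. It uses the standard `subprocess.Popen`; `eventlet.green.subprocess` is eventlet's cooperative drop-in for that module. The accompanying test case uses the standard library's `unittest` skipping in place of Trial's SkipTest and asserts on return values directly instead of on Deferreds. The names `run_command`, `ProcessExecutionError` and `hypothetical-tool` are illustrative only.

```python
import shutil
import subprocess
import sys
import unittest


class ProcessExecutionError(Exception):
    """Hypothetical error raised when a command exits with a non-zero status."""

    def __init__(self, cmd, returncode, stdout, stderr):
        super().__init__("%r exited with status %d" % (cmd, returncode))
        self.cmd = cmd
        self.returncode = returncode
        self.stdout = stdout
        self.stderr = stderr


def run_command(*cmd):
    """Run cmd to completion and return (stdout, stderr) as text."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    stdout, stderr = proc.communicate()
    if proc.returncode != 0:
        raise ProcessExecutionError(cmd, proc.returncode, stdout, stderr)
    return stdout, stderr


class RunCommandTestCase(unittest.TestCase):
    def test_prints(self):
        # A plain return value: nothing to chain, no Deferred to wait on.
        stdout, _ = run_command(sys.executable, "-c", "print('ok')")
        self.assertEqual(stdout.strip(), "ok")

    def test_failure_raises(self):
        with self.assertRaises(ProcessExecutionError):
            run_command(sys.executable, "-c", "import sys; sys.exit(3)")

    def test_needs_optional_tool(self):
        # Stand-in for Trial's SkipTest: skip when an optional tool is missing.
        if shutil.which("hypothetical-tool") is None:
            self.skipTest("hypothetical-tool is not installed")
        run_command("hypothetical-tool", "--version")
```

Saved as a module, the test case can be run with `python -m unittest` or collected by nosetests.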
Related branch: http://code.launchpad.net/~termie/nova/eventlet_merge
Migration
Code only change, no data migration.
While it is possible to run eventlet and Twisted side by side, the goal of this proposal is to standardize _all_ the services, preferably all at once, so that no time is wasted writing code to adapt the results of Twisted's defer.inlineCallbacks to eventlet.
Test/Demo Plan
Normal unit tests are expected to run.
If the smoketests are still applicable, they should also pass without changes.
Unresolved issues
- It is unclear whether subprocesses, multiple processes, multiple threads or multiple greenthreads will be the preferred way to handle some kinds of long tasks. Eventlet does provide straightforward ways of handling all but the multiple processes solution, and will probably be sufficient.
- Existing integration testing via smoketests may be insufficient to locate all bugs related to the transition so it may take longer to verify that the transition is complete.
BoF agenda and discussion
Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.