Revision as of 21:01, 10 June 2010

Nova Cloud Review

The purpose of this document is to capture the pluses and minuses of using Nova's code as part of Cloud Servers v2.

Glossary

  • Public API Servers - Known as "Nucleus" in Cloud Servers v1 or "Cloud Controller" in Eucalyptus.
  • Pod - A group of physical host nodes. Known as "QB" in Cloud Servers v1 or "Cluster Controller" in Eucalyptus.
  • Nodes - Individual physical hosts in a Pod.

What's Done

  • Scalable and elastic architecture - fully message-based and asynchronous
  • The project is many months ahead of us in development
  • Written in good Python
  • Open source, and it appears that the project will follow an open development model
  • All components have been stubbed out for testing
  • Actually writes SSH keys and authorized_keys files properly
  • OpenLDAP-based authentication and authorization
  • All functionality is built on an adapter model, so implementations (for instance, storage backends, messaging backends, etc.) can be swapped out as needed
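To illustrate what "swappable implementations" means in practice, here is a minimal sketch of an adapter-style storage interface with two interchangeable backends, selected by name from configuration. StorageBackend, MemoryStorage, LocalDiskStorage, BACKENDS and make_storage are hypothetical names chosen for illustration. They are not Nova's actual classes.

```python
"""Sketch of the adapter pattern for swappable storage backends.

All class and function names here are hypothetical, not Nova's real API.
"""
import abc
import os


class StorageBackend(abc.ABC):
    """Interface that every storage implementation must satisfy."""

    @abc.abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abc.abstractmethod
    def get(self, key: str) -> bytes: ...


class MemoryStorage(StorageBackend):
    """Keeps objects in a dict; convenient for unit tests and stubs."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class LocalDiskStorage(StorageBackend):
    """Writes each object to its own file under a root directory."""

    def __init__(self, root: str) -> None:
        self._root = root

    def put(self, key: str, data: bytes) -> None:
        with open(os.path.join(self._root, key), "wb") as f:
            f.write(data)

    def get(self, key: str) -> bytes:
        with open(os.path.join(self._root, key), "rb") as f:
            return f.read()


# Registry mapping a configuration value to an implementation class.
BACKENDS = {"memory": MemoryStorage, "disk": LocalDiskStorage}


def make_storage(name: str, **kwargs) -> StorageBackend:
    """Instantiate whichever backend the configuration selects."""
    return BACKENDS[name](**kwargs)
```

Callers depend only on StorageBackend, so switching from make_storage("memory") to make_storage("disk", root="/var/lib/objects") needs no other code changes.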

What Needs to be Done

  • Create a layer inside Nova that would be able to distinguish between different pods
    • Currently, the CloudController class in /endpoints/cloud.py acts as both a public API server and a pod controller
      • The CloudController class receives public API requests and sends messages to the nodes to perform actions
      • Move the receipt and translation of public API requests into a separate APIServer class
      • Move the transmission of private action messages into a PodController class
  • Detach from Amazon/Eucalyptus specifics and make some things more generic
    • API: We need to add the Rackspace API and a caching layer
      • It is not reasonable for us to use the Amazon API; we would be unable to innovate and would constantly be playing catch-up
      • We would also need to add a distinct API for each service we layer on top, so that each can be used with either the EC2 or the Rackspace API
  • AoE (ATA over Ethernet)
    • Definitely needs to be adapted for other services like CloudFiles, Gluster, etc.
  • Networking defaults to VLANs, and this default could be changed
    • IPs can, however, already be allocated manually or via DHCP (see /compute/network.py)
  • Functionality needed by hosting providers
    • Metrics
      • CPU, memory, disk usage, network RX/TX
      • However, the backend storage for these metrics is already taken care of
  • Billing
    • Need to define the billable events in a model
  • Admin client is AWS-specific and needs an adapter interface
    • see /adminclient.py
  • Only AMIs are supported; we should add OVA support
  • Requires euca2ools, which are tainted; we need a set of OVA tools and possibly a clean-room rewrite of the AMI tools, if we care
  • Overarching documentation is sparse (though the code comments are pretty decent)
  • Twisted (and CPython, because of its global interpreter lock) is single-core by nature, so it *may* be a bottleneck, but that remains to be demonstrated
  • No support for Gluster or DRBD, but there are adapters for plugging such functionality into the app domain
  • Add an endpoint so that different compute clusters can be discovered, especially when they are distributed geographically
  • Configuration management is almost non-existent
    • Need to plug in or adapt the configuration retrieval
    • Puppet, Chef, or even a DKVS (distributed key-value store)
    • The "flavors" are hardcoded in /compute/node.py (grep for INSTANCE_TYPES)
  • While there is decent unittest coverage, there is no real systems testing or documentation of plans for one
    • There would need to be a good chunk of code written to automate the testing of pod deployments, the testing of network partitions, and more
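The first item above, splitting CloudController into an APIServer and a PodController, can be sketched as follows. This is a minimal illustration under stated assumptions. The class names follow the proposal, but the request and message formats, the least-loaded placement policy, and the send callable (standing in for whatever message bus is used) are hypothetical.

```python
"""Sketch: separate public API handling from pod-level message dispatch.

Request/message formats and the placement policy are hypothetical.
"""
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PodController:
    """Owns one pod and sends private action messages to its nodes."""
    name: str
    nodes: list[str]
    send: Callable[[str, dict], None]  # stand-in for the message bus
    load: dict[str, int] = field(default_factory=dict)

    def run_instance(self, instance_id: str) -> str:
        # Illustrative placement: least-loaded node, ties broken by list order.
        node = min(self.nodes, key=lambda n: self.load.get(n, 0))
        self.load[node] = self.load.get(node, 0) + 1
        self.send(node, {"method": "run_instance", "instance_id": instance_id})
        return node


class APIServer:
    """Receives public API requests and translates them for a pod."""

    def __init__(self, pods: dict[str, PodController]) -> None:
        self._pods = pods

    def handle(self, request: dict) -> dict:
        # Translate the public request into an internal call on one pod.
        pod = self._pods[request["pod"]]
        node = pod.run_instance(request["instance_id"])
        return {"status": "building", "pod": pod.name, "node": node}
```

The APIServer never talks to nodes directly. That separation leaves room for a layer that distinguishes between pods, and lets each pod's controller run in a different location.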
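For the billing item above, one possible shape for "billable events in a model" is a small immutable event record plus a function that derives usage from it. The event types, field names, and the rule of rounding each start/stop interval up to whole hours are all assumptions for illustration. The actual billable events are still to be defined.

```python
"""Sketch of a billable-event model; event types and billing rule are hypothetical."""
import math
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class EventType(Enum):
    INSTANCE_STARTED = "instance.started"
    INSTANCE_STOPPED = "instance.stopped"


@dataclass(frozen=True)
class BillableEvent:
    account_id: str
    instance_id: str
    flavor: str
    event_type: EventType
    timestamp: datetime


def billable_hours(events: list[BillableEvent]) -> int:
    """Sum uptime per start/stop pair, rounding each interval up to whole hours.

    Instances that have started but not yet stopped are not counted.
    """
    total = 0
    started: dict[str, datetime] = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.event_type is EventType.INSTANCE_STARTED:
            started[ev.instance_id] = ev.timestamp
        elif ev.instance_id in started:
            seconds = (ev.timestamp - started.pop(ev.instance_id)).total_seconds()
            total += math.ceil(seconds / 3600)
    return total
```

Keeping events immutable and deriving totals from them means the billing rule can change without touching how events are recorded.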
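The discovery-endpoint item above could start as a plain WSGI application that lists known pods, optionally filtered by region. This is a minimal sketch. PodInfo, the /pods path, the region query parameter, the region names and the example.com URLs are all hypothetical, and a real deployment would load the pod list from configuration rather than a constant.

```python
"""Sketch of a pod discovery endpoint as a WSGI app; all names are hypothetical."""
import json
from dataclasses import asdict, dataclass
from urllib.parse import parse_qs


@dataclass(frozen=True)
class PodInfo:
    name: str
    region: str
    endpoint: str


# Hypothetical static pod list; in practice this would come from configuration.
PODS = [
    PodInfo("pod-a", "east", "https://east-a.example.com/"),
    PodInfo("pod-b", "west", "https://west-b.example.com/"),
]


def discovery_app(environ, start_response):
    """GET /pods[?region=NAME] returns matching pods as JSON."""
    if environ.get("PATH_INFO") != "/pods":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    query = parse_qs(environ.get("QUERY_STRING", ""))
    region = query.get("region", [None])[0]
    pods = [asdict(p) for p in PODS if region is None or p.region == region]
    body = json.dumps({"pods": pods}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Any WSGI server, for example wsgiref.simple_server from the standard library, can serve this app during development.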
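Because the "flavors" are hardcoded in /compute/node.py, one small step toward pluggable configuration is to load them from external data, such as a file managed by Puppet or Chef or a value fetched from a key-value store. The sketch below parses flavors from JSON. The JSON layout and the field names memory_mb, vcpus and disk_gb are assumptions. They are not taken from Nova's INSTANCE_TYPES.

```python
"""Sketch: load instance flavors from configuration instead of hardcoding them.

The JSON layout and field names are hypothetical.
"""
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    name: str
    memory_mb: int
    vcpus: int
    disk_gb: int


def load_flavors(text: str) -> dict[str, Flavor]:
    """Parse a JSON object mapping flavor name -> attributes, with basic validation."""
    flavors = {}
    for name, attrs in json.loads(text).items():
        flavor = Flavor(name=name, **attrs)  # unknown keys raise TypeError
        if flavor.memory_mb <= 0 or flavor.vcpus <= 0:
            raise ValueError(f"invalid flavor {name!r}")
        flavors[name] = flavor
    return flavors
```

The loader only needs text, so where that text comes from can itself sit behind an adapter, in keeping with the rest of the design.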

Unknowns

  • Asked jm to look into any possible Windows issues with the code base (using Windows as a host with Hyper-V? Not sure what this means)
    • We know that SSH keys will not work with Windows, so another method is necessary