Difference between revisions of "Sahara/WhyNotHeat"
Revision as of 13:31, 30 July 2013
1. The first question is “Why doesn’t Savanna use Heat to provision VMs?”
In general, using Heat underneath for infrastructure provisioning looks reasonable. From a tactical perspective, however, a few factors make using Heat underneath Savanna problematic:
- Heat stability in the Grizzly release. Savanna currently maintains Grizzly+ compatibility.
- Installation of large Hadoop clusters (100+ nodes). Heat currently uses eventlet to provision resources concurrently, but it is still limited to a single engine. Proposed architecture changes are expected to address this.
- Anti-affinity support for HDFS redundancy in a cloud environment.
- Circular dependencies. We need to generate ‘/etc/hosts’ for all instances in the provisioned cluster, and we can’t use cloud-init for this directly. There are a couple of possible solutions using Heat, but none of them is straightforward.
- Level of complexity. We try to keep things as simple as possible. Adding an extra layer will increase the overall complexity of the solution. In addition, both Savanna and Heat are under active development, with many internals and even APIs changing, so coordinating the two will require extra effort.
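To make the ‘/etc/hosts’ item above more concrete, here is a minimal Python sketch. It assumes the difficulty is that every instance's address must be known before the file can be written, so the file can only be rendered after all instances exist. The `Instance` record, the `render_hosts` helper, and the sample addresses are hypothetical and are not part of Savanna or Heat.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    """Hypothetical record for a provisioned VM (not a Savanna or Heat type)."""
    hostname: str
    ip: str


def render_hosts(instances):
    """Return '/etc/hosts' content that lists every instance in the cluster."""
    lines = ["127.0.0.1 localhost"]
    # Sort by hostname so every node receives an identical file.
    for inst in sorted(instances, key=lambda i: i.hostname):
        lines.append(f"{inst.ip} {inst.hostname}")
    return "\n".join(lines) + "\n"


# Illustrative data: in practice the addresses are only known after boot.
cluster = [
    Instance("worker-1", "10.0.0.12"),
    Instance("master", "10.0.0.10"),
]
hosts_file = render_hosts(cluster)
```

Because each node's file depends on every other node's address, the rendered content would have to be pushed to the instances after provisioning rather than baked into boot-time user data.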
Once Heat supports all the features we need, it will make sense to use Heat to provision VMs for Savanna. Here is what we’ll do:
- Create a wiki page with the text from this email
- Create a list of requirements for Heat
Once Heat fulfills all these requirements, we will be able to use Heat for VM provisioning, and we should.
2. Let’s answer the second question: why do we need Savanna? Can’t we use Heat to do what Savanna does?
- Savanna provides a number of Hadoop-specific features. It would be hard to provide them as a Heat plugin.
- Savanna provides Hadoop-specific APIs and functionality. Heat use cases are mostly around provisioning and deployment.
- Savanna provides integration with various Hadoop distributions through a pluggable mechanism.
Now, more details on each item.
Hadoop-specific features:
- Tight Swift integration. Hadoop can read from and write to Swift object storage. Savanna provides the required configs for the Hadoop cluster.
- Usage of anti-affinity to preserve the data redundancy of HDFS nodes
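As an illustration of the anti-affinity item above, the following Python sketch builds the scheduler hints that would ask for each new data node to land on a host not used by the nodes booted before it. It assumes the Nova scheduler hint key `different_host` is what gets passed; the helper names and the fake `id-<name>` instance IDs are hypothetical, and this is not Savanna's actual implementation.

```python
def anti_affinity_hints(booted_ids):
    """Build scheduler hints asking for a host not used by `booted_ids`."""
    if not booted_ids:
        return {}  # the first instance has nothing to avoid
    # Assumption: 'different_host' is the hint key understood by the scheduler.
    return {"different_host": list(booted_ids)}


def plan_boot_order(names):
    """Pair each instance name with the hints it would be booted with.

    Instance IDs are faked as 'id-<name>' so the sketch runs without a cloud.
    """
    booted, plan = [], []
    for name in names:
        plan.append((name, anti_affinity_hints(booted)))
        booted.append(f"id-{name}")
    return plan


plan = plan_boot_order(["datanode-1", "datanode-2", "datanode-3"])
```

Spreading data nodes across physical hosts matters because HDFS replicas that end up on the same hypervisor would all be lost together if that host fails.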
Hadoop-specific APIs and functionality:
- Hadoop cluster scaling
- Elastic Data Processing: https://wiki.openstack.org/wiki/Savanna/EDP
Integration with Hadoop distributions through a pluggable mechanism:
- Hadoop cluster deployment is usually a multi-step operation. The first step is to install a management console (for instance, Apache Ambari). The second step is to communicate with the management console through its REST API to provision Hadoop on the cluster. Savanna wraps all these operations in a well-defined API.
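The two-step deployment described above could be sketched in Python roughly as follows. The `ProvisioningPlugin` interface, the `RecordingPlugin` stand-in, and `create_cluster` are hypothetical names chosen for illustration; they are not Savanna's actual plugin API, and no real management console is contacted.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical plugin interface; not Savanna's actual API."""

    @abstractmethod
    def install_management_console(self, cluster):
        """Step 1: install the distribution's management console."""

    @abstractmethod
    def provision_hadoop(self, cluster):
        """Step 2: drive the console's REST API to provision Hadoop."""


class RecordingPlugin(ProvisioningPlugin):
    """Stand-in plugin that only records the steps it was asked to run."""

    def __init__(self):
        self.steps = []

    def install_management_console(self, cluster):
        self.steps.append(f"install console on {cluster}")

    def provision_hadoop(self, cluster):
        self.steps.append(f"provision Hadoop on {cluster} via console REST API")


def create_cluster(plugin, cluster):
    """Single entry point that hides the multi-step flow from the caller."""
    plugin.install_management_console(cluster)
    plugin.provision_hadoop(cluster)
    return cluster


plugin = RecordingPlugin()
create_cluster(plugin, "demo")
```

The caller sees one operation regardless of which distribution's plugin is in use, which is the kind of wrapping the item above describes.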
I hope all the items above explain why we need Savanna as a separate OpenStack service.
3. Why can’t Savanna be used as a plugin for Heat?
It should be and it will be someday.