HeterogeneousArchitectureScheduler
Revision as of 22:27, 23 May 2011
- Launchpad Entry: NovaSpec:schedule-instances-on-heterogeneous-architectures
- Created: Brian Schott
- Maintained: Jinwoo "Joseph" Suh
- Contributors: USC Information Sciences Institute
Summary
Nova should have support for cpu architectures, accelerator architectures, and network interfaces, and be able to route run_instances() requests to a compute node capable of running that architecture. This blueprint depends on the schema changes described in the HeterogeneousInstanceTypes blueprint. The target release for this is Diablo. A stable test branch and deployment are available now.
The USC-ISI team has a functional prototype here:
- https://code.launchpad.net/~usc-isi/nova/hpc-trunk (more up-to-date version)
- https://code.launchpad.net/~usc-isi/nova/hpc-testing (more stable version)
This blueprint is related to the HeterogeneousInstanceTypes blueprint.
We are also drafting blueprints for three machine types:
- http://wiki.openstack.org/HeterogeneousGpuAcceleratorSupport
- http://wiki.openstack.org/HeterogeneousSgiUltraVioletSupport
- http://wiki.openstack.org/HeterogeneousTileraSupport
An etherpad for discussion of this blueprint is available at http://etherpad.openstack.org/heterogeneousarchitecturescheduler
Release Note
Nova has been extended to allow deployments to advertise and users to request specific processor, accelerator, and network interface options using instance_types (or flavors) as the primary mechanism. This blueprint is for a scheduler plugin that supports routing run_instance requests to the appropriate physical compute node.
Rationale
See HeterogeneousInstanceTypes. The short answer is that real deployments will have heterogeneous resources.
There are several related blueprints; see the links in the Summary above.
User stories
See HeterogeneousInstanceTypes.
George has two different processing clusters, one x86_64, the other Power7. These two run_instances commands need to go to the appropriate compute nodes. In addition, nova should prevent a user from inadvertently specifying an x86_64 machine image to run on a Power7 compute node or vice-versa. The scheduler should check for inconsistencies.
euca-run-instances -t p7f.grande -k fred-keypair emi-12345678
euca-run-instances -t m1.xlarge -k fred-keypair emi-87654321
euca-run-instances -t "m1.xlarge;xpu=gpu;xpus=3" -k fred-keypair emi-87654321
Assumptions
The assumption is that OpenStack runs on the target hardware architecture or on a proxy running on behalf of the target hardware architecture. See related blueprints above for what our team is doing.
We also assume that the instance_type_metadata table is created, and that cpu_info, xpu_arch, xpu_info, xpus, net_arch, net_info, and net_mbps are added as attributes to instance_type_metadata and instance_metadata.
Design
See HeterogeneousInstanceTypes.
The architecture aware scheduler will compare these additional fields when selecting target compute_nodes for the run_instances request.
- cpu_arch, xpu_arch, and net_arch are intended to be high-level label switches for fast row filtering (like "i386" or "fermi" or "infiniband").
- xpus and net_mbps are treated as quantity fields, exactly as vcpus is used by existing schedulers.
- cpu_info, xpu_info, and net_info follow the instance_migration branch example, using a JSON-formatted string to capture arbitrary configurations.
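To make the field split concrete, here is a minimal sketch of a capability record a compute node might register. The field names come from the blueprint; the concrete values (CPU model, GPU model, memory size) are invented for illustration.

```python
import json

# Hypothetical capability record for one compute node.  Label fields
# (cpu_arch, xpu_arch, net_arch) are plain strings for fast filtering;
# quantity fields (xpus, net_mbps) are integers; *_info fields are
# JSON-formatted strings holding arbitrary configuration detail.
capability = {
    "cpu_arch": "x86_64",
    "cpu_info": json.dumps({"model": "Nehalem", "features": ["sse4_2"]}),
    "xpu_arch": "fermi",
    "xpu_info": json.dumps({"model": "Tesla 2050", "gpu_mem_gb": 3}),
    "xpus": 2,
    "net_arch": "infiniband",
    "net_info": json.dumps({"encap": "ib"}),
    "net_mbps": 10000,
}

# Because the *_info fields travel as JSON strings, consumers decode
# them before inspecting details:
xpu_details = json.loads(capability["xpu_info"])
print(xpu_details["model"])  # Tesla 2050
```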
The basic scheduler flow through nova is as follows:
- nova-compute starts on a host and registers its architecture, accelerator, and networking capabilities with the zone_manager (scheduler/zone_manager.py). The data is stored in memory (not in the database). The capability information is refreshed periodically (default: every minute).
- nova-api receives a run-instances request with instance_type string "m1.small" or "m1.small;xpu_arch=fermi;xpus=2".
- nova-api passes instance_type to compute/api.py create() from api/ec2/cloud.py run_instances() or api/openstack/servers.py create().
- nova-api compute/api.py create() reads from the instance_types table and adds rows to the instances table. The cpu_arch is stored in the instances table, and the other information is stored in the instance_metadata table. The instance_metadata table has a field containing the id of the corresponding entry in the instances table.
- nova-api does an rpc.cast() to scheduler num_instances times, passing instance_id. No change here.
- nova-scheduler, acting as the architecture scheduler, selects a compute_service host that matches the options specified in the instances table and instance_metadata fields. The arch scheduler filters the available compute_nodes by cpu_arch, cpu_info, xpu_arch, xpu_info, xpus, net_arch, net_info, and net_mbps against the same fields.
- nova-scheduler rpc.cast() to each selected compute service.
- nova-compute receives rpc.cast() with instance_id, launches the virtual machine, etc.
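The matching step in the flow above can be sketched as a simple filter: label fields must match exactly when the request specifies them, and quantity fields must meet or exceed the requested amount. This is an illustrative sketch in the spirit of hosts_up_with_arch(), not the actual code in nova/scheduler/arch.py; the host names and capability values are made up.

```python
def filter_hosts(hosts, request):
    """Keep hosts whose label fields match the request exactly and
    whose quantity fields meet or exceed the requested amounts.
    Fields absent from the request are not constrained."""
    selected = []
    for host, caps in hosts:
        # Label fields: exact match required when requested.
        labels_ok = all(
            caps.get(field) == request[field]
            for field in ("cpu_arch", "xpu_arch", "net_arch")
            if request.get(field) is not None
        )
        # Quantity fields: host must have at least the requested amount.
        quantities_ok = all(
            caps.get(field, 0) >= request[field]
            for field in ("xpus", "net_mbps")
            if request.get(field) is not None
        )
        if labels_ok and quantities_ok:
            selected.append(host)
    return selected

hosts = [
    ("node1", {"cpu_arch": "x86_64", "xpu_arch": "fermi", "xpus": 4}),
    ("node2", {"cpu_arch": "x86_64", "xpu_arch": None, "xpus": 0}),
]
request = {"cpu_arch": "x86_64", "xpu_arch": "fermi", "xpus": 2}
print(filter_hosts(hosts, request))  # only node1 has matching GPUs
```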
Schema Changes
See HeterogeneousInstanceTypes.
Implementation
The USC-ISI team has a functional prototype:

- https://code.launchpad.net/~usc-isi/nova/hpc-trunk
- https://code.launchpad.net/~usc-isi/nova/hpc-testing
UI Changes
Functionality is accessed through selecting the scheduler in nova.conf:
scheduler_driver = nova.scheduler.arch.ArchitectureScheduler
The user gives the instance type as an extended string. Examples are "cg1.small" and "m1.small;xpu=fermi;xpus=3".
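One way to split such an extended string into the base instance type and its override attributes is sketched below. This is a hypothetical parser for illustration, not the exact code in nova-api.

```python
def parse_instance_type(spec):
    """Split an extended instance-type string such as
    "m1.small;xpu_arch=fermi;xpus=2" into the base type name and a
    dict of override attributes.  Numeric values become ints."""
    parts = spec.split(";")
    name, overrides = parts[0], {}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        overrides[key] = int(value) if value.isdigit() else value
    return name, overrides

print(parse_instance_type("m1.xlarge;xpu=gpu;xpus=3"))
# ('m1.xlarge', {'xpu': 'gpu', 'xpus': 3})
```

A plain type name with no ";" yields an empty override dict, so the existing "m1.small"-style requests keep working unchanged.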
Code Changes
Summary of changes:
- nova/scheduler/arch.py
- Implements the architecture-aware scheduler:
  - def hosts_up_with_arch(self, context, topic, instance_id):
  - def schedule(self, context, topic, *_args, **_kwargs):
- api/ec2/cloud.py
Migration
Very little needs to be changed in terms of the way deployments will use this if we set sane defaults like "x86_64" as assumed today.
Test/Demo Plan
This need not be added or completed until the specification is nearing beta.
Unresolved issues
None.
BoF agenda and discussion
Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.