TopologyService

Summary:

The goal of this blueprint is to provide a topology service that, augmented by a number of adapters, becomes the primary means of supporting failure-zone-centered IaaS deployment.

Assumptions:

I. Any real-world DC is a heterogeneous structure (hardware, networking, etc.). Even if it was designed to be homogeneous, it quickly drifts toward heterogeneity.
I. DC capacity is limited.
I. We have to provide different SLAs for instances of different types.

Rationale:

These three assumptions make it clear that a simple scheduling approach is not satisfactory, which is why we have to implement a smart resource placement mechanism. To implement such a mechanism we need the ability to gather cloud-wide information about the node estate. We propose a topology service which, by means of topology agents, collects the required information and later serves it to the scheduler.

A typical client request to deploy IaaS looks like this: I would like to have a system of N nodes consisting of

k1 high-performance computing nodes (preferably deployed on high-end machines)
k2 web front-ends
k3 load balancers
k4 DB servers
k1 + k2 + k3 + k4 = N

where HA is not needed for k1, while HA for k4 is a must. Clearly, the requested IaaS cannot be provisioned by a round-robin-like scheduler when the DC is not homogeneous. The only solution is to keep track of the current
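The request above can be written down as a small data structure. The following is only a sketch: the names `RoleRequest`, `DeploymentRequest`, `needs_ha` and `prefers_high_end` are hypothetical and are not part of any existing API. It shows one way to carry the per-role counts and SLA hints and to check that the counts add up to N.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RoleRequest:
    """One line of the client request (hypothetical structure)."""
    role: str                       # e.g. "hpc", "web", "lb", "db"
    count: int                      # k_i, number of nodes for this role
    needs_ha: bool = False          # True when HA is a must for this role
    prefers_high_end: bool = False  # hint: prefer high-end machines


@dataclass(frozen=True)
class DeploymentRequest:
    """The whole client request: N nodes split across roles."""
    total_nodes: int                # N
    roles: Tuple[RoleRequest, ...]

    def validate(self) -> None:
        # The per-role counts must sum to N.
        requested = sum(r.count for r in self.roles)
        if requested != self.total_nodes:
            raise ValueError(
                f"roles request {requested} nodes, expected {self.total_nodes}"
            )


# Example values are illustrative only.
request = DeploymentRequest(
    total_nodes=12,
    roles=(
        RoleRequest("hpc", 4, prefers_high_end=True),
        RoleRequest("web", 3),
        RoleRequest("lb", 2),
        RoleRequest("db", 3, needs_ha=True),
    ),
)
request.validate()
```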

The major task to be solved is the inventory of hardware/software components.

Architecture:

We see the top-level architecture as follows. Main components: topology service

Goal:

The scheduler gets a list of instance creation requests, each of which carries an integrated SLA attribute. For each request, the scheduler queries the topology service (TS) with the SLA attribute and gets back a list of nodes that satisfy this SLA, possibly in a sub-optimal way. Finally, the scheduler chooses a node from the list more or less blindly and commits the instance provisioning to that node without any additional checks.
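The interaction described above might look roughly like the sketch below. Everything here is an assumption made for illustration: the `Node` and `Sla` fields, the `candidates` method, and in particular the reading of HA as "place each instance in a failure zone not used yet". The blueprint itself does not define these. The scheduler part stays deliberately simple: it takes whatever list the TS returns and picks a node at random.

```python
import random
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass
class Node:
    name: str
    failure_zone: str
    hardware_class: str  # assumed values: "high_end" or "commodity"
    free_slots: int


@dataclass(frozen=True)
class Sla:
    needs_ha: bool = False
    prefers_high_end: bool = False


class TopologyService:
    """Toy TS holding the node inventory in memory (hypothetical API)."""

    def __init__(self, nodes: Iterable[Node]) -> None:
        self._nodes = list(nodes)

    def candidates(self, sla: Sla,
                   used_zones: FrozenSet[str] = frozenset()) -> List[Node]:
        # Only nodes with spare capacity are eligible.
        result = [n for n in self._nodes if n.free_slots > 0]
        if sla.needs_ha:
            # Assumed HA rule: avoid failure zones already in use.
            result = [n for n in result if n.failure_zone not in used_zones]
        if sla.prefers_high_end:
            # A preference, not a requirement: fall back if none is left.
            preferred = [n for n in result if n.hardware_class == "high_end"]
            result = preferred or result
        return result


def schedule(ts: TopologyService, sla: Sla, count: int,
             rng: Optional[random.Random] = None) -> List[str]:
    """Place `count` instances, picking blindly from the TS answer."""
    if rng is None:
        rng = random.Random()
    placed: List[str] = []
    used_zones = set()
    for _ in range(count):
        nodes = ts.candidates(sla, frozenset(used_zones))
        if not nodes:
            raise RuntimeError("no node satisfies the SLA")
        node = rng.choice(nodes)  # no additional checks by the scheduler
        node.free_slots -= 1      # stands in for committing the provisioning
        used_zones.add(node.failure_zone)
        placed.append(node.name)
    return placed
```

Keeping the SLA logic inside the TS means the scheduler needs no knowledge of failure zones or hardware classes, which matches the "more or less blindly" choice described above.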

Message flow:

scheduler -> topology service -> Distributed DB -> scheduler -> compute -> DB
additional automated sources (switches, UPS): SNMP -> topology service -> Distributed DB
additional non-automated sources (rooms, cooling zones... that we cannot get from a software agent on the node): admin dashboard -> topology service -> Distributed DB
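The three inbound paths in the message flow can be pictured as different sources writing into the same store through the topology service. The sketch below is hypothetical: `TopologyStore`, `ingest` and `describe` are made-up names, the source labels are free-form strings, and a plain dictionary stands in for the distributed DB.

```python
from typing import Any, Dict


class TopologyStore:
    """In-memory stand-in for the distributed DB (illustration only)."""

    def __init__(self) -> None:
        # element name -> attribute name -> {"value": ..., "source": ...}
        self._records: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def ingest(self, source: str, element: str,
               attributes: Dict[str, Any]) -> None:
        # Merge new attributes into the element record and remember
        # which source reported each of them.
        record = self._records.setdefault(element, {})
        for key, value in attributes.items():
            record[key] = {"value": value, "source": source}

    def describe(self, element: str) -> Dict[str, Any]:
        # Flattened view of an element, as a scheduler might consume it.
        record = self._records.get(element, {})
        return {key: entry["value"] for key, entry in record.items()}


store = TopologyStore()
store.ingest("agent", "node-1", {"cpu_cores": 16})     # software agent on node
store.ingest("snmp", "node-1", {"ups": "ups-3"})       # automated source
store.ingest("dashboard", "node-1", {"room": "R2"})    # entered by an admin
```

Recording the source next to each value is a design choice of this sketch; it would make it possible to tell agent-reported data apart from manually entered data.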

Future:

In the future, the monitoring service might be augmented with RCA (root cause analysis) and impact analysis.