Trove/guest agent communication

Guest agent communication
The guest agent's current communication method has the following issues.
 * 1) Heartbeats are written directly to the infrastructure database.
 * 2) Multiple heartbeats are sent when one would do. There is no need for one heartbeat saying the agent is up and another for the database state: if the agent is reporting the database status, we know the agent is up.
 * 3) The guest agent uses the same transport as the rest of the infrastructure, which right now is only RabbitMQ.
 * 4) The current design has multiple security implications.

Phase 1. Stop the guest agent from writing heartbeats directly to the infra database


Stop the guest agent from writing heartbeats directly to the database. This would remove a major security issue and cut down on connection load to the infra database.

Kenneth Wilke proposed this blueprint https://blueprints.launchpad.net/trove/+spec/taskmanager-statusupdate

In the weekly meeting it was decided not to extend the task manager to handle this, but to write another manager, called trove-conductor, that would handle heartbeat updates to the infra database. Participants also said it would be nice to store heartbeats in a data store other than the infra database. The idea was accepted but noted as out of scope for the first pass.

http://eavesdrop.openstack.org/meetings/trove/2013/trove.2013-08-07-20.00.html

http://eavesdrop.openstack.org/meetings/trove/2013/trove.2013-08-07-20.00.log.html

One heartbeat should be sent every 60 seconds. The guest agent could also send an ad-hoc heartbeat when it detects a status change in any of the components it is reporting on. For example, it could send a heartbeat saying MySQL is active every 60 seconds while checking MySQL locally every 10 seconds. If MySQL then changed state, e.g. crashed 20 seconds into the next heartbeat interval, the status could be updated immediately with an ad-hoc heartbeat. This should cut down on heartbeat chatter while still being flexible enough to reflect status changes immediately.
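The timing rule above can be sketched as a small decision helper. This is a minimal illustration, not Trove code: `HeartbeatScheduler`, `simulate`, and the constant names are hypothetical, and the choice to restart the 60-second interval after an ad-hoc heartbeat is an assumption the text does not settle.

```python
from dataclasses import dataclass
from typing import Optional

HEARTBEAT_INTERVAL = 60  # seconds between regular heartbeats (from the text)
CHECK_INTERVAL = 10      # seconds between local status checks (from the text)


@dataclass
class HeartbeatScheduler:
    """Decides when a heartbeat should be sent (hypothetical helper)."""
    last_sent_at: Optional[float] = None
    last_status: Optional[str] = None

    def should_send(self, now: float, status: str) -> bool:
        # First observation: nothing has been reported yet, so report.
        if self.last_sent_at is None:
            return True
        # Ad-hoc heartbeat: the status changed since the last report.
        if status != self.last_status:
            return True
        # Regular heartbeat: the 60-second interval has elapsed.
        return now - self.last_sent_at >= HEARTBEAT_INTERVAL

    def record_sent(self, now: float, status: str) -> None:
        # Assumption: an ad-hoc heartbeat also restarts the regular interval.
        self.last_sent_at = now
        self.last_status = status


def simulate(statuses):
    """Run one local check per CHECK_INTERVAL and return the heartbeats sent."""
    scheduler = HeartbeatScheduler()
    sent = []
    for i, status in enumerate(statuses):
        now = i * CHECK_INTERVAL
        if scheduler.should_send(now, status):
            scheduler.record_sent(now, status)
            sent.append((now, status))
    return sent
```

With checks every 10 seconds and a crash first observed at t=80, `simulate(["running"] * 8 + ["crashed"] * 2)` sends heartbeats at t=0 and t=60 (regular) and at t=80 (ad-hoc), and stays silent on the other checks.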

Heartbeat message definition
Regardless of the transport used, the heartbeat message should always be in JSON. The "heartbeat" attribute should be an array of key/value pairs (a JSON object, as in the example below) so that it can be extended in the future.

{
    "instance_id": "uuid of instance",
    "heartbeat": {
        "service_status": "running",
        "service_type": "mysql"
    }
}
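As a rough sketch of how a sender and receiver might handle this message, the following uses only the Python standard library. The function names `build_heartbeat` and `parse_heartbeat` are hypothetical, and the validation rules (a UUID-formatted `instance_id`, both `service_status` and `service_type` required) are assumptions drawn from the example rather than an agreed protocol.

```python
import json
import uuid


def build_heartbeat(instance_id, service_type, service_status, **extra):
    """Serialize a heartbeat in the shape shown above.

    Extra keyword arguments illustrate how the "heartbeat" attribute
    could be extended; they are not part of the original message.
    """
    heartbeat = {"service_status": service_status, "service_type": service_type}
    heartbeat.update(extra)
    return json.dumps({"instance_id": instance_id, "heartbeat": heartbeat},
                      sort_keys=True)


def parse_heartbeat(raw):
    """Parse and minimally validate a heartbeat; raise ValueError if malformed."""
    message = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(message, dict):
        raise ValueError("heartbeat message must be a JSON object")
    instance_id = message.get("instance_id")
    heartbeat = message.get("heartbeat")
    if not isinstance(instance_id, str):
        raise ValueError("instance_id must be a string")
    if not isinstance(heartbeat, dict):
        raise ValueError("heartbeat must be a JSON object")
    uuid.UUID(instance_id)  # raises ValueError if not a well-formed UUID
    for key in ("service_status", "service_type"):
        if key not in heartbeat:
            raise ValueError("heartbeat is missing %r" % key)
    return message
```

Keeping validation on the receiving side means a malformed or hostile message is rejected before anything is written to the infra database, which fits the security motivation given earlier.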

Phase 2. Let taskmanager speak with Trove backend through conductor

 * 1) Change Trove to get the heartbeat status of an instance from trove-conductor, using RPC calls between the two managers instead of checking the database.
 * 2) Replace all requests from taskmanager to the Trove backend with conductor requests (such as asking for DBInstance, DBBackup, etc.).
 * 3) Extend the conductor manager to handle the requests that taskmanager requires.
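The routing described in the steps above can be illustrated with an in-process stand-in. Everything here is hypothetical: `ConductorManager`, `ConductorClient`, `TaskManager`, the method names, and the `"unknown"` status are illustrative, and the client dispatches locally instead of over a real RPC transport.

```python
class ConductorManager:
    """Hypothetical conductor: the only component that touches the status store."""

    def __init__(self):
        self._status_by_instance = {}  # stand-in for the infra database

    def update_heartbeat(self, instance_id, heartbeat):
        # Store a copy so later changes by the caller do not leak in.
        self._status_by_instance[instance_id] = dict(heartbeat)

    def get_instance_status(self, instance_id):
        record = self._status_by_instance.get(instance_id)
        # "unknown" is an assumed placeholder for instances never heard from.
        return record["service_status"] if record else "unknown"


class ConductorClient:
    """Stand-in for an RPC proxy; dispatches by method name in-process."""

    def __init__(self, manager):
        self._manager = manager

    def call(self, method, **kwargs):
        return getattr(self._manager, method)(**kwargs)


class TaskManager:
    """Asks the conductor for status instead of reading the database."""

    def __init__(self, conductor):
        self._conductor = conductor

    def instance_is_running(self, instance_id):
        status = self._conductor.call("get_instance_status",
                                      instance_id=instance_id)
        return status == "running"
```

In this arrangement only the conductor holds a handle to the data store, so taskmanager needs no database credentials of its own.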

Phase 3. Let API service speak with Trove backend through conductor


Last phase of conductor integration: let the API service speak with the Trove backend through conductor. This phase will make it possible to concentrate all Trove backend communication in one place, which is a great value-add for Trove as a scalable database service.

1. Abstract out the status data store so that attributes in the heartbeat message can be mapped to fields in the data store.

The reason for providing a way to map attributes in the heartbeat message is that each deployment could then extend it as it sees fit. This would also allow us to store information that the API currently asks the guest agent for, such as volume space used. The real goal of having a flexible heartbeat message would be to create a data store that could be used for internal reporting purposes.
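One way to picture the attribute-to-field mapping is a per-deployment lookup table. This is a sketch under assumptions: `FIELD_MAP`, `map_heartbeat`, the field names, and the `volume_used_gb` attribute are all hypothetical, and dropping unmapped attributes is one possible policy among several.

```python
# Hypothetical deployment-specific mapping: heartbeat attribute -> data store field.
FIELD_MAP = {
    "service_status": "status",
    "service_type": "datastore_type",
    "volume_used_gb": "volume_used",  # illustrative extension, not in the original message
}


def map_heartbeat(message, field_map=FIELD_MAP):
    """Translate one heartbeat message into a row for the status data store."""
    row = {"instance_id": message["instance_id"]}
    for attribute, value in message["heartbeat"].items():
        field = field_map.get(attribute)
        if field is not None:  # attributes without a mapping are skipped
            row[field] = value
    return row
```

A deployment that wants to record more data would add an entry to its own mapping, without changing the guest agent's message format or the conductor code.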

Phase 4. TBA


Move all agent communication to work through the trove-conductor

Phase 5
Get the community to agree on a defined message protocol in JSON for talking to the guest agent.

1. First step of protocol outlined in Phase 1 with heartbeat message.
2. Document the rest of the protocol for discussion.