Difference between revisions of "Trove/guest agent communication"

Revision as of 18:27, 19 August 2013

Guest agent communication

The guest agent's current communication method has the following issues.

  1. Heartbeats are written directly to the infrastructure database.
  2. Multiple heartbeats are used when one will do. There is no need to send one heartbeat saying the agent is up and another for the database state: we know the agent is up if it is reporting the database status.
  3. It uses the same transport as the rest of the infrastructure, which right now is only RabbitMQ.
  4. The current design has multiple security implications.

Phase 1

Stop the guest agent from writing heartbeats directly to the infra database

Heartbeats should no longer be written to the database directly by the guest agent. This would solve a major security issue and cut down on the connection load to the infra database.

Kenneth Wilke proposed this blueprint: https://blueprints.launchpad.net/trove/+spec/taskmanager-statusupdate

It was decided in the weekly meeting not to extend the task manager to handle this, but to write another manager called trove-conductor that would handle heartbeat updates to the infra database. It was also expressed in the meeting that it would be nice to store heartbeats in a data store other than the infra database. While those ideas were accepted, they were noted as out of scope for the first pass.



One heartbeat should be sent every 60 seconds. The guest agent could also send an ad-hoc heartbeat if it detects a status change in any of the components it is reporting on. For example, it could send a heartbeat saying MySQL is active every 60 seconds while checking MySQL locally every 10 seconds. With this option, if MySQL changed state (e.g. it crashed 20 seconds into the next heartbeat interval), the status could be updated immediately in an ad-hoc heartbeat. This should cut down on heartbeat chatter while still being flexible enough to reflect status changes immediately.
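The timing described above can be sketched roughly as follows. This is only an illustration, not Trove code: the class name HeartbeatScheduler, the method on_check, and the simulate helper are hypothetical, and the choice to restart the 60-second timer after an ad-hoc heartbeat is an assumption the text does not settle.

```python
HEARTBEAT_INTERVAL = 60  # seconds between regular heartbeats (from the text)
CHECK_INTERVAL = 10      # seconds between local status checks (from the text)


class HeartbeatScheduler:
    """Hypothetical helper that decides when a heartbeat should be sent."""

    def __init__(self, interval=HEARTBEAT_INTERVAL):
        self.interval = interval
        self.last_sent_at = None  # time of the last heartbeat of any kind
        self.last_status = None   # status reported in that heartbeat

    def on_check(self, now, status):
        """Call after each local check; returns 'regular', 'ad-hoc', or None."""
        if self.last_sent_at is None or now - self.last_sent_at >= self.interval:
            reason = "regular"
        elif status != self.last_status:
            reason = "ad-hoc"
        else:
            return None  # nothing changed and the interval has not elapsed
        # Assumption: an ad-hoc heartbeat also restarts the regular timer.
        self.last_sent_at = now
        self.last_status = status
        return reason


def simulate(statuses):
    """Feed one status per local check and collect the heartbeats sent."""
    scheduler = HeartbeatScheduler()
    sent = []
    for i, status in enumerate(statuses):
        now = i * CHECK_INTERVAL
        reason = scheduler.on_check(now, status)
        if reason:
            sent.append((now, status, reason))
    return sent
```

With MySQL crashing at the 20-second check, simulate(["running"] * 2 + ["crashed"] * 7) yields a regular heartbeat at 0 seconds, an ad-hoc heartbeat at 20 seconds, and the next regular heartbeat at 80 seconds.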

Heartbeat message definition

Regardless of the transport used, the heartbeat message should always be in JSON. The "heartbeat" attribute should be an array so that it is extensible in the future.

       { "heartbeat": [
           { "agentstatus": "online", "mysqlstatus": "running" }
       ] }
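As a minimal sketch of how such a message might be produced and checked, the following uses only Python's standard json module. The function names build_heartbeat and parse_heartbeat are hypothetical, and wrapping the "heartbeat" attribute in an enclosing JSON object is an assumption made so that the message is a valid JSON document.

```python
import json


def build_heartbeat(agent_status, mysql_status):
    """Serialize one heartbeat entry using the attribute names from the example."""
    message = {
        "heartbeat": [
            {"agentstatus": agent_status, "mysqlstatus": mysql_status},
        ]
    }
    return json.dumps(message)


def parse_heartbeat(raw):
    """Return the list of heartbeat entries, rejecting malformed messages."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("heartbeat message must be a JSON object")
    entries = message.get("heartbeat")
    # The text asks for an array so that more entries can be added later.
    if not isinstance(entries, list) or not entries:
        raise ValueError('"heartbeat" must be a non-empty array')
    return entries
```

Because the receiver iterates over an array, further entries could be appended later without changing the shape of the message.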

Phase 2

Change Trove to contact trove-conductor for status on the instance and no longer check the database.

Future direction

Ideas to finish writing about

  1. Write heartbeats to a different data store. The API servers would need updates to talk to trove-conductor for status information.
  2. Allow expanded information to be passed back in the heartbeat. Ideas tossed around were the MySQL version and/or the guest agent version, but the possibilities are endless. With this, it was thought that we could possibly create an in-memory reporting database to gather information on instances without having to query all of them. The idea of keeping it in memory is that none of the information is stored for historical value, and if it were all lost it would be repopulated in one heartbeat cycle. A question was brought up about what happens when an instance does not show up in the in-memory data store; the response was that if it doesn't show up, it must not be online.
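The in-memory reporting idea could look something like the sketch below. Everything here is hypothetical: the class InMemoryStatusStore, its method names, and in particular the max_age expiry, which is an added assumption (the discussion only says that an instance missing from the store is treated as not online).

```python
class InMemoryStatusStore:
    """Hypothetical in-memory store holding the latest heartbeat per instance."""

    def __init__(self, max_age=120):
        # Assumption: entries older than max_age seconds count as missing.
        self.max_age = max_age
        self._latest = {}

    def record(self, instance_id, entry, now):
        """Keep only the newest entry; no history is stored."""
        self._latest[instance_id] = (now, dict(entry))

    def lookup(self, instance_id, now):
        """Return the latest entry, or None if missing or too old."""
        seen = self._latest.get(instance_id)
        if seen is None or now - seen[0] > self.max_age:
            return None
        return seen[1]

    def is_online(self, instance_id, now):
        """An instance that does not show up is treated as not online."""
        return self.lookup(instance_id, now) is not None
```

If the store were lost and recreated empty, every reporting instance would reappear after its next heartbeat, which matches the reasoning for keeping the data in memory only.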

  1. Move all agent communication to work through trove-conductor
  2.  ????
  3. Profit

Yea need to finish up this section :D