Designate/MdnsScalability

The Problem

Having MiniDNS be the master for many (Bind9, and possibly other) DNS servers presents some problems, specifically for large deployments.

An assumption: MiniDNS will be reading the database a lot.

  • Zone transfers will mean getting the information from the database
  • Every time a refresh comes up for a zone on the slave DNS servers, MiniDNS will need to find out the current serial and send it back


The second case is the most worrisome. A (very) large DNS deployment can have DNS masters that handle upwards of 2000 refresh requests per second. This, along with the other requests coming from the Designate end, means that MiniDNS would be:

  • Handling a massive load of operations
  • Making a lot of database calls (one per refresh check; see the sketch below).
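
To make the refresh hot path concrete, here is a minimal sketch (using dnspython) of what one refresh check looks like when every SOA query from a slave turns into a database read. The helper get_zone_serial() and the SOA values are hypothetical stand-ins, not actual Designate code.

  import dns.message
  import dns.rrset

  def get_zone_serial(zone_name):
      # Stand-in for the real storage call, e.g. a
      # "SELECT serial FROM zones WHERE name = %s" round trip.
      return 1408111111

  def handle_soa_query(wire_request):
      request = dns.message.from_wire(wire_request)
      zone = request.question[0].name.to_text()
      serial = get_zone_serial(zone)  # one database hit per refresh check
      response = dns.message.make_response(request)
      response.answer.append(dns.rrset.from_text(
          zone, 3600, 'IN', 'SOA',
          'ns1.example.org. admin.example.org. '
          '%d 3600 600 86400 3600' % serial))
      return response.to_wire()

At 2000 refresh requests per second, that is 2000 of these database round trips per second from each MiniDNS location.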


If MiniDNS were deployed (and replicated) in many datacenters around the world, this large load is multiplied. Meanwhile, in our current architecture, the single source of answers is the master database in one "control" datacenter. Serving this kind of load from systems all over the world against one central database is prone to failure.

tl;dr: Two issues: 1. Reading a database in a single location thousands of times a second from many places across the planet is going to be a headache. 2. MiniDNS is going to be handling a metric ton of requests from multiple directions.

Possible Solutions

1. Replicate that Database!

It's possible to put a read-only slave of the master database everywhere that MiniDNS is going to be. This would eliminate both the latency issues and the probable loss-of-connectivity issues between MiniDNS and the database (which would spell big problems for the underlying DNS servers if an outage persisted).

The issue then becomes keeping that local database in sync. Replicating a MySQL database across continents is fraught with peril, especially when the number of writes becomes high: performance takes a major hit as write volume rises. Other issues, like slaves getting out of sync and needing to be re-synced when cross-DC communication is less than optimal, are reasons to consider alternatives to traditional replication across large distances.

An actual possible solution

Designate could implement a NoSQL database system like Cassandra or MongoDB for its "storage" layer, allowing for a more effective replication strategy. This would require major additions to Designate, but the database could then be replicated on a larger scale without as much heartburn or the same performance issues.
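
As a rough illustration of why this helps: Cassandra, for example, lets a keyspace declare per-datacenter replica counts, so every MiniDNS location can read from local replicas while writes still originate from the "control" datacenter. The sketch below uses the DataStax Python driver; the keyspace name, datacenter names, and replica counts are made-up assumptions, not anything Designate defines.

  from cassandra.cluster import Cluster

  # Connect to any node in the cluster (hypothetical hostname).
  session = Cluster(['db.control.example.org']).connect()

  # NetworkTopologyStrategy places the given number of replicas in
  # each named datacenter, so reads can stay local to each site.
  session.execute("""
      CREATE KEYSPACE IF NOT EXISTS designate
      WITH replication = {
          'class': 'NetworkTopologyStrategy',
          'control_dc': 3,
          'edge_dc_1': 2,
          'edge_dc_2': 2
      }""")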

The approach here is to make it possible for the Database to handle all of the requests it will need to from Designate.


2. A New Agent

The current Bind9 "Agent" could be reimplemented to do the following:

  • Sit on a Master Bind9 server
  • Create zone files
  • Issue RNDC calls to add and delete zonefiles
  • Receive NOTIFYs from MiniDNS
  • Initiate AXFR/IXFR zone transfer requests
  • Replace zone files


This agent would sit on Master DNS servers. Technically it could sit on every DNS server, but the intent is that Bind9 would have a master-slave setup, and the agent would only service the master servers. The slaves would sync up with the masters without any help from Designate.
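
A minimal sketch of the NOTIFY -> AXFR -> rndc flow described above, using dnspython and BIND's rndc utility. The MiniDNS address and port, the zone file path, and the helper name are assumptions for illustration, not existing Designate code.

  import subprocess
  import dns.query
  import dns.zone

  MINIDNS = '192.0.2.1'       # hypothetical MiniDNS address
  MINIDNS_PORT = 5354         # assumed MiniDNS listening port
  ZONE_DIR = '/var/lib/bind'  # hypothetical zone file directory

  def on_notify(zone_name):
      # Pull the full zone from MiniDNS instead of the database.
      xfr = dns.query.xfr(MINIDNS, zone_name, port=MINIDNS_PORT)
      zone = dns.zone.from_xfr(xfr)

      # (Re)write the zone file on the master; the slaves then sync
      # up through normal Bind9 master/slave replication.
      path = '%s/%s.zone' % (ZONE_DIR, zone_name.rstrip('.'))
      zone.to_file(path)

      # For a brand-new zone, tell Bind9 to start serving it
      # (existing zones would get "rndc reload <zone>" instead).
      subprocess.check_call(['rndc', 'addzone', zone_name,
                             '{ type master; file "%s"; };' % path])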


This has a few benefits:

  • The large number of refresh requests from DNS slaves are handled easily by the Master, as it has a copy of all the zone files, and the database need not be involved
  • The only database communication that must happen is MiniDNS getting new zone/update information (this may negate the need to replicate).
  • The only communication between MiniDNS and the DNS servers is zone transfers and confirming that a transfer worked. (Creates/Deletes can be pool manager plugin <-> agent actions)
  • The number of slaves to monitor/communicate with/manage goes down to just dealing with a master or two. This makes the process of checking for live changes much easier/less prone to failure.
  • This doesn't (shouldn't) break any contracts. The agent will act like a DNS server as far as MiniDNS is concerned, and this can be a second Bind9 pool manager plugin.


The approach here is to minimize the communication between MiniDNS and the database, as well as between MiniDNS and the slave servers.

3. Caching for MiniDNS

Having a cache at the MiniDNS level that stored all of the information it received would help to minimize database queries. SOA queries could be answered directly from the cache, and new zones would be transferred only at MiniDNS's direction.
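
For example, the serial lookups behind SOA refresh checks could go through a read-through cache, only touching the database on a miss. A minimal sketch, using an in-process dict with a TTL (a shared cache like memcached would be the realistic choice); get_zone_serial_from_db() is a hypothetical stand-in for the storage layer.

  import time

  CACHE_TTL = 60   # assumed freshness window, in seconds
  _cache = {}      # zone name -> (serial, fetched_at)

  def get_zone_serial_from_db(zone_name):
      return 1408111111  # stand-in for a real database query

  def cached_serial(zone_name):
      hit = _cache.get(zone_name)
      if hit and time.time() - hit[1] < CACHE_TTL:
          return hit[0]  # answered from cache, no database hit
      serial = get_zone_serial_from_db(zone_name)
      _cache[zone_name] = (serial, time.time())
      return serial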

Possible Issues
  1. The cache may get out of sync and serve incorrect information in the event of a loss of database communication. If an update to a zone, or new zone information, were added to the database while MiniDNS in some location had no database access, the cache couldn't pick up the change until the next periodic zone sync.
  2. If the cache were to die, it would have to be rebuilt. In the intervening time, database queries from MiniDNS may overload the database, or build up to a point where something falls over.

If the cache were persistent and globally replicated from the "control" data center, it might be possible to mitigate some of these issues. (That kind of sounds like a database)

4. Do it all

There isn't a reason a NoSQL database couldn't be local to every deployment of MiniDNS/DNS servers with the agent; in other words, the approaches above can be combined.