Designate/MdnsScalability


The Problem

Having MiniDNS be the master for many (bind9, maybe other) DNS servers presents some problems, specifically for large deployments.

An assumption: MiniDNS will be reading the database a lot.

  • Zone transfers will mean pulling the zone's data from the database
  • Every time a refresh comes up for a zone on the slave DNS servers, MiniDNS will need to look up the current serial, and if it's higher than the slave's, the zone will be transferred (see the sketch below this list).
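
For concreteness, here is a minimal sketch (assuming the dnspython library) of what each refresh amounts to from a slave's point of view; the zone name and MiniDNS address are hypothetical. Every one of these SOA queries turns into a database read on the MiniDNS side.

    # A minimal sketch (assuming dnspython) of the SOA check a slave runs at
    # every refresh interval. Zone name and master address are hypothetical.
    import dns.message
    import dns.query
    import dns.zone

    ZONE = "example.org."    # hypothetical zone
    MINIDNS = "192.0.2.10"   # hypothetical MiniDNS address

    def refresh_check(local_serial):
        # Ask the master for the current SOA serial. When MiniDNS is the
        # master, answering this query means a database lookup.
        query = dns.message.make_query(ZONE, "SOA")
        response = dns.query.udp(query, MINIDNS)
        master_serial = response.answer[0][0].serial

        # Transfer the zone only if the master is ahead of the local copy.
        if master_serial > local_serial:
            return dns.zone.from_xfr(dns.query.xfr(MINIDNS, ZONE))
        return None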


The second case is the most worrisome. A (very) large DNS deployment can have DNS masters that handle upwards of 2000 refresh requests per second. This, along with the other requests coming from the Designate side, means that MiniDNS would be:

  • Handling a massive load of operations
  • Making a lot of database calls.


If MiniDNS were to be deployed (and replicated) in many datacenters around the world, this large load is multiplied. Under the current architecture, the single source of answers is the master database in one "control" datacenter. Handling this kind of load from systems all over the world is prone to failure.

tl;dr: Two issues: 1. Reading a database in a single location thousands of times a second from many places across the planet is going to be a headache. 2. MiniDNS is going to be handling a metric ton of requests from multiple directions.

Possible Solutions

1. Replicate that Database!

It's possible to put a read-only slave of the master database everywhere that MiniDNS is going to be. This would eliminate the latency issues and the probable loss-of-connectivity issues between MiniDNS and the database (which would spell big problems for the underlying DNS servers if the outage persisted).
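
As a rough sketch (assuming SQLAlchemy) of what that split looks like inside MiniDNS: reads go to the replica in the same datacenter, while writes still cross the WAN to the master. The connection URLs and table layout are hypothetical.

    # A sketch of read/write splitting, assuming SQLAlchemy. URLs, schema,
    # and table names are hypothetical.
    from sqlalchemy import create_engine, text

    # Writes still travel to the one "control" datacenter.
    master = create_engine(
        "mysql+pymysql://designate:secret@master.control.example/designate")

    # Reads (e.g. serial lookups for refresh queries) stay local.
    replica = create_engine(
        "mysql+pymysql://designate:secret@replica.local.example/designate")

    def get_zone_serial(zone_name):
        # The hot path: answering a slave's SOA refresh query locally.
        with replica.connect() as conn:
            row = conn.execute(
                text("SELECT serial FROM zones WHERE name = :name"),
                {"name": zone_name},
            ).first()
            return row.serial if row else None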

The issue then becomes keeping that local database in sync. Replicating a MySQL database across continents is fraught with peril, especially when the number of writes becomes high; performance takes a major hit as the write volume rises. Other issues, like slaves getting out of sync and needing to be resynced when cross-DC communication is less than optimal, are reasons to consider alternatives to traditional replication across large distances.

An actual possible solution

Designate could implement a NoSQL database system like Cassandra or MongoDB for its "storage" layer, which would allow for a more effective replication strategy. This would require major additions to Designate, but the database could be replicated on a larger scale without as much heartburn or the performance issues.
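
As a rough illustration (assuming the Python cassandra-driver), the replication strategy could pin a full replica set in every datacenter that runs MiniDNS; the keyspace and datacenter names are hypothetical.

    # A sketch, assuming the Python cassandra-driver. Keyspace name and
    # datacenter names are hypothetical.
    from cassandra.cluster import Cluster

    cluster = Cluster(["cassandra.local.example"])
    session = cluster.connect()

    # NetworkTopologyStrategy keeps the requested number of replicas in each
    # named datacenter, so every MiniDNS instance can read locally.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS designate
        WITH replication = {
            'class': 'NetworkTopologyStrategy',
            'dc_control': 3, 'dc_west': 3, 'dc_east': 3
        }
    """)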

The approach here is to make it possible for the database to handle all of the requests coming to it from Designate.


2. A New Agent

The current bind9 "Agent" could be reimplemented to do the following (a rough sketch of the rndc plumbing follows the list):

  • Sit on a Master bind9 server
  • Create zone files
  • Issue RNDC calls to add and delete zonefiles
  • Receive NOTIFYs from MiniDNS
  • Initiate AXFR/IXFR zone transfer requests
  • Replace zone files
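
A minimal sketch (assuming Python and bind9's rndc) of the create/delete/replace plumbing such an agent would need; the zone directory, zone names, and helper names are all hypothetical.

    # A sketch of the agent's zone file duties, assuming it shells out to
    # bind9's rndc. Paths and names are hypothetical.
    import subprocess

    ZONE_DIR = "/var/cache/bind"  # hypothetical zone file directory

    def create_zone(zone_name, zone_data):
        # Write the zone file the master will serve, then tell bind9 about
        # it. rndc addzone requires allow-new-zones in named.conf.
        path = "%s/%s.zone" % (ZONE_DIR, zone_name)
        with open(path, "w") as f:
            f.write(zone_data)
        subprocess.check_call([
            "rndc", "addzone", zone_name,
            '{ type master; file "%s"; };' % path,
        ])

    def replace_zone(zone_name, zone_data):
        # After receiving a NOTIFY and transferring the zone from MiniDNS,
        # rewrite the file and reload just that zone.
        path = "%s/%s.zone" % (ZONE_DIR, zone_name)
        with open(path, "w") as f:
            f.write(zone_data)
        subprocess.check_call(["rndc", "reload", zone_name])

    def delete_zone(zone_name):
        subprocess.check_call(["rndc", "delzone", zone_name])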


This has a few benefits:

  • The large number of refresh requests from DNS slaves is handled easily by the master, as it has a copy of all the zone files, and the database need not be involved
  • The only database communication that must happen is MiniDNS getting new zone/update information (this may negate the need to replicate).
  • The only communication between MiniDNS and the DNS servers is zone transfers and making sure the transfer worked. (Creates/Deletes can be pool manager plugin <-> agent actions)
  • The number of servers to monitor/communicate with/manage goes down to just a master or two. This makes the process of checking for live changes much easier and less prone to failure.
  • This doesn't (shouldn't) break any contracts. The agent will act like a DNS server as far as MiniDNS is concerned, and this can be a second bind9 pool manager plugin.


The approach here is to minimize the communication between MiniDNS and the database, as well as between MiniDNS and the slave servers.


3. Do Both

There isn't a reason a NoSQL database couldn't be local to every deployment of MiniDNS/DNS servers with the agent.