Service discovery has become very prominent lately; it is a key part of any self-healing and autoscaling system.
Companies have been building these systems in house because there was no clear open source offering. Some, like Netflix and Airbnb, have published their own to help others embrace good practices.
HashiCorp created Consul as a natural extension of one of their previous projects, Serf. Serf is a local cluster membership and orchestration mechanism that detects failures and executes actions based on them.
Consul takes this concept several steps further. Let’s see how.
What is Consul
Consul is a service discovery system. It provides a DNS interface for querying services, backed by a k/v storage mechanism that is strongly consistent, as shown by the rigorous Jepsen tests it has withstood.
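To make the DNS side a bit more concrete, here is a minimal sketch of how the name for a service lookup is put together. It assumes Consul's usual naming scheme of an optional tag, the service name, the literal `service` label, an optional datacentre and the `consul` domain. The helper `service_dns_name` is my own illustration, not part of Consul.

```python
def service_dns_name(service, datacenter=None, tag=None, domain="consul"):
    """Build the DNS name used to look up a service.

    Hypothetical helper: it only assembles the labels in the order
    [tag.]<service>.service[.<datacenter>].<domain>
    """
    labels = []
    if tag:
        labels.append(tag)
    labels += [service, "service"]
    if datacenter:
        labels.append(datacenter)
    labels.append(domain)
    return ".".join(labels)


if __name__ == "__main__":
    print(service_dns_name("web"))
    print(service_dns_name("web", datacenter="dc1", tag="primary"))
```

Assuming an agent is listening on its default DNS port (8600), a name built this way could then be resolved with something like `dig @127.0.0.1 -p 8600 web.service.consul`.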
Consul introduces two well-separated concepts: Service definitions and Health Checks. Health Checks can be included in Service definitions to make sure the services work, but they can also run standalone to ensure that the health of the system itself is not compromised.
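As a rough illustration of the two concepts, the sketch below generates a Service definition with an embedded Health Check and a standalone Health Check as JSON. The field names (`service`, `check`, `name`, `port`, `tags`, `http`, `interval`) follow Consul's JSON definition format as I understand it, so check them against the documentation for your version. The service name, URLs and helper functions are made up for the example.

```python
import json


def service_definition(name, port, check_url=None, interval="10s", tags=()):
    """Return a service definition, optionally with an embedded HTTP check."""
    service = {"name": name, "port": port}
    if tags:
        service["tags"] = list(tags)
    if check_url:
        # The check lives inside the service, so it reports on that service.
        service["check"] = {"http": check_url, "interval": interval}
    return {"service": service}


def standalone_check(name, check_url, interval="10s"):
    """Return a check that is not tied to a service (node-level health)."""
    return {"check": {"name": name, "http": check_url, "interval": interval}}


if __name__ == "__main__":
    # Hypothetical service and endpoints, for illustration only.
    web = service_definition("web", 80, "http://localhost:80/health", tags=["primary"])
    node = standalone_check("node-status", "http://localhost:9000/status", "30s")
    print(json.dumps(web, indent=2))
    print(json.dumps(node, indent=2))
```

Each printed document could be saved as its own file in the agent's configuration directory.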
Consul is based on the same Gossip protocol as Serf (it uses Serf itself for Gossip). Gossip transmits the health of each system to the rest of the cluster at random intervals. If a system fails to report in, it is checked indirectly by a random number of nodes; if it still fails to respond, it is marked as “suspicious” and taken off the cluster within a reasonable time, unless the node itself challenges this suspicion. Gossip is a fairly good balance between decentralisation and the network traffic generated to guarantee quorum in the system.
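The failure detection described above can be sketched as a toy model. This is not Serf's actual implementation: `reachable` is a hypothetical stand-in for a network ping, and the number of helpers `k` and the `timeout_rounds` value are arbitrary assumptions.

```python
import random


def probe_round(target, peers, reachable, k=3, rng=random):
    """Run one probe of `target` and return "alive" or "suspect".

    `peers` are the other known nodes (excluding the target) and
    `reachable(src, dst)` is a hypothetical function returning True
    when `src` gets a reply from `dst`.
    """
    # 1. Direct probe from the local node.
    if reachable("self", target):
        return "alive"
    # 2. Indirect probe: ask up to k randomly chosen peers to try instead.
    helpers = rng.sample(peers, min(k, len(peers)))
    if any(reachable(helper, target) for helper in helpers):
        return "alive"
    # 3. Nobody got an answer, so the target becomes suspicious.
    return "suspect"


def resolve_suspicion(state, refuted, rounds_suspected, timeout_rounds=5):
    """Decide what happens to a node once a probe result is known."""
    if state != "suspect":
        return state
    if refuted:
        return "alive"  # the node challenged the suspicion itself
    return "dead" if rounds_suspected >= timeout_rounds else "suspect"


if __name__ == "__main__":
    peers = ["node-b", "node-c", "node-d"]

    def only_peers_reach(src, dst):
        # Assumed scenario: our own link to the target is broken.
        return src != "self"

    print(probe_round("node-a", peers, only_peers_reach))
    print(probe_round("node-a", peers, lambda src, dst: False))
```

The indirect step is what keeps a single flaky link from getting a healthy node evicted: the node is only suspected when the helpers cannot reach it either.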
Consul can use Gossip in a LAN system (known as a datacentre in the config files) and in a WAN system, since a Consul cluster can connect and talk to Consul clusters in other datacentres. All the clusters share information through the k/v system, which is queryable from every single node, ensuring consistency with very short “eventuality”.
Consul needs a quorum of servers in order to run reliably. For production purposes a minimum of three servers is recommended, although this number can be taken up to five or more nodes to tolerate more failures.
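The arithmetic behind those numbers is simple majority voting. A minimal sketch, assuming the usual majority rule of half the servers plus one:

```python
def quorum(servers):
    """Smallest number of servers that forms a majority."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return servers // 2 + 1


def failures_tolerated(servers):
    """How many servers can be lost while a majority still remains."""
    return servers - quorum(servers)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 7):
        print(f"{n} servers: quorum {quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

Note that an even number of servers buys nothing: four servers tolerate one failure, exactly like three, which is why odd cluster sizes are the usual choice.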
When starting a cluster, one of the server nodes will be the bootstrap node. This node generates the initial k/v definitions and assumes temporary leadership of the cluster until the minimum number of quorum servers is reached. At that point a leader election is forced and a cluster leader is chosen amongst all the servers. Having the right number of servers helps the cluster vote for new leaders and ensure consistency. This can be trickier than it looks in some environments, especially ones partitioned into several datacentres like AWS. Here’s a table I use for configuring mine in AWS.
| Region | Zones | Servers per zone | Total Servers | Minimum Quorum | Consistency |
Consul Agents are any other nodes that run Consul and join the cluster but are not responsible for any of Consul's internal services. They run health checks for the node and declare the services available on it.
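Consul Agents can also answer queries against the k/v database. They forward those requests to the servers rather than keeping their own copy, so the answers share the strong consistency of the system. This way we can always query the services available in the cluster from any node, through DNS or through the k/v itself.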
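For the k/v side, here is a small sketch of what reading a key over the HTTP API involves. It assumes the `/v1/kv/<key>` endpoint on the agent's default HTTP port (8500) and a JSON response whose `Value` field is base64 encoded. The key name and the sample response body are invented for the example, so no running cluster is needed.

```python
import base64
import json


def kv_url(key, host="127.0.0.1", port=8500):
    """URL for reading a key from the local agent (assumed default port)."""
    return f"http://{host}:{port}/v1/kv/{key}"


def decode_kv_response(body):
    """Turn a k/v JSON response into a {key: value} dict of plain strings."""
    decoded = {}
    for entry in json.loads(body):
        raw = entry.get("Value")
        # A key may exist with no value, in which case Value is null.
        decoded[entry["Key"]] = None if raw is None else base64.b64decode(raw).decode("utf-8")
    return decoded


if __name__ == "__main__":
    # Hypothetical response, built here instead of fetched from a cluster.
    sample = json.dumps([
        {"Key": "config/web/port", "Value": base64.b64encode(b"8080").decode("ascii"), "Flags": 0}
    ])
    print(kv_url("config/web/port"))
    print(decode_kv_response(sample))
```

In a real setup the body would come from an HTTP GET against the URL above, sent to whichever node is closest.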
In the next blog post I will talk about how to install Consul, start a basic cluster and add service and health check definitions.