One day we noticed a lot of ERROR getaddrinfo EAI_AGAIN log events in our Kubernetes cluster. All NodeJS apps were hitting this problem from time to time, because the NodeJS runtime does not cache getaddrinfo() function results. The JVM does cache them, so Java apps were fairly silent.
That was a clear indication that the problem was on the DNS server side. Soon after, I noticed that 1 out of 3 kube-dns pods was failing, so we were running at 2/3 capacity. Restarting it would have been enough of a fix, but being an "SRE wannabe" I wanted to make sure we improve the situation for the future.
Googling for the problem
Soon I found a post listing potential causes for the issue, among others:
- NodeJS performance issues with the internal dns.lookup() implementation (yeah, but I can't change that).
- CPU throttling in K8s (unlikely, but very hard to pin down).
- Linux networking race conditions in DNAT, fixed in 5.x kernels (we run an older version, so it was a probable cause).
DNS cache in app? Not for apps in Kubernetes
The post above gave two solutions. One was to install an NPM package in the Node app that would cache the DNS entries. Not a solution I'm particularly fond of, as I prefer such a simple thing as domain name resolution to be provided by the cluster. Also, DNS serves as the service discovery mechanism in a Kubernetes cluster, which makes it even more important to keep records up to date.
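To make the trade-off concrete, here is a minimal sketch of what an in-app DNS cache boils down to. It is written in Python rather than NodeJS and is not how any particular NPM package works; the CachingResolver class, the 30 second TTL and the injectable resolve and clock parameters are all assumptions made for illustration.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple


def system_resolve(host: str) -> List[str]:
    """Resolve a host name through the OS resolver (getaddrinfo)."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


class CachingResolver:
    """Hypothetical in-app DNS cache with a fixed time-to-live."""

    def __init__(
        self,
        resolve: Callable[[str], List[str]] = system_resolve,
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        # host -> (expiry timestamp, addresses)
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def lookup(self, host: str) -> List[str]:
        now = self._clock()
        cached = self._cache.get(host)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh: no DNS traffic at all
        addresses = self._resolve(host)
        self._cache[host] = (now + self._ttl, addresses)
        return addresses
```

Note that getaddrinfo() does not expose the TTL of the DNS record, so a cache like this has to guess one. If a Service gets a new IP within that window, the app keeps talking to the old address, which is exactly the stale-record risk mentioned above.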
NodeLocal DNSCache in Kubernetes cluster
The better solution was to use NodeLocal DNSCache in the Kubernetes cluster. Essentially it runs a DNS cache on every cluster node as a DaemonSet. Definitely the way to go for most cases, because it improves both performance and resilience for very little cost. Unfortunately it requires K8s version 1.18, which we did not have :(
I don't know how domain name resolution works!
Something about this issue kept bugging me though; I thought I was missing something.
Our Kube apps work in microservices fashion, so they communicate with many other services a lot. One of the main page components connects to 13 other services, and that is not unusual. All of those links are full URLs, with domains set up to point at public ELB servers. Still, you'd expect that kube-dns caches those names, so resolution is fast. Well yes, but actually no.
Enlightenment came with a post about the options ndots setting in the /etc/resolv.conf file. In it Marco Pracucci explains how DNS resolution works for non-qualified domain names and how options ndots:5 affects it. I encourage you to read it through (with comments!), but here's the gist and some corrections.
DNS for Kubernetes Pod and Service
Kubernetes creates internal domain names for Pod and Service objects for the purpose of service discovery. The namespace is part of the domain name as well, so you can have a data service in the prod namespace. If a pod in the test namespace tries to connect to the data host, DNS will not resolve it, but data.prod would be fine. However, that allows adding another data service to the test namespace, so data would resolve to a different IP depending on the namespace the DNS query is fired from.
My guess is that this dynamic nature and flexibility is the reason why Kubernetes injects the following into /etc/resolv.conf for every pod:
nameserver 10.32.0.10
search <namespace>.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
ndots:5 affects name resolution performance?
According to the docs, the ndots:5 setting means that a name with fewer than 5 dots will not be queried as is first; instead each item from the search list is appended to it and tried first. So in most cases 3 cluster-local lookups will be attempted (and fail) before the name you actually asked for is sent to the DNS server! For more on why it happens in this particular order, read an excellent post on glibc getaddrinfo() function internals.
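The ordering is easier to see in code. The sketch below is a simplified model of the search-list logic, not the actual glibc implementation; the candidate_names function and the api.example.com host are made up for illustration, and prod stands in for the pod's namespace. A real resolver stops at the first name that returns an answer.

```python
from typing import List


def candidate_names(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Return the names a glibc-style resolver would query, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: the search list is skipped.
        return [name]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as is first, then fall back to the search list.
        return [name] + expanded
    # Too few dots: walk the search list first, try the name as is last.
    return expanded + [name]


SEARCH = ["prod.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for query in candidate_names("api.example.com", SEARCH, ndots=5):
    print(query)
```

With ndots=5 the public name is only tried after three cluster-local candidates. Calling the same function with "api.example.com." (note the trailing dot) yields a single candidate.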
Solutions proposed and my comment
First: Switching to Fully Qualified Domain Names (FQDN) for public domains is generally good advice. It will not only make name resolution faster, but also prevent the security issue explained in RFC 1535 (quite short for an RFC!). I can't see any drawback, even though it looks like a quick and dirty solution.
Second: Lowering ndots through the dnsConfig setting. That might make sense for specific pods that mostly connect to public domains. You'd have to be careful picking an ndots value that speeds things up but does not mess with the Kube DNS setup for service discovery. In other words: there might be dragons.
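For reference, this is roughly what a per-pod override looks like. The sketch builds the manifest as a Python dict and prints it as JSON; the pod name, namespace, image and the ndots value of 2 are placeholders chosen for illustration, not a recommendation.

```python
import json

# Hypothetical pod that mostly talks to public domains.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "public-api-client", "namespace": "prod"},
    "spec": {
        "containers": [{"name": "app", "image": "example/app:latest"}],
        "dnsConfig": {
            # Option values are strings in the Pod API, hence "2" and not 2.
            "options": [{"name": "ndots", "value": "2"}],
        },
    },
}

print(json.dumps(pod_manifest, indent=2))
```

The printed JSON can be piped to kubectl apply -f - as is. With a value of 2, a name like api.example.com is tried as is first, while short names such as data or data.prod still go through the search list first, so in-cluster service discovery should keep working. Longer in-cluster names with two or more dots would cost an extra failed query, so it is worth checking which names your apps actually use.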
What is the ultimate solution then?
As I've tried to explain, domain name resolution is a very nuanced problem, much more awkward than I initially anticipated. Keeping in mind that DNS should be managed at the cluster level, I'd approach solutions in this particular order:
- Set up NodeLocal DNSCache on the cluster.
- Use Fully Qualified Domain Names (FQDN) for specific apps.
- Set ndots to a lower value for specific pods.
- Try a DNS cache in the language runtime (JVM, NodeJS) or in code.
In the case of the failure I described at the beginning, bringing all 3 kube-dns pods back up was enough. We probably still suffer from a lot of unnecessary search-list lookups due to the ndots:5 setting. It would be nice to know whether switching to FQDN made the applications faster, but that requires much more granular metrics. Maybe next time ;)