nSights Talks

Amazon EKS Optimizations (Part III)

Tutorial Highlights & Transcript

00:00 - Problems
Hello, my name is Parth Vyas. Today, I’m going to give a demo on EKS Optimizations, Part Three. Let’s get started. Why do we need optimizations on EKS when it is managed by AWS? Here are some problems that we are going to resolve by applying these optimizations. Today, I’m going to cover problems related to DNS only. The first problem is that by default on AWS, EKS comes with CoreDNS, but it does not have any auto-scaling set up. On a Kubernetes cluster, DNS is a critical part of the infrastructure. If CoreDNS is down, your DNS service is down, the whole Kubernetes cluster will go down, and it will mean major downtime for the application. Today, we are going to solve that issue. So the first problem is auto-scaling: CoreDNS does not come with it, and we will add proper auto-scaling for it. The second one is that as the cluster grows, DNS queries become slow. Even if you have horizontal scaling available, the DNS queries will still slow down because of a specific issue, which we will cover in the resolution.
01:46 - Solutions
Let’s go to the resolution. What are the resolution steps that we should use? For the slow queries, the first thing we have to do is use a fully qualified domain name (FQDN) for external traffic, meaning non-Kubernetes traffic rather than internal traffic. For example, on AWS, we connect to RDS and ElastiCache. All these services come with a URL that is provided by AWS and is not a Kubernetes internal URL. For those, we have to use FQDNs. Another option is AutoPath. With some services, we cannot control the URLs and the domains. For that kind of scenario, we can use the AutoPath CoreDNS plugin. That will resolve a huge issue, which I will show in the demo. Another one is the node-local DNS cache. The node-local DNS cache is an official Kubernetes tool to reduce the load on CoreDNS. As the name suggests, it caches your DNS queries on each EC2 instance, so it is host-based. All the pods running on that particular EC2 instance will not query CoreDNS directly; they will go through the local cache. It will be faster and it will reduce the load on CoreDNS, which is the DNS service on Kubernetes. And the last part is adding auto-scaling to CoreDNS. These are the reference links that I will use today in the demo.
03:56 - Demo - Fully Qualified Domain Name
And let’s get started with the demo. The first thing I will cover is the fully qualified domain name. How can you tell what is a fully qualified domain name? It has a trailing dot at the end of the domain. In this example, google.com is not a fully qualified domain name. If I want to make it a fully qualified domain name, I need to add a trailing dot, and that makes the domain fully qualified. I will show you the advantage of this in the demo.
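As a minimal sketch of the trailing-dot rule described above, the helpers below check for and add the trailing dot. The function names `is_fqdn` and `to_fqdn` are hypothetical and chosen only for illustration.

```python
def is_fqdn(hostname: str) -> bool:
    """A name counts as fully qualified here when it ends with a dot."""
    return hostname.endswith(".")


def to_fqdn(hostname: str) -> str:
    """Return the hostname with exactly one trailing dot."""
    return hostname.rstrip(".") + "."


if __name__ == "__main__":
    print(is_fqdn("google.com"))    # False
    print(to_fqdn("google.com"))    # google.com.
    print(to_fqdn("google.com."))   # google.com.
```

The same idea applies to any external endpoint an application is configured with: the value is stored with the trailing dot so the resolver treats it as absolute.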

First, this is CoreDNS with the default settings that the EKS cluster comes with. It comes with two pods, that is, two replicas. But for ease of the demo, I will reduce it to one, so that we can check the logs of a single container. I will execute this command, and we will see how the Kubernetes cluster resolves DNS requests. It provided us with the IP addresses. Now let’s check how the query works. By default, whenever the name is not a fully qualified domain name, the requests look like this. It is first matched against the search domains: default.svc.cluster.local and so on. These are all Kubernetes internal domain names, plus this ec2.internal, right? After querying four times, CoreDNS comes up with the right domain name to query and the right IP addresses. If you want to calculate the time taken, you need to take all the query times and sum them up. Before the demo, I noted down some basic timings. This is the normal timing, this is with the fully qualified domain name, and this is with AutoPath.
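To make the repeated lookups easier to picture, here is a rough sketch of how a resolver expands a name using its search list. It assumes the search domains mentioned in the demo and `ndots:5`, which is the usual setting in a pod's `/etc/resolv.conf`; the function name `candidate_queries` is hypothetical.

```python
# Assumed search list, based on the domains mentioned in the demo.
SEARCH = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "ec2.internal",
]


def candidate_queries(name: str, search: list[str], ndots: int = 5) -> list[str]:
    """Return the names a resolver would try, in order."""
    if name.endswith("."):
        # Fully qualified: the search list is skipped entirely.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, then try the name as-is.
    return expanded + [absolute]


if __name__ == "__main__":
    for query in candidate_queries("google.com", SEARCH):
        print(query)
    print(candidate_queries("google.com.", SEARCH))
```

With these assumptions, `google.com` produces five candidate names, with the four search-domain attempts coming before the real one, while `google.com.` produces a single query.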

The fully qualified domain name and AutoPath work similarly, but AutoPath requires CoreDNS to create a CNAME record internally for the domains that you are using, which makes it more involved than a normal DNS query. This is the normal DNS query. I will delete this pod so it will recreate a new pod. It will be easier to check the logs with the fully qualified domain name. I’m just copying this command again. In this command, I will add the trailing dot, which makes it a fully qualified domain name. The pod is up and running and I will execute this command. It provided us with the IP addresses and, as you can see, it only queried one time, and the query time is very low compared to the five queries. So, this is the fully qualified domain name, and for each and every AWS service, we need to use it in the Kubernetes variables. That way, query performance can be optimized. For a smaller-scale cluster, it does not have much of an impact, but as the cluster grows, and for high-traffic clusters, it will be very useful.

08:37 - Demo - AutoPath
In the next step, I will add the AutoPath plugin to CoreDNS, and it will work the same way. For that, I need to apply a config map. Here I am applying a few things, and I will also cover some Terraform best practices that I came across. In this Terraform code, there are some optimizations, tools, and technologies that you need to apply before deploying the worker nodes on the Kubernetes cluster. For that, you can create a Terraform module with a pre-node name, or any name you want. That way it will be deployed before the worker nodes. As you can see, I also created a post-node deploys module. For example, there is the load balancer controller, which requires an active pod running to validate. To support that kind of scenario, your worker nodes should be deployed first, and after that the Kubernetes Helm charts and everything else you need to deploy, so that you will not face any kind of errors in terraform apply commands. In the pre-node module, I have created a Kubernetes manifest for the optimized config map, which I will show you here. I will compare it with the default manifest as well. Here I have created a default manifest. If we compare them side by side, this autopath @kubernetes line is what I have added to enable the AutoPath plugin on CoreDNS. The other change is the pods option: by default, it uses pods insecure, and here I have used pods verified. These two things are required to apply the AutoPath plugin. If you want to read about insecure and verified, you can open the kubernetes plugin page in the CoreDNS documentation; they have provided a proper explanation of pods disabled, insecure, and verified. I will apply this in Kubernetes. There is one thing: if you change the config map, it will not restart your Kubernetes pods. Here we are deploying via Terraform. We need to make these changes because by default the Kubernetes cluster comes with CoreDNS pods already deployed, and we need to restart those pods.
For that, I have created a new deployment file, which is a copy-paste of the default. But here I have changed the config map name. I have changed the config map name to Optimize and deployed a new config map. This is a change to the Kubernetes deployment configuration, so it will trigger the Kubernetes API server to replace those pods. It will deploy new pods, and the new pods will have this config map already applied, right? So I will move forward with this deployment.
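The two Corefile changes described above (adding `autopath @kubernetes` and switching `pods insecure` to `pods verified`) can be sketched as a small text transformation. The `DEFAULT_COREFILE` below is an assumed, typical default and may differ from the one in your cluster, and `enable_autopath` is a hypothetical helper, not part of CoreDNS or Terraform.

```python
# Assumed default Corefile; check the coredns ConfigMap in your own cluster.
DEFAULT_COREFILE = """\
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
      pods insecure
      fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
"""


def enable_autopath(corefile: str) -> str:
    """Insert `autopath @kubernetes` and switch the pods mode to verified."""
    result = []
    for line in corefile.splitlines():
        stripped = line.strip()
        indent = line[: len(line) - len(line.lstrip())]
        if stripped.startswith("kubernetes "):
            # autopath goes in the server block, next to the kubernetes plugin.
            result.append(f"{indent}autopath @kubernetes")
        if stripped == "pods insecure":
            # autopath needs the kubernetes plugin to run with pods verified.
            line = f"{indent}pods verified"
        result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    print(enable_autopath(DEFAULT_COREFILE))
```

The output is what would go into the new, renamed config map; the helper is not idempotent, so it is meant to be run once against the default text.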
In the meantime, I will cover auto-scaling of the CoreDNS pods. We can apply the basic HPA, which is a default Kubernetes object. It will be based on CPU or RAM. Another option is to use the cluster proportional autoscaler. How does the cluster proportional autoscaler work? We have two parameters to set up the auto-scaling: nodes per replica and cores per replica. You can decide, and it can work similarly to a daemon set if you set nodes per replica to one. If there are 10 nodes, it will deploy 10 replicas, right? And if a single worker node has four cores and you have set cores per replica to two, then for the four cores it will deploy two CoreDNS replicas. That way, based on your workload, you can scale the CoreDNS pods out and in. As for modes, this is the linear mode that I have used here, and there is a ladder mode as well. You can check the documentation provided in the reference URLs here.

Our terraform apply is done, and now I go here. For demo purposes, I have used a single replica in this manifest as well. As you can see, I use one replica only. There is a new optimized config map, which is already applied. Now we will verify CoreDNS. As you can see, it automatically restarted the pod. I have done nothing. Now AutoPath is already applied here. I go into the logs, and I will execute the same command. As you can see, this is not a fully qualified domain name, but I will run the command here, and let’s wait for the log. As you can see, in a single query, it provided us the IP addresses, and AutoPath automatically corrected our domain to the fully qualified domain name. And the query time, as you can see, is very low. So those are the AutoPath plugin and the fully qualified domain name for optimizing your DNS.
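The linear mode of the cluster proportional autoscaler mentioned earlier can be sketched as follows. This follows the formula in the autoscaler's documentation as I understand it (the larger of the core-based and node-based counts, clamped to a minimum and maximum); the function name and default bounds are assumptions, and the option that prevents a single point of failure is left out.

```python
import math


def linear_replicas(
    nodes: int,
    cores: int,
    nodes_per_replica: float,
    cores_per_replica: float,
    min_replicas: int = 1,
    max_replicas: int = 100,
) -> int:
    """Replica count for linear mode: scale with nodes or cores, whichever asks for more."""
    by_cores = math.ceil(cores / cores_per_replica)
    by_nodes = math.ceil(nodes / nodes_per_replica)
    replicas = max(by_cores, by_nodes)
    # Clamp to the configured bounds.
    return max(min_replicas, min(replicas, max_replicas))


if __name__ == "__main__":
    # 10 nodes with nodes per replica set to 1 behaves like a daemon set.
    print(linear_replicas(nodes=10, cores=20, nodes_per_replica=1, cores_per_replica=256))  # 10
    # One node with four cores and cores per replica set to 2.
    print(linear_replicas(nodes=1, cores=4, nodes_per_replica=16, cores_per_replica=2))     # 2
```

Both examples mirror the cases from the talk: the node count drives the first result and the core count drives the second.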

Now I will enable the node-local DNS cache and the cluster proportional autoscaler as well. For that, these are the definitions that I have used, which are a Helm chart and a Kubernetes manifest. If I go to the proportional autoscaler configuration file, this is the main configuration that I mentioned. Currently, I’m using an instance with a two-core CPU and eight GB of RAM. I have set cores per replica to two and nodes per replica to one, so it will behave like a daemon set. There is also another option that prevents a single point of failure. I have deployed a single node only. Basically, for one node, it should be a single replica, but this autoscaler will deploy two replicas and it will scale up our CoreDNS. I will deploy this along with the node-local DNS cache.

17:49 - Official Kubernetes Documentation
There is the official Kubernetes documentation for it, and since version 1.18, it’s generally available for all. This is the diagram and, as I mentioned, it keeps a local DNS cache on the node, and pods send their queries to the local cache. They will not go directly to CoreDNS, and that reduces the load on CoreDNS. The main URL here is TKNG, which is a Kubernetes networking guide. You can follow it for other tutorials and basic overviews as well. This is the main website that I have taken as a reference, and you can see it also covers external DNS and everything else, including AutoPath and the node-local DNS cache. In the node-local DNS section, all the configuration that you require is provided. So before applying, you need to read this, and you need to create the node-local DNS configuration files, as I have done here. This is the custom configuration file that I have created, which includes optimizations for EKS. These IP addresses are the IP addresses of your DNS service, and on EKS they will not change. But on a different Kubernetes cluster, they can change, so it depends on the Kubernetes cluster you use. Even if it’s a kops cluster, it will change. This is the DNS upstream service that you need to create; you also have to create a node-local DNS service and a service account; and this is the daemon set. It runs on each and every node. Our apply command is done; now let’s wait and check whether the CoreDNS pods are automatically scaled. And as you can see, the CoreDNS autoscaler automatically scaled the CoreDNS pods for us. As you can see, this pod has been running for seven minutes, and this one is the pod newly added by the autoscaler.
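The cluster-specific IP addresses in the node-local DNS configuration mentioned above are usually filled in by replacing placeholders in the upstream manifest. The sketch below shows that substitution on an abbreviated excerpt modeled on the upstream `nodelocaldns.yaml`. The values are assumptions: `169.254.20.10` is the link-local address used in the Kubernetes documentation, `172.20.0.10` stands in for the kube-dns service IP (look up the real one in your cluster), and kube-proxy is assumed to run in iptables mode. `render_nodelocal_config` is a hypothetical helper.

```python
# Abbreviated excerpt; the real manifest contains more zones and options.
TEMPLATE = """\
__PILLAR__DNS__DOMAIN__:53 {
    errors
    cache {
        success 9984 30
        denial 9984 5
    }
    reload
    loop
    bind __PILLAR__LOCAL__DNS__ __PILLAR__DNS__SERVER__
    forward . __PILLAR__CLUSTER__DNS__ {
        force_tcp
    }
    prometheus :9253
}
"""


def render_nodelocal_config(template: str, local_dns: str, dns_domain: str, kube_dns_ip: str) -> str:
    """Fill in the three placeholders that are set before applying the manifest."""
    replacements = {
        "__PILLAR__LOCAL__DNS__": local_dns,
        "__PILLAR__DNS__DOMAIN__": dns_domain,
        "__PILLAR__DNS__SERVER__": kube_dns_ip,
    }
    for placeholder, value in replacements.items():
        template = template.replace(placeholder, value)
    # __PILLAR__CLUSTER__DNS__ is left alone; the node-local-dns pod fills it in at runtime.
    return template


if __name__ == "__main__":
    print(render_nodelocal_config(TEMPLATE, "169.254.20.10", "cluster.local", "172.20.0.10"))
```

Keeping the addresses as parameters makes it clear which values are fixed for a given cluster type and which need to be looked up when moving to a different cluster.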
Jasmeet Singh

Parth Vyas

DevOps Engineer


Parth is a DevOps Engineer at nClouds with multiple AWS certifications including AWS Certified Solutions Architect - Professional, AWS Certified DevOps Engineer - Professional, AWS Certified Developer - Associate, and AWS Certified SysOps Administrator - Associate.