Amazon Elastic Kubernetes Service (AWS EKS) is a managed Kubernetes service that makes it easier to deploy and operate Kubernetes clusters in the cloud. It runs standard, upstream Kubernetes, meaning you can migrate existing Kubernetes workloads to it and keep using the standard tools in the Kubernetes ecosystem.
AWS EKS best practices are guidelines, tips, and strategies designed to help you maximize the benefits of using the EKS service. They aim to help you avoid common pitfalls, improve performance, enhance security, and ensure the reliability and scalability of your applications. These practices result from accumulated knowledge and experience from experts who have spent years working with AWS services.
In this guide, we’ll provide best practices in three categories: application reliability, cost management, and security.
Reliability is a critical aspect of managing AWS EKS clusters. Here are some best practices to ensure your applications run reliably.
In Kubernetes, a singleton pod is a pod that runs on its own, without being managed by a controller such as a Deployment, StatefulSet, or ReplicaSet. Running singleton pods in AWS EKS can lead to reliability issues, because if the pod crashes or the node hosting it fails, nothing recreates the pod.
To mitigate this, avoid deploying critical applications as singleton pods. Instead, deploy them as part of a Deployment or StatefulSet. This approach ensures that if the pod or the node fails, the Kubernetes control plane can reschedule the pod on a different node. Additionally, spread these replicas across different availability zones to ensure high availability in case of zone failures.
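For illustration, here is a minimal Deployment sketch, assuming a hypothetical my-app workload and a placeholder image, that runs three replicas and spreads them across availability zones using a topology spread constraint:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                    # hypothetical application name
spec:
  replicas: 3                     # never a single pod; failed replicas are rescheduled
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # spread replicas across AZs
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: my-app
          image: my-registry/my-app:1.0              # placeholder image
```

If a node or an entire zone fails, the remaining replicas keep serving traffic while the control plane reschedules the lost pod.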
Running multiple replicas of your pods in AWS EKS is crucial for both reliability and load balancing. When defining your deployments, specify a replica count that reflects your application’s needs and anticipated load. It’s recommended to have at least two replicas to ensure at least one is available in case of a failure. Use pod anti-affinity rules to ensure that these replicas are scheduled on different nodes, providing resilience against node failures.
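A minimal sketch of such a rule, again assuming a hypothetical my-app Deployment, uses required pod anti-affinity on the node hostname so that two replicas never share a node:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2                       # at least two replicas for basic resilience
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: kubernetes.io/hostname   # schedule replicas on different nodes
      containers:
        - name: my-app
          image: my-registry/my-app:1.0             # placeholder image
```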
Implement Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) in AWS EKS for dynamic scaling based on workload demands. HPA adjusts the number of pod replicas in a deployment or replica set based on observed CPU and memory usage, while VPA adjusts the CPU and memory requests and limits of the pods, allowing efficient use of resources. When using both on the same workload, be mindful of how they interact: avoid having HPA and VPA act on the same CPU and memory metrics for the same pods, for example by letting VPA manage resource requests while HPA scales on custom or external metrics.
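As a sketch, an HPA for the hypothetical my-app Deployment could look like the following; the replica bounds and CPU threshold are placeholders, and CPU-based scaling assumes the Kubernetes Metrics Server is installed in the cluster:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                   # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas when average CPU exceeds 70%
```

Note that the VPA components are not installed by default and must be deployed into the cluster separately before VPA objects take effect.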
Configure health checks for your applications running on AWS EKS to ensure self-healing and reliability. Use Kubernetes liveness and readiness probes to monitor the health of your pods. Liveness probes keep pods healthy by restarting containers that fail the health checks, while readiness probes ensure that traffic is not sent to a pod until it’s ready to handle it. This approach ensures that AWS EKS can automatically manage and maintain the health of your applications.
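A sketch of both probes on a single container, assuming the application exposes hypothetical /healthz and /ready endpoints on port 8080:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-app
      image: my-registry/my-app:1.0        # placeholder image
      livenessProbe:                       # restart the container if this check fails
        httpGet:
          path: /healthz                   # hypothetical health endpoint
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 15
      readinessProbe:                      # keep traffic away until the pod is ready
        httpGet:
          path: /ready                     # hypothetical readiness endpoint
          port: 8080
        periodSeconds: 5
```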
There are several measures you can take to optimize your EKS costs.
Right-sizing your workloads in AWS EKS is essential for cost optimization. Analyze your application's resource usage and adjust pod requests and limits accordingly; over-provisioning resources leads to unnecessary costs. Use tools such as Amazon CloudWatch Container Insights to observe actual consumption, and regularly review and adjust allocations based on these insights so that you only pay for the resources you need.
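Requests and limits are set per container in the pod spec; the values in the sketch below are placeholders to be replaced with numbers derived from your own usage data:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-app
      image: my-registry/my-app:1.0   # placeholder image
      resources:
        requests:                     # what the scheduler reserves for the pod
          cpu: 250m
          memory: 256Mi
        limits:                       # hard caps; keep close to observed peak usage
          cpu: 500m
          memory: 512Mi
```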
Amazon provides various purchasing options for the compute that backs your EKS clusters, and choosing the right mix can significantly reduce costs. The following are the main options:
- On-Demand Instances: pay for capacity by the second with no commitment; suited to unpredictable or short-lived workloads.
- Reserved Instances and Savings Plans: commit to one or three years of usage in exchange for significant discounts; suited to steady-state workloads.
- Spot Instances: use spare EC2 capacity at a steep discount, with the caveat that instances can be interrupted at short notice; suited to fault-tolerant, stateless, or batch workloads.
- AWS Fargate: serverless compute for pods, billed per pod for the vCPU and memory requested rather than for whole instances.
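As one illustration, the eksctl ClusterConfig sketch below (cluster name, region, instance types, and sizes are hypothetical) adds a managed node group backed by Spot Instances for fault-tolerant workloads:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster          # hypothetical cluster name
  region: us-east-1         # hypothetical region
managedNodeGroups:
  - name: spot-workers
    instanceTypes: ["m5.large", "m5a.large", "m4.large"]   # several types improve Spot availability
    spot: true              # request Spot capacity instead of On-Demand
    minSize: 1
    maxSize: 5
    desiredCapacity: 2
```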
NAT gateways provide a way for instances in a private subnet to connect to the internet or other AWS services but prevent external parties from initiating a connection with those instances. While they enhance security, they also add to your AWS bill.
To optimize costs, consider using a single NAT gateway per availability zone instead of per private subnet. Also, be mindful of data transfer costs, as data transferred through NAT gateways is charged.
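When eksctl creates the VPC for you, this trade-off is a single setting; the sketch below (cluster name and region are hypothetical) provisions one NAT gateway per availability zone rather than one shared gateway:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster             # hypothetical cluster name
  region: us-east-1            # hypothetical region
vpc:
  nat:
    gateway: HighlyAvailable   # one NAT gateway per availability zone
    # alternatives: Single (one shared gateway, the default) or Disable
```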
AWS EKS comes with logging capabilities that allow you to monitor and troubleshoot your clusters. However, storing and managing these logs can add to your costs.
To optimize these costs, consider limiting the retention period of your logs and disabling unnecessary log types. For instance, if you do not require audit logs, you can disable them to save on storage costs.
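In eksctl, both the enabled control plane log types and their retention period can be declared in the cluster config. A sketch, assuming a hypothetical cluster name and region (the retention field requires a reasonably recent eksctl version):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster           # hypothetical cluster name
  region: us-east-1          # hypothetical region
cloudWatch:
  clusterLogging:
    enableTypes: ["api", "authenticator"]   # omit "audit" and other types you do not need
    logRetentionInDays: 30                  # expire control plane logs after 30 days
```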
If you do not require high-performance access to storage, it can be more cost-effective to store persistent data in Amazon S3 rather than in storage volumes directly attached to your EC2 instances. Amazon S3 offers various storage classes optimized for specific use cases and price points, and understanding these classes can help you optimize your storage costs.
For frequently accessed data, consider using S3 Standard, which offers high durability, availability, and performance. For data with unknown or changing access patterns, S3 Intelligent-Tiering moves objects between access tiers automatically; for infrequently accessed data, S3 Standard-IA or S3 One Zone-IA can help save costs.
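One way to apply this automatically is an S3 lifecycle rule that moves older objects to a cheaper class. The CloudFormation sketch below (the bucket name and 30-day threshold are hypothetical) transitions objects to S3 Standard-IA after 30 days:

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  AppDataBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: my-app-archive-data        # hypothetical bucket name
      LifecycleConfiguration:
        Rules:
          - Id: TierDownColdObjects
            Status: Enabled
            Transitions:
              - StorageClass: STANDARD_IA    # infrequent-access class
                TransitionInDays: 30         # placeholder threshold
```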
Let’s explore some best practices that will help keep your EKS clusters secure.
AWS operates on the shared responsibility model, which means that AWS is responsible for the security ‘of’ the cloud, while customers are responsible for security ‘in’ the cloud. AWS ensures the infrastructure running AWS services is protected. Customer responsibilities will vary depending on the services you utilize, but typically include securing applications and data.
For instance, with EKS, AWS manages Kubernetes control plane security, including the master nodes and etcd database security. As a customer, you are responsible for the worker node security, cluster network configurations, and the security groups that control network access to your resources.
By default, your EKS cluster endpoint is publicly accessible from the internet. While this is convenient, it can also be risky if not properly managed.
It’s advisable to restrict public access and enable private access instead. This ensures that all traffic to your API server travels through your VPC (Virtual Private Cloud), reducing the exposure to potential threats. Also, use security groups and NACLs (Network Access Control Lists) to control inbound and outbound traffic to your EKS cluster.
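With eksctl, the endpoint settings live under the VPC section of the cluster config; a sketch, assuming a hypothetical cluster name and region:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster            # hypothetical cluster name
  region: us-east-1           # hypothetical region
vpc:
  clusterEndpoints:
    publicAccess: false       # turn off the public API endpoint
    privateAccess: true       # reach the API server only from inside the VPC
```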
Within Kubernetes, a pod can request elevated privileges on its host, such as running as root, running in privileged mode, or mounting host paths, and these privileges could be misused. It is crucial to implement policies that limit pod permissions to the minimum the application needs.
Consider using Kubernetes security contexts or Open Policy Agent (OPA) to enforce such policies. These tools enable you to limit pod permissions on the host to a reasonable level.
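For example, a pod-level and container-level securityContext sketch (the image and UID are placeholders) that blocks running as root, privilege escalation, and writable root filesystems:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  securityContext:
    runAsNonRoot: true                  # refuse to start containers that run as root
    runAsUser: 10001                    # arbitrary non-root UID
  containers:
    - name: my-app
      image: my-registry/my-app:1.0     # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]                 # drop all Linux capabilities the app does not need
```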
Not every application in your cluster needs to communicate with every other application. Limiting communication to the minimum required significantly reduces the potential attack surface.
You can achieve this by implementing network policies that specify which pods can communicate with each other. A CNI (Container Network Interface) plugin that supports network policies, like Calico or Cilium, can significantly help enforce these policies.
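A sketch of such a policy, assuming hypothetical frontend and backend labels and an application port of 8080, allows only frontend pods to reach backend pods:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: my-namespace                 # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: backend                        # the policy applies to backend pods
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend               # only frontend pods may connect
      ports:
        - protocol: TCP
          port: 8080                      # hypothetical application port
```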
Learn more in our detailed guide to AWS EKS architecture