The Node Not Ready error in Kubernetes indicates a situation where a node within a Kubernetes cluster is not in a healthy state to accept pods for execution. This status is a crucial indicator for cluster administrators, as it signifies that the Kubernetes scheduler will not assign new pods to the affected node until it returns to a Ready state.
To quickly check if any of your nodes are experiencing the Node Not Ready error, run this kubectl command:
kubectl get nodes
The output will list all the nodes in your cluster along with their statuses. For example, in the following output, you can see that node2 has a Node Not Ready error:
NAME    STATUS     ROLES    AGE   VERSION
node1   Ready      <none>   18d   v1.20.4
node2   NotReady   <none>   12d   v1.20.4
node3   Ready      <none>   20d   v1.20.4
Nodes may enter a Not Ready state for a variety of reasons, including network issues, resource exhaustion, misconfigurations, and underlying hardware problems. Understanding and resolving the root cause of this error is essential for maintaining the operational efficiency and reliability of a Kubernetes cluster.
This is part of a series of articles about Kubernetes troubleshooting.
In Kubernetes, node states are critical for managing the cluster’s health and workload distribution. A node’s Ready condition can be True (Ready), False (NotReady), or Unknown (when the control plane has not heard from the node recently), reflecting its current status and ability to accept workloads.
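To see each node’s Ready condition at a glance, a jsonpath query along the following lines can help. This is only an illustrative sketch; it prints every node’s name next to the status of its Ready condition (True, False, or Unknown):
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'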
To determine if a node is experiencing a Node Not Ready error, and obtain the information necessary to solve the problem, follow these steps:
The first step is to check the state of the nodes in the cluster. This can be done using the kubectl get nodes command, which lists all nodes and their statuses. A node marked as NotReady requires further investigation to understand the underlying issues.
The kubectl describe node <node-name> command provides comprehensive details about the node, including its conditions, events, and configuration. This information is useful for diagnosing the root cause of the Not Ready status, offering insights into any errors or warnings that the node might be experiencing. Analyzing the output of this command helps pinpoint specific issues, guiding the troubleshooting and resolution processes.
Here’s a simplified example of the output for a node experiencing issues:
Name:               node2
Roles:              <none>
Labels:             beta.kubernetes.io/os=linux
Annotations:        node.alpha.kubernetes.io/ttl=0
CreationTimestamp:  Thu, 12 Aug 2021 12:00:00 +0000
Conditions:
  Type            Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
  ----            ------  -----------------                ------------------               ------                      -------
  MemoryPressure  False   Thu, 12 Aug 2021 12:30:00 +0000  Thu, 12 Aug 2021 12:00:00 +0000  KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Thu, 12 Aug 2021 12:30:00 +0000  Thu, 12 Aug 2021 12:00:00 +0000  KubeletHasNoDiskPressure    kubelet has no disk pressure
  PIDPressure     False   Thu, 12 Aug 2021 12:30:00 +0000  Thu, 12 Aug 2021 12:00:00 +0000  KubeletHasSufficientPID     kubelet has sufficient PID available
  Ready           False   Thu, 12 Aug 2021 12:30:00 +0000  Thu, 12 Aug 2021 12:20:00 +0000  KubeletNotReady             PLEG is not healthy: pleg was last seen active 3m0s ago; threshold is 3m0s
Events:
  Type     Reason               Age  From            Message
  ----     ------               ---  ----            -------
  Normal   Starting             12m  kubelet, node2  Starting kubelet.
  Warning  NodeNotReady         3m   kubelet, node2  Node node2 status is now: NodeNotReady
  Warning  ContainerdStartFail  2m   kubelet, node2  Failed to start container runtime: Error
Here are a few things to notice in the output, which could indicate the cause of the problem: the Ready condition is False with the reason KubeletNotReady, its message points to an unhealthy PLEG (the kubelet’s Pod Lifecycle Event Generator), and the events show a warning that the container runtime failed to start.
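Since both the PLEG message and the ContainerdStartFail event point at the container runtime, it is often worth checking the runtime service directly on the node. A minimal check, assuming the node runs containerd managed by systemd:
# Check whether the container runtime service is running
sudo systemctl status containerd
# Review recent runtime logs for startup errors
journalctl -u containerd --since "30 min ago"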
Logs from the kubelet, the primary component running on each node that communicates with the Kubernetes master, can provide insights into any errors or issues it is encountering.
You can access kubelet logs using journalctl or other logging utilities, depending on the node’s operating system:
journalctl -u kubelet
Reviewing these logs can reveal issues related to resource constraints, network problems, or errors in the kubelet itself, offering clues to the underlying cause of the Not Ready status.
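To narrow the output down to recent problems, journalctl’s standard filtering options are useful, for example:
# Show the last 100 kubelet log lines without paging
journalctl -u kubelet -n 100 --no-pager
# Show entries from the last hour and filter for errors
journalctl -u kubelet --since "1 hour ago" | grep -i error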
There are several conditions that can result in a node having a Not Ready status.
One common cause of the Node Not Ready error is resource scarcity, such as CPU or memory exhaustion. Monitoring resource usage can help identify whether this is the cause. The following commands can be used to check resource allocations and usage on a node:
kubectl describe node <node-name>
The Allocated resources section of this command’s output displays how much CPU and memory the pods running on the node have requested, and what share of the node’s capacity that represents. If the node is over-allocated, consider scaling down workloads or adding more nodes to the cluster.
Here is another command you can use to show the current CPU and memory usage of the node, helping to identify if resource scarcity is impacting the node’s readiness:
kubectl top node <node-name>
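Note that kubectl top relies on the metrics-server add-on being installed in the cluster. To see which workloads are driving those numbers, you can also list the pods scheduled on the node and the node’s allocation summary; a quick sketch, assuming the affected node is named node2:
# List all pods running on node2, across namespaces
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=node2
# Show just the allocated-resources summary from the node's describe output
kubectl describe node node2 | grep -A 10 "Allocated resources"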
Checking network settings and connectivity is crucial for investigating Node Not Ready errors.
For example, this command checks connectivity to the Kubernetes master node, ensuring the affected node can communicate with the rest of the cluster:
ping <master-node-ip>
This command traces the path packets take to reach the master node, helping to identify any network hops that may be causing delays or connectivity issues.
traceroute <master-node-ip>
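A successful ping only proves basic network reachability, not that the Kubernetes API itself is accessible from the node. It can therefore help to test the API server endpoint directly; a minimal check, assuming the API server listens on the default port 6443:
# Query the API server's health endpoint (certificate verification skipped for brevity)
curl -k https://<master-node-ip>:6443/healthz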
Restarting the kubelet might resolve some issues in the kubelet process. The command to restart the kubelet varies depending on the init system in use. On a Linux system using systemd, the command is typically:
sudo systemctl restart kubelet
This command restarts the kubelet service, potentially resolving issues that prevent the node from reaching a Ready state.
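After restarting, it is worth confirming that the kubelet came back up and is not crash-looping, for example:
# Verify that the kubelet service is active
sudo systemctl status kubelet
# Follow the kubelet logs in real time to watch for repeated startup errors
journalctl -u kubelet -f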
Issues with kube-proxy, the network proxy running on each node, can also affect node readiness. Checking the status of kube-proxy and restarting it if necessary can help:
sudo systemctl status kube-proxy
This command checks the status of the kube-proxy service. If it’s not running as expected, it can be restarted with:
sudo systemctl restart kube-proxy
Restarting kube-proxy can clear up network-related issues affecting the node’s communication with the cluster, potentially resolving the Not Ready error.
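Note that in many clusters, such as those bootstrapped with kubeadm, kube-proxy runs as a DaemonSet pod in the kube-system namespace rather than as a systemd service. In that case it can be inspected and restarted with kubectl instead; a sketch, assuming the conventional k8s-app=kube-proxy label:
# Find the kube-proxy pod running on the affected node
kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide
# Deleting the pod makes the DaemonSet recreate it, effectively restarting kube-proxy on that node
kubectl delete pod -n kube-system <kube-proxy-pod-name>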
Kubernetes Troubleshooting with Lumigo
Lumigo is a troubleshooting platform purpose-built for microservice-based applications. For teams using Kubernetes to orchestrate their containerized applications, Lumigo provides monitoring, tracing, and troubleshooting capabilities. Deployed with zero code changes and automated in one click, Lumigo stitches together every interaction between microservices and managed services into end-to-end traces. These traces, served alongside request payload data, give developers complete visibility into their container environments and the ability to quickly identify and resolve issues.
To try Lumigo for Kubernetes, check out our Kubernetes operator on GitHub.