Kubernetes Exit Code 143: A Practical Guide

What Is Kubernetes Exit Code 143 (SIGTERM)? 

Exit code 143 in Kubernetes signals that a container was terminated by receiving a SIGTERM signal. This code indicates that the process within the container was stopped due to an external request, commonly initiated by Kubernetes for various operational reasons, such as pod deletion or scaling down. 

The SIGTERM signal is a way for an operating system to notify a process that it should finish its current tasks and terminate gracefully, allowing for a clean shutdown of applications. Understanding this exit code is crucial to grasping the operational nuances of working with Kubernetes, as it helps diagnose why a container stopped running and may highlight deeper issues within deployments. 

Exit code 143 distinguishes intentional stops from unexpected errors, providing insights into the application’s behavior and the orchestration’s management actions.
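The numeric value itself follows the standard Unix convention: a process killed by a signal exits with 128 plus the signal number, and SIGTERM is signal 15, so 128 + 15 = 143. You can reproduce this outside Kubernetes with an ordinary shell child process:

```shell
# Start a long-running child process in the background.
sleep 30 &
pid=$!

# Send SIGTERM (signal 15), as the kubelet does when terminating a pod.
kill -TERM "$pid"

# Collect the child's exit status: 128 + 15 = 143.
wait "$pid"
rc=$?
echo "exit code: $rc"   # -> exit code: 143
```

In a cluster, the same value appears as the container's "Last State: Terminated, Exit Code: 143" in the output of `kubectl describe pod`.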

This is part of a series of articles about Kubernetes troubleshooting.

Common Scenarios Leading to Exit Code 143 

There are several contexts that can result in a SIGTERM signal.

Pod Scaling Operations

In Kubernetes, scaling operations often lead to the generation of exit code 143. When a deployment is scaled down, Kubernetes must terminate excess pods to match the desired state. This action is initiated by sending a SIGTERM signal to the containers running in the scaled-down pods. 

The signal serves as a notice for the applications to terminate gracefully, allowing them to complete current tasks and save necessary data before shutting down. This process ensures that scaling operations do not disrupt the application’s functionality or cause data loss.

Pod Eviction Due to Resource Constraints

When a node in the cluster runs low on resources such as memory or ephemeral storage, the kubelet may evict pods to stabilize the node's condition. This eviction process involves sending a SIGTERM signal to the containers, signaling them to shut down gracefully to free up resources.

Manual Pod Deletion or Rolling Updates

When a pod is manually deleted or replaced during a rolling update, Kubernetes sends a SIGTERM signal to the containers as part of the process to gracefully remove the pod from service. This allows the application to terminate connections, complete in-progress tasks, and ensure a smooth transition to new pods without losing critical information or disrupting the service.

Why Does Kubernetes Use SIGTERM Instead of SIGKILL? 

The SIGKILL signal is another way to terminate processes, but it does not allow them to shut down gracefully. Here are the key reasons Kubernetes sends SIGTERM first, rather than SIGKILL, when terminating processes:

SIGTERM Ensures Processes Can Clean Up Data Before Shutting Down

SIGTERM gives processes the opportunity to clean up their data and perform necessary shutdown procedures before termination. This approach is vital for maintaining data integrity and ensuring that applications can resume operations smoothly after a restart. It provides a controlled environment for the application to close connections, save state, and release resources, minimizing the risk of data corruption or loss.

SIGTERM Is Safer for the Kubernetes Environment

The safety provided by SIGTERM helps prevent environmental corruption. By allowing applications to shut down gracefully, Kubernetes minimizes the risk of leaving the system in an inconsistent state. This is crucial for complex applications that manage significant amounts of data or maintain persistent connections. The orderly shutdown process facilitated by SIGTERM protects against data loss and ensures that resources are cleanly released.

SIGTERM Escalates to SIGKILL If the Pod Isn’t Responding

If a pod does not respond to the SIGTERM signal within a specified grace period, Kubernetes escalates the termination by sending a SIGKILL signal. This ensures that unresponsive processes are forcefully terminated, maintaining the cluster’s stability and performance. Even if a graceful shutdown isn’t possible, stuck or unresponsive processes do not hinder the cluster’s functionality.
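Unlike SIGTERM, SIGKILL (signal 9) cannot be caught, blocked, or ignored, which is why it reliably removes stuck processes; a container killed this way reports exit code 128 + 9 = 137 rather than 143. A small shell sketch of the difference:

```shell
# A child that ignores SIGTERM (simulating a stuck or unresponsive process).
sh -c 'trap "" TERM; sleep 30' &
pid=$!
sleep 1   # give the child time to install its trap

# SIGKILL cannot be trapped or ignored, so the process dies immediately.
kill -KILL "$pid"
wait "$pid"
rc=$?
echo "exit code: $rc"   # -> exit code: 137
```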

The Kubernetes Graceful Termination Process  

Here is an overview of the Exit Code 143 process in Kubernetes.

1. The Pod Is Set to Terminating Status

When a pod in Kubernetes is marked for termination, its status is set to Terminating. This status indicates that the pod has received a shutdown request, such as a deletion call to the Kubernetes API. During this phase, the pod remains in the cluster but is removed from Service endpoints and stops receiving new requests. 

This status allows the system to manage resources effectively while providing the pod the opportunity to close gracefully, ensuring that ongoing tasks are completed and resources are properly released.

2. The preStop Hook

The preStop hook in Kubernetes provides a way to execute custom commands or scripts before a pod is terminated. This hook is triggered before the SIGTERM signal is sent to the container: the kubelet runs the hook first and delivers SIGTERM only once the hook completes (or the grace period expires). This allows for the definition of cleanup or preparatory actions that should be taken immediately before the shutdown. 

This mechanism is crucial for complex applications that require specific steps to ensure data integrity and application state before termination.
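As a sketch, a preStop hook is defined under the container's lifecycle field; the pod name, image, and the command below are placeholders for whatever drain or cleanup step an application needs:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-app            # hypothetical name
spec:
  containers:
  - name: web
    image: nginx:1.25           # placeholder image
    lifecycle:
      preStop:
        exec:
          # Runs before SIGTERM is sent, e.g. to drain in-flight requests.
          command: ["/bin/sh", "-c", "sleep 10"]
```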

3. A SIGTERM Signal Is Sent to the Pod

The SIGTERM signal is the initial step in the graceful termination process, telling the processes within the pod to shut down. It marks the beginning of the termination procedure, giving processes the chance to conclude their operations in an orderly fashion. 

4. The Grace Period

Following the SIGTERM signal, Kubernetes provides a grace period, a predefined amount of time for processes to shut down gracefully. If the processes do not terminate within this period, Kubernetes escalates to sending a SIGKILL signal, forcibly terminating the processes. 

The grace period is configurable (30 seconds by default), allowing developers to specify the time needed for their applications to shut down properly, depending on their complexity and shutdown requirements. Note that time spent in a preStop hook counts against this same grace period.
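The grace period is set per pod via the terminationGracePeriodSeconds field (and can be overridden at deletion time with kubectl delete --grace-period); the value below is illustrative:

```yaml
spec:
  terminationGracePeriodSeconds: 120   # allow up to 2 minutes to shut down
```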

5. A SIGKILL Signal Is Sent to the Pod

If a process does not terminate after the SIGTERM signal and the grace period elapses, Kubernetes sends a SIGKILL signal to the pod. This signal forcefully stops the process, ensuring that resources are freed and the pod is removed from the cluster. 

While SIGKILL effectively guarantees the termination of the pod, it bypasses the graceful shutdown process. It is a “last resort” in the Kubernetes termination process.

Best Practices to Prevent Unwanted Exit Code 143 Errors in Kubernetes 

Exit Code 143 is not necessarily an error: it can result from healthy Kubernetes operations, such as normal scaling. However, sometimes containers return Exit Code 143 unexpectedly. Here are a few ways to reduce unwanted Exit Code 143 responses.

Enable Container Logging for Received Signals

Logs are useful for diagnosing and understanding termination events. By logging signals like SIGTERM, developers can gain insights into the termination process, including timing and context. This information is invaluable for troubleshooting issues related to graceful shutdowns and for improving applications’ resilience against disruptions.
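A minimal sketch of this idea: a process that traps SIGTERM, logs the event, and performs its cleanup before exiting cleanly (the log message and cleanup step are placeholders for application-specific logic):

```shell
# Child process standing in for a container's main process.
worker() {
  # Log the signal and clean up before exiting; exit 0 marks a clean shutdown.
  trap 'echo "$(date -u) SIGTERM received, cleaning up"; exit 0' TERM
  # Sleep in the background and wait, so the trap fires promptly.
  while true; do sleep 1 & wait $!; done
}

worker &
pid=$!
sleep 1                 # let the trap get installed
kill -TERM "$pid"       # what the kubelet sends on pod termination
wait "$pid"
rc=$?
echo "worker exited with $rc"   # -> worker exited with 0
```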

Run One Application per Container

Running a single application per container is a best practice that simplifies management and enhances the predictability of container behavior upon receiving a SIGTERM signal. This approach ensures that each container has a focused purpose, making it easier to manage lifecycle events and signal handling. It facilitates cleaner shutdowns, restarts, and more straightforward resource allocation and scaling.

Configure Pod Priorities, Resource Requests, and Limits

Properly configuring pod priorities, along with resource requests and limits, is essential in preventing unnecessary shutdowns due to resource contention or eviction. Specifying these configurations can ensure that critical applications have sufficient resources and are less likely to be evicted in favor of lower-priority workloads. This practice helps maintain application availability and performance, even under resource constraints.
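A hedged sketch of these settings; the class name, pod name, image, and numbers are illustrative, and the PriorityClass must exist in the cluster before pods reference it:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-service        # hypothetical class name
value: 100000
globalDefault: false
---
apiVersion: v1
kind: Pod
metadata:
  name: billing-api             # placeholder name
spec:
  priorityClassName: critical-service
  containers:
  - name: app
    image: example/billing:1.0  # placeholder image
    resources:
      requests:                 # guaranteed minimum for scheduling
        cpu: "250m"
        memory: "256Mi"
      limits:                   # hard ceiling before throttling/OOM
        cpu: "500m"
        memory: "512Mi"
```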

Implement Auto Scaling for Automatic Resource Adjustment

Autoscaling in Kubernetes allows for automatic adjustment of resources based on demand, reducing the likelihood of exit code 143 due to resource shortages or overallocation. Autoscaling can adjust the number of pod replicas in response to current load, ensuring that applications have the resources to perform optimally while avoiding unnecessary resource consumption.
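For example, a HorizontalPodAutoscaler can keep replica counts matched to CPU load; the names and thresholds below are placeholders:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                 # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                   # placeholder deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # scale out above 70% average CPU
```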

Kubernetes Troubleshooting with Lumigo

Lumigo is a troubleshooting platform that is purpose-built for microservice-based applications. Developers using Kubernetes to orchestrate their containerized applications can use Lumigo to monitor, trace, and troubleshoot issues quickly. Deployed with zero code changes and automated in one click, Lumigo stitches every interaction between microservices and managed services into end-to-end stack traces. These traces, served alongside request payload data, give more complete visibility into container environments. 

Using Lumigo allows you to get a more comprehensive view of your deployments, including:

  • End-to-end virtual stack traces across every microservice and managed service that makes up a serverless application, in context.
  • API visibility that makes all the data passed between services available and accessible, making it possible to perform root cause analysis without digging through logs.
  • Distributed tracing that is deployed with no code changes and automated in one click.
  • A unified platform to explore and query across microservices, see a real-time view of applications, and optimize performance.

To try Lumigo for Kubernetes, check out our Kubernetes operator on GitHub.