One of the main advantages of using Kubernetes is its ability to keep containers running in a cluster: simply create a pod resource, let Kubernetes choose a worker node for it, and the kubelet on that node will run the pod’s containers. But what if a container or a pod fails?

As soon as a pod is scheduled to a node, the kubelet on that node will run its containers and keep them running as long as the pod exists. If the container’s main process crashes, the kubelet will restart the container. However, if the application in the container has a defect that causes it to crash every time it starts, a restart alone cannot heal it: the kubelet keeps restarting the container, but the underlying bug still has to be fixed.
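
To give a feel for what repeated restarts look like, here is a small Python sketch of the restart delays. It assumes the documented kubelet behavior of an exponential back-off that starts at 10 seconds, doubles after each restart, and is capped at five minutes. The function name restart_delays and its parameters are made up for this illustration; the kubelet's actual implementation is more involved.

```python
def restart_delays(restarts: int, base: float = 10.0, cap: float = 300.0) -> list:
    """Approximate delay in seconds before each of the first `restarts` restarts."""
    # The delay doubles with every restart until it reaches the cap.
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]


print(restart_delays(7))  # prints [10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```

A container stuck in this loop is what kubectl reports as CrashLoopBackOff.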

The kubelet uses liveness probes to know when to restart a container. For example, a liveness probe can catch a deadlock, in which an application is running but unable to make progress. Restarting a container in such a state can make the application available again despite the bug.

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A pod is considered ready when all of its containers are ready. One use of this signal is to control which pods are used as backends for services. When a pod is not ready, it is removed from service load balancers.
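
The relationship between container readiness, pod readiness, and service backends can be sketched in a few lines of Python. This is a simplified model, not the Kubernetes API: the Pod class and the service_backends function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pod:
    """Hypothetical, simplified pod model (not the Kubernetes API types)."""
    name: str
    # Maps container name -> result of that container's latest readiness probe.
    container_ready: Dict[str, bool] = field(default_factory=dict)

    def is_ready(self) -> bool:
        # A pod is ready only when every one of its containers is ready.
        return all(self.container_ready.values())


def service_backends(pods: List[Pod]) -> List[str]:
    """Return the names of the pods that should receive service traffic."""
    return [pod.name for pod in pods if pod.is_ready()]


pods = [
    Pod("web-1", {"app": True, "sidecar": True}),
    Pod("web-2", {"app": True, "sidecar": False}),  # one container is not ready
]
print(service_backends(pods))  # prints ['web-1']
```

Note that web-2 is not restarted or deleted; it simply receives no traffic until all of its containers report ready.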

Kubernetes liveness probes

Kubernetes can probe the container in three ways:

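An HTTP GET probe performs an HTTP GET request on the container’s IP address, using the port and path you specify. The probe succeeds if the response status code is at least 200 and below 400.

A TCP socket probe tries to open a TCP connection to the specified port of the container. The probe succeeds if the connection is established.

An exec probe runs a command inside the container. The probe succeeds if the command exits with status code 0.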

HTTP GET liveness probe

The following YAML listing creates a pod that includes an HTTP GET liveness probe:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3

The pod descriptor defines an httpGet liveness probe, which tells Kubernetes to periodically perform HTTP GET requests on path /healthz on port 8080 to determine whether the container is still healthy.

The periodSeconds field specifies that the kubelet should perform a liveness probe every 3 seconds. The initialDelaySeconds field tells the kubelet to wait 3 seconds before performing the first probe.
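
To make the mechanics concrete, here is a minimal Python sketch of both sides of an HTTP probe: a tiny application that exposes /healthz, and a check that mimics what the probe does. HealthHandler and http_probe are hypothetical names; in a real cluster the check is performed by the kubelet, not by code like this. The sketch assumes the documented rule that any status code of at least 200 and below 400 counts as success.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical application endpoint: only /healthz reports success."""

    def do_GET(self):
        status = 200 if self.path == "/healthz" else 404
        self.send_response(status)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


def http_probe(host: str, port: int, path: str, timeout: float = 1.0) -> bool:
    """Return True if GET http://host:port/path answers with a 2xx or 3xx status."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return 200 <= conn.getresponse().status < 400
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, and malformed replies count as failures.
        return False
    finally:
        conn.close()


# Port 0 asks the OS for any free port, so the sketch does not need 8080 to be free.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
probe_port = server.server_address[1]

healthy = http_probe("127.0.0.1", probe_port, "/healthz")
unhealthy = http_probe("127.0.0.1", probe_port, "/missing")

server.shutdown()
server.server_close()
print(healthy, unhealthy)
```

In the pod above, the kubelet would run the equivalent of http_probe every 3 seconds, starting 3 seconds after the container starts.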

TCP socket liveness probe

The following YAML listing creates a pod that uses a TCP socket check for both its liveness probe and its readiness probe:

apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
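
A TCP socket probe only checks that a connection can be opened; it sends no data. The following Python sketch shows the idea under that assumption. The function tcp_probe is a hypothetical stand-in for the check the kubelet performs, and the local listener stands in for the container's port 8080.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host, or timeout: the probe fails.
        return False


# Stand-in for the application: listen on any free local port.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
listen_port = listener.getsockname()[1]

while_listening = tcp_probe("127.0.0.1", listen_port)
listener.close()
after_close = tcp_probe("127.0.0.1", listen_port)
print(while_listening, after_close)
```

Because the check is this shallow, a TCP probe can pass even when the application accepts connections but cannot serve requests; an HTTP or exec probe is a better fit when that distinction matters.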

Exec liveness probe

The following YAML listing creates a pod that includes an Exec liveness probe:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
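
In this pod, the container creates /tmp/healthy at startup and deletes it after 30 seconds. While the file exists, cat /tmp/healthy exits with status 0 and the probe succeeds; once the file is gone, cat exits with a nonzero status and the probe fails. The Python sketch below reproduces that logic locally. It assumes a POSIX system where the cat command is available, and exec_probe is a hypothetical helper, not a Kubernetes API.

```python
import os
import subprocess
import tempfile


def exec_probe(command: list, timeout: float = 1.0) -> bool:
    """Return True if the command exits with status code 0 within the timeout."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing executable or a command that hangs counts as a failure.
        return False
    return result.returncode == 0


workdir = tempfile.mkdtemp()
marker = os.path.join(workdir, "healthy")

open(marker, "w").close()  # equivalent of `touch`
file_present = exec_probe(["cat", marker])

os.remove(marker)  # equivalent of `rm -rf`
file_missing = exec_probe(["cat", marker])

os.rmdir(workdir)
print(file_present, file_missing)
```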

Kubernetes readiness probes

Sometimes applications are temporarily unable to serve traffic, for example when the application needs to load large data or configuration files during startup. In such cases, you don’t want to kill the application, but you don’t want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations.

Readiness probes are configured similarly to liveness probes. The only difference is that you use the readinessProbe field instead of the livenessProbe field.
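
Although the configuration is nearly identical, the reaction to a failed probe is not. The following Python sketch summarizes the difference. The function react_to_probe and its return strings are invented for this illustration, and "failed" here means the probe is considered failed after its failure threshold has been reached.

```python
def react_to_probe(kind: str, failed: bool) -> str:
    """Describe what happens to a container or pod after a probe result."""
    if kind not in ("liveness", "readiness"):
        raise ValueError(f"unknown probe kind: {kind}")
    if not failed:
        return "no action"
    if kind == "liveness":
        # The container is killed and restarted by the kubelet.
        return "restart container"
    # The container keeps running, but the pod stops receiving service traffic.
    return "remove pod from service backends"


print(react_to_probe("liveness", failed=True))   # prints restart container
print(react_to_probe("readiness", failed=True))  # prints remove pod from service backends
```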

HTTP GET readiness probe

The following YAML listing creates a pod that includes an HTTP GET readiness probe:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness-http
spec:
  containers:
  - name: readiness
    image: k8s.gcr.io/liveness
    args:
    - /server
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3

TCP socket readiness probe

The following YAML listing creates a pod that includes a TCP socket readiness probe. It is the same goproxy pod used in the liveness section, which defines both probes:

apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

Exec readiness probe

The following YAML listing creates a pod that includes an Exec readiness probe:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness-exec
spec:
  containers:
  - name: readiness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

Kubernetes keeps your containers running by restarting them if they crash or if their liveness probes fail. This job is performed by the kubelet on the node hosting the pod.

If the node itself crashes, the control plane must replace all the pods running on that node. In this case, you can use a replication mechanism to recreate the pods on another node within the cluster. I’ll cover that topic in my next article.