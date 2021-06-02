Do you know how your system will respond to an arbitrary failure? Will your application fail? Will anything survive after a loss? If you're not sure, it's time to see if your system passes the Litmus test, a detailed way to cause chaos at random with many experiments.

In the first article in this series, I explained what chaos engineering is, and in the second article, I demonstrated how to measure your system's steady state so that you can compare it against a chaos state. This third article shows you how to install Litmus and use it to run failure experiments in your Kubernetes cluster. In this walkthrough, I'll use Pop!_OS 20.04, Helm 3, Minikube 1.14.2, and Kubernetes 1.19.

Configure Minikube

If you haven't already, install Minikube in whatever way makes sense for your environment. If you have enough resources, I recommend giving your virtual machine a bit more than the default memory and CPU power:

$ minikube config set memory 8192
❗  These changes will take effect upon a minikube delete and then a minikube start
$ minikube config set cpus 6
❗  These changes will take effect upon a minikube delete and then a minikube start
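These settings only apply to a newly created cluster, so you might wrap the whole sequence in a small shell function. This is a minimal sketch and does not come from Minikube or the original walkthrough. The name resize_minikube is hypothetical. The MINIKUBE variable is an assumption, added so the commands can be replaced with a stub for a dry run. Be aware that minikube delete removes the existing cluster.

```shell
# Hypothetical helper (not part of Minikube): set memory and CPU defaults,
# then delete and recreate the cluster so the new settings take effect.
# WARNING: "minikube delete" destroys the existing cluster and its workloads.
# MINIKUBE may name another command (for example, a stub) for a dry run.
resize_minikube() {
  mem_mb="${1:-8192}"        # memory in MB
  cpus="${2:-6}"             # number of CPUs
  mk="${MINIKUBE:-minikube}"
  # The && chain stops at the first command that fails.
  "$mk" config set memory "$mem_mb" &&
    "$mk" config set cpus "$cpus" &&
    "$mk" delete &&
    "$mk" start
}

# Usage: resize_minikube 8192 6
```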

Then start and check your system's status:

$ minikube start
😄  minikube v1.14.2 on Debian bullseye/sid
🎉  minikube 1.19.0 is available! Download it: https://github.com/kubernetes/minikube/releases/tag/v1.19.0
💡  To disable this notice, run: 'minikube config set WantUpdateNotification false'

✨  Using the docker driver based on user configuration
👍  Starting control plane node minikube in cluster minikube
🔥  Creating docker container (CPUs=6, Memory=8192MB) ...
🐳  Preparing Kubernetes v1.19.0 on Docker 19.03.8 ...
🔎  Verifying Kubernetes components...
🌟  Enabled addons: storage-provisioner, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" by default
jess@Athena:~$ minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
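If you script your setup, you can check that status output instead of reading it by eye. The sketch below is hypothetical (minikube_ready is not a Minikube command). It assumes the key: value layout shown above, which may differ in other Minikube versions.

```shell
# Hypothetical helper: read "minikube status" output on stdin and succeed
# only if host, kubelet, and apiserver are Running and kubeconfig is Configured.
minikube_ready() {
  awk '
    $1 == "host:" || $1 == "kubelet:" || $1 == "apiserver:" {
      seen++
      if ($2 != "Running") bad = 1
    }
    $1 == "kubeconfig:" {
      seen++
      if ($2 != "Configured") bad = 1
    }
    END { if (seen == 4 && !bad) exit 0; exit 1 }
  '
}

# Usage: minikube status | minikube_ready && echo "cluster is ready"
```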

Install Litmus

As outlined on Litmus' homepage, installing Litmus takes three steps: add the Litmus chart repository to Helm, create a litmus namespace, then install the chart:

$ helm repo add litmuschaos https://litmuschaos.github.io/litmus-helm/
"litmuschaos" has been added to your repositories

$ kubectl create ns litmus
namespace/litmus created

$ helm install chaos litmuschaos/litmus --namespace=litmus
NAME: chaos
LAST DEPLOYED: Sun May 9 17:05:36 2021
NAMESPACE: litmus
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Verify the installation
You can run the following commands if you want to verify all the desired components are installed correctly.
Check if api-resources for chaos are available:
root@demo:~# kubectl api-resources | grep litmus
chaosengines       litmuschaos.io   true   ChaosEngine
chaosexperiments   litmuschaos.io   true   ChaosExperiment
chaosresults       litmuschaos.io   true   ChaosResult
Check if the Litmus chaos operator deployment is running successfully:
root@demo:~# kubectl get pods -n litmus
NAME                      READY   STATUS    RESTARTS   AGE
litmus-7d998b6568-nnlcd   1/1     Running   0          106s
Start running chaos experiments
With this out of the way, you are good to go! Refer to Litmus' chaos experiment documentation to start executing your first experiment.
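The STATUS line is the quickest sign that Helm deployed the release. The hypothetical helper below is a hedged sketch that extracts that value from helm install or helm status output. It assumes the STATUS: line format shown above.

```shell
# Hypothetical helper: read "helm install" or "helm status" output on stdin
# and print the value on the first STATUS: line (for example, "deployed").
helm_release_status() {
  awk '$1 == "STATUS:" { print $2; exit }'
}

# Usage: helm status chaos --namespace=litmus | helm_release_status
```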

To confirm your installation is working, check that the Litmus operator pod is up and running:

jess@Athena:~$ kubectl get pods -n litmus
NAME                      READY   STATUS    RESTARTS   AGE
litmus-7d6f994d88-2g7wn   1/1     Running   0          115s

Confirm the Custom Resource Definitions (CRDs) are also installed correctly:

jess@Athena:~$ kubectl get crds | grep chaos
chaosengines.litmuschaos.io       2021-05-09T21:05:33Z
chaosexperiments.litmuschaos.io   2021-05-09T21:05:33Z
chaosresults.litmuschaos.io       2021-05-09T21:05:33Z
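To make the CRD check repeatable, here is a minimal sketch of a hypothetical helper. It succeeds only when all three Litmus CRDs appear in the kubectl get crds listing, and it assumes the CRD names shown above.

```shell
# Hypothetical helper: read "kubectl get crds" output on stdin and succeed
# only if all three Litmus chaos CRDs are listed in the first column.
litmus_crds_installed() {
  awk '
    $1 == "chaosengines.litmuschaos.io"     { e = 1 }
    $1 == "chaosexperiments.litmuschaos.io" { x = 1 }
    $1 == "chaosresults.litmuschaos.io"     { r = 1 }
    END { if (e && x && r) exit 0; exit 1 }
  '
}

# Usage: kubectl get crds | litmus_crds_installed && echo "Litmus CRDs present"
```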

Finally, confirm your API resources are also installed:

jess@Athena:~$ kubectl api-resources | grep chaos
chaosengines       litmuschaos.io   true   ChaosEngine
chaosexperiments   litmuschaos.io   true   ChaosExperiment
chaosresults       litmuschaos.io   true   ChaosResult

That's what I call easy installation and confirmation. The next step is setting up deployments for chaos.

Prep for destruction

To test for chaos, you need something to test against. Add a new namespace:

$ kubectl create namespace more-apps
namespace/more-apps created

Then add a deployment to the new namespace:

$ kubectl create deployment ghost --namespace more-apps --image=ghost:3.11.0-alpine
deployment.apps/ghost created

Next, scale the deployment up so that you have more than one pod to test against:

$ kubectl scale deployment/ghost --namespace more-apps --replicas=4
deployment.apps/ghost scaled
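Before moving on, you can confirm that the scale-up finished. This hypothetical helper counts pods with a given name prefix that are Running with all containers ready. It is a sketch that assumes the default kubectl get pods column order (NAME, READY, STATUS).

```shell
# Hypothetical helper: read "kubectl get pods" output on stdin and print how
# many pods whose name starts with the given prefix are Running with every
# container ready (READY column "n/n").
count_ready_pods() {
  awk -v p="$1" '
    index($1, p) == 1 && $3 == "Running" {
      split($2, r, "/")      # READY column, for example "1/1"
      if (r[1] == r[2]) n++
    }
    END { print n + 0 }
  '
}

# Usage: kubectl get pods -n more-apps | count_ready_pods ghost-
```

Once the scale-up has completed, the usage line should print 4.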

When annotation checking is enabled (the chaos engine you'll create later sets annotationCheck to 'true'), Litmus only targets applications annotated as ready for chaos. At the time of writing, the annotation is supported on Deployments, StatefulSets, and DaemonSets. Add the annotation litmuschaos.io/chaos="true" to your deployment:

$ kubectl annotate deploy/ghost litmuschaos.io/chaos="true" -n more-apps
deployment.apps/ghost annotated
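If you rebuild this test environment often, you can bundle the four preparation steps into one function. This is a sketch under a few assumptions. The name prep_chaos_target is hypothetical. The KUBECTL variable exists only so a stub can stand in for kubectl. The standard kubectl annotate flag --overwrite lets the annotation step be rerun safely. The create steps still fail if the namespace or deployment already exists, and the && chain stops at the first failure.

```shell
# Hypothetical helper: create the target namespace, deploy ghost, scale it
# to four replicas, and mark it for chaos. KUBECTL may name a stub command.
prep_chaos_target() {
  ns="${1:-more-apps}"
  k="${KUBECTL:-kubectl}"
  "$k" create namespace "$ns" &&
    "$k" create deployment ghost --namespace "$ns" --image=ghost:3.11.0-alpine &&
    "$k" scale deployment/ghost --namespace "$ns" --replicas=4 &&
    "$k" annotate deploy/ghost litmuschaos.io/chaos="true" --namespace "$ns" --overwrite
}

# Usage: prep_chaos_target more-apps
```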

Make sure the experiments you will install have the correct permissions to work in the "more-apps" namespace.

Create a new rbac.yaml file for the service account, role, and role binding:

$ touch rbac.yaml

Then copy the following into your rbac.yaml file. It creates a service account, plus a role and role binding that grant the basic permissions Litmus needs to delete pods (and read related resources) in the namespace you specify:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-delete-sa
  namespace: more-apps
  labels:
    name: pod-delete-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-delete-sa
  namespace: more-apps
  labels:
    name: pod-delete-sa
rules:
- apiGroups: [""]
  resources: ["pods", "events"]
  verbs: ["create", "list", "get", "patch", "update", "delete", "deletecollection"]
- apiGroups: [""]
  resources: ["pods/exec", "pods/log", "replicationcontrollers"]
  verbs: ["create", "list", "get"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create", "list", "get", "delete", "deletecollection"]
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets", "daemonsets", "replicasets"]
  verbs: ["list", "get"]
- apiGroups: ["apps.openshift.io"]
  resources: ["deploymentconfigs"]
  verbs: ["list", "get"]
- apiGroups: ["argoproj.io"]
  resources: ["rollouts"]
  verbs: ["list", "get"]
- apiGroups: ["litmuschaos.io"]
  resources: ["chaosengines", "chaosexperiments", "chaosresults"]
  verbs: ["create", "list", "get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-delete-sa
  namespace: more-apps
  labels:
    name: pod-delete-sa
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-delete-sa
subjects:
- kind: ServiceAccount
  name: pod-delete-sa
  namespace: more-apps

Apply the rbac.yaml file:

$ kubectl apply -f rbac.yaml
serviceaccount/pod-delete-sa created
role.rbac.authorization.k8s.io/pod-delete-sa created
rolebinding.rbac.authorization.k8s.io/pod-delete-sa created
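To check that the role binding took effect, you can use kubectl auth can-i with impersonation to ask the API server whether the service account may delete pods. Kubernetes gives service account users names of the form system:serviceaccount:<namespace>:<name>, and the hypothetical helper below simply builds that string. Impersonation requires your own user to have impersonate rights, which is normally true for Minikube's default admin context.

```shell
# Hypothetical helper: build the username Kubernetes assigns to a service
# account, for use with kubectl's --as impersonation flag.
sa_identity() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Usage (should print "yes" once the RoleBinding is in place):
#   kubectl auth can-i delete pods -n more-apps --as="$(sa_identity more-apps pod-delete-sa)"
```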

The next step is to define a chaos engine that deletes pods. A ChaosEngine connects an experiment to your application instance. Create a chaosengine.yaml file and copy the definition below into it. The definition ties the pod-delete experiment to your namespace and to the service account and role binding you created above.

This chaos engine runs only the pod-delete experiment, against the pods of the deployment labeled app=ghost:

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: moreapps-chaos
  namespace: more-apps
spec:
  appinfo:
    appns: 'more-apps'
    applabel: 'app=ghost'
    appkind: 'deployment'
  # It can be true/false
  annotationCheck: 'true'
  # It can be active/stop
  engineState: 'active'
  #ex. values: ns1:name=percona,ns2:run=more-apps
  auxiliaryAppInfo: ''
  chaosServiceAccount: pod-delete-sa
  # It can be delete/retain
  jobCleanUpPolicy: 'delete'
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            # set chaos duration (in sec) as desired
            - name: TOTAL_CHAOS_DURATION
              value: '30'

            # set chaos interval (in sec) as desired
            - name: CHAOS_INTERVAL
              value: '10'

            # pod failures without '--force' & default terminationGracePeriodSeconds
            - name: FORCE
              value: 'false'
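The two timing variables together determine roughly how many times pods are deleted. Assume that pod-delete removes a pod once every CHAOS_INTERVAL seconds until TOTAL_CHAOS_DURATION seconds have passed (check the experiment's documentation for your Litmus version). Under that assumption, the values above give about 30 / 10 = 3 rounds. The hypothetical function below performs that arithmetic, rounding up.

```shell
# Hypothetical helper: rough estimate of pod-deletion rounds, assuming one
# round per CHAOS_INTERVAL until TOTAL_CHAOS_DURATION has elapsed, i.e.
# ceil(duration / interval).
chaos_rounds() {
  duration="$1"   # TOTAL_CHAOS_DURATION in seconds
  interval="$2"   # CHAOS_INTERVAL in seconds
  [ "$interval" -gt 0 ] || return 1
  echo $(( (duration + interval - 1) / interval ))
}

# Usage: chaos_rounds 30 10
```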

Don't apply this file until you install the experiments in the next section.

Add new experiments for causing chaos

Now that you have a target deployment, RBAC permissions, and a chaos engine definition, you need some experiments to run. Litmus has a large community, and you can find many ready-made experiments in the Chaos Hub.

In this walkthrough, I'll use the generic experiment of killing a pod.

Use kubectl to install the generic experiments into your more-apps namespace. The output lists each experiment as it is created:

$ kubectl apply -f "https://hub.litmuschaos.io/api/chaos/1.13.3?file=charts/generic/experiments.yaml" -n more-apps
chaosexperiment.litmuschaos.io/pod-network-duplication created
chaosexperiment.litmuschaos.io/node-cpu-hog created
chaosexperiment.litmuschaos.io/node-drain created
chaosexperiment.litmuschaos.io/docker-service-kill created
chaosexperiment.litmuschaos.io/node-taint created
chaosexperiment.litmuschaos.io/pod-autoscaler created
chaosexperiment.litmuschaos.io/pod-network-loss created
chaosexperiment.litmuschaos.io/node-memory-hog created
chaosexperiment.litmuschaos.io/disk-loss created
chaosexperiment.litmuschaos.io/pod-io-stress created
chaosexperiment.litmuschaos.io/pod-network-corruption created
chaosexperiment.litmuschaos.io/container-kill created
chaosexperiment.litmuschaos.io/node-restart created
chaosexperiment.litmuschaos.io/node-io-stress created
chaosexperiment.litmuschaos.io/disk-fill created
chaosexperiment.litmuschaos.io/pod-cpu-hog created
chaosexperiment.litmuschaos.io/pod-network-latency created
chaosexperiment.litmuschaos.io/kubelet-service-kill created
chaosexperiment.litmuschaos.io/k8-pod-delete created
chaosexperiment.litmuschaos.io/pod-delete created
chaosexperiment.litmuschaos.io/node-poweroff created
chaosexperiment.litmuschaos.io/k8-service-kill created
chaosexperiment.litmuschaos.io/pod-memory-hog created

Verify the experiments installed correctly:

$ kubectl get chaosexperiments -n more-apps
NAME                      AGE
container-kill            72s
disk-fill                 72s
disk-loss                 72s
docker-service-kill       72s
k8-pod-delete             72s
k8-service-kill           72s
kubelet-service-kill      72s
node-cpu-hog              72s
node-drain                72s
node-io-stress            72s
node-memory-hog           72s
node-poweroff             72s
node-restart              72s
node-taint                72s
pod-autoscaler            72s
pod-cpu-hog               72s
pod-delete                72s
pod-io-stress             72s
pod-memory-hog            72s
pod-network-corruption    72s
pod-network-duplication   72s
pod-network-latency       72s
pod-network-loss          72s
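Before applying the engine, you can confirm that the experiment it references is installed. This hypothetical helper succeeds only if every experiment name you pass matches a line exactly, so pod-delete is not confused with k8-pod-delete. It is a sketch that assumes the NAME column comes first, as in the listing above.

```shell
# Hypothetical helper: read "kubectl get chaosexperiments" output on stdin and
# succeed only if every name given as an argument appears in the NAME column.
has_experiments() {
  listing=$(awk '{ print $1 }')
  for name in "$@"; do
    # -x: whole-line match, -F: fixed string, -q: quiet
    printf '%s\n' "$listing" | grep -qxF -- "$name" || return 1
  done
  return 0
}

# Usage: kubectl get chaosexperiments -n more-apps | has_experiments pod-delete
```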

Run the experiments

Now that everything is installed and configured, use your chaosengine.yaml file to run the pod-deletion experiment you defined. Apply your chaos engine file:

$ kubectl apply -f chaosengine.yaml
chaosengine.litmuschaos.io/moreapps-chaos created

Confirm the engine started by listing the pods in your namespace. You should see a chaos runner pod and a pod-delete experiment pod being created:

$ kubectl get pods -n more-apps
NAME                      READY   STATUS              RESTARTS   AGE
ghost-5bdd4cdcc4-blmtl    1/1     Running             0          53m
ghost-5bdd4cdcc4-z2lnt    1/1     Running             0          53m
ghost-5bdd4cdcc4-zlcc9    1/1     Running             0          53m
ghost-5bdd4cdcc4-zrs8f    1/1     Running             0          53m
moreapps-chaos-runner     1/1     Running             0          17s
pod-delete-e443qx-lxzfx   0/1     ContainerCreating   0          7s
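In the output above, the runner pod is named after the engine (moreapps-chaos-runner), and the experiment pod's name starts with the experiment name (pod-delete-). Treating that naming pattern as an assumption, the hypothetical helper below filters a kubectl get pods listing down to those pods so you can watch them progress.

```shell
# Hypothetical helper: read "kubectl get pods" output on stdin and print the
# NAME and STATUS of the runner pod (ENGINE-runner) and of any pods whose
# names start with EXPERIMENT-.
chaos_pods() {
  awk -v runner="$1-runner" -v xp="$2-" '
    $1 == runner || index($1, xp) == 1 { print $1, $3 }
  '
}

# Usage: kubectl get pods -n more-apps | chaos_pods moreapps-chaos pod-delete
```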

Next, observe the experiment with Litmus. Results are recorded in a ChaosResult custom resource, and describing it produces a large amount of output:

$ kubectl describe chaosresult moreapps-chaos-pod-delete -n more-apps
Name:         moreapps-chaos-pod-delete
Namespace:    more-apps
Labels:       app.kubernetes.io/component=experiment-job
              app.kubernetes.io/part-of=litmus
              app.kubernetes.io/version=1.13.3
              chaosUID=a6c9ab7e-ff07-4703-abe4-43e03b77bd72
              controller-uid=601b7330-c6f3-4d9b-90cb-2c761ac0567a
              job-name=pod-delete-e443qx
              name=moreapps-chaos-pod-delete
Annotations:  <none>
API Version:  litmuschaos.io/v1alpha1
Kind:         ChaosResult
Metadata:
  Creation Timestamp:  2021-05-09T22:06:19Z
  Generation:          2
  Managed Fields:
    API Version:  litmuschaos.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:app.kubernetes.io/component:
          f:app.kubernetes.io/part-of:
          f:app.kubernetes.io/version:
          f:chaosUID:
          f:controller-uid:
          f:job-name:
          f:name:
      f:spec:
        .:
        f:engine:
        f:experiment:
      f:status:
        .:
        f:experimentStatus:
        f:history:
    Manager:         experiments
    Operation:       Update
    Time:            2021-05-09T22:06:53Z
  Resource Version:  8406
  Self Link:         /apis/litmuschaos.io/v1alpha1/namespaces/more-apps/chaosresults/moreapps-chaos-pod-delete
  UID:               08b7e3da-d603-49c7-bac4-3b54eb30aff8
Spec:
  Engine:      moreapps-chaos
  Experiment:  pod-delete
Status:
  Experiment Status:
    Fail Step:                 N/A
    Phase:                     Completed
    Probe Success Percentage:  100
    Verdict:                   Pass
  History:
    Failed Runs:   0
    Passed Runs:   1
    Stopped Runs:  0
Events:
  Type    Reason  Age   From                     Message
  ----    ------  ----  ----                     -------
  Normal  Pass    104s  pod-delete-e443qx-lxzfx  experiment: pod-delete, Result: Pass

The Verdict field and the Pass event at the bottom show the experiment's outcome. The History section counts passed, failed, and stopped runs as you rerun the chaos engine definition.
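For scripting, you may only want the verdict and the run counts. In this walkthrough, the ChaosResult's name combines the engine and experiment names (moreapps-chaos-pod-delete). The hypothetical helper below is a sketch that assumes the kubectl describe layout shown above and prints a one-line summary.

```shell
# Hypothetical helper: read "kubectl describe chaosresult" output on stdin and
# print the verdict plus the passed/failed run counts from the History block.
chaos_summary() {
  awk '
    $1 == "Verdict:"                { verdict = $2 }
    $1 == "Passed" && $2 == "Runs:" { passed = $3 }
    $1 == "Failed" && $2 == "Runs:" { failed = $3 }
    END { printf "verdict=%s passed=%s failed=%s\n", verdict, passed, failed }
  '
}

# Usage: kubectl describe chaosresult moreapps-chaos-pod-delete -n more-apps | chaos_summary
```

For the run above, this should print verdict=Pass passed=1 failed=0.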

Congratulations on your first (and hopefully not last) chaos engineering test! Now you have a powerful tool to use and help your environment grow.

Final thoughts

You might be thinking, "I can't run this manually every time I want to run chaos. How far can I take this, and how can I set it up for the long term?"

The best part of Litmus (aside from the Chaos Hub) is its scheduler, which lets you run experiments at defined times and dates, either on a repeating schedule or sporadically. It's a great tool for detail-oriented admins who have been working with Kubernetes for a while and are ready to create some chaos. I suggest staying up to date on Litmus and on how to use it for regular chaos engineering. Happy pod hunting!