It can be daunting to get started with Kubernetes and OKD (a Kubernetes distribution formerly known as OpenShift Origin). There are a lot of concepts and components to take in and understand. This tutorial walks through creating an example Kubernetes cron job that uses a service account and a Python script to list all the pods in the current project/namespace. The job itself is relatively useless, but this tutorial introduces many parts of the Kubernetes & OKD infrastructure. Also, the Python script is a good example of using the OKD REST API for cluster-related tasks.
This tutorial covers several pieces of the Kubernetes/OKD infrastructure, including:
- Service accounts and tokens
- Role-based access control (RBAC)
- Image streams
- BuildConfigs
- Source-to-image (S2I)
- Jobs and cron jobs (mostly the latter)
- Kubernetes' Downward API
The Python script and example OKD YAML files for this tutorial can be found on GitHub; to use them, replace the value: https://okd.host:port line in cronJob.yml with the hostname and port of your own OKD instance's API.
Prerequisites
Required software
- An OKD or Minishift cluster with the default S2I image streams installed
- An integrated image registry
Optional software (for playing with the Python script)
- Python 3
- The Python openshift and kubernetes modules
To install the modules, run:
pip3 install --user openshift kubernetes
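You can confirm the modules installed correctly with a quick, throwaway import check (this is not part of the tutorial's script, just a sanity test):
# Quick, throwaway check that the optional Python modules are importable.
from kubernetes import client, config
from openshift.dynamic import DynamicClient

print("openshift and kubernetes modules imported successfully")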
Authentication
The Python script uses the service account's API token to authenticate to OKD. The script also expects an environment variable pointing to the OKD host it will connect to:
HOST: the OKD host to connect to (e.g., https://okd.host:port)
OKD also automatically creates an API token for the service account that will run the cron job pods; the script relies on that token (see "Set how environment variables and tokens are used" below).
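To make the authentication flow concrete, here is a minimal, hypothetical sketch of how a script can turn the HOST variable and the mounted service account token into an authenticated API client using the openshift and kubernetes modules. The names here (such as build_client) are illustrative, and the actual app.py in the example repository may be organized differently:
import os

from kubernetes import client
from openshift.dynamic import DynamicClient

# OKD mounts the service account token and the cluster CA certificate into
# every container at this path (see "Set how environment variables and
# tokens are used" below).
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def build_client():
    """Return a DynamicClient authenticated with the service account token."""
    with open(os.path.join(SA_DIR, "token")) as f:
        token = f.read().strip()

    configuration = client.Configuration()
    configuration.host = os.environ["HOST"]  # e.g., https://okd.host:port
    configuration.ssl_ca_cert = os.path.join(SA_DIR, "ca.crt")
    configuration.api_key = {"authorization": "Bearer " + token}

    return DynamicClient(client.ApiClient(configuration))
A script can then call build_client() once and reuse the returned client for all of its REST API calls.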
Process
Setting up this cron job is a good learning experience—since you need to set up a number of Kubernetes and OKD pieces, you get a good feel for the platform's various features.
The general process is:
- Create a new project
- Create a Git repository with the Python script in it
- Create a service account
- Grant RBAC permissions to the service account
- Set how environment variables and tokens are used
- Create an image stream to accept the images created by the BuildConfig
- Create a BuildConfig to turn the Python script into an image/image stream
- Build the image
- Create the cron job
1. Create a new project
Create a new project in OKD for this exercise:
oc new-project py-cron
Depending on how your cluster is set up, you may need to ask your cluster administrator to create a new project for you.
2. Create a Git repository with the Python script
Clone or fork this repo:
https://github.com/clcollins/openshift-cronjob-example.git
You can also reference it directly, as the code examples below do. This repository is where the Python script will be pulled from and built into the final running image.
3. Create a service account
A service account is a non-user account that can be associated with resources, permissions, etc., within OKD. For this exercise, you must create a service account that will run the pod containing the Python script, authenticate to the OKD API via token auth, and make REST API calls to list all the pods.
Since the Python script will query the OKD REST API to get a list of pods in the namespace, the service account will need permissions to list pods and namespaces. Technically, one of the default service accounts automatically created in a project—system:serviceaccount:default:deployer—already has these permissions. However, this exercise will create a new service account to explain service account creation and RBAC permissions.
Create a new service account by entering:
oc create serviceaccount py-cron
This creates a service account named, appropriately, py-cron. (Technically, it's system:serviceaccount:py-cron:py-cron, or the "py-cron" service account in the "py-cron" namespace.) The account automatically receives two secrets: an OKD API token and credentials for the OKD Container Registry. The API token will be used in the Python script to identify the service account to OKD for REST API calls.
The tokens associated with the service account can be viewed with the command:
oc describe serviceaccount py-cron
4. Grant RBAC permissions for the service account
OKD and Kubernetes use RBAC (role-based access control) to allow fine-grained control of who can do what in a complicated cluster. In RBAC:
- Permissions are based on verbs and resources (e.g., create group, delete pod, etc.)
- Sets of permissions are grouped into roles or ClusterRoles, the latter being, predictably, cluster-wide.
- Roles and ClusterRoles are associated with (or bound to) groups and service accounts (or, if you want to do it all wrong, individual users) by creating RoleBindings or ClusterRoleBindings.
- Groups, service accounts, and users can be bound to multiple roles.
For this exercise, create a role within the project, grant the role permissions to list pods and projects, and bind the py-cron service account to the role:
oc create role pod-lister --verb=list --resource=pods,namespaces
oc policy add-role-to-user pod-lister --role-namespace=py-cron system:serviceaccount:py-cron:py-cron
Note that --role-namespace=py-cron has to be added to prevent OKD from looking for ClusterRoles.
Verify the service account has been bound to the role:
oc get rolebinding | awk 'NR==1 || /^pod-lister/'
NAME         ROLE                  USERS     GROUPS    SERVICE ACCOUNTS     SUBJECTS
pod-lister   py-cron/pod-lister                        py-cron
5. Set how environment variables and tokens are used
The Python script references the API token associated with the service account and two environment variables (a sketch of how a script might consume them follows this list):
- py-cron API token: The API token automatically created for the py-cron service account is mounted by OKD into any pod the service account runs. The token is mounted at a specific path in every container in the pod:
  /var/run/secrets/kubernetes.io/serviceaccount/token
  The Python script reads this file and uses the token to authenticate to the OKD API and list the pods.
- HOST environment variable: The HOST environment variable is specified in the cron job definition and contains the OKD API hostname in the format https://okd.host:port.
- NAMESPACE environment variable: The NAMESPACE environment variable is also specified in the cron job definition and uses the Kubernetes Downward API to dynamically populate the variable with the name of the project the cron job pod is running in.
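Putting the token and both environment variables together, the core of a pod-listing script might look something like the following sketch. Again, this is illustrative rather than a copy of the repository's app.py, and the client setup repeats the sketch from the Authentication section:
import os

from kubernetes import client
from openshift.dynamic import DynamicClient

SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def main():
    # HOST and NAMESPACE are injected by the cron job definition; NAMESPACE
    # is populated by the Kubernetes Downward API with the current project.
    host = os.environ["HOST"]
    namespace = os.environ["NAMESPACE"]

    # Authenticate with the mounted service account token.
    with open(os.path.join(SA_DIR, "token")) as f:
        token = f.read().strip()
    configuration = client.Configuration()
    configuration.host = host
    configuration.ssl_ca_cert = os.path.join(SA_DIR, "ca.crt")
    configuration.api_key = {"authorization": "Bearer " + token}
    dyn_client = DynamicClient(client.ApiClient(configuration))

    # Look up the core v1 Pod resource and list every pod in this namespace.
    v1_pods = dyn_client.resources.get(api_version="v1", kind="Pod")
    print(v1_pods.get(namespace=namespace))


if __name__ == "__main__":
    main()
Printing the returned ResourceInstance produces output like that shown in the "Looking around" section at the end of this tutorial.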
6. Create an image stream
An image stream is a collection of images (in this case, the images created by the BuildConfig's builds) and an abstraction layer between images and the Kubernetes objects that use them, allowing those objects to reference the image stream rather than a specific image directly.
Before a newly built image can be pushed to an image stream, the stream must already exist. The easiest way to create a new, empty stream is with the oc command-line tool:
oc create imagestream py-cron
7. Create a BuildConfig
The BuildConfig is the definition of the entire build process—the act of taking input parameters and code and turning them into an image.
The BuildConfig for this exercise uses the source-to-image (S2I) build strategy with the Red Hat-provided Python S2I builder image, adding the Python script to it. During the build, requirements.txt is parsed and the Python modules listed there are installed. The result is a final Python-based image containing the script and the modules required to run it.
The important pieces of the BuildConfig are: .spec.output, .spec.source, and .spec.strategy.
.spec.output
The output section of the BuildConfig describes what to do with the build's output. In this case, the BuildConfig outputs the resulting image as an image stream tag (e.g., py-cron:1.0) that other objects, such as the cron job in this exercise, can use to reference the image.
These are probably self-explanatory.
spec:
  output:
    to:
      kind: ImageStreamTag
      name: py-cron:1.0
.spec.source
The source section of the BuildConfig describes where the content of the build comes from. In this case, it references the Git repository where the Python script and its supporting files are kept.
Most of these are self-explanatory as well.
spec:
  source:
    type: Git
    git:
      ref: master
      uri: https://github.com/clcollins/openshift-cronjob-example.git
.spec.strategy
The strategy section of the BuildConfig describes the build strategy to use, in this case, the source (i.e., S2I) strategy. The .spec.strategy.sourceStrategy.from section references the public python:3.6 image stream tag that exists in the default openshift namespace for anyone to use. This image stream contains S2I builder images that take Python code as input, install the dependencies listed in any requirements.txt file, and then output a finished image with the code and requirements installed.
strategy:
  type: Source
  sourceStrategy:
    from:
      kind: ImageStreamTag
      name: python:3.6
      namespace: openshift
The complete BuildConfig for this example looks like the YAML below. Substitute your Git repo and create the BuildConfig with the oc command:
oc create -f <path.to.buildconfig.yaml>
The YAML:
---
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  labels:
    app: py-cron
  name: py-cron
spec:
  output:
    to:
      kind: ImageStreamTag
      name: py-cron:1.0
  runPolicy: Serial
  source:
    type: Git
    git:
      ref: master
      uri: https://github.com/clcollins/openshift-cronjob-example.git
  strategy:
    type: Source
    sourceStrategy:
      from:
        kind: ImageStreamTag
        name: python:3.6
        namespace: openshift
8. Build the image
Most of the time, it would be more efficient to add a webhook trigger to the BuildConfig to allow the image to be automatically rebuilt each time code is committed and pushed to the repo. For this exercise, however, the image build will be kicked off manually whenever the image needs to be updated.
A new build can be triggered by running:
oc start-build BuildConfig/py-cron
Running this command outputs the name of a build; for example:
build.build.openshift.io/py-cron-1 started
The build's progress can be followed by watching the logs:
oc logs -f build.build.openshift.io/py-cron-1
When the build completes, the image will be pushed to the image stream listed in the .spec.output section of the BuildConfig.
9. Create the cron job
The Kubernetes cron job object defines the cron schedule and behavior as well as the Kubernetes job that is created to run the actual sync.
The important parts of the cron job definition are: .spec.concurrencyPolicy, .spec.schedule, and .spec.jobTemplate.spec.template.spec.containers.
.spec.concurrencyPolicy
The concurrencyPolicy field of the cron job spec is an optional field that specifies how to treat concurrent executions of a job created by this cron job. In this exercise, concurrencyPolicy: Replace cancels a previous job that is still running and replaces it with the newly created one.
Note: The other options are Allow (multiple jobs may run at once) and Forbid (new jobs are skipped until the running job completes).
.spec.schedule
The schedule field of the cron job spec is (unsurprisingly) a Vixie cron-format schedule. At the time(s) specified, Kubernetes creates a job, as defined in the jobTemplate spec below. The schedule in this exercise, */5 * * * *, runs a job every five minutes.
spec:
  schedule: "*/5 * * * *"
.spec.jobTemplate.spec.template.spec.containers
The cron job spec contains a jobTemplate spec, which contains a pod template spec, which in turn contains a container spec. All of these follow the standard spec for their type; i.e., the .spec.containers section is just a normal container definition you might find in any other pod definition.
The container definition for this example is a straightforward one that uses the environment variables discussed above.
The only notable addition is:
.spec.jobTemplate.spec.template.spec.serviceAccountName
Setting serviceAccountName: py-cron in the pod template spec runs the containers as the py-cron service account created earlier, rather than the project's default service account.
The complete OKD cron job for py-cron looks like the YAML below. Substitute the URL of the OKD cluster API and create the cron job by using the following oc command:
oc create -f <path.to.cronjob.yaml>
The YAML:
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  labels:
    app: py-cron
  name: py-cron
spec:
  concurrencyPolicy: Replace
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      annotations:
        alpha.image.policy.openshift.io/resolve-names: '*'
    spec:
      template:
        spec:
          containers:
          - env:
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: HOST
              value: https://okd.host:port
            image: py-cron/py-cron:1.0
            imagePullPolicy: Always
            name: py-cron
          serviceAccountName: py-cron
          restartPolicy: Never
  schedule: "*/5 * * * *"
  startingDeadlineSeconds: 600
  successfulJobsHistoryLimit: 3
  suspend: false
Looking around
Once the cron job has been created, it can be viewed with the oc get cronjob command. This shows a brief description of the cron job, its schedule, whether it is suspended, how many of its jobs are currently active, and when it last ran:
oc get cronjob py-cron
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
py-cron   */5 * * * *   False     0         1m              7d
As mentioned, the cron job creates Kubernetes jobs to do the work whenever the scheduled time passes. The oc get jobs command lists the jobs that have been created by the cron job, the desired and successful completion counts for each (in this case, one per job), and their age:
oc get jobs
NAME                 DESIRED   SUCCESSFUL   AGE
py-cron-1544489700   1         1            10m
py-cron-1544489760   1         1            5m
py-cron-1544489820   1         1            30s
Each job runs in a pod, which can be seen with the oc get pods command. In this example, you can see the job pods, named after the jobs that created them (for example, job py-cron-1544489760 created pod py-cron-1544489760-xl4vt). There are also build pods, that is, the pods that built the container image, as described in the BuildConfig above.
oc get pods
NAME                       READY     STATUS      RESTARTS   AGE
py-cron-1-build            0/1       Completed   0          7d
py-cron-1544489760-xl4vt   0/1       Completed   0          10m
py-cron-1544489820-zgfg8   0/1       Completed   0          5m
py-cron-1544489880-xvmsn   0/1       Completed   0          44s
py-cron-2-build            0/1       Completed   0          7d
py-cron-3-build            0/1       Completed   0          7d
Finally, because this example is just a Python script that connects to OKD's REST API to get information about pods in the project, oc logs can be used to verify the script is working by getting the logs from a job pod and checking the output the script writes to standard output:
oc logs py-cron-1544489880-xvmsn
---> Running application from Python script (app.py) ...
ResourceInstance[PodList]:
apiVersion: v1
items:
- metadata:
    annotations: {openshift.io/build.name: py-cron-1, openshift.io/scc: privileged}
    creationTimestamp: '2018-12-03T18:48:39Z'
    labels: {openshift.io/build.name: py-cron-1}
    name: py-cron-1-build
    namespace: py-cron
    ownerReferences:
    - {apiVersion: build.openshift.io/v1, controller: true, kind: Build, name: py-cron-1,
      uid: 0c9cf9a8-f72c-11e8-b217-005056a1038c}
<snip>
In summary
This tutorial showed how to create a Kubernetes cron job with OKD, a Kubernetes distribution formerly called OpenShift Origin. The BuildConfig described how to build the container image containing the Python script that calls the OKD REST API, and an image stream was created to describe the different versions of the image built by the builds. A service account was created to run the container, and a role was created to allow the service account to get pod information from OKD's REST API. A RoleBinding associated the role with the service account. Finally, a cron job was created to run a job—the container with the Python script—on a specified schedule.