Kubernetes
TODO table of contents
A YAML cluster configuration file for a Kubernetes resource manager on an HPC cluster looks like:
# /etc/ood/config/clusters.d/my_k8s_cluster.yml
---
v2:
metadata:
title: "My K8s Cluster"
# you may not want a login section. There may not be a login node
# for your kuberenetes cluster
login:
host: "my_k8s_cluster.my_center.edu"
job:
adapter: "kubernetes"
config_file: "~/.kube/config"
cluster: "ood-prod"
context: "ood-prod"
bin: "/usr/bin/kubectl"
username_prefix: "prod-"
namespace_prefix: "user-"
all_namespaces: false
auto_supplemental_groups: false
server:
endpoint: "https://my_k8s_cluster.my_center.edu"
cert_authority_file: "/etc/pki/tls/certs/kubernetes-ca.crt"
auth:
type: "oidc"
mounts: []
batch_connect:
ssh_allow: false
- adapter
This is set to
kubernetes
.- config_file
The KUBECONFIG file. Optional. Defaults to ~/.kube/config. Sites can also set the KUBECONFIG environment varaible, but this configuration has precedence.
- cluster
The cluster name. Saved to and referenced from your KUBECONFIG.
- context
The context to use when issuing kubectl commands. Optional. Defaults to cluster when using OIDC authentication. Saved to and referenced from your KUBECONFIG.
- username_prefix
The prefix to your users in your KUBECONFIG. Use this prefix to differentiate between different clusters (like test and production).
- namespace_prefix
The prefix to your namespace. Use this prefix if you have assertions on what namespaces are available. I.e., a Kyverno policy that ensures all namespaces are
user-\w+
.- all_namespaces
A boolean to determine if the user will query for pods in other namespaces. When false users will only query in their namespace. If true they will query and display pods from all namespaces.
- auto_supplemental_groups:
Automatically populate a container’s
securityContext.supplementalGroups
with the users supplemental groups.- server
The kubernetes server to communicate with. This field is a map with
endpoint
andcert_authority_file
keys.- auth:
See the notes on Authentication below.
- mounts:
Site wide mount points for all kubernetes jobs. See the documentation on kubernetes mounts for more details.
Note
The batch_connect.ssh_allow
is important to disable OnDemand from rendering links to SSH into your
Kubernetes worker nodes when Batch Connect apps are running.
Per User Kubernetes
To get kubernetes to act like a Per User resource there are some conventions we put in place. Users only schedule pods in their own namespaces and they always run those pods as themselves.
At most users could be allow to read pods from other namespaces (have sufficient
privileges to run kubectl get pods --all-namespaces
), but this is not required.
Being able to view pods in other namespaces is only applicable to a feature like
viewing active jobs and seeing pods from other namespaces in that view.
Second is that we specify the Kubernetes security context such that pods have the same UID and GID as the actual user.
Open OnDemand will always use the users UID and GID as the runAsUser and runAsGroup. fsGroup is always the same as runAsGroup. runAsNonRoot is always set to true. supplementalGroups are empty by default. One can automatically populate them with a cluster configuration above or specify them for each app individually.
You should have policies in place to enforce these.
Bootstrapping the Kuberenetes cluster
Before anyone can use your Kubernetes cluster from Open OnDemand, you’ll need to create the open ondemand kubernetes resources on your cluster.
Below is an example of adding the necessary resources:
kubectl apply -f https://raw.githubusercontent.com/OSC/ondemand/master/hooks/k8s-bootstrap/ondemand.yaml
Bootstrapping OnDemand web node to communicate with Kubernetes
The OnDemand web node root
user must be configured
to use the ondemand
service account deployed by the open ondemand kubernetes resources and
be able to execute kubectl
commands.
First deploy kubectl
to the OnDemand web node.
Replace $VERSION
with the version of the Kubernetes controller, eg. 1.21.5
.
wget -O /usr/local/bin/kubectl https://storage.googleapis.com/kubernetes-release/release/v$VERSION/bin/linux/amd64/kubectl
chmod +x /usr/local/bin/kubectl
Tokens for Bootstrapping
The root
user on the OnDemand web node needs a Kubernetes token to bootstrap users.
Specifically to create user namespaces and give the users sufficient privileges in their
namespace.
Service account tokens are not generated automatically since Kubernetes 1.24. You have two
options here: You can either create a non-expiring token for the service account and save it
as a secret or you can create a crontab entry to refresh the root
users token. Both are
described here.
Tip
Kubernetes recommends that you use rotating tokens, so we recommend the same.
To do use rotating tokens, you can use the kubectl create token
API to create a token
and save it in a crontab entry. Here’s an example of what you could use to create new tokens
for the root
user. The tokens last 9 hours, so you can set a crontab entry for every 8 hours
to refresh your tokens before they expire.
#!/bin/bash
set -e
if command -v kubectl >/dev/null 2>&1;
then
CMD_USER=$(whoami)
if [ "$CMD_USER" == "root" ]; then
TOKEN=$(kubectl create token ondemand --namespace=ondemand --duration 9h)
kubectl config set-credentials ondemand@kubernetes --token="$TOKEN"
else
>&2 echo "this program needs to run as 'root' and you are $CMD_USER."
exit 1
fi
fi
If you wish to create a non-expiring token, you will need to create the secret through a
kubectl apply
command on the yaml below.
Next extract the ondemand
ServiceAccount token. Here is an example command to extract
the token using an account that has ClusterAdmin privileges:
# token.yml
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
name: token
namespace: ondemand
annotations:
kubernetes.io/service-account.name: ondemand
kubectl apply -f token.yml
TOKEN=$(kubectl describe serviceaccount ondemand -n ondemand | grep Tokens | awk '{ print $2 }')
kubectl describe secret $TOKEN -n ondemand | grep "token:"
Below are example commands to bootstrap the kubeconfig for root
user on the OnDemand web node
using the token from above. Run these commands as root
on the OnDemand web node.
kubectl config set-cluster kubernetes --server=https://$CONTROLLER:6443 --certificate-authority=$CACERT
kubectl config set-credentials ondemand@kubernetes --token=$TOKEN
kubectl config set-context ondemand@kubernetes --cluster=kubernetes --user=ondemand@kubernetes
kubectl config use-context ondemand@kubernetes
Replace the following values:
$CONTROLLER
with the Kubernetes Controller FQDN or IP address$CACERT
the path to Kubernetes cluster CA cert$TOKEN
the token forondemand
ServiceAccount
Below is an example of verifying the kubeconfig is valid:
kubectl cluster-info
Deploy Hooks to bootstrap users Kubernetes configuration
We ship with open ondemand provided hooks to bootstrap users when the login to Open OnDemand. These scripts will create their namespace, a networking policy, and rolebindings for user and the service accounts in their namespace.
A user oakley
would create the oakley
namespace. If you’ve configured
to use prefix user-
, then the namespace would be user-oakley
.
The networking policy ensures that pods cannot communicate inbetween namespaces.
The RoleBindings give user, oakley
in this case, sufficient privileges
to the oakley
namespace. Refer to the open ondemand kubernetes resources
for details on the roles and privileges created.
You’ll need to employ PUN pre hooks to bootstrap your users to this cluster.
You’ll also have to modify /etc/ood/config/hooks.env
because open ondemand provided hooks
require a HOOKENV
environment variable.
Here’s what you’ll have to edit in the hook.env.example
file we ship.
# /etc/ood/config/hook.env
# required if you changed the items in the cluster.d file
K8S_USERNAME_PREFIX=""
NAMESPACE_PREFIX=""
# required
NETWORK_POLICY_ALLOW_CIDR="127.0.0.1/32"
# required if you're using OIDC
IDP_ISSUER_URL="https://idp.example.com/auth/realms/main/protocol/openid-connect/token"
CLIENT_ID="changeme"
CLIENT_SECRET="changeme"
# required if you're using a secret registry
IMAGE_PULL_SECRET=""
REGISTRY_DOCKER_CONFIG_JSON="/some/path/to/docker/config.json"
# enable if are enforcing walltimes through the job pod reaper
# see 'Enforcing walltimes' below.
USE_JOB_POD_REAPER=false
You can refer to osc’s prehook but we’ll also provide this example.
As you can see in this pre hook, the username is passed in to the script
which then defines the HOOKENV
and calls two open ondemand provided hooks.
k8s-bootstrap-ondemand.sh
boostraps the user in the kubernetes cluster as described
above.
Since we use OIDC at OSC we use set-k8s-creds.sh
to add or update the user in their
~/.kube/config
with the relevant OIDC credentials.
#!/bin/bash
for arg in "$@"
do
case $arg in
--user)
ONDEMAND_USERNAME=$2
shift
shift
;;
esac
done
if [ "x${ONDEMAND_USERNAME}" = "x" ]; then
echo "Must specify username"
exit 1
fi
HOOKSDIR="/opt/ood/hooks"
HOOKENV="/etc/ood/config/hook.env"
/bin/bash "$HOOKSDIR/k8s-bootstrap/k8s-bootstrap-ondemand.sh" "$ONDEMAND_USERNAME" "$HOOKENV"
/bin/bash "$HOOKSDIR/k8s-bootstrap/set-k8s-creds.sh" "$ONDEMAND_USERNAME" "$HOOKENV"
Authentication
Here are the current configurations you can list for different types of authentication.
Managed Authentication
# /etc/ood/config/clusters.d/my_k8s_cluster.yml
---
v2:
job:
# ...
auth:
type: 'managed'
This is the simplest case and is the default. The authentication
is managed outside of Open OnDemand. We will not set-context
or set-cluster
.
We will pass --context
to kubectl commands if you have it configured
in the cluster config (above). Otherwise, it’s assumed that the current context
is set out of bounds.
OIDC Authentication
For OIDC authentication the tokens provided to OnDemand users must be seen as valid for Kubernetes in order for that token to be used to authenticate with Kubernetes. First both OnDemand and Kubernetes must be using the same OIDC provider. In order for the OnDemand token to work with Kubernetes, it’s simplest to configure an audience on the OnDemand OIDC client. An alternative approach would be to update the pre-PUN hooks to perform a token exchange. Another approach would be to use the same OIDC client configuration for OnDemand and Kubernetes.
# /etc/ood/config/clusters.d/my_k8s_cluster.yml
---
v2:
job:
# ...
auth:
type: 'odic'
This uses the OIDC credentails that you’ve logged in with. When
the dashboard starts up it will set-context
and set-cluster
to what you’ve configured.
We will pass --context
to kubectl commands. This defaults to
the cluster but can be something different if you configure it so.
GKE Authentication
# /etc/ood/config/clusters.d/my_k8s_cluster.yml
---
v2:
job:
# ...
auth:
type: 'gke'
svc_acct_file: '~/.gke/my-service-account-file'
It’s expected that you have a service account that can then manipulate the cluster you’re interacting with. Every user should have a cooresponding service account to interact with GKE.
When the dasbhoard starts up, we use gcloud
to configure your KUBECONFIG.
Google Cloud’s Goolge Kubernetes Engine (GKE) needs some more documentation on what privileges this serivce account is setup with and how one may bootstrap it.
OIDC Audience
The simplest way to have the OnDemand OIDC tokens be valid for Kubernetes is to update the OnDemand client configuration to include the audience of the Kubernetes client.
Keycloak
In the Keycloak web UI, logged in as the admin user:
Navigate to
Clients
then choose the OnDemand client.Choose the
Mappers
tab and clickCreate
Fill in a
Name
and selectAudience
forMapper Type
For
Included Client Audience
choose the Kubernetes client entryTurn on both
Add to ID token
andAdd to access token
OIDC Token Exchange
Keycloak
Refer to the Keycloak token exchange documentation
Open OnDemand apps in a Kuberenetes cluster
Kuberenetes is so different from other HPC clusters that the interface we have for
other schedulers didn’t quite fit. So Open OnDemand apps developed for kubernetes
clusters look quite different from other schedulers. Essentially most things we’ll
need are packed into the native
key of the submit.yml.erb
files.
See the tutorial for a kubernetes app that behaves like HPC compute as well as the tutorial for a kubernetes app for more details.
Kyverno Policies
Once Kubernetes is available to OnDemand, it’s possible for users to use kubectl
to submit arbirary pods to
Kubernetes. To ensure proper security with Kubernetes a policy engine such as Kyverno can be used to ensure certain
security standards.
For OnDemand, many of the Kyverno baseline and restricted sescurity policies will work. There are also policies that can be deployed to ensure the UID/GID of user pods match that user’s UID/GID on the HPC clusters. Some example policies do things such as enforce UID/GID and other security standards for OnDemand. These policies rely heavily on the fact that OnDemand usage in Kubernetes using a namespace prefix.
The policies enforcing UID/GID and supplemental groups are using data supplied by the k8-ldap-configmap tool that generates ConfigMap resources based on LDAP data. This tool runs as a deployment inside the Kubernetes cluster.
Enforcing Walltimes
In order to enforce that OnDemand pods are shut down after so much time, it’s necessary to deploy a service that can cleanup pods that have run past their walltime. Also because OnDemand is bootstrapping a namespace per user it’s useful to cleanup unused namespaces.
The OnDemand pods will have the pod.kubernetes.io/lifetime
annotation set that
is read by job-pod-reaper that will kill pods that have reached their walltime.
The job-pod-reaper service runs as a Deployment inside Kubernetes and will kill
any pods based on the lifetime annotation.
Below is an example of Helm values that can be used to configure job-pod-reaper for OnDemand:
reapNamespaces: false
namespaceLabels: app.kubernetes.io/name=open-ondemand
objectLabels: app.kubernetes.io/managed-by=open-ondemand
You will need to tell OnDemand you are using job-pod-reaper and to bootstrap the necessary RoleBinding so that
job-pod-reaper can delete OnDemand pods. Update /etc/ood/config/hooks.env
to include the following configuration:
USE_JOB_POD_REAPER="true"
In order to cleanup unused namespaces the k8-namespace-reaper tool can be used. This tool will delete a namespace based on several factors:
The creation timestamp of the namespace
openondemand.org/last-hook-execution
annotation set by the OnDemand pre-PUN hookThe last pod to run in that namespace based on Prometheus metrics
Below is an example of Helm values to deploy this tool for OnDemand where the OnDemand namespaces have user-
prefix:
config:
namespaceLabels: app.kubernetes.io/name=open-ondemand
namespaceRegexp: user-.+
namespaceLastUsedAnnotation: openondemand.org/last-hook-execution
prometheusAddress: http://prometheus.prometheus:9090
reapAfter: 8h
lastUsedThreshold: 4h
interval: 2h
Using a private image registry
OnDemand’s Kubernetes integration can be setup to pull images from a private registry like Harbor.
In order to pull images from a private registry that requires authentication, OnDemand can be configured to setup Image Pull Secrets. The OnDemand web node will need a JSON file setup that includes the username and password of a registry user authorized to pull images used by OnDemand apps.
Warning
Once the OnDemand user’s namespace is given the registry auth secret, it will be readable by the user. It’s recommended to use a read-only auth token that has limited to access just images used by OnDemand.
In the following example you can set the following values:
$REGISTRY
the registry address.$REGISTRY_USER
the username of the user authorized to pull the images$REGISTRY_PASSWORD
the password of the user authorized to pull the images
AUTH=$(echo -n "${REGISTRY_USER}:${REGISTRY_PASSWORD}" | base64)
cat > /etc/ood/config/image-registry.json <<EOF
{
"auths": {
"${REGISTRY}": {
"auth": "${AUTH}"
}
}
}
EOF
chmod 0600 /etc/ood/config/image-registry.json
Once the registry JSON is created you must configure /etc/ood/config/hooks.env
so OnDemand knows how to bootstrap
a user’s namespaces with the ability to pull from this registry:
IMAGE_PULL_SECRET="private-docker-registry"
REGISTRY_DOCKER_CONFIG_JSON="/etc/ood/config/image-registry.json"