Kubernetes
A YAML cluster configuration file for a Kubernetes resource manager on an HPC cluster looks like:
# /etc/ood/config/clusters.d/my_k8s_cluster.yml
---
v2:
  metadata:
    title: "My K8s Cluster"
  # you may not want a login section. There may not be a login node
  # for your Kubernetes cluster
  login:
    host: "my_k8s_cluster.my_center.edu"
  job:
    adapter: "kubernetes"
    config_file: "~/.kube/config"
    cluster: "ood-prod"
    context: "ood-prod"
    bin: "/usr/bin/kubectl"
    username_prefix: "prod-"
    namespace_prefix: "user-"
    all_namespaces: false
    auto_supplemental_groups: false
    server:
      endpoint: "https://my_k8s_cluster.my_center.edu"
      cert_authority_file: "/etc/pki/tls/certs/kubernetes-ca.crt"
    auth:
      type: "oidc"
    mounts: []
  batch_connect:
    ssh_allow: false
adapter
    This is set to kubernetes.
config_file
    The KUBECONFIG file. Optional. Defaults to ~/.kube/config. Sites can also set the
    KUBECONFIG environment variable, but this configuration has precedence.
cluster
    The cluster name. Saved to and referenced from your KUBECONFIG.
context
    The context to use when issuing kubectl commands. Optional. Defaults to cluster
    when using OIDC authentication. Saved to and referenced from your KUBECONFIG.
username_prefix
    The prefix to your users in your KUBECONFIG. Use this prefix to differentiate
    between different clusters (like test and production).
namespace_prefix
    The prefix to your namespaces. Use this prefix if you have assertions on what
    namespaces are available, e.g., a Kyverno policy that ensures all namespaces
    match user-\w+.
all_namespaces
    A boolean to determine whether the user will query for pods in other namespaces.
    When false, users will only query their own namespace. When true, they will query
    and display pods from all namespaces.
auto_supplemental_groups
    Automatically populate a container's securityContext.supplementalGroups with the
    user's supplemental groups.
server
    The Kubernetes server to communicate with. This field is a map with endpoint and
    cert_authority_file keys.
auth
    See the notes on Authentication below.
mounts
    Site-wide mount points for all Kubernetes jobs. See the documentation on
    Kubernetes mounts for more details.
Note
The batch_connect.ssh_allow setting is important: setting it to false prevents OnDemand from
rendering links to SSH into your Kubernetes worker nodes while Batch Connect apps are running.
Per User Kubernetes
To get Kubernetes to act like a Per User resource, we put some conventions in place. Users only schedule pods in their own namespaces, and they always run those pods as themselves.
At most, users may be allowed to read pods from other namespaces (have sufficient
privileges to run kubectl get pods --all-namespaces), but this is not required.
Being able to view pods in other namespaces only matters for a feature like
viewing active jobs, where pods from other namespaces appear in that view.
Second, we specify the Kubernetes security context such that pods have the same UID and GID as the actual user.
Open OnDemand will always use the user's UID and GID as the runAsUser and runAsGroup.
fsGroup is always the same as runAsGroup. runAsNonRoot is always set to true.
supplementalGroups is empty by default. It can be populated automatically with the
auto_supplemental_groups cluster configuration above or specified for each app individually.
You should have policies in place to enforce these conventions.
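For example, a pod submitted for a user whose UID is 5555 and primary GID is 6666 (illustrative values) would carry a pod-level security context along these lines:
securityContext:
  runAsUser: 5555        # the user's UID
  runAsGroup: 6666       # the user's primary GID
  fsGroup: 6666          # always the same as runAsGroup
  runAsNonRoot: true
  supplementalGroups: [] # empty unless auto_supplemental_groups or the app populates it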
Bootstrapping the Kubernetes cluster
Before anyone can use your Kubernetes cluster from Open OnDemand, you'll need to create the open ondemand Kubernetes resources on your cluster.
Below is an example of adding the necessary resources:
kubectl apply -f https://raw.githubusercontent.com/OSC/ondemand/master/hooks/k8s-bootstrap/ondemand.yaml
Bootstrapping OnDemand web node to communicate with Kubernetes
The OnDemand web node root user must be configured
to use the ondemand service account deployed by the open ondemand Kubernetes resources and
be able to execute kubectl commands.
First deploy kubectl to the OnDemand web node.
Replace $VERSION with the version of the Kubernetes controller, e.g., 1.21.5.
wget -O /usr/local/bin/kubectl https://storage.googleapis.com/kubernetes-release/release/v$VERSION/bin/linux/amd64/kubectl
chmod +x /usr/local/bin/kubectl
Tokens for Bootstrapping
The root user on the OnDemand web node needs a Kubernetes token to bootstrap users,
specifically to create user namespaces and give users sufficient privileges in their
namespaces.
Since Kubernetes 1.24, service account tokens are no longer generated automatically. You have
two options here: you can either create a non-expiring token for the service account and save
it as a secret, or you can create a crontab entry to refresh the root user's token. Both are
described here.
Tip
Kubernetes recommends that you use rotating tokens, so we recommend the same.
To use rotating tokens, you can use the kubectl create token command to create a token
and save it to the root user's KUBECONFIG from a crontab entry. Here's an example of a script
you could use to create new tokens for the root user. The tokens last 9 hours, so you can set
a crontab entry to run every 8 hours and refresh the token before it expires.
#!/bin/bash
set -e

if command -v kubectl >/dev/null 2>&1; then
  CMD_USER=$(whoami)
  if [ "$CMD_USER" == "root" ]; then
    TOKEN=$(kubectl create token ondemand --namespace=ondemand --duration 9h)
    kubectl config set-credentials ondemand@kubernetes --token="$TOKEN"
  else
    >&2 echo "this program needs to run as 'root' and you are $CMD_USER."
    exit 1
  fi
fi
If you wish to create a non-expiring token, you will need to create the secret by running
kubectl apply on the YAML below, then extract the ondemand ServiceAccount token. Below are
example commands to do so, using an account that has ClusterAdmin privileges:
# token.yml
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: token
  namespace: ondemand
  annotations:
    kubernetes.io/service-account.name: ondemand
kubectl apply -f token.yml
TOKEN=$(kubectl describe serviceaccount ondemand -n ondemand | grep Tokens | awk '{ print $2 }')
kubectl describe secret $TOKEN -n ondemand | grep "token:"
Below are example commands to bootstrap the Kubernetes configuration for the root user on the OnDemand web node
using the token from above. Run these commands as root on the OnDemand web node.
kubectl config set-cluster kubernetes --server=https://$CONTROLLER:6443 --certificate-authority=$CACERT
kubectl config set-credentials ondemand@kubernetes --token=$TOKEN
kubectl config set-context ondemand@kubernetes --cluster=kubernetes --user=ondemand@kubernetes
kubectl config use-context ondemand@kubernetes
Replace the following values:
$CONTROLLER with the Kubernetes controller FQDN or IP address
$CACERT with the path to the Kubernetes cluster CA certificate
$TOKEN with the token for the ondemand ServiceAccount
Below is an example of verifying the Kubernetes configuration is valid:
kubectl cluster-info
Deploy Hooks to bootstrap users Kubernetes configuration
We ship Open OnDemand provided hooks to bootstrap users when they log in to Open OnDemand. These scripts will create their namespace, a networking policy, and RoleBindings for the user and the service accounts in their namespace.
For a user oakley, the hooks would create the oakley namespace. If you've configured
a namespace_prefix of user-, then the namespace would be user-oakley.
The networking policy ensures that pods cannot communicate between namespaces.
The RoleBindings give the user, oakley in this case, sufficient privileges
in the oakley namespace. Refer to the Open OnDemand Kubernetes resources
for details on the roles and privileges created.
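The effect of this bootstrap is roughly equivalent to a namespace like the following sketch (the label shown matches the namespaceLabels selector used by the reaper examples later on this page and is assumed to be applied by the hooks; the exact metadata is determined by the hooks themselves):
apiVersion: v1
kind: Namespace
metadata:
  name: user-oakley
  labels:
    app.kubernetes.io/name: open-ondemand   # assumed label; selected by the reaper tools below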
You'll need to employ PUN pre-hooks to bootstrap your users onto this cluster.
You'll also have to modify /etc/ood/config/hooks.env because the Open OnDemand provided hooks
require a HOOKENV environment variable.
Here's what you'll have to edit in the hook.env.example file we ship.
# /etc/ood/config/hook.env
# required if you changed the items in the clusters.d file
K8S_USERNAME_PREFIX=""
NAMESPACE_PREFIX=""
# required
NETWORK_POLICY_ALLOW_CIDR="127.0.0.1/32"
# required if you're using OIDC
IDP_ISSUER_URL="https://idp.example.com/auth/realms/main/protocol/openid-connect/token"
CLIENT_ID="changeme"
CLIENT_SECRET="changeme"
# required if you're using a secret registry
IMAGE_PULL_SECRET=""
REGISTRY_DOCKER_CONFIG_JSON="/some/path/to/docker/config.json"
# enable if you are enforcing walltimes through the job-pod-reaper
# see 'Enforcing walltimes' below.
USE_JOB_POD_REAPER=false
You can refer to OSC's pre-hook, but we'll also provide this example.
As you can see in this pre-hook, the username is passed in to the script,
which then defines the HOOKENV and calls two Open OnDemand provided hooks.
k8s-bootstrap-ondemand.sh bootstraps the user in the Kubernetes cluster as described
above.
Since we use OIDC at OSC, we use set-k8s-creds.sh to add or update the user in their
~/.kube/config with the relevant OIDC credentials.
#!/bin/bash

for arg in "$@"
do
  case $arg in
    --user)
      ONDEMAND_USERNAME=$2
      shift
      shift
      ;;
  esac
done

if [ "x${ONDEMAND_USERNAME}" = "x" ]; then
  echo "Must specify username"
  exit 1
fi

HOOKSDIR="/opt/ood/hooks"
HOOKENV="/etc/ood/config/hook.env"

/bin/bash "$HOOKSDIR/k8s-bootstrap/k8s-bootstrap-ondemand.sh" "$ONDEMAND_USERNAME" "$HOOKENV"
/bin/bash "$HOOKSDIR/k8s-bootstrap/set-k8s-creds.sh" "$ONDEMAND_USERNAME" "$HOOKENV"
Authentication
Here are the configurations you can use for the different types of authentication.
Managed Authentication
# /etc/ood/config/clusters.d/my_k8s_cluster.yml
---
v2:
  job:
    # ...
    auth:
      type: 'managed'
This is the simplest case and is the default. The authentication
is managed outside of Open OnDemand. We will not set-context
or set-cluster.
We will pass --context to kubectl commands if you have it configured
in the cluster configuration (above). Otherwise, it's assumed that the current context
is set out of band.
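For example, a managed configuration that pins the context the adapter passes to kubectl could look like this (the context name is illustrative):
# /etc/ood/config/clusters.d/my_k8s_cluster.yml
---
v2:
  job:
    adapter: "kubernetes"
    context: "ood-prod"   # passed as --context to kubectl commands
    auth:
      type: 'managed'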
OIDC Authentication
For OIDC authentication, the tokens issued to OnDemand users must be accepted as valid by Kubernetes. First, both OnDemand and Kubernetes must use the same OIDC provider. For the OnDemand token to work with Kubernetes, it's simplest to configure an audience on the OnDemand OIDC client. An alternative approach is to update the pre-PUN hooks to perform a token exchange. Another approach is to use the same OIDC client configuration for both OnDemand and Kubernetes.
# /etc/ood/config/clusters.d/my_k8s_cluster.yml
---
v2:
  job:
    # ...
    auth:
      type: 'oidc'
This uses the OIDC credentials that you've logged in with. When
the dashboard starts up it will set-context and set-cluster
to what you've configured.
We will pass --context to kubectl commands. This defaults to
the cluster but can be something different if you configure it so.
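With the example cluster configuration at the top of this page, the cluster and context entries written to a user's KUBECONFIG would look roughly like this excerpt (the user oakley and the resulting prod-oakley credential name are illustrative):
# ~/.kube/config (excerpt)
clusters:
- name: ood-prod
  cluster:
    server: https://my_k8s_cluster.my_center.edu
    certificate-authority: /etc/pki/tls/certs/kubernetes-ca.crt
contexts:
- name: ood-prod
  context:
    cluster: ood-prod
    user: prod-oakley   # username_prefix ("prod-") plus the username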
GKE Authentication
# /etc/ood/config/clusters.d/my_k8s_cluster.yml
---
v2:
  job:
    # ...
    auth:
      type: 'gke'
      svc_acct_file: '~/.gke/my-service-account-file'
It's expected that you have a service account that can manipulate the cluster you're interacting with. Every user should have a corresponding service account to interact with GKE.
When the dashboard starts up, we use gcloud to configure your KUBECONFIG.
Google Cloud's Google Kubernetes Engine (GKE) support needs some more documentation on what privileges this service account should be set up with and how one may bootstrap it.
OIDC Audience
The simplest way to have the OnDemand OIDC tokens be valid for Kubernetes is to update the OnDemand client configuration to include the audience of the Kubernetes client.
Keycloak
In the Keycloak web UI, logged in as the admin user:
Navigate to Clients, then choose the OnDemand client.
Choose the Mappers tab and click Create.
Fill in a Name and select Audience for Mapper Type.
For Included Client Audience, choose the Kubernetes client entry.
Turn on both Add to ID token and Add to access token.
OIDC Token Exchange
Keycloak
Refer to the Keycloak token exchange documentation
Open OnDemand apps in a Kubernetes cluster
Kubernetes is so different from other HPC clusters that the interface we have for
other schedulers didn't quite fit. So Open OnDemand apps developed for Kubernetes
clusters look quite different from apps for other schedulers. Essentially, most things we'll
need are packed into the native key of the submit.yml.erb files.
See the tutorial for a Kubernetes app that behaves like HPC compute as well as the tutorial for a Kubernetes app for more details.
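As a very rough sketch of the shape this takes (field names here are illustrative assumptions; consult the tutorials above for the adapter's actual native attributes), a submit.yml.erb for a Kubernetes app could look something like:
# submit.yml.erb (illustrative sketch only)
---
script:
  native:
    container:
      name: "my-app"                                # assumed: name of the container to run
      image: "registry.example.com/my-app:latest"   # assumed: image the pod should use
      command: ["/usr/local/bin/start.sh"]          # assumed: entrypoint for the container
      port: 8080                                    # assumed: port the Batch Connect app listens on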
Kyverno Policies
Once Kubernetes is available to OnDemand, it's possible for users to use kubectl to submit arbitrary pods to
Kubernetes. To ensure proper security, a policy engine such as Kyverno can be used to enforce certain
security standards.
For OnDemand, many of the Kyverno baseline and restricted security policies will work. There are also policies that can be deployed to ensure the UID/GID of user pods match that user's UID/GID on the HPC clusters. Some example policies do things such as enforce UID/GID and other security standards for OnDemand. These policies rely heavily on the fact that OnDemand usage in Kubernetes uses a namespace prefix.
The policies enforcing UID/GID and supplemental groups use data supplied by the k8-ldap-configmap tool, which generates ConfigMap resources based on LDAP data. This tool runs as a deployment inside the Kubernetes cluster.
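As a minimal sketch of what such a policy can look like (this example only enforces runAsNonRoot for pods in prefixed user namespaces and assumes Kyverno 1.8+ syntax; the example policies linked above are far more complete):
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: ondemand-require-run-as-non-root   # hypothetical policy name
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-run-as-non-root
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - "user-*"                  # matches the user- namespace_prefix
      validate:
        message: "OnDemand user pods must set runAsNonRoot to true."
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true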
Enforcing Walltimes
In order to enforce that OnDemand pods are shut down after their walltime, it's necessary to deploy a service that can clean up pods that have run past their walltime. Also, because OnDemand bootstraps a namespace per user, it's useful to clean up unused namespaces.
The OnDemand pods will have the pod.kubernetes.io/lifetime annotation set, which is read by
job-pod-reaper. The job-pod-reaper service runs as a Deployment inside Kubernetes and will kill
any pods whose lifetime annotation indicates they have reached their walltime.
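For instance, an OnDemand pod given an 8-hour walltime would carry an annotation along these lines (the duration value shown is illustrative):
metadata:
  annotations:
    pod.kubernetes.io/lifetime: 8h   # read by job-pod-reaper; the pod is reaped once this duration has elapsed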
Below is an example of Helm values that can be used to configure job-pod-reaper for OnDemand:
reapNamespaces: false
namespaceLabels: app.kubernetes.io/name=open-ondemand
objectLabels: app.kubernetes.io/managed-by=open-ondemand
You will need to tell OnDemand you are using job-pod-reaper and to bootstrap the necessary RoleBinding so that
job-pod-reaper can delete OnDemand pods. Update /etc/ood/config/hooks.env to include the following configuration:
USE_JOB_POD_REAPER="true"
In order to clean up unused namespaces, the k8-namespace-reaper tool can be used. This tool will delete a namespace based on several factors:
The creation timestamp of the namespace
The openondemand.org/last-hook-execution annotation set by the OnDemand pre-PUN hook
The last pod to run in that namespace, based on Prometheus metrics
Below is an example of Helm values to deploy this tool for OnDemand where the OnDemand namespaces have user- prefix:
config:
  namespaceLabels: app.kubernetes.io/name=open-ondemand
  namespaceRegexp: user-.+
  namespaceLastUsedAnnotation: openondemand.org/last-hook-execution
  prometheusAddress: http://prometheus.prometheus:9090
  reapAfter: 8h
  lastUsedThreshold: 4h
  interval: 2h
Using a private image registry
OnDemand's Kubernetes integration can be set up to pull images from a private registry like Harbor.
In order to pull images from a private registry that requires authentication, OnDemand can be configured to set up Image Pull Secrets. The OnDemand web node will need a JSON file that includes the username and password of a registry user authorized to pull the images used by OnDemand apps.
Warning
Once the OnDemand user's namespace is given the registry auth secret, it will be readable by the user. It's recommended to use a read-only auth token whose access is limited to just the images used by OnDemand.
In the following example you can set the following values:
$REGISTRY: the registry address
$REGISTRY_USER: the username of the user authorized to pull the images
$REGISTRY_PASSWORD: the password of the user authorized to pull the images
AUTH=$(echo -n "${REGISTRY_USER}:${REGISTRY_PASSWORD}" | base64)
cat > /etc/ood/config/image-registry.json <<EOF
{
"auths": {
"${REGISTRY}": {
"auth": "${AUTH}"
}
}
}
EOF
chmod 0600 /etc/ood/config/image-registry.json
Once the registry JSON is created you must configure /etc/ood/config/hooks.env so OnDemand knows how to bootstrap
a user's namespaces with the ability to pull from this registry:
IMAGE_PULL_SECRET="private-docker-registry"
REGISTRY_DOCKER_CONFIG_JSON="/etc/ood/config/image-registry.json"
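Once the hooks have bootstrapped a user's namespace with this secret, pods reference it through Kubernetes' standard imagePullSecrets mechanism; conceptually the generated pods carry something like the following (shown only to illustrate what the bootstrapped secret is used for; the container entry is hypothetical):
spec:
  imagePullSecrets:
    - name: private-docker-registry              # the IMAGE_PULL_SECRET name from hooks.env
  containers:
    - name: my-app
      image: registry.example.com/my-app:latest  # pulled from the private registry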