Add a Jupyter App on a Kubernetes Cluster that behaves like HPC compute
This tutorial will walk you through creating an interactive Jupyter app that your users can use to launch a Jupyter Notebook Server in a Kubernetes cluster. The container behaves much like an HPC compute node, which has the benefit that a single app can serve both traditional HPC and Kubernetes.
It assumes you already have a working understanding of app development. The purpose of this document is to describe how to write apps specifically for a Kubernetes cluster, so it skips many important details about app development that can be found in other tutorials like Add a Jupyter App.
We’re going to be looking at the bc_osc_jupyter app, which is OSC’s production Jupyter app. You can fork, clone and modify it for your site. This page also holds the submit yml in full for reference.
Refer to the interactive K8s Jupyter app for additional details on items defined in submit.yml.erb as well as a more traditional container approach.
The container
A container that makes Kubernetes pods look like HPC compute nodes ends up needing all the OS packages present in the HPC environment. The OS of the container itself also needs to be compatible with the packages installed in the HPC environment, so if you run RHEL 8 on HPC you would also need to run RHEL 8 inside the container.
Things like Lmod and HPC applications need to run inside the Pod’s container, just as they would if the job were spawned by a traditional HPC resource manager.
Switch between SLURM and Kubernetes
The first big change from a traditional HPC interactive app is that the main YAML structure is wrapped in a large if statement based on the cluster choice. If a user chooses one of the HPC clusters, the SLURM submit YAML is rendered; otherwise the Kubernetes YAML is rendered. In the following examples the SLURM clusters are named owens and pitzer and the Kubernetes cluster is named kubernetes.
Here is the beginning of the block:
<% if cluster =~ /owens|pitzer/ -%>
---
batch_connect:
  template: "basic"
  conn_params:
    - jupyter_api
script:
  ...SLURM specific...
Here is the logic to select Kubernetes:
<% elsif cluster =~ /kubernetes/
...Ruby variables setup here...
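The Ruby that is elided above sets the per-cluster variables that the container spec and mounts below rely on (compute_cluster, apps_path and memory_mb). Here it is, excerpted from the full listing at the bottom of this page:
if node_type =~ /owens/
  compute_cluster = "owens"
  apps_path = "/usr/local"
  # Memory per core with hyperthreading enabled
  memory_mb = num_cores.to_i * 2200
elsif node_type =~ /pitzer/
  compute_cluster = "pitzer"
  apps_path = "/apps"
  # Memory per core with hyperthreading enabled
  memory_mb = num_cores.to_i * 4000
end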
container spec
Let’s look at this section of the Kubernetes block first. Here you must specify the name, image and command. The name determines the Pod ID (the job name in HPC parlance). The image should be the HPC container image, and command will be the job script that has been adapted to work with both SLURM and Kubernetes. The command will be run from the user’s home directory; mount requirements are covered in the mounts section below.
Warning
These examples use images from the Ohio Supercomputer Center’s private registry. They will not work at your site because this registry requires authentication.
One important aspect of the command is that the job script it executes is built using the standard before.sh, script.sh and after.sh that one would normally use to build the job script for interactive apps running on HPC resources. The way this pod is set up, the same job script that runs on SLURM is also used to launch the container in Kubernetes.
Next you can specify additional environment variables in env (shown just after the following snippet).
native:
  container:
    name: "jupyter"
    image: "docker-registry.osc.edu/ondemand/ondemand-base-rhel7:0.3.1"
    image_pull_policy: "IfNotPresent"
    command: ["/bin/bash","-l","<%= staged_root %>/job_script_content.sh"]
    restart_policy: 'OnFailure'
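The env block is omitted from the snippet above. As it appears in the full listing below, it passes the user’s identity and the target cluster into the container:
env:
  NB_UID: "<%= Etc.getpwnam(ENV['USER']).uid %>"
  NB_USER: "<%= ENV['USER'] %>"
  NB_GID: "<%= Etc.getpwnam(ENV['USER']).gid %>"
  CLUSTER: "<%= compute_cluster %>"
  KUBECONFIG: "/dev/null"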
mounts
For a pod to look like an HPC environment, the user’s home directory and any shared filesystems need to be mounted on the Kubernetes worker nodes and then made available to the pod.
In the example, a Ruby Hash is created to streamline the direct mounts where the path outside the container is the same as the path inside the container:
mounts = {
  'home' => OodSupport::User.new.home,
  'support' => OodSupport::User.new('support').home,
  'project' => '/fs/project',
  'scratch' => '/fs/scratch',
  'ess' => '/fs/ess',
}
These mounts are defined in the YAML using a loop:
mounts:
<%- mounts.each_pair do |name, mount| -%>
  - type: host
    name: <%= name %>
    host_type: Directory
    path: <%= mount %>
    destination_path: <%= mount %>
<%- end -%>
Additional mounts are needed to make the pod behave like an HPC compute node. The following are mounted into the container:
MUNGE socket so SLURM commands inside the pod work
SLURM configuration so SLURM commands inside the pod know about the scheduler host
SSSD pipes and configuration, as well as nsswitch.conf, so ID lookups inside the pod work
Lmod initialization script
HPC applications (the /apps tree and /opt/intel in the example below)
- type: host
  name: munge-socket
  host_type: Socket
  path: /var/run/munge/munge.socket.2
  destination_path: /var/run/munge/munge.socket.2
- type: host
  name: slurm-conf
  host_type: Directory
  path: /etc/slurm
  destination_path: /etc/slurm
- type: host
  name: sssd-pipes
  host_type: Directory
  path: /var/lib/sss/pipes
  destination_path: /var/lib/sss/pipes
- type: host
  name: sssd-conf
  host_type: Directory
  path: /etc/sssd
  destination_path: /etc/sssd
- type: host
  name: nsswitch
  host_type: File
  path: /etc/nsswitch.conf
  destination_path: /etc/nsswitch.conf
- type: host
  name: lmod-init
  host_type: File
  path: /apps/<%= compute_cluster %>/lmod/lmod.sh
  destination_path: /etc/profile.d/lmod.sh
- type: host
  name: intel
  host_type: Directory
  path: /nfsroot/<%= compute_cluster %>/opt/intel
  destination_path: /opt/intel
- type: host
  name: apps
  host_type: Directory
  path: /apps/<%= compute_cluster %>
  destination_path: <%= apps_path %>
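Besides mounts, the full submit.yml.erb below also sets a node_selector so that these pods are only scheduled onto worker nodes carrying the OnDemand role label:
node_selector:
  osc.edu/role: ondemand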
submit yml in full
# submit.yml.erb
<%-
  cores = num_cores.to_i
  if cores == 0 && cluster == "pitzer"
    # little optimization for pitzer nodes. They want the whole node, if they chose 'any',
    # it can be scheduled on p18 or p20 nodes. If not, they'll get the constraint below.
    base_slurm_args = ["--nodes", "1", "--exclusive"]
  elsif cores == 0
    # full node on owens
    cores = 28
    base_slurm_args = ["--nodes", "1", "--ntasks-per-node", "28"]
  else
    base_slurm_args = ["--nodes", "1", "--ntasks-per-node", "#{cores}"]
  end
  slurm_args = case node_type
               when "gpu-40core"
                 base_slurm_args + ["--constraint", "40core"]
               when "gpu-48core"
                 base_slurm_args + ["--constraint", "48core"]
               when "any-40core"
                 base_slurm_args + ["--constraint", "40core"]
               when "any-48core"
                 base_slurm_args + ["--constraint", "48core"]
               when "hugemem"
                 base_slurm_args + ["--partition", "hugemem", "--exclusive"]
               when "largemem"
                 base_slurm_args + ["--partition", "largemem", "--exclusive"]
               when "debug"
                 base_slurm_args += ["--partition", "debug", "--exclusive"]
               else
                 base_slurm_args
               end
-%>
<% if cluster =~ /owens|pitzer/ -%>
---
batch_connect:
  template: "basic"
  conn_params:
    - jupyter_api
script:
  accounting_id: "<%= account %>"
<% if node_type =~ /gpu/ -%>
  gpus_per_node: 1
<% end -%>
  native:
<%- slurm_args.each do |arg| %>
    - "<%= arg %>"
<%- end %>
<% elsif cluster =~ /kubernetes/
  if node_type =~ /owens/
    compute_cluster = "owens"
    apps_path = "/usr/local"
    # Memory per core with hyperthreading enabled
    memory_mb = num_cores.to_i * 2200
  elsif node_type =~ /pitzer/
    compute_cluster = "pitzer"
    apps_path = "/apps"
    # Memory per core with hyperthreading enabled
    memory_mb = num_cores.to_i * 4000
  end
  mounts = {
    'home' => OodSupport::User.new.home,
    'support' => OodSupport::User.new('support').home,
    'project' => '/fs/project',
    'scratch' => '/fs/scratch',
    'ess' => '/fs/ess',
  }
-%>
---
script:
  accounting_id: "<%= account %>"
  wall_time: "<%= bc_num_hours.to_i * 3600 %>"
<%- if node_type =~ /gpu/ -%>
  gpus_per_node: 1
<%- end -%>
  native:
    container:
      name: "jupyter"
      image: "docker-registry.osc.edu/ondemand/ondemand-base-rhel7:0.3.1"
      image_pull_policy: "IfNotPresent"
      command: ["/bin/bash","-l","<%= staged_root %>/job_script_content.sh"]
      restart_policy: 'OnFailure'
      env:
        NB_UID: "<%= Etc.getpwnam(ENV['USER']).uid %>"
        NB_USER: "<%= ENV['USER'] %>"
        NB_GID: "<%= Etc.getpwnam(ENV['USER']).gid %>"
        CLUSTER: "<%= compute_cluster %>"
        KUBECONFIG: "/dev/null"
      labels:
        osc.edu/cluster: "<%= compute_cluster %>"
      port: "8080"
      cpu: "<%= num_cores %>"
      memory: "<%= memory_mb %>Mi"
    mounts:
<%- mounts.each_pair do |name, mount| -%>
      - type: host
        name: <%= name %>
        host_type: Directory
        path: <%= mount %>
        destination_path: <%= mount %>
<%- end -%>
      - type: host
        name: munge-socket
        host_type: Socket
        path: /var/run/munge/munge.socket.2
        destination_path: /var/run/munge/munge.socket.2
      - type: host
        name: slurm-conf
        host_type: Directory
        path: /etc/slurm
        destination_path: /etc/slurm
      - type: host
        name: sssd-pipes
        host_type: Directory
        path: /var/lib/sss/pipes
        destination_path: /var/lib/sss/pipes
      - type: host
        name: sssd-conf
        host_type: Directory
        path: /etc/sssd
        destination_path: /etc/sssd
      - type: host
        name: nsswitch
        host_type: File
        path: /etc/nsswitch.conf
        destination_path: /etc/nsswitch.conf
      - type: host
        name: lmod-init
        host_type: File
        path: /apps/<%= compute_cluster %>/lmod/lmod.sh
        destination_path: /etc/profile.d/lmod.sh
      - type: host
        name: apps
        host_type: Directory
        path: /apps/<%= compute_cluster %>
        destination_path: <%= apps_path %>
    node_selector:
      osc.edu/role: ondemand
<% end -%>
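To make the arithmetic concrete, assume a user picks a node_type that matches /owens/ with num_cores set to 4 (the actual choices depend on your form.yml). The ERB above then sets compute_cluster to "owens", apps_path to "/usr/local" and memory_mb to 4 * 2200 = 8800, so the resource-related parts of the rendered Kubernetes YAML would look roughly like this sketch (most fields omitted):
# rendered sketch for a node_type matching /owens/ and num_cores = 4
native:
  container:
    name: "jupyter"
    env:
      CLUSTER: "owens"
    port: "8080"
    cpu: "4"
    memory: "8800Mi"
  node_selector:
    osc.edu/role: ondemand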