Cluster Configuration

Cluster configuration files describe each cluster that users can submit jobs to and the login hosts they can SSH to. These files detail how the system interacts with your scheduler. Without them, many of the features in Open OnDemand won't work, including interactive apps.

Indeed, one of the main reasons to use Open OnDemand is to use your clusters interactively.

Apps that require proper cluster configuration include:

  • shell (connect to a cluster login node from the Dashboard App)

  • active-jobs (view a list of active jobs for the various clusters)

  • job-composer (submit jobs to various clusters)

  • all interactive apps, such as Jupyter and RStudio

Tip

We provide a Puppet module and an Ansible role for automating all of this configuration.

  1. Create the default directory that the cluster configuration files reside under:

    sudo mkdir -p /etc/ood/config/clusters.d
    
  2. Create a cluster YAML configuration file for each HPC cluster you want to provide access to. Each file must have a .yml or .yaml extension.

    Note

    It is best to name the file after the HPC cluster it is defining. For example, we added the cluster configuration file /etc/ood/config/clusters.d/oakley.yml for the Oakley cluster here at OSC.

    The simplest cluster configuration file for an HPC cluster with only a login node and no resource manager looks like:

    # /etc/ood/config/clusters.d/my_cluster.yml
    ---
    v2:
      metadata:
        title: "My Cluster"
      login:
        host: "my_cluster.my_center.edu"
    

    Here host is the hostname of the SSH server for the given cluster.

    In production you will also want to add a resource manager: without one, the active-jobs and job-composer apps cannot list or submit jobs. An example of such a configuration is sketched below.
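
    For example, here is a minimal sketch of the same file extended with a Slurm resource manager. The adapter, bin, and conf keys are the ones used by the Slurm adapter, but the paths below are site-specific assumptions; adjust them for your installation:

    # /etc/ood/config/clusters.d/my_cluster.yml
    ---
    v2:
      metadata:
        title: "My Cluster"
      login:
        host: "my_cluster.my_center.edu"
      job:
        # Scheduler adapter (e.g. slurm, torque, pbspro, lsf, sge)
        adapter: "slurm"
        # Directory containing the scheduler client binaries (sbatch, squeue, ...)
        bin: "/usr/bin"
        # Path to this cluster's slurm.conf (site-specific assumption)
        conf: "/etc/slurm/slurm.conf"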

The Test Configuration page provides directions on using a Rake task to verify the resource manager configuration.
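
As a rough, hedged sketch only (the exact invocation and working directory vary by version and site, so defer to that page), the task is named after the cluster configuration file's basename:

  cd /var/www/ood/apps/sys/dashboard
  sudo bin/rake test:jobs:my_cluster RAILS_ENV=production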

The A Working Example of a bin_overrides Script page explains how to supply a replacement or wrapper script for one or more of the resource manager client binaries.
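
For orientation, a minimal sketch of what such an override might look like in the cluster configuration; the wrapper path below is a hypothetical example:

  # /etc/ood/config/clusters.d/my_cluster.yml (excerpt)
  v2:
    job:
      adapter: "slurm"
      bin: "/usr/bin"
      bin_overrides:
        # Route sbatch calls through a site-specific wrapper (hypothetical path)
        sbatch: "/usr/local/bin/sbatch_wrapper"

The wrapper itself can be as simple as a script that adjusts arguments before delegating to the real binary; the injected flag here is purely illustrative:

  #!/bin/bash
  # Hypothetical sbatch wrapper: force exporting the caller's environment,
  # then pass all remaining arguments through to the real sbatch.
  exec /usr/bin/sbatch --export=ALL "$@"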