Grid Engine

A YAML cluster configuration file for a Grid Engine (Sun Grid Engine, Son of Grid Engine, Univa Grid Engine) resource manager on an HPC cluster looks like:

# /etc/ood/config/clusters.d/my_cluster.yml
---
v2:
  metadata:
    title: "My Cluster"
  login:
    host: "my_cluster.my_center.edu"
  job:
    adapter: "sge"
    cluster: "my_cluster"
    bin: "/usr/lib/gridengine"
    conf: "/etc/default/gridengine"
    sge_root: "/var/lib/gridengine"
    libdrmaa_path: "/var/lib/gridengine/drmaa/libdrmaa.so"
    # bin_overrides:
      # qsub: "/usr/local/bin/qsub_wrapper"
      # qstat: ""
      # qhold: ""
      # qrls: ""
      # qdel: ""

with the following configuration options:

adapter

This must be set to sge.

cluster

The Grid Engine cluster name. Optional.

bin

The path to the directory containing the Grid Engine client binaries.

sge_root

The path to the root directory of the Grid Engine installation. Default: /var/lib/gridengine

conf

The path to the Grid Engine configuration file for this cluster. Optional.

libdrmaa_path

The full path to libdrmaa.so. Provide this to enable libdrmaa for more precise job status reporting. Optional.

submit_host

A separate host that OnDemand connects to over SSH in order to issue the Grid Engine commands. Optional.
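A minimal sketch of where submit_host would go, assuming it sits in the job section alongside the other adapter options shown in the example above. The hostname submit.my_center.edu is hypothetical; use whichever host your users normally submit jobs from.

```yaml
# /etc/ood/config/clusters.d/my_cluster.yml
# (other elements removed for brevity)
---
v2:
  job:
    adapter: "sge"
    # Hypothetical host: OnDemand would ssh here to run qsub, qstat, etc.
    submit_host: "submit.my_center.edu"
```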

bin_overrides

Replacements or wrappers for Grid Engine’s job submission and control clients. Optional.

Supports the following clients:

  • qsub

  • qstat

  • qhold

  • qrls

  • qdel
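For example, to route job submission through a wrapper while leaving the other clients alone, you might enable only the qsub entry. This sketch assumes a wrapper script exists at /usr/local/bin/qsub_wrapper, the same hypothetical path used in the commented-out lines of the example configuration above.

```yaml
# /etc/ood/config/clusters.d/my_cluster.yml
# (other elements removed for brevity)
---
v2:
  job:
    adapter: "sge"
    bin: "/usr/lib/gridengine"
    bin_overrides:
      # Hypothetical wrapper path. qstat, qhold, qrls and qdel are not
      # listed, so they are assumed to be used from bin as usual.
      qsub: "/usr/local/bin/qsub_wrapper"
```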

Tip

DRMAA improves OnDemand’s ability to report the precise status of jobs. To use this feature, make sure that libdrmaa.so for Grid Engine is installed (or built), then set both libdrmaa_path and sge_root in the cluster configuration. If DRMAA is not installed, OnDemand cannot determine the precise status of a single job and reports it only as either queued or complete.
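A sketch of only the DRMAA-related keys, reusing the paths from the example configuration at the top of this page. These paths are assumptions about a particular installation layout; point them at wherever sge_root and libdrmaa.so actually live on your system.

```yaml
# /etc/ood/config/clusters.d/my_cluster.yml
# (other elements removed for brevity)
---
v2:
  job:
    adapter: "sge"
    # Both keys need to be set for DRMAA-based status reporting.
    sge_root: "/var/lib/gridengine"
    libdrmaa_path: "/var/lib/gridengine/drmaa/libdrmaa.so"
```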

Common Issues

Shell environments

You may see an error like the following, where the running script uses a Bash process substitution and the shell reports a syntax error:

/export/uge/some/file: line 156: syntax error near unexpected token `<'
/export/uge/some/file: line 156: `done < <(tail -f --pid=${SCRIPT_PID} "vnc.log") &'

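To fix this, add a script_wrapper element to your cluster’s configuration as shown below. The wrapper turns off POSIX mode (set +o posix) so that a shell started as sh behaves like bash and accepts syntax such as process substitution, and it sources the user’s ~/.bashrc file. The %s line is where the original job script is inserted.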

# /etc/ood/config/clusters.d/my_cluster.yml
# (other elements removed for brevity)
---
v2:
  batch_connect:
    basic:
      script_wrapper: |
        set +o posix
        . ~/.bashrc
        %s
    vnc:
      script_wrapper: |
        set +o posix
        . ~/.bashrc
        %s

Invalid Job name

When running batch connect applications, you may encounter an error complaining about an invalid job name, like the one below:

Unable to read script file because of error: ERROR! argument to -N option must not contain /

Grid Engine does not allow / in the job name passed to the -N option, so you’ll need to configure illegal job name characters as described here.