Grid Engine
Note
This resource manager is not actively maintained by the Open OnDemand developers, either because it was community-contributed or we no longer have access to a test system. As a result, bug fixes from the core team may be limited, but community patches are encouraged.
A YAML cluster configuration file for a Grid Engine (Sun Grid Engine, Son of Grid Engine, Univa Grid Engine) resource manager on an HPC cluster looks like:
# /etc/ood/config/clusters.d/my_cluster.yml
---
v2:
  metadata:
    title: "My Cluster"
  login:
    host: "my_cluster.my_center.edu"
  job:
    adapter: "sge"
    cluster: "my_cluster"
    bin: "/usr/lib/gridengine"
    conf: "/etc/default/gridengine"
    sge_root: "/var/lib/gridengine"
    libdrmaa_path: "/var/lib/gridengine/drmaa/libdrmaa.so"
    # bin_overrides:
      # qsub: "/usr/local/bin/qsub_wrapper"
      # qstat: ""
      # qhold: ""
      # qrls: ""
      # qdel: ""
with the following configuration options:
adapter
    This is set to sge.
cluster
    The Grid Engine cluster name. Optional
bin
    The path to the Grid Engine client installation binaries.
sge_root
    The path to the root directory of the Grid Engine installation. Default: /var/lib/gridengine
conf
    The path to the Grid Engine configuration file for this cluster. Optional
libdrmaa_path
    The full path to libdrmaa.so. Provide this to enable use of libdrmaa for more precise job status reporting. Optional
submit_host
    A different, optional host to ssh to and then issue commands. Optional
bin_overrides
    Replacements/wrappers for Grid Engine's job submission and control clients. Optional
    Supports the following clients: qsub, qstat, qhold, qrls, qdel
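As an illustration of what a bin_overrides wrapper might look like, here is a minimal sketch of the logic behind a qsub wrapper. It is not taken from OnDemand. The function name qsub_wrapper, the REAL_QSUB variable, and the default path /usr/lib/gridengine/qsub (derived from the bin value in the example above) are assumptions for illustration only.

```shell
# Sketch of a qsub wrapper: record the call, then hand the arguments to the real client.
# REAL_QSUB is a hypothetical variable; its default path is an assumption based on
# the example "bin" directory and may differ on your system.
qsub_wrapper() {
  real_qsub="${REAL_QSUB:-/usr/lib/gridengine/qsub}"

  # Write the arguments to stderr so stdout stays exactly what the real qsub prints.
  printf 'qsub_wrapper: %s\n' "$*" >&2

  # Pass every argument through unchanged and return the real client's exit status.
  "$real_qsub" "$@"
}
```

To use logic like this as an override, it would need to live in an executable script (for example at the /usr/local/bin/qsub_wrapper path shown in the commented configuration) that ends by calling the function with "$@". Keeping stdout untouched matters because the caller reads the client's output.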
Tip
DRMAA improves OnDemand's ability to report the precise status of jobs. To use this feature, ensure that libdrmaa.so for Grid Engine is installed or built, and set the libdrmaa_path and sge_root configuration values. If DRMAA is not installed, OnDemand cannot get a precise status for single jobs and will only report either queued or complete.
Common Issues
Shell environments
You may run into an error similar to the one below, where the script being run uses a Bash process substitution and the shell reports a syntax error.
/export/uge/some/file: line 156: syntax error near unexpected token `<'
/export/uge/some/file: line 156: `done < <(tail -f --pid=${SCRIPT_PID} "vnc.log") &'
To fix this, add a script_wrapper element to your cluster's configuration, as shown below.
This sets the sh shell to behave like bash and ensures that the user's .bashrc file is sourced.
# /etc/ood/config/clusters.d/my_cluster.yml
# (other elements removed for brevity)
---
v2:
  batch_connect:
    basic:
      script_wrapper: |
        set +o posix
        . ~/.bashrc
        %s
    vnc:
      script_wrapper: |
        set +o posix
        . ~/.bashrc
        %s
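To see what the first line of the wrapper does, the following sketch starts bash in POSIX mode (the mode bash uses when invoked as sh) and checks the mode before and after running set +o posix. It assumes bash is installed and on the PATH. The variable names before and after are only for illustration.

```shell
# Start bash in POSIX mode and report whether the posix option is enabled.
before="$(bash --posix -c 'shopt -qo posix && echo on || echo off')"

# Same check, but after running the wrapper's first line.
after="$(bash --posix -c 'set +o posix; shopt -qo posix && echo on || echo off')"

echo "posix mode before: $before, after: $after"
```

Running the same shopt check inside a failing job script is one way to confirm whether POSIX mode is involved before changing the cluster configuration.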
Invalid Job name
You may encounter an error when running batch connect applications that complains about an invalid job name, like the one below.
Unable to read script file because of error: ERROR! argument to -N option must not contain /
You'll need to configure illegal job name characters as described here.
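For intuition only, the sketch below shows the kind of transformation involved: removing a character that the -N option rejects from a job name. This is not OnDemand's implementation. The function name sanitize_job_name, the choice of "-" as the replacement, and the sample job name are all assumptions for illustration.

```shell
# Replace every "/" in a job name with "-".
# The replacement character is an arbitrary choice for this illustration.
sanitize_job_name() {
  printf '%s\n' "$1" | tr '/' '-'
}

# Hypothetical job name containing the character that -N rejects.
sanitize_job_name "sys/dashboard/sys/bc_desktop"
```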