Cloudy Cluster
A YAML cluster configuration file for a Cloudy Cluster resource manager on an HPC cluster looks like:
    # /etc/ood/config/clusters.d/my_cluster.yml
    ---
    v2:
      metadata:
        title: "My Cluster"
      login:
        host: "my_cluster.my_center.edu"
      job:
        adapter: "ccq"
        image: "my-default-image"
        cloud: "gcp"
        scheduler: "my_scheduler"
        bin: "/path/to/other/CCQ"
        jobid_regex: "different job_id regex: (?<job_id>\\d+) "
        # bin_overrides:
        #   ccqstat: "/usr/local/bin/ccqstat"
        #   ccqdel: ""
        #   ccqsub: ""
with the following configuration options:
- adapter
  This is set to ccq.
- image
  The default cloud image to use when launching jobs. There is no default.
- cloud
  The cloud provider being used. Valid options are gcp or aws. Defaults to gcp.
- scheduler
  The name of the scheduler being used. There is no default.
- bin
  The path to the CCQ client installation binaries. Defaults to /opt/CloudyCluster/srv/CCQ.
- jobid_regex
  The regular expression to extract the job id from the ccqstat output. Defaults to job id is: (?<job_id>\\d+) you. You should only need this if the ccqstat output changes format. If you are required to reconfigure, you’ll need to extract the named group job_id as the default does.
- bin_overrides
  Replacements/wrappers for CCQ’s job submission and control clients. Optional.
  Supports the following clients:
  - ccqstat
  - ccqdel
  - ccqsub
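The documented defaults and the job_id named group can be illustrated with a short Python sketch. This is not the adapter's own code: resolve_job_options and extract_job_id are hypothetical helpers, the sample output line is made up for illustration, and the pattern is adapted because Python spells named groups as (?P<job_id>...) where the configuration uses (?<job_id>...).

```python
import re

# Defaults as documented in the option list above.
DEFAULTS = {
    "cloud": "gcp",
    "bin": "/opt/CloudyCluster/srv/CCQ",
    # The documented default, with the named group rewritten for Python.
    "jobid_regex": r"job id is: (?P<job_id>\d+) you",
}
VALID_CLOUDS = {"gcp", "aws"}


def resolve_job_options(job):
    """Hypothetical helper: overlay user settings on the documented defaults."""
    options = {**DEFAULTS, **job}
    if options["cloud"] not in VALID_CLOUDS:
        raise ValueError(f"cloud must be one of {sorted(VALID_CLOUDS)}")
    return options


def extract_job_id(output, jobid_regex):
    """Return the job_id named group as a string, or None if nothing matches."""
    match = re.search(jobid_regex, output)
    return match.group("job_id") if match else None


# Hypothetical settings; image and scheduler have no default, so they are given.
options = resolve_job_options({"image": "my-default-image", "scheduler": "my_scheduler"})

# Made-up output line that fits the default pattern.
sample = "Your job id is: 4242 you can use this id to check on the job"
print(extract_job_id(sample, options["jobid_regex"]))  # prints 4242
```

Whatever replacement pattern is configured, the same requirement applies: it has to define a group named job_id, otherwise the lookup in the last step has nothing to return.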
Common Issues
Prompted for input
You may see this error when you initially try to start a job.
    The /opt/CloudyCluster/srv/CCQ/ccqsub command was prompted. You need
    to generate the certificate manually in a shell by running 'ccqstat'
    and entering your username/password
This happens because the CCQ libraries need a certificate in order to communicate with the
backend servers. To fix it, log in through a shell terminal and generate a certificate by
running the ccqstat command and entering your username and password when prompted. If this
succeeds, the command generates a ccqCert.cert file in your home directory that subsequent
invocations will use.
Note that these certificates expire, so you may have to regenerate them periodically or specify a distant expiry date when you generate them.
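To see whether a certificate is already in place before starting a job, a small check along these lines may help. It is a sketch under stated assumptions: cert_status is a hypothetical helper, and it only reports whether ccqCert.cert exists in the home directory and when it was last written. It does not read the certificate's actual expiry date, since the file format is not described here.

```python
import time
from pathlib import Path

CERT_NAME = "ccqCert.cert"  # file name given in the text above


def cert_status(home=None, now=None):
    """Report whether the certificate file exists and how old it is.

    The age comes from the file's modification time, not from the
    certificate's real expiry date.
    """
    home = Path(home) if home is not None else Path.home()
    cert = home / CERT_NAME
    if not cert.is_file():
        return {"path": str(cert), "exists": False, "age_days": None}
    now = time.time() if now is None else now
    age_days = (now - cert.stat().st_mtime) / 86400
    return {"path": str(cert), "exists": True, "age_days": age_days}


if __name__ == "__main__":
    status = cert_status()
    if status["exists"]:
        print(f"{status['path']} found, last written {status['age_days']:.1f} days ago")
    else:
        print(f"{status['path']} not found; run ccqstat in a shell to generate it")
```

A missing file, or one that is older than the expiry period chosen at generation time, suggests running ccqstat interactively again.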