Cloudy Cluster
A YAML cluster configuration file for a Cloudy Cluster resource manager on an HPC cluster looks like:
# /etc/ood/config/clusters.d/my_cluster.yml
---
v2:
  metadata:
    title: "My Cluster"
  login:
    host: "my_cluster.my_center.edu"
  job:
    adapter: "ccq"
    image: "my-default-image"
    cloud: "gcp"
    scheduler: "my_scheduler"
    bin: "/path/to/other/CCQ"
    jobid_regex: "different job_id regex: (?<job_id>\\d+) "
    # bin_overrides:
    #   ccqstat: "/usr/local/bin/ccqstat"
    #   ccqdel: ""
    #   ccqsub: ""
with the following configuration options:
adapter
  This is set to ccq.
image
  The default cloud image to use when launching jobs. There is no default.
cloud
  The cloud provider being used. Valid options are gcp or aws. Defaults to gcp.
scheduler
  The name of the scheduler being used. There is no default.
bin
  The path to the CCQ client installation binaries. Defaults to /opt/CloudyCluster/srv/CCQ.
jobid_regex
  The regular expression to extract the job id from the ccqstat output. Defaults to "job id is: (?<job_id>\\d+) you". You should only need this if the ccqstat output changes format. If you are required to reconfigure, you'll need to extract the named group job_id as the default does.
bin_overrides
  Replacements/wrappers for CCQ's job submission and control clients. Optional. Supports the following clients:
    ccqstat
    ccqdel
    ccqsub
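To illustrate how the jobid_regex option described above relies on the named group job_id, here is a minimal sketch. It assumes the pattern is evaluated as a Ruby regular expression; the helper name extract_job_id and the sample command output are hypothetical and are not taken from the CCQ tools themselves.

```ruby
# Default pattern as documented above. In a Ruby double-quoted string
# (as in the YAML file), "\\d" becomes the regex token \d.
DEFAULT_JOBID_REGEX = "job id is: (?<job_id>\\d+) you"

# Hypothetical helper: returns the captured job id as a String, or nil
# when the pattern does not match. The pattern must define a named
# group called job_id, otherwise the lookup below raises IndexError.
def extract_job_id(output, pattern = DEFAULT_JOBID_REGEX)
  match = Regexp.new(pattern).match(output)
  match && match[:job_id]
end

if __FILE__ == $PROGRAM_NAME
  # Made-up output line, only for demonstration.
  sample = "The job id is: 1234 you may use it to check the job status."
  puts extract_job_id(sample) # prints 1234
end
```

A custom pattern works the same way as long as it keeps the job_id group, for example extract_job_id("Submitted batch job 77", "Submitted batch job (?<job_id>\\d+)") returns "77".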
Common Issues
Prompted for input
You may see this error when you first try to start a job:
  The /opt/CloudyCluster/srv/CCQ/ccqsub command was prompted. You need
  to generate the certificate manually in a shell by running ``ccqstat``
  and entering your username/password
This happens because the CCQ libraries require a certificate in order to communicate with the
backend servers. To fix it, log in through a shell terminal and generate a certificate by
running the ccqstat command and entering your username and password when prompted. If the
command succeeds, it generates a ccqCert.cert file in your home directory that subsequent
invocations will use.
Note that these certificates expire, so you may have to regenerate them periodically or specify a distant expiry date when you generate them.
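As a rough pre-flight check, a script can test whether the certificate file exists before a job is submitted. The sketch below assumes the file is named ccqCert.cert and sits directly in the user's home directory, as described above; the helper name ccq_cert_present? is hypothetical. It only checks that the file exists and does not detect an expired certificate.

```ruby
# Hypothetical helper: true when a ccqCert.cert file exists in the
# given home directory (defaults to the current user's home).
def ccq_cert_present?(home = Dir.home)
  File.file?(File.join(home, "ccqCert.cert"))
end

if __FILE__ == $PROGRAM_NAME
  if ccq_cert_present?
    puts "ccqCert.cert found in #{Dir.home}"
  else
    puts "No ccqCert.cert found; run ccqstat in a shell to generate one."
  end
end
```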