4. Modify Submit Parameters
In some cases the Jupyter app's batch job fails to submit to the given cluster, or the default submission parameters do not suit your needs. This section explains how to modify the submission parameters.
The main responsibility of the submit.yml.erb file (see Job Submission (submit.yml.erb)), located in the root of the app, is to modify the underlying batch script, which is generated from an internal template, and its submission parameters.
Note
The .erb file extension causes the YAML configuration file to be processed by the eRuby (Embedded Ruby) templating system. This allows you to embed Ruby code in the YAML configuration file for flow control, variable substitution, and more.
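As a rough illustration of what this processing involves, the sketch below renders a small template with Ruby's standard erb library and then parses the result as YAML. The render_submit helper and the template_name and node_feature variables are hypothetical names chosen for this example; this is not OnDemand's actual loading code.

```ruby
require "erb"
require "yaml"

# Hypothetical template text; in an app this would be the content of
# submit.yml.erb. The names template_name and node_feature are assumptions.
TEMPLATE = <<~'EOS'
  ---
  batch_connect:
    template: "<%= template_name %>"
  <%- if node_feature -%>
  script:
    native:
      - "-C"
      - "<%= node_feature %>"
  <%- end -%>
EOS

# Render the ERB first, then parse the resulting text as YAML.
def render_submit(template_name:, node_feature: nil)
  yaml_text = ERB.new(TEMPLATE, trim_mode: "-").result_with_hash(
    template_name: template_name,
    node_feature: node_feature
  )
  YAML.safe_load(yaml_text)
end

p render_submit(template_name: "basic")
p render_submit(template_name: "basic", node_feature: "c12")
```

The first call exercises only variable substitution; the second also takes the if branch, so the script section appears in the parsed result.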
The simplest submit.yml.erb will look like:
# ~/ondemand/dev/jupyter/submit.yml.erb
---
batch_connect:
  template: "basic"
This only specifies which internal template to use when generating the batch script. From here we can add more options to the batch script, such as:
# ~/ondemand/dev/jupyter/submit.yml.erb
---
batch_connect:
  template: "basic"
  set_host: "host=$(hostname -A | awk '{print $2}')"
Here we override the Bash snippet used to determine the hostname of the compute node from within a running batch job.
You can learn more about the possible batch_connect options here:
http://www.rubydoc.info/gems/ood_core/OodCore%2FBatchConnect%2FTemplate:initialize
Warning
It is recommended that any global batch_connect attributes be defined in the corresponding cluster configuration file under:
/etc/ood/config/clusters.d/my_cluster.yml
There is further discussion on this under Modify Cluster Configuration.
But in most cases you will want to change the actual job submission parameters (e.g., node type). These are defined under the script option as:
# ~/ondemand/dev/jupyter/submit.yml.erb
---
batch_connect:
  template: "basic"
script:
  ...
You can read more about all the available script options here:
http://www.rubydoc.info/gems/ood_core/OodCore/Job/Script
In most cases, however, you will want to modify the native attribute, which is resource-manager dependent. Some examples are given below.
Note
It is recommended that you commit the changes you made to submit.yml.erb to git:
git commit submit.yml.erb -m 'updated batch job options'
Slurm
For Slurm, you can choose the features on a requested node with:
# ~/ondemand/dev/jupyter/submit.yml.erb
---
batch_connect:
  template: "basic"
script:
  native:
    - "-N"
    - "<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>"
    - "-C"
    - "c12"
where we define the sbatch parameters as an array under script and native.
Note
The native attribute is an array of command line arguments. So the above example is equivalent to appending to sbatch:
sbatch ... -N <bc_num_slots> -C c12
The bc_num_slots shown above within the ERB syntax is the value returned from the web form for "Number of nodes". We check whether it is blank and, if so, fall back to 1 so that a valid number is always submitted (since it wouldn't make sense to return 0).
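The blank check matters because Ruby converts an empty string (or nil) to 0 with to_i. The sketch below reproduces that logic in plain Ruby and previews the resulting command line. The helper names are hypothetical, and blank_value? is only a stand-in for blank?, which is not part of core Ruby (it is commonly provided by ActiveSupport). This illustrates the idea and is not how OnDemand builds the command internally.

```ruby
require "shellwords"

# Stand-in for the blank? check used in the template (an assumption):
# treats nil and whitespace-only strings as blank.
def blank_value?(value)
  value.nil? || value.to_s.strip.empty?
end

# Mirrors the template expression: fall back to 1 when the field is blank.
def num_slots(raw)
  blank_value?(raw) ? 1 : raw.to_i
end

# Builds the argument array in the same shape as the Slurm example above.
def slurm_native(raw_slots, feature)
  ["-N", num_slots(raw_slots).to_s, "-C", feature]
end

# Joins the arguments into a shell-safe string, for display only.
def preview_command(program, native)
  Shellwords.join([program, *native])
end

puts "".to_i                                               # why the check is needed
puts preview_command("sbatch", slurm_native("", "c12"))    # blank form field
puts preview_command("sbatch", slurm_native("4", "c12"))   # user entered 4
```

Without the check, a blank form field would be submitted as -N 0.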
Torque
For Torque, you can choose processors-per-node with:
# ~/ondemand/dev/jupyter/submit.yml.erb
---
batch_connect:
  template: "basic"
script:
  native:
    resources:
      nodes: "<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>:ppn=28"
Note
See http://www.rubydoc.info/gems/pbs/PBS%2FBatch:submit_script for more information on possible values for the native attribute.
The bc_num_slots shown above within the ERB syntax is the value returned from the web form for "Number of nodes". We check whether it is blank and, if so, fall back to 1 so that a valid number is always submitted (since it wouldn't make sense to return 0).
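Unlike the array form used for Slurm, the Torque example nests a hash under native. As a minimal sketch of the structure the rendered file would describe, the code below builds it in plain Ruby and dumps it as YAML. The torque_nodes and torque_submit_config helpers are hypothetical names for this illustration, and ppn=28 is simply the value from the example above.

```ruby
require "yaml"

# Builds the nodes resource string, defaulting to 1 node when the form
# value is nil or whitespace-only (mirrors the template's blank check).
def torque_nodes(raw_slots, ppn)
  text = raw_slots.to_s.strip
  count = text.empty? ? 1 : text.to_i
  "#{count}:ppn=#{ppn}"
end

# Hypothetical helper returning the structure of the rendered config.
def torque_submit_config(raw_slots, ppn)
  {
    "batch_connect" => { "template" => "basic" },
    "script" => {
      "native" => { "resources" => { "nodes" => torque_nodes(raw_slots, ppn) } }
    }
  }
end

puts torque_submit_config("2", 28).to_yaml
```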
PBS Professional
For most cases of PBS Professional you will want to modify how bc_num_slots (number of CPUs on a single node) is submitted to the batch server.
This can be specified as follows:
# ~/ondemand/dev/jupyter/submit.yml.erb
---
batch_connect:
  template: "basic"
script:
  native:
    - "-l"
    - "select=1:ncpus=<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>"
where we define the qsub parameters as an array under script and native.
If you would like to mimic how Torque handles bc_num_slots (number of nodes), we first need to change the label of bc_num_slots that the user sees in the form. This can be done by adding the bc_num_slots entry and its label under attributes in the form configuration file:
# ~/ondemand/dev/jupyter/form.yml
---
cluster: "cluster1"
attributes:
  modules: "python"
  extra_jupyter_args: ""
  bc_num_slots:
    label: "Number of nodes"
form:
  - modules
  - extra_jupyter_args
  - bc_num_hours
  - bc_num_slots
  - bc_account
  - bc_queue
  - bc_email_on_started
Now when we click Launch Jupyter Notebook from the app details view, the form in the browser will have the new label "Number of nodes" instead of "Number of CPUs on a single node".
Next we need to change how we submit bc_num_slots, since it now means something different. So we modify the job submission configuration file as follows:
# ~/ondemand/dev/jupyter/submit.yml.erb
---
batch_connect:
  template: "basic"
script:
  native:
    - "-l"
    - "select=<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>:ncpus=28"
where we replace ncpus=28 with the correct number for your cluster. You can also append mem=...gb to the select=... statement if you'd like.
Note
The native attribute is an array of command line arguments. So the above example is equivalent to appending to qsub:
qsub ... -l select=<bc_num_slots>:ncpus=28
The bc_num_slots shown above within the ERB syntax is the value returned from the web form for "Number of nodes". We check whether it is blank and, if so, fall back to 1 so that a valid number is always submitted (since it wouldn't make sense to return 0).
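To make the two interpretations of bc_num_slots easier to compare, the sketch below builds both select statements in plain Ruby. The helper names are hypothetical, and cpus_per_node stands for whatever value is correct on your cluster (28 is just the number used in the example above).

```ruby
# Fall back to 1 when the form field is nil or whitespace-only,
# mirroring the blank check in the templates above.
def slots_or_default(raw)
  text = raw.to_s.strip
  text.empty? ? 1 : text.to_i
end

# bc_num_slots read as "number of CPUs on a single node".
def select_cpus_on_one_node(raw)
  ["-l", "select=1:ncpus=#{slots_or_default(raw)}"]
end

# bc_num_slots read as "number of nodes"; cpus_per_node is cluster-specific.
def select_whole_nodes(raw, cpus_per_node)
  ["-l", "select=#{slots_or_default(raw)}:ncpus=#{cpus_per_node}"]
end

puts select_cpus_on_one_node("8").join(" ")
puts select_whole_nodes("2", 28).join(" ")
```

The same form value ends up in a different position of the select statement depending on which meaning the form label gives it.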
Other Resource Manager
For most of our other adapters (aside from Torque) the native attribute is an array of command line arguments formatted similarly to the Slurm example above.