4. Modify Submit Parameters
In some cases the Jupyter app batch job fails to submit to the given cluster, or the default submission parameters do not suit your needs. This section explains how to modify the submission parameters.
4.1. Modify Submit File
The submit.yml.erb file, located in the root of the app, is mainly responsible for modifying the underlying batch script (which is generated from an internal template) and its submission parameters.
Note
The .erb file extension causes the YAML configuration file to be processed by the eRuby (Embedded Ruby) templating system. This allows you to embed Ruby code in the YAML configuration file for flow control, variable substitution, and more.
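To make the eRuby step concrete, here is a minimal sketch of rendering a YAML template with Ruby's standard ERB library and then parsing the result. The variable use_custom_host and the helper render_submit are hypothetical names chosen for illustration; they are not part of OnDemand, and this is not necessarily how OnDemand itself loads the file.

```ruby
require "erb"
require "yaml"

# Hypothetical template text. `use_custom_host` is an illustrative variable,
# not something OnDemand provides.
TEMPLATE = <<~'EOS'
  ---
  batch_connect:
    template: "basic"
  <%- if use_custom_host -%>
    set_host: "host=$(hostname -A | awk '{print $2}')"
  <%- end -%>
EOS

# Render the ERB tags first, then parse the resulting plain YAML.
def render_submit(use_custom_host)
  erb = ERB.new(TEMPLATE, trim_mode: "-")
  yaml_text = erb.result_with_hash(use_custom_host: use_custom_host)
  YAML.safe_load(yaml_text)
end

p render_submit(false)
p render_submit(true)
```

The point of the sketch is the order of operations: Ruby code inside the tags runs first, and only the text it produces is read as YAML.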
The simplest submit.yml.erb looks like:
# submit.yml.erb
---
batch_connect:
  template: "basic"
This only specifies which internal template to use when generating the batch script. From here we can add more options to the batch script, such as:
# submit.yml.erb
---
batch_connect:
  template: "basic"
  set_host: "host=$(hostname -A | awk '{print $2}')"
where we override the Bash script used to determine the host name of the compute node from within a running batch job.
You can learn more about possible batch_connect options here:
http://www.rubydoc.info/gems/ood_core/OodCore%2FBatchConnect%2FTemplate:initialize
Warning
It is recommended that any global batch_connect attributes be defined in the corresponding cluster configuration file under:
/etc/ood/config/clusters.d/cluster.yml
There is further discussion on this under Modify Cluster Configuration.
In most cases, however, you will want to change the actual job submission parameters (e.g., node type). These are defined under the script option as:
# submit.yml.erb
---
batch_connect:
  template: "basic"
script:
  ...
You can read more about all the available script options here:
http://www.rubydoc.info/gems/ood_core/OodCore/Job/Script
In most cases, though, you will want to modify the native attribute, which is resource manager dependent. Some examples are given below.
Note
It is recommended that you commit the changes you made to submit.yml.erb to git:
# Stage and commit your changes
git commit submit.yml.erb -m 'updated batch job options'
4.1.1. Slurm
For Slurm, you can choose the features on a requested node with:
# submit.yml.erb
---
batch_connect:
  template: "basic"
script:
  native: [ "-N", "<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>", "-C", "c12" ]
where we define the sbatch parameters as an array under script and native.
Note
The native attribute is an array of command line arguments. So the above example is equivalent to appending to sbatch:
sbatch ... -N <bc_num_slots> -C c12
The bc_num_slots shown above within the ERB syntax is the value returned from the web form for “Number of nodes”. We check if it is blank and return a valid number (since it wouldn’t make sense to return 0).
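As a rough illustration of how the native array maps onto the sbatch command line, the sketch below mirrors the ERB expression in plain Ruby. The helpers num_slots and slurm_native are hypothetical names, and blank? (available inside OnDemand) is approximated here with to_s.strip.empty?, since plain Ruby does not define it.

```ruby
require "shellwords"

# Mirrors `bc_num_slots.blank? ? 1 : bc_num_slots.to_i` without relying on
# `blank?`; an empty or whitespace-only form value falls back to 1.
def num_slots(bc_num_slots)
  bc_num_slots.to_s.strip.empty? ? 1 : bc_num_slots.to_i
end

# Hypothetical helper: builds the same array that the `native` attribute holds.
# The feature name "c12" is taken from the example above.
def slurm_native(bc_num_slots, feature: "c12")
  ["-N", num_slots(bc_num_slots).to_s, "-C", feature]
end

# Each array element becomes one command line argument.
puts (["sbatch"] + slurm_native("")).shelljoin
puts (["sbatch"] + slurm_native("4")).shelljoin
```

Keeping the arguments as an array, rather than one long string, means each element reaches the scheduler as a separate argument with no extra shell quoting to worry about.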
4.1.2. Torque
For Torque, you can choose processors-per-node with:
# submit.yml.erb
---
batch_connect:
  template: "basic"
script:
  native:
    resources:
      nodes: "<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>:ppn=28"
Note
See http://www.rubydoc.info/gems/pbs/PBS%2FBatch:submit_script for more information on possible values for the native attribute.
The bc_num_slots shown above within the ERB syntax is the value returned from the web form for “Number of nodes”. We check if it is blank and return a valid number (since it wouldn’t make sense to return 0).
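The following sketch renders the Torque template above outside of OnDemand to show the nested structure that ends up under native. It assumes form values arrive as strings, and it defines a small stand-in for String#blank? because plain Ruby lacks it; torque_native is a hypothetical helper name.

```ruby
require "erb"
require "yaml"

# Stand-in for `blank?`, which is available inside OnDemand but not in plain
# Ruby. Defined here only so the template renders unchanged in this sketch.
class String
  def blank?
    strip.empty?
  end
end

TORQUE_TEMPLATE = <<~'EOS'
  ---
  batch_connect:
    template: "basic"
  script:
    native:
      resources:
        nodes: "<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>:ppn=28"
EOS

# Hypothetical helper: render the ERB, parse the YAML, return script.native.
def torque_native(bc_num_slots)
  rendered = ERB.new(TORQUE_TEMPLATE).result_with_hash(bc_num_slots: bc_num_slots)
  YAML.safe_load(rendered).dig("script", "native")
end

puts torque_native("").dig("resources", "nodes")
puts torque_native("3").dig("resources", "nodes")
```

Unlike the Slurm case, native here is a nested mapping rather than an array of arguments, which is why the template writes it as indented YAML keys.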
4.1.3. PBS Professional
In most cases with PBS Professional you will want to modify how bc_num_slots (number of CPUs on a single node) is submitted to the batch server.
This can be specified as follows:
# submit.yml.erb
---
batch_connect:
  template: "basic"
script:
  native: [ "-l", "select=1:ncpus=<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>" ]
where we define the qsub parameters as an array under script and native.
If you would like to mimic how Torque handles bc_num_slots (number of nodes), you first need to change the label of bc_num_slots that the user sees in the form. This can be done by adding the bc_num_slots entry and its label under attributes in the form configuration file:
# form.yml
---
cluster: "cluster1"
attributes:
  modules: "python"
  conda_extensions: "1"
  extra_jupyter_args: ""
  bc_num_slots:
    label: "Number of nodes"
form:
  - modules
  - conda_extensions
  - extra_jupyter_args
  - bc_num_hours
  - bc_num_slots
  - bc_account
  - bc_queue
  - bc_email_on_started
Now when we go to the Jupyter app form in our browser it will have the new label “Number of nodes” instead of “Number of CPUs on a single node”.
Next we need to change how bc_num_slots is submitted, since it now means something different. So we modify the job submission configuration file as follows:
# submit.yml.erb
---
batch_connect:
  template: "basic"
script:
  native: [ "-l", "select=<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>:ncpus=28" ]
where you replace ncpus=28 with the correct number for your cluster. You can also append mem=...gb to the select=... statement if you’d like.
Note
The native attribute is an array of command line arguments. So the above example is equivalent to appending to qsub:
qsub ... -l select=<bc_num_slots>:ncpus=28
The bc_num_slots shown above within the ERB syntax is the value returned from the web form for “Number of nodes”. We check if it is blank and return a valid number (since it wouldn’t make sense to return 0).
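The select statement for the "number of nodes" variant, including the optional mem=...gb suffix, can be sketched in plain Ruby as follows. The helper pbs_select and its keyword arguments are hypothetical, the default of 28 CPUs is simply the value from the example above, and blank? is again approximated with to_s.strip.empty?.

```ruby
# Hypothetical helper: builds the `native` array for qsub.
# ncpus defaults to the example value of 28; replace it for your cluster.
# mem_gb is optional and only appended when given.
def pbs_select(bc_num_slots, ncpus: 28, mem_gb: nil)
  # Mirrors `bc_num_slots.blank? ? 1 : bc_num_slots.to_i`.
  chunks = bc_num_slots.to_s.strip.empty? ? 1 : bc_num_slots.to_i
  spec = "select=#{chunks}:ncpus=#{ncpus}"
  spec += ":mem=#{mem_gb}gb" if mem_gb
  ["-l", spec]
end

puts pbs_select("").join(" ")
puts pbs_select("2", mem_gb: 64).join(" ")
```

The memory value of 64 in the usage line is an arbitrary illustration; the source leaves the actual amount unspecified.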
4.2. Verify it Works
You can now test the app again by visiting your local OnDemand server in your browser:
GET /pun/sys/dashboard/batch_connect/dev/jupyter_app/session_contexts/new HTTP/1.1
Host: ondemand.my_center.edu
Fill in the form and launch the Jupyter batch job. Click the “Session ID” link for the launched session and confirm your changes are made under:
job_script_content.sh (if you modified batch_connect)
job_script_options.json (if you modified script)