4. Modify Submit Parameters

In some cases the Jupyter app's batch job fails to submit to the given cluster, or the default submission parameters are not what you want. This section explains how to modify the submission parameters.

4.1. Modify Submit File

The submit.yml.erb file, located in the root of the app, is responsible for modifying the batch script (which is generated from an internal template) and the job's submission parameters.

Note

The .erb file extension will cause the YAML configuration file to be processed using the eRuby (Embedded Ruby) templating system. This allows you to embed Ruby code into the YAML configuration file for flow control, variable substitution, and more.
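To make the two-step processing concrete, the Ruby sketch below renders the ERB tags in a small submit.yml.erb-style string first and only then parses the result as YAML. It is an illustration under assumptions, not OnDemand's actual code: the template text, the num_nodes variable, and the render_submit helper are hypothetical, and OnDemand supplies form values through its own rendering context rather than a local variable.

```ruby
require "erb"
require "yaml"

# Hypothetical submit.yml.erb contents, held in a string for illustration.
SUBMIT = <<~'TEXT'
  ---
  batch_connect:
    template: "basic"
  script:
    native: [ "-N", "<%= num_nodes %>" ]
TEXT

# Hypothetical helper: evaluate the ERB tags first (num_nodes is visible
# through this method's binding), then parse the resulting plain YAML.
def render_submit(template, num_nodes)
  yaml_text = ERB.new(template).result(binding)
  YAML.safe_load(yaml_text)
end

config = render_submit(SUBMIT, 2)
p config["script"]["native"]
```

Because ERB runs before the YAML parser, anything the Ruby code emits must itself be valid YAML at that position, which is why the example keeps the ERB output inside quotes.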

The simplest submit.yml.erb will look like:

# submit.yml.erb
---

batch_connect:
  template: "basic"

This only specifies which internal template to use when generating the batch script. From here we can add more options to the batch script, such as:

# submit.yml.erb
---

batch_connect:
  template: "basic"
  set_host: "host=$(hostname -A | awk '{print $2}')"

where we override the Bash script used to determine the host name of the compute node from within a running batch job.

You can learn more about possible batch_connect options here:

http://www.rubydoc.info/gems/ood_core/OodCore%2FBatchConnect%2FTemplate:initialize

Warning

It is recommended that any global batch_connect attributes be defined in the corresponding cluster configuration file under:

/etc/ood/config/clusters.d/cluster.yml

There is further discussion on this under Modify Cluster Configuration.

In most cases, however, you will want to change the actual job submission parameters (e.g., node type). These are defined under the script key:

# submit.yml.erb
---

batch_connect:
  template: "basic"

script:
  ...

You can read more about all the available script options here:

http://www.rubydoc.info/gems/ood_core/OodCore/Job/Script

In most cases you will want to modify the native attribute, whose format depends on the resource manager. Some examples are given below.

Note

It is recommended that you commit your changes to submit.yml.erb to git:

# Stage and commit your changes
git commit submit.yml.erb -m 'updated batch job options'

4.1.1. Slurm

For Slurm, you can request the number of nodes and the node features (constraints) with:

# submit.yml.erb
---

batch_connect:
  template: "basic"

script:
  native: [ "-N", "<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>", "-C", "c12" ]

where the sbatch arguments are defined as an array under the native key of script.

Note

The native attribute is an array of command line arguments. So the above example is equivalent to appending to sbatch:

sbatch ... -N <bc_num_slots> -C c12

The bc_num_slots inside the ERB tags above is the value the user entered in the web form's “Number of nodes” field. If the field is blank, to_i would produce 0, which is not a valid node count, so the expression falls back to 1.
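The fallback logic and the resulting argument list can be sketched in plain Ruby as below. This is an illustration only: blank? comes from ActiveSupport rather than the Ruby standard library, so the hypothetical blank_value? helper stands in for it, and slurm_native simply rebuilds the array from the example above.

```ruby
# Hypothetical stand-in for ActiveSupport's blank?: nil or whitespace-only.
def blank_value?(value)
  value.nil? || value.to_s.strip.empty?
end

# Mirrors the ERB expression: default to 1 node when the field is blank.
def num_nodes(bc_num_slots)
  blank_value?(bc_num_slots) ? 1 : bc_num_slots.to_i
end

# Rebuilds the native array from the example; ERB output is a string,
# so the node count is converted back to a string here too.
def slurm_native(bc_num_slots, feature = "c12")
  ["-N", num_nodes(bc_num_slots).to_s, "-C", feature]
end

command = ["sbatch", "...", *slurm_native("")].join(" ")
puts command
```

Note that the expression only guards against a blank field; an explicit “0” would still be passed through as 0.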

4.1.2. Torque

For Torque, you can request the number of nodes and processors per node (ppn) with:

# submit.yml.erb
---

batch_connect:
  template: "basic"

script:
  native:
    resources:
      nodes: "<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>:ppn=28"

Note

See http://www.rubydoc.info/gems/pbs/PBS%2FBatch:submit_script for more information on possible values for the native attribute.

The bc_num_slots inside the ERB tags above is the value the user entered in the web form's “Number of nodes” field. If the field is blank, to_i would produce 0, which is not a valid node count, so the expression falls back to 1.
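Unlike Slurm, the Torque native attribute is a hash rather than an argument array. The Ruby sketch below builds that hash for a given form value and, purely for illustration, writes the resources out as qsub -l name=value pairs. The helper names are hypothetical, and the real translation is performed by the pbs gem linked above, so treat the printed form as an approximation.

```ruby
# Hypothetical stand-in for ActiveSupport's blank?: nil or whitespace-only.
def blank_value?(value)
  value.nil? || value.to_s.strip.empty?
end

# Builds the native hash from the Torque example; ppn=28 is the example's value.
def torque_native(bc_num_slots, ppn = 28)
  nodes = blank_value?(bc_num_slots) ? 1 : bc_num_slots.to_i
  { "resources" => { "nodes" => "#{nodes}:ppn=#{ppn}" } }
end

# Illustration only: express each resource as a qsub -l name=value argument.
def qsub_resource_args(native)
  native["resources"].flat_map { |name, value| ["-l", "#{name}=#{value}"] }
end

puts qsub_resource_args(torque_native("2")).join(" ")
```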

4.1.3. PBS Professional

With PBS Professional you will usually want to change how bc_num_slots (the number of CPUs on a single node) is submitted to the batch server.

This can be specified as follows:

# submit.yml.erb
---

batch_connect:
  template: "basic"

script:
  native: [ "-l", "select=1:ncpus=<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>" ]

where the qsub arguments are defined as an array under the native key of script.

If you would like to mimic how Torque handles bc_num_slots (number of nodes), you first need to change the label of bc_num_slots that the user sees in the form. This can be done by adding the bc_num_slots entry under attributes in the form configuration file:

# form.yml
---
cluster: "cluster1"

attributes:
  modules: "python"
  conda_extensions: "1"
  extra_jupyter_args: ""
  bc_num_slots:
    label: "Number of nodes"

form:
  - modules
  - conda_extensions
  - extra_jupyter_args
  - bc_num_hours
  - bc_num_slots
  - bc_account
  - bc_queue
  - bc_email_on_started

The Jupyter app form in the browser now shows the label “Number of nodes” instead of “Number of CPUs on a single node”.

Next, since bc_num_slots now means something different, we change how it is submitted by modifying the job submission configuration file:

# submit.yml.erb
---

batch_connect:
  template: "basic"

script:
  native: [ "-l", "select=<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>:ncpus=28" ]

where you should replace ncpus=28 with the number of CPUs per node on your cluster.

You can also append mem=...gb to the select=... statement if you’d like.

Note

The native attribute is an array of command line arguments. So the above example is equivalent to appending to qsub:

qsub ... -l select=<bc_num_slots>:ncpus=28

The bc_num_slots inside the ERB tags above is the value the user entered in the web form's “Number of nodes” field. If the field is blank, to_i would produce 0, which is not a valid node count, so the expression falls back to 1.
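The Ruby sketch below shows how the select statement is assembled for the “number of nodes” variant, including the optional memory resource mentioned above. The pbspro_native helper is hypothetical, ncpus: 28 is only the example's value, and the mem_gb figure in the usage line is made up for illustration; PBS Professional separates chunk resources in a select statement with colons.

```ruby
# Hypothetical stand-in for ActiveSupport's blank?: nil or whitespace-only.
def blank_value?(value)
  value.nil? || value.to_s.strip.empty?
end

# Builds the native array for the PBS Pro "number of nodes" example.
# mem_gb is optional and appended as another colon-separated resource.
def pbspro_native(bc_num_slots, ncpus: 28, mem_gb: nil)
  chunks = blank_value?(bc_num_slots) ? 1 : bc_num_slots.to_i
  select = "select=#{chunks}:ncpus=#{ncpus}"
  select += ":mem=#{mem_gb}gb" if mem_gb
  ["-l", select]
end

# mem_gb: 100 is an illustrative value, not a recommendation.
puts ["qsub", "...", *pbspro_native("2", mem_gb: 100)].join(" ")
```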

4.1.4. Other

For most of our other adapters (aside from Torque) the native attribute is an array of command line arguments formatted similarly to the Slurm example above.

4.2. Verify it Works

You can now test the app again by visiting your local OnDemand server in your browser:

GET /pun/sys/dashboard/batch_connect/dev/jupyter_app/session_contexts/new HTTP/1.1
Host: ondemand.my_center.edu

Fill in the form and launch the Jupyter batch job. Click the “Session ID” link for the launched session and confirm that your changes appear in:

  • job_script_content.sh (if you modified batch_connect)
  • job_script_options.json (if you modified script)
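To check the native arguments without opening the file by hand, a small Ruby script like the one below can print them. It is a sketch under an assumption: it presumes job_script_options.json is a JSON object with a top-level "native" key, and it takes the file path as a command-line argument rather than guessing where the session directory lives. The native_options helper and the script name are hypothetical.

```ruby
require "json"

# Returns the "native" entry recorded for a session, or nil if absent.
# Assumption: the file is a JSON object with a top-level "native" key.
def native_options(path)
  options = JSON.parse(File.read(path))
  options["native"]
end

# Usage: ruby show_native.rb path/to/job_script_options.json
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  p native_options(ARGV[0])
end
```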