Class: OodCore::Job::Adapters::Slurm::Batch Private
- Inherits: Object
  - Object
  - OodCore::Job::Adapters::Slurm::Batch
- Defined in: lib/ood_core/job/adapters/slurm.rb
Overview
This class is part of a private API. You should avoid using this class if possible, as it may be removed or be changed in the future.
Object used for simplified communication with a Slurm batch server.
Defined Under Namespace
Classes: Error, SlurmTimeoutError
Constant Summary
- UNIT_SEPARATOR = "\x1F"
  This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
- RECORD_SEPARATOR = "\x1E"
  This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Instance Attribute Summary
- #bin ⇒ Pathname readonly private
  The path to the Slurm client installation binaries.
- #bin_overrides ⇒ Object readonly private
  Optional overrides for Slurm client executables.
- #cluster ⇒ String? readonly private
  The cluster of the Slurm batch server.
- #conf ⇒ Pathname? readonly private
  The path to the Slurm configuration file.
- #strict_host_checking ⇒ Bool readonly private
  Whether to use strict host checking when connecting to submit_host over ssh.
- #submit_host ⇒ String readonly private
  The login node where the job is submitted via ssh.
Instance Method Summary
- #accounts ⇒ Object private
- #all_sinfo_node_fields ⇒ Object private
- #all_squeue_fields ⇒ Object private
  Fields requested from a formatted `squeue` call. Note that the order of these fields is important.
- #delete_job(id) ⇒ void private
  Delete a specified job from batch server.
- #get_cluster_info ⇒ ClusterInfo private
  Get a ClusterInfo object containing information about the given cluster.
- #get_jobs(id: "", owner: nil, attrs: nil) ⇒ Array<Hash> private
  Get a list of hashes detailing each of the jobs on the batch server.
- #hold_job(id) ⇒ void private
  Put a specified job on hold.
- #initialize(cluster: nil, bin: nil, conf: nil, bin_overrides: {}, submit_host: "", strict_host_checking: true) ⇒ Batch constructor private
  A new instance of Batch.
- #nodes ⇒ Object private
- #queues ⇒ Object private
- #release_job(id) ⇒ void private
  Release a specified job that is on hold.
- #squeue_args(id: "", owner: nil, options: []) ⇒ Object private
  TODO: write some barebones test for this? like 2 options and id or no id.
- #squeue_fields(attrs) ⇒ Object private
- #squeue_required_fields ⇒ Object private
- #submit_string(str, args: [], env: {}) ⇒ String private
  Submit a script expanded as a string to the batch server.
Constructor Details
#initialize(cluster: nil, bin: nil, conf: nil, bin_overrides: {}, submit_host: "", strict_host_checking: true) ⇒ Batch
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Returns a new instance of Batch.
# File 'lib/ood_core/job/adapters/slurm.rb', line 100
def initialize(cluster: nil, bin: nil, conf: nil, bin_overrides: {}, submit_host: "", strict_host_checking: true)
  @cluster = cluster && cluster.to_s
  @conf = conf && Pathname.new(conf.to_s)
  @bin = Pathname.new(bin.to_s)
  @bin_overrides = bin_overrides
  @submit_host = submit_host.to_s
  @strict_host_checking = strict_host_checking
end
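The constructor normalizes its arguments rather than storing them raw. The following is a minimal sketch of that normalization outside the class. The `normalize` helper is hypothetical and exists only for illustration; it mirrors the assignments shown above and does not touch any Slurm binaries.

```ruby
require "pathname"

# Hypothetical helper that mirrors the constructor's assignments.
def normalize(cluster: nil, bin: nil, conf: nil, submit_host: "")
  {
    cluster: cluster && cluster.to_s,         # nil stays nil, anything else becomes a String
    conf: conf && Pathname.new(conf.to_s),    # nil stays nil, anything else becomes a Pathname
    bin: Pathname.new(bin.to_s),              # nil becomes an empty Pathname
    submit_host: submit_host.to_s
  }
end

defaults = normalize
explicit = normalize(cluster: :owens, bin: "/usr/bin", conf: "/etc/slurm/slurm.conf")

puts defaults.inspect
puts explicit.inspect
```

Note that `cluster` and `conf` remain `nil` when omitted, while `bin` is always a `Pathname`, so callers can distinguish "no cluster given" from an empty string.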
Instance Attribute Details
#bin ⇒ Pathname (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The path to the Slurm client installation binaries
# File 'lib/ood_core/job/adapters/slurm.rb', line 69
def bin
  @bin
end
#bin_overrides ⇒ Object (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Optional overrides for Slurm client executables
# File 'lib/ood_core/job/adapters/slurm.rb', line 75
def bin_overrides
  @bin_overrides
end
#cluster ⇒ String? (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The cluster of the Slurm batch server
# File 'lib/ood_core/job/adapters/slurm.rb', line 57
def cluster
  @cluster
end
#conf ⇒ Pathname? (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The path to the Slurm configuration file
# File 'lib/ood_core/job/adapters/slurm.rb', line 63
def conf
  @conf
end
#strict_host_checking ⇒ Bool (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Whether to use strict host checking when connecting to submit_host over ssh
# File 'lib/ood_core/job/adapters/slurm.rb', line 85
def strict_host_checking
  @strict_host_checking
end
#submit_host ⇒ String (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The login node where the job is submitted via ssh
# File 'lib/ood_core/job/adapters/slurm.rb', line 80
def submit_host
  @submit_host
end
Instance Method Details
#accounts ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
# File 'lib/ood_core/job/adapters/slurm.rb', line 183
def accounts
  user = Etc.getlogin
  args = ['-nP', 'show', 'users', 'withassoc', 'format=account,cluster,partition,qos', 'where', "user=#{user}"]

  [].tap do |accts|
    call('sacctmgr', *args).each_line do |line|
      acct, cluster, queue, qos = line.split('|')
      next if acct.nil? || acct.chomp.empty?

      args = {
        name: acct,
        qos: qos.to_s.chomp.split(','),
        cluster: cluster,
        queue: queue.to_s.empty? ? nil : queue
      }
      info = OodCore::Job::AccountInfo.new(**args) unless acct.nil?
      accts << info unless acct.nil?
    end
  end
end
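To make the line parsing in `accounts` easier to follow, here is a small sketch that applies the same split logic to hand-written, pipe-delimited text. The sample output, the `AccountSketch` struct (a stand-in for `OodCore::Job::AccountInfo`), and the `parse_accounts` helper are all assumptions for illustration; no `sacctmgr` call is made.

```ruby
# Stand-in for OodCore::Job::AccountInfo (hypothetical, illustration only).
AccountSketch = Struct.new(:name, :qos, :cluster, :queue, keyword_init: true)

# Hypothetical helper applying the same per-line logic as #accounts.
def parse_accounts(output)
  [].tap do |accts|
    output.each_line do |line|
      acct, cluster, queue, qos = line.split("|")
      next if acct.nil? || acct.chomp.empty?   # skip blank lines

      accts << AccountSketch.new(
        name: acct,
        qos: qos.to_s.chomp.split(","),          # "a,b\n" becomes ["a", "b"]
        cluster: cluster,
        queue: queue.to_s.empty? ? nil : queue   # empty partition becomes nil
      )
    end
  end
end

# Made-up sample in the account|cluster|partition|qos layout.
sample = "pzs0001|owens||normal,debug\npzs0002|pitzer|gpu|\n\n"
parsed = parse_accounts(sample)
parsed.each { |a| puts a.to_h.inspect }
```

The trailing blank line in the sample is skipped, an empty partition column is mapped to `nil`, and an empty QOS column yields an empty array.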
#all_sinfo_node_fields ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
# File 'lib/ood_core/job/adapters/slurm.rb', line 336
def all_sinfo_node_fields
  {
    procs: '%c',
    name: '%n',
    features: '%f'
  }
end
#all_squeue_fields ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Fields requested from a formatted `squeue` call. Note that the order of these fields is important.
# File 'lib/ood_core/job/adapters/slurm.rb', line 272
def all_squeue_fields
  {
    account: "%a",
    job_id: "%A",
    exec_host: "%B",
    min_cpus: "%c",
    cpus: "%C",
    min_tmp_disk: "%d",
    nodes: "%D",
    end_time: "%e",
    dependency: "%E",
    features: "%f",
    array_job_id: "%F",
    group_name: "%g",
    group_id: "%G",
    over_subscribe: "%h",
    sockets_per_node: "%H",
    array_job_task_id: "%i",
    cores_per_socket: "%I",
    job_name: "%j",
    threads_per_core: "%J",
    comment: "%k",
    array_task_id: "%K",
    time_limit: "%l",
    time_left: "%L",
    min_memory: "%m",
    time_used: "%M",
    req_node: "%n",
    node_list: "%N",
    command: "%o",
    contiguous: "%O",
    qos: "%q",
    partition: "%P",
    priority: "%Q",
    reason: "%r",
    start_time: "%S",
    state_compact: "%t",
    state: "%T",
    user: "%u",
    user_id: "%U",
    reservation: "%v",
    submit_time: "%V",
    wckey: "%w",
    licenses: "%W",
    excluded_nodes: "%x",
    core_specialization: "%X",
    nice: "%y",
    scheduled_nodes: "%Y",
    sockets_cores_threads: "%z",
    work_dir: "%Z",
    gres: "%b", # must come at the end to fix a bug with Slurm 18
  }
end
#delete_job(id) ⇒ void
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
This method returns an undefined value.
Delete a specified job from batch server
# File 'lib/ood_core/job/adapters/slurm.rb', line 254
def delete_job(id)
  call("scancel", id.to_s)
end
#get_cluster_info ⇒ ClusterInfo
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Get a ClusterInfo object containing information about the given cluster
# File 'lib/ood_core/job/adapters/slurm.rb', line 111
def get_cluster_info
  node_cpu_info = call("sinfo", "-aho %A/%D/%C").strip.split('/')
  gres_length = call("sinfo", "-o %G").lines.map(&:strip).map(&:length).max + 2
  gres_lines = call("sinfo", "-ahNO ,nodehost,gres:#{gres_length},gresused:#{gres_length}")
               .lines.uniq.map(&:split)
  ClusterInfo.new(active_nodes: node_cpu_info[0].to_i,
                  total_nodes: node_cpu_info[2].to_i,
                  active_processors: node_cpu_info[3].to_i,
                  total_processors: node_cpu_info[6].to_i,
                  active_gpus: gres_lines.sum { |line| Slurm.gpus_from_gres(line[2]) },
                  total_gpus: gres_lines.sum { |line| Slurm.gpus_from_gres(line[1]) }
  )
end
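The array indices used for the node and CPU counts are easier to read next to a concrete string. This sketch assumes a made-up `sinfo -aho %A/%D/%C` result, where `%A` prints allocated/idle nodes, `%D` the total node count, and `%C` allocated/idle/other/total CPUs. The GPU counts are left out because they depend on `Slurm.gpus_from_gres`, which is not shown here.

```ruby
# Made-up sample: 12 allocated and 3 idle nodes out of 16,
# then 288 allocated, 72 idle, 24 other, 384 total CPUs.
sample = "12/3/16/288/72/24/384\n"

parts = sample.strip.split("/")

summary = {
  active_nodes: parts[0].to_i,        # allocated nodes (%A, first half)
  total_nodes: parts[2].to_i,         # %D
  active_processors: parts[3].to_i,   # allocated CPUs (%C, first part)
  total_processors: parts[6].to_i     # total CPUs (%C, last part)
}

puts summary.inspect
```

Indices 1, 4, and 5 (idle nodes, idle CPUs, other CPUs) are present in the split array but unused by the method.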
#get_jobs(id: "", owner: nil, attrs: nil) ⇒ Array<Hash>
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Get a list of hashes detailing each of the jobs on the batch server
# File 'lib/ood_core/job/adapters/slurm.rb', line 147
def get_jobs(id: "", owner: nil, attrs: nil)
  fields = squeue_fields(attrs)
  args = squeue_args(id: id, owner: owner, options: fields.values)

  #TODO: switch mock of Open3 to be the squeue mock script
  # then you can use that for performance metrics
  StringIO.open(call("squeue", *args)) do |output|
    advance_past_squeue_header!(output)

    jobs = []
    output.each_line(RECORD_SEPARATOR) do |line|
      # TODO: once you can do performance metrics you can test zip against some other tools
      # or just small optimizations
      # for example, fields is ALREADY A HASH and we are setting the VALUES to
      # "line.strip.split(unit_separator)" array
      #
      # i.e. store keys in an array, do Hash[[keys, values].transpose]
      #
      # or
      #
      # job = {}
      # keys.each_with_index { |key, index| job[key] = values[index] }
      # jobs << job
      #
      # assuming keys and values are same length! if not we have an error!
      line = line.encode('UTF-8', invalid: :replace, undef: :replace)
      values = line.chomp(RECORD_SEPARATOR).strip.split(UNIT_SEPARATOR)
      jobs << Hash[fields.keys.zip(values)] unless values.empty?
    end
    jobs
  end
rescue SlurmTimeoutError
  # TODO: could use a log entry here
  return [{ id: id, state: 'undetermined' }]
end
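The record and unit separators let the parser split `squeue` output without worrying about spaces or pipes inside field values. Below is a minimal sketch of the same loop run against simulated text. The three-field map and the sample output are assumptions for illustration, and the header handling done by `advance_past_squeue_header!` is omitted because the sample has no header.

```ruby
require "stringio"

UNIT_SEPARATOR   = "\x1F"
RECORD_SEPARATOR = "\x1E"

# Hypothetical subset of the field map; the real one is all_squeue_fields.
fields = { job_id: "%A", state_compact: "%t", user: "%u" }

# Simulated output: each record starts with RECORD_SEPARATOR,
# its fields are joined by UNIT_SEPARATOR, and a newline ends the line.
raw = "#{RECORD_SEPARATOR}101#{UNIT_SEPARATOR}R#{UNIT_SEPARATOR}alice\n" \
      "#{RECORD_SEPARATOR}102#{UNIT_SEPARATOR}PD#{UNIT_SEPARATOR}bob\n"

jobs = []
StringIO.open(raw) do |output|
  output.each_line(RECORD_SEPARATOR) do |line|
    values = line.chomp(RECORD_SEPARATOR).strip.split(UNIT_SEPARATOR)
    # The chunk before the first separator is empty and is skipped here.
    jobs << fields.keys.zip(values).to_h unless values.empty?
  end
end

jobs.each { |job| puts job.inspect }
```

Because the keys and the format string both come from the same hash, `zip` pairs each value with the right key as long as `squeue` returns the fields in the requested order.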
#hold_job(id) ⇒ void
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
This method returns an undefined value.
Put a specified job on hold
# File 'lib/ood_core/job/adapters/slurm.rb', line 234
def hold_job(id)
  call("scontrol", "hold", id.to_s)
end
#nodes ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
# File 'lib/ood_core/job/adapters/slurm.rb', line 344
def nodes
  args = all_sinfo_node_fields.values.join(UNIT_SEPARATOR)
  output = call('sinfo', '-ho', "#{RECORD_SEPARATOR}#{args}")

  output.each_line(RECORD_SEPARATOR).map do |line|
    values = line.chomp(RECORD_SEPARATOR).strip.split(UNIT_SEPARATOR)

    next if values.empty?
    data = Hash[all_sinfo_node_fields.keys.zip(values)]
    data[:name] = data[:name].to_s.split(',').first
    data[:features] = data[:features].to_s.split(',')
    NodeInfo.new(**data)
  end.compact
end
#queues ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
# File 'lib/ood_core/job/adapters/slurm.rb', line 326
def queues
  info_raw = call('scontrol', 'show', 'part', '-o')

  [].tap do |ret_arr|
    info_raw.each_line do |line|
      ret_arr << str_to_queue_info(line)
    end
  end
end
#release_job(id) ⇒ void
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
This method returns an undefined value.
Release a specified job that is on hold
# File 'lib/ood_core/job/adapters/slurm.rb', line 244
def release_job(id)
  call("scontrol", "release", id.to_s)
end
#squeue_args(id: "", owner: nil, options: []) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
TODO: write some barebones test for this? like 2 options and id or no id
# File 'lib/ood_core/job/adapters/slurm.rb', line 220
def squeue_args(id: "", owner: nil, options: [])
  args = ["--all", "--states=all", "--noconvert"]
  args.concat ["-o", "#{RECORD_SEPARATOR}#{options.join(UNIT_SEPARATOR)}"]
  args.concat ["-u", owner.to_s] unless owner.to_s.empty?
  args.concat ["-j", id.to_s] unless id.to_s.empty?
  args
end
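The TODO above asks for a barebones check with two options, with and without an id. One possible sketch follows. It uses a standalone copy of the method and assumes the `-o` format string is built by joining `options` with `UNIT_SEPARATOR`, which matches how `get_jobs` passes `fields.values`.

```ruby
UNIT_SEPARATOR   = "\x1F"
RECORD_SEPARATOR = "\x1E"

# Standalone copy for illustration; assumes the format string joins `options`.
def squeue_args(id: "", owner: nil, options: [])
  args = ["--all", "--states=all", "--noconvert"]
  args.concat ["-o", "#{RECORD_SEPARATOR}#{options.join(UNIT_SEPARATOR)}"]
  args.concat ["-u", owner.to_s] unless owner.to_s.empty?
  args.concat ["-j", id.to_s] unless id.to_s.empty?
  args
end

with_id    = squeue_args(id: 1234, options: ["%A", "%t"])
without_id = squeue_args(options: ["%A", "%t"])
with_owner = squeue_args(owner: "alice", options: ["%A", "%t"])

puts with_id.inspect
puts without_id.inspect
puts with_owner.inspect
```

An empty or `nil` id or owner simply omits the corresponding flag, so the same method serves both "all jobs" and "one job" queries.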
#squeue_fields(attrs) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
# File 'lib/ood_core/job/adapters/slurm.rb', line 204
def squeue_fields(attrs)
  if attrs.nil?
    all_squeue_fields
  else
    all_squeue_fields.slice(*squeue_attrs_for_info_attrs(Array.wrap(attrs) + squeue_required_fields))
  end
end
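Field selection relies on `Hash#slice`, which returns entries in the order the keys are passed rather than in the order of the original hash. The sketch below shows that behavior on a hypothetical four-entry field map. It passes attribute names straight to `slice` and skips the `squeue_attrs_for_info_attrs` translation step, whose implementation is not shown here.

```ruby
# Hypothetical subset of the field map, for illustration only.
all_fields = { account: "%a", job_id: "%A", state_compact: "%t", user: "%u" }

required  = [:job_id, :state_compact]
requested = [:user, :job_id]           # :job_id appears twice after concatenation

selected = all_fields.slice(*(requested + required))

# Keys follow the argument order, and the repeated key appears once.
puts selected.keys.inspect

# The format string and the keys come from the same hash, so they stay aligned.
format = selected.values.join("\x1F")
puts format.inspect
```

Since the keys and values of the sliced hash always share one order, the parser can zip them together even when the selected subset changes between calls.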
#squeue_required_fields ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
# File 'lib/ood_core/job/adapters/slurm.rb', line 212
def squeue_required_fields
  #TODO: does this need to include ::array_job_task_id?
  #TODO: does it matter that order of the output can vary depending on the arguments and if "squeue_required_fields" are included?
  # previously the order was "fields.keys"; i don't think it does
  [:job_id, :state_compact]
end
#submit_string(str, args: [], env: {}) ⇒ String
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Submit a script expanded as a string to the batch server
# File 'lib/ood_core/job/adapters/slurm.rb', line 264
def submit_string(str, args: [], env: {})
  args = args.map(&:to_s) + ["--parsable"]
  env = env.to_h.each_with_object({}) { |(k, v), h| h[k.to_s] = v.to_s }
  call("sbatch", *args, env: env, stdin: str.to_s).strip.split(";").first
end
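With `--parsable`, `sbatch` prints the job id alone, or the job id and cluster name separated by a semicolon, which is why the method keeps only the first segment. The sketch below isolates that parsing and the environment stringification. Both helper names are hypothetical, and the sample outputs are made up.

```ruby
# Hypothetical helper: extract the job id from `sbatch --parsable` output.
def job_id_from_parsable(output)
  output.strip.split(";").first
end

# Hypothetical helper: the same key/value stringification as submit_string.
def stringify_env(env)
  env.to_h.each_with_object({}) { |(k, v), h| h[k.to_s] = v.to_s }
end

single  = job_id_from_parsable("4321\n")
cluster = job_id_from_parsable("4321;owens\n")
env     = stringify_env(OMP_NUM_THREADS: 4, "SCRATCH" => "/tmp/work")

puts single
puts cluster
puts env.inspect
```

Note that empty output yields `nil` from this parsing, so a caller that needs a guaranteed String would have to check for that case.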