Class: OodCore::Job::Adapters::HTCondor::Batch Private

Inherits:
Object
Defined in:
lib/ood_core/job/adapters/htcondor.rb

Overview

This class is part of a private API. You should avoid using this class if possible, as it may be removed or be changed in the future.

Object used for simplified communication with an HTCondor batch server

Defined Under Namespace

Classes: Error

Constructor Details

#initialize(bin: nil, bin_overrides: {}, submit_host: "", strict_host_checking: false, default_universe: "vanilla", default_docker_image: "ubuntu:latest", user_group_map: nil, cluster: "", additional_attributes: {}) ⇒ Batch

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Returns a new instance of Batch.

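Parameters:

  • bin (#to_s) (defaults to: nil)

    path to HTCondor installation binaries

  • bin_overrides (Hash) (defaults to: {})

    overrides for individual HTCondor client binaries

  • submit_host (#to_s) (defaults to: "")

    login node on which jobs are submitted via ssh

  • strict_host_checking (Bool) (defaults to: false)

    whether to use strict host checking when connecting to submit_host via ssh

  • default_universe (#to_s) (defaults to: "vanilla")

    default universe for jobs submitted to HTCondor

  • default_docker_image (#to_s) (defaults to: "ubuntu:latest")

    default docker image for jobs submitted to HTCondor

  • user_group_map (#to_s, nil) (defaults to: nil)

    path to the user/group map file

  • cluster (#to_s) (defaults to: "")

    the cluster name for this HTCondor instance

  • additional_attributes (Hash{#to_s => #to_s}) (defaults to: {})

    additional attributes to be added to the job submission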



# File 'lib/ood_core/job/adapters/htcondor.rb', line 100

def initialize(bin: nil, bin_overrides: {}, submit_host: "", strict_host_checking: false, default_universe: "vanilla", default_docker_image: "ubuntu:latest", user_group_map: nil, cluster: "", additional_attributes: {})
    @bin                  = Pathname.new(bin.to_s)
    @bin_overrides        = bin_overrides
    @submit_host          = submit_host.to_s
    @strict_host_checking = strict_host_checking
    @default_universe     = default_universe.to_s
    @default_docker_image = default_docker_image.to_s
    @user_group_map       = user_group_map.to_s unless user_group_map.nil?
    @cluster              = cluster.to_s
    @additional_attributes = additional_attributes
    @version = get_htcondor_version
end
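The coercions in the constructor above can be tried in isolation. The following is a minimal sketch, not the adapter itself: `normalize_options` is a hypothetical helper that copies only the `to_s`, `Pathname`, and nil-preserving conversions from the listing, and it deliberately skips the version lookup, which needs a working HTCondor installation.

```ruby
require "pathname"

# Hypothetical helper (not part of ood_core): mirrors only the argument
# coercions performed by the constructor shown above.
def normalize_options(bin: nil, submit_host: "", user_group_map: nil, cluster: "")
  {
    bin:            Pathname.new(bin.to_s),            # nil becomes an empty Pathname
    submit_host:    submit_host.to_s,                  # anything responding to #to_s
    user_group_map: (user_group_map.to_s unless user_group_map.nil?), # nil stays nil
    cluster:        cluster.to_s
  }
end

opts = normalize_options(bin: "/usr/bin", submit_host: :login01)
defaults = normalize_options
```

With these inputs, `opts[:bin]` is a `Pathname` for `/usr/bin`, `opts[:submit_host]` is the string `"login01"`, and `defaults[:user_group_map]` stays `nil`, which is what `get_accounts` checks before reading the map.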

Instance Attribute Details

#additional_attributes ⇒ Hash{#to_s => #to_s} (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Additional attributes to be added to the job submission

Returns:

  • (Hash{#to_s => #to_s})

    additional attributes to be added to the job submission



# File 'lib/ood_core/job/adapters/htcondor.rb', line 87

def additional_attributes
  @additional_attributes
end

#bin ⇒ Pathname (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

The path to the HTCondor client installation binaries

Returns:

  • (Pathname)

    path to HTCondor binaries



# File 'lib/ood_core/job/adapters/htcondor.rb', line 53

def bin
  @bin
end

#bin_overrides ⇒ Hash (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Overrides for the HTCondor client binaries, used in place of the default binaries. The constructor stores the value as given and defaults it to an empty Hash.

Returns:

  • (Hash)

    overrides for the HTCondor binaries



# File 'lib/ood_core/job/adapters/htcondor.rb', line 58

def bin_overrides
  @bin_overrides
end

#cluster ⇒ String (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

The cluster name for this HTCondor instance

Returns:

  • (String)

    the cluster name



# File 'lib/ood_core/job/adapters/htcondor.rb', line 83

def cluster
  @cluster
end

#default_docker_image ⇒ String (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Default docker image for jobs submitted to HTCondor

Returns:

  • (String)

    the default docker image for jobs



# File 'lib/ood_core/job/adapters/htcondor.rb', line 74

def default_docker_image
  @default_docker_image
end

#default_universe ⇒ String (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Default universe for jobs submitted to HTCondor

Returns:

  • (String)

    the default universe for jobs



# File 'lib/ood_core/job/adapters/htcondor.rb', line 70

def default_universe
  @default_universe
end

#strict_host_checking ⇒ Bool (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Whether to use strict host checking when connecting to submit_host via ssh

Returns:

  • (Bool)

    whether strict host checking is enabled



# File 'lib/ood_core/job/adapters/htcondor.rb', line 66

def strict_host_checking
  @strict_host_checking
end

#submit_host ⇒ String (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

The login node where the job is submitted via ssh

Returns:

  • (String)

    The login node



# File 'lib/ood_core/job/adapters/htcondor.rb', line 62

def submit_host
  @submit_host
end

#user_group_map ⇒ String? (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

A path to the user/group map for HTCondor jobs. The file should follow the format used by [AssignAccountingGroup](htcondor.readthedocs.io/en/latest/admin-manual/introduction-to-configuration.html#FEATURE:ASSIGNACCOUNTINGGROUP).

Returns:

  • (String, nil)

    the path to the user/group map file



# File 'lib/ood_core/job/adapters/htcondor.rb', line 79

def user_group_map
  @user_group_map
end

#version ⇒ Gem::Version (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

The version of HTCondor on the submit_host

Returns:

  • (Gem::Version)

    the version of HTCondor



# File 'lib/ood_core/job/adapters/htcondor.rb', line 91

def version
  @version
end

Instance Method Details

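#condor_q_attrs ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Mapping from the adapter's job-detail keys to the HTCondor ClassAd attribute names that get_jobs requests from `condor_q`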



# File 'lib/ood_core/job/adapters/htcondor.rb', line 164

def condor_q_attrs
    {
        id: "ClusterId",
        sub_id: "ProcId",
        status: "JobStatus",
        owner: "Owner",
        acct_group: "AcctGroup",
        job_name: "JobBatchName",
        procs: "CpusProvisioned",
        gpus: "GpusProvisioned",
        submission_time: "QDate",
        dispatch_time: "JobCurrentStartDate",
        sys_cpu_time: "RemoteSysCpu",
        user_cpu_time: "RemoteUserCpu",
        wallclock_time: "RemoteWallClockTime"
    }
end
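Because `get_jobs` passes the values of this hash to `condor_q -af`, each output line carries one value per attribute, in the same order as the hash. The adapter's own `parse_condor_q_output` is not shown on this page, so the following is only a sketch of how such a line could be mapped back to the keys. `ATTRS` is a shortened copy of the mapping, and `parse_line` is a hypothetical helper that assumes whitespace-separated values with no embedded spaces.

```ruby
# Shortened copy of the mapping returned by condor_q_attrs above.
ATTRS = {
  id:     "ClusterId",
  sub_id: "ProcId",
  status: "JobStatus",
  owner:  "Owner"
}.freeze

# Hypothetical parser: pairs each key with the value in the same column.
# Assumes no value contains whitespace, which may not hold for job names.
def parse_line(line, attrs = ATTRS)
  attrs.keys.zip(line.split).to_h
end

job = parse_line("42 0 2 alice")
```

Here `job[:id]` is `"42"` and `job[:owner]` is `"alice"`. Values stay strings in this sketch; any numeric or time conversion is left out.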

#get_accounts ⇒ Hash{String => Array<String>}

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Retrieve accounts from the user_group_map file, reading it locally when it exists and is readable, and otherwise reading it with `cat` on the submit host

Returns:

  • (Hash{String => Array<String>})

    mapping of usernames to their groups



# File 'lib/ood_core/job/adapters/htcondor.rb', line 219

def get_accounts
    raise Error, "user_group_map is not defined" if user_group_map.nil? || user_group_map.empty?

    # Retrieve accounts, use local file, if exists. Otherwise use from submit_host
    if File.exist?(user_group_map) && File.readable?(user_group_map)
        output = File.read(user_group_map)
    else
        output = call("cat", user_group_map)
    end
    accounts = {}
    output.each_line do |line|
        next if line.strip.empty? || line.start_with?("#") # Skip empty lines and comments
        _, username, groups = line.strip.split(/\s+/, 3)
        accounts[username] = groups.split(",") if username && groups
    end

    accounts
rescue Error => e
    raise Error, "Failed to retrieve accounts: #{e.message}"
end
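The parsing loop in `get_accounts` can be exercised without an HTCondor installation. The sketch below copies that loop into a hypothetical standalone helper, `parse_user_group_map`, and feeds it made-up sample lines. The sample only illustrates what the loop accepts (an ignored first column, a username, then comma-separated groups); it is not an authoritative example of the AssignAccountingGroup file format.

```ruby
# Hypothetical standalone copy of the parsing loop in get_accounts.
def parse_user_group_map(text)
  accounts = {}
  text.each_line do |line|
    # Skip blank lines and lines that begin with a comment marker
    next if line.strip.empty? || line.start_with?("#")
    # First column is discarded; the rest is username and group list
    _, username, groups = line.strip.split(/\s+/, 3)
    accounts[username] = groups.split(",") if username && groups
  end
  accounts
end

# Made-up sample input for illustration only.
sample = <<~MAP
  # the first column is ignored by the parser
  * alice physics,chemistry
  * bob biology

  * carol
MAP

accounts = parse_user_group_map(sample)
```

The result maps `"alice"` to `["physics", "chemistry"]` and `"bob"` to `["biology"]`. The `carol` line has no group column, so it is dropped rather than stored with an empty list.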

#get_jobs(id: "", owner: nil) ⇒ Array<Hash>

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Retrieve job information using `condor_q`

Parameters:

  • id (#to_s) (defaults to: "")

    the id of the job

  • owner (String) (defaults to: nil)

    the owner(s) of the job

Returns:

  • (Array<Hash>)

    list of details for jobs

Raises:

  • (Error)

    if `condor_q` command exited unsuccessfully



# File 'lib/ood_core/job/adapters/htcondor.rb', line 187

def get_jobs(id: "", owner: nil)
    args = []
    unless id.to_s.empty?
        if id.to_s.include?(".") # if id is a job array, we need to use the ClusterId and ProcId
            cluster_id, proc_id = id.to_s.split(".")
            args.concat ["-constraint", "\"ClusterId == #{cluster_id} && ProcId == #{proc_id}\""]
        else # if id is a single job, we can just use the ClusterId
            args.concat ["-constraint", "\"ClusterId == #{id}\""]
        end
    end
    args.concat ["-constraint", "\"Owner == #{owner}\""] unless owner.to_s.empty?
    args.concat ["-af", *condor_q_attrs.values]

    output = call("condor_q", *args)
    parse_condor_q_output(output)
end
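To see which arguments `get_jobs` hands to `condor_q`, the argument-building part can be lifted into a standalone function. This is a sketch: `condor_q_args` is a hypothetical name, and the `attrs` default is a shortened stand-in for `condor_q_attrs.values`. The constraint strings are built exactly as in the listing, including the literal double quotes and the unquoted owner value.

```ruby
# Hypothetical standalone copy of the argument construction in get_jobs.
def condor_q_args(id: "", owner: nil, attrs: %w[ClusterId ProcId JobStatus])
  args = []
  unless id.to_s.empty?
    if id.to_s.include?(".") # "cluster.proc" form selects one job in a cluster
      cluster_id, proc_id = id.to_s.split(".")
      args.concat ["-constraint", "\"ClusterId == #{cluster_id} && ProcId == #{proc_id}\""]
    else # a bare id selects every job in that cluster
      args.concat ["-constraint", "\"ClusterId == #{id}\""]
    end
  end
  args.concat ["-constraint", "\"Owner == #{owner}\""] unless owner.to_s.empty?
  args.concat ["-af", *attrs]
end

by_proc  = condor_q_args(id: "123.4")
by_owner = condor_q_args(id: 7, owner: "alice")
no_filter = condor_q_args
```

With no id and no owner, only the `-af` attribute list is passed, so the query is not narrowed by this method. When both are given, two separate `-constraint` options are emitted.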

#get_slots ⇒ Array<Hash>

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Retrieve slot information using `condor_status`. This method takes no arguments.

Returns:

  • (Array<Hash>)

    list of details for slots

Raises:

  • (Error)

    if `condor_status` command exited unsuccessfully



# File 'lib/ood_core/job/adapters/htcondor.rb', line 208

def get_slots
    args = ["-af", "Machine", "TotalSlotCPUs", "TotalSlotGPUs", "TotalSlotMemory", "CPUs", "GPUs", "Memory", "NumDynamicSlots"]
    args.concat ["-constraint", "\"DynamicSlot is undefined\""]

    output = call("condor_status", *args)
    parse_condor_status_output(output)
end

#hold_job(id) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Place a job on hold using `condor_hold`

Parameters:

  • id (#to_s)

    the id of the job to hold

Raises:

  • (Error)

    if `condor_hold` command exited unsuccessfully



# File 'lib/ood_core/job/adapters/htcondor.rb', line 147

def hold_job(id)
    id = id.to_s
    call("condor_hold", id)
rescue Error => e
    raise Error, "Failed to hold job #{id}: #{e.message}"
end

#release_job(id) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Release a job from hold using `condor_release`

Parameters:

  • id (#to_s)

    the id of the job to release

Raises:

  • (Error)

    if `condor_release` command exited unsuccessfully



# File 'lib/ood_core/job/adapters/htcondor.rb', line 157

def release_job(id)
    id = id.to_s
    call("condor_release", id)
rescue Error => e
    raise Error, "Failed to release job #{id}: #{e.message}"
end

#remove_job(id) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Run the `condor_rm` command to remove a job

Parameters:

  • id (#to_s)

    the id of the job to remove

Raises:

  • (Error)

    if `condor_rm` command exited unsuccessfully



# File 'lib/ood_core/job/adapters/htcondor.rb', line 138

def remove_job(id)
    call("condor_rm", id.to_s)
rescue Error => e
    raise Error, "Failed to remove job #{id}: #{e.message}"
end
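`hold_job`, `release_job`, and `remove_job` share one pattern: run a command, and on failure re-raise the same error class with the job id added to the message. The sketch below reproduces that pattern with stand-ins. The top-level `Error` class and the always-failing `call` method are hypothetical substitutes for `Batch::Error` and the adapter's private `call` helper, whose real implementation is not shown on this page.

```ruby
# Hypothetical stand-in for OodCore::Job::Adapters::HTCondor::Batch::Error.
class Error < StandardError; end

# Hypothetical stand-in for the private `call` helper; it always fails
# so that the rescue path below is exercised.
def call(cmd, *_args)
  raise Error, "#{cmd} exited with status 1"
end

# Same shape as remove_job above: wrap the low-level error with context.
def remove_job(id)
  call("condor_rm", id.to_s)
rescue Error => e
  raise Error, "Failed to remove job #{id}: #{e.message}"
end

message =
  begin
    remove_job(42)
  rescue Error => e
    e.message
  end
```

Callers therefore only need to rescue one error class, and the message names both the job and the underlying command failure.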

#submit_string(args: [], script_args: [], env: {}, script: "") ⇒ String

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Submit a script to the batch server

Parameters:

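  • args (Array<#to_s>) (defaults to: [])

    arguments passed to `condor_submit` command

  • script_args (Array<#to_s>) (defaults to: [])

    arguments passed to the script; double quotes are replaced with single quotes

  • env (Hash{#to_s => #to_s}) (defaults to: {})

    environment variables set

  • script (String) (defaults to: "")

    the script to submit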

Returns:

  • (String)

    the id of the job that was created

Raises:

  • (Error)

    if `condor_submit` command exited unsuccessfully



# File 'lib/ood_core/job/adapters/htcondor.rb', line 119

def submit_string(args: [], script_args: [], env: {}, script: "")
    args = args.map(&:to_s)
    script_args = script_args.map(&:to_s).map { |s| s.to_s.gsub('"', "'") } # cannot do double
    env = env.to_h.each_with_object({}) { |(k, v), h| h[k.to_s] = v.to_s }

    path = "#{Dir.tmpdir}/htcondor_submit_#{SecureRandom.uuid}"

    call("bash", "-c", "cat > #{path}", stdin: script)
    output = call("condor_submit", *args, env: env, stdin: "arguments=#{path.split("/").last} #{script_args.join(" ")}\ntransfer_input_files=#{path}").strip

    match = output.match(/(cluster )?(\d+)/)
    raise Error, "Failed to parse job ID from output: #{output}" unless match
    match[2]

end
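The job id is taken from the first run of digits that the regular expression finds in the `condor_submit` output. The sketch below isolates that step in a hypothetical helper, `parse_job_id`, which raises `ArgumentError` instead of the adapter's `Error`. Both sample strings are assumptions about what the command might print, not output captured from the adapter.

```ruby
# Hypothetical standalone copy of the id extraction in submit_string.
def parse_job_id(output)
  match = output.match(/(cluster )?(\d+)/)
  raise ArgumentError, "Failed to parse job ID from output: #{output}" unless match
  match[2] # the digits; the optional "cluster " prefix is not required
end

# Assumed range-style output: the first digits are the cluster id.
terse_id = parse_job_id("42.0 - 42.0")

# Assumed sentence-style output: the first digits are the job count,
# because the regex matches leftmost and the prefix group is optional.
verbose_id = parse_job_id("1 job(s) submitted to cluster 42.")
```

Here `terse_id` is `"42"` while `verbose_id` is `"1"`, so the extraction depends on the cluster id being the first number in the output. Whether the adapter passes arguments that guarantee this is not visible on this page.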