HTCondor mini user guide
Description
Deploy HTCondor mini, a technology preview of an all-in-one ("minicondor") HTCondor installation. This type of installation is useful for testing and experimentation.
Prerequisites
Users must be registered in the INFN-Cloud IAM system (https://iam.cloud.infn.it/). Only registered users can log in to the INFN-Cloud dashboard (https://my.cloud.infn.it).
For more details regarding registration please see Getting Started.
User responsibilities
Important
The solution described in this guide consists of Virtual Machines instantiated on the INFN-Cloud infrastructure. Instantiating a VM comes with the responsibility of maintaining it and all the services it hosts.
Please read the INFN Cloud AUP in order to understand the responsibilities you have in managing this service.
Deployment of the service
After logging in to the INFN-Cloud dashboard, select the "HTCondor mini" button from the list of available solutions.
Select either Automatic or Manual scheduling.
In the first case, the Orchestrator chooses the best available provider. In the second case, the deployment is submitted directly to one of the available providers, which you select from the drop-down menu. With manual scheduling, the flavors displayed on the next page are those offered by the chosen provider.
Enter a Deployment description in the corresponding field and choose a flavor to specify the number of vCPUs and the memory size of the Virtual Machine.
Once the deployment is ready, it will be possible to access the VM via SSH.
Submit a simple job to HTCondor mini
- As a first step, switch to the submituser user by issuing the condor command:
~$ condor
[submituser@c9c00e2e28c8 ~]$
This command opens a user shell inside the Docker container, from which jobs can be submitted.
- Create a submit file like this:
[submituser@c9c00e2e28c8 ~]$ cat submit.sub
executable = /bin/hostname
output = output.txt
error = error.txt
log = log.txt
queue 1
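The guide shows the contents of submit.sub but not how the file is created. One possible way, sketched here as an assumption rather than the guide's prescribed method, is a shell here-document run from the submituser shell; any text editor available in the container would work equally well.

```shell
# Write the submit description file shown above into the current directory.
# The quoted 'EOF' delimiter keeps the shell from expanding anything inside.
cat > submit.sub <<'EOF'
executable = /bin/hostname
output = output.txt
error = error.txt
log = log.txt
queue 1
EOF

# Display the file to confirm its contents before submitting it.
cat submit.sub
```

The output, error and log entries name files that HTCondor writes relative to the directory from which the job is submitted, so it is simplest to run the submission from the same directory.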
- Then submit the job and query its status to check that it is running correctly:
[submituser@c9c00e2e28c8 ~]$ condor_submit submit.sub
Submitting job(s).
1 job(s) submitted to cluster 2.
[submituser@c9c00e2e28c8 ~]$
[submituser@c9c00e2e28c8 ~]$ condor_q
-- Schedd: c9c00e2e28c8 : <127.0.0.1:9618?... @ 04/26/23 14:25:51
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
submituser ID: 2 4/26 14:25 _ 1 _ 1 2.0
Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for submituser: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
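To check the job state from a script rather than by eye, the summary line printed by condor_q can be parsed. The following is a minimal sketch: running_jobs is a hypothetical helper name, and it assumes the "Total for query:" line has the format shown in the output above, which may differ between HTCondor versions.

```shell
# Hypothetical helper: read condor_q output on stdin and print the number of
# running jobs taken from the "Total for query:" summary line.
running_jobs() {
    sed -n 's/^Total for query:.* \([0-9][0-9]*\) running.*$/\1/p'
}

# Example using the summary line from the output above. On the VM the live
# output would be piped in instead:  condor_q | running_jobs
sample='Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended'
printf '%s\n' "$sample" | running_jobs
```

For the sample line this prints 1. If no summary line is present in the input, the helper prints nothing, which a calling script can treat as "state unknown".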
- It's also possible to check the status of the cluster:
[submituser@c9c00e2e28c8 ~]$ condor_status
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
slot1@c9c00e2e28c8 LINUX X86_64 Unclaimed Idle 0.000 1983 0+02:38:52
Total Owner Claimed Unclaimed Matched Preempting Backfill Drain
X86_64/LINUX 1 0 0 1 0 0 0 0
Total 1 0 0 1 0 0 0 0
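Once the slot is back to Unclaimed, as in the output above, the job has finished and its result can be read from the file named in submit.sub. The sketch below wraps that step in a hypothetical helper, fetch_result, which uses condor_wait to block until the job recorded in the log file completes. The 60-second timeout is an arbitrary assumption, and the helper expects to be run from the directory containing log.txt and output.txt.

```shell
# Hypothetical helper: wait for the job recorded in the given log file, then
# print the file named by "output" in submit.sub (output.txt in this guide).
fetch_result() {
    log_file="${1:-log.txt}"
    if ! command -v condor_wait >/dev/null 2>&1; then
        echo "condor_wait not found: run this inside the submituser shell" >&2
        return 1
    fi
    # -wait 60 gives up after 60 seconds instead of blocking indefinitely.
    condor_wait -wait 60 "$log_file" && cat output.txt
}
```

Calling fetch_result log.txt from the submituser shell should print the content of output.txt, which for this job is whatever /bin/hostname wrote when it ran in the container. If the job failed, error.txt and log.txt are the places to look.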