HTCondor mini user guide

Description

This solution deploys HTCondor mini, a technology preview of an all-in-one (“minicondor”) HTCondor installation in which all the HTCondor services run on a single machine. This type of installation is useful for testing and experimentation.

Prerequisites

Users must be registered in the INFN-Cloud IAM system (https://iam.cloud.infn.it/). Only registered users can log in to the INFN-Cloud dashboard (https://my.cloud.infn.it).

User responsibilities

Important

The solution described in this guide consists of instantiating Virtual Machines on the INFN-Cloud infrastructure. Instantiating a VM comes with the responsibility of maintaining it and all the services it hosts.

Please read the INFN-Cloud Acceptable Use Policy (AUP) to understand your responsibilities in managing this service.

Deployment of the service

After logging in to the INFN-Cloud dashboard, select the “HTCondor mini” button from the list of available solutions:

[Figure: the “HTCondor mini” button in the INFN-Cloud dashboard (htcondor_mini_panel.png)]

Enter a deployment description in the corresponding field and choose a flavour, which sets the number of vCPUs and the memory size of the Virtual Machine, as shown in the image below:

[Figure: HTCondor mini deployment configuration form (htcondor_mini_configuration.png)]

Once the deployment is ready, you can access the VM via SSH.

Submit a simple job to HTCondor mini

  • As a first step, switch to the submituser user by running the condor command:

    ~$ condor
    [submituser@c9c00e2e28c8 ~]$
    

This command opens a shell as submituser inside the Docker container that runs HTCondor mini. From this shell you can submit jobs.

  • Create a submit file, for example submit.sub, with the following content:

    [submituser@c9c00e2e28c8 ~]$ cat submit.sub
    executable = /bin/hostname
    
    output = output.txt
    error = error.txt
    log = log.txt
    queue 1
    
  • Then submit the job and check its status to verify that it is running correctly:

    [submituser@c9c00e2e28c8 ~]$ condor_submit submit.sub
    Submitting job(s).
    1 job(s) submitted to cluster 2.
    [submituser@c9c00e2e28c8 ~]$
    [submituser@c9c00e2e28c8 ~]$ condor_q
    
    
    -- Schedd: c9c00e2e28c8 : <127.0.0.1:9618?... @ 04/26/23 14:25:51
    OWNER      BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
    submituser ID: 2        4/26 14:25      _      1      _      1 2.0
    
    Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
    Total for submituser: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
    Total for all users: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
    
  • You can also check the status of the HTCondor pool, that is, of the execute slots available to run jobs:

    [submituser@c9c00e2e28c8 ~]$ condor_status
    Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
    
    slot1@c9c00e2e28c8 LINUX      X86_64 Unclaimed Idle      0.000 1983  0+02:38:52
    
              Total Owner Claimed Unclaimed Matched Preempting Backfill  Drain
    
    X86_64/LINUX     1     0       0         1       0          0        0      0
    
           Total     1     0       0         1       0          0        0      0
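The submit file used above queues a single job. As a minimal sketch of a common next step, the shell snippet below writes a second submit file that queues three instances of the same executable; the file name submit_multi.sub and the job count are hypothetical choices made for illustration. HTCondor expands the $(Process) macro to each job's process number (0, 1, 2), so every job writes its own output and error files, while all three share one log file. The heredoc delimiter is quoted ('EOF') so that the shell does not try to interpret $(Process) as a command substitution.

```shell
# Hypothetical file name and job count, chosen for illustration only.
# The quoted 'EOF' keeps $(Process) literal, so HTCondor (not the shell) expands it.
cat > submit_multi.sub <<'EOF'
executable = /bin/hostname

output = output.$(Process).txt
error = error.$(Process).txt
log = log.txt
queue 3
EOF
```

From the submituser shell you could then run condor_submit submit_multi.sub; condor_q should list the new cluster with a total of three jobs, and once they complete, each output.N.txt file should contain the hostname of the machine that ran the job.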