INFN Cloud CVMFS solution

Introduction

This guide provides a short description of the INFN-DataCloud CVMFS solution and the relevant how-tos.

Prerequisites

The user has to be registered in the IAM system for INFN-DataCloud https://iam.cloud.infn.it/login. Only authorized users can access the service, obtaining a personal or group CVMFS repository.

  • For more details regarding the registration process please see Getting Started. To use this service you don't need the "system admin" nomination.

What is CVMFS

CVMFS stands for CERN Virtual Machine File System and has been, since 2014, the de facto standard for distributing software environments throughout the WLCG grid. Clients mount repositories as a read-only filesystem and use only outgoing HTTP connections to reach the servers, which avoids most of the firewall-related issues affecting other network filesystems.

Technical documentation is available through the official website.

CVMFS in INFN-DataCloud

In the INFN-DataCloud implementation, three different approaches are available to publish artifacts to a CVMFS repository:

  • the "standard" publishing approach: using the cvmfs_server command line interface on a publisher;
  • using S3 access to the backend bucket: files uploaded to a "directory"1 called cvmfs on the S3 bucket are automatically published to CVMFS;
  • using the unpacked.infn.it repository: write access to this shared repository is obtained by pushing container images to our container registry.

The first two approaches are mutually exclusive: you may not have both working on the same CVMFS repository.

Additionally, by design, the name of a CVMFS repository using the standard approach must not collide with an existing S3 bucket in the INFN-DataCloud Object Storage service. For this reason, personal CVMFS repositories using the standard approach will be named as /cvmfs/<AAI username>-personalrepo.infn.it.
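As a small illustration of the naming rule above, the following sketch builds the repository name and mount path from a username. The function name personal_repo_name and the username jdoe are placeholders introduced here, not part of the service.

```shell
# Hypothetical helper: build the personal repository name used by the
# "standard" approach from an AAI username.
personal_repo_name() {
  printf '%s-personalrepo.infn.it\n' "$1"
}

# "jdoe" is a placeholder username.
echo "/cvmfs/$(personal_repo_name jdoe)"
```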

How to request an INFN-DataCloud CVMFS repository

Info

For the moment, human interaction is required. In the future this may change and some SaaS button may automatise parts of this procedure.

To request an INFN-DataCloud CVMFS repository it is necessary to file a ticket to the user support group, by either writing an email to cloud-support@infn.it or via browser by connecting to https://servicedesk.cloud.infn.it.

In the ticket, please specify whether a group or personal repository is requested and the preferred approaches to enable.

If the "standard" approach is to be enabled, you will receive via e-mail a password file defined by the CVMFS managers. To ensure the confidentiality of this secret, you are asked to create a public/private key pair: the public key is used to encrypt the secret and the private key to decrypt it.

To generate the private key (secret.pem, to be kept secret) and the public one (public.pem) issue the following commands on your terminal:

openssl genrsa -aes256 -out secret.pem 2048
openssl rsa -pubout -in secret.pem -out public.pem

They will create the two files. The public.pem file has to be attached to the ticket. It will be used by the CVMFS managers to encrypt the password file.

To decrypt the received password file, named <repo-name>.infn.it.gw.encrypted, use the private key via openssl:

openssl rsautl -decrypt -inkey secret.pem -in <repo-name>.infn.it.gw.encrypted -out <repo-name>.infn.it.gw

obtaining the unencrypted version, <repo-name>.infn.it.gw, of the file.
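On OpenSSL 3.x the rsautl subcommand is marked as deprecated in favour of pkeyutl. The sketch below runs a local round trip with a throwaway key pair and a dummy file, so it can be tried without the real files. All file names are placeholders, the demo key has no passphrase for convenience only, and it is an assumption that the real file is encrypted with the default RSA padding.

```shell
# Work in a scratch directory with a dummy "secret".
workdir=$(mktemp -d)
printf 'dummy-gateway-key\n' > "$workdir/demo.gw"

# Throwaway key pair (no passphrase, for demonstration only).
openssl genrsa -out "$workdir/demo-secret.pem" 2048 2>/dev/null
openssl rsa -pubout -in "$workdir/demo-secret.pem" \
  -out "$workdir/demo-public.pem" 2>/dev/null

# What a sender would do with the public key.
openssl pkeyutl -encrypt -pubin -inkey "$workdir/demo-public.pem" \
  -in "$workdir/demo.gw" -out "$workdir/demo.gw.encrypted"

# pkeyutl counterpart of the rsautl -decrypt step above.
openssl pkeyutl -decrypt -inkey "$workdir/demo-secret.pem" \
  -in "$workdir/demo.gw.encrypted" -out "$workdir/demo.gw.decrypted"

# The decrypted file should be identical to the original.
cmp "$workdir/demo.gw" "$workdir/demo.gw.decrypted" && echo "round trip OK"
```

If the round trip succeeds, the same pkeyutl -decrypt invocation with secret.pem and the received file should be a drop-in replacement for the rsautl command.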

Along with <repo-name>.infn.it.gw.encrypted you will receive a copy of the DataCloud CVMFS common public key, named <repo-name>.infn.it.pub, and a certificate file named <repo-name>.infn.it.crt. You'll find below how to use them.

How to publish to a CVMFS repository

Warning

Files published on CVMFS have to be considered public. Do not publish files containing sensitive information on CVMFS, such as passwords, private keys, P12 certificates, VOMS proxies, personal information or photos.

Note

The file propagation from the CVMFS stratum 0 to the stratum 1s and then to the clients is not an atomic filesystem operation. Large latency, even of the order of hours, is to be expected depending on the complexity of the network and the number of clients.

In the following subsections you'll learn how to publish contents to a CVMFS repository in the three possible approaches.

Using a publisher

In the CVMFS jargon, a publisher is a server configured to write to a CVMFS repository using the cvmfs_server publish command.

To install a publisher, we suggest creating a dedicated virtual machine. It will work on either a private or a public network.

This section of the guide tells how to install a publisher and how to write to a repository.

Software installation on AlmaLinux

sudo dnf update -y
sudo dnf install -y https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo dnf install -y cvmfs cvmfs-server

Software installation on Ubuntu or Debian

sudo apt-get update
curl -LO https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
sudo dpkg -i cvmfs-release-latest_all.deb && rm cvmfs-release-latest_all.deb
sudo apt-get update
sudo apt-get install -y cvmfs cvmfs-server

Repository setup on the publisher

Create a folder (in this example called cvmfs-datacloud and located in the home directory of an unprivileged user) and put the following three files inside it:

  • <repo-name>.infn.it.gw (the decrypted one)
  • <repo-name>.infn.it.pub
  • <repo-name>.infn.it.crt
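Before formatting the repository, it can be worth checking that the three files listed above are in place. The following is a minimal sketch; the function name check_keys_dir is hypothetical and not part of the CVMFS tooling.

```shell
# Hypothetical helper: check that the .gw, .pub and .crt files for a
# repository are present and non-empty in the given folder.
# Arguments: <folder> <repository name>
check_keys_dir() {
  local dir="$1" repo="$2" ext missing=0
  for ext in gw pub crt; do
    if [ ! -s "$dir/$repo.$ext" ]; then
      echo "missing or empty: $dir/$repo.$ext" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_keys_dir ~/cvmfs-datacloud "<repo-name>.infn.it" returns a non-zero exit status and lists the offending files if something is missing.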

Then "format" the repository using the following commands:

sudo cvmfs_server mkfs -w https://rgw.cloud.infn.it:443/cvmfs-prod/<repo-name>.infn.it \
  -u gw,/srv/cvmfs/<repo-name>.infn.it/data/txn,http://cvmfs.cloud.infn.it:4929/api/v1 \
  -k ~/cvmfs-datacloud/ -o $(id -nu) <repo-name>.infn.it
sudo systemctl daemon-reload

If both commands succeed, cvmfs_server copies the three files to a dedicated folder under /etc/cvmfs/ and prepares the VM to publish new content.

How to publish to CVMFS using a publisher

The CVMFS publishing procedure is transactional: you first tell the server to prepare for publishing by starting a transaction, then operate on the files, and finally decide whether to finalise the transaction, i.e. publish the content, or abort it, i.e. restore the state from before the beginning of the transaction.

To start a new transaction, type the following command on the publisher:

cvmfs_server transaction <repo-name>.infn.it

Once a transaction is started you cannot start a new one until you close the current one.

Then you operate on the files by simply copying, moving or deleting files under the local /cvmfs/<repo-name>.infn.it/ folder. You can use regular tools such as cp, rm and rsync.

When you are satisfied with the content of the repository, you can finalise the transaction by issuing the following command:

cvmfs_server publish

Alternatively, to abort the current transaction, the command is:

cvmfs_server abort
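The transaction, copy and publish-or-abort cycle can be wrapped in a small function, sketched below. The function name publish_dir and the CVMFS_ROOT override (useful only for dry runs against a scratch directory) are assumptions made for this example. The repository name is passed explicitly to each cvmfs_server call, and abort is given -f to skip the confirmation prompt.

```shell
# Hypothetical helper: publish the contents of a local directory in a
# single transaction, aborting the transaction if the copy fails.
# Arguments: <repository name> <source directory>
publish_dir() {
  local repo="$1" src="$2"
  # CVMFS_ROOT is an assumption for dry runs; it defaults to /cvmfs.
  local dest="${CVMFS_ROOT:-/cvmfs}/${repo}"

  # Open the transaction; stop here if one is already open or it fails.
  cvmfs_server transaction "$repo" || return 1

  if cp -r "$src"/. "$dest"/; then
    # Copy succeeded: finalise the transaction.
    cvmfs_server publish "$repo"
  else
    # Copy failed: roll back to the state before the transaction.
    cvmfs_server abort -f "$repo"
    return 1
  fi
}
```

A call such as publish_dir "<repo-name>.infn.it" ./my-software would then either publish the whole directory or leave the repository untouched.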

Using the S3 interface

Warning

This approach is currently not available. For the moment refer to the other approaches.

Enabled buckets can be used to publish to an INFN-DataCloud CVMFS repository by simply uploading files into their cvmfs folder.

A back-end agent takes care automatically of the required filesystem operations and publishing.

Users can request a personal or group repository.

  • Personal repository:
    • The CVMFS repository will be /cvmfs/<iam_username>.infn.it and can be requested by any user registered in INFN Cloud
  • Group repository:
    • The CVMFS repository will be /cvmfs/<group_name>.infn.it and can be requested by the person responsible for that group/experiment.
    • Along with the request, the person responsible for that experiment/group also specifies which IAM users and/or groups must be able to publish in this repository.

To upload a file:

  • Log in to the S3 object storage at https://s3webui.cloud.infn.it/.
  • Select the section "Browser".
  • Click on your bucket and create the cvmfs folder using the button "New path".

Using unpacked.infn.it

unpacked.infn.it is a special CVMFS repository dedicated to publishing "unpacked" container images usable with Apptainer.

Since it is a collection of container images, the user interface for publishing to it is the INFN-DataCloud container registry, https://harbor.cloud.infn.it.

Users can request a group or personal repository, which corresponds to a Harbor project.

By pushing, e.g. via a docker push command, to the harbor project, the image will be automatically published on CVMFS.

The path of the image on CVMFS is /cvmfs/unpacked.infn.it/harbor.cloud.infn.it/unpacked-<user/group/experiment>/<image-name>:<tag>.
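The path scheme above can be sketched as a small helper that prints the registry reference to push to and the resulting CVMFS path. The function name and the example values (myexp, analysis, 1.0) are placeholders, and it is an assumption that the Harbor project is named unpacked-<user/group/experiment>, as the CVMFS path suggests.

```shell
# Hypothetical helper: derive the registry reference and the CVMFS path
# of an image.
# Arguments: <user/group/experiment name> <image-name> <tag>
unpacked_image_paths() {
  local owner="$1" image="$2" tag="$3"
  local ref="harbor.cloud.infn.it/unpacked-${owner}/${image}:${tag}"
  echo "registry: ${ref}"
  echo "cvmfs: /cvmfs/unpacked.infn.it/${ref}"
}

# Placeholder names, for illustration only.
unpacked_image_paths myexp analysis 1.0
```

The first line is what one would pass to docker tag and docker push. Once the image has propagated, the second line can be given to apptainer exec as the container to run.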

Note

To proceed with the operations above, the user, or at least one member of the group (in case of a group repository), must have logged in to Harbor at least once beforehand. The default quota of the Harbor project is 50 GB. If more space is needed, request a higher quota before the project is created.

Client access to a CVMFS repository

Italian grid sites

All Italian WLCG grid sites, i.e. the INFN-T1 at CNAF and the Tier-2s, are already configured to mount INFN DataCloud CVMFS repositories.

In other words, you should already be able to access them. Please file a ticket by writing to cloud-support@infn.it if you experience issues in accessing a DataCloud CVMFS repository on the INFN-T1 or a Tier-2.

Other computing resources (virtual machines, laptops, any other)

To use CVMFS on a self-managed computing resource, you first need to ensure the correct installation of the CVMFS client. Please find detailed instructions for all the supported platforms on the official webpage or find in the following sections the instructions for the platforms supported by INFN-DataCloud.

Software installation on AlmaLinux 9

sudo dnf update -y
sudo dnf install -y https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo dnf install -y cvmfs

Software installation on Ubuntu or Debian

sudo apt-get update
curl -LO https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
sudo dpkg -i cvmfs-release-latest_all.deb && rm cvmfs-release-latest_all.deb
sudo apt-get update
sudo apt-get install -y cvmfs

Configuration files

After the software installation, a couple of additional steps are required to enable the INFN-DataCloud CVMFS repositories on the client.

Put the following public key inside the /etc/cvmfs/keys/infn.it/common.infn.it.pub file:

-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAn7MHm+TkYjyLmuQKOL2x
IU/DPHiRusqzVjvnILGaDfX2J9/DNwyJ8G3JNhP9Ivm5XuoNm+rgGxweHvMTC1/7
S9I2d5Ur4AyGDoXXmFj+nmd8yi+cU+n2AFaF9BAtr8pJZSVDISsNsa7MXqwc4AHi
E3lc2xxDH9uH2t6dOaNvAEB9T/LhqYJg7UlSJaXm4kKT0ys/C6EL5KlpQPkHKYGO
+ucZpilj/v9cuTu7N2GPLXHtU8m02CfY6N4BC1PoEdhZ6ZirAcTDJU6hENnzL+2h
K4p5DRuZxuROjYozkhLp6N1zm1ih+lRUnsU2zXyOpTFOEP2kZzS++yKi+l/jd3+b
fwIDAQAB
-----END PUBLIC KEY-----

using your favourite text editor launched with root privileges, e.g. sudo vim /etc/cvmfs/keys/infn.it/common.infn.it.pub.

Then a second file, /etc/cvmfs/domain.d/infn.it.conf, has to be created with the following content:

CVMFS_HTTP_PROXY=DIRECT
CVMFS_SERVER_URL="http://cvmfs-stratum1-cnaf.cloud.infn.it:8000/cvmfs/@fqrn@;http://cvmfs-stratum1-bari.cloud.infn.it:8000/cvmfs/@fqrn@"
CVMFS_KEYS_DIR="/etc/cvmfs/keys/infn.it"

again with root privileges.
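As an alternative to a text editor, the domain configuration can be written with a here-document. This is a minimal sketch: the function name write_infn_conf is hypothetical, and the optional argument lets you point it at a scratch directory to inspect the result before writing to /etc/cvmfs.

```shell
# Hypothetical helper: write infn.it.conf below a CVMFS configuration
# directory (default: /etc/cvmfs, which requires root privileges).
write_infn_conf() {
  local etc_dir="${1:-/etc/cvmfs}"
  mkdir -p "$etc_dir/domain.d"
  # The quoted 'EOF' prevents any shell expansion inside the content.
  cat > "$etc_dir/domain.d/infn.it.conf" <<'EOF'
CVMFS_HTTP_PROXY=DIRECT
CVMFS_SERVER_URL="http://cvmfs-stratum1-cnaf.cloud.infn.it:8000/cvmfs/@fqrn@;http://cvmfs-stratum1-bari.cloud.infn.it:8000/cvmfs/@fqrn@"
CVMFS_KEYS_DIR="/etc/cvmfs/keys/infn.it"
EOF
}
```

Calling write_infn_conf /tmp/cvmfs-test writes the file under a scratch directory. Calling it without arguments from a root shell writes the real configuration file.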

Finally, issue the following commands to finalise the client configuration and check that everything is working properly:

sudo systemctl enable --now autofs
sudo cvmfs_config setup
sudo cvmfs_config chksetup # should print 'OK'

Note

CVMFS repositories are mounted by autofs upon first access. Don't be surprised if ls /cvmfs/ shows no output. Typing ls /cvmfs/<repo-name>.infn.it/ is sufficient to trigger the automatic mount.
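Because newly published files can take a while to reach the clients, a script that depends on a fresh file may want to poll for it. The following is a minimal sketch; the function name wait_for_path and its default timeout and interval are assumptions, not part of the CVMFS client. Testing the path also accesses the repository, which is what triggers the autofs mount.

```shell
# Hypothetical helper: poll until a path becomes visible, or give up.
# Arguments: <path> [timeout in seconds, default 3600]
#            [poll interval in seconds, default 60]
wait_for_path() {
  local path="$1" timeout="${2:-3600}" interval="${3:-60}" waited=0
  until [ -e "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $path" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$path is available"
}
```

For example, wait_for_path "/cvmfs/<repo-name>.infn.it/some/file" 7200 120 checks every two minutes for up to two hours.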


  1. There is no such thing as a directory in S3. The abstraction is done at the object level.