INFN Cloud CVMFS solution
Introduction
This guide gives a short description of the INFN-DataCloud CVMFS solution and the relevant how-tos.
Prerequisites
The user has to be registered in the IAM system for INFN-DataCloud https://iam.cloud.infn.it/login. Only authorized users can access the service, obtaining a personal or group CVMFS repository.
- For more details about the registration process, please see Getting Started. You do not need the "system admin" nomination to use this service.
What is CVMFS
CVMFS stands for CERN Virtual Machine File System and has been, since 2014, the de-facto standard for distributing software environments throughout the WLCG grid. Clients reach the server through outgoing HTTP connections only, which avoids most of the firewall-related issues affecting other network filesystems, and they mount the repository as a read-only filesystem.
Technical documentation is available through the official website.
CVMFS in INFN-DataCloud
In the INFN-DataCloud implementation, three different approaches are available to publish artifacts to a CVMFS repository:
- the "standard" publishing approach: using the cvmfs_server command line interface on a publisher;
- using S3 access to the backend bucket: files uploaded to a "directory"[1] called cvmfs on the S3 bucket are automatically published to CVMFS;
- using the unpacked.infn.it repository: write access to this shared repository is obtained by pushing container images to our container registry.
The first two approaches are mutually exclusive: you may not have both working on the same CVMFS repository.
Additionally, by design, the name of a CVMFS repository using the
standard approach must not collide with an existing S3 bucket in the
INFN-DataCloud Object Storage service. For this reason, personal CVMFS
repositories using the standard approach will be named as
/cvmfs/<AAI username>-personalrepo.infn.it.
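As a small illustration of the naming rule above, the mount path of a personal repository that uses the standard approach can be derived from the AAI username. The function name and the username `jdoe` are hypothetical and used only for this sketch.

```shell
# Build the CVMFS path of a personal repository (standard approach)
# from an AAI username, following the naming rule described above.
personal_repo_path() {
    # $1: AAI username (hypothetical example: jdoe)
    printf '/cvmfs/%s-personalrepo.infn.it\n' "$1"
}

# Example call:
#   personal_repo_path jdoe
```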
How to request an INFN-DataCloud CVMFS repository
Info
For the moment, human interaction is required. In the future this may change and some SaaS button may automatise parts of this procedure.
To request an INFN-DataCloud CVMFS repository it is necessary to file a ticket to the user support group, by either writing an email to cloud-support@infn.it or via browser by connecting to https://servicedesk.cloud.infn.it.
In the ticket, please specify whether a group or personal repository is requested and the preferred approaches to enable.
If the "standard" approach is to be enabled, you will receive a password file, defined by the CVMFS managers, via e-mail. To ensure the confidentiality of this secret password, you are asked to create a public/private key pair: the public key is used to encrypt the secret and the private key to decrypt it.
To generate the private key (secret.pem, to be kept secret) and the
public one (public.pem) issue the following commands on your terminal:
openssl genrsa -aes256 -out secret.pem 2048
openssl rsa -pubout -in secret.pem -out public.pem
They will create the two files. The public.pem file has to be attached
to the ticket. It will be used by the CVMFS managers to encrypt the
password file.
To decrypt the received password file, named
<repo-name>.infn.it.gw.encrypted, use the private key via
openssl:
openssl rsautl -decrypt -inkey secret.pem -in <repo-name>.infn.it.gw.encrypted -out <repo-name>.infn.it.gw
obtaining the unencrypted version, <repo-name>.infn.it.gw, of the file.
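Before attaching a public key to the ticket, it may help to see the encrypt/decrypt round trip work locally. The sketch below is only a self-test under assumptions: it generates a throwaway key pair without a passphrase in a temporary directory, and it uses `openssl pkeyutl`, the newer counterpart of `openssl rsautl`. The exact options used by the CVMFS managers to encrypt the real file are not stated in this guide, and the function and file names here are hypothetical.

```shell
# Round-trip self-test with a THROWAWAY key pair (never reuse it for the ticket).
# Runs in a subshell so the temporary directory and "cd" do not leak out.
rsa_roundtrip_ok() (
    workdir=$(mktemp -d) || exit 1
    trap 'rm -rf "$workdir"' EXIT
    cd "$workdir" || exit 1
    # Throwaway private key, unencrypted so that no passphrase prompt appears
    openssl genrsa -out test-secret.pem 2048 2>/dev/null &&
    openssl rsa -pubout -in test-secret.pem -out test-public.pem 2>/dev/null &&
    # Stand-in for the password file
    printf 'dummy-password\n' > sample.gw &&
    # Encrypt with the public key, decrypt with the private key
    openssl pkeyutl -encrypt -pubin -inkey test-public.pem \
        -in sample.gw -out sample.gw.encrypted &&
    openssl pkeyutl -decrypt -inkey test-secret.pem \
        -in sample.gw.encrypted -out sample.gw.decrypted &&
    # The decrypted file must be identical to the original
    cmp -s sample.gw sample.gw.decrypted &&
    echo "round trip OK"
)
```

Calling `rsa_roundtrip_ok` prints a confirmation only if the decrypted file matches the original.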
Along with the <repo-name>.infn.it.gw.encrypted file you will receive a
copy of the DataCloud CVMFS common public key, named
<repo-name>.infn.it.pub, and a certificate file named
<repo-name>.infn.it.crt. How to use them is explained below.
How to publish to a CVMFS repository
Warning
Files published on CVMFS have to be considered public. Do not publish files containing sensitive information, such as passwords, private keys, P12 certificates, VOMS proxies, personal information or photos, on CVMFS.
Note
The file propagation from the CVMFS stratum 0 to the stratum 1s and then to the clients is not an atomic filesystem operation. Large latency, even of the order of hours, is to be expected depending on the complexity of the network and the number of clients.
In the following subsections you'll learn how to publish contents to a CVMFS repository in the three possible approaches.
Using a publisher
In the CVMFS jargon, a publisher is a server configured to write to a
CVMFS repository using the cvmfs_server publish command.
To install a publisher we suggest creating a dedicated virtual machine. It can be on either a private or a public network.
This section of the guide explains how to install a publisher and how to write to a repository.
Software installation on AlmaLinux
sudo dnf update -y
sudo dnf install -y https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo dnf install -y cvmfs cvmfs-server
Software installation on Ubuntu or Debian
sudo apt-get update
curl -LO https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
sudo dpkg -i cvmfs-release-latest_all.deb && rm cvmfs-release-latest_all.deb
sudo apt-get update
sudo apt-get install -y cvmfs cvmfs-server
Repository setup on the publisher
Create a folder, in this example called cvmfs-datacloud and located in the
home directory of an unprivileged user, and put the following three files inside it:
- <repo-name>.infn.it.gw (the decrypted one)
- <repo-name>.infn.it.pub
- <repo-name>.infn.it.crt
Then "format" the repository using the following commands:
sudo cvmfs_server mkfs -w https://rgw.cloud.infn.it:443/cvmfs-prod/<repo-name>.infn.it \
-u gw,/srv/cvmfs/<repo-name>.infn.it/data/txn,http://cvmfs.cloud.infn.it:4929/api/v1 \
-k ~/cvmfs-datacloud/ -o $(id -nu) <repo-name>.infn.it
sudo systemctl daemon-reload
If both commands are successful, the CVMFS server will copy the three files into a
dedicated folder under /etc/cvmfs/ and prepare the VM to publish new
contents.
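Since the mkfs step relies on the three files being present in the key folder, a small preflight check run before `cvmfs_server mkfs` can catch a missing file early. This is only a sketch: the function name is hypothetical, and the repository name is passed in full (for example `<repo-name>.infn.it`).

```shell
# Check that the .gw, .pub and .crt files of a repository exist in a folder.
# Returns 0 if all three are present, 1 otherwise.
check_repo_files() {
    dir="$1"     # folder holding the files, e.g. ~/cvmfs-datacloud
    repo="$2"    # fully qualified repository name
    missing=0
    for ext in gw pub crt; do
        if [ ! -f "$dir/$repo.$ext" ]; then
            echo "missing: $dir/$repo.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call:
#   check_repo_files ~/cvmfs-datacloud <repo-name>.infn.it
```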
How to publish to CVMFS using a publisher
The CVMFS publishing procedure is transactional: you first tell the server to prepare for publishing by starting a transaction, then operate on the files, and finally decide whether to finalise the transaction, i.e. publish the content, or abort it, i.e. restore the state that preceded the beginning of the transaction.
To start a new transaction, type the following command on the publisher:
cvmfs_server transaction <repo-name>.infn.it
Once a transaction is started you cannot start a new one until you close the current one.
Then you operate on the files by simply copying, moving or deleting
files under the local /cvmfs/<repo-name>.infn.it/ folder. You can use
the regular tools like cp, rm, rsync, etc...
When you are satisfied with the content of the repository, you can finalise the transaction by issuing the following command:
cvmfs_server publish
Alternatively, to abort the current transaction, the command is:
cvmfs_server abort
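The transaction, publish and abort steps above can be combined in a small wrapper that publishes only if the file operation succeeds and aborts otherwise. This is a sketch under assumptions: the function name is hypothetical, and `-f` is passed to `cvmfs_server abort` so that the script does not stop at the confirmation prompt.

```shell
# Run a command inside a CVMFS transaction:
# publish if the command succeeds, abort (roll back) if it fails.
publish_with_rollback() {
    repo="$1"    # fully qualified repository name
    shift        # the remaining arguments are the command to run
    cvmfs_server transaction "$repo" || return 1
    if "$@"; then
        cvmfs_server publish "$repo"
    else
        # -f: abort without asking for confirmation
        cvmfs_server abort -f "$repo"
        return 1
    fi
}

# Example call (the source folder ./build/ is hypothetical):
#   publish_with_rollback <repo-name>.infn.it \
#       rsync -a ./build/ /cvmfs/<repo-name>.infn.it/
```

Wrapping the steps this way keeps a failed copy from leaving a transaction open, which would otherwise block the next one.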
Using the S3 interface
Warning
This approach is currently not available. For the moment refer to the other approaches.
Enabled buckets can be used to publish to an INFN-DataCloud CVMFS
repository by simply uploading files into their cvmfs folder.
A back-end agent takes care automatically of the required filesystem operations and publishing.
Users can request a personal or group repository.
- Personal repository:
  - The CVMFS repository will be /cvmfs/<iam_username>.infn.it and can be requested by any user registered in INFN Cloud.
- Group repository:
  - The CVMFS repository will be /cvmfs/<group_name>.infn.it and can be requested by the person responsible for that group/experiment.
  - Along with the request, the person responsible for that experiment/group also specifies which IAM users and/or groups must be able to publish in this repository.
To upload a file:
- Log in to the S3 object storage at https://s3webui.cloud.infn.it/.
- Select the section "Browser".
- Click on your bucket and create the cvmfs folder using the button "New path".
Using unpacked.infn.it
unpacked.infn.it is a special CVMFS repository designated for
publishing "unpacked" container images usable with
apptainer.
Since it is a collection of container images, the user interface for publishing on it is the INFN-DataCloud container registry, https://harbor.cloud.infn.it.
Users can request a group or personal repository, which corresponds to a Harbor project.
By pushing, e.g. via a docker push command, to the Harbor project, the
image will be automatically published on CVMFS.
The path of the image on CVMFS is
/cvmfs/unpacked.infn.it/harbor.cloud.infn.it/unpacked-<user/group/experiment>/<image-name>:<tag>.
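The path pattern above can be assembled with a small helper, for example to pass the result to apptainer. This is a sketch: the function name and the sample values (`myexp`, `myimage`, `v1`) are hypothetical.

```shell
# Build the CVMFS path of an unpacked image from the project suffix
# (user, group or experiment name), the image name and the tag.
unpacked_image_path() {
    printf '/cvmfs/unpacked.infn.it/harbor.cloud.infn.it/unpacked-%s/%s:%s\n' \
        "$1" "$2" "$3"
}

# Example call with hypothetical names:
#   apptainer exec "$(unpacked_image_path myexp myimage v1)" cat /etc/os-release
```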
Note
To proceed with the operations above, the user, or at least one member of the group (in case of a group repository), must have logged in to Harbor at least once beforehand. The default quota of the Harbor project is 50 GB. If more space is needed, this must be stated in the request before the project is created.
Client access to a CVMFS repository
Italian grid sites
All Italian WLCG grid sites, i.e. the INFN-T1 at CNAF and the Tier-2s, are already configured to mount INFN DataCloud CVMFS repositories.
In other words, you should already be able to access them. Please file a ticket by writing to cloud-support@infn.it if you experience issues in accessing a DataCloud CVMFS repository on the INFN-T1 or a Tier-2.
Other computing resources (virtual machines, laptops, any other)
To use CVMFS on a self-managed computing resource, you first need to install the CVMFS client correctly. Detailed instructions for all the supported platforms are available on the official webpage; the following sections give the instructions for the platforms supported by INFN-DataCloud.
Software installation on AlmaLinux 9
sudo dnf update -y
sudo dnf install -y https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo dnf install -y cvmfs
Software installation on Ubuntu or Debian
sudo apt-get update
curl -LO https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
sudo dpkg -i cvmfs-release-latest_all.deb && rm cvmfs-release-latest_all.deb
sudo apt-get update
sudo apt-get install -y cvmfs
Configuration files
After the software installation, a couple of additional steps are required to enable the INFN-DataCloud repositories.
Put the following public key inside the
/etc/cvmfs/keys/infn.it/common.infn.it.pub file:
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAn7MHm+TkYjyLmuQKOL2x
IU/DPHiRusqzVjvnILGaDfX2J9/DNwyJ8G3JNhP9Ivm5XuoNm+rgGxweHvMTC1/7
S9I2d5Ur4AyGDoXXmFj+nmd8yi+cU+n2AFaF9BAtr8pJZSVDISsNsa7MXqwc4AHi
E3lc2xxDH9uH2t6dOaNvAEB9T/LhqYJg7UlSJaXm4kKT0ys/C6EL5KlpQPkHKYGO
+ucZpilj/v9cuTu7N2GPLXHtU8m02CfY6N4BC1PoEdhZ6ZirAcTDJU6hENnzL+2h
K4p5DRuZxuROjYozkhLp6N1zm1ih+lRUnsU2zXyOpTFOEP2kZzS++yKi+l/jd3+b
fwIDAQAB
-----END PUBLIC KEY-----
using your favourite text editor launched with root privileges, e.g.
sudo vim /etc/cvmfs/keys/infn.it/common.infn.it.pub.
Then a second file, /etc/cvmfs/domain.d/infn.it.conf, has to be
created with the following content:
CVMFS_HTTP_PROXY=DIRECT
CVMFS_SERVER_URL="http://cvmfs-stratum1-cnaf.cloud.infn.it:8000/cvmfs/@fqrn@;http://cvmfs-stratum1-bari.cloud.infn.it:8000/cvmfs/@fqrn@"
CVMFS_KEYS_DIR="/etc/cvmfs/keys/infn.it"
again with root privileges.
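As an alternative to a text editor, the same three lines can be written by a short script. The sketch below writes them to a path given as argument, so that the result can be inspected before it is installed with root privileges; the function name and the temporary path in the usage note are hypothetical.

```shell
# Write the INFN-DataCloud domain configuration to the given file
# (default: the system location, which requires root privileges).
write_infn_domain_conf() {
    target="${1:-/etc/cvmfs/domain.d/infn.it.conf}"
    cat > "$target" <<'EOF'
CVMFS_HTTP_PROXY=DIRECT
CVMFS_SERVER_URL="http://cvmfs-stratum1-cnaf.cloud.infn.it:8000/cvmfs/@fqrn@;http://cvmfs-stratum1-bari.cloud.infn.it:8000/cvmfs/@fqrn@"
CVMFS_KEYS_DIR="/etc/cvmfs/keys/infn.it"
EOF
}

# Example: write to a temporary file, then install it as root.
#   write_infn_domain_conf /tmp/infn.it.conf
#   sudo install -m 644 /tmp/infn.it.conf /etc/cvmfs/domain.d/infn.it.conf
```

The quoted heredoc delimiter keeps the `@fqrn@` placeholders and quotes exactly as written.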
Finally, issue the following commands to finalise the client configuration and check that everything is working properly:
sudo systemctl enable --now autofs
sudo cvmfs_config setup
sudo cvmfs_config chksetup # should respond 'OK'
Note
CVMFS repositories are mounted by autofs upon the first access.
Don't be surprised if an ls /cvmfs/ command shows no output.
Just typing ls /cvmfs/<repo-name>.infn.it/ is sufficient
to trigger the automatic mount.
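In a script, the same access can serve as a simple availability check. This is a minimal sketch: the function name is hypothetical, and the `CVMFS_ROOT` variable is an assumption introduced here only so that the check can be tried against another directory; by default it points to /cvmfs.

```shell
# Return 0 if the repository directory can be listed, which on a
# configured client also triggers the autofs mount on first access.
repo_accessible() {
    root="${CVMFS_ROOT:-/cvmfs}"
    ls "$root/$1/" >/dev/null 2>&1
}

# Example call:
#   repo_accessible <repo-name>.infn.it && echo "repository available"
```

The CVMFS client also ships `cvmfs_config probe`, which performs a comparable check on configured repositories.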
[1] There is no such thing as a directory in S3. The abstraction is done at the object level.