Deploy a Kubernetes cluster (sys-admin nomination required)
Prerequisites
The user has to be registered in the IAM system for INFN-Cloud https://iam.cloud.infn.it/login. Only registered users can log in to the INFN-Cloud dashboard https://my.cloud.infn.it/login.
User responsibilities
The solution described in this guide consists of the deployment of a Kubernetes cluster on top of Virtual Machines instantiated on INFN-Cloud infrastructure. The instantiation of a VM comes with the responsibility of maintaining it and all of the services it hosts. In particular, be careful when updating the operating system packages, as an update could unintentionally change the installed version of the cluster components and cause the cluster to malfunction.
Please read the INFN Cloud AUP in order to understand the responsibilities you have in managing this service.
Kubernetes cluster configuration
Note
If you belong to multiple projects, i.e. multiple IAM groups, after logging into the dashboard select, from the lower left corner, the project to be used for the deployment you intend to perform. Not all solutions are available for all projects. The resources used for the deployment will be accounted to the respective project and will impact its available quota. See the figure below.
Select the "Kubernetes cluster" button and then "configure". The configuration menu shows only your projects allowed to instantiate it.

Once done, the configuration form appears. Parameters are split into two pages: "Basic" and "Advanced" configuration. Select either Automatic or Manual scheduling, as shown below:


With Automatic scheduling, the Orchestrator takes care of choosing the best available provider; with Manual scheduling, the deployment is submitted directly to one of the available providers, selected from the drop-down menu. In the case of manual scheduling, the flavors displayed on the next page are those offered by the chosen provider.

Basic configuration
Default parameters are ready for the submission of a cluster composed of 1 master and 1 slave. If you selected the Automatic Scheduling mode, the provider where the cluster will be instantiated is automatically selected by the INFN Cloud orchestrator service.
The user has to specify:
- the flavor of the master and slave nodes, selecting the number of VCPUs and the memory size (RAM)
- the number of slaves, if more than one is needed
- admin_token: the password that will be used to access the Grafana dashboards.
If needed, a single port or a range of ports can be specified to be opened on the master. By policy, only ports higher than 8000 can be opened on the providers.
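For reference, the sketch below shows how an opened port is typically used once the cluster is running: a workload is exposed through a NodePort Service bound to the opened port. This is a hypothetical example: the port number 30080, the "web" name and the nginx image are assumptions for illustration only, not part of the default configuration.
# Hypothetical example: assumes port 30080 was requested in the configuration form
$ kubectl create deployment web --image=nginx
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - port: 80
    nodePort: 30080
EOF
# The service is then reachable at http://<master FloatingIP>:30080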

Advanced configuration

The user can select:
- the timeout for the deployment
- "no cluster deletion" in case of failure
- not to send the confirmation email when the deployment completes
- the manual scheduling, selecting the provider where the cluster will be created. The list of providers depends on the project.

Deployment result
To check the status of the deployment and its details, select the "deployments" button. Here all the user's deployments are listed with their "deployment identifier", "status", "creation time", "resources provider" and the "Details" button.

For each deployment the button "Details" allows:
- to delete the cluster
- to show the TOSCA template of the cluster (with the default values)
- to retrieve the deployment log file that contains error messages in case of failure
- to lock the deployment
If the creation of a deployment fails, an additional option (retry) is added to the drop-down menu, allowing the user to resubmit the deployment with the same parameters:

If the deletion of a deployment fails, resulting in the status being set to DELETE_FAILED, the "delete (force)" button is displayed in the list of available actions, allowing the user to force the deletion of the deployment:

Clicking on the "deployment identifier" or on "Details" button the details of the deployed cluster are shown:
- the "Overview" of the cluster
- the "Input Values" used for the cluster configuration
- the "Output Values" to access the cluster, as the Kubernetes and Grafana dashboard endpoints, the kubeconfig file to download and the FloatingIP to access the created VMs. To access the Kubernetes dashboard, use the token in the kubeconfig or the kubeconfig file itself.



InterLink
InterLink is a technology that handles the transparent offloading of Kubernetes workloads to remote computation systems.
From the INFN Cloud dashboard, select the second option, "Configure a Kubernetes cluster with a Virtual Node", to deploy an InterLink Virtual Node.

After selecting either the automatic or manual scheduling mode, in the configuration page you will be presented with the following additional tabs:
- Virtual Node
In addition, one of the following, according to the architecture you want to deploy (see below):
- InterLink-In-Cluster
- InterLink-Edge
- InterLink-Tunneled
Virtual Node

In this tab you configure the capacity of the Virtual Node, i.e. the total amount of resources available on that node:
- number of CPU cores;
- memory size in GiB;
- max number of schedulable PODs;
- number of NVIDIA GPUs.
Such node will appear in the list of available cluster nodes, and PODs scheduled to run on it will be offloaded for execution to the remote computation system.
In this tab you can also choose the InterLink architecture to deploy:
- In Cluster with K8s plugin
- Edge
- Tunneled
Details about the supported architectures can be found in the InterLink Cookbook.
According to the selected architecture, you must provide configuration options in the corresponding tab.
InterLink-In-Cluster

In the InterLink-In-Cluster case, all InterLink components are run in the local cluster (i.e., the Kubernetes cluster that you are going to deploy in the INFN Cloud PaaS infrastructure).
Currently, the only available plugin is the InterLink Kubernetes Plugin that allows PODs submitted to the local cluster to be offloaded to a remote Kubernetes cluster.
Accordingly, the plugin requires the kubeconfig YAML to access the remote cluster, and some optional parameters:
- KUBECONFIG YAML (required): the Kubeconfig to access the remote cluster.
- Kubeconfig must be provided in YAML format (notice that certificates and keys, if provided, must be given inline, see InterLink Kubernetes Plugin - 401-unauthorized for some notes; a sketch for producing an inline kubeconfig is shown after this list).
- CLIENT CONFIGURATION (optional): option to set properties of the
underlying python Kubernetes client configuration object.
- This is a JSON field (take care of using single quotes).
- E.g. set the following value if your remote cluster is using
self-signed certificates:
{'verify_ssl': false}
- NAMESPACE PREFIX (optional, defaults to 'offloading'): remote cluster namespace prefix where resources are offloaded.
- NODE SELECTOR (optional): remote workloads node selector, if you want to offload resources to selected nodes.
- This is a JSON field (take care of using single quotes).
- E.g. let's assume the remote cluster has GPU accelerated nodes labeled according to the GPU model, e.g. some nodes are labeled with nvidia/gpu-model: T4; if you want to offload PODs on such nodes, set the following value:
{'nvidia/gpu-model': 'T4'}
- NODE TOLERATIONS (optional): remote workloads node tolerations, if you want to offload resources to tainted nodes.
- This is a JSON field (take care of using single quotes).
- E.g. let's assume the GPU nodes on the remote cluster are tainted with nvidia.com/gpu=present:NoSchedule; if you want to offload PODs on such nodes, set the following value:
[{'key': 'nvidia.com/gpu', 'operator': 'Exists', 'effect': 'NoSchedule'}]
- GATEWAY PORT (optional, defaults to 30222): port to reach the Gateway's SSH daemon when offloading POD microservices (see Microservices Offloading). If you plan to deploy POD microservices, you must open such port to public traffic:

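Regarding the KUBECONFIG YAML field above, a self-contained kubeconfig with certificates and keys embedded inline can usually be produced with kubectl config view, run on a host that already has working access to the remote cluster (a sketch, assuming such access exists):
# Produce a single YAML file with certificates and keys inlined
$ kubectl config view --minify --flatten --raw > remote-kubeconfig-inline.yaml
# Paste the content of this file into the KUBECONFIG YAML field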
After the Kubernetes cluster with InterLink components has been deployed, you can perform some checks.
Check node capacity
Check if the node has the expected capacity:
$ kubectl describe node ivk-in-cluster-k8s
you should get an output like the following:
[...]
Capacity:
  cpu:             10
  memory:          256Gi
  nvidia.com/gpu:  1
  pods:            10
[...]
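You can also verify that the virtual node carries the label and the taint referenced by the nodeSelector and tolerations of the POD example below (the node name follows the example above and may differ in your deployment):
# Show the virtual node together with its labels (e.g. type=virtual-kubelet)
$ kubectl get node ivk-in-cluster-k8s --show-labels
# Show the taint that PODs must tolerate to be scheduled on the virtual node
$ kubectl describe node ivk-in-cluster-k8s | grep Taints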
Offload a POD
In the remote cluster, export KUBECONFIG with the content of the local kubeconfig file:
$ export KUBECONFIG=/etc/kubeconfig
Submit a POD to the local cluster:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: default
spec:
  containers:
  - name: test-container
    image: busybox
    command: ["sh", "-c", "while true; do date; sleep 3; done"]
    resources:
      limits:
        memory: "32Mi"
        cpu: "10m"
  nodeSelector:
    type: virtual-kubelet
  tolerations:
  - key: "virtual-node.interlink/no-schedule"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
EOF
The POD is actually executed in the remote cluster, but you can get the logs by querying the local cluster:
$ kubectl logs test-pod
Thu Jan 23 08:32:41 UTC 2025
Thu Jan 23 08:32:44 UTC 2025
Thu Jan 23 08:32:47 UTC 2025
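You can also confirm that the POD has been assigned to the virtual node by checking the node it is running on:
# The NODE column should report the virtual node (ivk-in-cluster-k8s in this example)
$ kubectl get pod test-pod -o wide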
InterLink-Edge
In the Edge-node case, the Virtual Kubelet is run in the local cluster, while the InterLink API Server is run outside of the cluster.
In the corresponding tab, you provide the Virtual Kubelet with the parameters needed to reach the InterLink API Server and authenticate:

- INTERLINK URL: URL of the InterLink API Server;
- INTERLINK PORT: port to reach the InterLink API Server;
- INTERLINK SECURE: whether to enable/disable API Server certificate verification (for development, if you are using self-signed certificates);
- OAUTH: OAuth2 Client configuration (see the official guide https://intertwin-eu.github.io/interLink/docs/guides/oidc-IAM).
Details can be found in the official InterLink Cookbook.
InterLink-Tunneled
In the Tunneled case, the Virtual Kubelet and the API Server are run in the local cluster and an SSH socket tunnel is opened to allow communication with the plugin.

Useful links
References
- InterLink documentation: https://intertwin-eu.github.io/interLink.
- InterLink Kubernetes Plugin: https://baltig.infn.it/mgattari/interlink-kubernetes-plugin.
- Helm chart to deploy InterLink on a Kubernetes Cluster: https://github.com/interTwin-eu/interlink-helm-chart/tree/main.
- Ansible role to deploy the Helm chart: https://baltig.infn.it/infn-cloud/ansible-role-interlink.
Troubleshooting
In both cases (automatic and manual scheduling), the success of the creation depends on the availability of resources at the provider; if there are not enough resources, a "no quota" error is reported as the failure reason.
Client certificates generated by kubeadm expire after 1 year (consult the official guide). You can renew your certificates manually at any time with the following commands:
# If you didn't save the executable in the $PATH
$ which kubeadm
# You can use the check-expiration sub-command to check when certificates expire
$ kubeadm certs check-expiration
# The command renew, with the sub-command all, can renew all certificates
$ kubeadm certs renew all
# Export KUBECONFIG again (admin.conf has been modified) and try any command
$ export KUBECONFIG=/etc/kubernetes/admin.conf
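For example, a quick check that the API server accepts the renewed credentials:
# Any kubectl command works as a connectivity check
$ kubectl get nodes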
Note
If you have an older version of the cluster (i.e. < 1.30), you need to use the kubeadm certs check-expiration and kubeadm certs renew all commands.