Deploy a Kubernetes cluster (sys-admin nomination required)
Prerequisites
The user has to be registered in the IAM system for INFN-Cloud https://iam.cloud.infn.it/login. Only registered users can log in to the INFN-Cloud dashboard https://my.cloud.infn.it/login.
User responsibilities
The solution described in this guide consists of the deployment of a Kubernetes cluster on top of Virtual Machines instantiated on INFN-Cloud infrastructure. The instantiation of a VM comes with the responsibility of maintaining it and all of the services it hosts. In particular, be careful when updating the operating system packages, as an update could unintentionally change the installed version of the cluster components and cause the cluster to malfunction.
Please read the INFN Cloud AUP in order to understand the responsibilities you have in managing this service.
Kubernetes cluster configuration
Note
If you belong to multiple projects, i.e. multiple IAM groups, after logging into the dashboard select, from the lower left corner, the project to be used for the deployment you intend to perform. Not all solutions are available for all projects. The resources used for the deployment will be accounted to the respective project and will impact its available quota. See the figure below.
Select the "Kubernetes cluster" button and then "configure". The configuration menu shows only your projects allowed to instantiate it.

Once done, the configuration form appears. Parameters are split into two pages: "Basic" and "Advanced" configuration. Select either Automatic or Manual scheduling, as shown below:


With Automatic scheduling, the Orchestrator takes care of choosing the best available provider; with Manual scheduling, the deployment is submitted directly to one of the available providers, selected from the drop-down menu. In the case of manual scheduling, the flavors displayed on the next page are those offered by the chosen provider.

Basic configuration
Default parameters are ready for the submission of a cluster composed of 1 master and 1 slave. If you selected the Automatic Scheduling mode, the provider where the cluster will be instantiated is automatically selected by the INFN Cloud orchestrator service.
The user has to specify:
- the flavor of the master and slave nodes, selecting the number of VCPUs and the memory size (RAM)
- the number of slaves, if more than one is needed
- admin_token: the password that will be used to access the Grafana dashboards.
If needed, a single port or a range of ports can be specified to be opened on the master. By policy, only ports higher than 8000 can be opened on the providers.
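For reference, the sketch below shows how an opened port is typically used once the cluster is running: a workload is exposed through a NodePort Service bound to the opened port. This is a hypothetical example: the port number 30080, the "web" name and the nginx image are assumptions for illustration only, not part of the default configuration.
# Hypothetical example: assumes port 30080 was requested in the configuration form
$ kubectl create deployment web --image=nginx
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - port: 80
    nodePort: 30080
EOF
# The service is then reachable at http://<master FloatingIP>:30080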

Advanced configuration

The user can select:
- the timeout for the deployment
- "no cluster deletion" in case of failure
- not to send the confirmation email when the deployment completes
- the manual scheduling, selecting the provider where the cluster will be created. The list of providers depends on the project.

Deployment result
To check the status of the deployment and its details, select the "deployments" button. Here all the user's deployments are listed with their "deployment identifier", "status", "creation time", "resources provider" and the "Details" button.

For each deployment the button "Details" allows:
- to delete the cluster
- to show the TOSCA template of the cluster (with the default values)
- to retrieve the deployment log file that contains error messages in case of failure
- to lock the deployment
If the creation of a deployment fails, an additional option (retry) is added to the drop-down menu, allowing the user to resubmit the deployment with the same parameters:

If the deletion of a deployment fails, resulting in the status being set to DELETE_FAILED, the "delete (force)" button is displayed in the list of available actions, allowing the user to force the deletion of the deployment:

Clicking on the "deployment identifier" or on "Details" button the details of the deployed cluster are shown:
- the "Overview" of the cluster
- the "Input Values" used for the cluster configuration
- the "Output Values" to access the cluster, as the Kubernetes and Grafana dashboard endpoints, the kubeconfig file to download and the FloatingIP to access the created VMs. To access the Kubernetes dashboard, use the token in the kubeconfig or the kubeconfig file itself.



InterLink
InterLink is a technology that handles the transparent offloading of Kubernetes workloads to remote computation systems.
From the INFN Cloud dashboard, select the second option, "Configure a Kubernetes cluster with a Virtual Node", to deploy an InterLink Virtual Node.

After selecting either the automatic or manual scheduling mode, in the configuration page you will be presented with the following additional tabs:
- Virtual Node
In addition, one of the following, according to the architecture you want to deploy (see below):
- InterLink-In-Cluster
- InterLink-Edge
- InterLink-Tunneled
Virtual Node

In this tab you configure the capacity of the Virtual Node, i.e. the total amount of resources available on that node:
- number of CPU cores;
- memory size in GiB;
- max number of schedulable PODs;
- number of NVIDIA GPUs.
Such node will appear in the list of available cluster nodes, and PODs scheduled to run on it will be offloaded for execution to the remote computation system.
In this tab you can also choose the InterLink architecture to deploy:
- In Cluster with K8s plugin
- Edge
- Tunneled
Details about the supported architectures can be found in the InterLink Cookbook.
According to the selected architecture, you must provide configuration options in the corresponding tab.
InterLink-In-Cluster

In the InterLink-In-Cluster case, all InterLink components are run in the local cluster (i.e., the Kubernetes cluster that you are going to deploy in the INFN Cloud PaaS infrastructure).
Currently, the only available plugin is the InterLink Kubernetes Plugin that allows PODs submitted to the local cluster to be offloaded to a remote Kubernetes cluster.
Accordingly, the plugin requires the kubeconfig YAML to access the remote cluster, and some optional parameters:
- KUBECONFIG YAML (required): the Kubeconfig to access the remote cluster.
- Kubeconfig must be provided in YAML format (notice that certificates and keys, if provided, must be given inline, see InterLink Kubernetes Plugin - 401-unauthorized for some notes; a sketch for producing an inline kubeconfig is shown after this list).
- CLIENT CONFIGURATION (optional): option to set properties of the
underlying python Kubernetes client configuration object.
- This is a JSON field (take care of using single quotes).
- E.g. set the following value if your remote cluster is using
self-signed certificates:
{'verify_ssl': false}
- NAMESPACE PREFIX (optional, defaults to 'offloading'): remote cluster namespace prefix where resources are offloaded.
- NODE SELECTOR (optional): remote workloads node selector, if you want to offload resources to selected nodes.
- This is a JSON field (take care of using single quotes).
- E.g. let's assume the remote cluster has GPU accelerated nodes labeled according to the GPU model, e.g. some nodes are labeled with nvidia/gpu-model: T4; if you want to offload PODs on such nodes, set the following value:
{'nvidia/gpu-model': 'T4'}
- NODE TOLERATIONS (optional): remote workloads node tolerations, if you want to offload resources to tainted nodes.
- This is a JSON field (take care of using single quotes).
- E.g. let's assume the GPU nodes on the remote cluster are tainted with nvidia.com/gpu=present:NoSchedule; if you want to offload PODs on such nodes, set the following value:
[{'key': 'nvidia.com/gpu', 'operator': 'Exists', 'effect': 'NoSchedule'}]
- GATEWAY PORT (optional, defaults to 30222): port to reach the Gateway's SSH daemon when offloading POD microservices (see Microservices Offloading). If you plan to deploy POD microservices, you must open such port to public traffic:

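Regarding the KUBECONFIG YAML field above, a self-contained kubeconfig with certificates and keys embedded inline can usually be produced with kubectl config view, run on a host that already has working access to the remote cluster (a sketch, assuming such access exists):
# Produce a single YAML file with certificates and keys inlined
$ kubectl config view --minify --flatten --raw > remote-kubeconfig-inline.yaml
# Paste the content of this file into the KUBECONFIG YAML field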
After the Kubernetes cluster with InterLink components has been deployed, you can perform some checks.
Check node capacity
Check if the node has the expected capacity:
$ kubectl describe node ivk-in-cluster-k8s
you should get an output like the following:
[...]
Capacity:
  cpu:             10
  memory:          256Gi
  nvidia.com/gpu:  1
  pods:            10
[...]
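You can also verify that the virtual node carries the label and the taint referenced by the nodeSelector and tolerations of the POD example below (the node name follows the example above and may differ in your deployment):
# Show the virtual node together with its labels (e.g. type=virtual-kubelet)
$ kubectl get node ivk-in-cluster-k8s --show-labels
# Show the taint that PODs must tolerate to be scheduled on the virtual node
$ kubectl describe node ivk-in-cluster-k8s | grep Taints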
Offload a POD
In the remote cluster, export KUBECONFIG with the content of the local kubeconfig file:
$ export KUBECONFIG=/etc/kubeconfig
Submit a POD to the local cluster:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: default
spec:
  containers:
  - name: test-container
    image: busybox
    command: ["sh", "-c", "while true; do date; sleep 3; done"]
    resources:
      limits:
        memory: "32Mi"
        cpu: "10m"
  nodeSelector:
    type: virtual-kubelet
  tolerations:
  - key: "virtual-node.interlink/no-schedule"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
EOF
The POD is actually executed in the remote cluster, but you can get the logs by querying the local cluster:
$ kubectl logs test-pod
Thu Jan 23 08:32:41 UTC 2025
Thu Jan 23 08:32:44 UTC 2025
Thu Jan 23 08:32:47 UTC 2025
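You can also confirm that the POD has been assigned to the virtual node by checking the node it is running on:
# The NODE column should report the virtual node (ivk-in-cluster-k8s in this example)
$ kubectl get pod test-pod -o wide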
InterLink-Edge
In the Edge-node case, the Virtual Kubelet is run in the local cluster, while the InterLink API Server is run outside of the cluster.
In the corresponding tab, you provide the Virtual Kubelet with the parameters needed to reach the InterLink API Server and authenticate:

- INTERLINK URL: URL of the InterLink API Server;
- INTERLINK PORT: port to reach the InterLink API Server;
- INTERLINK SECURE: whether to enable/disable API Server certificate verification (for development, if you are using self-signed certificates);
- OAUTH: OAuth2 Client configuration (see the official guide https://intertwin-eu.github.io/interLink/docs/guides/oidc-IAM).
Details can be found in the official InterLink Cookbook.
InterLink-Tunneled
In the Tunneled case, the Virtual Kubelet and the API Server are run in the local cluster and an SSH socket tunnel is opened to allow communication with the plugin.

Useful links
References
- InterLink documentation: https://intertwin-eu.github.io/interLink.
- InterLink Kubernetes Plugin: https://baltig.infn.it/mgattari/interlink-kubernetes-plugin.
- Helm chart to deploy InterLink on a Kubernetes Cluster: https://github.com/interTwin-eu/interlink-helm-chart/tree/main.
- Ansible role to deploy the Helm chart: https://baltig.infn.it/infn-cloud/ansible-role-interlink.
Troubleshooting
In both cases (automatic and manual scheduling), the success of the creation depends on the availability of resources at the provider; if there are not enough resources, a "no quota" error is reported as the failure reason.
Client certificates generated by kubeadm expire after 1 year (consult the official guide). You can renew your certificates manually at any time with the following commands:
# If you didn't save the executable in the $PATH
$ which kubeadm
# You can use the check-expiration sub-command to check when certificates expire
$ kubeadm certs check-expiration
# The command renew, with the sub-command all, can renew all certificates
$ kubeadm certs renew all
# Export KUBECONFIG again (admin.conf has been modified) and try any command
$ export KUBECONFIG=/etc/kubernetes/admin.conf
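For example, a quick check that the API server accepts the renewed credentials:
# Any kubectl command works as a connectivity check
$ kubectl get nodes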
Note
If you have an older version of the cluster (i.e. < 1.30), you need to use the kubeadm certs check-expiration and kubeadm certs renew all commands.