Deploy a Spark cluster + Jupyter notebook (sys-admin nomination required)
Prerequisites
The user has to be registered in the IAM system for INFN-Cloud https://iam.cloud.infn.it/login. Only registered users can log in to the INFN-Cloud dashboard https://my.cloud.infn.it/login.
User responsibilities
The solution described in this guide consists of the deployment of a Spark cluster on top of a Virtual Machine instantiated on the INFN-Cloud infrastructure. The instantiation of a VM comes with the responsibility of maintaining it and all the services it hosts.
Please read the INFN Cloud AUP in order to understand the responsibilities you have in managing this service.
Spark cluster configuration
Note
If you belong to multiple projects (i.e. multiple IAM groups), after logging into the dashboard select, from the upper right corner, the project to be used for the deployment you intend to perform. Not all solutions are available for all projects. The resources used for the deployment will be accounted to the selected project and will impact its available quota. See figure below.
After selecting the project, choose the "Spark + Jupyter cluster" button from the list of available solutions.

The configuration menu is shown. Parameters are split into two pages: "Basic" and "Advanced" configuration.
Basic configuration
The default parameters are ready for the submission of a cluster composed of 1 master and 1 slave, both with 4 CPUs and 8 GB of RAM. By default, the provider where the cluster will be instantiated is automatically selected by the INFN-Cloud Orchestrator Service.
The user must specify (see Figure 1):
- a human-readable name for your deployment (max 50 characters)
- certificate_type:
  - letsencrypt-prod: meant for production purposes and must be used only for that reason, as there is a limited number of these certificates. Once this limit is reached, the deployment will fail and it will take several days until the certificates become available again.
  - letsencrypt-staging: suggested for testing purposes, as these certificates are not limited in number
  - selfsigned: a self-signed certificate
- a password that will be required to access the Kubernetes dashboard and the Grafana monitoring as admin user
- the number of slaves
- the number of vCPUs for each K8s slave node VM
- the memory size for each K8s slave node VM
- the disk size for each K8s slave node VM
- optionally, an S3 storage endpoint (http://endpoint:9000) and a list of its buckets to be mounted as persistent storage on the Jupyter notebook
- the number of vCPUs and the memory size of the K8s master VM

Advanced configuration
The user can select (see Figure 2):
- the timeout for the deployment
- "no cluster deletion" in case of failure
- whether to send a confirmation email when the deployment is complete

Deployment result
To check the status of the deployment and its details, select the "Deployments" button. Here all the user's deployments are reported with "deployment uuid", "status", "creation time" and "provider" (see Figure 3).

For each deployment, the "Details" button allows the user:
- to get the details of the deployment: overview info, input values and output values such as the Kubernetes dashboard and Jupyter notebook endpoints (see Figure 4a)
- to edit the description of the deployment
- to retrieve the deployment log file that contains error messages in case of failure
- to show the TOSCA template of the cluster
- to request new ports to be opened
- to retrieve VM details (see Figure 4b for an example)
- to delete the cluster
- to lock the deployment (this hides the Delete action)
If the creation of a deployment fails, an additional option (retry) is introduced in the dropdown menu, allowing the user to resubmit the deployment with the same parameters:

If the deletion of a deployment fails, resulting in the status being set to DELETE_FAILED, the "delete (force)" button is displayed in the list of available actions, allowing the user to force the deletion of the deployment:



Use Spark from Jupyter
Clicking on the jupyter_endpoint link, you'll be asked to authenticate with IAM and to choose the size of your personal Jupyter server (see Figure 6).

This will start a Jupyter notebook with your S3 bucket(s) mounted on the file-system, as shown in Figure 7.
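
For example, once the notebook has started you can check from Python that the bucket contents are visible as local files. The mount path below is only an assumption for illustration; use the path you actually see in the Jupyter file browser:

    import os

    # Hypothetical mount point of the S3 bucket inside the notebook container;
    # replace it with the path shown in the Jupyter file browser.
    bucket_path = "/home/jovyan/my-bucket"

    # The bucket is mounted as a regular directory, so standard file
    # operations work on its objects.
    for name in sorted(os.listdir(bucket_path)):
        print(name)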

You can then upload your preferred notebook (or take one previously uploaded to your S3 bucket) and open it in Jupyter. Click on the star button (shown in Figure 8) to connect to the underlying cluster by creating the Spark Context and Session.
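
For reference, the connection performed by the star button is roughly equivalent to creating the session yourself with PySpark. The sketch below is only an illustration: the master URL and the resource settings are assumptions, and the actual values are those handled by the connector and by the Spark configuration box described next:

    from pyspark.sql import SparkSession

    # Illustrative values only: the real master URL and resource settings
    # come from your own deployment and its connection box.
    spark = (
        SparkSession.builder
        .appName("my-notebook")
        .master("spark://spark-master:7077")    # hypothetical master URL
        .config("spark.executor.memory", "2g")  # example Spark settings
        .config("spark.executor.cores", "2")
        .getOrCreate()
    )
    sc = spark.sparkContext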

In the Spark clusters connection box you can specify the Spark configuration, as shown in Figure 9.

After clicking the Connect button and waiting a few seconds, you'll see the connection details as shown in Figure 10.
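
Assuming the connector exposes the session and context as spark and sc (names used here only for illustration; check the connection details for the actual ones), a minimal sanity check that work is really distributed to the cluster could look like this:

    # Distribute a simple computation over the cluster executors.
    rdd = sc.parallelize(range(1_000_000), numSlices=8)
    print("sum:", rdd.sum())

    # A small DataFrame example, also executed on the cluster.
    df = spark.range(0, 1000).selectExpr("id", "id * 2 AS doubled")
    df.show(5)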

Troubleshooting
In both cases, automatic and manual scheduling, the success of the creation depends on the availability of resources at the provider. Otherwise, "no quota" is reported as the failure reason.
Known issues: the Jupyter notebook takes time to start and may sometimes fail due to a timeout. In this case, go back to the control panel and restart the notebook.
Contact for support: cloud-support@infn.it
Resource Availability Less Than Requested For a Spark Server
A user may request resources for a Spark server that are not available in the Kubernetes cluster. In this case, a warning message is displayed stating that there is insufficient CPU and/or memory. During this period, it is not possible to cancel the deployment using the JupyterHub UI.

Jupyter returns a "Spawn failed" error after 600 seconds. After that, the user can redeploy the server.
