Deploy a Spark cluster + Jupyter notebook (sys-admin nomination required)
Prerequisites
The user has to be registered in the IAM system for INFN-Cloud https://iam.cloud.infn.it/login. Only registered users can log in to the INFN-Cloud dashboard https://my.cloud.infn.it/login.
User responsibilities
Important
The solution described in this guide consists in the deployment of a Spark cluster on top of a Virtual Machine instantiated on the INFN-CLOUD infrastructure. The instantiation of a VM comes with the responsibility of maintaining it and all the services it hosts.
Please read the INFN Cloud AUP to understand the responsibilities you take on in managing this service.
Spark cluster configuration
Note
If you belong to multiple projects, i.e. multiple IAM groups, after logging in to the dashboard select, from the upper right corner, the project to be used for the deployment you intend to perform. Not all solutions are available for all projects. The resources used for the deployment will be accounted to the selected project and will impact its available quota. See the figure below.
After selecting the project, choose the “Spark + Jupyter cluster” button from the list of available solutions.
The configuration menu is then shown. Parameters are split across two pages: “Basic” and “Advanced” configuration.
Basic configuration
The default parameters are ready for the submission of a cluster composed of 1 master and 1 slave, each with 4 CPUs and 8 GB of RAM. By default, the provider where the cluster will be instantiated is automatically selected by the INFN-Cloud Orchestrator Service.
The user must specify (see fig. 1):
- a human-readable name for your deployment (max 50 characters)
- a password that will be required to access the Kubernetes dashboard and the Grafana monitoring interface as the admin user
- the number of slaves, and the RAM and CPU values for both master and slaves
- optionally, an S3 storage endpoint and a list of its buckets to be mounted as persistent storage on the Jupyter notebook (see the verification sketch after this list)
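Before submitting the deployment, you may want to verify that the S3 endpoint and buckets you plan to mount are reachable with your credentials. Below is a minimal sketch using Python and the boto3 library; the endpoint URL, credentials and bucket name are placeholder values, not ones provided by the service:

    import boto3

    # Placeholder values: replace with your actual S3 endpoint, credentials and bucket.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.example.org",
        aws_access_key_id="YOUR_ACCESS_KEY",
        aws_secret_access_key="YOUR_SECRET_KEY",
    )

    # List the first few objects to verify that the bucket is accessible.
    response = s3.list_objects_v2(Bucket="my-bucket", MaxKeys=5)
    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"])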
Advanced configuration
The user can select (see fig. 2):
- the timeout for the deployment
- “no cluster deletion” in case of failure
- automatic or manual scheduling, which determines the provider where the cluster will be created
- whether to send a confirmation email when the deployment completes
Deployment result
To check the status of the deployment and its details, select the “deployments” button. Here all the user’s deployments are listed with their “deployment uuid”, “status”, “creation time” and “provider” (see fig. 3).
For each deployment, the “Details” button allows the user:
- to get the details of the deployment: overview info, input values and output values such as the Kubernetes dashboard and Jupyter notebook endpoints (see fig. 4a)
- to edit the description of the deployment
- to retrieve the deployment log file that contains error messages in case of failure
- to show the TOSCA template of the cluster
- to request new ports to be opened
- to retrieve VM details (see fig. 4b for an example)
- to delete the cluster
- to lock the deployment (this hides the Delete action)
Use Spark from Jupyter
Clicking on the jupyter_endpoint link, you’ll be asked to authenticate with IAM and to choose the size of your personal Jupyter server (see fig. 5).
This will start a Jupyter notebook with your S3 bucket(s) mounted on the file system, as shown in fig. 6.
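If you want to check the mount from a notebook cell, a minimal sketch follows; the mount point /home/jovyan/my-bucket is a hypothetical example, since the actual path depends on your bucket name and on the deployment:

    from pathlib import Path

    # Hypothetical mount point: the actual path depends on your bucket name.
    bucket_path = Path("/home/jovyan/my-bucket")

    # List the files exposed by the S3 mount.
    for entry in sorted(bucket_path.iterdir()):
        print(entry.name)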
You can then upload your preferred notebook (or open one previously uploaded to your S3 bucket) in Jupyter. Click on the star button (shown in fig. 7) to connect to the underlying cluster by creating the Spark Context and Session.
In the Spark cluster connection box you can specify the Spark configuration, as shown in fig. 8.
After clicking the Connect button and waiting a few seconds, you’ll see the connection details, as shown in fig. 9.
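For reference, the star button performs roughly the equivalent of the following manual steps in PySpark. This is a minimal sketch, not the exact code run by the extension: the application name, master URL and resource settings are placeholders, and the real values for your cluster are reported in the connection details:

    from pyspark.sql import SparkSession

    # Placeholder values: the actual master URL and sensible resource settings
    # depend on your cluster (see the connection details shown after connecting).
    spark = (
        SparkSession.builder
        .appName("my-notebook")
        .master("spark://spark-master:7077")
        .config("spark.executor.instances", "2")
        .config("spark.executor.memory", "2g")
        .config("spark.executor.cores", "2")
        .getOrCreate()
    )
    sc = spark.sparkContext

    # Quick smoke test: distribute a small range and sum it on the cluster.
    print(sc.parallelize(range(100)).sum())  # expected: 4950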
Troubleshooting
With both automatic and manual scheduling, the success of the creation depends on the availability of resources at the provider; if resources are insufficient, “no quota” is reported as the failure reason.
Known issue: the Jupyter notebook takes some time to start and can occasionally fail due to a timeout. In this case, go back to the control panel and restart the notebook.
Contact for support: cloud-support@infn.it
Resource availability less than requested for a Spark server
A user may request resources for a Spark server that are not available in the Kubernetes cluster. In this case, a message is shown warning that CPU and/or memory are insufficient. While the spawn is pending, it is not possible to cancel the deployment from the JupyterHub UI.
Jupyter returns a “Spawn failed” error after 600 seconds. After that, the user can redeploy the server.