Run JupyterHub on a single VM enabling Notebooks persistence (sys-admin nomination required)

Prerequisites

Make sure you are registered in the IAM system for INFN-CLOUD (https://iam.cloud.infn.it), as described in the Getting Started guide. Only registered users can log in to the INFN-CLOUD dashboard (https://my.cloud.infn.it).

Access to the INFN-CLOUD dashboard enables users to instantiate a JupyterHub service on a single VM, providing Notebooks with data persistence.

Important

This solution requires the instantiation of a JupyterHub service on top of a newly created virtual machine (VM). You will become the service administrator, with complete control and administration rights over both the service and the VM.

Please read the INFN Cloud AUP in order to understand the responsibilities you have in managing this service.

How to deploy and access the JupyterHub service

Step 1 - Connecting and authenticating to the INFN-CLOUD dashboard

Connect to the INFN-CLOUD dashboard (https://my.cloud.infn.it/):

Fig 1: INFN-CLOUD welcome dashboard


You need to authenticate with the credentials used for the IAM account (https://iam.cloud.infn.it/login).

Fig 2: INFN-CLOUD IAM login

Step 2 - Select and Configure the JupyterHub service

First of all, select the project, among those you belong to, in which your application should be deployed.

Project Selection

After logging into the dashboard, select the “Jupyter with persistence for Notebooks” card in the service catalog and click on the Configure button.

dashboard

Next, configure your deployment. The deployment definition window consists of three tabs: “General”, “Authorizations” and “Advanced”. Before continuing, fill in the first mandatory field, the “Deployment description”. You will not be able to submit your deployment without it!

Deployment description

“General” TAB

In this tab, all fields have default values, which can be changed if desired. The following fields are available:

  • num_cpus

    • Number of virtual CPUs for the VM that will host the Jupyter service. The default value is 2.
  • mem_size

    • Amount of memory for the VM in GB. The default value is 4.
  • enable_monitoring

    • It is disabled by default.

  • jupyter_images

    • Default value: “harbor.cloud.infn.it/datacloud-templates/snj-base-lab-persistence”. If you want to build and use your own JupyterHub images, you can follow the dedicated guide.
  • jupyterlab_collaborative

    • Enables the new collaborative editing feature, which allows multiple users to collaborate in real time. It is disabled by default. See the JupyterLab documentation for more information.
  • jupyterlab_collaborative_image

    • “harbor.cloud.infn.it/datacloud-templates/snj-base-labc” is the default image for JupyterLab collaborative feature.
  • ports

    • List of additional ports to be opened. The following TCP ports are accessible by default and do not need to be specified: 22 (ssh to the host VM), 3000 (Grafana dashboard), 8888 (JupyterHub), 8889 (Jupyter collaborative).
General tab

Note

Please be aware that this solution is only available for the Ubuntu 20.04 operating system.

“Authorization” TAB

You can decide to authorize INFN Cloud user groups by filling:

  • iam_groups
    • User groups that are allowed to access JupyterHub services.
  • iam_admin_groups
    • User groups that are allowed to administer JupyterHub.

Note

INFN Cloud (https://iam.cloud.infn.it) is the IAM identity provider.

Authorization tab

“Advanced” TAB

Advanced Tab

Advanced parameters can be configured here:

  • Configure Scheduling

    • Automatic (Default)
      • The system will choose the most suitable cloud provider for the deployment
    • Manual
      • A resource provider can be selected from the list of available cloud sites
List of available cloud resources providers Tab

The following extra parameters can be set as well:

  • Deployment creation timeout (minutes)
    • If specified, the deployment will fail when the timeout is reached
  • Do not delete the deployment in case of failure
  • Send a confirmation email when complete

Step 3 - Submitting the deployment

Once all the parameters have been set, you can click on the “Continue” button. After that an overview of the deployment will be shown.

Deployment Check

Now you can submit your application and you will be redirected to the list of your deployments from where you can follow the evolution of the new deployment.

Deployment List

Step 4 - Access your application

On successful completion (“CREATE_COMPLETE”),

  • an e-mail is sent to notify you of the status of the deployment (completed or failed)
Deployment notification e-mail
  • you can check your deployment outputs by clicking on the “Details” button and then on the “Output values” Tab.
Deployment Overview
Deployment Output Values

Use the reported IP address to connect to the services you deployed.
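As a minimal sketch, the default ports listed in the “General” tab can be combined with the reported IP address to list the endpoints to try. The address 192.0.2.10 is a placeholder and `endpoints` is a hypothetical helper, not part of the deployment. The sketch assumes the default ports were not changed, and it prints only host:port pairs because this guide does not state which protocol scheme each service uses.

```shell
#!/bin/sh
# Placeholder address: replace it with the IP shown in the "Output values" tab.
VM_IP="${VM_IP:-192.0.2.10}"

# Hypothetical helper: print one "service host:port" line per default port.
endpoints() {
    ip="$1"
    printf 'ssh %s:22\n' "$ip"
    printf 'grafana %s:3000\n' "$ip"
    printf 'jupyterhub %s:8888\n' "$ip"
    printf 'jupyter-collaborative %s:8889\n' "$ip"
}

endpoints "$VM_IP"
```

The Grafana and collaborative endpoints are presumably only useful when the corresponding options (enable_monitoring, jupyterlab_collaborative) were enabled in the deployment.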

How to change the authorized IAM group

If you deployed an instance of JupyterHub with persistence of Notebooks and want to change the name of the IAM group that users must be members of to have access granted, you need to update the file located in /usr/local/share/dodasts/jupyterhub/compose.yaml. Here is an example of its content:

version: "3.9"

services:
  jupyterhub:
    depends_on:
      - http_proxy
    [...]
    environment:
      - [...]
      - OAUTH_GROUPS=users/example admins/example
      - ADMIN_OAUTH_GROUPS=admins/example
      - [...]

In the example, the OAUTH_GROUPS environment variable defines the IAM groups whose members are granted user-role access to the JupyterHub instance, while the ADMIN_OAUTH_GROUPS environment variable defines the IAM groups whose members are granted admin-role access. Multiple groups can be defined, separated by a space character.
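The edit can also be scripted. The following is a minimal sketch, not an official tool: `set_oauth_groups` is a hypothetical helper that rewrites only the `- OAUTH_GROUPS=` line and keeps a `.bak` copy of the file. It assumes the line is formatted as in the example above and that group names do not contain the characters `|`, `&` or `\`. The demonstration runs on a throwaway file rather than on the real compose.yaml.

```shell
#!/bin/sh
# Hypothetical helper: replace the value of OAUTH_GROUPS in a compose file.
# Usage: set_oauth_groups FILE "group1 group2"
set_oauth_groups() {
    file="$1"
    new_groups="$2"
    # Keep a backup copy, then rewrite only the "- OAUTH_GROUPS=" line.
    # The ADMIN_OAUTH_GROUPS line does not match and is left unchanged.
    cp "$file" "$file.bak"
    sed "s|^\([[:space:]]*- OAUTH_GROUPS=\).*|\1$new_groups|" "$file.bak" > "$file"
}

# Demonstration on a temporary file with the same layout as the example.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
    environment:
      - OAUTH_GROUPS=users/example admins/example
      - ADMIN_OAUTH_GROUPS=admins/example
EOF
set_oauth_groups "$demo" "users/new-group admins/example"
cat "$demo"
```

On the VM, the same function would be called with /usr/local/share/dodasts/jupyterhub/compose.yaml as the first argument, most likely with administrator privileges.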

To make the change effective, restart the service:

cd /usr/local/share/dodasts/jupyterhub/
docker-compose down || docker compose down
docker-compose up -d || docker compose up -d
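The commands above fall back to `docker compose` whenever `docker-compose` fails for any reason, not only when it is missing. As a hedged alternative, the sketch below first detects which Compose command is installed and then uses that one for both steps. `compose_cmd` and `restart_jupyterhub` are hypothetical helper names introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the Compose command available on this host.
compose_cmd() {
    if command -v docker-compose >/dev/null 2>&1; then
        echo "docker-compose"
    elif docker compose version >/dev/null 2>&1; then
        echo "docker compose"
    else
        return 1
    fi
}

# Hypothetical helper: stop and start the stack defined in the given directory.
restart_jupyterhub() {
    dir="$1"
    cmd="$(compose_cmd)" || { echo "no Compose command found" >&2; return 1; }
    # $cmd is intentionally unquoted so that "docker compose" splits into two words.
    (cd "$dir" && $cmd down && $cmd up -d)
}
```

With these definitions loaded, the restart would be invoked as `restart_jupyterhub /usr/local/share/dodasts/jupyterhub/`.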