Deploy JupyterHub
Why Deploy Your Own JupyterHub?
JupyterHub provides a multi-user Jupyter notebook environment that allows you to:
- Share computational resources with your team, class, or research group
- Standardize environments across multiple users with consistent packages and configurations
- Control access to specific users or institutions through authentication
- Customize the environment with your own software stack, packages, and tools
- Scale resources based on your specific needs (CPU, memory, GPU)
- Integrate with NRP infrastructure for seamless access to cluster resources
This guide is based on the Zero to JupyterHub guide with configurations specific to the Nautilus cluster. You must be an admin of the namespace you're deploying to.
Initial Setup
Start by choosing a name for your project. Your deployment will be available at your_name.nrp-nautilus.io.
Register a CILogon application
Register your application at https://cilogon.org/oauth2/register with the following settings:
- Callback URL: https://your_name.nrp-nautilus.io/hub/oauth_callback
- Client Type: Confidential
- Scopes: org.cilogon.userinfo,openid,profile,email
- Refresh Tokens: No

Save the client ID and secret.
Create the namespace
Create a namespace for your project on the Nautilus portal and annotate it with the required project information.
Configuration
Install Helm and download the Helm chart
Follow the install guide, and use this template for the config:
- Run `openssl rand -hex 32` and replace the `secret_token` in the yaml file with the generated key.
- Minimally set the `client_id`, `client_secret`, `admin_users`, `secret_token`, `oauth_callback_url`, and `ingress.hosts` fields.
- Add security with either `allowed_idps` or `allowed_users`. Do NOT leave your JupyterHub instance open for anyone to sign in; this may result in the locking of your namespace. You may find your IdP at https://cilogon.org/idplist

```bash
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/ && helm repo update
helm upgrade --cleanup-on-fail --install jhub jupyterhub/jupyterhub --namespace <namespace> --version=3.3.7 --values config.yaml
```
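For orientation, a minimal `config.yaml` might look like the sketch below. This follows the upstream Zero to JupyterHub layout and is not the NRP template itself; the client ID, secret, and hostname are placeholders, and `admin_users`, `secret_token`, and the `allowed_idps`/`allowed_users` restriction go wherever the template indicates.

```yaml
# Minimal sketch, not the NRP template; all values are placeholders.
hub:
  config:
    JupyterHub:
      authenticator_class: cilogon
    CILogonOAuthenticator:
      client_id: "cilogon:/client_id/xxxxxxxxxxxx"   # from your CILogon registration
      client_secret: "<client secret>"               # from your CILogon registration
      oauth_callback_url: https://your_name.nrp-nautilus.io/hub/oauth_callback
ingress:
  enabled: true
  hosts:
    - your_name.nrp-nautilus.io
```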
Once the pods start, you should be able to reach the installation at your selected hostname.
Automatic deployment
You can put your JupyterHub configuration in GitLab and automatically redeploy the application on repository changes. Please refer to this guide for details.
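As an illustration, a GitLab CI deploy job might look like the following sketch. It assumes your runner has credentials for your namespace (for example, a `KUBECONFIG` CI/CD variable) and a `NAMESPACE` variable you define yourself; the Helm image tag is only an example.

```yaml
# .gitlab-ci.yml sketch; assumes cluster credentials are available to the runner.
deploy-jupyterhub:
  stage: deploy
  image: alpine/helm:3.14.0        # any recent Helm image works
  script:
    - helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
    - helm repo update
    - helm upgrade --cleanup-on-fail --install jhub jupyterhub/jupyterhub
      --namespace "$NAMESPACE" --version=3.3.7 --values config.yaml
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH   # redeploy on changes to the default branch
```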
Customization
Understanding the Values Template
The template values file provides a comprehensive starting point with:
- Pre-configured profiles for different scientific domains (Python, R, Julia, TensorFlow, PyTorch, etc.)
- Resource limits and guarantees for CPU and memory
- Storage configuration with Ceph block storage
- Authentication settings for CILogon integration
- Ingress configuration for external access
Adding Your Own Container Image
To add your own custom container image to JupyterHub, you need to modify the `profileList` section in your `values.yaml`:
```yaml
profileList:
  - display_name: "My Custom Environment"
    kubespawner_override:
      image_spec: "your-registry.com/your-org/your-image:tag"
    default: false  # Set to true if you want this as the default
```
Key fields to modify:
- `display_name`: What users see in the profile selector
- `image_spec`: Full path to your container image
- `default`: Whether this profile is selected by default
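Beyond the image, `kubespawner_override` accepts other KubeSpawner options, so a profile can also pin resources. A hypothetical profile (the image path and numbers are placeholders):

```yaml
profileList:
  - display_name: "Large memory + GPU (example)"
    kubespawner_override:
      image_spec: "your-registry.com/your-org/your-image:tag"
      cpu_guarantee: 2
      cpu_limit: 4
      mem_guarantee: 8G
      mem_limit: 16G
      extra_resource_limits:
        nvidia.com/gpu: "1"   # request one GPU for this profile
```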
Creating Custom Container Images
Option 1: Extend Existing Images
Start with a base Jupyter image and add your packages:
```dockerfile
FROM jupyter/minimal-notebook:latest

# Install system dependencies
USER root
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Switch back to the jovyan user
USER jovyan

# Install Python packages
RUN pip install --no-cache-dir \
    pandas \
    matplotlib \
    scipy \
    scikit-learn \
    your-custom-package
```
Option 2: Build from Scratch
Create a completely custom image:
```dockerfile
FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    && rm -rf /var/lib/apt/lists/*

# Install JupyterHub requirements
RUN pip install --no-cache-dir \
    jupyterhub \
    notebook \
    jupyterlab

# Install your scientific packages
RUN pip install --no-cache-dir \
    numpy \
    pandas \
    matplotlib \
    your-research-packages

# Create the jovyan user (JupyterHub standard)
RUN useradd -m -s /bin/bash jovyan
USER jovyan
WORKDIR /home/jovyan

# Start Jupyter
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888"]
```
Building and Pushing Your Image
```bash
# Build the image
docker build -t your-registry.com/your-org/your-image:tag .

# Push to registry
docker push your-registry.com/your-org/your-image:tag
```
Available registries on NRP:
- GitLab Container Registry: `gitlab-registry.nrp-nautilus.io/your-project/your-image:tag`
- Docker Hub: `your-username/your-image:tag`
- Quay.io: `quay.io/your-org/your-image:tag`
Extending Existing Images
The first method to create your own custom software environment for the JupyterHub instance is to build your own Docker image. The easiest way to get started is to take a pre-existing image, such as `jupyter/minimal-notebook:latest` for a minimal Jupyter image or one of the scientific images, and extend it with the packages you need. If you have an existing image that you would like to make usable within JupyterHub, you will need to install the `jupyterhub` and `notebook` Python packages in your image.
The general format would be:
```dockerfile
FROM <registry_url>/<organization>/<your_project>:<optional_tag>

# Install packages
RUN pip install --no-cache-dir <packages>
```
An example of this would be:
```dockerfile
FROM jupyter/minimal-notebook:latest

# Install packages
RUN pip install --no-cache-dir pandas matplotlib scipy
```
Once you create your Dockerfile, you can build the image locally and push it to a container registry such as `gitlab-registry.nrp-nautilus.io`, or have GitLab build and push the image for you automatically, as in the sketch below.
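A common pattern for the automatic build is a kaniko job in `.gitlab-ci.yml`, roughly as sketched below. It uses GitLab's predefined `CI_REGISTRY*` variables and pushes the image tagged `latest`, which you may want to adjust.

```yaml
# Sketch of a kaniko build job for the GitLab container registry.
build-image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    # Authenticate kaniko to the project's container registry
    - mkdir -p /kaniko/.docker
    - echo "{\"auths\":{\"${CI_REGISTRY}\":{\"username\":\"${CI_REGISTRY_USER}\",\"password\":\"${CI_REGISTRY_PASSWORD}\"}}}" > /kaniko/.docker/config.json
    # Build the Dockerfile at the repo root and push the result
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:latest"
```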
Allowing custom Anaconda environments
Sometimes you may want to be able to create custom environments on the fly in your JupyterHub instance and allow them to persist across sessions. This can be useful for development, lab workflows, or exploration assignments in classes.
For this, you will need to complete two steps.
- Make sure your Jupyter image has `nb_conda_kernels` installed in the environment.
- Create a `.condarc` file in your `$HOME` path and add the config below:

```yaml
envs_dirs:
  - /home/jovyan/my-conda-envs/
```
After those steps are complete, Anaconda environments created within a Jupyter session will persist across sessions.
Adding images to your configuration
In other cases, you may want to set up specific environments for different tasks, assignments, or labs to avoid many redundant environments.
The example values file from earlier in this guide already has a few environments defined that cover a broad range of applications. To add your image to the list of available images, add the values below to your `profileList`:
```yaml
- display_name: Name To Show
  kubespawner_override:
    image_spec: <registry_url>/<organization>/<your_project>:<optional_tag>
```
If it is to be the default image, add `default: True`.
Shared Storage
If you are working with others on the same project or distributing data for a class, you can add a `PersistentVolumeClaim` as a shared location across all of the pods in the JupyterHub instance.
For example, following the Zero to JupyterHub guide, we can change the `storage:` section of our example values to:
```yaml
storage:
  type: dynamic
  extraLabels: {}
  # Change starts here
  extraVolumes:
    - name: jupyterhub-shared
      persistentVolumeClaim:
        claimName: jupyterhub-shared-volume
  extraVolumeMounts:
    - name: jupyterhub-shared
      mountPath: /home/shared
  # Change ends here
  capacity: 5Gi
  homeMountPath: /home/jovyan
  dynamic:
    storageClass: rook-ceph-block
    pvcNameTemplate: claim-{username}{servername}
    volumeNameTemplate: volume-{username}{servername}
    storageAccessModes: [ReadWriteOnce]
```
This would mount the shared storage to `/home/shared` using the `jupyterhub-shared-volume` PVC. Please note that for the PVC to be used across multiple pods, the volume needs an appropriate access mode such as `ReadOnlyMany` or `ReadWriteMany`. A sketch of such a claim follows.
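For reference, a claim matching the example above might look like the sketch below. The `rook-cephfs` storage class is an assumption (CephFS-backed storage that supports `ReadWriteMany`), as is the size, so check which classes your namespace offers.

```yaml
# Sketch of a shared, multi-pod PVC; storage class and size are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jupyterhub-shared-volume
spec:
  storageClassName: rook-cephfs
  accessModes:
    - ReadWriteMany       # required for mounting into many user pods
  resources:
    requests:
      storage: 10Gi
```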
Authentication
Limit access to your University
As a basic step to limit access to your JupyterHub instance rather than leaving it wide open, you should set `allowed_idps` to include only your university and any other universities your collaborators are from.
The available identity providers (IdPs) are listed by CILogon at https://cilogon.org/idplist/
Once on that page, search for your university by name or URL. The value to use in the configuration is listed under “EntityID”. The example values file uses the University of Nebraska-Lincoln; the “EntityID” for that university is `https://shib.unl.edu/idp/shibboleth`.
Under each identity provider, you will also need to add the `allowed_domains`, which will often be your university's URL. For example, the University of Nebraska-Lincoln's `allowed_domains` would be `unl.edu`.
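Put together, the authentication block might look like this sketch (the `username_claim: email` derivation is an assumption that matches how users are identified elsewhere in this guide):

```yaml
hub:
  config:
    CILogonOAuthenticator:
      allowed_idps:
        "https://shib.unl.edu/idp/shibboleth":   # EntityID from the CILogon list
          username_derivation:
            username_claim: email                # assumption: identify users by email
          allowed_domains:
            - unl.edu
```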
Admin Users
Admin users can be set in the `admin_users` list under `JupyterHub`. Admins are identified by the email address they use to log into Nautilus.
```yaml
JupyterHub:
  admin_access: true
  admin_users:
    - admin@your-university.edu  # placeholder address
```
With the example config, admin users can access other users' notebooks. If you want to disable this, set `admin_access` to `false`.
Allowed Users
Allowed users can be set in the `allowed_users` list under `JupyterHub`. Like admins, allowed users are identified by the email address they use to log into Nautilus.
```yaml
JupyterHub:
  allowed_users:
    - user@your-university.edu  # placeholder address
```

With `allowed_users` set, only the listed users (plus admins) are able to sign in.
Culling Configuration
Required: All JupyterHub deployments must include culling configuration to automatically shut down idle user servers. This is mandatory for compliance with NRP cluster policies.
Basic Culling Setup
Add the following configuration to your `values.yaml` file at the root level:
```yaml
cull:
  enabled: true
  users: false
  removeNamedServers: false
  timeout: 3600      # 1 hour in seconds - must be ≤ 21600 (6 hours)
  every: 600         # Check every 10 minutes
  concurrency: 10    # Number of parallel culling operations
  maxAge: 0          # No maximum age limit
```
Culling Parameters
- `enabled`: Set to `true` to enable culling
- `users`: Set to `false` to only cull servers, not user accounts
- `removeNamedServers`: Set to `false` to preserve named servers
- `timeout`: Maximum idle time before culling (in seconds). Must be ≤ 21600 (6 hours)
- `every`: How often to check for idle servers (in seconds)
- `concurrency`: Number of parallel culling operations
- `maxAge`: Maximum age of servers regardless of activity (0 = no limit)
Verification
After deployment, verify culling is working by checking the culling service logs:
```bash
kubectl logs -n <namespace> deployment/hub -c hub | grep cull
```
You should see periodic messages about culling operations.
Good Practices
When setting up a custom JupyterHub, there are a few good practices you can implement to help keep the environment sustainable and secure.
Limit who has access
When deploying a JupyterHub instance, you should restrict the service to those who need access rather than leaving the instance open. At minimum, access should be limited to your university. Further steps can be taken to limit it to specific individuals, which is recommended for labs and small classes.
Use Git
While developing your configuration, Git is a useful tool for keeping track of the changes you make. If something goes wrong with your configuration, the Git history will help you revert whatever change caused the issue. When working with the Nautilus cluster, you can use the hosted GitLab instance to track changes and then deploy them automatically.
Documentation
As you start and continue to use your JupyterHub instance, it is strongly encouraged to keep a running document of how the instance is set up and of any workflows or assignments that run on it. This will help others use and maintain the instance, and will aid future development and debugging.
