Setting up HTTPS and SSL for JupyterHub
JupyterHub provides a shared computational environment for data science teams and other groups of users, allowing them to collaborate in customized environments that scale to big data. Importantly, it also gives you a single place to implement security measures. In this post, we will go over some basic steps you can take to secure your JupyterHub deployment.
In our previous blog post on JupyterHub, we walked through the basic deployment steps for The Littlest JupyterHub (TLJH) and Zero-to-JupyterHub (ZTJH). Our recommendation for anyone looking to deploy JupyterHub as a data science platform in production was to use ZTJH. We’ll assume you’re using that for this blog post.
Once you have Zero-to-JupyterHub up and running, security should be your top priority. You should feel confident that your data science platform is safe and that your users can still access it easily. In this post, we show not only how to secure your JupyterHub with HTTPS and SSL/TLS, but also why each step is important. When we’re done, your hub will be reachable at a memorable domain name over HTTPS, one of the most common baseline security measures for any deployment.
Saturn Cloud provides customizable, ready-to-use cloud environments for collaborative data teams.
Try Saturn Cloud and join thousands of users moving to the cloud without having to switch tools.
Reminder: the helm upgrade command
As described in the previous post, Helm is the Kubernetes package manager we use to install and update JupyterHub on our Kubernetes cluster, which in our case is deployed on AWS EKS.
Whenever we update config.yaml, we run the helm upgrade command given below. We will refer back to it throughout this post:
helm upgrade --cleanup-on-fail \
<your-release-name> jupyterhub/jupyterhub \
--namespace <your-namespace> \
--version=<JH-helm-chart-version> \
--values config.yaml
NOTE: In our previous post, we recommended that you save your values, those in angle brackets <...>, as comments in your config.yaml.
Values included in the helm upgrade command:
- <your-release-name>: because the same “chart” (package) can be installed multiple times on the same Kubernetes cluster, the release name is simply a way of distinguishing between those installations. In our case, we used ztjh-release.
- <your-namespace>: the Kubernetes namespace that JupyterHub is created in. Helm only creates a missing namespace when the command includes --create-namespace (as the Zero-to-JupyterHub install command does), so for upgrades it should already exist. In our case, we went with ztjh.
- <JH-helm-chart-version>: each version of JupyterHub is associated with a Helm chart version. Reference this document for more details. In our case, because we are deploying JupyterHub version 1.5, we use Helm chart version 1.2.0.
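If you would rather not retype the bracketed values each time, one option is a small wrapper that keeps them in one place. The sketch below is ours, not part of Helm or the chart: it assumes a POSIX-compatible shell, uses this post's values as defaults, and the names upgrade_hub, run, RELEASE, NAMESPACE, CHART_VERSION, CONFIG, and DRY_RUN are hypothetical. Setting DRY_RUN prints the command instead of running it, which is a quick way to double-check your values.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the helm upgrade command used throughout this post.
# Defaults are the values from our deployment; override any of them with an
# environment variable, e.g. NAMESPACE=other upgrade_hub
RELEASE="${RELEASE:-ztjh-release}"
NAMESPACE="${NAMESPACE:-ztjh}"
CHART_VERSION="${CHART_VERSION:-1.2.0}"
CONFIG="${CONFIG:-config.yaml}"

# Run a command, or only print it when DRY_RUN is set (handy for checking values).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: upgrade_hub            (runs helm)
#        DRY_RUN=1 upgrade_hub  (prints the command instead)
upgrade_hub() {
  run helm upgrade --cleanup-on-fail \
    "$RELEASE" jupyterhub/jupyterhub \
    --namespace "$NAMESPACE" \
    --version="$CHART_VERSION" \
    --values "$CONFIG"
}
```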
Security and HTTPS
After our first blog post, our ZTJH deployment is up and running, but only in its most basic form. To log in as a user, we have to navigate to the EXTERNAL-IP, which on AWS is a long, confusing hostname generated for the load balancer. Let’s use an easier domain name instead.
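If you no longer have the EXTERNAL-IP at hand, you can ask Kubernetes for it again. Zero-to-JupyterHub exposes the hub through a LoadBalancer service named proxy-public, and on AWS the load balancer reports a hostname rather than an IP, so the sketch below reads the hostname field. It assumes kubectl is already configured for your EKS cluster; the function name get_external_hostname is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the AWS load balancer hostname that fronts JupyterHub.
# Assumes kubectl points at the EKS cluster and the chart's default service name, proxy-public.
# The namespace defaults to ztjh, the one used in this post.
get_external_hostname() {
  local namespace="${1:-ztjh}"
  kubectl --namespace "$namespace" get service proxy-public \
    --output jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}

# Usage: get_external_hostname ztjh
```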
We will first get a new domain name that is short and easy to remember. Then we will set up automatic HTTPS using a certificate from Let’s Encrypt, which is renewed automatically because Let’s Encrypt certificates are only valid for a few months. This keeps our friendly domain name secure behind HTTPS.
HTTP stands for Hypertext Transfer Protocol. It is the standard protocol used to transfer web data over the internet. HTTPS is simply the encrypted, or secured (hence the “S”), extension of HTTP. Using HTTPS prevents third parties from reading or tampering with the traffic between your users’ browsers and the hub. The secure connection is established using Transport Layer Security, or TLS, the successor to SSL; that is why the two names are often used interchangeably, as in this post’s title.
Register your domain name
The JupyterHub documentation for this step is quite sparse, because the process differs across the many domain providers. To give you a sense of how to do this with your provider, we will walk through each step with hover.com as an example. First, buy the domain name you would like to use. In our case, we chose “demohub.tech”, which at the time of this writing was on sale for five bucks.
1. Create a CNAME record for your domain
With your newly purchased domain, create a “CNAME” record that points to the EXTERNAL-IP hostname. A “CNAME”, or Canonical Name, is a DNS record that points to another domain name (in our case, the one provided by AWS), whereas an A record points to an IP address. How you do this depends on which domain provider you’re using.
For our hover.com example, we will first navigate to the “DNS” tab, then select “ADD A RECORD”.
For the DNS record, use these options:
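- “TYPE”: select “CNAME”.
- “HOSTNAME”: choose a hostname. In our case, we selected “my.demohub.tech”.
  - If you’d like to use the domain name without any prefix, enter “@”. Note that DNS standards do not allow a CNAME on the bare domain, so many providers will reject this; a subdomain like ours avoids the issue.
- “TARGET”: paste the EXTERNAL-IP hostname from AWS.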
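Once the record exists and has propagated (see the next step), you can confirm it from a terminal. The sketch below is a hypothetical helper, check_cname, that assumes the dig utility is installed. It compares the CNAME target your resolver returns with the AWS hostname you pasted into the TARGET field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that a hostname has a CNAME record pointing at the expected target.
# Requires dig. dig prints fully qualified names with a trailing dot, so we strip it before comparing.
check_cname() {
  local host="$1" expected="$2" actual
  actual="$(dig +short CNAME "$host" | head -n 1)"
  actual="${actual%.}"
  if [ "$actual" = "${expected%.}" ]; then
    echo "OK: $host -> $actual"
  else
    echo "MISMATCH: $host -> ${actual:-<no CNAME>} (expected ${expected%.})" >&2
    return 1
  fi
}

# Usage: check_cname my.demohub.tech <your-external-hostname>
```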
2. Wait for the DNS to propagate
DNS changes take time to propagate across DNS servers, so be patient: this usually takes a few minutes but can take hours in some cases. If you’d like to learn more about how DNS works, have a read through this amusing comic.
You will know the DNS changes have propagated when you can reach your JupyterHub at your new domain.
NOTE: It’s CRITICAL that you wait for these changes to propagate before proceeding.
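Rather than refreshing the browser, you can poll DNS from a terminal until your domain resolves. This is a minimal sketch, assuming dig is installed; wait_for_dns and its timeout and interval arguments (both in seconds) are our own names. Keep in mind it only reflects your local resolver, so some of your users may still see the old answer for a while.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll DNS until a hostname resolves, or give up after a timeout.
# Arguments: host, timeout in seconds (default 3600), polling interval in seconds (default 30).
# This only checks the local resolver; other resolvers may still hold cached answers.
wait_for_dns() {
  local host="$1" timeout="${2:-3600}" interval="${3:-30}" waited=0
  until [ -n "$(dig +short "$host")" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for $host" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$host resolves to:"
  dig +short "$host"
}

# Usage: wait_for_dns my.demohub.tech 3600 30
```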
Add Let’s Encrypt certificate
Now that we can access our JupyterHub from an easy domain name, we’ll add a TLS certificate so that traffic to it is encrypted. As the Zero-to-JupyterHub docs outline, we will have Let’s Encrypt issue the certificate for us.
1. Update the config
Update the config.yaml that you used for your initial deployment by adding the following:
proxy:
  https:
    enabled: true
    hosts:
      - <your-domain-name>
    letsencrypt:
      contactEmail: <your-email-address>
In our example, our domain name is my.demohub.tech.
2. Run helm upgrade
Run the helm upgrade command.
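Behind the scenes, enabling Let’s Encrypt makes the chart deploy an extra proxy component (Traefik) that requests and renews the certificate. The sketch below waits for it and prints its recent logs, which is where certificate errors usually surface. It assumes the deployment uses the chart’s default name, autohttps, with a container named traefik, as in the 1.x chart used here; run kubectl get deployments --namespace ztjh if yours differs. The helper name is hypothetical, and a ready pod does not by itself guarantee that the certificate has been issued.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for the chart's autohttps deployment (Traefik), then show
# its recent logs so that certificate (ACME) errors are easy to spot.
# Assumes the default resource name "autohttps" and container name "traefik".
wait_for_autohttps() {
  local namespace="${1:-ztjh}"
  # Give up after five minutes (an arbitrary choice) if the rollout never completes.
  kubectl --namespace "$namespace" rollout status deployment/autohttps --timeout=5m || return 1
  kubectl --namespace "$namespace" logs deployment/autohttps --container traefik --tail=20
}

# Usage: wait_for_autohttps ztjh
```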
Wait a few minutes and then navigate to your domain. Your JupyterHub should now be served over TLS, indicated by the lock symbol next to the domain name in your browser’s address bar. You may also notice that the address now starts with https instead of http.
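You can also inspect the certificate from the command line, which is handy when the browser only shows a generic warning. This is a minimal sketch using standard openssl subcommands; show_cert is a hypothetical name. For a working setup, the issuer line should mention Let’s Encrypt, and the notBefore and notAfter dates show the certificate’s validity window.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the issuer and validity dates of the certificate a host serves.
# -servername sends SNI so the server returns the certificate for this hostname.
# </dev/null closes the TLS session right after the handshake instead of waiting for input.
show_cert() {
  local host="$1" port="${2:-443}"
  openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer -dates
}

# Usage: show_cert my.demohub.tech
```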
Conclusion
We covered one of the most important and common security measures for any JupyterHub deployment: serving it over HTTPS at a domain name you control. For additional security topics not covered here, review the security section of the Zero-to-JupyterHub docs.
Ultimately, we hope this post helped you understand the steps needed to provide a base level of security, and why each piece helps keep your JupyterHub safe. It is important to consider security early in your deployment so that you can establish the necessary protocols before your users log in. By properly securing your data science platform, you close off vulnerabilities that bad actors could otherwise exploit.
Check out other resources on setting up JupyterHub:
- Setting up JupyterHub
- Setting up JupyterHub Securely on AWS
- Using JupyterHub with a Private Container Registry
- Setting up JupyterHub with Single Sign-on (SSO) on AWS
- List: How to Setup Jupyter Notebooks on EC2
- List: How to Set Up JupyterHub on AWS
About Saturn Cloud
Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Request a demo today to learn more.