Easily Connect to Dask from Outside of Saturn Cloud
Sometimes you’re running code and hit a function so slow that you wish you could run it on another, faster machine (or on a set of machines in parallel!). Maybe you’re trying to train 1,000 neural networks to generate memes at the same time. Or maybe you have to download 5,000 individual CSVs and aggregate each one before combining them. Dask is a great tool for Python users in this situation: it lets you pass Python code to a cluster of workers to execute in parallel. But setting up a Dask cluster takes effort, and depending on how you set it up, you may have to move your code from its current environment to another one (such as one in the cloud). With Saturn Cloud, you can rely on the power of Dask from whatever computing environment your code calls home.
Dask works by having the client, which runs the primary code, call a scheduler, which passes tasks to workers to execute. Those tasks can be any Python code, from PyTorch model training to the Dask libraries that mimic pandas in a distributed way. Even better, that client can be anywhere. In Saturn Cloud we set up a Jupyter Server so you can keep your notebooks and code entirely in the cloud, but the client could just as easily be any other location where you can run Python.
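The client, scheduler, and worker flow can be sketched without a cluster. Dask's futures interface extends Python's standard `concurrent.futures` API, so this stdlib-only example uses a local thread pool as a stand-in for the scheduler and workers. `slow_square` and `run_tasks` are hypothetical names invented for illustration; on a real cluster the corresponding calls would be `client.submit(...)` and `future.result()` on a `dask.distributed.Client`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Hypothetical stand-in for an expensive task."""
    time.sleep(0.1)
    return x * x


def run_tasks(values, max_workers=4):
    # The executor plays the role of scheduler plus workers:
    # the calling code only submits tasks and collects the results.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(slow_square, v) for v in values]
        # Results come back in submission order.
        return [f.result() for f in futures]


results = run_tasks(range(8))
print(results)
```

The calling code never needs to know where the workers live, which is why the client can run on a different machine from the cluster.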
To use Saturn Cloud from another location, there are only a few steps:
Install client libraries
Set up the client’s Python environment so that it can communicate with Dask on Saturn Cloud. This means the environment needs to match the Dask cluster’s exactly. It’s especially important that you install the same versions of dask, distributed, and dask-saturn.
pip install dask==2.30.0 distributed==2.30.1 dask-saturn==0.2.2
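To confirm that the client environment really has the pinned versions, a small check along these lines may help. It is a sketch that uses only the standard library's `importlib.metadata`; `PINNED` and `find_mismatches` are hypothetical names, and the version numbers are simply the ones from the pip command above.

```python
from importlib import metadata

# Versions pinned in the pip command above.
PINNED = {"dask": "2.30.0", "distributed": "2.30.1", "dask-saturn": "0.2.2"}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {package: (expected, installed)} for each package that differs."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in find_mismatches(PINNED).items():
    print(f"{name}: expected {expected}, found {installed}")
```

An empty result means the three packages match the pins. It does not compare the rest of the environment with the cluster's.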
Then, in Python, create an external Saturn Cloud connection on the client. You need the user_token and project_id from Saturn Cloud so that the platform knows who you are and where the code should execute (see the docs on how to get these).
from dask_saturn import ExternalConnection, SaturnCluster
from dask.distributed import Client

conn = ExternalConnection(
    project_id="[project_id]",
    base_url="https://app.community.saturnenterprise.io",
    saturn_token="[user_token]",
)
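Because the token identifies you to the platform, you may prefer not to paste it into a script. One possible approach, sketched here, is to read the values from environment variables. The variable names `SATURN_PROJECT_ID`, `SATURN_TOKEN`, and `SATURN_BASE_URL` and the helper `connection_settings` are assumptions made for this example, not names defined by dask-saturn.

```python
import os

DEFAULT_BASE_URL = "https://app.community.saturnenterprise.io"


def connection_settings(env=os.environ):
    """Collect ExternalConnection arguments from environment variables.

    The variable names are hypothetical and chosen only for this sketch.
    """
    required = ("SATURN_PROJECT_ID", "SATURN_TOKEN")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "project_id": env["SATURN_PROJECT_ID"],
        "saturn_token": env["SATURN_TOKEN"],
        "base_url": env.get("SATURN_BASE_URL", DEFAULT_BASE_URL),
    }


# With dask-saturn installed, the result can be unpacked into the connection:
#   conn = ExternalConnection(**connection_settings())
```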
Lastly, create the Dask cluster from the client and wait for it to be online.
cluster = SaturnCluster(
    external_connection=conn,
    n_workers=4,
    worker_size='8xlarge',
    scheduler_size='2xlarge',
    nthreads=32,
    worker_is_spot=False,
)
client = Client(cluster)
client.wait_for_workers(4)
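Once the workers are online, work can be sent to them through the client. The helper below is a minimal sketch of that step. It relies only on `Client.map` and `Client.gather` from `dask.distributed`; `run_on_cluster` is a hypothetical name introduced for this example.

```python
def run_on_cluster(client, func, items):
    """Apply func to every item on the cluster and return the results in order."""
    # client.map schedules one task per item and returns a list of futures.
    futures = client.map(func, items)
    # client.gather blocks until the tasks finish and collects their results.
    return client.gather(futures)


# Usage with the client created above:
#   squares = run_on_cluster(client, lambda x: x * x, range(100))
```

The function passed in runs on the workers, so any packages it imports need to be installed in the cluster's environment as well as on the client.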
And that’s it! You now have a working Dask cluster within Saturn Cloud that you can call from anywhere! You can monitor the cluster performance and schedule jobs and deployments from the Saturn Cloud app. Check out our getting started documentation for more guides, and consider whether our Saturn Hosted Free, Saturn Hosted Pro, or Enterprise plan is best for you!
About Saturn Cloud
Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Request a demo today to learn more.