Logging in Dask
When writing code, a natural way to keep track of how it runs is through logging. In Python, logging is typically done with the built-in logging module, like this:
import logging
logging.warning("This is a warning")
logging.info("This is non-essential info")
Unfortunately, if you try to use this style of logging from within a Dask Delayed function, you won't see any output at all. You won't see it in the console if you're running a Python script, nor will you see it after a cell in a Jupyter Notebook. The same is true of print calls: they won't be captured if they run within a Dask Delayed function. So an alternate approach is needed for logging within Dask.
Instead, we'll need to import logger from the distributed.worker module, which gives us a logging mechanism that does work in Dask. Here is an example of it in action.
First, start the Dask cluster associated with your Saturn Cloud resource.
from dask_saturn import SaturnCluster
from dask.distributed import Client
client = Client(SaturnCluster())
After running the above command, it's recommended that you check on the Saturn Cloud resource page that the Dask cluster is fully online before continuing. Alternatively, you can use client.wait_for_workers(3) to halt the notebook execution until all three of the workers are ready.
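For example, to block until the three workers described above have connected to the scheduler:

# Halt execution until all three workers are ready
client.wait_for_workers(3)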
Next is an example of a Dask command that logs the result in a way that can be saved. Notice the logger.info call using the special logger from distributed.worker:
import dask
from distributed.worker import logger

@dask.delayed
def lazy_exponent(args):
    x, y = args
    result = x**y
    # the logging call to keep tabs on the computation
    logger.info(f"Computed exponent {x}^{y} = {result}")
    return result

inputs = [[1, 2], [3, 4], [5, 6], [9, 10], [11, 12]]
outputs = [lazy_exponent(i) for i in inputs]
futures = client.compute(outputs, sync=False)
results = [x.result() for x in futures]
results
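If the computation succeeds, the final line should evaluate to [1, 81, 15625, 3486784401, 3138428376721], one exponent per input pair.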
The logs generated using distributed.worker still won't show up in the console output or in a Jupyter Notebook. Instead, they'll be within the Saturn Cloud resource logs. First, click the "logs" link of the resource you're working in:
From there, expand each of the Dask workers. The logs from each worker are stored individually, but select Aggregated Logs to view them all at once:
Those will show the logs created by the Dask workers. Notice that there is a lot of information there, including how each worker was started by Dask. Near the bottom you should see the logs we wanted, in this case the ones generated by lazy_exponent:
There we can see that the logs include the info message we generated within the function. That concludes the example of how to generate logs from within Dask. Logging like this can be a great tool for understanding how code is running, debugging code, and better propagating warnings and errors.
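Since the logger from distributed.worker is a standard Python logger, the same pattern works at other severity levels too. Here is a small sketch (the function and thresholds are hypothetical) that sends warnings and errors into the worker logs:

import dask
from distributed.worker import logger

@dask.delayed
def safe_reciprocal(x):
    # Messages at every level end up in the Dask worker logs,
    # viewable from the Saturn Cloud resource page as shown above
    if x == 0:
        logger.error("Got 0; returning None instead of dividing")
        return None
    if abs(x) < 1e-6:
        logger.warning(f"Input {x} is very small; result will be huge")
    return 1 / x

futures = client.compute([safe_reciprocal(x) for x in [2, 0, 1e-8]], sync=False)
[f.result() for f in futures]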