Hi there,

You are probably aware that Azure provides SDKs for most services. You probably also know that these SDKs cover not only Microsoft frameworks such as .NET, but also other languages such as Java and Python. I am particularly fond of the Python SDKs, both because of my background and because of a very interesting complement: Jupyter Notebooks.

You probably know what Azure Batch is. If not, you can go to this technical overview to find out more. It is essentially a managed scheduler that lets you define “tasks” (for example a script or a Docker container) that then run on an autoscalable pool of compute resources. You can create these pools and tasks through multiple interfaces: the Azure Portal, the Azure CLI, a standalone client app called BatchLabs (now called Batch Explorer), PowerShell, and SDKs for multiple programming languages, such as Python.

Now let’s go back to Python and Jupyter Notebooks. If you don’t know Jupyter Notebooks, well, you should (you can find more info here). You can think of them as a playground where you write code in snippets called “cells” that you can execute individually, so you can work on your code step by step. This makes Jupyter Notebooks a very comfortable platform for exploring Python SDKs. In the following picture you can see what this Jupyter Notebook for Azure Batch looks like:


If you want to start playing with the Python SDK for Azure Batch, you have sample code at your disposal in this Azure GitHub repo. I took that code as a starting point and modified it to support additional features, for example container-enabled pools and container-based workloads.

So I ported (copy & paste) the code from one of the samples in that repo to a Jupyter Notebook, made my modifications, and posted the result here. Here are some learnings from the process that are not so well documented:

  • When you create container-enabled pools in Azure Batch (with Docker preinstalled on the nodes, a configured public or private Docker registry, and optionally prefetched images), only container tasks are supported. Standard bash-based tasks that do not run inside a Docker container will fail.
  • Container-based tasks override some settings of the container. For example:
    • WORKDIR: Azure Batch sets the working directory of the container to /mnt/batch/tasks/…, overriding the WORKDIR setting of the image. If your container relies on relative paths, that can be a problem. You can work around it by passing the container run option --workdir="".
    • CMD: Every container task needs a command_line argument; it is not optional, and it overrides the CMD of the container. What if you want to run the CMD that was configured in the original Dockerfile? Easy: pass an empty string as the command_line of the Azure Batch task.
  • You can use the Azure Batch pool information and the compute node states to write a function that waits until a pool is ready to accept jobs.
  • In container-enabled pools you can define a start task that initializes the compute nodes (the Jupyter Notebook contains an example of such an init task, which runs a compute_node_init.sh script on each compute node in the pool). If the start task does things such as installing software packages, you need to run it as an elevated (admin) user.
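
The first and last points in the list above (a container-enabled pool with an elevated start task) could be expressed with the azure-batch Python SDK roughly as follows. This is a minimal sketch, not the notebook's code: the helper name build_container_pool, its parameters and the script URL are hypothetical, ResourceFile uses the http_url field of azure-batch 6.0 and later, and recent SDK releases require a type argument on ContainerConfiguration that you may have to drop on older versions. The image reference and node agent SKU are passed in because the container-capable images that Batch supports change over time.

```python
def build_container_pool(pool_id, vm_size, image_reference, node_agent_sku_id,
                         container_images, init_script_url, node_count=2):
    """Build a PoolAddParameter for a container-enabled pool (hypothetical helper).

    image_reference / node_agent_sku_id must describe a container-capable image
    that Azure Batch currently supports; they are parameters, not hard-coded.
    """
    import azure.batch.models as batchmodels

    container_conf = batchmodels.ContainerConfiguration(
        # Required by recent azure-batch releases; remove if your version rejects it.
        type="dockerCompatible",
        # These images are pulled onto each node when it joins the pool (prefetch).
        container_image_names=container_images,
    )

    # Pool-wide auto-user with admin rights, so the init script can install packages.
    admin_identity = batchmodels.UserIdentity(
        auto_user=batchmodels.AutoUserSpecification(
            scope=batchmodels.AutoUserScope.pool,
            elevation_level=batchmodels.ElevationLevel.admin,
        )
    )

    # No container_settings here, so the start task runs on the node itself:
    # Batch downloads the script into the start task's working directory and runs it.
    start_task = batchmodels.StartTask(
        command_line="/bin/bash compute_node_init.sh",
        resource_files=[
            batchmodels.ResourceFile(http_url=init_script_url,
                                     file_path="compute_node_init.sh")
        ],
        user_identity=admin_identity,
        # Do not schedule tasks on a node until its start task has succeeded.
        wait_for_success=True,
    )

    return batchmodels.PoolAddParameter(
        id=pool_id,
        vm_size=vm_size,
        virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
            image_reference=image_reference,
            node_agent_sku_id=node_agent_sku_id,
            container_configuration=container_conf,
        ),
        target_dedicated_nodes=node_count,
        start_task=start_task,
    )
```

With an authenticated BatchServiceClient called batch_client, the resulting object would be submitted with batch_client.pool.add(pool). Setting wait_for_success=True is a design choice: nodes whose init script failed never receive tasks, instead of running them in a half-initialized environment.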
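
The WORKDIR and CMD points from the list above both end up in the task definition. A minimal sketch, assuming the azure-batch models TaskAddParameter and TaskContainerSettings; the helper name build_container_task and the keep_image_workdir flag are hypothetical, and the --workdir="" run option is simply the workaround described above, passed through container_run_options.

```python
def build_container_task(task_id, image_name, command_line="", keep_image_workdir=True):
    """Build a TaskAddParameter for a container task (hypothetical helper)."""
    import azure.batch.models as batchmodels

    # Azure Batch sets the container's working directory to the task directory
    # (/mnt/batch/tasks/...). Passing --workdir="" as an extra container run
    # option is the workaround described in the WORKDIR point.
    run_options = '--workdir=""' if keep_image_workdir else None

    return batchmodels.TaskAddParameter(
        id=task_id,
        # Required; an empty string keeps the CMD from the image's Dockerfile.
        command_line=command_line,
        container_settings=batchmodels.TaskContainerSettings(
            image_name=image_name,
            container_run_options=run_options,
        ),
    )
```

Usage, with hypothetical job and image names: batch_client.task.add(job_id="myjob", task=build_container_task("task1", "myrepo/myimage:latest")). Passing a non-empty command_line, for example "python run.py", overrides the image's CMD instead.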

The compute node init script I use installs blobfuse and mounts an Azure Storage blob container on the compute nodes. Unfortunately, I ran into permission problems when trying to make the blobfuse mount available inside the task containers (which by default do not run as root). I need to find some time to investigate this in depth. Apart from that, you can pass blobs as input to the container tasks and export the files generated by each task as blobs, which can then act as input to further tasks.
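
Passing blobs in and out of container tasks does not need the blobfuse mount at all: the task itself can declare its input and output files. A minimal sketch, assuming the azure-batch models ResourceFile and OutputFile, a SAS URL for the input blob, and a SAS URL with write permission for the output container. The helper name build_blob_io_task, the input/data.bin and output/ paths are hypothetical, and the sketch assumes the container writes its results under output/ in the Batch task working directory (exposed as AZ_BATCH_TASK_WORKING_DIR). If you override the container's working directory as described in the WORKDIR point, the results have to be written to that directory explicitly.

```python
def build_blob_io_task(task_id, image_name, input_blob_sas_url,
                       output_container_sas_url, command_line=""):
    """Container task that reads one blob and uploads its results (hypothetical helper)."""
    import azure.batch.models as batchmodels

    # Input: Batch downloads the blob into the task working directory
    # before the task starts.
    inputs = [batchmodels.ResourceFile(http_url=input_blob_sas_url,
                                       file_path="input/data.bin")]

    # Output: after a successful run, upload everything under output/ to the
    # blob container, in a virtual folder named after the task.
    outputs = [batchmodels.OutputFile(
        file_pattern="output/**/*",
        destination=batchmodels.OutputFileDestination(
            container=batchmodels.OutputFileBlobContainerDestination(
                container_url=output_container_sas_url,
                path=task_id,
            )
        ),
        upload_options=batchmodels.OutputFileUploadOptions(
            upload_condition=batchmodels.OutputFileUploadCondition.task_success,
        ),
    )]

    return batchmodels.TaskAddParameter(
        id=task_id,
        command_line=command_line,
        container_settings=batchmodels.TaskContainerSettings(image_name=image_name),
        resource_files=inputs,
        output_files=outputs,
    )
```

A follow-up task could then list the uploaded blobs (via SAS URLs) as its own resource files, which is the chaining of tasks through blobs described above.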

Something else I want to investigate is the Azure Batch support for task dependencies, where Task A runs only after Task B has completed. Stay tuned.
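
Task dependencies are not part of the notebook yet, but in the azure-batch SDK they are switched on per job (uses_task_dependencies) and declared per task (depends_on with TaskDependencies). A minimal sketch under that assumption, with hypothetical job, pool, task and image names, where taskA runs only after taskB has completed; the helper name add_dependent_tasks is also hypothetical.

```python
def add_dependent_tasks(batch_client, job_id, pool_id, image_name):
    """Create a job in which taskA waits for taskB (hypothetical names)."""
    import azure.batch.models as batchmodels

    # Task dependencies must be switched on when the job is created.
    batch_client.job.add(batchmodels.JobAddParameter(
        id=job_id,
        pool_info=batchmodels.PoolInformation(pool_id=pool_id),
        uses_task_dependencies=True,
    ))

    settings = batchmodels.TaskContainerSettings(image_name=image_name)
    task_b = batchmodels.TaskAddParameter(
        id="taskB", command_line="", container_settings=settings)
    # With default settings, taskA is scheduled only after taskB completes successfully.
    task_a = batchmodels.TaskAddParameter(
        id="taskA", command_line="", container_settings=settings,
        depends_on=batchmodels.TaskDependencies(task_ids=["taskB"]))

    # Submit both tasks in one call; Batch enforces the ordering on its side.
    batch_client.task.add_collection(job_id, [task_b, task_a])
    return task_a, task_b
```

Both tasks use an empty command_line, so each runs the CMD of its image, as in the CMD point earlier in the post.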

I hope this gives you some ideas about how to start playing around with Azure Batch and build a Python-based workload scheduler for your Docker containers. Thanks for reading!