In an earlier blog post, we talked about Excel as custom calculation engine. In a nutshell, a developer or power user can author the calculation logic inside an Excel workbook and then execute the workbook programmatically via either Excel Services or HPC Services for Excel. You can read about this approach in detail in our MSDN article. This approach has been successfully used by our customer on a large scale for many years now.
Lately though, we’ve been thinking about Jupyter Notebooks as another potential option for building custom calculation engines.
But before we make the case, let’s review some background information on Jupyter Notebooks.
Jupyter Notebooks are a popular computational environment for data scientists and developers. Think of Jupyter Notebooks as a document that contains code (Python, R, Julia etc.), graphs, equations and of course, comments to describe the code. In that sense, an instance of Jupyter Notebook represents a data science experiment that can be executed in real-time.
Jupyter Notebooks can be edited and run from the browser. The execution of the notebook itself happens on the “server” via the notion of a Kernel. In addition, Jupyter Notebooks can be easily integrated with big data tools like Apache Spark. Jupyter Notebooks can also be hosted inside Docker containers.
Microsoft Azure provides a free implementation of Jupyter Notebooks via https://notebooks.azure.com. You can use this service to author Python and R workbooks. You can also build an ML model than deploy it to Azure ML.
Now that you have a basic understanding of Jupyter Notebooks, let’s return to our idea of using them as a custom calculation engine (similar Excel Services and HPC) by reviewing some of the key capabilities. To be clear, this is not an argument against using Excel. With its near ubiquitous use within businesses, Excel-based custom calculations can be authored without learning any new skills. while (as we will see shortly) working with Jupyter Notebooks requires the use of languages like Python and R.
That said, think of a Jupyter Notebook as an open source, cross platform alternative. Furthermore, the growing support for popular languages like Python, R, Julia etc. available within the Jupyter Notebook means that it continues to benefit from rapid innovation taking place within these languages, especially with new libraries. Finally, the fact that Jupyter Notebook is designed to run on the server from the ground up (unlike Excel, which is primarily a desktop application) and its integration with Spark and containers means that it can offer greater flexibility and cost effective ways of hosting them. For example, Jupyter Notebook execution can be offloaded to a PaaS Service like Azure Batch using Jupyter Notebook image available on Docker Hub. This removes the need to setup and manage a cluster like HPC.
Let’s now review how Jupyter Notebooks offer an Excel-like custom calculation capability:
- Self-contained, shareable unit of work – Conceptually similar to an Excel workbook, a Jupyter Notebook is document that combines code, comments, graphs and can be easily shared using email, GitHub, Dropbox, etc. You can view samples of interesting Jupyter workbooks here.
- Interactive Execution Environment – Like Excel, you can author formulas and plot charts interactively. You can simply select a cell and execute it using the “play” button show in the toolbar below.
- Ability to work with cells and formulas – At the most basic level in Excel, you work with a grid of cells where each cell is referenced by a letter-number address. Jupyter Notebooks via Python libraries like pandas, offer a similar capability. pandas.DataFrame offer a two-dimensional structure with labeled rows and columns. The following example populates a DataFrame ‘df’ (you could just as easily have loaded this data from an Excel workbook or another data source):If we execute the current cellof the Jupyter Notebook, you will see the output of DataFrame as shown below:Once we have the DataFrame populated, we can apply operations on the cells like Excel as shown below:
- Ability to invoke libraries in other languages like C/ C++ (like Excel Add-ins) – Since Python supports linking with external libraries written in languages other than Python, Jupyter Notebooks can leverage this extensibility to reuse existing libraries without much effort.This is an example of how a C++ library is being used in Jupyter Notebooks. We have used a Cython to re-compile the existing C++ library to a Python module.
- Programmatic Invocation of the Jupyter notebook – Jupyter Kernel gateway is a web server layer that provides UI less access to Jupyter Kernel. This kernel allows incoming requests to be served by executing the annotated notebook cells. This enables a non-interactive communication channel between REST clients and Jupyter Notebooks.Kernel gateway can be containerized and scaled out using various container orchestration technologies.Here is an example of a notebook cell annotated as a HTTP POST:
Note: The Jupyter Notebook code snippets referenced in the text above are taken from an example workbook we created for this blog (Thanks to Lav Kumar from AIS to put this example workbook together). You can access this example Jupyter Notebook by browsing to https://notebooks.azure.com/n/9DgDpdVblyI/notebooks/Sample.ipynb Note that you will need to clone this workbook after logging on via any Microsoft ID: you@gmail.com, you@outlook.com, you@Xbox.com, etc.
In summary, Jupyter Notebooks offer a viable alternative approach for building custom calculation engines.