Running Alchemist on Cray XC and CS Series Supercomputers: Dask and PySpark Interfaces, Deployment Options, and Data Transfer Times
Rothauge et al.
Abstract—Alchemist allows Apache Spark to achieve better performance by interfacing with HPC libraries for large-scale distributed computations. In this paper we highlight some recent developments in Alchemist that are of interest to Cray users and the scientific community in general. We discuss our experience porting Alchemist to container images and deploying it on Cray XC (using Shifter) and CS (using Singularity) series supercomputers, on a local Kubernetes cluster, and on the cloud. Newly developed interfaces for Python, Dask and PySpark enable the use of Alchemist with additional data analysis frameworks. We also briefly discuss the combination of Alchemist with RLlib, an increasingly popular library for reinforcement learning, and consider the benefits of leveraging HPC simulations in reinforcement learning. Finally, since data transfer between the client applications and Alchemist is the main overhead Alchemist encounters, we give a qualitative assessment of how these transfer times depend on factors such as message buffer size, matrix layout, and network variability.
Several recent developments make it easier for practitioners to use Alchemist to access HPC libraries from data analysis frameworks such as Spark, Dask and PySpark, or from single-process Python applications. Docker and other container images let users get started with Alchemist quickly. We also briefly discussed the potentially exciting combination of Alchemist with reinforcement learning frameworks such as RLlib. Alchemist's main overhead comes from the data transfer between client applications and Alchemist, so we ran experiments to better understand how these transfer times behave with respect to message buffer sizes, matrix layouts, and network variability.
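The paper's measurements are not reproduced here. As a rough aid for reasoning about why message buffer size can matter, the sketch below applies the standard latency-bandwidth (alpha-beta) cost model: each message pays a fixed latency, and the payload moves at a fixed bandwidth. This is an assumption for illustration only and is not Alchemist's transfer protocol. The function name `estimated_transfer_time` and the latency and bandwidth values are hypothetical, and the model deliberately ignores matrix layout and network variability, both of which the experiments examined.

```python
import math


def estimated_transfer_time(total_bytes: int, buffer_bytes: int,
                            latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Hypothetical alpha-beta estimate of the time to send total_bytes.

    The data is split into ceil(total_bytes / buffer_bytes) messages. Each
    message costs a fixed latency, and the whole payload is limited by bandwidth.
    """
    if total_bytes < 0 or buffer_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and buffer_bytes must be > 0")
    num_messages = math.ceil(total_bytes / buffer_bytes)
    return num_messages * latency_s + total_bytes / bandwidth_bytes_per_s


# Hypothetical example: a dense 10,000 x 1,000 matrix of float64 values (80 MB),
# with an assumed 50 microsecond per-message latency and 1 GB/s bandwidth.
matrix_bytes = 10_000 * 1_000 * 8
for buf in (64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
    t = estimated_transfer_time(matrix_bytes, buf, latency_s=50e-6,
                                bandwidth_bytes_per_s=1e9)
    print(f"buffer={buf // 1024:>6} KiB  estimated time={t:.3f} s")
```

Under this simple model, larger buffers reduce the number of messages and therefore the total latency cost, while the bandwidth term stays fixed. Real transfer times will also depend on how the matrix is laid out and distributed and on contention in the network, which a two-parameter model cannot capture.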