MLOps on HPC/Slurm

Certain domains of machine learning research -- such as life sciences, drug discovery, autonomous driving, and oil exploration -- require computational capability beyond what a standard server can provide. Such training benefits significantly from a dedicated High Performance Computing (HPC) platform.

Until now, the obstacle has been that the Kubernetes-based MLOps workflow uses different applications, frameworks, tools, workflows, and administration than HPC systems, which are often based on Slurm.

DKube™ removes this obstacle by allowing you to submit your ML jobs directly to a Slurm-based HPC system, without compromising its Kubeflow foundation or its broad MLOps capabilities. This unlocks the advantages of both types of platform and enables use cases that would not otherwise be feasible.

The program code and datasets do not need to be modified. All required translation is handled automatically, and remote execution supports the full feature set of the DKube MLOps platform.
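For context, a job submitted to a Slurm cluster is typically described by a batch script like the sketch below. This is a generic, illustrative example of the Slurm side of such a workflow, not DKube's internal translation; the partition name, module name, resource values, and `train.py` script are all assumptions.

```shell
#!/bin/bash
#SBATCH --job-name=ml-train        # job name shown in squeue
#SBATCH --partition=gpu            # assumed GPU partition name (site-specific)
#SBATCH --gres=gpu:1               # request one GPU
#SBATCH --cpus-per-task=8          # CPU cores for data loading
#SBATCH --mem=32G                  # host memory
#SBATCH --time=04:00:00            # wall-clock limit
#SBATCH --output=train_%j.log      # %j expands to the Slurm job ID

# Load site-provided software; module names vary by cluster
module load python

# Run the unmodified training program under Slurm's launcher
srun python train.py --epochs 10
```

Such a script would be submitted with `sbatch train.sbatch`; Slurm queues the job, allocates the requested resources, and writes stdout to the `train_%j.log` file.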
