DKube Hub and Spoke Execution Architecture

DKube uses a hub and spoke architecture to integrate the remote Slurm cluster into the MLOps workflow. The hub and the spoke communicate through simple plug-ins. This approach has the following advantages:

  • Loose integration allows the two domains (MLOps and Slurm) to keep their own tools, disciplines, administration, and workflows
  • It is non-intrusive to the HPC system
  • ML workloads can be run on the compute-intensive HPC system on demand

The primary activity happens on the hub, a Kubeflow-based framework that runs containers on Kubernetes. The hub handles:

  • System management
  • Data sources
  • Metadata storage
  • Job management
  • Automation
  • Model management

The HPC/Slurm cluster is the spoke in the architecture, and there can be multiple Slurm clusters in the system. Each Slurm cluster:

  • Executes the job using Singularity
  • Communicates with the DKube hub
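
As a rough illustration of the spoke's role, the sketch below renders a Slurm batch script that runs a training command inside a Singularity container. It is not DKube's plug-in code: the function name `render_sbatch_script`, the image path, and the job parameters are hypothetical. Only the `#SBATCH` directives and `singularity exec --nv` are standard Slurm and Singularity usage.

```python
import shlex


def render_sbatch_script(job_name, image, command, gpus=1):
    """Return the text of a Slurm batch script that runs `command` in a Singularity image.

    All arguments are illustrative; this is not DKube's actual plug-in logic.
    """
    # Quote each argument so the command survives shell parsing on the cluster.
    quoted_cmd = " ".join(shlex.quote(part) for part in command)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --output={job_name}-%j.out",  # %j expands to the Slurm job ID
        "",
        # --nv exposes the host's NVIDIA GPUs inside the container.
        f"singularity exec --nv {shlex.quote(image)} {quoted_cmd}",
    ]
    return "\n".join(lines) + "\n"


script = render_sbatch_script(
    job_name="train-demo",
    image="/shared/images/train.sif",  # hypothetical image path
    command=["python", "train.py", "--epochs", "5"],
)
print(script)
```

On a real cluster, a script like this would typically be handed to `sbatch`; how DKube submits and tracks the job is not described here.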

Adding a remote HPC/Slurm cluster to the DKube Kubernetes hub is quick and straightforward. The information required to access the cluster, including the credentials, is entered from the DKube UI. This creates a link between the clusters so that they can be viewed as a single MLOps entity.
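
The linking step can be pictured as the hub keeping a small registry of spoke clusters. The sketch below is purely illustrative: the `SlurmSpoke` and `Hub` classes, their fields, and the host names are hypothetical stand-ins for whatever the DKube UI actually collects, which is not enumerated here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SlurmSpoke:
    """Access details for one remote Slurm cluster (hypothetical fields)."""
    name: str
    login_host: str
    username: str
    ssh_key_path: str  # a reference to a credential, not the secret itself


@dataclass
class Hub:
    """Minimal stand-in for the hub's view of its linked spokes."""
    spokes: dict = field(default_factory=dict)

    def add_spoke(self, spoke):
        # Refuse duplicates so each cluster name maps to exactly one link.
        if spoke.name in self.spokes:
            raise ValueError(f"spoke {spoke.name!r} is already linked")
        self.spokes[spoke.name] = spoke

    def linked_clusters(self):
        # Sorted names give a stable listing of the linked clusters.
        return sorted(self.spokes)


hub = Hub()
hub.add_spoke(SlurmSpoke("hpc-a", "login.hpc-a.example.org", "mlops", "~/.ssh/hpc_a"))
hub.add_spoke(SlurmSpoke("hpc-b", "login.hpc-b.example.org", "mlops", "~/.ssh/hpc_b"))
print(hub.linked_clusters())
```

Keeping one entry per cluster name mirrors the idea that several Slurm spokes can be attached to a single hub and still be addressed individually.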
