Create an Enterprise-Ready MLOps Platform Using Kubeflow

Machine learning offers the promise of revolutionizing entire industries. But to achieve that promise, it needs to move from something that only a few experts can use to a mainstream discipline. Historically, only experts could navigate the complex patchwork of tools and processes required to build a production ML system.

MLOps is an approach that integrates the full ML workflow into an easy-to-use package, so that domain experts can do their research without needing intimate knowledge of all the underlying components. MLOps allows developers to release models to production quickly, using DevOps-like processes and ML automation for reproducibility.

MLOps is not just for large organizations, either. Any enterprise that works with more than a few trivial models will benefit from optimizing its ML investment in both people and resources. As your organization scales, a less integrated approach becomes expensive very quickly.

This article argues that the open source Kubeflow framework is the best foundation for an MLOps platform. It integrates a set of powerful components backed by strong, diverse community support. And with just a few enhancements it can offer the best of both worlds: compatibility with a popular community-based standard, but with the capabilities that enterprise customers demand. Finally, we explain how you can benefit from this combination of advantages today.

Kubeflow to the Rescue

No structure can be stronger than its foundation. The most powerful and extensible platform available today is Kubeflow.

Kubeflow is a Kubernetes-based, open-source framework that integrates the key components necessary to develop and deploy complex machine learning models. It has a number of characteristics that make it ideal as the primary building block for an enterprise MLOps system.

Kubeflow is not built as a unified platform. Instead, it consists of a collection of components, packaged together through manifests. This makes it flexible and easy to customize. A sophisticated team can choose the specific components that apply to a particular workflow, thus providing a balance of flexibility and simplicity.
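As a rough sketch of what this component selection looks like in practice, a team might assemble only the pieces it needs with a kustomization file. The directory paths below are illustrative, in the style of the kubeflow/manifests repository, and are not guaranteed to match the current layout:

```yaml
# kustomization.yaml - illustrative selection of Kubeflow components
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # notebook environment
  - apps/jupyter/jupyter-web-app/upstream/overlays/istio
  # pipeline automation
  - apps/pipeline/upstream/env/platform-agnostic
  # hyperparameter tuning
  - apps/katib/upstream/installs/katib-with-kubeflow
```

Applying only the overlays you need keeps the deployment small, while the shared manifests keep the components compatible with one another.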

Kubeflow is standards-based. Where a powerful, popular standard exists, the Kubeflow framework includes it as part of its manifests. This includes:

  • JupyterLab for development and experimentation
  • TensorFlow and PyTorch for training

Kubeflow goes beyond just pulling together existing tools. The diverse set of cloud and enterprise vendors and users in the Kubeflow community offer innovative solutions to enhance the workflow. Some important Kubeflow applications are:

  • Pipelines for automation
  • Katib for hyperparameter tuning
  • KFServing (since renamed KServe) for production serving

One aspect of Kubeflow that is not always understood is that it is meant to be a reference architecture rather than a product. It is not push-button; you need quite a bit of expertise to get it running and to maintain it. Nor is it meant to be a complete MLOps platform on its own. As powerful and flexible as it is, Kubeflow expects partners and vendors to provide supported, enterprise-ready solutions on top of it.

Making Kubeflow Enterprise-Ready

Enterprise customers have certain expectations for both capabilities and quality:

Studio-Based MLOps Environment

For an organization to scale, and to let its domain experts focus on their research, it needs an integrated, studio-based MLOps environment rather than a functional interface that merely ties together a set of components.

Easy Installation & Resilient Operation

An enterprise-ready solution needs to be quick and easy to install, and resilient enough to offer MLOps as a service. The components that make up Kubeflow are open source, with a broad community of contributing companies. This drives the components forward aggressively, but - as with any open-source software - somebody needs to be responsible for ensuring that the components provide enterprise-level quality, and for guaranteeing compatibility between them.

Supported On-Prem Operation

Although Kubeflow is platform-independent, it is primarily focused on cloud implementations. Many enterprise customers, however, require an on-prem implementation, either instead of or in addition to the cloud. They may face legal or regulatory security constraints on their code or data, or the data may simply be too large to transfer back and forth between its source and the cloud. Implementing a working system on-prem is difficult and time-consuming, since the environment is less well defined than a cloud deployment.

For a discussion of what this entails, please see DKube™: Kubeflow Implementation On-Prem or AWS, GCP, Azure.

Heterogeneous Platform Support

Many enterprise environments are heterogeneous and require a wider set of options than the standard Kubeflow package includes:

1. Multi-cluster capability to support non-Kubernetes environments such as Spark™ and Slurm.
2. Plug-ins to support a wide variety of storage platforms, data sources, and authorization & authentication frameworks.
3. Support for popular enterprise platforms such as Rancher®, Nutanix™, and VMWare®.

All of these environments and extensions need to integrate seamlessly into the workflow.

Powerful Metric & Log Management

A critical feature of any enterprise-ready MLOps platform is the ability to easily and automatically collect and display model metrics, compare metrics among trained models, and collect and display logs.

Bringing Together Kubeflow & MLFlow

DKube addresses these needs by combining Kubeflow with MLFlow and providing:

  • Integrated, UI-based interface and studio-oriented workflow that allows multiple users to collaborate on complex model development and deployment
  • An SDK-based programmatic interface to allow integration with existing organizational tool suites
  • Simple Helm-based installation that sets up everything necessary for the full MLOps solution
  • MLFlow-based metric management & integrated log management
  • Automatic model versioning
  • Full, automatic tracking and lineage, allowing the user to visually trace the inputs that were used to create the model
  • Flexible, Tekton-based CI/CD capability to enhance automation

To learn more about DKube, please visit www.dkube.io

Written by
Team DKube
