Metric management plays a central role in achieving program goals when training models. This goes beyond simply choosing the right metrics for your training. Several key characteristics determine how useful this capability is in your workflow:
- Metric collection should be simple and predictable
- Metrics must be easy to view in real time during the training run, and saved for later display and analysis throughout the model life cycle, including after the model has been deployed for live serving
- Model metrics need to be easy to compare, to determine which models best achieve the program goals
- All of these activities should be integrated into a unified workflow
- Metrics must be available through automation so that decisions can be made based on the outcome of a comparison
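The last point above — making comparison outcomes available to automation — can be sketched as a simple selection over per-run metrics. The run records and the `pick_best` helper below are hypothetical, invented for illustration; in practice a tracking tool such as MLflow exposes the same idea through its run-search APIs.

```python
# Hypothetical sketch: choose the best training run from collected metrics
# so a downstream pipeline step can act on the result automatically.

runs = [  # illustrative metric records; run IDs and values are invented
    {"run_id": "run-001", "val_accuracy": 0.91, "val_loss": 0.34},
    {"run_id": "run-002", "val_accuracy": 0.94, "val_loss": 0.28},
    {"run_id": "run-003", "val_accuracy": 0.89, "val_loss": 0.41},
]

def pick_best(runs, metric="val_accuracy", higher_is_better=True):
    """Return the run whose metric best meets the program goal."""
    return max(runs, key=lambda r: r[metric] if higher_is_better else -r[metric])

best = pick_best(runs)
print(best["run_id"])  # the run an automated pipeline would promote
```

An automated workflow can branch on this result — for example, registering `best` for deployment only if it beats the currently deployed model's recorded metric.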
DKube™ Provides Best-of-Breed Metric Management
DKube provides a powerful, flexible, comprehensive metric management system by building on community standards, and integrating them into an end-to-end MLOps platform. This allows your team to use the right tool for the job.
DKube is built on the foundation of Kubeflow, an open source framework supported by a diverse community of vendors and data scientists. We add MLflow, the best-in-class metric management tool, to this foundation to handle metric collection, display, and comparison.
DKube integrates the metric workflow into the overall MLOps life cycle, allowing a smooth transition between training and analysis, and eventual deployment. All of this is available through an intuitive UI-based interface and workflow.
The metric management implementation integrates into the powerful DKube versioning and tracking system. This enables reproducibility and model comparison later in the process - including after deployment - so that models can be improved as the characteristics of the live inference data change.
DKube provides metric support for multiple languages, including Python and R, and it can collect metrics from the most popular frameworks, including TensorFlow, PyTorch, and scikit-learn. Collecting metrics is as simple as adding a few MLflow-compatible API calls to your program code.
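As a rough illustration of that pattern, the loop below logs one metric sample per epoch through a stand-in `log_metric` function whose signature mirrors MLflow's `mlflow.log_metric(key, value, step=...)`; with MLflow installed, the same calls would sit inside a `with mlflow.start_run():` block. The training loop and loss values are fabricated for the example.

```python
# Sketch of per-epoch metric collection in a training loop.
# `log_metric` is a stand-in with the same shape as mlflow.log_metric;
# real code would call mlflow.log_metric inside a mlflow.start_run() block.

collected = []  # stands in for the tracking server's metric store

def log_metric(key, value, step=None):
    """Record one metric sample, mirroring mlflow.log_metric(key, value, step=...)."""
    collected.append({"key": key, "value": value, "step": step})

# Fabricated training loop: the loss simply shrinks each epoch.
for epoch in range(3):
    loss = 1.0 / (epoch + 1)        # placeholder for a real training step
    log_metric("loss", loss, step=epoch)

print(len(collected))  # one sample per epoch
```

Because each sample carries a `step`, the collected series can be plotted in real time during the run and revisited later for comparison across runs.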
DKube operates with a hub-and-spoke architecture. Training can run anywhere, and the metadata - including the metric information - is kept locally in the Kubernetes hub cluster. This ensures that the metadata is always accessible for the MLOps workflow, from the earliest training runs through full deployment, and beyond.