DKube makes it easier for data science teams to analyze and interpret complex data and extract valuable insights that can inform business decisions and drive growth. Tasks such as collecting and cleaning data, developing and testing models, and visualizing and presenting results can all be coordinated on DKube, using a toolset of your choice.
Apart from industry-standard MLOps features, DKube also comes packed with collaboration features that finally help Data Science and other mission-critical teams communicate on common ground.
Metric Collection And Management
Metric management plays a central role in achieving program goals when training models. This goes beyond simply choosing the right metrics for your training. There are key characteristics to optimizing the usefulness of this capability in your workflow:
Metric collection should be simple and predictable
Metrics must be easily viewed in real-time during the training run, and saved for later display and analysis through the model life cycle, including after the model has been deployed for live serving
Model metrics need to be easily compared in order to determine which ones best achieve the program goals
All of these activities should be integrated into a unified workflow
Metrics must be available through automation so that decisions can be made based on the outcome of a comparison
The first step in any metric management system is to gather the appropriate measurements during a training run. DKube makes it fast and simple to collect and store metrics.
Adding a few MLFlow-compatible API calls will store the metric information as part of the training run and model metadata. This information is saved throughout the model life cycle, so that it can be reviewed after the training run has completed, or later in the process to allow model improvement.
Metrics can be autologged for extreme simplicity, or specific metrics can be logged by instrumenting the code with the name of the metric required.
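As a sketch of what this instrumentation looks like, the hypothetical training loop below logs a named metric on each step with MLFlow-compatible calls. The loss values are synthetic stand-ins, and mlflow is assumed to be installed in the run's image; the sketch degrades gracefully if it is not.

```python
import math

# mlflow is assumed to be available in the training image;
# the sketch still runs without it for illustration.
try:
    import mlflow
    mlflow.autolog()  # autolog metrics from supported frameworks
except ImportError:
    mlflow = None

def train(epochs=5):
    """Stand-in training loop that logs a loss metric per epoch."""
    losses = []
    for epoch in range(epochs):
        loss = math.exp(-epoch)  # placeholder for a real training loss
        losses.append(loss)
        if mlflow is not None:
            # Explicitly log a named metric for this step of the run
            mlflow.log_metric("loss", loss, step=epoch)
    return losses

losses = train()
```

Because the calls are MLFlow-compatible, the same instrumented script produces metrics that DKube stores with the run and model metadata, with no additional wiring.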
The metric information is stored as part of the model version, making it easy to see what combination of inputs - code, data, and hyperparameters - led to the associated outcomes. This provides a convenient way to understand the impact of different inputs, and can be used as a launching point to create a new run with differing inputs.
And the metrics are not restricted to a single user. All users in the group have access to them, encouraging cooperation and facilitating incremental progress across the organization.
Collecting metrics is important only to the extent that they can be used to determine how well the trained model does its job. DKube provides a rich set of MLFlow-based display options, automatically invoked from a list of runs or models.
Metrics are available in real-time as the run progresses, allowing the data scientist to follow its progress. This becomes important for runs that are complex, or are operating on large datasets, since they can run for days or weeks, taking up both time and resources.
Once the run is complete, the stored metrics are available for display from both the run and model screens. The metrics are available in both tabular and graphical format, giving the flexibility to view them in the most appropriate form.
The graphical display is flexible and intuitive. You can choose the metrics that you want to view, and the timeline that is of interest. And this is all available from within the full MLOps point and click UI-based interface and workflow. Since the metrics are saved as part of the powerful DKube versioning capability, getting access to the metric for a training run is as simple as choosing it from a list of completed runs.
Best-in-class Metric Management
DKube provides a powerful, flexible, comprehensive metric management system by building on community standards and integrating them into an end-to-end MLOps platform. This allows your team to use the right tool for the job.
DKube is built on the foundation of Kubeflow, an open-source framework supported by a diverse community of vendors and data scientists. We add MLFlow, the best-in-class metric management tool, to this foundation to handle metric collection, display, and comparison.
DKube integrates the metric workflow into the overall MLOps life cycle, allowing a smooth transition between training and analysis, and eventual deployment. All of this is available through an intuitive UI-based interface and workflow.
The metric management implementation integrates into the powerful DKube versioning and tracking system. This enables reproducibility and model comparison later in the process - including after deployment - so that the models can be improved based on changes in the live inference data characteristics.
DKube provides metric support for multiple languages, including Python & R. And it can be used to collect metrics from the most popular frameworks, including TensorFlow, PyTorch, & Scikit Learn. Collecting metrics is as simple as adding a few MLFlow-compatible API calls to your program code.
DKube operates with a hub and spoke architecture. The training can run anywhere, and the metadata - including the metric information - is kept locally in the Kubernetes hub cluster. This ensures that the metadata is always accessible for the MLOps workflow, from the earliest training runs to full deployment, and beyond.
The central focus of an MLOps platform - and the reason for much of the rest of the development process - is to compare different models: to understand how various inputs impact model quality, and to determine which models best achieve the program goals based on the chosen metrics.
DKube addresses this fundamental decision-making capability in a powerful, flexible, and intuitive manner through its combination of Kubeflow and MLFlow.
The process starts with how the models are saved and organized. When submitting a training run, a model can be saved and viewed as a version of an existing set of models, or it can be designated to be a new model. This allows the data scientist to provide some structure to a process that can involve hundreds of different runs by grouping them into a manageable organization.
Once several training runs have been completed, the resulting models can be compared by selecting them from a list. Versions of the same model can be compared against each other, or against versions of different models. It is completely flexible. You can even compare models that were created by other users in the same group.
The comparison is provided in a tabular display, and in a number of flexible graphical displays. You can choose the metrics that you want to compare, and the timeline that they should be compared against. The graphs include a simple X/Y format, or more sophisticated forms such as a scatter plot, a contour plot, or a parallel coordinates plot.
Once the comparison is complete, decisions can be made about what to do next, all within the integrated MLOps interface and workflow. One or more of the models can be chosen for possible deployment, or a new run can be cloned from one of the existing runs based on the metric analysis.
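In code-driven workflows, the same decision can be automated, as noted earlier in the list of key characteristics. The sketch below picks the best candidate from a set of hypothetical run records by a chosen metric; with MLFlow, such records would come from a query API like mlflow.search_runs, but the run IDs and metric values here are invented for illustration.

```python
# Hypothetical run records, as might be returned by a metric-tracking API.
runs = [
    {"run_id": "r1", "metrics": {"accuracy": 0.91, "loss": 0.31}},
    {"run_id": "r2", "metrics": {"accuracy": 0.94, "loss": 0.27}},
    {"run_id": "r3", "metrics": {"accuracy": 0.89, "loss": 0.35}},
]

def best_run(runs, metric, maximize=True):
    """Return the run whose metric best achieves the program goal."""
    score = lambda r: r["metrics"][metric]
    return max(runs, key=score) if maximize else min(runs, key=score)

winner = best_run(runs, "accuracy")
print(winner["run_id"])  # r2 has the highest accuracy
```

The selected run can then feed the next step of the workflow - promotion toward deployment, or cloning with modified inputs.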
Tracking, Lineage, And History
Training a model involves a significant number of iterations. This creates a logistical challenge: simply keeping track of which inputs led to which outputs.
DKube enhances the standard Kubeflow tracking ability with a powerful, automatic, built-in version control system for datasets and models. The full lineage of the model is shown graphically. This helps the user track and understand how changes to preprocessing or training impact the model.
DKube includes a metric collection, display, and comparison capability based on MLFlow, and the lineage information provides insight into the reasons behind the model metrics.
Later, the lineage can be used to reproduce or audit the model, and to identify what might be causing issues with the production serving outcomes.
In addition, DKube keeps track of what code and dataset repositories are used in the training. This provides a quick indication of how broadly the input components are being used.
Finding the right mix of hyperparameters to achieve your goals can be difficult and time-consuming. Kubeflow includes a hyperparameter optimization tool called Katib, and it is fully integrated into the DKube platform.
Katib sweeps through a range of hyperparameter combinations for a given code and dataset, and identifies the combination that best achieves your metric goals. An input configuration file specifies the hyperparameters to vary, the target metrics, and the search algorithm to use.
Once the combination that maximizes the metric goals has been identified, a model is created for that combination. The impact of each hyperparameter on the target metrics can be viewed graphically to better understand overall trends.
The output of Katib is fully integrated into the DKube workflow, including the ability to operate on the models that are created and to use MLFlow to compare the metrics generated from training.
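The idea behind such a sweep can be illustrated with a minimal pure-Python grid search. The objective function and parameter ranges below are invented for illustration; Katib performs this kind of search at scale, with more sophisticated algorithms than exhaustive enumeration.

```python
from itertools import product

def objective(lr, batch_size):
    """Invented stand-in for the validation metric a training run reports."""
    return 1.0 - abs(lr - 0.01) * 10 - abs(batch_size - 64) / 1000

# Hyperparameter search space, analogous to a sweep's input configuration.
search_space = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}

best_params, best_metric = None, float("-inf")
for lr, bs in product(search_space["lr"], search_space["batch_size"]):
    metric = objective(lr, bs)
    if metric > best_metric:
        best_params, best_metric = {"lr": lr, "batch_size": bs}, metric

print(best_params)  # the combination that maximizes the target metric
```

In DKube, the winning combination then yields a model that flows into the same versioning, comparison, and deployment workflow as any other training run.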
DKube is based on Kubeflow and Kubernetes. Jobs run within container images, providing the ability to accommodate different platforms. Standard images that include a set of common packages are provided by default. Users can easily create and use their own images.
A custom image can be provided when submitting any job through the UI-based interface. The images can come from a catalog of images that the user has added. Images can be added to the catalog from a registry, or built from a code repository.