The quality of your overall solution depends on the quality of your training data. DKube supports a flexible end-to-end workflow that follows the data from its inception to production serving.
The Data Engineer takes the initial raw data and processes it into a form optimized for training. The processed Feature Set is saved in a Feature Store. The processing can be done manually in an IDE such as JupyterLab or RStudio, through individual preprocessing Runs, or through an automated Kubeflow Pipeline.
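The preprocessing step can be sketched in plain Python. The field names, cleaning rules, and derived features below are hypothetical illustrations of the kind of work a preprocessing Run performs; they are not DKube APIs:

```python
# Illustrative preprocessing sketch: raw records -> a processed feature set.
# All field names and transformations are hypothetical examples.

def preprocess(raw_records):
    """Clean raw records and derive features suitable for training."""
    features = []
    for rec in raw_records:
        # Drop records with missing required fields.
        if rec.get("age") is None or rec.get("income") is None:
            continue
        features.append({
            "age": rec["age"],
            # Scale income to thousands so features share a similar range.
            "income_k": rec["income"] / 1000.0,
            # Derive a boolean feature from a raw categorical field.
            "is_employed": rec.get("status") == "employed",
        })
    return features

raw = [
    {"age": 34, "income": 52000, "status": "employed"},
    {"age": 29, "income": None, "status": "student"},   # dropped: missing income
    {"age": 41, "income": 87000, "status": "employed"},
]
feature_set = preprocess(raw)  # two clean records remain
```

Whether this logic runs in a notebook, a preprocessing Run, or a pipeline step, the output is the same: a cleaned, feature-engineered dataset ready to be saved as a Feature Set.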
Feature Sets are automatically versioned, and the versions can be conveniently viewed within the DKube UI. Each version records the complete lineage of how the raw data became the Feature Set.
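Conceptually, versioning with lineage works like the minimal in-memory sketch below, where each commit stores both the processed features and a record of how they were produced. This is only an illustration of the idea; DKube's Feature Store manages versions and lineage automatically:

```python
# Minimal sketch of versioned feature sets with lineage capture.
# Hypothetical illustration only, not DKube's actual storage layer.

class FeatureStore:
    def __init__(self):
        self._versions = []  # each entry pairs features with their lineage

    def commit(self, features, lineage):
        """Save a new version together with the lineage that produced it."""
        self._versions.append({"features": features, "lineage": lineage})
        return len(self._versions)  # version numbers start at 1

    def get(self, version):
        """Retrieve a specific version, including its lineage record."""
        return self._versions[version - 1]

store = FeatureStore()
v1 = store.commit(
    features=[{"age": 34, "income_k": 52.0}],
    # Lineage ties the version back to the raw data and the processing run.
    lineage={"raw_dataset": "customers-raw", "script": "clean.py", "run": "run-17"},
)
```

Because every version carries its lineage, any training result can later be traced back to the exact raw data and processing code behind it.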
The Data Scientist uses the processed data from the Feature Sets for code development, experimentation, and, once the components are complete, in a training pipeline. The output of the training is a model.
The Feature Sets are a global resource. Once the optimized processing has been identified for a particular dataset, the same Feature Set can be used by others in the organization for their training. This ensures that a clean, optimized input is available for efficient training.
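The reuse pattern amounts to a shared, named registry: one team registers an optimized Feature Set, and other teams look it up by name rather than repeating the preprocessing. The names and functions below are a hypothetical sketch of that pattern, not DKube's interface:

```python
# Sketch of a shared feature-set registry enabling organization-wide reuse.
# Names and structure are illustrative assumptions.

registry = {}

def register(name, features):
    """Publish a processed Feature Set under a well-known name."""
    registry[name] = features

def lookup(name):
    """Retrieve a published Feature Set for use in another team's training."""
    return registry[name]

# One team publishes the optimized features...
register("customers-clean", [{"age": 34, "income_k": 52.0}])
# ...and another team reuses them without re-running preprocessing.
shared = lookup("customers-clean")
```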
The trained model that best achieves the organizational goals is deployed for live serving by the Production Engineer. The data workflow continues through this phase, providing the same integrated access to the original data management steps and their lineage.