Data Science Tutorial

This section takes the first-time user through the DKube workflow using a sample program and dataset. The MNIST model is used to provide a simple, successful initial experience.

General Workflow

The workflow demonstrated in this example is as follows:

  • Load the program code folder

  • Load the dataset folder

  • Create a model placeholder for versioned output

  • Create and open a DKube JupyterLab Notebook

  • Create a Training Run

  • Create a model from the Training Run

  • Deploy the model

Create Code Repo

Load the MNIST code folder from a GitHub repository into DKube from the Repo menu by selecting “+ Code”.

_images/Data_Scientist_Code_Tutorial_R22.png

The fields should be filled in as follows, then select “Add Code”.

Field          Value
Name           mnist
Code Source    Git
url            https://github.com/oneconvergence/dkube-training.git
Branch         main

_images/Data_Scientist_Code_mnist_R22.png

This will create the mnist Code repo within DKube.

_images/Data_Scientist_Code_Success_R22.png

Create Dataset Repo

Load the MNIST dataset folder from a GitHub repository into DKube from the Datasets menu by selecting “+ Dataset”.

_images/Data_Scientist_Datasets_Tutorial_R22.png

The fields should be filled in as follows, then select “Add Dataset”.

Field             Value
Name              mnist
Versioning        DVS
Dataset Source    Other
url               https://s3.amazonaws.com/img-datasets/mnist.pkl.gz

_images/Data_Scientist_Dataset_mnist.png

This will create the mnist Dataset.

_images/Data_Scientist_Dataset_Success_R22.png

Create Model Repo

A Model needs to be created that will become the basis of the output of the Training Run later in the process.

_images/Data_Scientist_Models_Tutorial_R22.png

The fields should be filled in as follows, then select “Add Model”.

Field           Value
Name            mnist
Versioning      DVS
Model Store     default
Model Source    None

_images/Data_Scientist_Models_mnist.png

Create Notebook

Create a JupyterLab Notebook from the IDE menu to experiment with the program by selecting “+ JupyterLab”.

_images/Data_Scientist_Notebooks_Tutorial_R22.png

Fill in the fields as shown.

Basic Submission Screen

Field                Value
Name                 mnist
Code                 mnist
Framework            Tensorflow
Framework Version    2.0.0
Image                Will be filled by default - do not change

_images/Data_Scientist_Notebook_mnist_Basic_R22.png

All the other fields should be left in their default state. Do not submit at this point. Select the “Repos” tab.

Repo Submission Screen

Field         Value
Dataset       mnist
Version       Select ver 1
Mount Path    /mnist

_images/Data_Scientist_Notebook_mnist_Repo_Dataset.png

The mount path is the path that is used within the program code to access the input dataset. This is described in more detail at Mount Path.
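
As a rough illustration of how program code might read from the mount path, the sketch below opens a gzip-compressed pickle file under `/mnist`. The file name `mnist.pkl.gz` is taken from the dataset URL used earlier, but its exact location under the mount path and the `load_pickled_dataset` helper are assumptions made for illustration. The tutorial's own training code may read the data differently.

```python
import gzip
import pickle
from pathlib import Path

# Mount path entered in the Repos tab. The file name is an assumption
# based on the dataset URL used earlier in this tutorial.
DATASET_MOUNT_PATH = "/mnist"
DATASET_FILE_NAME = "mnist.pkl.gz"


def load_pickled_dataset(mount_path=DATASET_MOUNT_PATH, file_name=DATASET_FILE_NAME):
    """Load a gzip-compressed pickle file found under the dataset mount path."""
    dataset_file = Path(mount_path) / file_name
    if not dataset_file.is_file():
        raise FileNotFoundError(f"No dataset file at {dataset_file}")
    with gzip.open(dataset_file, "rb") as handle:
        # encoding="latin1" lets Python 3 read pickles written by Python 2.
        return pickle.load(handle, encoding="latin1")
```

Because the mount path is a parameter, the same helper can be pointed at a local folder when experimenting outside DKube.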

All the other fields should be left in their default state. Select “Submit” to start the Notebook.

Note

The initial Notebook will take a few minutes to start. Follow-on Notebooks with the same framework version will start more quickly.

_images/Data_Scientist_Notebook_Success.png

Open JupyterLab Notebook

Open a JupyterLab notebook by selecting the Jupyter icon under “Actions” on the far right.

_images/Data_Scientist_Jupyter_mnist.png

There is no need to change any code in this tutorial. The instructions are meant to provide the details on how to use DKube to experiment with your program code. Your programs will have a different folder structure.

The next step creates a training run.

Note

The Training Run can be created directly from the Notebook, as described in Create Training Run. This will fill in most of the fields for the Run with the information that was provided during the IDE creation. This tutorial provides the more general Run creation.

Create Training Run

Create a Training Run from the Runs menu to train the mnist model on the dataset and create a trained model.

_images/Data_Scientist_Run_mnist_R22.png

Fill in the fields as shown.

Basic Submission Screen

Field                Value
Name                 mnist
Code                 mnist
Framework            tensorflow
Framework Version    2.0.0
Start-up script      python mnist/train.py

_images/Data_Scientist_Run_mnist_Basic_R22.png

All the other fields should be left in their default state. Select the “Repos” tab.

Repos Submission Screen

In order to submit a Training Run:

  • A Dataset needs to be selected for input

  • A Model needs to be selected for output

Input Selections

Field         Value
Dataset       mnist
Version       Select ver 1
Mount Path    /mnist

The mount path is the path that is used within the program code to access the input dataset. This is described in more detail at Mount Path.

_images/Data_Scientist_Run_mnist_Repo_Dataset_R22.png

Output Selection

A Model needs to be selected for the Training Run output.

Field         Value
Model         mnist
Mount Path    /opt/dkube/output
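
To make the role of the output mount path more concrete, the sketch below writes a small JSON artifact into `/opt/dkube/output`. It assumes that files written under this path during the Run are what DKube captures as the new Model version. The `save_run_artifact` helper is hypothetical, and a real training script would save the trained model files in whatever format the serving image expects.

```python
import json
from pathlib import Path

# Output mount path entered for the Model in the Repos tab.
MODEL_OUTPUT_PATH = "/opt/dkube/output"


def save_run_artifact(file_name, payload, output_path=MODEL_OUTPUT_PATH):
    """Write a JSON artifact into the model output path and return its location."""
    output_dir = Path(output_path)
    # Create the folder if it is missing, e.g. when testing outside DKube.
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / file_name
    with open(target, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    return target
```

Keeping the output path in a single constant means the value in the program code and the Mount Path field above only need to be matched in one place.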

After the fields have been completed, select “Submit”.

_images/Data_Scientist_Run_mnist_Repo_Model_R22.png

Note

The initial Run will take a few minutes to start. Follow-on Runs with the same framework version will start more quickly.

The Training Run will appear in the “All Runs” tab.

_images/Data_Scientist_Run_Success_R22.png

View Trained Model

A Run status of “Complete” indicates that a trained Model has been created. The trained Model will appear in the Models Repo.

_images/Data_Scientist_Models_Trained_mnist_R22.png

Selecting the trained Model will provide the details on the model, including the versions.

_images/Data_Scientist_Models_mnist_Detail.png

  • Ver 1 of the model is the initial blank version that was created earlier in the tutorial in order to set up the versioning capability

  • Ver 2 is the new model that was created by the training run

Selecting a version will show more details on the model version, including the lineage. The lineage provides all of the inputs required to create this model.

_images/Data_Scientist_Models_mnist_Lineage_R22.png

Create Test Inference

A test inference creates a local serving instance that exposes the APIs for the model. This can be used to test the model against specific input. A test inference is created from the Model screen.

Creating a test inference opens a pop-up window. Fill in the fields as follows.

_images/Data_Scientist_Models_mnist_Test_Inference_Select.png

Field                          Value
Name                           mnist
Serving Image                  Leave at default
Transformer                    Select
Transformer Image              Leave at default
Transformer Code               Leave at default
Transformer Script             mnist/transformer.py
CPU/GPU                        CPU
Minimum Replicas               Leave at default
Maximum Concurrent Requests    Leave at default

_images/Data_Scientist_Models_mnist_Test_Inference_R22.png

The test inference is viewed from the “Test Inferences” menu. Once the status of the test inference shows “Running”, the “Endpoint” column provides the API that is serving the model.
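
Once an endpoint is available, a client could prepare a request along the lines of the sketch below. This is only an outline under stated assumptions: the endpoint is assumed to accept a JSON POST body, Bearer-token authentication is assumed rather than confirmed by this tutorial, and the `build_inference_request` helper is hypothetical. The shape of the payload depends on what the transformer script expects, so it is left to the caller.

```python
import json
import urllib.request


def build_inference_request(endpoint_url, payload, token=None):
    """Prepare (but do not send) a JSON POST request for a serving endpoint."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Bearer-token authentication is an assumption, not confirmed here.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url, data=body, headers=headers, method="POST"
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it, using the value copied from the “Endpoint” column as `endpoint_url`.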

_images/Data_Scientist_Inferences_mnist_R22.png