Production Engineer Dashboard & Workflow

The Production Engineer (PE) takes the models that the ML Engineer has optimized and published, validates them, compares them to the existing inference models, and then deploys them for live inference.

| Menu Item | Function |
|---|---|
| Model Catalog | Optimized models, ready for deployment |
| Model Serving | Staged or deployed inferences |

The Model Catalog is the catalog described in the section Publish Model.

Deployment Workflow

After the ML Engineer has published a Model, the PE can either stage or deploy it using the icons under “Actions” on the far right side of the screen.

| Function | Description |
|---|---|
| Stage | Deploy locally for testing |
| Deploy | Deploy for live inference |

Staging or deploying the model will create a serving endpoint. This endpoint can be used to:

- Test the model using inference data to ensure that it meets the goals
- Compare the results to the existing live inference model
- Provide the endpoint for live serving if it meets the goals

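
The first two uses of the endpoint listed above can be sketched as a small comparison harness. This is an illustration under stated assumptions, not the platform's API: the JSON request shape (`{"inputs": ...}`) and the helper names `http_predictor` and `compare_endpoints` are hypothetical, and the request format must be adapted to whatever the serving endpoint actually expects.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List

# A predictor is any callable that maps one input sample to one model output.
Predictor = Callable[[Any], Any]


def http_predictor(url: str, timeout: float = 10.0) -> Predictor:
    """Return a function that POSTs one sample to `url` and returns the parsed JSON reply.

    Assumption: the endpoint accepts a JSON body of the form {"inputs": sample}.
    """
    def predict(sample: Any) -> Any:
        body = json.dumps({"inputs": sample}).encode("utf-8")
        request = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    return predict


def compare_endpoints(
    candidate: Predictor, live: Predictor, samples: List[Any]
) -> Dict[str, Any]:
    """Send every sample to both predictors and report how often they agree."""
    mismatches = []
    for index, sample in enumerate(samples):
        candidate_out = candidate(sample)
        live_out = live(sample)
        if candidate_out != live_out:
            mismatches.append(
                {"index": index, "candidate": candidate_out, "live": live_out}
            )
    total = len(samples)
    # An empty sample set is reported as full agreement to avoid dividing by zero.
    agreement = 1.0 if total == 0 else (total - len(mismatches)) / total
    return {"total": total, "agreement": agreement, "mismatches": mismatches}
```

A typical call would pass `http_predictor(staged_url)` and `http_predictor(live_url)` together with a list of inference samples, then inspect `agreement` and `mismatches` before promoting the staged model. Exact equality is used here only to keep the sketch short; for numeric outputs a tolerance-based comparison is usually more appropriate.
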

Selecting “Deploy Model” from the Action buttons on the right opens a popup where the deployment options can be entered.

| Field | Value |
|---|---|
| Name | User-chosen name for the Deployment |
| Description | Optional user-chosen text that provides more details about the Deployment |
| CPU/GPU | Type of inference (CPU or GPU) |
| Minimum Replicas | Minimum number of inference pods that keep running in the idle state, with no inference requests (https://knative.dev/docs/serving/autoscaling/scale-bounds/) |
| Maximum Concurrent Requests | Soft target for the number of concurrent requests that a single inference pod can serve for the Model (https://knative.dev/docs/serving/autoscaling/concurrency/) |
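
The two autoscaling fields can be pictured in Knative terms. The sketch below is an illustration only: it assumes the platform translates Minimum Replicas and Maximum Concurrent Requests into the Knative annotations `autoscaling.knative.dev/min-scale` and `autoscaling.knative.dev/target` described on the linked pages, which this document does not confirm. The function names are hypothetical, and the pod estimate is a simplification; the real autoscaler also averages over a time window and applies a target utilization.

```python
import math
from typing import Dict


def autoscaling_annotations(
    min_replicas: int, max_concurrent_requests: int
) -> Dict[str, str]:
    """Build Knative revision-template annotations for the two dashboard fields.

    Assumption: the dashboard fields map one-to-one onto these annotations.
    """
    if min_replicas < 0:
        raise ValueError("min_replicas must be >= 0")
    if max_concurrent_requests < 1:
        raise ValueError("max_concurrent_requests must be >= 1")
    return {
        # Lower bound on the number of pods kept running, even when idle.
        "autoscaling.knative.dev/min-scale": str(min_replicas),
        # Soft per-pod concurrency target the autoscaler tries to maintain.
        "autoscaling.knative.dev/target": str(max_concurrent_requests),
    }


def estimated_pods(
    in_flight_requests: int, min_replicas: int, max_concurrent_requests: int
) -> int:
    """Rough steady-state pod count: enough pods to keep each one at or below
    the soft target, but never fewer than the configured minimum."""
    needed = math.ceil(in_flight_requests / max_concurrent_requests)
    return max(min_replicas, needed)
```

Under these assumptions, a Deployment with Minimum Replicas set to 1 and Maximum Concurrent Requests set to 10 keeps one pod running when idle, and `estimated_pods(25, 1, 10)` suggests roughly three pods for 25 requests in flight. Because the target is soft, a pod may briefly serve more than 10 concurrent requests while new pods start.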

Change Model Deployment

After a Model has been deployed, it is associated with an endpoint URL. The Model served at that endpoint can be changed from the Deployment screen: select the Edit Action button to the right of the Model name. This opens a popup where the Model version and other associated information can be changed for that endpoint.
