Production Engineer Dashboard & Workflow


The Production Engineer (PE) validates the models that the ML Engineer has optimized and published, compares them against the existing inference models, and then deploys them for live inference.


The PE dashboard provides two menu items:

  • Model Catalog: optimized models, ready for deployment

  • Model Serving: staged or deployed inference endpoints

The Model Catalog is the catalog described in the section Publish Model.

Deployment Workflow

After a Model has been published by the ML Engineer, the PE can either stage or deploy it through the icons on the far right side of the screen, under “Actions”.

The two Action icons are:

  • Stage: deploy the model locally for testing

  • Deploy: deploy the model for live inference


Staging or deploying the model will create a serving endpoint. This endpoint can be used to:

  • Test the model with inference data to ensure that it meets its goals

  • Compare its results against the existing live inference model

  • Serve live inference traffic if the model meets its goals
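The request and response format of a serving endpoint is not specified in this section. As a minimal sketch, assuming a JSON API that accepts an `inputs` field and returns a `predictions` field (the URLs, field names, and agreement metric below are illustrative assumptions, not the platform’s documented API), a staged model can be tested and compared against the live model like this:

```python
import json
import urllib.request

def predict(endpoint_url, records):
    """POST a batch of records to a serving endpoint and return its predictions.

    Assumes a JSON body of the form {"inputs": [...]} and a JSON response
    of the form {"predictions": [...]} -- adjust to the actual endpoint schema.
    """
    body = json.dumps({"inputs": records}).encode("utf-8")
    req = urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["predictions"]

def agreement_rate(staged_preds, live_preds):
    """Fraction of records on which the staged model matches the live model."""
    matches = sum(s == l for s, l in zip(staged_preds, live_preds))
    return matches / len(staged_preds)

# Sketch of a comparison run against hypothetical staged and live endpoints:
#   staged = predict("https://serve.example.com/staged/churn", batch)
#   live   = predict("https://serve.example.com/live/churn", batch)
#   print(f"agreement: {agreement_rate(staged, live):.1%}")
```

A high agreement rate (or better metrics on labeled data) is the signal that the staged endpoint is ready to take live traffic.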

Selecting “Deploy Model” from the Action buttons on the right opens a popup where the deployment options can be provided.





The popup contains the following fields:

  • Name: user-chosen name for the Deployment

  • Description: optional user-provided text giving more detail about the Deployment

  • Inference Type: type of inference

  • Minimum Replicas: minimum number of inference pods that will run in the idle state, when there are no inference requests

  • Maximum Concurrent Requests: soft target for the number of concurrent requests that a single inference pod can serve for the Model
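The platform’s exact autoscaling policy is not documented in this section, but the last two settings typically interact as follows: at least Minimum Replicas pods stay running while idle, and the deployment scales out when in-flight requests would push a pod past its soft concurrency target. A minimal sketch of that rule (the formula is an assumption, not the platform’s documented behavior):

```python
import math

def desired_replicas(in_flight_requests: int,
                     min_replicas: int,
                     max_concurrent_requests: int) -> int:
    """Illustrative scaling rule: run enough pods that no pod exceeds its
    soft concurrency target, never dropping below the idle floor."""
    needed = math.ceil(in_flight_requests / max_concurrent_requests)
    return max(min_replicas, needed)

# With Minimum Replicas = 2 and Maximum Concurrent Requests = 10:
#   0 in-flight requests  -> 2 pods (the idle floor)
#   25 in-flight requests -> 3 pods (ceil(25 / 10))
```

Because Maximum Concurrent Requests is a soft target, brief bursts above it are absorbed by existing pods while new replicas start.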


Change Model Deployment

After a Model has been deployed, it is associated with an endpoint URL. The Model served at that endpoint can be changed from the Deployment screen: select the Edit Action button to the right of the Model name. This opens a popup where the Model version and other associated information for that endpoint can be changed.
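The useful property of this workflow is that the endpoint URL stays stable while the model behind it changes, so client integrations never need updating. A minimal sketch of that indirection (the class and method names are illustrative, not the platform’s API):

```python
class EndpointRegistry:
    """Illustrative mapping from a stable endpoint URL to the model
    version currently served behind it."""

    def __init__(self):
        self._serving = {}  # endpoint URL -> model version

    def deploy(self, endpoint_url: str, model_version: str) -> None:
        # Deploying to an existing URL swaps the model in place;
        # callers of that URL are unaffected.
        self._serving[endpoint_url] = model_version

    def resolve(self, endpoint_url: str) -> str:
        return self._serving[endpoint_url]

registry = EndpointRegistry()
registry.deploy("https://serve.example.com/churn", "churn-model:v1")
registry.deploy("https://serve.example.com/churn", "churn-model:v2")
# The same URL now serves v2; clients keep calling the URL unchanged.
```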