In enterprise AI, success isn’t just about deploying large language models; it’s about controlling access, tracking usage, and understanding performance across every layer of your stack.
That's why VMware and DKube are working together to enable secure, local LLM deployments with full-stack observability, using the VMware Private AI Foundation with NVIDIA.
In this demo, we showcase:
- Centralized API-level access control and model routing with LiteLLM (client sketch below)
- RAG-based querying through OpenWebUI, backed by Postgres and PGVector
- Request-level tracing, usage metrics, and cost visibility via Langfuse (tracing sketch below)
- Backend insight into vector embeddings and data chunks through PGAdmin (query sketch below)
- A fully private deployment: no external internet connectivity required
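
To make the LiteLLM piece concrete, here is a minimal client-side sketch of what centralized access control and routing can look like: an application talks to an OpenAI-compatible LiteLLM proxy using a virtual key, and the proxy decides which local model serves the request. The endpoint URL, virtual key, and model alias below are illustrative placeholders, not values from the demo environment.

```python
# Sketch: querying a local model through a LiteLLM proxy.
# The base_url, virtual key, and model alias are placeholders; in the demo
# environment these would be issued and governed centrally via LiteLLM.
from openai import OpenAI

client = OpenAI(
    base_url="http://litellm.internal:4000/v1",  # private, OpenAI-compatible endpoint
    api_key="sk-team-virtual-key",               # per-team virtual key enforced by the proxy
)

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # alias the proxy routes to a local, GPU-backed deployment
    messages=[{"role": "user", "content": "Summarize our GPU utilization policy."}],
)
print(response.choices[0].message.content)
```

Because every request passes through the proxy, keys can be rotated, budgets enforced, and models swapped without touching application code.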
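Request-level tracing can be wired in a similar spirit. The sketch below shows one common pattern, sending LiteLLM completion logs to a self-hosted Langfuse instance via its callback integration; the keys, hostnames, model alias, and metadata field are assumptions for illustration and may differ from how the demo environment is actually configured.

```python
# Sketch: logging each completion to a self-hosted Langfuse instance.
# All keys and hosts are placeholders; in a fully private deployment both
# LiteLLM and Langfuse resolve to internal addresses only.
import os
import litellm

os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-placeholder"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-placeholder"
os.environ["LANGFUSE_HOST"] = "http://langfuse.internal:3000"

# Log every successful completion to Langfuse as a trace.
litellm.success_callback = ["langfuse"]

response = litellm.completion(
    model="openai/llama-3-8b-instruct",           # served via the OpenAI-compatible proxy
    api_base="http://litellm.internal:4000/v1",   # assumed internal LiteLLM endpoint
    api_key="sk-team-virtual-key",                # assumed per-team virtual key
    messages=[{"role": "user", "content": "Which documents mention GPU quotas?"}],
    metadata={"trace_user_id": "analyst-42"},     # attributes usage and cost to a user in Langfuse
)
print(response.choices[0].message.content)
```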
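And because the RAG data lives in Postgres with PGVector, the chunks and embeddings that PGAdmin surfaces can also be queried directly. The following sketch assumes a hypothetical document_chunk table with content and embedding columns; the actual schema used by OpenWebUI in the demo may differ.

```python
# Sketch: inspecting stored chunks and running a similarity search with pgvector.
# Connection details, table name, and column names are illustrative assumptions.
import psycopg2

conn = psycopg2.connect("dbname=rag user=rag_user host=postgres.internal password=changeme")

# Placeholder query vector; in practice this comes from the same embedding model
# used at ingestion time, and its dimension must match the stored vectors.
query_embedding = "[" + ",".join(["0.0"] * 768) + "]"

with conn, conn.cursor() as cur:
    # How many chunks are stored, and at what embedding dimension?
    cur.execute("SELECT vector_dims(embedding) AS dims, count(*) FROM document_chunk GROUP BY 1;")
    print(cur.fetchall())

    # The five chunks nearest to the query vector by cosine distance (pgvector's <=> operator).
    cur.execute(
        "SELECT content, embedding <=> %s::vector AS distance "
        "FROM document_chunk ORDER BY distance LIMIT 5;",
        (query_embedding,),
    )
    for content, distance in cur.fetchall():
        print(f"{distance:.4f}  {content[:80]}")
```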
Watch the demo to see how VMware and DKube bring transparency, traceability, and control to enterprise GenAI workflows, from secure model access to every individual query.





