Model serving on Kubernetes: KServe, vLLM, and the substrate question
KServe, vLLM, Triton, BentoML: four different answers to the same question. They sit at different layers of the stack, and each layer trades operational complexity for serving features. It's worth being concrete about which layer is right for which workload, so each one was tested on a single GTX 1080 Ti.