This scheduler makes optimized routing decisions for inference requests sent to
the llm-d inference framework.
About
This project provides an "Endpoint Picker (EPP)" component for the llm-d inference
framework which schedules incoming inference requests onto the platform via a
Kubernetes Gateway according to scheduler plugins. For more details on the llm-d inference scheduler architecture, routing logic, and the different plugins (filters and scorers), including plugin configuration, see the Architecture Documentation.
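The filter-and-score style of plugin scheduling described above can be illustrated with a minimal sketch. Note that the types and names here (`Endpoint`, `Filter`, `Scorer`, `pickEndpoint`) are hypothetical simplifications for illustration only, not the project's actual plugin API; see the Architecture Documentation for the real interfaces.

```go
package main

import "fmt"

// Endpoint is a hypothetical model-serving pod the scheduler can pick from.
type Endpoint struct {
	Name       string
	QueueDepth int
	Healthy    bool
}

// Filter drops endpoints that cannot serve the request at all.
type Filter func(eps []Endpoint) []Endpoint

// Scorer assigns a score to a surviving endpoint; higher is better.
type Scorer func(ep Endpoint) float64

// pickEndpoint applies all filters, sums the scorers' scores per endpoint,
// and returns the name of the best-scoring endpoint ("" if none remain).
func pickEndpoint(eps []Endpoint, filters []Filter, scorers []Scorer) string {
	for _, f := range filters {
		eps = f(eps)
	}
	best, bestScore := "", -1.0
	for _, ep := range eps {
		score := 0.0
		for _, s := range scorers {
			score += s(ep)
		}
		if score > bestScore {
			best, bestScore = ep.Name, score
		}
	}
	return best
}

func main() {
	eps := []Endpoint{
		{Name: "pod-a", QueueDepth: 7, Healthy: true},
		{Name: "pod-b", QueueDepth: 2, Healthy: true},
		{Name: "pod-c", QueueDepth: 0, Healthy: false},
	}
	// Filter plugin: keep only healthy endpoints.
	healthy := func(in []Endpoint) []Endpoint {
		var out []Endpoint
		for _, ep := range in {
			if ep.Healthy {
				out = append(out, ep)
			}
		}
		return out
	}
	// Scorer plugin: prefer the shortest request queue.
	leastQueued := func(ep Endpoint) float64 {
		return 1.0 / float64(1+ep.QueueDepth)
	}
	fmt.Println(pickEndpoint(eps, []Filter{healthy}, []Scorer{leastQueued}))
}
```

With these toy plugins, `pod-c` is filtered out as unhealthy and `pod-b` wins on queue depth; real scorers would weigh signals like KV-cache locality or load.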
A compatible Gateway API implementation is used as the Gateway. The Gateway
API implementation must be built on Envoy and support ext-proc, as ext-proc is
currently the callback mechanism the EPP relies on to make routing decisions
for model-serving workloads.
We currently use the #sig-inference-scheduler channel in the llm-d Slack workspace for communications.
For large changes, please create an issue first describing the change so the
maintainers can assess it and work through the details with you. See
DEVELOPMENT.md for details on how to work with the codebase.
Note that, in general, features should go to the upstream Gateway API Inference
Extension (GIE) project first if applicable. The GIE is a major dependency of
ours and is where most general-purpose inference features live. If you have
something you feel is general purpose, it probably belongs in the GIE. If you
have something that's llm-d specific, then it belongs here. If you're not sure
whether your feature belongs here or in the GIE, feel free to create a
discussion or ask on Slack.