This scheduler makes optimized routing decisions for inference requests sent to
the llm-d inference framework.
About
This project provides an "Endpoint Picker (EPP)" component for the llm-d inference
framework which schedules incoming inference requests onto the platform via a
Kubernetes Gateway according to scheduler plugins. For more details on the llm-d inference scheduler architecture, routing logic, and the different plugins (filters and scorers), including plugin configuration, see the Architecture Documentation.
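The filter-and-score style of plugin scheduling described above can be illustrated with a minimal sketch. Note that the types and names here (`Endpoint`, `Filter`, `Scorer`, `pickEndpoint`) are hypothetical simplifications for illustration only, not the project's actual plugin API; see the Architecture Documentation for the real interfaces.

```go
package main

import "fmt"

// Endpoint is a hypothetical model-serving pod the scheduler can pick from.
type Endpoint struct {
	Name       string
	QueueDepth int
	Healthy    bool
}

// Filter drops endpoints that cannot serve the request at all.
type Filter func(eps []Endpoint) []Endpoint

// Scorer assigns a score to a surviving endpoint; higher is better.
type Scorer func(ep Endpoint) float64

// pickEndpoint applies all filters, sums the scorers' scores per endpoint,
// and returns the name of the best-scoring endpoint ("" if none remain).
func pickEndpoint(eps []Endpoint, filters []Filter, scorers []Scorer) string {
	for _, f := range filters {
		eps = f(eps)
	}
	best, bestScore := "", -1.0
	for _, ep := range eps {
		score := 0.0
		for _, s := range scorers {
			score += s(ep)
		}
		if score > bestScore {
			best, bestScore = ep.Name, score
		}
	}
	return best
}

func main() {
	eps := []Endpoint{
		{Name: "pod-a", QueueDepth: 7, Healthy: true},
		{Name: "pod-b", QueueDepth: 2, Healthy: true},
		{Name: "pod-c", QueueDepth: 0, Healthy: false},
	}
	// Filter plugin: keep only healthy endpoints.
	healthy := func(in []Endpoint) []Endpoint {
		var out []Endpoint
		for _, ep := range in {
			if ep.Healthy {
				out = append(out, ep)
			}
		}
		return out
	}
	// Scorer plugin: prefer the shortest request queue.
	leastQueued := func(ep Endpoint) float64 {
		return 1.0 / float64(1+ep.QueueDepth)
	}
	fmt.Println(pickEndpoint(eps, []Filter{healthy}, []Scorer{leastQueued}))
}
```

With these toy plugins, `pod-c` is filtered out as unhealthy and `pod-b` wins on queue depth; real scorers would weigh signals like KV-cache locality or load.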
A compatible Gateway API implementation is used as the Gateway. The Gateway
API implementation must be built on Envoy and support ext-proc, as ext-proc is
currently the callback mechanism the EPP relies on to make routing decisions
for model-serving workloads.
We currently use the #sig-inference-scheduler channel in the llm-d Slack workspace for communications.
For large changes, please create an issue first describing the change so the
maintainers can assess it and work through the details with you. See
DEVELOPMENT.md for details on how to work with the codebase.
Note that, in general, features should go to the upstream Gateway API Inference
Extension (GIE) project first if applicable. The GIE is a major dependency of
ours and is where most general-purpose inference features live. If you have
something you feel is general purpose, it probably belongs in the GIE. If you
have something that's llm-d specific, then it belongs here. If you're not sure
whether your feature belongs here or in the GIE, feel free to create a
discussion or ask on Slack.