Welcome to Planet Gentoo,
an aggregation of Gentoo-related weblog articles written by Gentoo developers.
For a broader range of topics, you might be interested in Gentoo Universe.
December 26 2025
FOSDEM 2026
Once again it’s FOSDEM time! Join us at Université Libre de Bruxelles, Campus du Solbosch, in Brussels, Belgium. The upcoming FOSDEM 2026 will be held on January 31st and February 1st, 2026. If you visit FOSDEM, make sure to come by our Gentoo stand (exact location still to be announced) for the newest Gentoo news and Gentoo swag. Also, this year there will be a talk about the official Gentoo binary packages in the Distributions devroom. Visit our Gentoo wiki page on FOSDEM 2026 to see who’s coming and for more practical information.
November 30 2025
One jobserver to rule them all
A common problem with running Gentoo builds is concurrency. Many packages include extensive build steps that are either fully serial, or cannot fully utilize the available CPU threads throughout. This problem becomes less pronounced when building multiple packages in parallel, but then we risk overscheduling for packages that do take advantage of parallel builds.
Fortunately, there are a few tools at our disposal that can improve the situation. Most recently, they were joined by two experimental system-wide jobservers: guildmaster and steve. In this post, I’d like to provide some background on them and discuss the problems they are facing.
The job multiplication problem
You can use the MAKEOPTS variable to specify a number of parallel jobs to run:
MAKEOPTS="-j12"
This is used not only by GNU make, but it is also recognized by a plethora of eclasses and ebuilds, and converted into appropriate options for various builders, test runners and other tools that can benefit from concurrency. So far, that’s good news; whenever we can, we’re going to run 12 jobs and utilize all the CPU threads.
The problems start when we’re running multiple builds in parallel. This could be either due to running emerge --jobs, or simply needing to start another emerge process. The latter happens to me quite often, as I am testing multiple packages simultaneously.
For example, if we end up building four packages simultaneously, and all of them support -j, we may end up spawning 48 jobs. The issue isn’t just saturating the CPU; imagine you’re running 48 memory-hungry C++ compilers simultaneously!
Load-average scheduling to the rescue
One possible workaround is to use the --load-average option, e.g.:
MAKEOPTS="-j12 -l13"
This causes tools supporting the option not to start new jobs if the current load exceeds 13, which roughly approximates 13 processes running simultaneously. However, the option isn’t universally supported, and the exact behavior differs from tool to tool. For example, CTest doesn’t start any jobs when the load is exceeded, effectively stopping test execution, whereas GNU make and Ninja throttle themselves down to one job.
Of course, this is a rough approximation. While GNU make attempts to establish the current load from /proc/loadavg, most tools just use the one-minute average from getloadavg(), suffering from some lag. It is entirely possible to end up with interspersed periods of overscheduling while the load is still ramping up, followed by periods of underscheduling before it decreases again. Still, it is better than nothing, and can become especially useful for providing background load for other tasks: a build process that can utilize the idle CPU threads, and back down when other builds need them.
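For reference, this is roughly how the same limit is spelled when invoking a few of these tools directly; in a Gentoo build the flags are normally derived from MAKEOPTS by the eclasses, so the lines below are only an illustration:
make -j 12 -l 13
ninja -j 12 -l 13
ctest -j 12 --test-load 13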
The nested Makefile problem and GNU Make jobserver
Nested Makefiles are processed by calling make recursively, and therefore face a similar problem: if you run multiple make processes in parallel, and they run multiple jobs simultaneously, you end up overscheduling. To avoid this, GNU make introduces a jobserver. It ensures that the specified job number is respected across multiple make invocations.
At the time of writing, GNU make supports three variants of the jobserver protocol:
- The legacy Unix pipe-based protocol that relied on passing file descriptors to child processes.
- The modern Unix protocol using a named pipe.
- The Windows protocol using a shared semaphore.
All these variants follow roughly the same design principles, and are peer-to-peer protocols for using shared state rather than true servers in the network sense. The jobserver’s role is mostly limited to initializing the state and seeding it with an appropriate number of job tokens. Afterwards, clients are responsible for acquiring a token whenever they are about to start a job, and returning it once the job finishes. The availability of job tokens therefore limits the total number of processes started.
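To make that design more concrete, here is a minimal sketch of a client for the named-pipe variant of the protocol; the pipe path below is a made-up example, and a real client would parse it out of the --jobserver-auth option passed down in MAKEFLAGS:
# MAKEFLAGS advertises the jobserver to child processes, e.g.:
#   MAKEFLAGS=" -j12 --jobserver-auth=fifo:/tmp/jobserver-fifo"
fifo=/tmp/jobserver-fifo   # illustrative path only

run_job() {
    # acquire a token: read a single byte from the pipe (blocks until one is free)
    IFS= read -r -n 1 token < "${fifo}"
    "$@"
    local ret=$?
    # release the token by writing a byte back, even if the job failed
    printf '%s' "${token:-+}" > "${fifo}"
    return "${ret}"
}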
The flexibility of modern protocols permitted more tools to support them. Notably, the Ninja build system recently started supporting the protocol, therefore permitting proper parallelism in complex build systems combining Makefiles and Ninja. The jobserver protocol is also supported by Cargo and various Rust tools, GCC and LLVM, where it can be used to limit the number of parallel LTO jobs.
A system-wide jobserver
With a growing number of tools becoming capable of parallel processing, and at the same time gaining support for the GNU make jobserver protocol, a shared jobserver starts to look like an interesting solution to the overscheduling problem. If we could run one jobserver shared across all build processes, we could control the total number of jobs running simultaneously, and therefore have all the simultaneously running builds dynamically adjust to one another!
In fact, this is not a new idea. A bug requesting jobserver integration was filed against Portage back in 2019. The NixOS jobserver effort dates back at least to 2021, though it has not been merged yet. Guildmaster and steve joined the effort very recently.
There are two primary problems with using a system-wide jobserver: token release reliability, and the “implicit slot” problem.
The token release problem
The first problem is the more important one. As noted before, the jobserver protocol relies entirely on clients releasing the job tokens they acquired, and the documentation explicitly emphasizes that they must be returned even in error conditions. Unfortunately, this is not always possible: if the client gets killed, it cannot run any cleanup code, and therefore cannot return the tokens! For scoped jobservers like GNU make’s, this usually isn’t that much of a problem, since make normally terminates upon a child being killed. However, a system jobserver could easily be left with no job tokens in the queue this way!
This problem cannot really be solved within the strict bounds of the jobserver protocol. After all, it is just a named pipe, and there are limits to how much you can monitor what’s happening to the pipe buffer. Fortunately, there is a way around that: you can implement a proper server for the jobserver protocol using FUSE, and provide it in place of the named pipe. The good news is that most of the tools don’t actually check the file type, and those that do can easily be patched.
The current draft of the NixOS jobserver provides a regular file with special behavior via FUSE, whereas guildmaster and steve both provide a character device via the CUSE API. The NixOS jobserver and guildmaster both return unreleased tokens once the process closes the jobserver file, whereas steve returns them once the process that acquired them exits. This way, they can guarantee that a process that either can’t release its tokens (e.g. because it’s been killed), or one that doesn’t because of an implementation issue (e.g. Cargo), doesn’t end up effectively locking other builds. It also means we can provide live information on which processes are holding the tokens, or even implement additional features such as limiting token provision based on the system load, or setting per-process limits.
The implicit slot problem
The second problem is related to the implicit assumption that a jobserver is inherited from a parent GNU make process that already acquired a token to spawn the subprocess. Since the make subprocess doesn’t really do any work itself, it can “use” the token to spawn another job instead. Therefore, every GNU make process running under a jobserver has one implicit slot that runs jobs without consuming any tokens. If the jobserver is running externally and no job tokens were acquired while running the top make process, it ends up running an extra process without a job token: so steve -j12 permits 12 jobs, plus one extra job for every package being built.
Fortunately, the solution is rather simple: one needs to implement token acquisition at the Portage level. Portage acquires a new token prior to starting a build job, and releases it once the job finishes. In fact, this solves two problems: it accounts for the implicit slot in builders implementing the jobserver protocol, and it limits the total number of jobs run for parallel builds.
However, this is a double-edged sword. On one hand, it limits the risk of overscheduling when running parallel build jobs. On the other, it means that a new emerge job may not be able to start immediately, but instead wait for other jobs to free up job tokens first, negatively affecting interactivity.
A semi-related issue is that acquiring a single token doesn’t properly account for processes that are parallel themselves but do not implement the jobserver protocol, such as pytest-xdist runs. It may be possible to handle these better by acquiring multiple tokens prior to running them (or possibly while running them), but in the former case one needs to be careful to acquire them atomically, and not end up with the equivalent of lock contention: two processes acquiring part of the tokens they require, and waiting forever for more.
The implicit slot problem also causes issues in other clients. For example, nasm-rs writes an extra token to the jobserver pipe to avoid special-casing the implicit slot. However, this violates the protocol and breaks clients with per-process tokens. Steve carries a special workaround for that package.
Summary
A growing number of tools are capable of some degree of concurrency: from builders traditionally being able to start multiple parallel jobs, to multithreaded compilers. While they provide some degree of control over how many jobs to start, avoiding overscheduling while running multiple builds in parallel is non-trivial. Some builders can use the load average to partially mitigate the issue, but that’s far from a perfect solution.
Jobservers are our best bet right now. Originally designed to handle job scheduling for recursive GNU make invocations, they are being extended to control other parallel processes throughout the build, and can be further extended to control the job numbers across different builds, and even across different build containers.
While NixOS seems to have dropped the ball, Gentoo is now finally actively pursuing global jobserver support. Guildmaster and steve both prove that the server-side implementation is possible, and integration is just around the corner. At this point, it’s not clear whether jobserver-enabled systems are going to become the default in the future, but it certainly is an interesting experiment to carry out.
October 12 2025
How we incidentally uncovered a 7-year old bug in gentoo-ci
“Gentoo CI” is the service providing periodic linting for the Gentoo repository. It is a part of the Repository mirror and CI project that I started in 2015. Of course, it all started as a temporary third-party solution, but it persisted, was integrated into Gentoo Infrastructure and grew organically into quite a monstrosity.
It’s imperfect in many ways. In particular, it has only some degree of error recovery and when things go wrong beyond that, it requires a manual fix. Often the “fix” is to stop mirroring a problematic repository. Over time, I’ve started having serious doubts about the project, and proposed sunsetting most of it.
Lately, things have been getting worse. What started as a minor change in the behavior of Git triggered a whole cascade of failures, leading to me finally announcing the deadline for sunsetting the mirroring of third-party repositories, and starting to rip non-critical bits out of it. Interestingly enough, this whole process led me to finally discover the root cause of most of these failures — a bug that has existed since the very early version of the code, but happened to be hidden by the hacky error recovery code. Here’s the story of it.
Repository mirror and CI is basically a bunch of shell scripts with Python helpers run via a cronjob (repo-mirror-ci code). The scripts are responsible for syncing the whole lot of public Gentoo repositories, generating caches for them, publishing them onto our mirror repositories, and finally running pkgcheck on the Gentoo repository. Most of the “unexpected” error handling is set -e -x, with dumb logging to a file, and mailing on a cronjob failure. Some common errors are handled gracefully though — sync errors, pkgcheck failures and so on.
The whole cascade started when Git was upgraded on the server. The upgrade involved a change in behavior where git checkout -- ${branch} stopped working; you could only specify files after the --. The fix was trivial enough.
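For illustration, the change boiled down to something along these lines (a sketch, not the literal commit):
# old invocation, rejected by newer Git: after "--" only pathspecs are allowed
git checkout -- "${branch}"
# fixed invocation: the branch name goes before the separator, or the "--" is dropped
git checkout "${branch}"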
However, once the issue was fixed, I started periodically seeing sync failures from the Gentoo repository. The scripts had a very dumb way of handling sync failures: if syncing failed, they removed the local copy entirely and tried again. This generally made sense — say, if upstream renamed the main branch, git pull would fail but a fresh clone would be a cheap fix. However, the Gentoo repository is quite big, and when it got removed due to a sync failure, cloning it afresh from the Gentoo infrastructure failed.
So when it failed, I did a quick hack — I cloned the repository manually from GitHub, replaced the remote and put it in place. Problem solved. Except a while later, the same issue surfaced. This time I kept an additional local clone, so I wouldn’t have to fetch it from the server, and added it again. But then, it got removed once more, and this was really getting tedious.
What I assumed then was that the repository was failing to sync due to some temporary problems, either network- or infrastructure-related. If that were the case, it really made no sense to remove it and clone afresh. On top of that, since we are sunsetting support for third-party repositories anyway, there is no need for automatic recovery from issues such as branch name changes. So I removed that logic, to have sync fail immediately, without removing the local copy.
Now, this had important consequences. Previously, any failed sync would result in the repository being removed and cloned again, leaving no trace of the original error. On top of that, the logic stopping the script early when the Gentoo repository failed meant that the actual error wasn’t even saved, leaving me only with the subsequent clone failures.
When the sync failed again (and of course it did), I was able to actually investigate what was wrong. What actually happened is that the repository wasn’t on a branch — the checkout was detached at some commit. Initially, I assumed this was some fluke, perhaps also related to the Git upgrade. I switched manually to master, and that fixed it. Then it broke again. And again.
So far I had mostly been dealing with the failures asynchronously — I wasn’t around at the time of the initial failure, and only started working on it after a few failed runs. However, finally the issue resurfaced so fast that I was able to connect the dots: the breakage likely happened immediately after gentoo-ci hit an issue and bisected it! So I started suspecting that there was another issue in the scripts, perhaps another case of a missed --, but I couldn’t find anything relevant.
Finally, I started looking at the post-bisect code. What we were doing was calling git rev-parse HEAD prior to the bisect, and then using that result in git checkout. This obviously meant that after every bisect, we ended up with a detached HEAD, i.e. precisely the issue I was seeing.
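In shell terms, the problematic pattern and one possible shape of the fix look roughly like this; the variable names are made up and this is a sketch rather than the actual script:
# before: a bare commit hash is recorded, so the checkout afterwards detaches HEAD
prev=$(git rev-parse HEAD)
git bisect start "${bad_commit}" "${good_commit}"
# ... run the bisection ...
git checkout "${prev}"

# a possible fix: remember the branch name instead, so the final checkout re-attaches to it
prev=$(git symbolic-ref --short HEAD)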
So why didn’t I notice this before? Of course, because of the sync error handling. Once bisect broke the repository, the next sync failed and the repository got cloned again, and we never noticed anything was wrong. We only started noticing once cloning started failing. So after a few days of confusion and false leads, I finally fixed a bug that had been present in production code for over 7 years, and caused the Gentoo repository to be cloned over and over again whenever any bad commit happened.
July 26 2025
EPYTEST_PLUGINS and other goodies now in Gentoo
If you are following the gentoo-dev mailing list, you may have noticed that there’s been a fair number of patches sent for the Python eclasses recently. Most of them have been centered on pytest support. Long story short, I came up with what I believed to be a reasonably good design, and decided it’s time to stop manually repeating all the good practices in every ebuild separately.
In this post, I am going to briefly summarize all the recently added options. As always, they are all also documented in the Gentoo Python Guide.
The unceasing fight against plugin autoloading
The pytest test loader defaults to automatically loading all the plugins installed to the system. While this is usually quite convenient, especially when you’re testing in a virtual environment, it can get quite messy when you’re testing against system packages and end up with lots of different plugins installed. The results can range from slowing tests down to completely breaking the test suite.
Our initial attempts to contain the situation were based on maintaining a list of known-bad plugins and explicitly disabling their autoloading. The list of disabled plugins has gotten quite long by now. It includes both plugins that were known to frequently break tests, and those that frequently resulted in automagic dependencies.
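The mechanism itself is plain pytest: a plugin can be kept from autoloading with the -p no:<name> syntax, so the opt-out list effectively translated into a pile of switches along these lines (the plugin names here are just examples, not the actual historical list):
epytest -p no:flaky -p no:xvfb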
While the opt-out approach allowed us to resolve the worst issues, it only worked when we knew about a particular issue. So naturally we’d miss some rarer issue, and learn about it only when arch testing workflows were failing, or users reported problems. And of course, we would still be loading plenty of unnecessary plugins, at a cost to performance.
So, we started disabling autoloading entirely, using the PYTEST_DISABLE_PLUGIN_AUTOLOAD environment variable. At first we only used it when we needed to; over time, however, we started using it almost everywhere — after all, we don’t want the test suites to suddenly start failing because a new pytest plugin got installed.
For a long time, I have been hesitant to disable autoloading by default. My main concern was that it’s easy to miss a missing plugin. Say, if you ended up failing to load pytest-asyncio or a similar plugin, all the asynchronous tests would simply be skipped (verbosely, but it’s still easy to miss among the flood of warnings). However, eventually we started treating this warning as an error (and then pytest started doing the same upstream), and I have decided that going opt-in is worth the risk. After all, we were already disabling it all over the place anyway.
EPYTEST_PLUGINS
Disabling plugin autoloading is only the first part of the solution. Once you disable autoloading, you need to load the plugins explicitly — it’s no longer sufficient to add them as test dependencies, you also need to add a bunch of -p switches. And then, you need to keep both the dependencies and the pytest switches in sync. So you’d end up with bits like:
BDEPEND="
test? (
dev-python/flaky[${PYTHON_USEDEP}]
dev-python/pytest-asyncio[${PYTHON_USEDEP}]
dev-python/pytest-timeout[${PYTHON_USEDEP}]
)
"
distutils_enable_tests pytest
python_test() {
local -x PYTEST_DISABLE_PLUGIN_AUTOLOAD=1
epytest -p asyncio -p flaky -p timeout
}
Not very efficient, right? The idea then is to replace all that with a single EPYTEST_PLUGINS variable:
EPYTEST_PLUGINS=( flaky pytest-{asyncio,timeout} )
distutils_enable_tests pytest
And that’s it! EPYTEST_PLUGINS takes a bunch of Gentoo package names (without the category — almost all of them reside in dev-python/, and we can special-case the few that do not), distutils_enable_tests adds the dependencies, and epytest (in the default python_test() implementation) disables autoloading and passes the necessary flags.
Now, what’s really cool is that the function will automatically determine the correct argument values! This can be especially important if entry point names change between package versions — and upstreams generally don’t consider this an issue, since autoloading isn’t affected.
Going towards no autoloading by default
Okay, that gives us a nice way of specifying which plugins to load. However, weren’t we talking of disabling autoloading by default?
Well, yes — and the intent is that it’s going to be disabled by default in EAPI 9. However, until then there’s a simple solution we encourage everyone to use: set an empty EPYTEST_PLUGINS. So:
EPYTEST_PLUGINS=() distutils_enable_tests pytest
…and that’s it. When it’s set to an empty list, autoloading is disabled. When it’s unset, it is enabled for backwards compatibility. And the next pkgcheck release is going to suggest it:
dev-python/a2wsgi EPyTestPluginsSuggestion: version 1.10.10: EPYTEST_PLUGINS can be used to control pytest plugins loaded
EPYTEST_PLUGIN* to deal with special cases
While the basic feature is neat, it is not a silver bullet. The approach used is insufficient for some packages, most notably pytest plugins that run pytest subprocesses without the appropriate -p options, and expect plugins to be autoloaded there. However, after some more fiddling we arrived at three helpful features:
- EPYTEST_PLUGIN_LOAD_VIA_ENV that switches explicit plugin loading from -p arguments to PYTEST_PLUGINS environment variable. This greatly increases the chance that subprocesses will load the specified plugins as well, though it is more likely to cause issues such as plugins being loaded twice (and therefore is not the default). And as a nicety, the eclass takes care of finding out the correct values, again.
- EPYTEST_PLUGIN_AUTOLOAD to reenable autoloading, effectively making EPYTEST_PLUGINS responsible only for adding dependencies. It’s really intended to be used as a last resort, and mostly for future EAPIs when autoloading will be disabled by default.
- Additionally, EPYTEST_PLUGINS can accept the name of the package itself (i.e. ${PN}) — in which case it will not add a dependency, but load the just-built plugin.
How useful is that? Compare:
BDEPEND="
test? (
dev-python/pytest-datadir[${PYTHON_USEDEP}]
)
"
distutils_enable_tests pytest
python_test() {
local -x PYTEST_DISABLE_PLUGIN_AUTOLOAD=1
local -x PYTEST_PLUGINS=pytest_datadir.plugin,pytest_regressions.plugin
epytest
}
…and:
EPYTEST_PLUGINS=( "${PN}" pytest-datadir )
EPYTEST_PLUGIN_LOAD_VIA_ENV=1
distutils_enable_tests pytest
Old and new bits: common plugins
The eclass already had some bits related to enabling common plugins. Given that EPYTEST_PLUGINS only takes care of loading plugins, but not passing specific arguments to them, they are still meaningful. Furthermore, we’ve added EPYTEST_RERUNS.
The current list is:
- EPYTEST_RERUNS=... that takes a number of reruns and uses pytest-rerunfailures to retry failing tests the specified number of times.
- EPYTEST_TIMEOUT=... that takes a number of seconds and uses pytest-timeout to force a timeout if a single test does not complete within the specified time.
- EPYTEST_XDIST=1 that enables parallel testing using pytest-xdist, if the user allows multiple test jobs. The number of test jobs can be controlled (by the user) by setting EPYTEST_JOBS with a fallback to inferring from MAKEOPTS (setting to 1 disables the plugin entirely).
The variables automatically add the needed plugin, so they do not need to be repeated in EPYTEST_PLUGINS.
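Put together, a hypothetical test-heavy ebuild can now express all of this declaratively; the particular plugin names and values below are examples only:
EPYTEST_PLUGINS=( pytest-{asyncio,datadir} )
EPYTEST_TIMEOUT=600
EPYTEST_XDIST=1
distutils_enable_tests pytest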
JUnit XML output and gpy-junit2deselect
As an extra treat, we ask pytest to generate JUnit-style XML output for each test run, which can be used for machine processing of test results. gpyutils now supplies a gpy-junit2deselect tool that can parse this XML and output a handy EPYTEST_DESELECT for the failing tests:
$ gpy-junit2deselect /tmp/portage/dev-python/aiohttp-3.12.14/temp/pytest-xml/python3.13-QFr.xml
EPYTEST_DESELECT=(
    tests/test_connector.py::test_tcp_connector_ssl_shutdown_timeout_nonzero_passed
    tests/test_connector.py::test_tcp_connector_ssl_shutdown_timeout_passed_to_create_connection
    tests/test_connector.py::test_tcp_connector_ssl_shutdown_timeout_zero_not_passed
)
While it doesn’t replace due diligence, it can help you update long lists of deselects. As a bonus, it automatically collapses deselects to test functions, classes and files when all matching tests fail.
hypothesis-gentoo to deal with health check nightmare
Hypothesis is a popular Python fuzz testing library. Unfortunately, it has one feature that, while useful upstream, is pretty annoying to downstream testers: health checks.
The idea behind health checks is to make sure that fuzz testing remains efficient. For example, Hypothesis is going to fail if the routine used to generate examples is too slow. And as you can guess, “too slow” is more likely to happen on a busy Gentoo system than on dedicated upstream CI. Not to mention some upstreams plain ignore health check failures if they happen rarely.
Given how often this broke for us, we have requested an option to disable Hypothesis health checks long ago. Unfortunately, upstream’s answer can be summarized as: “it’s up to packages using Hypothesis to provide such an option, and you should not be running fuzz testing downstream anyway”. Easy to say.
Well, obviously we are not going to chase every single package using Hypothesis into adding a profile with health checks disabled. We did report health check failures sometimes, and sometimes got no response at all. And skipping these tests is not really an option, given that often there are no other tests for a given function, and even if there are — it’s just going to be a maintenance nightmare.
I finally figured out that we can create a Hypothesis plugin — now hypothesis-gentoo — that provides a dedicated “gentoo” profile with all health checks disabled, and then we can simply use this profile in epytest. And how do we know that Hypothesis is used? Of course we look at EPYTEST_PLUGINS! All pieces fall into place. It’s not 100% foolproof, but health check problems aren’t that common either.
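From the ebuild side, nothing beyond the usual declaration should be needed; a hypothetical package whose test suite uses Hypothesis would simply list it, and the eclass takes care of selecting the profile:
EPYTEST_PLUGINS=( hypothesis )
distutils_enable_tests pytest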
Summary
I have to say that I really like what we achieved here. Over the years, we learned a lot about pytest, and used that knowledge to improve testing in Gentoo. And after repeating the same patterns for years, we have finally replaced them with eclass functions that can largely work out of the box. This is a major step forward.
April 30 2025
Urgent - OSU Open Source Lab needs your help
Oregon State University’s Open Source Lab (OSL) has been a major supporter of Gentoo Linux and many other software projects for years. It is currently hosting several of our infrastructure servers as well as development machines for exotic architectures, and is critical for Gentoo operation.
Due to drops in sponsor contributions, OSL has been operating at a loss for a while, with the OSU College of Engineering picking up the rest of the bill. Now that university funding has been cut, this is no longer possible, and unless US$250,000 can be provided within the next two weeks, OSL will have to shut down. The details can be found in a blog post by Lance Albertson, the director of OSL.
Please, if you value and use Gentoo Linux or any of the other projects that OSL has been supporting, and you (or the company you work for) are in a position to make funds available, contact the address in the blog post. Obviously, long-term corporate sponsorships would serve best here; for what it’s worth, OSL developers have ended up at almost every big US tech corporation by now. Right now, though, probably everything helps.
February 20 2025
Bootable Gentoo QCOW2 disk images - ready for the cloud!
We are very happy to announce new official downloads on our website and our mirrors: Gentoo for amd64 (x86-64) and arm64 (aarch64), as immediately bootable disk images in qemu’s QCOW2 format! The images, updated weekly, include an EFI boot partition and a fully functional Gentoo installation; either with no network activated but a password-less root login on the console (“no root pw”), or with network activated, all accounts initially locked, but cloud-init running on boot (“cloud-init”). Enjoy, and read on for more!
Questions and answers
How can I quickly test the images?
We recommend using the “no root password” images and qemu system emulation. Both amd64 and arm64 images have all the necessary drivers ready for that. Boot them up, use as login name “root”, and you will immediately get a fully functional Gentoo shell. The set of installed packages is similar to that of an administration or rescue system, with a focus more on network environment and less on exotic hardware. Of course you can emerge whatever you need though, and binary package sources are already configured too.
What settings do I need for qemu?
You need qemu with the target architecture (aarch64 or x86_64) enabled in QEMU_SOFTMMU_TARGETS, and the UEFI firmware.
app-emulation/qemu sys-firmware/edk2-bin
You should disable the useflag “pin-upstream-blobs” on qemu and update edk2-bin at least to the 2024 version. Also, since you probably want to use KVM hardware acceleration for the virtualization, make sure that your kernel supports that and that your current user is in the kvm group.
For testing the amd64 (x86-64) images, a command line could look like this, configuring 8G RAM and 4 CPU threads with KVM acceleration:
qemu-system-x86_64 \
-m 8G -smp 4 -cpu host -accel kvm -vga virtio -smbios type=0,uefi=on \
-drive if=pflash,unit=0,readonly=on,file=/usr/share/edk2/OvmfX64/OVMF_CODE_4M.qcow2,format=qcow2 \
-drive file=di-amd64-console.qcow2 &
For testing the arm64 (aarch64) images, a command line could look like this:
qemu-system-aarch64 \
-machine virt -cpu neoverse-v1 -m 8G -smp 4 -device virtio-gpu-pci -device usb-ehci -device usb-kbd \
-drive if=pflash,unit=0,readonly=on,file=/usr/share/edk2/ArmVirtQemu-AARCH64/QEMU_EFI.qcow2 \
-drive file=di-arm64-console.qcow2 &
Please consult the qemu documentation for more details.
Can I install the images onto a real harddisk / SSD?Sure. Gentoo can do anything. The limitations are:
- you need a disk with sector size 512 bytes (otherwise the partition table of the image file will not work), see the “SSZ” value in the following example:
pinacolada ~ # blockdev --report /dev/sdb RO RA SSZ BSZ StartSec Size Device rw 256 512 4096 0 4000787030016 /dev/sdb
- your machine must be able to boot via UEFI (no legacy boot)
- you may have to adapt the configuration yourself to disks, hardware, …
So, this is an expert workflow.
Assuming your disk is /dev/sdb and has a size of at least 20GByte, you can then use the utility qemu-img to decompress the image onto the raw device. Warning, this obviously overwrites the first 20Gbyte of /dev/sdb (and with that the existing boot sector and partition table):
qemu-img convert -O raw di-amd64-console.qcow2 /dev/sdb
Afterwards, you can and should extend the new root partition with xfs_growfs, create an additional swap partition behind it, possibly adapt /etc/fstab and the grub configuration, …
If you are familiar with partitioning and handling disk images you can for sure imagine more workflow variants; you might find also the qemu-nbd tool interesting.
So what are the cloud-init images good for?Well, for the cloud. Or more precisely, for any environment where a configuration data source for cloud-init is available. If this is already provided for you, the image should work out of the box. If not, well, you can provide the configuration data manually, but be warned that this is a non-trivial task.
Are you planning to support further architectures?Eventually yes, in particular (EFI) riscv64 and loongarch64.
Are you planning to support legacy boot?No, since the placement of the bootloader outside the file system complicates things.
How about disks with 4096 byte sectors?Well… let’s see how much demand this feature finds. If enough people are interested, we should be able to generate an alternative image with a corresponding partition table.
Why XFS as file system?It has some features that ext4 is sorely missing (reflinks and copy-on-write), but at the same time is rock-solid and reliable.
We are very happy to announce new official
downloads on our website and our mirrors: Gentoo for amd64 (x86-64) and arm64 (aarch64),
as immediately bootable disk images in qemu’s QCOW2 format! The images, updated weekly,
include an EFI boot partition and a fully functional Gentoo installation; either with no
network activated but a password-less root login on the console (“no root pw”), or with
network activated, all accounts initially locked, but
cloud-init running on boot
(“cloud-init”). Enjoy, and
read on for more!
Questions and answers
How can I quickly test the images?
We recommend using the “no root password” images and qemu system emulation. Both amd64 and arm64 images have all the necessary drivers ready for that. Boot them up, log in as “root”, and you will immediately get a fully functional Gentoo shell. The set of installed packages is similar to that of an administration or rescue system, with more focus on the network environment and less on exotic hardware. Of course you can emerge whatever you need though, and binary package sources are already configured too.
What settings do I need for qemu?
You need qemu with the target architecture (aarch64 or x86_64) enabled in QEMU_SOFTMMU_TARGETS, and the UEFI firmware.
app-emulation/qemu sys-firmware/edk2-bin
You should disable the USE flag “pin-upstream-blobs” on qemu and update edk2-bin to at least the 2024 version. Also, since you probably want to use KVM hardware acceleration for the virtualization, make sure that your kernel supports that and that your current user is in the kvm group.
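For reference, a minimal sketch of the corresponding Portage configuration could look like this; adjust the target list to the guests you actually want to run:
# /etc/portage/make.conf (sketch)
QEMU_SOFTMMU_TARGETS="x86_64 aarch64"
# /etc/portage/package.use/qemu (sketch)
app-emulation/qemu -pin-upstream-blobs
# then (re)install the packages
emerge --ask app-emulation/qemu sys-firmware/edk2-bin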
For testing the amd64 (x86-64) images, a command line could look like this, configuring 8G RAM and 4 CPU threads with KVM acceleration:
qemu-system-x86_64 \
-m 8G -smp 4 -cpu host -accel kvm -vga virtio -smbios type=0,uefi=on \
-drive if=pflash,unit=0,readonly=on,file=/usr/share/edk2/OvmfX64/OVMF_CODE_4M.qcow2,format=qcow2 \
-drive file=di-amd64-console.qcow2 &
For testing the arm64 (aarch64) images, a command line could look like this:
qemu-system-aarch64 \
-machine virt -cpu neoverse-v1 -m 8G -smp 4 -device virtio-gpu-pci -device usb-ehci -device usb-kbd \
-drive if=pflash,unit=0,readonly=on,file=/usr/share/edk2/ArmVirtQemu-AARCH64/QEMU_EFI.qcow2 \
-drive file=di-arm64-console.qcow2 &
Please consult the qemu documentation for more details.
Can I install the images onto a real hard disk / SSD?
Sure. Gentoo can do anything. The limitations are:
- you need a disk with sector size 512 bytes (otherwise the partition table of the image file will not work), see the “SSZ” value in the following example:
pinacolada ~ # blockdev --report /dev/sdb
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw    256  512   4096         0   4000787030016   /dev/sdb
- your machine must be able to boot via UEFI (no legacy boot)
- you may have to adapt the configuration yourself to disks, hardware, …
So, this is an expert workflow.
Assuming your disk is /dev/sdb and has a size of at least 20 GByte, you can then use the utility qemu-img to decompress the image onto the raw device. Warning: this obviously overwrites the first 20 GByte of /dev/sdb (and with that the existing boot sector and partition table):
qemu-img convert -O raw di-amd64-console.qcow2 /dev/sdb
Afterwards, you can and should extend the new root partition with xfs_growfs, create an additional swap partition behind it, possibly adapt /etc/fstab and the grub configuration, …
If you are familiar with partitioning and handling disk images, you can surely imagine more workflow variants; you might also find the qemu-nbd tool interesting.
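As a rough sketch of both ideas (partition numbers are illustrative; check with parted or blockdev first):
# after writing the image, grow the root partition and filesystem
parted /dev/sdb resizepart 2 100%   # or leave space at the end for a swap partition
mount /dev/sdb2 /mnt
xfs_growfs /mnt                     # XFS can be grown while mounted
umount /mnt
# or inspect the QCOW2 image directly, without writing it to a disk
modprobe nbd max_part=16
qemu-nbd --connect=/dev/nbd0 di-amd64-console.qcow2
# partitions now appear as /dev/nbd0p1, /dev/nbd0p2, ...
qemu-nbd --disconnect /dev/nbd0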
So what are the cloud-init images good for?
Well, for the cloud. Or more precisely, for any environment where a configuration data source for cloud-init is available. If this is already provided for you, the image should work out of the box. If not, well, you can provide the configuration data manually, but be warned that this is a non-trivial task.
Are you planning to support further architectures?
Eventually yes, in particular (EFI) riscv64 and loongarch64.
Are you planning to support legacy boot?
No, since the placement of the bootloader outside the file system complicates things.
How about disks with 4096 byte sectors?
Well… let’s see how much demand this feature finds. If enough people are interested, we should be able to generate an alternative image with a corresponding partition table.
Why XFS as file system?
It has some features that ext4 is sorely missing (reflinks and copy-on-write), but at the same time is rock-solid and reliable.
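For instance, reflinks let you clone even large files instantly, with blocks shared until one of the copies is modified; a quick way to see this on the installed system, assuming the image’s XFS was created with reflink support:
cp --reflink=always bigfile bigfile.clone   # returns immediately, no data blocks copied yet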
February 01 2025
Tinderbox shutdown
Due to the lack of hardware, the Tinderbox (and CI) service is no longer operational.
I would like to take this opportunity to thank all the people who have always seen the Tinderbox as a valuable resource and who have promptly addressed bugs, significantly improving the quality of the packages we have in Portage as well as the user experience.
January 05 2025
2024 in retrospect & happy new year 2025!
Happy New Year 2025! Once again, a lot has happened over the past months, in Gentoo and otherwise.
Our fireworks were a bit early this year with the stabilization of GCC 14 in November, after a huge
amount of preparations and bug fixing via the Modern C initiative. A lot of other programming language
ecosystems also saw significant improvements. As always here
we’re going to revisit all the exciting news from our favourite Linux distribution.
Gentoo in numbers
The number of commits to the main ::gentoo repository has remained at an overall high level in 2024, with a 2.4% increase from 121000 to 123942. The number of commits by external contributors has grown strongly from 10708 to 12812, now across 421 unique external authors.
The importance of GURU, our user-curated repository with a trusted user model, as entry point for potential developers, is clearly increasing as well. We have had 7517 commits in 2024, a strong growth from 5045 in 2023. The number of contributors to GURU has increased a lot as well, from 158 in 2023 to 241 in 2024. Please join us there and help packaging the latest and greatest software. That’s the ideal preparation for becoming a Gentoo developer!
Activity has picked up speed on the Gentoo bugtracker bugs.gentoo.org, where we’ve had 26123 bug reports created in 2024, compared to 24795 in 2023. The number of resolved bugs shows the same trend, with 25946 in 2024 compared to 22779 in 2023!
New developers
In 2024 we have gained two new Gentoo developers. They are in chronological order:
- Matt Jolly (kangie): Matt joined us already in February from Brisbane, Australia - now finally pushing his commits himself, after already taking care of, e.g., Chromium for over half a year. In his work life a High Performance Computing systems administrator, in his free time he enjoys playing with his animals, restoring retro computing equipment and gaming consoles (or using them), brewing beer, the beach, or the local climbing gym.
- Eli Schwartz (eschwartz): In July, we were able to welcome Eli Schwartz from the USA as a new Gentoo developer. A bookworm and big fan of Python, and also an upstream maintainer of the Meson Build System, Eli caught the Linux bug already in high school. Quoting him, “asking around for recommendations on distro I was recommended either Arch or Gentoo. Originally I made a mistake ;)” … We’re glad this got fixed now!
Featured changes and news
Let’s now look at the major improvements and news of 2024 in Gentoo.
Distribution-wide Initiatives
- SPI associated project: As of March 2024, Gentoo Linux has become an Associated Project of Software in the Public Interest (SPI). SPI is a non-profit corporation founded to act as a fiscal sponsor for organizations that develop open source software and hardware. It provides services such as accepting donations, holding funds and assets, … and qualifies for 501(c)(3) (U.S. non-profit organization) status. This means that all donations made to SPI and its supported projects are tax deductible for donors in the United States. The intent behind becoming an SPI associated project is to gradually wind down operations of the Gentoo Foundation and transfer its assets to SPI.
- GCC 14 stabilization: After a huge amount of work to identify and fix bugs and working with upstreams to modernize the overall source code base, see also the Modern C porting initiative, GCC 14 was finally stabilized in November 2024. Same as Clang 16, GCC 14 by default drops support for several long-deprecated and obsolete language constructs, turning decades-long warnings on bad code into fatal errors.
- Link time optimization (LTO): Lots of progress has been made supporting LTO all across the Gentoo repository.
- 64bit time_t for 32bit architectures: Various preparations have begun to keep our 32-bit arches going beyond the year 2038. While the GNU C library is ready for that, the switch to a wider time_t data type is an ABI break between userland programs and libraries and needs to be approached carefully, in particular for us as a source-based distribution. Experimental profiles as well as a migration tool are available by now, and will be announced more widely at some point in 2025.
- New 23.0 profiles: A new profile version 23.0, i.e. a collection of presets and configurations, has become the default setting; the old profiles are deprecated and will be removed in June 2025. The 23.0 profiles fix a lot of internal inconsistencies; for the user, they bring more toolchain hardening (specifically, CET on amd64 and non-lazy runtime binding) and optimization (e.g., packed relative relocations where supported) by default.
- Expanded binary package coverage: The binary package coverage for amd64 has been expanded a lot, with, e.g., different USE-flag combinations, Python support up to version 3.13, and additional large leaf packages beyond stable such as current GCC snapshots, all for baseline x86-64 and for x86-64-v3. At the moment, the mirrors hold over 60 GByte of package data for amd64 alone.
- Two additional merchandise stores: We have licensed two additional official merchandise stores, both based in Europe: FreeWear (clothing, mugs, stickers; located in Spain) and BadgeShop (Etsy, Ebay; badges, stickers; located in Romania).
- Handbook improvements and editor role: The Gentoo handbook has once again been significantly improved (though there is always still more work to be done). We now have special Gentoo handbook editor roles assigned, which effectively makes handbook editing much more community friendly. This way, a lot of longstanding issues have been fixed, making installing Gentoo easier for everyone.
- Event presence: At the Free and Open Source Software Conference (FrOSCon) 2024, visitors enjoyed a full weekend of hands-on Gentoo workshops. The workshops covered a wide range of topics, from first installation to ebuild maintenance. We also offered mugs, stickers, t-shirts, and of course the famous self-compiled buttons.
- Online workshops: Our German support association, Gentoo e.V., is grateful to the inspiring speakers of the 6 online workshops in 2024 on various Gentoo topics in German and English. We are looking forward to more exciting events in 2025.
- Ban on NLP AI tools: Due to serious concerns with current AI and LLM systems, the Gentoo Council has decided to embrace the value of human contributions and adopt the following motion: “It is expressly forbidden to contribute to Gentoo any content that has been created with the assistance of Natural Language Processing artificial intelligence tools. This motion can be revisited, should a case been made over such a tool that does not pose copyright, ethical and quality concerns.”
Architectures
- MIPS and Alpha fully supported again: After the big drive to improve Alpha support last year, now we’ve taken care of MIPS keywording all across the Gentoo repository. Thanks to renewed volunteer interest, both arches have returned to the forefront of Gentoo Linux development, with a consistent dependency tree checked and enforced by our continuous integration system. Up-to-date stage builds and the accompanying binary packages are available for both, in the case of MIPS for all three ABI variants o32, n32, and n64 and for both big and little endian, and in the case of Alpha also with a bootable installation CD.
- 32bit RISC-V now available: Installation stages for 32bit RISC-V systems (rv32) are now available for download, both using hard-float and soft-float ABI, and both using glibc and musl.
- End of IA-64 (Itanium) support: Following the removal of IA-64 (Itanium) support in the Linux kernel and in glibc, we have dropped all ia64 profiles and keywords.
Packages
- Slotted Rust: The Rust compiler is now slotted, allowing multiple versions to be installed in parallel. This allows us to finally support packages that have a maximum bounded Rust dependency and don’t compile successfully with a newer Rust (yes, that exists!), or ensure that packages use Rust and LLVM versions that fit together (e.g., firefox or chromium).
- Reworked LLVM handling: In conjunction with this, the LLVM ebuilds and eclasses have been reworked so packages can specify which LLVM versions they support and dependencies are generated accordingly. The eclasses now provide much cleaner LLVM installation information to the build systems of packages, and therefore, e.g., also fix support for cross-compilation.
- Python: In the meantime the default Python version in Gentoo has reached Python 3.12. Additionally, Python 3.13 is also available as stable - again we’re fully up to date with upstream.
- Zig rework and slotting: An updated eclass and ebuild framework for the Zig programming language has been committed that hooks into the ZBS or Zig Build System, allows slotting of Zig versions, allows Zig libraries to be depended on, and even provides some experimental cross-compilation support.
- Ada support: We finally have Ada support for just about every architecture. Yay!
- Slotted Guile: Last but not least, Guile also received the slotting treatment, with three new eclasses, such that Guile 1, 2, and 3 and their reverse dependencies can now coexist in a Gentoo installation.
- TeX Live 2023 and 2024: Catching up with our backlog, the packaging of TeX Live has been refreshed; TeX Live 2023 is now marked stable and TeX Live 2024 is marked testing.
- DTrace 2.0: The famous tracing tool DTrace has come to Gentoo! All required kernel options are already enabled in the newest stable Gentoo distribution kernel; if you are compiling manually, the DTrace ebuild will inform you about required configuration changes. Internally, DTrace 2.0 for Linux builds on the BPF engine of the Linux kernel, so the build installs a gcc that outputs BPF code (which, btw, also is very useful for systemd).
- KDE Plasma 6 upgrade: Stable Gentoo Linux has upgraded to the new major version of the KDE community desktop environment, KDE Plasma 6. As of end of 2024, in Gentoo stable we have KDE Gear 24.08.3, KDE Frameworks 6.7.0, and KDE Plasma 6.2.4. As always, Gentoo testing follows the newest upstream releases (and using the KDE overlay you can even install from git sources). In the course of KDE package maintenance we have over the past months and years contributed over 240 upstream backports to KDE’s Qt5PatchCollection.
- Microgram Ramdisk: We have added µgRD (or ugrd) as a lightweight initramfs generator alternative to dracut. As a side effect of this, our installkernel mechanism has gained support for arbitrary initramfs generators.
Physical and Software Infrastructure
- Mailing list archives: archives.gentoo.org, our mailing list archive, is back, now with a backend based on public-inbox. Many thanks to upstream there for being very helpful; we were even able to keep all historical links to archived list e-mails working.
- Ampere Altra Max development server: Arm Ltd. and specifically its Works on Arm team has sent us a fast Ampere Altra Max server to support Gentoo development. With 96 Armv8.2+ 64bit cores, 256 GByte of RAM, and 4 TByte NVMe storage, it is now hosted together with some of our other hardware at OSU Open Source Lab.
Finances of the Gentoo Foundation
- Income: The Gentoo Foundation took in approximately $20,800 in fiscal year 2024; the dominant part (over 80%) consists of individual cash donations from the community.
- Expenses: Our expenses in 2024, split into the usual three categories, were: operating expenses (for services, fees, …) $7,900, only minor capital expenses (for bought assets), and depreciation expenses (value loss of existing assets) $13,300.
- Balance: We have about $105,000 in the bank as of July 1, 2024 (which is when our fiscal year 2024 ends for accounting purposes). The draft financial report for 2024 is available on the Gentoo Wiki.
- Transition to SPI: With the move of our accounts to SPI (see above), the web pages for individual cash donations now direct the funds to SPI earmarked for Gentoo, both for one-time and recurrent donations. Donors of ongoing recurrent donations will be contacted and asked to re-arrange over the upcoming months.
Thank you!
As every year, we would like to thank all Gentoo developers and all who have submitted contributions for their relentless everyday Gentoo work. If you are interested and would like to help, please join us to make Gentoo even better! As a volunteer project, Gentoo could not exist without its community.
December 29 2024
FOSDEM 2025
♦
It’s FOSDEM time again! Join us at Université Libre de Bruxelles, Campus du Solbosch, in Brussels, Belgium. The upcoming FOSDEM 2025 will be held on February 1st and 2nd 2025. Our developers will be happy to greet all open source enthusiasts at our Gentoo stand (exact location still to be announced), which we will share this year with the Gentoo-based Flatcar Container Linux. Of course there’s also the chance to celebrate 25 years of compiling! Visit this year’s wiki page to see who’s coming and for more practical information.
December 20 2024
Poetry(-core), or the ultimate footgun
I’ve been complaining about the Poetry project a lot, in particular about its use (or more precisely, the use of poetry-core) as a build system. In fact, it pretty much became a synonym of a footgun for me — and whenever I’m about to package some project using poetry-core, or switching to it, I’ve learned to expect some predictable mistake. I suppose the time has come to note all these pitfalls in a single blog post.
The nightmarish caret operator
One of the first things Poetry teaches us is to pin dependencies, SemVer-style. Well, I’m not complaining. I suppose it’s a reasonable compromise between pinning exact versions (which just asks for dependency conflicts between different packages), and leaving users at the mercy of breaking changes in dependencies. The problem is, Poetry teaches us to treat these pins in a wholesale, one-size-fits-all manner.
What I’m talking about is the (in)famous caret operator. I mean, I suppose it’s quite convenient for the general case of semantic versioning, where e.g. ^1.2.3 is a handy shorthand for >=1.2.3,<2.0.0, and works quite well for the not-exactly-SemVer case of ^0.2.3 for >=0.2.3,<0.3.0. However, the way it is presented as a panacea means that most of the time people use it for all their dependencies, whether it is meaningful there or not.
So some pins are correct, some are too strict and others are too lax. In the end, you get the worst of both worlds: you annoy distro packagers like us who have to keep relaxing your dependencies, and you don’t help users who still get incidental breakage. Some people even use the caret operator for packages that clearly don’t fit it at all. My favorite example is the equivalent of the following dependency:
tzdata = "^2023.3"
This actually suffers from two problems. Firstly, this package clearly uses CalVer rather than SemVer, so pinning to 2023 seems fishy. Secondly, since we are talking about timezone data, there is really no point in pinning at all — on the contrary, you always want to use up-to-date timezone data.
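To illustrate, a hypothetical dependency block (package names are placeholders) that spells the constraints out explicitly, so each pin reflects an actual known incompatibility rather than a reflex:
[tool.poetry.dependencies]
# caret shorthand only where a SemVer major bump really is the risk
requests = ">=2.28,<3"
# CalVer data packages should not be capped at all
tzdata = ">=2023.3"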
The misleading include key
When people want to control which files are included in the source distributions, they resort to the include and exclude keys. And they add “obvious” blocks like the following:
include = [
"CHANGELOG",
"README.md",
"LICENSE",
]
Except that this is entirely wrong! A plain entry in the include key is included both in the source and in the binary distribution. Or, to put it more clearly, this code causes the following files to be installed:
/usr/lib/python3.12/site-packages/CHANGELOG
/usr/lib/python3.12/site-packages/LICENSE
/usr/lib/python3.12/site-packages/README.md
What you need to do instead is to annotate every file with the desired format, i.e.:
include = [
{ path = "CHANGELOG", format = "sdist" },
{ path = "README.md", format = "sdist" },
{ path = "LICENSE", format = "sdist" },
]
Yes, this is absolutely confusing and counterintuitive. On top of that, even today the first example in the linked documentation is clearly wrong. And people keep repeating this mistake over and over again — I know because I keep sending pull requests fixing them, and there is no end to them! In fact, I’ve even seen people adding additional entries without the format just below entries that did have it!
Schrödinger’s optional dependency
Poetry has a custom way of declaring optional dependencies. You declare them just like a regular dependency, and add an optional key to it, e.g.:
[tool.poetry.dependencies]
python = "^3.7"
filetype = "^1.0.7"
deprecation = "^2.1.0"
# yaml-plugin extra
"ruamel.yaml" = {version = "^0.16.12", optional = true}
Well, so that last dependency is optional, right? Well, not necessarily! It is not, unless you actually add it to some dependency group, such as:
[tool.poetry.extras]
yaml-plugin = ["ruamel.yaml"]
And again, this weird behavior leads to real problems. If you declare a dependency as optional, but forget to add it to some group, Poetry will just silently treat it as a required dependency. And this is really easy to miss, unless you actually look at the generated wheel metadata. A bug about confusing handling of optional dependencies has been filed back in 2020.
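One rough way to catch this, sketched here with the generic PyPA tooling (assuming the build package is installed), is to build a wheel and look at the generated metadata directly:
python -m build --wheel
unzip -p dist/*.whl '*/METADATA' | grep '^Requires-Dist'
# a correctly optional dependency carries a marker such as: extra == "yaml-plugin"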
Summary
These are a handful of common issues I’ve repeatedly seen happening when people try to use poetry-core as a build system. Sure, other PEP 517 backends aren’t perfect and have their own issues. For one, setuptools pretty much consists of tons of legacy, buggy code and deprecated bits everyone uses anyway, and is barely kept alive these days. People also fall into pitfalls there.
However, I have never seen any other Python or non-Python build system that would be as counterintuitive and mistake-prone as Poetry is. On top of that, implementing PEP 621 (the standard for pyproject.toml pretty much every other PEP 517 backend follows) took 3 years — and even today, Poetry still defaults to their own, nonstandard configuration format.
Whenever I criticize Poetry, people ask me about the alternatives. For completeness, let me repeat my PEP 517 backend recommendations here:
For pure Python packages: use either flit-core (lightweight, simple, no dependencies), or hatchling (popular and quite powerful, and we have to deal with its disadvantages anyway). For Python packages with C extensions, meson-python combines the power and correctness of Meson with good Python integration. For Python packages with Rust extensions, Maturin is the way to go.
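As an example of how little is needed, a minimal flit-core based pyproject.toml is roughly the following (names and versions are placeholders):
[build-system]
requires = ["flit_core >=3.2,<4"]
build-backend = "flit_core.buildapi"

[project]
name = "mypackage"
version = "1.0.0"
description = "An example package"
dependencies = ["requests >=2.28"]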
November 10 2024
The peculiar world of Gentoo package testing
While discussing uv tests with Fedora developers, it occurred to me how different your average Gentoo testing environment is — not only from those used upstream, but also from those used by other Linux distributions. This article will be dedicated exactly to that: pointing out how it’s different, what that implies, and why I think it’s not a bad thing.
Gentoo as a source-first distro
The first important thing about Gentoo is that it is a source-first distribution. The best way to explain this is to compare it with your average “binary” distribution.
In a “binary” distribution, source and binary packages are somewhat isolated from one another. Developers work with source packages (recipes, specs) and use them to build binary packages — either directly, or via an automation. Then the binary packages hit repositories. The end users usually do not interface with sources at all — may well not even be aware that such a thing exists.
In Gentoo, on the other hand, source packages are first-class citizens. All users use source repositories, and can optionally use local or remote binary package repositories. I think the best way of thinking about binary packages is: as a form of “cache”.
If the package manager is configured to use binary packages, it attempts to find a package that matches the build parameters — the package version, USE flags, dependencies. If it finds a match, it can use it. If it doesn’t, it just proceeds with building from source. If configured to do so, it may write a binary package as a side effect of that — almost literally cache it. It can also be set to create a binary package without installing it (pre-fill the “cache”). It should hardly surprise anyone at this point that the default local binary packages repository is under the /var/cache tree.
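As a rough sketch of this “cache” behavior in practice (paths are real Portage conventions, but the binhost URI depends on your profile and architecture):
# /etc/portage/make.conf
FEATURES="buildpkg"              # write a binary package as a side effect of each source build
EMERGE_DEFAULT_OPTS="--usepkg"   # prefer a matching binary package when one exists
# /etc/portage/binrepos.conf/gentoobinhost.conf
[gentoobinhost]
priority = 1
sync-uri = https://distfiles.gentoo.org/releases/amd64/binpackages/23.0/x86-64/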
A side implication of this is that the binary packages provided by Gentoo are a subset of all packages available — and on top of that, only a small number of viable package configurations are covered by the official packages.
The build phases
The source build in Gentoo is split into a few phases. The central phases that are of interest here are largely inspired by how autotools-based packages were built. These are:
- src_configure — meant to pass input parameters to the build system, and get it to perform necessary platform checks. Usually involves invoking a configure script, or an equivalent action of a build system such as CMake, Meson or another.
- src_compile — meant to execute the bulk of compilation, and leave the artifacts in the build tree. Usually involves invoking a builder such as make or ninja.
- src_test — meant to run the test suite, if the user wishes testing to be done. Usually involves invoking the check or test target.
- src_install — meant to install the artifacts and other files from the work directory into a staging directory (not the live system). The files can be afterwards transferred to the live system and/or packed into a binary package. Usually involves invoking the install target.
Clearly, it’s very similar to how you’d compile and install software yourself: configure, build, optionally test before installing, and then install.
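As a purely illustrative sketch for a hypothetical autotools-based package (in real ebuilds most of these phases can be left to the defaults), the four phases map onto an ebuild like this:
# foo-1.0.ebuild (hypothetical)
EAPI=8

DESCRIPTION="Example autotools-based package"
HOMEPAGE="https://example.org/foo"
SRC_URI="https://example.org/${P}.tar.gz"
LICENSE="MIT"
SLOT="0"
KEYWORDS="~amd64"

src_configure() {
	econf --disable-static        # pass input parameters, run platform checks
}

src_compile() {
	emake                         # bulk of the compilation
}

src_test() {
	emake check                   # only run if the user enabled tests
}

src_install() {
	emake DESTDIR="${D}" install  # install into the staging directory, not the live system
	einstalldocs
}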
Of course, this process is not really one-size-fits-all. For example, the modern Python packages no longer even try fitting into it. Instead, we build the wheel in the PEP 517 blackbox manner, and install it to a temporary directory straight in the compile phase. As a result, the test phase is run with a locally-installed package (relying on the logic from virtual environments), and the install phase merely moves files around for the package manager to pick them up.
The implications for testing
The key takeaways of the process are these:
- The test phase is run inside the working tree, against a package that was just built but not installed into the live system.
- All the package’s build-time dependencies should be installed into the live system.
- However, the system may contain any other packages, including packages that could affect the just-built package or its test suite in unpredictable ways.
- As a corollary, the live system may or may not contain a copy of the package in question already installed. And if it does, it may be a different version, and/or a different build configuration.
All of these mean trouble. Sometimes random packages will cause the tests to fail as false positives — and sometimes they also make them wrongly pass or get ignored. Sometimes packages already installed will prevent developers from seeing that they’ve missed some dependency. Often mismatches between installed packages will make reproducing issues hard. On top of that, sometimes an earlier installed copy of the package will leak into the test environment, causing confusing problems.
If there are so many negatives, why do we do it then? Because there is also a very important positive: the packages are being tested as close to the production environment as possible (short of actually installing them — but we want to test before that happens). Presence of a certain package may cause tests to fail as a false positive — but it may also uncover an actual runtime issue, one that would not otherwise be caught until it actually broke production. And I’m not talking theoretical here. While I don’t have any links handy right now, over and over again we were hitting real issues — either those that haven’t been caught by upstream CI setups yet, or those that simply couldn’t have been caught in an idealized test environment.
So yeah, testing stuff this way can be quite a pain, and a source of huge frustration with the constant stream of false positives. But it’s also an important strength that no idealized — not to say “lazy” — test environment can provide. Add to that the fact that a fair number of Gentoo users actually install their packages with tests enabled (see the snippet below), and you get testing on a huge variety of systems, with different architectures, dependency versions, USE flags, configuration files… and on top of that, a knack for hacking. Yeah, people hate us for finding all these bugs they’d rather not hear about.
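For reference, enabling that globally on a user system is a single line in make.conf; per-package control via /etc/portage/package.env works as well:
# /etc/portage/make.conf: run each package's src_test phase during builds
FEATURES="test"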
November 09 2024
Ready-to-boot, fresh & experimental Gentoo QCOW2 disk images
Recently I've been experimenting with Catalyst, the tool that generates stages and iso files for Gentoo's Release Engineering team. The first, still very experimental result is now available for download - a bootable hard disk image in QEmu's qcow2 format that immediately drops you into a fully working Gentoo environment.
Feel free to download it and try it out, either this first upload or any future weekly build from the amd64 release file directories. The files are not linked on the www.gentoo.org webserver since I consider them not really finished yet, but rather experimental and under development. You can use, for example, a QEmu command line such as
qemu-system-x86_64 \
-m 8G -smbios type=0,uefi=on -bios /usr/share/edk2-ovmf/OVMF_CODE.fd \
-smp 4 -cpu host -accel kvm -vga virtio -drive file=di.qcow2 &
where the last "file" argument specifies the file that you downloaded, for testing.
The current download initially does not start any network login services such as sshd, but has an empty root password for logging in on the console; this is why I call it a "console" type disk image. Future variants I'm planning include, for example, a "cloud-init" type, which sets up log-in credentials and further configuration as supplied by a cloud provider.
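If you do want remote access, a minimal sketch, assuming the OpenRC-based image (a systemd variant would use systemctl instead), is to set a proper root password and enable sshd from the console; note that OpenSSH refuses root password logins by default, so you may prefer to add an unprivileged user instead:
passwd root                   # set a real password first
rc-update add sshd default    # start sshd on every boot
rc-service sshd start         # start it right away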
Cheers and enjoy!
October 23 2024
DTrace 2.0 for Gentoo
The real, mythical DTrace comes to Gentoo! Need to dynamically trace your kernel or userspace programs, with rainbows, ponies, and unicorns - and all entirely safely and in production?! Gentoo is now ready for that! Just emerge dev-debug/dtrace and you’re all set. All required kernel options are already enabled in the newest stable Gentoo distribution kernel; if you are compiling manually, the DTrace ebuild will inform you about required configuration changes. Internally, DTrace 2.0 for Linux builds on the BPF engine of the Linux kernel, so don’t be surprised if the awesome cross-compilation features of Gentoo are used to install a gcc that outputs BPF code (which, btw, also comes in very handy for sys-apps/systemd).
Documentation? Sure, there’s lots of it. You can start with our DTrace wiki page, the DTrace for Linux page on GitHub, or the original documentation for Illumos. Enjoy!
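As a quick first taste, nothing Gentoo-specific, the classic one-liner below counts system calls by name until you press Ctrl-C (assuming the syscall provider is available in your build):
# aggregate syscall entry probes by function name; results print on exit
dtrace -n 'syscall:::entry { @[probefunc] = count(); }'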
October 07 2024
Arm Ltd. provides fast Ampere Altra Max server for Gentoo
We’re very happy to announce that Arm Ltd. and specifically its Works on Arm team has sent us a fast Ampere Altra Max server to support Gentoo development. With 96 Armv8.2+ 64bit cores, 256 GByte of RAM, and 4 TByte NVMe storage, it is now hosted together with some of our other hardware at OSU Open Source Lab. The machine will be a clear boost to our future arm64 (aarch64) and arm (32bit) support, via installation stage builds and binary packages, architecture testing of Gentoo packages, as well as our close work with upstream projects such as GCC and glibc. Thank you!