So from @wingo's blog I have learned about Concurrent ML. (His concurrency tag is also worth perusing generally.)
The core insight, IIUC, is to take the atomic blocking operations – roughly, the same ones our current cancellation guarantee applies to – and systematically split them into a few consistent sub-operations:
- attempt to perform the operation synchronously, possibly wrapping the result in some thunk
- publish that we're waiting for the operation to be doable and should be woken up if it might succeed
- give up on waiting ("unpublish")
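To make that concrete, here's roughly what the split might look like as an interface (a sketch with made-up names, not an actual trio API):

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional

class Operation(ABC):
    """One atomic blocking operation, split into the three
    sub-operations described above (names are illustrative)."""

    @abstractmethod
    def attempt(self) -> Optional[Callable[[], object]]:
        """Try to perform the operation synchronously. Return a thunk
        wrapping the result on success, or None if it would block."""

    @abstractmethod
    def publish(self, wakeup: Callable[[], None]) -> None:
        """Register that we're waiting: arrange for `wakeup` to be
        called whenever the operation might now succeed."""

    @abstractmethod
    def unpublish(self) -> None:
        """Give up on waiting -- basically our abort callback."""
```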
We already implicitly have this basic structure open-coded in a bunch of places, e.g. the `_try_sync` helper in `trio/socket.py`, the classes in `trio/_sync.py`, etc. Pretty much any place you see the `yield_if_cancelled`/`yield_briefly_no_cancel` pair in trio fits this general pattern, and "unpublish" is basically just our abort callback. So part of the advantage of reifying this would be to simplify the code, by having a single implementation of the overall pattern that we could slot things into – but even more, because given the above pieces you can create generic implementations of three variants:
- Try to perform the operation, but fail if it would block (like the `*_nowait` variants that we currently implement in an ad hoc way)
- Perform the operation, blocking as necessary (what we do now)
- A thing like golang's `select`, where you can say "perform exactly one of the following operations: ..."
(The first is done by just calling the "try this" sub-operation; the second you do by trying and then blocking if you fail, in a loop, with the unpublish operation as the abort callback; the third is done by calling a bunch of "try this" sub-operations, and if one succeeds you're done, otherwise you publish all the operations and go to sleep. There are some subtleties around knowing which operation woke you up, when unpublish happens, etc., but that's the basic idea.)
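Here's a rough sketch of those three generic implementations on top of the interface above, glossing over the which-operation-woke-us and lost-wakeup subtleties (illustrative only; the real thing would plug into trio's parking/abort machinery rather than `trio.Event`):

```python
import trio

class WouldBlock(Exception):
    pass

def perform_nowait(op):
    # Variant 1: try once, and fail rather than block.
    thunk = op.attempt()
    if thunk is None:
        raise WouldBlock
    return thunk()

async def perform(op):
    # Variant 2: try, and on failure publish + sleep, in a loop.
    while True:
        thunk = op.attempt()
        if thunk is not None:
            return thunk()
        woken = trio.Event()
        op.publish(woken.set)
        try:
            await woken.wait()   # cancellation would fire unpublish
        finally:
            op.unpublish()       # assume this is harmless after wakeup

async def select(ops):
    # Variant 3: perform exactly one of `ops`.
    while True:
        for op in ops:
            thunk = op.attempt()
            if thunk is not None:
                return thunk()
        woken = trio.Event()
        for op in ops:
            op.publish(woken.set)
        try:
            await woken.wait()
        finally:
            for op in ops:
                op.unpublish()
```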
Right now we have a bunch of manual implementations of `await x()` / `x_nowait()` primitives. It's not clear that we have enough, either; #14 has an example of a case where you semantically need `accept_nowait`, and for HTTP client connection pooling, when you pull a connection out of the pool you need something like `receive_some_nowait` to check whether the server has closed it before you try to use it.
Also, a golang-style `select` is potentially quite useful but isn't possible right now (at least for trio's built-in operations – of course you could build a golang-style channels library on top, but then `select` would only work on that library's operations). You can spawn some child tasks to try doing all the things concurrently, but there's no way to make sure that no more than one of them completes – for that you'd need to be able to (1) guarantee that all of the operations really are cleanly cancellable, and (2) perform the cancellation synchronously with the first operation completing, which isn't possible if the last thing an operation does after committing its work is to call `yield_briefly_no_cancel`.
An ancillary benefit is that if we expose these things as a standard interface to users, this would also serve as very clear documentation of which operations are actually the atomic cancellable ones.
But, there are also some issues, I think:
IOCP: the above pattern works for BSD-style non-blocking socket operations, but not for Windows IOCP operations (#52). You can implement cancellable blocking operations as IOCP calls (that's basically what they are), and nowait operations using Windows' non-blocking calls, but IOCP has no generic equivalent to epoll for implementing wake-me-when-this-non-blocking-call-might-succeed, which means that golang-style `select` is not possible. All you can do is ask the kernel to start initiating all the operations, and then by the time you find out that, say, your `recv` has finished, your `send` might also have finished. I guess this might be possible to work around: for `send` I'm pretty sure we can treat IOCP as a kind of epoll and then use non-blocking send. For `recv` I'm not sure if this works, and for `accept` I'm pretty sure it doesn't, but for these operations you can more-or-less fake them as being always cancellable by using a little user-space buffer to push back any result that you want to pretend didn't happen. Are there any other cases? Do we need to change the pattern to accommodate this? The pushback trick doesn't seem compatible with a strict separation between "try it" and "wake me when you want to try again" primitives.
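For what it's worth, the pushback part of that trick could be as simple as this (purely illustrative; `raw_recv` stands in for whatever IOCP-backed receive we'd actually have underneath):

```python
class PushbackRecv:
    """Wraps a receive so that a completion we want to pretend
    didn't happen can be stashed and replayed to the next caller."""

    def __init__(self, raw_recv):
        self._raw_recv = raw_recv  # hypothetical IOCP-backed recv
        self._pushback = b""

    def push_back(self, data):
        # Called when a recv completed but we've decided to act as
        # if it hadn't (e.g. it "lost" a select, or was cancelled
        # after the kernel had already delivered bytes).
        self._pushback = data + self._pushback

    async def recv(self, max_bytes):
        # Serve pushed-back bytes before touching the kernel again.
        if self._pushback:
            data = self._pushback[:max_bytes]
            self._pushback = self._pushback[max_bytes:]
            return data
        return await self._raw_recv(max_bytes)
```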
The main problem with HTTP connection pooling isn't checking whether the socket is readable – we can already do that in the specific case of a raw socket. It's that this isn't something you can reasonably abstract in terms of generic "streams". In particular, if you have a TLS-wrapped socket, then you actually don't want to check whether the TLS layer has data available; you really do want to check the raw socket underneath. And in any case I'm not sure this would help make it easier to implement `receive_some_nowait` as a generic feature on streams, because receiving on a TLS stream is a very complex operation that may require things like lock acquisition, and all of the sub-operations above have to be synchronous. So maybe the HTTP connection pool case isn't really a good motivator anyway.
`Lock.acquire` and `Lock.acquire_nowait` are tricky because of fairness (#54); it's not the case that `acquire` is just a loop like `while acquire_nowait fails: sleep until it might succeed`, because that creates a race on wakeup. I don't think it's possible to implement a fair mutex using just the framework described above. The problem is that we really need the two operations to be "try it" and "block until the operation is done"; a retry loop just doesn't work. So maybe this is actually equivalent to the IOCP case? Maybe we need primitives:
- try it without executing any cancel/schedule point
- publish whatever we need to publish to be woken up (but don't actually sleep), with some callbacks for abort, reschedule, and we-got-successfully-rescheduled
I think this is flexible enough to implement fair synchronization primitives and handle all the public operations. E.g. for golang-style `select`, we'd want to arrange things so that when one operation gets rescheduled, we immediately abort all the other operations before waiting to actually be woken up – and this would need to happen from the context of the task that's handing off the mutex (for example).
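As a sanity check that these two primitives are enough for fairness, here's a sketch of a FIFO lock where `release()` hands ownership directly to the next waiter instead of waking it up to retry, so there's no race for a barging task to win (`waiter.reschedule()` is a stand-in for whatever reschedule callback the real machinery would provide):

```python
from collections import deque

class WouldBlock(Exception):  # as in the earlier sketch
    pass

class FairLock:
    def __init__(self):
        self._locked = False
        self._waiters = deque()

    def acquire_nowait(self):
        # "Try it", with no cancel/schedule point.
        if self._locked:
            raise WouldBlock
        self._locked = True

    def acquire_publish(self, waiter):
        # "Publish": park the waiter. Its reschedule callback will be
        # invoked by release() with ownership already transferred, so
        # there's nothing left to race over when it wakes up.
        self._waiters.append(waiter)

    def release(self):
        assert self._locked
        if self._waiters:
            # Direct handoff: the lock never becomes free, so a
            # concurrent acquire_nowait can't barge in front.
            waiter = self._waiters.popleft()
            waiter.reschedule()
        else:
            self._locked = False
```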
But... this isn't quite right for stuff like non-blocking socket operations, where you actually need a try-sleep-retry-sleep-retry-... loop. Need to think some more about this.