CARVIEW |
Navigation Menu
-
Notifications
You must be signed in to change notification settings - Fork 84
server: Fix connection issues when receiving ECONNRESET #123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
14878b8
to
8a6a44e
Compare
Shutdown()
9079746
to
edfef48
Compare
server.go
Outdated
@@ -524,15 +532,14 @@ func (c *serverConn) run(sctx context.Context) { | |||
// TODO(stevvooe): Not wildly clear what we should do in this | |||
// branch. Basically, it means that we are no longer receiving | |||
// requests due to a terminal error. | |||
recvErr = nil // connection is now "closing" | |||
if err == io.EOF || err == io.ErrUnexpectedEOF { | |||
opErr, isOpErr := err.(*net.OpError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this type check? Shouldn't we just be able to use errors.Is
without this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, updated, thx 😁
60d8f33
to
031309f
Compare
if err != nil { | ||
logrus.WithError(err).Error("error receiving message") | ||
} | ||
logrus.WithError(err).Error("error receiving message") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove the nil check though? If this message is needed, then perhaps we need to change sendStatus
to return error to send on recvError
.
It seems to me though that this is trying to catch the shutdown case which this block is purposefully avoiding when there is active connections. If the return after this line is necessary, then the shutdown logic in this goroutine still doesn't seem correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As code at https://github.com/containerd/ttrpc/blob/v1.1.0/server.go#L363 will do the nil check and only send non-nill error to the recvErr
channel, I think it's safe to remove the nil check here.
It seems to me though that this is trying to catch the shutdown case which this block is purposefully avoiding when there is active connections. If the return after this line is necessary, then the shutdown logic in this goroutine still doesn't seem correct.
Yes, the return here is not necessary and should be done by the shutdown logic. I‘ll remove it, thx
It seems like the correct behavior would be to handle the connection since it has already been accepted. In this change, the check is being done before, but the same race is still possible since there is no guarantee in execution order between checking For the |
031309f
to
9ccde4c
Compare
@dmcgowan You are right, the race condition still exists. I found the possible actual issue is |
fd47460
to
8a747cf
Compare
The ttrpc server somtimes receives `ECONNRESET` rather than `EOF` when the client is exited. Such as reading from a closed connection. Handle it properly to avoid goroutine and connection leak. Change-Id: If32711cfc1347dd2da27ca846dd13c3f5af351bb Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>
8a747cf
to
a03aa04
Compare
Shutdown()
Thanks for the fix to this leak that we've been seeing for a while It still seems wrong to me that the default case is to set recvErr = nil (which keeps this select case from ever unblocking again) and continuing the loop after logging the error. Yes this fixes ECONNRESET but what happens the next time you see an esoteric error? You will have the same problem. Why dont you just 'return' all the time, and only use the if to decide whether or not to log the error? If your concern is not draining the other channels, then the goroutine that is writing to those channels should close them in a deferral like it does here: This would guarantee that if the writer (This goroutine) dies, the channels are closed and the select isn't stuck hanging. |
Hi @benbuzbee, as discussed before, we considered other cases should be handled by the shutdown logic. But I'm agree with you, all errors should return in the block as the related connection will no longer receive requests. The shutdown case is only used for normal connection without any error. cc @dmcgowan |
The ttrpc server somtimes receives
ECONNRESET
rather thanEOF
when the client is exited. Such as reading from a closed connection. Handle it properly to avoid goroutine and connection leak.Below is a debug example:
Currently there's a possible race condition when shutting down the
server, which causes new connection to be added after
Shutdown()