public inbox for passt-dev@passt.top
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: Matej Hrica <mhrica@redhat.com>, passt-dev@passt.top
Subject: Re: [PATCH RFT 2/5] tcp: Reset STALLED flag on ACK only, check for pending socket data
Date: Mon, 25 Sep 2023 13:07:24 +1000
Message-ID: <ZRD5bMw8Ad1/h+JA@zatzit>
In-Reply-To: <20230922220610.58767-3-sbrivio@redhat.com>


I think the change itself here is sound, but I have some nits to pick
with the description and reasoning.

On Sat, Sep 23, 2023 at 12:06:07AM +0200, Stefano Brivio wrote:
> In tcp_tap_handler(), we shouldn't reset the STALLED flag (indicating
> that we ran out of tap-side window space, or that all available
> socket data is already in flight -- better names welcome!

Hmm.. when you put it like that it makes me wonder if those two quite
different conditions really need the same handling.  Hrm.. I guess
both conditions mean that we can't accept data from the socket, even
if it's available.

> ) on any
> event: do that only if the first packet in a batch has the ACK flag
> set.

"First packet in a batch" may not be accurate here - we're looking at
whichever packet we were up to before calling data_from_tap().  There
could have been earlier packets in the receive batch that were already
processed.

This also raises the question of why the first data packet should be
particularly privileged here.  I'm wondering if what we really want to
check is whether data_from_tap() advanced the ack pointer at all.
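
As a rough sketch of that alternative (not passt's actual code): record
the acked sequence before processing the batch, then clear the stalled
state only if it moved forward. The struct, its field names and both
helpers below are hypothetical stand-ins chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if sequence number a is strictly after b,
 * using a comparison that tolerates 32-bit wrap-around.
 */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* Hypothetical stand-in for the connection fields that matter here */
struct conn_sketch {
	uint32_t seq_ack_from_tap;	/* highest sequence acked by the tap side */
	bool stalled;			/* stand-in for the STALLED flag */
};

/* Clear the stalled state only if the acked sequence advanced past
 * ack_before, the value sampled before the batch was processed.
 * Returns true if the caller should now try reading from the socket.
 */
static bool unstall_if_ack_advanced(struct conn_sketch *conn,
				    uint32_t ack_before)
{
	if (!conn->stalled || !seq_after(conn->seq_ack_from_tap, ack_before))
		return false;

	conn->stalled = false;
	return true;
}
```

With this shape, no single packet in the batch is privileged: only the
net effect on the ack pointer counts.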

I'm not clear on when the th->ack check would ever fail in practice:
aren't the only normal packets in a TCP connection without ACK the
initial SYN or an RST?  We've handled the SYN case earlier, so should
we just have a blanket case above this that if we get a packet with
!ACK, we reset the connection?
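
A minimal sketch of such a blanket check, assuming the SYN case has
already been handled by the time it runs. The flag struct, the enum and
the function name are hypothetical and only illustrate the ordering of
the tests; whether to reset or merely drop a segment without ACK is a
policy choice, and this shows the reset variant suggested above.

```c
#include <stdbool.h>

/* Hypothetical view of the header flags that matter for this check */
struct seg_flags {
	bool ack;
	bool rst;
};

enum seg_action {
	SEG_PROCESS,		/* normal segment, carry on */
	SEG_HANDLE_RST,		/* peer reset, tear the connection down */
	SEG_RESET_CONN,		/* unexpected segment without ACK */
};

/* Classify a segment for a connection that is past the SYN stage:
 * RST is handled on its own, anything else without ACK is treated
 * as invalid.
 */
static enum seg_action classify_segment(struct seg_flags f)
{
	if (f.rst)
		return SEG_HANDLE_RST;

	if (!f.ack)
		return SEG_RESET_CONN;

	return SEG_PROCESS;
}
```

Once a check like this runs first, later code can rely on ACK being set
and a separate th->ack test becomes redundant.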

> Make sure we check for pending socket data when we reset it:
> reverting back to level-triggered epoll events, as tcp_epoll_ctl()
> does, isn't guaranteed to actually trigger a socket event.

Which sure seems like a kernel bug.  Some weird edge conditions for
edge-triggered mode seem expected, but this doesn't seem like valid
level-triggered semantics.

Hmmm... is toggling EPOLLET even what we want?  IIUC, the heart of
what's going on here is that we can't take more data from the socket
until something happens on the tap side (either the window expands, or
it acks some data).  In which case should we be toggling EPOLLIN on
the socket instead?  That seems more explicitly to be saying to the
socket side "we don't currently care if you have data available".

> Further, note that the flag only makes sense once a connection is
> established, so move all this to the right place, which is convenient
> for the next patch, as we want to check if the STALLED flag was set
> before processing any new information about the window size
> advertised by the tap.
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
>  tcp.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index aa1c8c9..5528e05 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -2572,8 +2572,6 @@ int tcp_tap_handler(struct ctx *c, int af, const void *saddr, const void *daddr,
>  	if (th->ack && !(conn->events & ESTABLISHED))
>  		tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
>  
> -	conn_flag(c, conn, ~STALLED);
> -
>  	/* Establishing connection from socket */
>  	if (conn->events & SOCK_ACCEPTED) {
>  		if (th->syn && th->ack && !th->fin) {
> @@ -2631,6 +2629,11 @@ int tcp_tap_handler(struct ctx *c, int af, const void *saddr, const void *daddr,
>  	if (conn->seq_ack_to_tap != conn->seq_from_tap)
>  		ack_due = 1;
>  
> +	if ((conn->flags & STALLED) && th->ack) {
> +		conn_flag(c, conn, ~STALLED);
> +		tcp_data_from_sock(c, conn);
> +	}
> +
>  	if ((conn->events & TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_SENT)) {
>  		shutdown(conn->sock, SHUT_WR);
>  		conn_event(c, conn, SOCK_FIN_SENT);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

