public inbox for passt-dev@passt.top
From: David Gibson <david@gibson.dropbear.id.au>
To: Jon Maloy <jmaloy@redhat.com>
Cc: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com,
	dgibson@redhat.com
Subject: Re: [PATCH v6 1/3] tcp: move seq_to_tap update to when frame is queued
Date: Mon, 20 May 2024 17:46:40 +1000
Message-ID: <Zkr_4LkjDImgFqSi@zatzit>
In-Reply-To: <20240517152414.1188282-2-jmaloy@redhat.com>

On Fri, May 17, 2024 at 11:24:12AM -0400, Jon Maloy wrote:
> commit a469fc393fa1 ("tcp, tap: Don't increase tap-side sequence counter for dropped frames")
> delayed update of conn->seq_to_tap until the moment the corresponding
> frame has been successfully pushed out. This has the advantage that we
> immediately can make a new attempt to transmit a frame after a failed
> transmit, rather than waiting for the peer to later discover a gap and
> trigger the fast retransmit mechanism to solve the problem.
> 
> This approach has turned out to cause a problem with spurious sequence
> number updates during peer-initiated retransmits, and we have realized
> it may not be the best way to solve the above issue.
> 
> We now restore the previous method, by updating the said field at the
> moment a frame is added to the outqueue. To retain the advantage of
> having a quick re-attempt based on local failure detection, we now scan
> through the part of the outqueue that had to be dropped, and restore the
> sequence counter for each affected connection to the most appropriate
> value.
> 
> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> 
> ---
> v2: - Re-spun loop in tcp_revert_seq() and some other changes based on
>       feedback from Stefano Brivio.
>     - Added paranoid test to avoid that seq_to_tap becomes lower than
>       seq_ack_from_tap.
> 
> v3: - Identical to v2. Called v3 because it was embedded in a series
>       with that version.
> 
> v4: - In tcp_revert_seq(), we read the sequence number from the TCP
>       header instead of keeping a copy in struct tcp_buf_seq_update.
>     - Since the only remaining field in struct tcp_buf_seq_update is
>       a pointer to struct tcp_tap_conn, we eliminate the struct
>       altogether, and make the tcp6/tcp4_seq_update arrays into
>       arrays of said pointer.
>     - Removed 'paranoid' test in tcp_revert_seq. If it happens, it
>       is not fatal, and will be caught by other code anyway.
>     - Separated from the series again.
> 
> v5: - A couple of style issues.
> ---
>  tcp.c | 61 ++++++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 39 insertions(+), 22 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index 21d0af0..3a2350a 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -410,16 +410,6 @@ static int tcp_sock_ns		[NUM_PORTS][IP_VERSIONS];
>   */
>  static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
>  
> -/**
> - * tcp_buf_seq_update - Sequences to update with length of frames once sent
> - * @seq:	Pointer to sequence number sent to tap-side, to be updated
> - * @len:	TCP payload length
> - */
> -struct tcp_buf_seq_update {
> -	uint32_t *seq;
> -	uint16_t len;
> -};
> -
>  /* Static buffers */
>  /**
>   * struct tcp_payload_t - TCP header and data to send segments with payload
> @@ -461,7 +451,8 @@ static struct tcp_payload_t	tcp4_payload[TCP_FRAMES_MEM];
>  
>  static_assert(MSS4 <= sizeof(tcp4_payload[0].data), "MSS4 is greater than 65516");
>  
> -static struct tcp_buf_seq_update tcp4_seq_update[TCP_FRAMES_MEM];
> +/* References tracking the owner connection of frames in the tap outqueue */
> +static struct tcp_tap_conn *tcp4_frame_conns[TCP_FRAMES_MEM];
>  static unsigned int tcp4_payload_used;
>  
>  static struct tap_hdr		tcp4_flags_tap_hdr[TCP_FRAMES_MEM];
> @@ -483,7 +474,8 @@ static struct tcp_payload_t	tcp6_payload[TCP_FRAMES_MEM];
>  
>  static_assert(MSS6 <= sizeof(tcp6_payload[0].data), "MSS6 is greater than 65516");
>  
> -static struct tcp_buf_seq_update tcp6_seq_update[TCP_FRAMES_MEM];
> +/* References tracking the owner connection of frames in the tap outqueue */
> +static struct tcp_tap_conn *tcp6_frame_conns[TCP_FRAMES_MEM];
>  static unsigned int tcp6_payload_used;
>  
>  static struct tap_hdr		tcp6_flags_tap_hdr[TCP_FRAMES_MEM];
> @@ -1261,25 +1253,51 @@ static void tcp_flags_flush(const struct ctx *c)
>  	tcp4_flags_used = 0;
>  }
>  
> +/**
> + * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission
> + * @conns:       Array of connection pointers corresponding to queued frames
> + * @frames:      Two-dimensional array containing queued frames with sub-iovs
> + * @num_frames:  Number of entries in the two arrays to be compared
> + */
> +static void tcp_revert_seq(struct tcp_tap_conn **conns, struct iovec (*frames)[TCP_NUM_IOVS],
> +			   int num_frames)
> +{
> +	int i;
> +
> +	for (i = 0; i < num_frames; i++) {
> +		struct tcp_tap_conn *conn = conns[i];
> +		struct tcphdr *th = frames[i][TCP_IOV_PAYLOAD].iov_base;
> +		uint32_t seq = ntohl(th->seq);
> +
> +		if (SEQ_LE(conn->seq_to_tap, seq))
> +			continue;
> +
> +		conn->seq_to_tap = seq;

Not worth a respin, but given the other simplifications to this, it
would be clearer to have:
	if (!SEQ_LE(conn->seq_to_tap, seq))
		conn->seq_to_tap = seq;

rather than using continue.
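
A minimal standalone sketch of that form follows. It assumes a cut-down
stand-in for struct tcp_tap_conn (conn_sketch, hypothetical) and a
signed-difference seq_le() helper; passt's actual SEQ_LE() macro may be
defined differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct tcp_tap_conn: only the field used here */
struct conn_sketch {
	uint32_t seq_to_tap;
};

/* Assumed wraparound-aware "a <= b" for 32-bit sequence numbers.
 * This is NOT passt's SEQ_LE(), just a common way to express the idea.
 */
static int seq_le(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) <= 0;
}

/* Rewind seq_to_tap only if it lies beyond the dropped frame's sequence */
static void revert_one(struct conn_sketch *conn, uint32_t seq)
{
	assert(conn);

	if (!seq_le(conn->seq_to_tap, seq))
		conn->seq_to_tap = seq;
}
```

With seq_to_tap at 3000, reverting for a dropped frame at 1000 rewinds
the counter to 1000; a later dropped frame at 2000 then leaves it alone,
so the counter ends at the lowest dropped sequence.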


> +	}
> +}
> +
>  /**
>   * tcp_payload_flush() - Send out buffers for segments with data
>   * @c:		Execution context
>   */
>  static void tcp_payload_flush(const struct ctx *c)
>  {
> -	unsigned i;
>  	size_t m;
>  
>  	m = tap_send_frames(c, &tcp6_l2_iov[0][0], TCP_NUM_IOVS,
>  			    tcp6_payload_used);
> -	for (i = 0; i < m; i++)
> -		*tcp6_seq_update[i].seq += tcp6_seq_update[i].len;
> +	if (m != tcp6_payload_used) {
> +		tcp_revert_seq(tcp6_frame_conns, &tcp6_l2_iov[m],
> +			       tcp6_payload_used - m);

Hrm... AFAICT tcp_revert_seq() uses the same indices into conns[]
and frames[].  But here, aren't you passing the frames array from
entry m onwards, but the conns array from 0 onwards?  That means
tcp_revert_seq() might use the wrong connection for each frame.  I think
you either need
	tcp_revert_seq(&tcp6_frame_conns[m], &tcp6_l2_iov[m], ...)
or else pass the unindexed arrays here, and take the start index as a
new parameter to tcp_revert_seq().
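
The first option can be sketched in isolation as below. This is an
illustration under simplifying assumptions, not the patch itself: each
queued frame is reduced to its starting sequence number (frame_seq[],
hypothetical) instead of an iovec array, and conn_sketch,
revert_seq_sketch() and flush_sketch() are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct tcp_tap_conn */
struct conn_sketch {
	uint32_t seq_to_tap;
};

/* Walk conns[] and frame_seq[] with the SAME index, rewinding each
 * connection whose counter is beyond the dropped frame's sequence.
 */
static void revert_seq_sketch(struct conn_sketch **conns,
			      const uint32_t *frame_seq, size_t num_frames)
{
	size_t i;

	for (i = 0; i < num_frames; i++) {
		/* Signed difference: assumed wraparound-aware comparison */
		if ((int32_t)(conns[i]->seq_to_tap - frame_seq[i]) > 0)
			conns[i]->seq_to_tap = frame_seq[i];
	}
}

/* After a partial send, frames [0, sent) went out and [sent, used)
 * were dropped: offset BOTH arrays by 'sent' so they stay aligned.
 */
static void flush_sketch(struct conn_sketch **conns,
			 const uint32_t *frame_seq, size_t used, size_t sent)
{
	assert(sent <= used);

	if (sent != used)
		revert_seq_sketch(&conns[sent], &frame_seq[sent],
				  used - sent);
}
```

If only the frame array were offset, the loop would pair the dropped
frames with the connections of the frames that were actually sent.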

> +	}
>  	tcp6_payload_used = 0;
>  
>  	m = tap_send_frames(c, &tcp4_l2_iov[0][0], TCP_NUM_IOVS,
>  			    tcp4_payload_used);
> -	for (i = 0; i < m; i++)
> -		*tcp4_seq_update[i].seq += tcp4_seq_update[i].len;
> +	if (m != tcp4_payload_used) {
> +		tcp_revert_seq(tcp4_frame_conns, &tcp4_l2_iov[m],
> +			       tcp4_payload_used - m);

Same thing here, of course.

> +	}
>  	tcp4_payload_used = 0;
>  }
>  
> @@ -2129,10 +2147,11 @@ static int tcp_sock_consume(const struct tcp_tap_conn *conn, uint32_t ack_seq)
>  static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  			    ssize_t dlen, int no_csum, uint32_t seq)
>  {
> -	uint32_t *seq_update = &conn->seq_to_tap;
>  	struct iovec *iov;
>  	size_t l4len;
>  
> +	conn->seq_to_tap = seq + dlen;

Now that we update seq_to_tap here, we don't really need seq as a
parameter any more, which would also simplify logic in the caller slightly.
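
One way this could look, as a rough sketch only: the function reads the
starting sequence from the connection and advances the counter itself,
so the caller no longer tracks seq. conn_sketch and queue_data_sketch()
are hypothetical names, and the real tcp_data_to_tap() does much more
than this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct tcp_tap_conn */
struct conn_sketch {
	uint32_t seq_to_tap;
};

/* Take the frame's sequence number from the connection, then advance
 * the counter by the payload length at queue time. Returns the sequence
 * number that would go into the frame's TCP header.
 */
static uint32_t queue_data_sketch(struct conn_sketch *conn, uint32_t dlen)
{
	uint32_t seq;

	assert(conn);

	seq = conn->seq_to_tap;
	conn->seq_to_tap = seq + dlen;	/* unsigned arithmetic wraps mod 2^32 */

	return seq;
}
```

A caller queueing several segments in a row would then just call this
repeatedly, without keeping its own running sequence variable.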

>  	if (CONN_V4(conn)) {
>  		struct iovec *iov_prev = tcp4_l2_iov[tcp4_payload_used - 1];
>  		const uint16_t *check = NULL;
> @@ -2142,8 +2161,7 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  			check = &iph->check;
>  		}
>  
> -		tcp4_seq_update[tcp4_payload_used].seq = seq_update;
> -		tcp4_seq_update[tcp4_payload_used].len = dlen;
> +		tcp4_frame_conns[tcp4_payload_used] = conn;
>  
>  		iov = tcp4_l2_iov[tcp4_payload_used++];
>  		l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, check, seq);
> @@ -2151,8 +2169,7 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  		if (tcp4_payload_used > TCP_FRAMES_MEM - 1)
>  			tcp_payload_flush(c);
>  	} else if (CONN_V6(conn)) {
> -		tcp6_seq_update[tcp6_payload_used].seq = seq_update;
> -		tcp6_seq_update[tcp6_payload_used].len = dlen;
> +		tcp6_frame_conns[tcp6_payload_used] = conn;
>  
>  		iov = tcp6_l2_iov[tcp6_payload_used++];
>  		l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, NULL, seq);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
