From: Stefano Brivio <sbrivio@redhat.com>
To: Jon Maloy <jmaloy@redhat.com>
Cc: passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com
Subject: Re: [PATCH v3 1/3] tcp: move seq_to_tap update to when frame is queued
Date: Tue, 14 May 2024 18:48:02 +0200
Message-ID: <20240514184802.71fb4a91@elisabeth>
In-Reply-To: <20240511152008.421750-2-jmaloy@redhat.com>

On Sat, 11 May 2024 11:20:06 -0400
Jon Maloy <jmaloy@redhat.com> wrote:
> commit a469fc393fa1 ("tcp, tap: Don't increase tap-side sequence counter for dropped frames")
> delayed update of conn->seq_to_tap until the moment the corresponding
> frame has been successfully pushed out. This has the advantage that we
> immediately can make a new attempt to transmit a frame after a failed
> transmit, rather than waiting for the peer to later discover a gap and
> trigger the fast retransmit mechanism to solve the problem.
>
> This approach has turned out to cause a problem with spurious sequence
> number updates during peer-initiated retransmits, and we have realized
> it may not be the best way to solve the above issue.
>
> We now restore the previous method, by updating the said field at the
> moment a frame is added to the outqueue. To retain the advantage of
> having a quick re-attempt based on local failure detection, we now scan
> through the part of the outqueue that had to be dropped, and restore the
> sequence counter for each affected connection to the most appropriate
> value.
>
> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
>
> ---
> v2: - Re-spun loop in tcp_revert_seq() and some other changes based on
> feedback from Stefano Brivio.
> - Added paranoid test to avoid that seq_to_tap becomes lower than
> seq_ack_from_tap.

Should we really fix it up there? More below.

> ---
> tcp.c | 63 ++++++++++++++++++++++++++++++++++++++++++-----------------
> 1 file changed, 45 insertions(+), 18 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 21d0af0..21cbfba 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -411,13 +411,14 @@ static int tcp_sock_ns [NUM_PORTS][IP_VERSIONS];
> static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
>
> /**
> - * tcp_buf_seq_update - Sequences to update with length of frames once sent
> - * @seq: Pointer to sequence number sent to tap-side, to be updated
> - * @len: TCP payload length
> + * tcp_frame_ref - References needed by queued frames in case we need

I think the name isn't really indicative. If you don't like the
tcp_conn_old_seq name I proposed, maybe something that refers to
sequence numbers being reverted anyway? tcp_conn_revert_seq?

> + * to revert corresponding connection sequence numbers
> + * @conn: Pointer to connection for this frame
> + * @seq: Sequence number of the corresponding frame
> */
> -struct tcp_buf_seq_update {
> - uint32_t *seq;
> - uint16_t len;
> +struct tcp_frame_ref {
> + struct tcp_tap_conn *conn;
> + uint32_t seq;
> };
>
> /* Static buffers */
> @@ -461,7 +462,7 @@ static struct tcp_payload_t tcp4_payload[TCP_FRAMES_MEM];
>
> static_assert(MSS4 <= sizeof(tcp4_payload[0].data), "MSS4 is greater than 65516");
>
> -static struct tcp_buf_seq_update tcp4_seq_update[TCP_FRAMES_MEM];
> +static struct tcp_frame_ref tcp4_frame_ref[TCP_FRAMES_MEM];
> static unsigned int tcp4_payload_used;
>
> static struct tap_hdr tcp4_flags_tap_hdr[TCP_FRAMES_MEM];
> @@ -483,7 +484,7 @@ static struct tcp_payload_t tcp6_payload[TCP_FRAMES_MEM];
>
> static_assert(MSS6 <= sizeof(tcp6_payload[0].data), "MSS6 is greater than 65516");
>
> -static struct tcp_buf_seq_update tcp6_seq_update[TCP_FRAMES_MEM];
> +static struct tcp_frame_ref tcp6_frame_ref[TCP_FRAMES_MEM];
> static unsigned int tcp6_payload_used;
>
> static struct tap_hdr tcp6_flags_tap_hdr[TCP_FRAMES_MEM];
> @@ -1261,25 +1262,50 @@ static void tcp_flags_flush(const struct ctx *c)
> tcp4_flags_used = 0;
> }
>
> +/**
> + * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission
> + * @frames_ref: Array with connection and sequence number data

Nit: it's frame_ref now.

> + * @first: Index of entry corresponding to first dropped frame
> + * @last: Index of entry corresponding to last dropped frame
> + */
> +static void tcp_revert_seq(struct tcp_frame_ref *frame_ref, int first, int last)
> +{
> + struct tcp_tap_conn *conn;
> + int i;
> +
> + for (i = first; i <= last; i++) {
> + conn = frame_ref[i].conn;
> +
> + if (SEQ_LE(conn->seq_to_tap, frame_ref[i].seq))
> + continue;
> +
> + conn->seq_to_tap = frame_ref[i].seq;

So far, it all makes sense to me. Now, to the "paranoid" check you
added here:

> + if (SEQ_GE(conn->seq_to_tap, conn->seq_ack_from_tap))
> + continue;

let's say this is false. How did it happen? Did you actually see that
happening? And in that case,

> + conn->seq_to_tap = conn->seq_ack_from_tap;

should we really fix it up here? If yes, I would add a debug() message
and also a comment indicating that this isn't expected.

> + }
> +}
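
For illustration only, here is a standalone sketch of what the fix-up could
look like with a comment and a debug() message attached. Everything in it is
an assumption made to keep it self-contained: struct conn_sketch and struct
frame_ref_sketch are reduced stand-ins for the real structures, debug() is a
plain fprintf() wrapper, and the SEQ_LE()/SEQ_GE() macros use a generic
signed-difference comparison rather than the definitions in tcp.c. It also
returns the number of fix-ups, which the patch does not do, just so that the
unexpected case is easy to observe.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Wrap-safe comparisons via signed difference: stand-ins, not passt's macros */
#define SEQ_LE(a, b)	((int32_t)((uint32_t)(a) - (uint32_t)(b)) <= 0)
#define SEQ_GE(a, b)	((int32_t)((uint32_t)(a) - (uint32_t)(b)) >= 0)

/* Stand-in for a debug() logging helper (assumption) */
#define debug(...)	fprintf(stderr, __VA_ARGS__)

/* Hypothetical, reduced connection: only the two counters used here */
struct conn_sketch {
	uint32_t seq_to_tap;
	uint32_t seq_ack_from_tap;
};

/* Hypothetical frame reference, mirroring the two fields in the patch */
struct frame_ref_sketch {
	struct conn_sketch *conn;
	uint32_t seq;
};

/* Revert counters for dropped frames [first, last], return fix-up count */
static int revert_seq_sketch(struct frame_ref_sketch *ref, int first, int last)
{
	int i, fixups = 0;

	assert(ref);

	for (i = first; i <= last; i++) {
		struct conn_sketch *conn = ref[i].conn;

		/* Counter already at or before this frame: nothing to do */
		if (SEQ_LE(conn->seq_to_tap, ref[i].seq))
			continue;

		conn->seq_to_tap = ref[i].seq;

		if (SEQ_GE(conn->seq_to_tap, conn->seq_ack_from_tap))
			continue;

		/* Not expected: a dropped frame starts before data the peer
		 * already acknowledged. Clamp, and leave a trace in the log.
		 */
		debug("seq_to_tap %u below seq_ack_from_tap %u, clamping\n",
		      (unsigned)conn->seq_to_tap,
		      (unsigned)conn->seq_ack_from_tap);
		conn->seq_to_tap = conn->seq_ack_from_tap;
		fixups++;
	}

	return fixups;
}
```

With two dropped frames of the same connection, the counter ends up at the
sequence of the first one, and the clamp only triggers if the acknowledged
sequence is already past that point.
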
> +
> /**
> * tcp_payload_flush() - Send out buffers for segments with data
> * @c: Execution context
> */
> static void tcp_payload_flush(const struct ctx *c)
> {
> - unsigned i;
> size_t m;
>
> m = tap_send_frames(c, &tcp6_l2_iov[0][0], TCP_NUM_IOVS,
> tcp6_payload_used);
> - for (i = 0; i < m; i++)
> - *tcp6_seq_update[i].seq += tcp6_seq_update[i].len;
> + if (m != tcp6_payload_used)
> + tcp_revert_seq(tcp6_frame_ref, m, tcp6_payload_used - 1);
> tcp6_payload_used = 0;
>
> m = tap_send_frames(c, &tcp4_l2_iov[0][0], TCP_NUM_IOVS,
> tcp4_payload_used);
> - for (i = 0; i < m; i++)
> - *tcp4_seq_update[i].seq += tcp4_seq_update[i].len;
> + if (m != tcp4_payload_used)
> + tcp_revert_seq(tcp4_frame_ref, m, tcp4_payload_used - 1);
> tcp4_payload_used = 0;
> }
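
To double-check the index arithmetic in this function (if m frames went out,
entries m through used - 1 are the dropped ones), here is a rough,
self-contained model of the queue-then-revert flow. The names queue_frame(),
flush_sketch(), FRAMES_MAX and both structs are hypothetical, the send result
is passed in as a parameter instead of coming from tap_send_frames(), and the
wrap-safe comparison is a generic signed-difference check rather than the
SEQ_LE() macro from tcp.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAMES_MAX	8	/* hypothetical queue size, not TCP_FRAMES_MEM */

/* Hypothetical, reduced connection: only the tap-side counter */
struct conn_sketch {
	uint32_t seq_to_tap;
};

/* Hypothetical frame reference, mirroring the two fields in the patch */
struct frame_ref_sketch {
	struct conn_sketch *conn;
	uint32_t seq;
};

static struct frame_ref_sketch queue_ref[FRAMES_MAX];
static size_t queue_used;

/* Queue one frame: advance the counter now, remember where the frame began */
static void queue_frame(struct conn_sketch *conn, uint32_t seq, uint32_t dlen)
{
	assert(queue_used < FRAMES_MAX);

	conn->seq_to_tap = seq + dlen;
	queue_ref[queue_used].conn = conn;
	queue_ref[queue_used].seq = seq;
	queue_used++;
}

/* Flush with a pretend result: 'sent' frames went out, the rest were dropped */
static void flush_sketch(size_t sent)
{
	size_t i;

	for (i = sent; i < queue_used; i++) {
		struct conn_sketch *conn = queue_ref[i].conn;

		/* Only ever move the counter backwards (wrap-safe) */
		if ((int32_t)(conn->seq_to_tap - queue_ref[i].seq) > 0)
			conn->seq_to_tap = queue_ref[i].seq;
	}

	queue_used = 0;
}
```

Because entries are visited in queue order, each affected connection ends up
at the sequence of its earliest dropped frame, and connections whose frames
were all sent are left untouched.
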
>
> @@ -2129,10 +2155,11 @@ static int tcp_sock_consume(const struct tcp_tap_conn *conn, uint32_t ack_seq)
> static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> ssize_t dlen, int no_csum, uint32_t seq)
> {
> - uint32_t *seq_update = &conn->seq_to_tap;
> struct iovec *iov;
> size_t l4len;
>
> + conn->seq_to_tap = seq + dlen;
> +
> if (CONN_V4(conn)) {
> struct iovec *iov_prev = tcp4_l2_iov[tcp4_payload_used - 1];
> const uint16_t *check = NULL;
> @@ -2142,8 +2169,8 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> check = &iph->check;
> }
>
> - tcp4_seq_update[tcp4_payload_used].seq = seq_update;
> - tcp4_seq_update[tcp4_payload_used].len = dlen;
> + tcp4_frame_ref[tcp4_payload_used].conn = conn;
> + tcp4_frame_ref[tcp4_payload_used].seq = seq;
>
> iov = tcp4_l2_iov[tcp4_payload_used++];
> l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, check, seq);
> @@ -2151,8 +2178,8 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> if (tcp4_payload_used > TCP_FRAMES_MEM - 1)
> tcp_payload_flush(c);
> } else if (CONN_V6(conn)) {
> - tcp6_seq_update[tcp6_payload_used].seq = seq_update;
> - tcp6_seq_update[tcp6_payload_used].len = dlen;
> + tcp6_frame_ref[tcp6_payload_used].conn = conn;
> + tcp6_frame_ref[tcp6_payload_used].seq = seq;
>
> iov = tcp6_l2_iov[tcp6_payload_used++];
> l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, NULL, seq);

The rest looks good to me.

--
Stefano