From: Jon Maloy <jmaloy@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com,
dgibson@redhat.com
Subject: Re: [PATCH v4 1/3] tcp: move seq_to_tap update to when frame is queued
Date: Tue, 4 Jun 2024 13:36:21 -0400
Message-ID: <26391ddb-bea0-4c7a-abab-823c19897133@redhat.com>
In-Reply-To: <ZkWIhu28CdM3Kadq@zatzit>
Hi David,
This is the last comment I received from you regarding this patch.
See below for further comment.
On 2024-05-16 00:16, David Gibson wrote:
> On Wed, May 15, 2024 at 10:57:06PM -0400, Jon Maloy wrote:
>>
>> On 2024-05-15 22:24, David Gibson wrote:
>>> On Wed, May 15, 2024 at 11:34:27AM -0400, Jon Maloy wrote:
>>>> commit a469fc393fa1 ("tcp, tap: Don't increase tap-side sequence counter for dropped frames")
>>>> delayed update of conn->seq_to_tap until the moment the corresponding
>>>> frame has been successfully pushed out. This has the advantage that we
>>>> immediately can make a new attempt to transmit a frame after a failed
>>>> transmit, rather than waiting for the peer to later discover a gap and
>>>> trigger the fast retransmit mechanism to solve the problem.
>>>>
>>>> This approach has turned out to cause a problem with spurious sequence
>>>> number updates during peer-initiated retransmits, and we have realized
>>>> it may not be the best way to solve the above issue.
>>>>
>>>> We now restore the previous method, by updating the said field at the
>>>> moment a frame is added to the outqueue. To retain the advantage of
>>>> having a quick re-attempt based on local failure detection, we now scan
>>>> through the part of the outqueue that had to be dropped, and restore the
>>>> sequence counter for each affected connection to the most appropriate
>>>> value.
>>>>
>>>> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
>>>>
>>>> ---
>>>> v2: - Re-spun loop in tcp_revert_seq() and some other changes based on
>>>> feedback from Stefano Brivio.
>>>> - Added paranoid test to avoid that seq_to_tap becomes lower than
>>>> seq_ack_from_tap.
>>>>
>>>> v3: - Identical to v2. Called v3 because it was embedded in a series
>>>> with that version.
>>>>
>>>> v4: - In tcp_revert_seq(), we read the sequence number from the TCP
>>>> header instead of keeping a copy in struct tcp_buf_seq_update.
>>>> - Since the only remaining field in struct tcp_buf_seq_update is
>>>> a pointer to struct tcp_tap_conn, we eliminate the struct
>>>> altogether, and make the tcp6/tcp4_buf_seq_update arrays into
>>>> arrays of said pointer.
>>>> - Removed 'paranoid' test in tcp_revert_seq. If it happens, it
>>>> is not fatal, and will be caught by other code anyway.
>>>> - Separated from the series again.
>>>> ---
>>>> tcp.c | 59 +++++++++++++++++++++++++++++++++++++----------------------
>>>> 1 file changed, 37 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/tcp.c b/tcp.c
>>>> index 21d0af0..976dba8 100644
>>>> --- a/tcp.c
>>>> +++ b/tcp.c
>>>> @@ -410,16 +410,6 @@ static int tcp_sock_ns [NUM_PORTS][IP_VERSIONS];
>>>> */
>>>> static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
>>>> -/**
>>>> - * tcp_buf_seq_update - Sequences to update with length of frames once sent
>>>> - * @seq: Pointer to sequence number sent to tap-side, to be updated
>>>> - * @len: TCP payload length
>>>> - */
>>>> -struct tcp_buf_seq_update {
>>>> - uint32_t *seq;
>>>> - uint16_t len;
>>>> -};
>>>> -
>>>> /* Static buffers */
>>>> /**
>>>> * struct tcp_payload_t - TCP header and data to send segments with payload
>>>> @@ -461,7 +451,8 @@ static struct tcp_payload_t tcp4_payload[TCP_FRAMES_MEM];
>>>> static_assert(MSS4 <= sizeof(tcp4_payload[0].data), "MSS4 is greater than 65516");
>>>> -static struct tcp_buf_seq_update tcp4_seq_update[TCP_FRAMES_MEM];
>>>> +/* References tracking the owner connection of frames in the tap outqueue */
>>>> +static struct tcp_tap_conn *tcp4_frame_conns[TCP_FRAMES_MEM];
>>>> static unsigned int tcp4_payload_used;
>>>> static struct tap_hdr tcp4_flags_tap_hdr[TCP_FRAMES_MEM];
>>>> @@ -483,7 +474,8 @@ static struct tcp_payload_t tcp6_payload[TCP_FRAMES_MEM];
>>>> static_assert(MSS6 <= sizeof(tcp6_payload[0].data), "MSS6 is greater than 65516");
>>>> -static struct tcp_buf_seq_update tcp6_seq_update[TCP_FRAMES_MEM];
>>>> +/* References tracking the owner connection of frames in the tap outqueue */
>>>> +static struct tcp_tap_conn *tcp6_frame_conns[TCP_FRAMES_MEM];
>>>> static unsigned int tcp6_payload_used;
>>>> static struct tap_hdr tcp6_flags_tap_hdr[TCP_FRAMES_MEM];
>>>> @@ -1261,25 +1253,49 @@ static void tcp_flags_flush(const struct ctx *c)
>>>> tcp4_flags_used = 0;
>>>> }
>>>> +/**
>>>> + * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission
>>>> + * @conns: Array of connection pointers corresponding to queued frames
>>>> + * @frames: Two-dimensional array containing queued frames with sub-iovs
>>> You can make the 2d array explicit in the type as:
>>> struct iovec (*frames)[TCP_NUM_IOVS];
>>> See, for example the 'tap_iov' local in udp_tap_send(). (I recommend
>>> the command line tool 'cdecl', also available online at cdecl.org for
>>> working out confusing pointer-to-array types).
>> Nice. I wasn't quite happy with this.
>>>> + * @num_frames: Number of entries in the two arrays to be compared
>>>> + */
>>>> +static void tcp_revert_seq(struct tcp_tap_conn **conns, struct iovec *frames,
>>>> + int num_frames)
>>>> +{
>>>> + int c, f;
>>>> +
>>>> + for (c = 0, f = 0; c < num_frames; c++, f += TCP_NUM_IOVS) {
>>> Nit: I find having the two parallel counters kind of confusing. It
>>> naturally goes away with the type change suggested above, but even
>>> without that I'd prefer an explicit multiply in the body. I strongly
>>> suspect the compiler will be better at working out if the strength
>>> reduction is worth it.
>>>
>>>> + struct tcp_tap_conn *conn = conns[c];
>>>> + struct tcphdr *th = frames[f + TCP_IOV_PAYLOAD].iov_base;
>>>> + uint32_t seq = ntohl(th->seq);
>>>> +
>>>> + if (SEQ_LE(conn->seq_to_tap, seq))
>>> Isn't this test inverted? We want to rewind seq_to_tap if seq is less
>>> than it, rather than the other way around.
>> No. We do 'continue', i.e., nothing, if this condition is fulfilled.
>> This may look a little non-intuitive here, but makes sense when I add the
>> next patch.
> Oh, of course, my mistake.
>
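As a side note on the pointer-to-array type suggested above, here is a minimal sketch of how such a parameter behaves in isolation. The names sketch_payload_bytes(), SKETCH_NUM_IOVS and SKETCH_IOV_PAYLOAD are hypothetical stand-ins invented for this illustration, not identifiers from tcp.c, and the constant values are assumptions:

```c
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical constants for this sketch only; tcp.c has its own values */
#define SKETCH_NUM_IOVS		4
#define SKETCH_IOV_PAYLOAD	3

/*
 * 'frames' points to the first row of a two-dimensional array:
 * frames[i] is one queued frame, frames[i][j] is its j-th iovec.
 * No manual "index * SKETCH_NUM_IOVS" arithmetic is needed.
 */
static size_t sketch_payload_bytes(struct iovec (*frames)[SKETCH_NUM_IOVS],
				   int num_frames)
{
	size_t total = 0;
	int i;

	for (i = 0; i < num_frames; i++)
		total += frames[i][SKETCH_IOV_PAYLOAD].iov_len;

	return total;
}
```

With an array declared as struct iovec q[N][SKETCH_NUM_IOVS], passing &q[m] hands over the rows starting at frame m, so the callee can keep a single loop counter.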
The code now (v7) looks as follows:

/**
 * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission
 * @conns:	Array of connection pointers corresponding to queued frames
 * @frames:	Two-dimensional array containing queued frames with sub-iovs
 * @num_frames:	Number of entries in the two arrays to be compared
 */
static void tcp_revert_seq(struct tcp_tap_conn **conns,
			   struct iovec (*frames)[TCP_NUM_IOVS], int num_frames)
{
	int i;

	for (i = 0; i < num_frames; i++) {
		struct tcp_tap_conn *conn = conns[i];
		struct tcphdr *th = frames[i][TCP_IOV_PAYLOAD].iov_base;
		uint32_t seq = ntohl(th->seq);

		if (SEQ_LE(conn->seq_to_tap, seq))
			continue;

		conn->seq_to_tap = seq;
		tcp_set_peek_offset(conn->sock, seq - conn->seq_ack_from_tap);
	}
}

/**
 * tcp_payload_flush() - Send out buffers for segments with data
 * @c:		Execution context
 */
static void tcp_payload_flush(const struct ctx *c)
{
	size_t m;

	m = tap_send_frames(c, &tcp6_l2_iov[0][0], TCP_NUM_IOVS,
			    tcp6_payload_used);
	if (m != tcp6_payload_used) {
		tcp_revert_seq(tcp6_frame_conns, &tcp6_l2_iov[m],
			       tcp6_payload_used - m);
	}
	tcp6_payload_used = 0;

	m = tap_send_frames(c, &tcp4_l2_iov[0][0], TCP_NUM_IOVS,
			    tcp4_payload_used);
	if (m != tcp4_payload_used) {
		tcp_revert_seq(tcp4_frame_conns, &tcp4_l2_iov[m],
			       tcp4_payload_used - m);
	}
	tcp4_payload_used = 0;
}

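To make the intended behaviour of the revert loop easier to check in isolation, here is a reduced sketch with stand-in types. Everything prefixed with sketch_ is hypothetical and not part of tcp.c. The sequence numbers are passed in host order in a plain array instead of being read from the TCP header, the peek offset update is left out, and sketch_seq_le() is only assumed to behave like SEQ_LE for values less than half the sequence space apart. The sketch also assumes that both arrays are indexed from the first dropped frame:

```c
#include <stdint.h>

/* Stand-in for the one connection field the revert logic modifies */
struct sketch_conn {
	uint32_t seq_to_tap;
};

/* Wraparound-safe "a <= b" on 32-bit sequence numbers (assumed semantics) */
static int sketch_seq_le(uint32_t a, uint32_t b)
{
	return (uint32_t)(b - a) < 0x80000000u;
}

/*
 * For each dropped frame, rewind the owning connection's counter to the
 * frame's sequence number, unless the counter is already at or before it.
 * conns[i] and seqs[i] must describe the same dropped frame.
 */
static void sketch_revert_seq(struct sketch_conn **conns,
			      const uint32_t *seqs, int num_frames)
{
	int i;

	for (i = 0; i < num_frames; i++) {
		struct sketch_conn *conn = conns[i];

		if (sketch_seq_le(conn->seq_to_tap, seqs[i]))
			continue;

		conn->seq_to_tap = seqs[i];
	}
}
```

If one connection owns several dropped frames, the first of them rewinds the counter and the later ones hit the 'continue' branch, so the counter ends up at the earliest dropped sequence number.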
Was this the version you were talking about on Monday morning?
Did you spot some bug here which I am missing?
Thanks
///jon