public inbox for passt-dev@passt.top
From: Jon Maloy <jmaloy@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com,
	dgibson@redhat.com
Subject: Re: [PATCH v4 1/3] tcp: move seq_to_tap update to when frame is queued
Date: Tue, 4 Jun 2024 14:04:20 -0400
Message-ID: <c0a70f56-9e0f-4e4c-bd64-128ec6c208ef@redhat.com>
In-Reply-To: <26391ddb-bea0-4c7a-abab-823c19897133@redhat.com>

Hi David,
Found it (the bug, not your missing comment), and fixing it solved the problem.
I'll post this patch separately shortly.

///jon


On 2024-06-04 13:36, Jon Maloy wrote:
> Hi David,
> This is the last comment I received from you regarding this patch.
> See below for further comment.
>
> On 2024-05-16 00:16, David Gibson wrote:
>> On Wed, May 15, 2024 at 10:57:06PM -0400, Jon Maloy wrote:
>>>
>>> On 2024-05-15 22:24, David Gibson wrote:
>>>> On Wed, May 15, 2024 at 11:34:27AM -0400, Jon Maloy wrote:
>>>>> commit a469fc393fa1 ("tcp, tap: Don't increase tap-side sequence counter for dropped frames")
>>>>> delayed update of conn->seq_to_tap until the moment the corresponding
>>>>> frame has been successfully pushed out. This has the advantage that we
>>>>> can immediately make a new attempt to transmit a frame after a failed
>>>>> transmit, rather than waiting for the peer to later discover a gap and
>>>>> trigger the fast retransmit mechanism to solve the problem.
>>>>>
>>>>> This approach has turned out to cause a problem with spurious sequence
>>>>> number updates during peer-initiated retransmits, and we have realized
>>>>> it may not be the best way to solve the above issue.
>>>>>
>>>>> We now restore the previous method, by updating the said field at the
>>>>> moment a frame is added to the outqueue. To retain the advantage of
>>>>> having a quick re-attempt based on local failure detection, we now scan
>>>>> through the part of the outqueue that had to be dropped, and restore the
>>>>> sequence counter for each affected connection to the most appropriate
>>>>> value.
>>>>>
>>>>> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
>>>>>
>>>>> ---
>>>>> v2: - Re-spun loop in tcp_revert_seq() and some other changes based on
>>>>>         feedback from Stefano Brivio.
>>>>>       - Added paranoid test to avoid that seq_to_tap becomes lower than
>>>>>         seq_ack_from_tap.
>>>>>
>>>>> v3: - Identical to v2. Called v3 because it was embedded in a series
>>>>>         with that version.
>>>>>
>>>>> v4: - In tcp_revert_seq(), we read the sequence number from the TCP
>>>>>         header instead of keeping a copy in struct tcp_buf_seq_update.
>>>>>       - Since the only remaining field in struct tcp_buf_seq_update is
>>>>>         a pointer to struct tcp_tap_conn, we eliminate the struct
>>>>>         altogether, and make the tcp6/tcp4_seq_update arrays into
>>>>>         arrays of said pointer.
>>>>>       - Removed 'paranoid' test in tcp_revert_seq. If it happens, it
>>>>>         is not fatal, and will be caught by other code anyway.
>>>>>       - Separated from the series again.
>>>>> ---
>>>>>    tcp.c | 59 +++++++++++++++++++++++++++++++++++++----------------------
>>>>>    1 file changed, 37 insertions(+), 22 deletions(-)
>>>>>
>>>>> diff --git a/tcp.c b/tcp.c
>>>>> index 21d0af0..976dba8 100644
>>>>> --- a/tcp.c
>>>>> +++ b/tcp.c
>>>>> @@ -410,16 +410,6 @@ static int tcp_sock_ns [NUM_PORTS][IP_VERSIONS];
>>>>>     */
>>>>>    static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
>>>>> -/**
>>>>> - * tcp_buf_seq_update - Sequences to update with length of frames once sent
>>>>> - * @seq:    Pointer to sequence number sent to tap-side, to be updated
>>>>> - * @len:    TCP payload length
>>>>> - */
>>>>> -struct tcp_buf_seq_update {
>>>>> -    uint32_t *seq;
>>>>> -    uint16_t len;
>>>>> -};
>>>>> -
>>>>>    /* Static buffers */
>>>>>    /**
>>>>>     * struct tcp_payload_t - TCP header and data to send segments with payload
>>>>> @@ -461,7 +451,8 @@ static struct tcp_payload_t tcp4_payload[TCP_FRAMES_MEM];
>>>>>    static_assert(MSS4 <= sizeof(tcp4_payload[0].data), "MSS4 is greater than 65516");
>>>>> -static struct tcp_buf_seq_update tcp4_seq_update[TCP_FRAMES_MEM];
>>>>> +/* References tracking the owner connection of frames in the tap outqueue */
>>>>> +static struct tcp_tap_conn *tcp4_frame_conns[TCP_FRAMES_MEM];
>>>>>    static unsigned int tcp4_payload_used;
>>>>>    static struct tap_hdr tcp4_flags_tap_hdr[TCP_FRAMES_MEM];
>>>>> @@ -483,7 +474,8 @@ static struct tcp_payload_t tcp6_payload[TCP_FRAMES_MEM];
>>>>>    static_assert(MSS6 <= sizeof(tcp6_payload[0].data), "MSS6 is greater than 65516");
>>>>> -static struct tcp_buf_seq_update tcp6_seq_update[TCP_FRAMES_MEM];
>>>>> +/* References tracking the owner connection of frames in the tap outqueue */
>>>>> +static struct tcp_tap_conn *tcp6_frame_conns[TCP_FRAMES_MEM];
>>>>>    static unsigned int tcp6_payload_used;
>>>>>    static struct tap_hdr tcp6_flags_tap_hdr[TCP_FRAMES_MEM];
>>>>> @@ -1261,25 +1253,49 @@ static void tcp_flags_flush(const struct ctx *c)
>>>>>        tcp4_flags_used = 0;
>>>>>    }
>>>>> +/**
>>>>> + * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission
>>>>> + * @conns:       Array of connection pointers corresponding to queued frames
>>>>> + * @frames:      Two-dimensional array containing queued frames with sub-iovs
>>>> You can make the 2d array explicit in the type as:
>>>>     struct iovec (*frames)[TCP_NUM_IOVS];
>>>> See, for example the 'tap_iov' local in udp_tap_send().   (I recommend
>>>> the command line tool 'cdecl', also available online at cdecl.org for
>>>> working out confusing pointer-to-array types).
>>> Nice. I wasn't quite happy with this.
>>>>> + * @num_frames:  Number of entries in the two arrays to be compared
>>>>> + */
>>>>> +static void tcp_revert_seq(struct tcp_tap_conn **conns, struct iovec *frames,
>>>>> +               int num_frames)
>>>>> +{
>>>>> +    int c, f;
>>>>> +
>>>>> +    for (c = 0, f = 0; c < num_frames; c++, f += TCP_NUM_IOVS) {
>>>> Nit: I find having the two parallel counters kind of confusing.  It
>>>> naturally goes away with the type change suggested above, but even
>>>> without that I'd prefer an explicit multiply in the body.  I strongly
>>>> suspect the compiler will be better at working out if the strength
>>>> reduction is worth it.
>>>>
>>>>> +        struct tcp_tap_conn *conn = conns[c];
>>>>> +        struct tcphdr *th = frames[f + TCP_IOV_PAYLOAD].iov_base;
>>>>> +        uint32_t seq = ntohl(th->seq);
>>>>> +
>>>>> +        if (SEQ_LE(conn->seq_to_tap, seq))
>>>> Isn't this test inverted?  We want to rewind seq_to_tap if seq is less
>>>> than it, rather than the other way around.
>>> No. We do 'continue', i.e., nothing, if this condition is fulfilled.
>>> This may look a little non-intuitive here, but makes sense when I add the
>>> next patch.
>> Oh, of course, my mistake.
>>
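The exchange above turns on how the early 'continue' reads: the counter is only rewound when the frame's sequence number is strictly earlier than seq_to_tap. A minimal sketch of that check follows. The helpers seq_le() and revert_if_earlier() are hypothetical names for illustration and are not passt's SEQ_LE() macro; the comparison assumes the two values are less than 2^31 apart in 32-bit sequence space.

```c
#include <stdint.h>

/* Hypothetical helper (not passt's SEQ_LE): true if a is at or before b in
 * 32-bit sequence space. Assumes a and b are less than 2^31 apart, so the
 * unsigned difference stays in the lower half of the range even across
 * wraparound.
 */
static int seq_le(uint32_t a, uint32_t b)
{
	return (uint32_t)(b - a) < UINT32_C(0x80000000);
}

/* Hypothetical helper: rewind *seq_to_tap to seq only if seq is strictly
 * earlier. Returns 1 if the counter was rewound, 0 if left alone.
 */
static int revert_if_earlier(uint32_t *seq_to_tap, uint32_t seq)
{
	if (seq_le(*seq_to_tap, seq))
		return 0;	/* counter is not ahead of this frame */

	*seq_to_tap = seq;
	return 1;
}
```

With a counter at 5000, a dropped frame starting at 4000 rewinds it to 4000, while a later frame starting at 6000 leaves it untouched. That second case is the one the 'continue' branch covers.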
>
> The code now (v7) looks as follows:
>
> /**
>  * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission
>  * @conns:       Array of connection pointers corresponding to queued frames
>  * @frames:      Two-dimensional array containing queued frames with sub-iovs
>  * @num_frames:  Number of entries in the two arrays to be compared
>  */
> static void tcp_revert_seq(struct tcp_tap_conn **conns,
>                            struct iovec (*frames)[TCP_NUM_IOVS],
>                            int num_frames)
> {
>         int i;
>
>         for (i = 0; i < num_frames; i++) {
>                 struct tcp_tap_conn *conn = conns[i];
>                 struct tcphdr *th = frames[i][TCP_IOV_PAYLOAD].iov_base;
>                 uint32_t seq = ntohl(th->seq);
>
>                 if (SEQ_LE(conn->seq_to_tap, seq))
>                         continue;
>
>                 conn->seq_to_tap = seq;
>                 tcp_set_peek_offset(conn->sock,
>                                     seq - conn->seq_ack_from_tap);
>         }
> }
>
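The pointer-to-array parameter used by tcp_revert_seq() above lets the body index frames as frames[i][slot] without a second counter. A small self-contained sketch of the same type follows. NUM_IOVS, IOV_PAYLOAD and total_payload_len() are hypothetical stand-ins chosen for illustration, not passt's definitions.

```c
#include <stddef.h>
#include <sys/uio.h>

#define NUM_IOVS	4	/* hypothetical stand-in for TCP_NUM_IOVS */
#define IOV_PAYLOAD	3	/* hypothetical stand-in for TCP_IOV_PAYLOAD */

/* frames points at the first row of a two-dimensional array: frames[i] is
 * one frame made of NUM_IOVS iovecs, and frames + 1 advances by a whole row.
 */
static size_t total_payload_len(struct iovec (*frames)[NUM_IOVS],
				int num_frames)
{
	size_t sum = 0;
	int i;

	for (i = 0; i < num_frames; i++)
		sum += frames[i][IOV_PAYLOAD].iov_len;

	return sum;
}
```

Given 'struct iovec q[3][NUM_IOVS]', both q and &q[1] have the right type to pass as frames. The second form starts at the second frame, which is the same shape as the &tcp6_l2_iov[m] argument in the flush code.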
> /**
>  * tcp_payload_flush() - Send out buffers for segments with data
>  * @c:          Execution context
>  */
> static void tcp_payload_flush(const struct ctx *c)
> {
>         size_t m;
>
>         m = tap_send_frames(c, &tcp6_l2_iov[0][0], TCP_NUM_IOVS,
>                             tcp6_payload_used);
>         if (m != tcp6_payload_used) {
>                 tcp_revert_seq(tcp6_frame_conns, &tcp6_l2_iov[m],
>                                tcp6_payload_used - m);
>         }
>         tcp6_payload_used = 0;
>
>         m = tap_send_frames(c, &tcp4_l2_iov[0][0], TCP_NUM_IOVS,
>                             tcp4_payload_used);
>         if (m != tcp4_payload_used) {
>                 tcp_revert_seq(tcp4_frame_conns, &tcp4_l2_iov[m],
>                                tcp4_payload_used - m);
>         }
>         tcp4_payload_used = 0;
> }
>
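As a general illustration of the partial-send handling in tcp_payload_flush(): when only the first m of n queued frames go out, the unsent tail starts at index m in every array that is indexed per frame. The sketch below models that bookkeeping under stated assumptions. struct conn, struct frame, revert_unsent() and flush_done() are hypothetical names, not passt's, and the peek offset update is left out.

```c
#include <stddef.h>
#include <stdint.h>

struct conn  { uint32_t seq_to_tap; };	/* hypothetical, not struct tcp_tap_conn */
struct frame { uint32_t seq; };		/* hypothetical: first sequence number in frame */

/* Walk n unsent frames and rewind each owner connection's counter to the
 * earliest sequence number among its unsent frames. conns[i] must be the
 * owner of frames[i].
 */
static void revert_unsent(struct conn **conns, const struct frame *frames,
			  size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct conn *conn = conns[i];
		uint32_t seq = frames[i].seq;

		/* Counter already at or before this frame: nothing to undo */
		if ((uint32_t)(seq - conn->seq_to_tap) < UINT32_C(0x80000000))
			continue;

		conn->seq_to_tap = seq;
	}
}

/* Called after a send that delivered only the first 'sent' of 'queued'
 * frames. Both arrays are advanced by the same offset so that index i
 * refers to the same frame in each.
 */
static void flush_done(struct conn **conns, const struct frame *frames,
		       size_t queued, size_t sent)
{
	if (sent != queued)
		revert_unsent(conns + sent, frames + sent, queued - sent);
}
```

The design point in this sketch is that the two arrays stay in step: skipping the sent prefix in one array but not the other would pair each unsent frame with the wrong connection.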
> Was this the version you were talking about on Monday morning?
> Did you spot some bug here which I am missing?
>
> Thanks
> ///jon
>

