From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTP id 5105D5A02C0 for ; Fri, 10 May 2024 21:40:46 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1715370045; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=E3itKL2/IHPreDTF3EdXbtpMJ4uFYQwFY0Hmt899BKY=; b=R25gTbR/ysRUM9XR5XHJ9Zngy3XlVmeHyhwlHhNmfrhwpfxozNOSjM8UM8YPTCgoVSOPua PkhEucR099/9ceXdQwwQeqLBPSSAGJg5bC5/wZJtYH2gQnI/Zadi0FPTg0uyIS5nQ0p4wr 7NqS2rxnit9qyaGC822O5aXJFg+1p4g= Received: from mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-688-b5BVnMM4Puitgkp9IMSWrA-1; Fri, 10 May 2024 15:40:43 -0400 X-MC-Unique: b5BVnMM4Puitgkp9IMSWrA-1 Received: by mail-qv1-f71.google.com with SMTP id 6a1803df08f44-69649f1894dso41738046d6.0 for ; Fri, 10 May 2024 12:40:43 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1715370043; x=1715974843; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=E3itKL2/IHPreDTF3EdXbtpMJ4uFYQwFY0Hmt899BKY=; b=QZkFd2gwuPhVH+s5wuo+uM61yqoc6o+OxweMbPIjKgYfLmDuRJU25L0Mgo2NLpxUAa bzYv5iN+cCK1huSnQdqGRwNU8X57EwtiRlcFCtra/oJ5pjderIgAoxySh1R3SWiBgBIF 6pGCPpx6ApvZPc4GiH9n+6K3WeB62QTM4AAfU2edzyAgAmHGcqTJPZXaJ5VzJSrqDPkc 2MoaQBU1zvhLPdzxy9VA3hjGPjBNY06RvNuFPUZl6uUTot92nsHmX7eIwuza7um8DdkY keNDQB7NyT2vWsx/GSLifc1KRPRK3cewcTvPvBW7KVWU3RLrTjv4pQFZPKy49ythh31X sUjA== X-Gm-Message-State: AOJu0Yx6G0aCbHlm/OQmvTYkQfAjBLdUqDB9rhmQkTeo6qGHW/lPnXl9 EapAljDetKabmXkvo6D8L8IiJAZbgCoCiKLFPgGo6MUdXt9hSCz7jDWa+Cai3SUvR29fr92jpR1 k4K3ApLRrVT1gjrNYjUskdCr5kp1bEfjttDUcHaL6bfHc3/OEkw== X-Received: by 2002:a05:6214:4908:b0:6a0:b905:97b1 with SMTP id 6a1803df08f44-6a16823f9b2mr46861376d6.51.1715370043292; Fri, 10 May 2024 12:40:43 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFrexf/W5N6fQnaULBO4i/ms33AbN5iUHosq8OwbuV93GdDB7usi0feCqc9dWExrT6itYIhQg== X-Received: by 2002:a05:6214:4908:b0:6a0:b905:97b1 with SMTP id 6a1803df08f44-6a16823f9b2mr46861086d6.51.1715370042811; Fri, 10 May 2024 12:40:42 -0700 (PDT) Received: from [10.0.0.97] ([24.225.234.80]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6a15f1797e0sm19852556d6.27.2024.05.10.12.40.42 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 10 May 2024 12:40:42 -0700 (PDT) Message-ID: <51413033-7a19-bf44-39da-980e826de896@redhat.com> Date: Fri, 10 May 2024 15:40:41 -0400 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.15.1 Subject: Re: [PATCH] tcp: move seq_to_tap update to when frame is queued To: Stefano Brivio References: <20240509030023.4153802-1-jmaloy@redhat.com> <20240510184030.44b57a2f@elisabeth> From: Jon Maloy In-Reply-To: <20240510184030.44b57a2f@elisabeth> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Message-ID-Hash: RWGIWZ5GC7DG7WJYJRWLJRWPIEHBL3DY X-Message-ID-Hash: RWGIWZ5GC7DG7WJYJRWLJRWPIEHBL3DY X-MailFrom: jmaloy@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On 2024-05-10 12:40, Stefano Brivio wrote: > On Wed, 8 May 2024 23:00:23 -0400 > Jon Maloy wrote: > >> commit a469fc393fa1 ("tcp, tap: Don't increase tap-side sequence counter for dropped frames") >> delayed update of conn->seq_to_tap until the moment the corresponding >> frame has been successfully pushed out. This has the advantage that we >> immediately can retransmit a buffer that we fail to trasnmit, rather >> than waiting for the peer side to discover the loss and initiate fast >> retransmit. > It's not really fast retransmit, it's a simple retry of the operation > that didn't succeed. We didn't even transmit. Ok >> This approach has turned out to cause a problem with spurious sequence >> number updates during peer-initiated retransmits, and we have realized >> it may not be the best way to solve te above issue. >> >> We now restore the previous method, by updating the said field at the >> moment a frame is added to the outqueue. To retain the advantage of fast >> retansmit > Same here. > >> based on local failure detection, we now scan through the part >> of the outqueue that had do be dropped, and restore the sequence counter >> for each affected connection to the most appropriate value. >> >> Signed-off-by: Jon Maloy >> --- >> tcp.c | 52 ++++++++++++++++++++++++++++++++++++++++++---------- >> 1 file changed, 42 insertions(+), 10 deletions(-) >> >> diff --git a/tcp.c b/tcp.c >> index 21d0af0..58fdbc9 100644 >> --- a/tcp.c >> +++ b/tcp.c >> @@ -412,11 +412,13 @@ static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE]; >> >> /** >> * tcp_buf_seq_update - Sequences to update with length of frames once sent > This is not the case anymore, maybe: > > * tcp_conn_old_seq() - Old sequence numbers for connections with pending frames ok >> - * @seq: Pointer to sequence number sent to tap-side, to be updated >> + * @conn: Pointer to connection corresponding to frame. May need update > Mixed whitespace and tabs. It looks like the connection pointer might > need to be updated... what about: > > * @conn: Pointer to connection for this frame > > ? > >> + * @seq: Sequence number of the corresponding frame >> * @len: TCP payload length > The length is not needed anymore. Yes. Of course ;-( >> */ >> struct tcp_buf_seq_update { >> - uint32_t *seq; >> + struct tcp_tap_conn *conn; >> + uint32_t seq; >> uint16_t len; >> }; >> >> @@ -1261,25 +1263,52 @@ static void tcp_flags_flush(const struct ctx *c) >> tcp4_flags_used = 0; >> } >> >> +/** >> + * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission >> + * @seq_update: Array with connection and sequence number data >> + * @s: Entry corresponding to first dropped frame >> + * @e: Entry corresponding to last dropped frame > These are not pointer to the entries, though. They are indices of the > queued frames. I had already fixed that. > >> + */ >> +static void tcp_revert_seq(struct tcp_buf_seq_update *seq_update, int s, int e) >> +{ >> + struct tcp_tap_conn *conn; >> + uint32_t lowest_seq; >> + int i, ii; >> + >> + for (i = s; i < e; i++) { >> + conn = seq_update[i].conn; >> + lowest_seq = seq_update[i].seq; >> + >> + for (ii = i + 1; ii < e; ii++) { >> + if (seq_update[ii].conn != conn) >> + continue; >> + if (SEQ_GT(lowest_seq, seq_update[ii].seq)) >> + lowest_seq = seq_update[ii].seq; >> + } > If I recall correctly, David suggested a simpler approach that avoids > this O(n^2) scan, based on the observation that 1. the first entry you > find in the table also has the lowest sequence number (we don't send > frames out-of-order), Not so sure about that. We can be in the middle of retransmit. Of course, if I continue to flush the queue just before retransmit, which I didn't intend to do, this will be true. > and that 2. you'll never revert to a higher > sequence number (the two lines below take care of that). > > That is, you could just scan the table once, and if you find a sequence > number that's lower than the current sequence stored for the connection, > store it. Yes, I can do that, and it will work even without flushing the queue. I missed that aspect of David's description. > >> + >> + if (SEQ_GT(conn->seq_to_tap, lowest_seq)) >> + conn->seq_to_tap = lowest_seq; >> + } >> +} >> + >> /** >> * tcp_payload_flush() - Send out buffers for segments with data >> * @c: Execution context >> */ >> static void tcp_payload_flush(const struct ctx *c) >> { >> - unsigned i; >> size_t m; >> >> m = tap_send_frames(c, &tcp6_l2_iov[0][0], TCP_NUM_IOVS, >> tcp6_payload_used); >> - for (i = 0; i < m; i++) >> - *tcp6_seq_update[i].seq += tcp6_seq_update[i].len; >> + if (m != tcp6_payload_used) >> + tcp_revert_seq(tcp6_seq_update, m, tcp6_payload_used); >> tcp6_payload_used = 0; >> >> m = tap_send_frames(c, &tcp4_l2_iov[0][0], TCP_NUM_IOVS, >> tcp4_payload_used); >> - for (i = 0; i < m; i++) >> - *tcp4_seq_update[i].seq += tcp4_seq_update[i].len; >> + if (m != tcp4_payload_used) >> + tcp_revert_seq(tcp4_seq_update, m, tcp4_payload_used); >> tcp4_payload_used = 0; >> } >> >> @@ -2129,10 +2158,11 @@ static int tcp_sock_consume(const struct tcp_tap_conn *conn, uint32_t ack_seq) >> static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn, >> ssize_t dlen, int no_csum, uint32_t seq) >> { >> - uint32_t *seq_update = &conn->seq_to_tap; >> struct iovec *iov; >> size_t l4len; >> >> + conn->seq_to_tap = seq; > This is the sequence number for the frame we're sending (start of this > frame), but not the current byte sequence sent to the "tap" (end of > this frame), which would be seq + dlen, I think. Already noticed during my testing and fixed. Strangely enough, it still worked well for a while :-( > >> + >> if (CONN_V4(conn)) { >> struct iovec *iov_prev = tcp4_l2_iov[tcp4_payload_used - 1]; >> const uint16_t *check = NULL; >> @@ -2142,7 +2172,8 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn, >> check = &iph->check; >> } >> >> - tcp4_seq_update[tcp4_payload_used].seq = seq_update; >> + tcp4_seq_update[tcp4_payload_used].conn = conn; >> + tcp4_seq_update[tcp4_payload_used].seq = seq; >> tcp4_seq_update[tcp4_payload_used].len = dlen; >> >> iov = tcp4_l2_iov[tcp4_payload_used++]; >> @@ -2151,7 +2182,8 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn, >> if (tcp4_payload_used > TCP_FRAMES_MEM - 1) >> tcp_payload_flush(c); >> } else if (CONN_V6(conn)) { >> - tcp6_seq_update[tcp6_payload_used].seq = seq_update; >> + tcp6_seq_update[tcp6_payload_used].conn = conn; >> + tcp6_seq_update[tcp6_payload_used].seq = seq; >> tcp6_seq_update[tcp6_payload_used].len = dlen; >> >> iov = tcp6_l2_iov[tcp6_payload_used++]; I will fix the loop and repost shortly. ///jon