From: Jon Maloy
To: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com, jmaloy@redhat.com
Subject: [PATCH v3 1/3] tcp: move seq_to_tap update to when frame is queued
Date: Sat, 11 May 2024 11:20:06 -0400
Message-ID: <20240511152008.421750-2-jmaloy@redhat.com>
In-Reply-To: <20240511152008.421750-1-jmaloy@redhat.com>
References: <20240511152008.421750-1-jmaloy@redhat.com>
List-Id: Development discussion and patches for passt

commit a469fc393fa1 ("tcp, tap: Don't increase tap-side sequence
counter for dropped frames") delayed the update of conn->seq_to_tap
until the moment the corresponding frame has been successfully pushed
out.

This has the advantage that we can immediately make a new attempt to
transmit a frame after a failed transmit, rather than waiting for the
peer to later discover a gap and trigger the fast retransmit mechanism
to solve the problem.

This approach has turned out to cause a problem with spurious sequence
number updates during peer-initiated retransmits, and we have realized
it may not be the best way to solve the above issue.

We now restore the previous method, by updating the said field at the
moment a frame is added to the outqueue. To retain the advantage of
having a quick re-attempt based on local failure detection, we now
scan through the part of the outqueue that had to be dropped, and
restore the sequence counter for each affected connection to the most
appropriate value.

Signed-off-by: Jon Maloy
---
v2:
- Re-spun loop in tcp_revert_seq() and some other changes based on
  feedback from Stefano Brivio.
- Added paranoid test to prevent seq_to_tap from becoming lower than
  seq_ack_from_tap.
---
 tcp.c | 63 +++++++++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 45 insertions(+), 18 deletions(-)

diff --git a/tcp.c b/tcp.c
index 21d0af0..21cbfba 100644
--- a/tcp.c
+++ b/tcp.c
@@ -411,13 +411,14 @@ static int tcp_sock_ns		[NUM_PORTS][IP_VERSIONS];
 static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
 
 /**
- * tcp_buf_seq_update - Sequences to update with length of frames once sent
- * @seq:	Pointer to sequence number sent to tap-side, to be updated
- * @len:	TCP payload length
+ * tcp_frame_ref - References needed by queued frames in case we need
+ *                 to revert corresponding connection sequence numbers
+ * @conn:	Pointer to connection for this frame
+ * @seq:	Sequence number of the corresponding frame
  */
-struct tcp_buf_seq_update {
-	uint32_t *seq;
-	uint16_t len;
+struct tcp_frame_ref {
+	struct tcp_tap_conn *conn;
+	uint32_t seq;
 };
 
 /* Static buffers */
@@ -461,7 +462,7 @@ static struct tcp_payload_t	tcp4_payload[TCP_FRAMES_MEM];
 
 static_assert(MSS4 <= sizeof(tcp4_payload[0].data), "MSS4 is greater than 65516");
 
-static struct tcp_buf_seq_update tcp4_seq_update[TCP_FRAMES_MEM];
+static struct tcp_frame_ref tcp4_frame_ref[TCP_FRAMES_MEM];
 static unsigned int tcp4_payload_used;
 
 static struct tap_hdr tcp4_flags_tap_hdr[TCP_FRAMES_MEM];
@@ -483,7 +484,7 @@ static struct tcp_payload_t	tcp6_payload[TCP_FRAMES_MEM];
 
 static_assert(MSS6 <= sizeof(tcp6_payload[0].data), "MSS6 is greater than 65516");
 
-static struct tcp_buf_seq_update tcp6_seq_update[TCP_FRAMES_MEM];
+static struct tcp_frame_ref tcp6_frame_ref[TCP_FRAMES_MEM];
 static unsigned int tcp6_payload_used;
 
 static struct tap_hdr tcp6_flags_tap_hdr[TCP_FRAMES_MEM];
@@ -1261,25 +1262,50 @@ static void tcp_flags_flush(const struct ctx *c)
 	tcp4_flags_used = 0;
 }
 
+/**
+ * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission
+ * @frame_ref:	Array with connection and sequence number data
+ * @first:	Index of entry corresponding to first dropped frame
+ * @last:	Index of entry corresponding to last dropped frame
+ */
+static void tcp_revert_seq(struct tcp_frame_ref *frame_ref, int first, int last)
+{
+	struct tcp_tap_conn *conn;
+	int i;
+
+	for (i = first; i <= last; i++) {
+		conn = frame_ref[i].conn;
+
+		if (SEQ_LE(conn->seq_to_tap, frame_ref[i].seq))
+			continue;
+
+		conn->seq_to_tap = frame_ref[i].seq;
+
+		if (SEQ_GE(conn->seq_to_tap, conn->seq_ack_from_tap))
+			continue;
+
+		conn->seq_to_tap = conn->seq_ack_from_tap;
+	}
+}
+
 /**
  * tcp_payload_flush() - Send out buffers for segments with data
  * @c:		Execution context
  */
 static void tcp_payload_flush(const struct ctx *c)
 {
-	unsigned i;
 	size_t m;
 
 	m = tap_send_frames(c, &tcp6_l2_iov[0][0], TCP_NUM_IOVS,
 			    tcp6_payload_used);
-	for (i = 0; i < m; i++)
-		*tcp6_seq_update[i].seq += tcp6_seq_update[i].len;
+	if (m != tcp6_payload_used)
+		tcp_revert_seq(tcp6_frame_ref, m, tcp6_payload_used - 1);
 	tcp6_payload_used = 0;
 
 	m = tap_send_frames(c, &tcp4_l2_iov[0][0], TCP_NUM_IOVS,
 			    tcp4_payload_used);
-	for (i = 0; i < m; i++)
-		*tcp4_seq_update[i].seq += tcp4_seq_update[i].len;
+	if (m != tcp4_payload_used)
+		tcp_revert_seq(tcp4_frame_ref, m, tcp4_payload_used - 1);
 	tcp4_payload_used = 0;
 }
 
@@ -2129,10 +2155,11 @@ static int tcp_sock_consume(const struct tcp_tap_conn *conn, uint32_t ack_seq)
 static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 			    ssize_t dlen, int no_csum, uint32_t seq)
 {
-	uint32_t *seq_update = &conn->seq_to_tap;
 	struct iovec *iov;
 	size_t l4len;
 
+	conn->seq_to_tap = seq + dlen;
+
 	if (CONN_V4(conn)) {
 		struct iovec *iov_prev = tcp4_l2_iov[tcp4_payload_used - 1];
 		const uint16_t *check = NULL;
@@ -2142,8 +2169,8 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 			check = &iph->check;
 		}
 
-		tcp4_seq_update[tcp4_payload_used].seq = seq_update;
-		tcp4_seq_update[tcp4_payload_used].len = dlen;
+		tcp4_frame_ref[tcp4_payload_used].conn = conn;
+		tcp4_frame_ref[tcp4_payload_used].seq = seq;
 
 		iov = tcp4_l2_iov[tcp4_payload_used++];
 		l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, check, seq);
@@ -2151,8 +2178,8 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 		if (tcp4_payload_used > TCP_FRAMES_MEM - 1)
 			tcp_payload_flush(c);
 	} else if (CONN_V6(conn)) {
-		tcp6_seq_update[tcp6_payload_used].seq = seq_update;
-		tcp6_seq_update[tcp6_payload_used].len = dlen;
+		tcp6_frame_ref[tcp6_payload_used].conn = conn;
+		tcp6_frame_ref[tcp6_payload_used].seq = seq;
 
 		iov = tcp6_l2_iov[tcp6_payload_used++];
 		l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, NULL, seq);
-- 
2.42.0
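
For anyone who wants to try the revert logic outside the tree, here is a
minimal standalone sketch of what tcp_revert_seq() does to a connection
when the tail of the outqueue is dropped. It is not passt code:
demo_conn, demo_frame_ref, demo_revert_seq(), seq_le() and seq_ge() are
hypothetical stand-ins, and the two comparison helpers assume plain
32-bit serial number arithmetic (signed difference) rather than passt's
own SEQ_LE()/SEQ_GE() macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed wrap-around comparisons, not passt's SEQ_LE()/SEQ_GE() macros */
static int seq_le(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }
static int seq_ge(uint32_t a, uint32_t b) { return (int32_t)(a - b) >= 0; }

/* Hypothetical, reduced stand-in for struct tcp_tap_conn */
struct demo_conn {
	uint32_t seq_to_tap;		/* next sequence to send to tap */
	uint32_t seq_ack_from_tap;	/* last sequence acked by the peer */
};

/* Hypothetical stand-in for struct tcp_frame_ref */
struct demo_frame_ref {
	struct demo_conn *conn;
	uint32_t seq;			/* sequence of the queued frame */
};

/* Roll seq_to_tap back for every connection with frames in [first, last] */
static void demo_revert_seq(struct demo_frame_ref *ref, int first, int last)
{
	int i;

	assert(first >= 0);

	for (i = first; i <= last; i++) {
		struct demo_conn *conn = ref[i].conn;

		/* Already at or before this frame: nothing to revert */
		if (seq_le(conn->seq_to_tap, ref[i].seq))
			continue;

		conn->seq_to_tap = ref[i].seq;

		/* Never fall behind what the peer has already acked */
		if (seq_ge(conn->seq_to_tap, conn->seq_ack_from_tap))
			continue;

		conn->seq_to_tap = conn->seq_ack_from_tap;
	}
}
```

With three 1000-byte frames of one connection queued at sequences 1000,
2000 and 3000, and only the first one sent, demo_revert_seq(ref, 1, 2)
leaves seq_to_tap at 2000: the earliest dropped frame decides the value,
and later entries for the same connection are skipped.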