From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 10 May 2024 18:40:30 +0200
From: Stefano Brivio
To: Jon Maloy
CC: passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com
Subject: Re: [PATCH] tcp: move seq_to_tap update to when frame is queued
Message-ID: <20240510184030.44b57a2f@elisabeth>
In-Reply-To: <20240509030023.4153802-1-jmaloy@redhat.com>
References: <20240509030023.4153802-1-jmaloy@redhat.com>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Wed, 8 May 2024 23:00:23 -0400
Jon Maloy wrote:

> commit a469fc393fa1 ("tcp, tap: Don't increase tap-side sequence counter for dropped frames")
> delayed update of conn->seq_to_tap until the moment the corresponding
> frame has been successfully pushed out. This has the advantage that we
> immediately can retransmit a buffer that we fail to transmit, rather
> than waiting for the peer side to discover the loss and initiate fast
> retransmit.

It's not really fast retransmit, it's a simple retry of the operation
that didn't succeed. We didn't even transmit.

> 
> This approach has turned out to cause a problem with spurious sequence
> number updates during peer-initiated retransmits, and we have realized
> it may not be the best way to solve the above issue.
> 
> We now restore the previous method, by updating the said field at the
> moment a frame is added to the outqueue. To retain the advantage of fast
> retransmit

Same here.

> based on local failure detection, we now scan through the part
> of the outqueue that had to be dropped, and restore the sequence counter
> for each affected connection to the most appropriate value.
> 
> Signed-off-by: Jon Maloy
> ---
>  tcp.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 42 insertions(+), 10 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index 21d0af0..58fdbc9 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -412,11 +412,13 @@ static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
> 
>  /**
>   * tcp_buf_seq_update - Sequences to update with length of frames once sent

This is not the case anymore, maybe:

 * tcp_conn_old_seq() - Old sequence numbers for connections with pending frames

> - * @seq:	Pointer to sequence number sent to tap-side, to be updated
> + * @conn:	Pointer to connection corresponding to frame. May need update

Mixed spaces and tabs.
It looks like the connection pointer might need to be updated... what
about:

 * @conn:	Pointer to connection for this frame

?

> + * @seq:	Sequence number of the corresponding frame
>   * @len:	TCP payload length

The length is not needed anymore.

>   */
>  struct tcp_buf_seq_update {
> -	uint32_t *seq;
> +	struct tcp_tap_conn *conn;
> +	uint32_t seq;
>  	uint16_t len;
>  };
> 
> @@ -1261,25 +1263,52 @@ static void tcp_flags_flush(const struct ctx *c)
>  	tcp4_flags_used = 0;
>  }
> 
> +/**
> + * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission
> + * @seq_update:	Array with connection and sequence number data
> + * @s:		Entry corresponding to first dropped frame
> + * @e:		Entry corresponding to last dropped frame

These are not pointers to the entries, though. They are indices of the
queued frames.

> + */
> +static void tcp_revert_seq(struct tcp_buf_seq_update *seq_update, int s, int e)
> +{
> +	struct tcp_tap_conn *conn;
> +	uint32_t lowest_seq;
> +	int i, ii;
> +
> +	for (i = s; i < e; i++) {
> +		conn = seq_update[i].conn;
> +		lowest_seq = seq_update[i].seq;
> +
> +		for (ii = i + 1; ii < e; ii++) {
> +			if (seq_update[ii].conn != conn)
> +				continue;
> +			if (SEQ_GT(lowest_seq, seq_update[ii].seq))
> +				lowest_seq = seq_update[ii].seq;
> +		}

If I recall correctly, David suggested a simpler approach that avoids
this O(n^2) scan, based on the observation that:

1. the first entry you find in the table also has the lowest sequence
   number (we don't send frames out-of-order), and that

2. you'll never revert to a higher sequence number (the two lines below
   take care of that).

That is, you could just scan the table once, and if you find a sequence
number that's lower than the current sequence stored for the
connection, store it.
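
For illustration, the single-pass idea could look something like the
sketch below. It is not checked against the real tree: the structure
definitions are trimmed-down stand-ins, SEQ_GT() here is a simplified
wrap-safe comparison rather than passt's own macro, and @e is assumed
to be the index one past the last dropped frame, as in the calls from
tcp_payload_flush().

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for passt's SEQ_GT(): true if a is after b,
 * taking 32-bit sequence wraparound into account.
 */
#define SEQ_GT(a, b)	((int32_t)((uint32_t)(a) - (uint32_t)(b)) > 0)

/* Hypothetical, trimmed-down stand-ins for the real structures */
struct tcp_tap_conn {
	uint32_t seq_to_tap;
};

struct tcp_buf_seq_update {
	struct tcp_tap_conn *conn;
	uint32_t seq;
};

/**
 * tcp_revert_seq() - Single-pass revert of seq_to_tap for dropped frames
 * @seq_update:	Array with connection and sequence number data
 * @s:		Index of first dropped frame
 * @e:		Index of last dropped frame, plus one (assumption)
 */
static void tcp_revert_seq(struct tcp_buf_seq_update *seq_update, int s, int e)
{
	int i;

	assert(s <= e);

	for (i = s; i < e; i++) {
		struct tcp_tap_conn *conn = seq_update[i].conn;

		/* Frames are queued in order, so the first dropped entry for
		 * a connection carries its lowest sequence: later entries for
		 * the same connection fail this check and change nothing.
		 */
		if (SEQ_GT(conn->seq_to_tap, seq_update[i].seq))
			conn->seq_to_tap = seq_update[i].seq;
	}
}
```

With this shape, each dropped entry is visited once, and a connection's
counter can only move backwards, never forwards.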
> +
> +		if (SEQ_GT(conn->seq_to_tap, lowest_seq))
> +			conn->seq_to_tap = lowest_seq;
> +	}
> +}
> +
>  /**
>   * tcp_payload_flush() - Send out buffers for segments with data
>   * @c:		Execution context
>   */
>  static void tcp_payload_flush(const struct ctx *c)
>  {
> -	unsigned i;
>  	size_t m;
> 
>  	m = tap_send_frames(c, &tcp6_l2_iov[0][0], TCP_NUM_IOVS,
>  			    tcp6_payload_used);
> -	for (i = 0; i < m; i++)
> -		*tcp6_seq_update[i].seq += tcp6_seq_update[i].len;
> +	if (m != tcp6_payload_used)
> +		tcp_revert_seq(tcp6_seq_update, m, tcp6_payload_used);
>  	tcp6_payload_used = 0;
> 
>  	m = tap_send_frames(c, &tcp4_l2_iov[0][0], TCP_NUM_IOVS,
>  			    tcp4_payload_used);
> -	for (i = 0; i < m; i++)
> -		*tcp4_seq_update[i].seq += tcp4_seq_update[i].len;
> +	if (m != tcp4_payload_used)
> +		tcp_revert_seq(tcp4_seq_update, m, tcp4_payload_used);
>  	tcp4_payload_used = 0;
>  }
> 
> @@ -2129,10 +2158,11 @@ static int tcp_sock_consume(const struct tcp_tap_conn *conn, uint32_t ack_seq)
>  static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  			    ssize_t dlen, int no_csum, uint32_t seq)
>  {
> -	uint32_t *seq_update = &conn->seq_to_tap;
>  	struct iovec *iov;
>  	size_t l4len;
> 
> +	conn->seq_to_tap = seq;

This is the sequence number for the frame we're sending (start of this
frame), but not the current byte sequence sent to the "tap" (end of
this frame), which would be seq + dlen, I think.
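
To make the distinction concrete, here is a minimal sketch. It assumes
a hypothetical helper, tcp_seq_queued(), which is not part of the
patch, and a trimmed-down stand-in for struct tcp_tap_conn:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the real connection structure */
struct tcp_tap_conn {
	uint32_t seq_to_tap;
};

/**
 * tcp_seq_queued() - Record that a data frame was queued (hypothetical)
 * @conn:	Connection the frame belongs to
 * @seq:	Sequence number of the first payload byte in the frame
 * @dlen:	TCP payload length of the frame
 */
static void tcp_seq_queued(struct tcp_tap_conn *conn, uint32_t seq,
			   uint16_t dlen)
{
	assert(conn);

	/* seq is the start of the frame: the counter should point past its
	 * last byte. Unsigned arithmetic wraps modulo 2^32, as sequence
	 * numbers do.
	 */
	conn->seq_to_tap = seq + dlen;
}
```

For a frame starting at sequence 1000 with 1460 bytes of payload, this
leaves seq_to_tap at 2460, whereas assigning seq alone would leave it
at 1000.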
> +
>  	if (CONN_V4(conn)) {
>  		struct iovec *iov_prev = tcp4_l2_iov[tcp4_payload_used - 1];
>  		const uint16_t *check = NULL;
> @@ -2142,7 +2172,8 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  			check = &iph->check;
>  		}
> 
> -		tcp4_seq_update[tcp4_payload_used].seq = seq_update;
> +		tcp4_seq_update[tcp4_payload_used].conn = conn;
> +		tcp4_seq_update[tcp4_payload_used].seq = seq;
>  		tcp4_seq_update[tcp4_payload_used].len = dlen;
> 
>  		iov = tcp4_l2_iov[tcp4_payload_used++];
> @@ -2151,7 +2182,8 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  		if (tcp4_payload_used > TCP_FRAMES_MEM - 1)
>  			tcp_payload_flush(c);
>  	} else if (CONN_V6(conn)) {
> -		tcp6_seq_update[tcp6_payload_used].seq = seq_update;
> +		tcp6_seq_update[tcp6_payload_used].conn = conn;
> +		tcp6_seq_update[tcp6_payload_used].seq = seq;
>  		tcp6_seq_update[tcp6_payload_used].len = dlen;
> 
>  		iov = tcp6_l2_iov[tcp6_payload_used++];

-- 
Stefano