From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=dJKVOiD9; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id BB90D5A0271 for ; Mon, 05 May 2025 21:55:25 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1746474924; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OaU7xpouo1jQ47lhM8DuSlNTHq/39/A+ojyW7Q4qnBk=; b=dJKVOiD9t2UyZwoaq90RfEsnOFnYmRqs2k4TUv1sRKLKGEEnhGf2DZTBu9aSto6st7uBiY PuwK3ZjQfXmJeLrLQ0SrsARadm4M/bfU6oMutnZylc3wZd2FZJWbW1+Ynsf2TeMRX0g0xM 1Lk2x0u5eYiUsDwAwnEMjkEE8EZID2w= Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com [209.85.221.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-470-gXY9prb4PuqJ2_qp1P6rew-1; Mon, 05 May 2025 15:55:23 -0400 X-MC-Unique: gXY9prb4PuqJ2_qp1P6rew-1 X-Mimecast-MFC-AGG-ID: gXY9prb4PuqJ2_qp1P6rew_1746474922 Received: by mail-wr1-f70.google.com with SMTP id ffacd0b85a97d-39c1b1c0969so2892369f8f.1 for ; Mon, 05 May 2025 12:55:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1746474922; x=1747079722; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=OaU7xpouo1jQ47lhM8DuSlNTHq/39/A+ojyW7Q4qnBk=; b=V+2okzapL91FeD5cQ08H5khixR1sRF3gPJKC1CTJrb/igG3tK61+silO7lbxGieEam qAHmCGKjJStkiFt7g+r1tVCm+2RPq38KMirJRM5sRbMBmKJjPx+6+443hgXHRHrBbWGt 9Zbnaa6urs3iXAeYzbH6Rls9FOBnMxORWnCecauIPHuLiD+khZN8i+6QvNQn0dei78rC y5poqcj4Ax50NM34FH/bxzKXRzdC7VUuCrN1wrsT6n8phIK0MhL+ArvK9I0aZaSApDFO FT5OJs0usSjInqBi4pEelUETtV74zEc4r8Epwt45x0H7LzdXuJPNXJDuuRl0pqtpF1Q4 jj0g== X-Gm-Message-State: AOJu0YyPCqL6CwYLzmdFC2lbIMhyStsx0MNrFYBYvYB586nB1w2TVebo A0dINSGW7CDroaN+B6q/aIUQNt3PbA8bsmEPdXv+ALjQlEjPRFtCDj3v4eOAwEhpKZmuz1SfbRQ Q4YTtVZFbXInpHoQHZ7gqNdmRPyEMwIRl/tjQk/fei35WNb6Ad/Dc5b08Vw== X-Gm-Gg: ASbGncsHYyjQDiNM+Q07rV0a0o6yTRXx/yQ3c1eGWG4478YfhoajV0seCUQaYKzjhom NB8dQnHfThNRVzx1PnI2x67WyVXURHBo4Fbd1Rjntqdv56XejQOV7LVx/vBQL/NekmYk2s4kRPd Ln9LceMbkd1ygcvwM6pR5H95RX8pJYim6mvAEGMXqMKgu3xaVxscz7cCt7m47uKjgYC8zw3wyiK 4MHyBPxfo4z5+jXdOZPiqee7nS6nLrfFRIHxdJ0fqgnJvgO0EQePsk35uLQZBHaZYkKaODxU6GZ YjuFMqQLqzwuUG4Rc8TOdDY= X-Received: by 2002:a5d:64ce:0:b0:3a0:9dfc:da4 with SMTP id ffacd0b85a97d-3a0ac1ff692mr167782f8f.42.1746474921807; Mon, 05 May 2025 12:55:21 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFffxex5TVjnesEE/J74XAU4De6aY7Gu7UeyLc6DFpFk6ZvxWU02AIJv6J3lj10xQBRnnrX/A== X-Received: by 2002:a5d:64ce:0:b0:3a0:9dfc:da4 with SMTP id ffacd0b85a97d-3a0ac1ff692mr167771f8f.42.1746474921396; Mon, 05 May 2025 12:55:21 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-3a099ae7cc6sm11541825f8f.55.2025.05.05.12.55.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 May 2025 12:55:21 -0700 (PDT) Date: Mon, 5 May 2025 21:55:19 +0200 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 04/12] udp: Don't bother to batch datagrams from "listening" socket Message-ID: <20250505215519.6d745232@elisabeth> In-Reply-To: <20250404101542.3729316-5-david@gibson.dropbear.id.au> References: <20250404101542.3729316-1-david@gibson.dropbear.id.au> <20250404101542.3729316-5-david@gibson.dropbear.id.au> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: Nv__ShfFH3seSYSXwzC5haf_fFPW6CzNSY8XH5n0KJg_1746474922 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: DKL4SYECFO2VWEQKFNKEM5DIBWJ2H6TR X-Message-ID-Hash: DKL4SYECFO2VWEQKFNKEM5DIBWJ2H6TR X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Fri, 4 Apr 2025 21:15:34 +1100 David Gibson wrote: > A "listening" UDP socket can receive datagrams from multiple flows. So, > we currently have some quite subtle and complex code in > udp_buf_listen_sock_data() to group contiguously received packets for the > same flow into batches for forwarding. > > However, since we are now always using flow specific connect()ed sockets > once a flow is established, handling of datagrams on listening sockets is > essentially a slow path. Given that, it's not worth the complexity. > Substantially simplify the code by using an approach more like vhost-user, > and "peeking" at the address of the next datagram, one at a time to > determine the correct flow before we actually receive the data, > > This removes all meaningful use of the s_in and tosidx fields in > udp_meta_t, so they too can be removed, along with setting of msg_name and > msg_namelen in the msghdr arrays which referenced them. > > Signed-off-by: David Gibson > --- > udp.c | 81 ++++++++++++++++------------------------------------------- > 1 file changed, 22 insertions(+), 59 deletions(-) > > diff --git a/udp.c b/udp.c > index 6b72c30f..4444d762 100644 > --- a/udp.c > +++ b/udp.c > @@ -138,20 +138,15 @@ static struct ethhdr udp4_eth_hdr; > static struct ethhdr udp6_eth_hdr; > > /** > - * struct udp_meta_t - Pre-cooked headers and metadata for UDP packets > + * struct udp_meta_t - Pre-cooked headers for UDP packets > * @ip6h: Pre-filled IPv6 header (except for payload_len and addresses) > * @ip4h: Pre-filled IPv4 header (except for tot_len and saddr) > * @taph: Tap backend specific header > - * @s_in: Source socket address, filled in by recvmmsg() > - * @tosidx: sidx for the destination side of this datagram's flow > */ > static struct udp_meta_t { > struct ipv6hdr ip6h; > struct iphdr ip4h; > struct tap_hdr taph; > - > - union sockaddr_inany s_in; > - flow_sidx_t tosidx; > } > #ifdef __AVX2__ > __attribute__ ((aligned(32))) > @@ -234,8 +229,6 @@ static void udp_iov_init_one(const struct ctx *c, size_t i) > tiov[UDP_IOV_TAP] = tap_hdr_iov(c, &meta->taph); > tiov[UDP_IOV_PAYLOAD].iov_base = payload; > > - mh->msg_name = &meta->s_in; > - mh->msg_namelen = sizeof(meta->s_in); > mh->msg_iov = siov; > mh->msg_iovlen = 1; > } > @@ -687,60 +680,32 @@ static int udp_sock_recv(const struct ctx *c, int s, struct mmsghdr *mmh, int n) > static void udp_buf_listen_sock_data(const struct ctx *c, union epoll_ref ref, > const struct timespec *now) > { > - const socklen_t sasize = sizeof(udp_meta[0].s_in); > - /* See udp_buf_sock_data() comment */ > - int n = (c->mode == MODE_PASTA ? 1 : UDP_MAX_FRAMES), i; > - > - if ((n = udp_sock_recv(c, ref.fd, udp_mh_recv, n)) <= 0) > - return; > - > - /* We divide datagrams into batches based on how we need to send them, > - * determined by udp_meta[i].tosidx. To avoid either two passes through > - * the array, or recalculating tosidx for a single entry, we have to > - * populate it one entry *ahead* of the loop counter. > - */ > - udp_meta[0].tosidx = udp_flow_from_sock(c, ref, &udp_meta[0].s_in, now); > - udp_mh_recv[0].msg_hdr.msg_namelen = sasize; > - for (i = 0; i < n; ) { > - flow_sidx_t batchsidx = udp_meta[i].tosidx; > - uint8_t batchpif = pif_at_sidx(batchsidx); > - int batchstart = i; > - > - do { > - if (pif_is_socket(batchpif)) { > - udp_splice_prepare(udp_mh_recv, i); > - } else if (batchpif == PIF_TAP) { > - udp_tap_prepare(udp_mh_recv, i, > - flowside_at_sidx(batchsidx), > - false); > - } > - > - if (++i >= n) > - break; > - > - udp_meta[i].tosidx = udp_flow_from_sock(c, ref, > - &udp_meta[i].s_in, > - now); > - udp_mh_recv[i].msg_hdr.msg_namelen = sasize; > - } while (flow_sidx_eq(udp_meta[i].tosidx, batchsidx)); > - > - if (pif_is_socket(batchpif)) { > - udp_splice_send(c, batchstart, i - batchstart, > - batchsidx); > - } else if (batchpif == PIF_TAP) { > - tap_send_frames(c, &udp_l2_iov[batchstart][0], > - UDP_NUM_IOVS, i - batchstart); > - } else if (flow_sidx_valid(batchsidx)) { > - flow_sidx_t fromsidx = flow_sidx_opposite(batchsidx); > - struct udp_flow *uflow = udp_at_sidx(batchsidx); > + union sockaddr_inany src; > + > + while (udp_peek_addr(ref.fd, &src) == 0) { > + flow_sidx_t tosidx = udp_flow_from_sock(c, ref, &src, now); > + uint8_t topif = pif_at_sidx(tosidx); > + > + if (udp_sock_recv(c, ref.fd, udp_mh_recv, 1) <= 0) > + break; > + > + if (pif_is_socket(topif)) { > + udp_splice_prepare(udp_mh_recv, 0); > + udp_splice_send(c, 0, 1, tosidx); > + } else if (topif == PIF_TAP) { > + udp_tap_prepare(udp_mh_recv, 0, flowside_at_sidx(tosidx), > + false); > + tap_send_frames(c, &udp_l2_iov[0][0], UDP_NUM_IOVS, 1); > + } else if (flow_sidx_valid(tosidx)) { > + flow_sidx_t fromsidx = flow_sidx_opposite(tosidx); > + struct udp_flow *uflow = udp_at_sidx(tosidx); For the record: https://github.com/containers/podman/issues/26073 *might* come from this as it adds a udp_at_sidx() usage that I'm not quite sure about. Just a vague suspicion at the moment... -- Stefano