From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 669235A031A for ; Fri, 05 Jul 2024 12:44:24 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1720176251; bh=PD6uyFix3EpCFAdCbI8J5k/hm0jxRq3drfFVGcKm2JA=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=bNu1HrAgZ/UZMgqP903cB5VIpO8kBwoNQWntBGjs/WTWhIkSockstEcwVCeCyfX5a iYAbVxcO6mlslmetsL5oaDyjANUGewnkrgWLihftgquJUTAJgqYQPZPpbTTlQKdmOC sNNcI9nr/9ufXVsM3zBJpErpMCU6ETffPAt4K+raqJgQPHdF/QYxkpRTU4LWyjzKKe ZtI/5u3qm4xkZPksSTe/i3Ja1rZXMK0/3PQ+VUpEecZeVl1dz1bsPmxlpy+PQ2fsuz XJDeUz9NVptUgh8u3G99+FvduqdFnLK3ibFpHzpiABkdLERx/PAb07GbWvHE6J7/cR ZTc1ZGmZRW6yQ== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4WFqrW5vN0z4wbh; Fri, 5 Jul 2024 20:44:11 +1000 (AEST) Date: Fri, 5 Jul 2024 19:36:08 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 09/11] udp: Consolidate datagram batching Message-ID: References: <20240704045835.1149746-1-david@gibson.dropbear.id.au> <20240704045835.1149746-10-david@gibson.dropbear.id.au> <20240705111045.3ddfa560@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="j/IgTrjBVLQfXsLF" Content-Disposition: inline In-Reply-To: <20240705111045.3ddfa560@elisabeth> Message-ID-Hash: 3TKNLKHW246KNGGHEQG7QSFL6BAAZSB2 X-Message-ID-Hash: 3TKNLKHW246KNGGHEQG7QSFL6BAAZSB2 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --j/IgTrjBVLQfXsLF Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Jul 05, 2024 at 11:10:45AM +0200, Stefano Brivio wrote: > On Thu, 4 Jul 2024 14:58:33 +1000 > David Gibson wrote: >=20 > > When we receive datagrams on a socket, we need to split them into batch= es > > depending on how they need to be forwarded (either via a specific splice > > socket, or via tap). The logic to do this, is somewhat awkwardly split > > between udp_buf_sock_handler() itself, udp_splice_send() and > > udp_tap_send(). > >=20 > > Move all the batching logic into udp_buf_sock_handler(), leaving > > udp_splice_send() to just send the prepared batch. udp_tap_send() redu= ces > > to just a call to tap_send_frames() so open-code that call in > > udp_buf_sock_handler(). > >=20 > > This will allow separating the batching logic from the rest of the data= gram > > forwarding logic, which we'll need for upcoming flow table support. > >=20 > > Signed-off-by: David Gibson > > --- > > udp.c | 128 ++++++++++++++++++---------------------------------------- > > 1 file changed, 39 insertions(+), 89 deletions(-) > >=20 > > diff --git a/udp.c b/udp.c > > index 8ed59639..a317e986 100644 > > --- a/udp.c > > +++ b/udp.c > > @@ -501,42 +501,29 @@ static void udp_splice_prepare(struct mmsghdr *mm= h, unsigned idx) > > } > > =20 > > /** > > - * udp_splice_send() - Send datagrams from socket to socket > > + * udp_splice_send() - Send a batch of datagrams from socket to socket > > * @c: Execution context > > - * @start: Index of first datagram in udp[46]_l2_buf > > - * @n: Total number of datagrams in udp[46]_l2_buf pool > > - * @dst: Datagrams will be sent to this port (on destination side) > > + * @start: Index of batch's first datagram in udp[46]_l2_buf > > + * @n: Number of datagrams in batch > > + * @src: Source port for datagram (target side) > > + * @dst: Destination port for datagrams (target side) > > * @ref: epoll reference for origin socket > > * @now: Timestamp > > - * > > - * This consumes as many datagrams as are sendable via a single socket= =2E It > > - * requires that udp_meta[@start].splicesrc is initialised, and will i= nitialise > > - * udp_meta[].splicesrc for each datagram it consumes *and one more* (= if > > - * present). > > - * > > - * Return: Number of datagrams forwarded > > */ > > -static unsigned udp_splice_send(const struct ctx *c, size_t start, siz= e_t n, > > - in_port_t dst, union epoll_ref ref, > > - const struct timespec *now) > > +static void udp_splice_send(const struct ctx *c, size_t start, size_t = n, > > + in_port_t src, in_port_t dst, > > + union epoll_ref ref, > > + const struct timespec *now) > > { > > - in_port_t src =3D udp_meta[start].splicesrc; > > - struct mmsghdr *mmh_recv; > > - unsigned int i =3D start; > > int s; > > =20 > > - ASSERT(udp_meta[start].splicesrc >=3D 0); > > - ASSERT(ref.type =3D=3D EPOLL_TYPE_UDP); > > - > > if (ref.udp.v6) { > > - mmh_recv =3D udp6_mh_recv; > > udp_spliceto.sa6 =3D (struct sockaddr_in6) { > > .sin6_family =3D AF_INET6, > > .sin6_addr =3D in6addr_loopback, > > .sin6_port =3D htons(dst), > > }; > > } else { > > - mmh_recv =3D udp4_mh_recv; > > udp_spliceto.sa4 =3D (struct sockaddr_in) { > > .sin_family =3D AF_INET, > > .sin_addr =3D in4addr_loopback, > > @@ -544,15 +531,6 @@ static unsigned udp_splice_send(const struct ctx *= c, size_t start, size_t n, > > }; > > } > > =20 > > - do { > > - udp_splice_prepare(mmh_recv, i); > > - > > - if (++i >=3D n) > > - break; > > - > > - udp_meta[i].splicesrc =3D udp_mmh_splice_port(ref, &mmh_recv[i]); > > - } while (udp_meta[i].splicesrc =3D=3D src); > > - > > if (ref.udp.pif =3D=3D PIF_SPLICE) { > > src +=3D c->udp.fwd_in.rdelta[src]; > > s =3D udp_splice_init[ref.udp.v6][src].sock; > > @@ -560,7 +538,7 @@ static unsigned udp_splice_send(const struct ctx *c= , size_t start, size_t n, > > s =3D udp_splice_new(c, ref.udp.v6, src, false); > > =20 > > if (s < 0) > > - goto out; > > + return; > > =20 > > udp_splice_ns[ref.udp.v6][dst].ts =3D now->tv_sec; > > udp_splice_init[ref.udp.v6][src].ts =3D now->tv_sec; > > @@ -577,15 +555,13 @@ static unsigned udp_splice_send(const struct ctx = *c, size_t start, size_t n, > > s =3D arg.s; > > } > > if (s < 0) > > - goto out; > > + return; > > =20 > > udp_splice_init[ref.udp.v6][dst].ts =3D now->tv_sec; > > udp_splice_ns[ref.udp.v6][src].ts =3D now->tv_sec; > > } > > =20 > > - sendmmsg(s, udp_mh_splice + start, i - start, MSG_NOSIGNAL); > > -out: > > - return i - start; > > + sendmmsg(s, udp_mh_splice + start, n, MSG_NOSIGNAL); > > } > > =20 > > /** > > @@ -725,7 +701,7 @@ static size_t udp_update_hdr6(const struct ctx *c, > > * @v6: Prepare for IPv6? > > * @now: Current timestamp > > */ > > -static void udp_tap_prepare(const struct ctx *c, struct mmsghdr *mmh, > > +static void udp_tap_prepare(const struct ctx *c, const struct mmsghdr = *mmh, > > unsigned idx, in_port_t dstport, bool v6, > > const struct timespec *now) > > { > > @@ -752,49 +728,6 @@ static void udp_tap_prepare(const struct ctx *c, s= truct mmsghdr *mmh, > > (*tap_iov)[UDP_IOV_PAYLOAD].iov_len =3D l4len; > > } > > =20 > > -/** > > - * udp_tap_send() - Prepare UDP datagrams and send to tap interface > > - * @c: Execution context > > - * @start: Index of first datagram in udp[46]_l2_buf pool > > - * @n: Total number of datagrams in udp[46]_l2_buf pool > > - * @dstport: Destination port number on destination side > > - * @ref: Epoll reference for origin socket > > - * @now: Current timestamp > > - * > > - * This consumes as many frames as are sendable via tap. It requires = that > > - * udp_meta[@start].splicesrc is initialised, and will initialise > > - * udp_meta[].splicesrc for each frame it consumes *and one more* (if = present). > > - * > > - * Return: Number of frames sent via tap > > - */ > > -static unsigned udp_tap_send(const struct ctx *c, size_t start, size_t= n, > > - in_port_t dstport, union epoll_ref ref, > > - const struct timespec *now) > > -{ > > - struct mmsghdr *mmh_recv; > > - size_t i =3D start; > > - > > - ASSERT(udp_meta[start].splicesrc =3D=3D -1); > > - ASSERT(ref.type =3D=3D EPOLL_TYPE_UDP); > > - > > - if (ref.udp.v6) > > - mmh_recv =3D udp6_mh_recv; > > - else > > - mmh_recv =3D udp4_mh_recv; > > - > > - do { > > - udp_tap_prepare(c, mmh_recv, i, dstport, ref.udp.v6, now); > > - > > - if (++i >=3D n) > > - break; > > - > > - udp_meta[i].splicesrc =3D udp_mmh_splice_port(ref, &mmh_recv[i]); > > - } while (udp_meta[i].splicesrc =3D=3D -1); > > - > > - tap_send_frames(c, &udp_l2_iov[start][0], UDP_NUM_IOVS, i - start); > > - return i - start; > > -} > > - > > /** > > * udp_sock_recv() - Receive datagrams from a socket > > * @c: Execution context > > @@ -842,7 +775,7 @@ void udp_buf_sock_handler(const struct ctx *c, unio= n epoll_ref ref, uint32_t eve > > { > > struct mmsghdr *mmh_recv =3D ref.udp.v6 ? udp6_mh_recv : udp4_mh_recv; > > in_port_t dstport =3D ref.udp.port; > > - int n, m, i; > > + int n, i; > > =20 > > if ((n =3D udp_sock_recv(c, ref.fd, events, mmh_recv)) <=3D 0) > > return; > > @@ -852,19 +785,36 @@ void udp_buf_sock_handler(const struct ctx *c, un= ion epoll_ref ref, uint32_t eve > > else if (ref.udp.pif =3D=3D PIF_HOST) > > dstport +=3D c->udp.fwd_in.f.delta[dstport]; > > =20 > > - /* We divide things into batches based on how we need to send them, > > + /* We divide datagrams into batches based on how we need to send them, > > * determined by udp_meta[i].splicesrc. To avoid either two passes > > * through the array, or recalculating splicesrc for a single entry, = we > > - * have to populate it one entry *ahead* of the loop counter (if > > - * present). So we fill in entry 0 before the loop, then udp_*_send() > > - * populate one entry past where they consume. > > + * have to populate it one entry *ahead* of the loop counter. > > */ > > udp_meta[0].splicesrc =3D udp_mmh_splice_port(ref, mmh_recv); > > - for (i =3D 0; i < n; i +=3D m) { > > - if (udp_meta[i].splicesrc >=3D 0) > > - m =3D udp_splice_send(c, i, n, dstport, ref, now); > > + for (i =3D 0; i < n; ) { > > + int batchsrc =3D udp_meta[i].splicesrc; > > + int batchstart =3D i; > > + > > + do { > > + if (batchsrc >=3D 0) > > + udp_splice_prepare(mmh_recv, i); > > + else > > + udp_tap_prepare(c, mmh_recv, i, dstport, > > + ref.udp.v6, now); > > + > > + if (++i >=3D n) > > + break; > > + > > + udp_meta[i].splicesrc =3D udp_mmh_splice_port(ref, > > + &mmh_recv[i]); > > + } while (udp_meta[i].splicesrc =3D=3D batchsrc); > > + > > + if (batchsrc >=3D 0) > > + udp_splice_send(c, batchstart, i - batchstart, > > + batchsrc, dstport, ref, now); > > else > > - m =3D udp_tap_send(c, i, n, dstport, ref, now); > > + tap_send_frames(c, &udp_l2_iov[batchstart][0], > > + UDP_NUM_IOVS, i - batchstart); > > } >=20 > The logic looks correct to me, but the nested loop makes it a bit hard > to grasp. I don't disagree it's pretty hard to follow, but I haven't really seen a better way. > I'm wondering if we shouldn't rather have a single loop, always > preparing the datagrams, noting down the previous > udp_meta[i].splicesrc and the first index of a batch, and starting a new > batch (sending the previous one) once the current udp_meta[i].splicesrc > doesn't match the previous value. I can't really picture what you have in mind here. > I tried to sketch this quickly but failed for the moment to come up > with anything vaguely elegant, so I'm fine with either version. >=20 > Nits: curly brackets around multiple lines. That, at least, I can fix. --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --j/IgTrjBVLQfXsLF Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmaHvnkACgkQzQJF27ox 2GdKDA/+PHpZRslM95UhhKPYvEqYtUyYNbsHA8N2R0FOQdKZ1oH4SjXUXjX0t+nV d6W5VPSTAmGFo6QPfoQLXLlbatZA0XKoc0RaW7Fyn658VDdDImIJm2VsYtrxL8B/ VYRwicTC95OurooQ9EayG+jE1ZxKLK2uv2gE/ympc1l18Og6PAqfYVsMvXfl8qUO DPe0qBflToyWgHB3n+pxI+LCcoI8af0gN/MLNGegj6GBaFgC8rvwYEE8Mn17Ub5y eCxO92k1COFrPkP1n5K/dpR3KC05794dCP7RN5vDez/UkOtq8Otf7IrLqwz6cLdj ug2kwpXcer7g3ZigR9GK1QVqlqtGQ/CHXkVb1Ue2VavFkhBOcZ857fOB86evj0F1 FEzMgRpAXsV+kWFX1Xag/Co8dKom0G8aFplaI576oG/lCGNi3EXz55TsGWFNtbG5 XqGevFiEv9NkLvs0S1Tkie+HcrUVqKg5WIOW3VZQUg7zkG/WgK+YzYhhCo4WGlxT NoxAAr9ZSUDV0OWvmvArEEiw5DU2rW/GKHrXeCU/RVgDKdrD4545x9dMpAnoElMf 2nMYrW6mPAwYStwLY4smco/TSktVEy1Tz9RhAA3mtUw+BRVtj/5lRPR0IwvPuYU2 Ecq0ok6vnV1ILBJVpHoVlYSx75ixeIOxbqRwc+vc3rdNLysWZrQ= =7cSp -----END PGP SIGNATURE----- --j/IgTrjBVLQfXsLF--