From mboxrd@z Thu Jan  1 00:00:00 1970
Authentication-Results: passt.top; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: passt.top;
	dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=EIXARuf6;
	dkim-atps=neutral
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
	by passt.top (Postfix) with ESMTP id 733795A004E
	for <passt-dev@passt.top>; Tue, 27 Aug 2024 07:33:37 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1724736816;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=2T+D511tBx+ZaF+nMpcqCv2Pby6KmdRTxR7AqoA98Vg=;
	b=EIXARuf649FeEOjmGhSJqM9sPr/ozHsl4tGfGZWcQGT6+DTu26u5BouL34Puz4WzVDaZsY
	zLVN+CZ57ZIlhQ1pJY6pSmWrvNA+iaCGCDY710Ci8KUiK+xLxtVEFxb1Q6G6gpcv0pK0Ot
	JCQhBcm4j/SL2YDW7+Ugz3dCEcOfW3o=
Received: from mail-pj1-f72.google.com (mail-pj1-f72.google.com
 [209.85.216.72]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id
 us-mta-673-tVaUYCm6P5K3an35Iw2-Fg-1; Tue, 27 Aug 2024 01:33:34 -0400
X-MC-Unique: tVaUYCm6P5K3an35Iw2-Fg-1
Received: by mail-pj1-f72.google.com with SMTP id 98e67ed59e1d1-2d699beb78dso2686588a91.0
        for <passt-dev@passt.top>; Mon, 26 Aug 2024 22:33:34 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1724736813; x=1725341613;
        h=content-transfer-encoding:mime-version:organization:references
         :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state
         :from:to:cc:subject:date:message-id:reply-to;
        bh=2T+D511tBx+ZaF+nMpcqCv2Pby6KmdRTxR7AqoA98Vg=;
        b=fucN+c+Rb2twxzlXes/wICNNKGQO13uFSmCU/ObNtEht/j6Gy5GU0wdwXJulT2aEWC
         aTQO0yQsQGdPht6Z0U21lS8e79bRIcO6cdWOmqhPaX7lsysWUIqZa7q8jAHj2xbs6Zkv
         JKPVa/2qdVIXxoFTGSYXcNGwov+aKqX1C0sKHplOcxdGx5NRdWnbHas3YQvuumxTgryQ
         knQkdrWFEVHu9tpc6TLm11RafjaOFrRcqdiK041oPTwmKIXwW+SZqFlkvxu1DSmK/MXd
         z7GFUV7AlxrOyojO8IirSQzTZKgSGLsTc0UloU+UYS4dCC9PFmkaZZPVnoRr9bkMQch3
         3SxA==
X-Gm-Message-State: AOJu0YzMJuuO9DLuFkmEdr4gRr7VePoyIg+IfGuUUw6obl7C5jFKsOB2
	h8+xrRKrohGUrtPfpRoP7NfQTaifrlu7kqD5WYTGR5cnccquwUdfqOfQrqN3kmeTehbS1KLAdTH
	daF2ZWMwzxJ1E87gT1Y0cT+U8KORkipMw2GTO+rw7AKEgj+ArGE6zc0h/Vw==
X-Received: by 2002:a17:902:f60a:b0:203:a279:a144 with SMTP id d9443c01a7336-204df45d558mr18849475ad.25.1724736813161;
        Mon, 26 Aug 2024 22:33:33 -0700 (PDT)
X-Google-Smtp-Source: AGHT+IFZ1j/pwvs86kWQkMIi+U8CyTVT5cLEJ840A6JTNQ2TT/Qogr+9dubSMh3BRq5o5vUWTh/k0Q==
X-Received: by 2002:a17:902:f60a:b0:203:a279:a144 with SMTP id d9443c01a7336-204df45d558mr18849295ad.25.1724736812594;
        Mon, 26 Aug 2024 22:33:32 -0700 (PDT)
Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1])
        by smtp.gmail.com with ESMTPSA id d9443c01a7336-2038560a02esm75437875ad.212.2024.08.26.22.33.31
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 26 Aug 2024 22:33:31 -0700 (PDT)
Date: Tue, 27 Aug 2024 07:33:29 +0200
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH 1/3] udp: Merge udp[46]_mh_recv arrays
Message-ID: <20240827073329.565765e3@elisabeth>
In-Reply-To: <Zs0oDrj6nxzRSV5U@zatzit.fritz.box>
References: <20240826093716.1925064-1-david@gibson.dropbear.id.au>
	<20240826093716.1925064-2-david@gibson.dropbear.id.au>
	<20240826213255.769242da@elisabeth>
	<Zs0oDrj6nxzRSV5U@zatzit.fritz.box>
Organization: Red Hat
X-Mailer: Claws Mail 4.2.0 (GTK 3.24.41; x86_64-pc-linux-gnu)
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Message-ID-Hash: 3IROSBBUCHEJQLQ5CJ75MOGADEBLMTMR
X-Message-ID-Hash: 3IROSBBUCHEJQLQ5CJ75MOGADEBLMTMR
X-MailFrom: sbrivio@redhat.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: passt-dev@passt.top
X-Mailman-Version: 3.3.8
Precedence: list
List-Id: Development discussion and patches for passt <passt-dev.passt.top>
Archived-At: <https://archives.passt.top/passt-dev/20240827073329.565765e3@elisabeth/>
Archived-At: <https://passt.top/hyperkitty/list/passt-dev@passt.top/message/3IROSBBUCHEJQLQ5CJ75MOGADEBLMTMR/>
List-Archive: <https://archives.passt.top/passt-dev/>
List-Archive: <https://passt.top/hyperkitty/list/passt-dev@passt.top/>
List-Help: <mailto:passt-dev-request@passt.top?subject=help>
List-Owner: <mailto:passt-dev-owner@passt.top>
List-Post: <mailto:passt-dev@passt.top>
List-Subscribe: <mailto:passt-dev-join@passt.top>
List-Unsubscribe: <mailto:passt-dev-leave@passt.top>

On Tue, 27 Aug 2024 11:12:46 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Aug 26, 2024 at 09:32:55PM +0200, Stefano Brivio wrote:
> > On Mon, 26 Aug 2024 19:37:14 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > We've already gotten rid of most of the IPv4/IPv6 specific data structures
> > > in udp.c by merging them with each other.  One significant one remains:
> > > udp[46]_mh_recv.  This was a bit awkward to remove because of a subtle
> > > interaction.  We initialise the msg_namelen fields to represent the total
> > > size we have for a socket address, but when we receive into the arrays
> > > those are modified to the actual length of the sockaddr we received.
> > > 
> > > That meant that naively merging the arrays meant that if we received IPv4
> > > datagrams, then IPv6 datagrams, the addresses for the latter would be
> > > truncated.  This patch addresses that by resetting the received
> > > msg_namelen as soon as we've found a flow for the datagram.  Finding the
> > > flow is the only thing that might use the actual sockaddr length, although
> > > we in fact don't need it for the time being.
> > > 
> > > This also removes the last use of the 'v6' field from udp_listen_epoll_ref,
> > > so remove that as well.
> > > 
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > >  udp.c | 57 ++++++++++++++++++++-------------------------------------
> > >  udp.h |  2 --
> > >  2 files changed, 20 insertions(+), 39 deletions(-)
> > > 
> > > diff --git a/udp.c b/udp.c
> > > index 8a93aad6..6638c22b 100644
> > > --- a/udp.c
> > > +++ b/udp.c
> > > @@ -178,8 +178,7 @@ enum udp_iov_idx {
> > >  
> > >  /* IOVs and msghdr arrays for receiving datagrams from sockets */
> > >  static struct iovec	udp_iov_recv		[UDP_MAX_FRAMES];
> > > -static struct mmsghdr	udp4_mh_recv		[UDP_MAX_FRAMES];
> > > -static struct mmsghdr	udp6_mh_recv		[UDP_MAX_FRAMES];
> > > +static struct mmsghdr	udp_mh_recv		[UDP_MAX_FRAMES];
> > >  
> > >  /* IOVs and msghdr arrays for sending "spliced" datagrams to sockets */
> > >  static union sockaddr_inany udp_splice_to;
> > > @@ -222,6 +221,7 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s)
> > >  static void udp_iov_init_one(const struct ctx *c, size_t i)
> > >  {
> > >  	struct udp_payload_t *payload = &udp_payload[i];
> > > +	struct msghdr *mh = &udp_mh_recv[i].msg_hdr;
> > >  	struct udp_meta_t *meta = &udp_meta[i];
> > >  	struct iovec *siov = &udp_iov_recv[i];
> > >  	struct iovec *tiov = udp_l2_iov[i];
> > > @@ -236,27 +236,10 @@ static void udp_iov_init_one(const struct ctx *c, size_t i)
> > >  	tiov[UDP_IOV_TAP] = tap_hdr_iov(c, &meta->taph);
> > >  	tiov[UDP_IOV_PAYLOAD].iov_base = payload;
> > >  
> > > -	/* It's useful to have separate msghdr arrays for receiving.  Otherwise,
> > > -	 * an IPv4 recv() will alter msg_namelen, so we'd have to reset it every
> > > -	 * time or risk truncating the address on future IPv6 recv()s.
> > > -	 */
> > > -	if (c->ifi4) {
> > > -		struct msghdr *mh = &udp4_mh_recv[i].msg_hdr;
> > > -
> > > -		mh->msg_name	= &meta->s_in;
> > > -		mh->msg_namelen	= sizeof(struct sockaddr_in);
> > > -		mh->msg_iov	= siov;
> > > -		mh->msg_iovlen	= 1;
> > > -	}
> > > -
> > > -	if (c->ifi6) {
> > > -		struct msghdr *mh = &udp6_mh_recv[i].msg_hdr;
> > > -
> > > -		mh->msg_name	= &meta->s_in;
> > > -		mh->msg_namelen	= sizeof(struct sockaddr_in6);
> > > -		mh->msg_iov	= siov;
> > > -		mh->msg_iovlen	= 1;
> > > -	}
> > > +	mh->msg_name	= &meta->s_in;
> > > +	mh->msg_namelen	= sizeof(meta->s_in);
> > > +	mh->msg_iov	= siov;
> > > +	mh->msg_iovlen	= 1;
> > >  }
> > >  
> > >  /**
> > > @@ -506,10 +489,10 @@ static int udp_sock_recv(const struct ctx *c, int s, uint32_t events,
> > >  void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
> > >  			     uint32_t events, const struct timespec *now)
> > >  {
> > > -	struct mmsghdr *mmh_recv = ref.udp.v6 ? udp6_mh_recv : udp4_mh_recv;
> > > +	const socklen_t sasize = sizeof(udp_meta[0].s_in);
> > >  	int n, i;
> > >  
> > > -	if ((n = udp_sock_recv(c, ref.fd, events, mmh_recv)) <= 0)
> > > +	if ((n = udp_sock_recv(c, ref.fd, events, udp_mh_recv)) <= 0)
> > >  		return;
> > >  
> > >  	/* We divide datagrams into batches based on how we need to send them,
> > > @@ -518,6 +501,7 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
> > >  	 * populate it one entry *ahead* of the loop counter.
> > >  	 */
> > >  	udp_meta[0].tosidx = udp_flow_from_sock(c, ref, &udp_meta[0].s_in, now);
> > > +	udp_mh_recv[0].msg_hdr.msg_namelen = sasize;  
> > 
> > I don't understand why you need this assignment. To me it looks
> > redundant with:
> > 
> >   udp_mh_recv[i].msg_hdr.msg_namelen = sizeof(udp_meta[i].s_in);  
> 
> It's not redundant per se, because the later assignment only occurs
> for i > 0, so the first one is for slot 0.

I still don't see how: the second assignment (out of three) is done in
the loop body before i is incremented, so it should cover i == 0 as
well, right?

> It would, however, be
> possible to move to a single assignment in the loop body before i is
> incremented.
> 
> I did it this way, because I found it easier to reason about.  At
> least theoretically the value of msg_namelen written by recvmmsg()
> could be important, although we don't use it yet (we rely on the
> sa_family field instead).  But because of that it felt wrong to
> overwrite that value before we've "consumed" it.  Logically that
> happens in udp_flow_from_sock() which is what takes the address in
> msg_name / msg_namelen and converts it into the long-term form (as
> part of the flowside).  Hence, clearing msg_namelen immediately after
> each call to udp_flow_from_sock() made sense to me.
> 
> I did consider changing udp_flow_from_sock() to take a socklen_t *
> which it clears after using.  That seemed slightly abstraction
> violationy to me: clearing msg_namelen only makes sense because the
> address is part of a re-used mmsghdr array, and that's not something
> udp_flow_from_sock() "knows".
> 
> That was my reasoning, anyway.  I'm happy enough to change it if you
> have a preferred approach.

No, no, this all makes sense. But you add three assignments here, and I
don't understand why #1 is needed if we have #2 and #3, or why #2 is
needed if we have #1 and #3.
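
To check I'm not missing something, here's a toy model of the batching
loop. It's just a sketch, not the actual code: flow IDs are plain
integers, FULL_LEN is a stand-in for sizeof(udp_meta[i].s_in), the
A1/A2/A3 flags and slots_left_short() are names I made up, and the
prepare and udp_flow_from_sock() steps are left out. It returns how many
slots still have a shortened length once the loop is done, depending on
which of the three assignments are kept:

```c
#include <assert.h>
#include <stddef.h>

#define FULL_LEN	28		/* stand-in for sizeof(udp_meta[i].s_in) */
#define A1		(1 << 0)	/* #1: reset of slot 0 before the loop */
#define A2		(1 << 1)	/* #2: reset after the prepare step */
#define A3		(1 << 2)	/* #3: reset after the next flow lookup */

/* flow[i]: hypothetical flow ID for datagram i
 * namelen[i]: models udp_mh_recv[i].msg_hdr.msg_namelen after receiving
 * keep: mask of assignments (A1, A2, A3) left in place
 * Returns the number of slots whose length was not restored.
 */
static size_t slots_left_short(const int *flow, unsigned *namelen,
			       size_t n, int keep)
{
	size_t i, left = 0;

	assert(n == 0 || (flow && namelen));

	if (n && (keep & A1))
		namelen[0] = FULL_LEN;

	for (i = 0; i < n; ) {
		int batch = flow[i];

		do {
			/* prepare step for slot i would go here */
			if (keep & A2)
				namelen[i] = FULL_LEN;

			if (++i >= n)
				break;

			/* flow lookup for the new slot i would go here */
			if (keep & A3)
				namelen[i] = FULL_LEN;
		} while (flow[i] == batch);
	}

	for (i = 0; i < n; i++)
		left += namelen[i] != FULL_LEN;

	return left;
}
```

With flows { 1, 1, 2, 3, 3, 3 } and all lengths starting at 16, this
returns 0 with A2 alone and 0 with A1 | A3, and only leaves slot 0 short
if I keep A3 alone. So, at least in this model, either #2 or the
#1 + #3 pair would be enough.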

> > later (because n > 0), and:
> >   
> > >  	for (i = 0; i < n; ) {
> > >  		flow_sidx_t batchsidx = udp_meta[i].tosidx;
> > >  		uint8_t batchpif = pif_at_sidx(batchsidx);
> > > @@ -525,18 +509,22 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
> > >  
> > >  		do {
> > >  			if (pif_is_socket(batchpif)) {
> > > -				udp_splice_prepare(mmh_recv, i);
> > > +				udp_splice_prepare(udp_mh_recv, i);
> > >  			} else if (batchpif == PIF_TAP) {
> > > -				udp_tap_prepare(mmh_recv, i,
> > > +				udp_tap_prepare(udp_mh_recv, i,
> > >  						flowside_at_sidx(batchsidx));
> > >  			}
> > >  
> > > +			/* Restore sockaddr length clobbered by recvmsg() */
> > > +			udp_mh_recv[i].msg_hdr.msg_namelen = sizeof(udp_meta[i].s_in);  
> > 
> > what is the difference between assigning sizeof(udp_meta[i].s_in); and
> > sasize? I thought it would be the same quantity.  
> 
> It is.  The only purpose of sasize is to avoid some over-long lines.

Right, but why do you use it for just two of the three assignments?
What is special about the one immediately above here?

> > > +
> > >  			if (++i >= n)
> > >  				break;
> > >  
> > >  			udp_meta[i].tosidx = udp_flow_from_sock(c, ref,
> > >  								&udp_meta[i].s_in,
> > >  								now);
> > > +			udp_mh_recv[i].msg_hdr.msg_namelen = sasize;
> > >  		} while (flow_sidx_eq(udp_meta[i].tosidx, batchsidx));
> > >  
> > >  		if (pif_is_socket(batchpif)) {  

-- 
Stefano