public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v3 2/8] util, flow, pif: Simplify sock_l4_sa() interface
Date: Tue, 18 Nov 2025 01:19:21 +0100	[thread overview]
Message-ID: <20251118011921.4094e698@elisabeth> (raw)
In-Reply-To: <aRZoCiFLcIpIADBB@zatzit>

On Fri, 14 Nov 2025 10:21:46 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Nov 13, 2025 at 07:33:13AM +0100, Stefano Brivio wrote:
> > On Wed, 29 Oct 2025 17:26:22 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > sock_l4_sa() has a somewhat confusing 'v6only' option controlling whether
> > > to set the IPV6_V6ONLY socket option.  Usually it's set when the given
> > > address is IPv6, but not when we want to create a dual stack listening
> > > socket.  The latter only makes sense when the address is :: however.
> > > 
> > > Clarify this by only keeping the v6only option in an internal helper
> > > sock_l4_().  External users will call either sock_l4() which always creates
> > > a socket bound to a specific IP version, or sock_l4_dualstack() which
> > > creates a dual stack socket, but takes only a port not an address.  
> > 
> > I'm not sure if we'll ever need anything different, but I guess that
> > this is not the only obvious semantic of sock_l4_dualstack(), as it
> > could take a sockaddr_inany eventually, and bind() IPv6 address and its
> > v4-mapped equivalent (...does that even work?).  
> 
> Do you mean that if we have a v4-mapped address, then using an IPv6
> "dual stack" socket will listen both for IPv4 traffic and for IPv6
> traffic actually using that v4-mapped address on the wire (presumably
> as a result of a router translating to a local IPv6-only network)?  I
> think that will work, though I haven't tested.

Yes, that's what I meant.

> In that case we can determine that we need IPV6_V6ONLY from the
> address.  The only case that doesn't cover is if we want to listen for
> v4-mapped traffic already translated by a router but *not* native IPv4
> traffic.  I don't see a lot of reason to ever do that, so it's in the
> "refactor if we ever discover we need it" pile.

I thought that we might want to listen on both IP versions for whatever
reason, on a single socket, with a specific address (say, that v4-mapped
address and the equivalent untranslated address...?).

I know it can't be done now anyway, I'm just saying that
sock_l4_dualstack() forcing wildcard addresses isn't something we should
imply as part of "dualstack".

> Otherwise, the only case in which a single dual stack socket actually
> listens to traffic from both protocols is for a wildcard.  Maybe there
> are obscure wildcard addresses other than :: / 0.0.0.0, but that's
> also in the "worry about it later" pile.

Sure.

> Note that:
> 
> https://github.com/containers/podman/pull/14026/commits/772ead25318dfa340541197e92322bd2346df087
> 
> implies some sort of dual stack localhost support (it treats "dual
> stack" ::1 as listening on both ::1 and 127.0.0.1).  However, AFAICT
> that's just not correct.  On Linux, listening on ::1 listens only on
> ::1 even with V6ONLY explicitly set to 0.

Right, I don't even know what "simulated" means there. Actually there's
no problem description at all. Go figure. I'm not sure if we want to
report something (I'm not even sure what we should report).

> > > We drop the '_sa' suffix while we're at it - it exists because this used
> > > to be an internal version with a sock_l4() wrapper.  The wrapper no longer
> > > exists so the '_sa' is no longer useful.
> > > 
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > >  flow.c |  6 ++----
> > >  pif.c  | 10 +++-------
> > >  util.c | 27 +++++++++++++++++++++++----
> > >  util.h |  8 +++++---
> > >  4 files changed, 33 insertions(+), 18 deletions(-)
> > > 
> > > diff --git a/flow.c b/flow.c
> > > index 9926f408..fd530ddb 100644
> > > --- a/flow.c
> > > +++ b/flow.c
> > > @@ -186,8 +186,7 @@ static int flowside_sock_splice(void *arg)
> > >  
> > >  	ns_enter(a->c);
> > >  
> > > -	a->fd = sock_l4_sa(a->c, a->type, a->sa, NULL,
> > > -	                   a->sa->sa_family == AF_INET6, a->data);
> > > +	a->fd = sock_l4(a->c, a->type, a->sa, NULL, a->data);
> > >  	a->err = errno;
> > >  
> > >  	return 0;
> > > @@ -222,8 +221,7 @@ int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
> > >  		else if (sa.sa_family == AF_INET6)
> > >  			ifname = c->ip6.ifname_out;
> > >  
> > > -		return sock_l4_sa(c, type, &sa, ifname,
> > > -				  sa.sa_family == AF_INET6, data);
> > > +		return sock_l4(c, type, &sa, ifname, data);
> > >  
> > >  	case PIF_SPLICE: {
> > >  		struct flowside_sock_args args = {
> > > diff --git a/pif.c b/pif.c
> > > index 31723b29..5fb1f455 100644
> > > --- a/pif.c
> > > +++ b/pif.c
> > > @@ -75,11 +75,7 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
> > >  		const union inany_addr *addr, const char *ifname,
> > >  		in_port_t port, uint32_t data)
> > >  {
> > > -	union sockaddr_inany sa = {
> > > -		.sa6.sin6_family = AF_INET6,
> > > -		.sa6.sin6_addr = in6addr_any,
> > > -		.sa6.sin6_port = htons(port),
> > > -	};
> > > +	union sockaddr_inany sa;
> > >  
> > >  	ASSERT(pif_is_socket(pif));
> > >  
> > > @@ -90,8 +86,8 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
> > >  	}
> > >  
> > >  	if (!addr)
> > > -		return sock_l4_sa(c, type, &sa, ifname, false, data);
> > > +		return sock_l4_dualstack(c, type, port, ifname, data);
> > >  
> > >  	pif_sockaddr(c, &sa, pif, addr, port);
> > > -	return sock_l4_sa(c, type, &sa, ifname, sa.sa_family == AF_INET6, data);
> > > +	return sock_l4(c, type, &sa, ifname, data);
> > >  }
> > > diff --git a/util.c b/util.c
> > > index 976fcabe..c94efae4 100644
> > > --- a/util.c
> > > +++ b/util.c
> > > @@ -40,7 +40,7 @@
> > >  #endif
> > >  
> > >  /**
> > > - * sock_l4_sa() - Create and bind socket to socket address, add to epoll list
> > > + * sock_l4_() - Create and bind socket to socket address, add to epoll list
> > >   * @c:		Execution context
> > >   * @type:	epoll type
> > >   * @sa:		Socket address to bind to
> > > @@ -50,9 +50,9 @@
> > >   *
> > >   * Return: newly created socket, negative error code on failure
> > >   */
> > > -int sock_l4_sa(const struct ctx *c, enum epoll_type type,
> > > -	       const union sockaddr_inany *sa, const char *ifname,
> > > -	       bool v6only, uint32_t data)
> > > +static int sock_l4_(const struct ctx *c, enum epoll_type type,
> > > +		    const union sockaddr_inany *sa, const char *ifname,
> > > +		    bool v6only, uint32_t data)
> > >  {
> > >  	sa_family_t af = sa->sa_family;
> > >  	union epoll_ref ref = { .type = type, .data = data };
> > > @@ -182,6 +182,25 @@ int sock_l4_sa(const struct ctx *c, enum epoll_type type,
> > >  	return fd;
> > >  }
> > >  
> > > +int sock_l4(const struct ctx *c, enum epoll_type type,
> > > +	    const union sockaddr_inany *sa, const char *ifname,
> > > +	    uint32_t data)  
> > 
> > Not extremely useful but it saves one "lookup":
> > 
> > /**
> >  * sock_l4() - Create and bind socket to given address, add to epoll list
> >  * @c:		Execution context
> >  * @type:	epoll type
> >  * @sa:		Socket address to bind to
> >  * @ifname:	Interface for binding, NULL for any
> >  *
> >  * Return: newly created socket, negative error code on failure
> >  */  
> 
> Oops, I meant to go back and add function comments here, but I
> obviously forgot.  Fixed.
> 
> While there I removed the "add to epoll list" which is no longer
> correct.

Oops, I hadn't solved the merge conflict yet...

> > > +{
> > > +	return sock_l4_(c, type, sa, ifname, sa->sa_family == AF_INET6, data);
> > > +}
> > > +
> > > +int sock_l4_dualstack(const struct ctx *c, enum epoll_type type,
> > > +		      in_port_t port, const char *ifname, uint32_t data)  
> > 
> > ...same here, and the comment might be used to clarify the
> > functionality.  
> 
> Done.
> 
> >   
> > > +{
> > > +	union sockaddr_inany sa = {
> > > +		.sa6.sin6_family = AF_INET6,
> > > +		.sa6.sin6_addr = in6addr_any,
> > > +		.sa6.sin6_port = htons(port),
> > > +	};
> > > +
> > > +	return sock_l4_(c, type, &sa, ifname, 0, data);
> > > +}
> > > +
> > >  /**
> > >   * sock_unix() - Create and bind AF_UNIX socket
> > >   * @sock_path:	Socket path. If empty, set on return (UNIX_SOCK_PATH as prefix)
> > > diff --git a/util.h b/util.h
> > > index e1a1ebc9..7f0cf686 100644
> > > --- a/util.h
> > > +++ b/util.h
> > > @@ -203,9 +203,11 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
> > >  struct ctx;
> > >  union sockaddr_inany;
> > >  
> > > -int sock_l4_sa(const struct ctx *c, enum epoll_type type,
> > > -	       const union sockaddr_inany *sa, const char *ifname,
> > > -	       bool v6only, uint32_t data);
> > > +int sock_l4(const struct ctx *c, enum epoll_type type,
> > > +	    const union sockaddr_inany *sa, const char *ifname,
> > > +	    uint32_t data);
> > > +int sock_l4_dualstack(const struct ctx *c, enum epoll_type type,
> > > +		      in_port_t port, const char *ifname, uint32_t data);
> > >  int sock_unix(char *sock_path);
> > >  void sock_probe_mem(struct ctx *c);
> > >  long timespec_diff_ms(const struct timespec *a, const struct timespec *b);  

-- 
Stefano


  reply	other threads:[~2025-11-18  0:19 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-29  6:26 [PATCH v3 0/8] Reduce differences between inbound and outbound socket binding David Gibson
2025-10-29  6:26 ` [PATCH v3 1/8] inany: Let length of sockaddr_inany be implicit from the family David Gibson
2025-11-13  6:33   ` Stefano Brivio
2025-11-13 22:53     ` David Gibson
2025-10-29  6:26 ` [PATCH v3 2/8] util, flow, pif: Simplify sock_l4_sa() interface David Gibson
2025-11-13  6:33   ` Stefano Brivio
2025-11-13 23:21     ` David Gibson
2025-11-18  0:19       ` Stefano Brivio [this message]
2025-11-18  3:34         ` David Gibson
2025-11-19 11:42           ` Stefano Brivio
2025-11-20  0:05             ` David Gibson
2025-11-20  2:22               ` David Gibson
2025-10-29  6:26 ` [PATCH v3 3/8] tcp: Merge tcp_ns_sock_init[46]() into tcp_sock_init_one() David Gibson
2025-10-29  6:26 ` [PATCH v3 4/8] udp: Unify some more inbound/outbound parts of udp_sock_init() David Gibson
2025-11-13  6:33   ` Stefano Brivio
2025-11-13 23:33     ` David Gibson
2025-10-29  6:26 ` [PATCH v3 5/8] udp: Move udp_sock_init() special case to its caller David Gibson
2025-10-29  6:26 ` [PATCH v3 6/8] util: Fix setting of IPV6_V6ONLY socket option David Gibson
2025-11-13  6:33   ` Stefano Brivio
2025-11-14  0:24     ` David Gibson
2025-11-18  0:19       ` Stefano Brivio
2025-10-29  6:26 ` [PATCH v3 7/8] tcp, udp: Remove fallback if creating dual stack socket fails David Gibson
2025-10-29  6:26 ` [PATCH v3 8/8] [RFC, DO NOT APPLY] tcp, udp: Bind outbound listening sockets by interface instead of address David Gibson
2025-10-30  3:58 ` [PATCH v3 0/8] Reduce differences between inbound and outbound socket binding David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251118011921.4094e698@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).