public inbox for passt-dev@passt.top
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v3 2/8] util, flow, pif: Simplify sock_l4_sa() interface
Date: Fri, 14 Nov 2025 10:21:46 +1100
Message-ID: <aRZoCiFLcIpIADBB@zatzit>
In-Reply-To: <20251113073313.1287b4dc@elisabeth>

On Thu, Nov 13, 2025 at 07:33:13AM +0100, Stefano Brivio wrote:
> On Wed, 29 Oct 2025 17:26:22 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > sock_l4_sa() has a somewhat confusing 'v6only' option controlling whether
> > to set the IPV6_V6ONLY socket option.  Usually it's set when the given
> > address is IPv6, but not when we want to create a dual stack listening
> > socket.  The latter only makes sense when the address is :: however.
> > 
> > Clarify this by only keeping the v6only option in an internal helper
> > sock_l4_().  External users will call either sock_l4() which always creates
> > a socket bound to a specific IP version, or sock_l4_dualstack() which
> > creates a dual stack socket, but takes only a port not an address.
> 
> I'm not sure if we'll ever need anything different, but I guess that
> this is not the only obvious semantic of sock_l4_dualstack(), as it
> could take a sockaddr_inany eventually, and bind() IPv6 address and its
> v4-mapped equivalent (...does that even work?).

Do you mean that if we have a v4-mapped address, then using an IPv6
"dual stack" socket will listen both for IPv4 traffic and for IPv6
traffic actually using that v4-mapped address on the wire (presumably
as a result of a router translating to a local IPv6-only network)?  I
think that will work, though I haven't tested.

In that case we can determine whether we need IPV6_V6ONLY from the
address alone.  The only case that doesn't cover is if we want to listen
for v4-mapped traffic already translated by a router but *not* native
IPv4 traffic.  I don't see a lot of reason to ever do that, so it's in
the "refactor if we ever discover we need it" pile.
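
As a rough illustration of deciding the option from the address, here is
a minimal sketch.  It is not passt code: v6only_for() is a hypothetical
helper, it takes a textual IPv6 address purely for convenience, and it
assumes the wildcard :: stays IPv6-only unless dual stack is requested
explicitly.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper, not part of passt: decide from an IPv6 address string
 * whether IPV6_V6ONLY should be set on a socket bound to that address.
 *
 * Return: 1 to set IPV6_V6ONLY, 0 to leave it clear, -1 if @addr_str is not
 *         a valid IPv6 address
 */
int v6only_for(const char *addr_str)
{
	struct in6_addr a;

	assert(addr_str);

	if (inet_pton(AF_INET6, addr_str, &a) != 1)
		return -1;

	/* A v4-mapped address only makes sense on a dual stack socket.  Any
	 * other address, including ::, is treated as IPv6-only here
	 * (assumption: a dual stack wildcard is requested separately).
	 */
	return !IN6_IS_ADDR_V4MAPPED(&a);
}
```

With this sketch, "::ffff:192.0.2.1" yields 0 and "2001:db8::1" yields 1.
The case it can't express is the one mentioned above: v4-mapped traffic
without native IPv4 traffic.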

Otherwise, the only case in which a single dual stack socket actually
listens to traffic from both protocols is for a wildcard.  Maybe there
are obscure wildcard addresses other than :: / 0.0.0.0, but that's
also in the "worry about it later" pile.

Note that:

https://github.com/containers/podman/pull/14026/commits/772ead25318dfa340541197e92322bd2346df087

implies some sort of dual stack localhost support (it treats "dual
stack" ::1 as listening on both ::1 and 127.0.0.1).  However, AFAICT
that's just not correct.  On Linux, listening on ::1 listens only on
::1 even with IPV6_V6ONLY explicitly set to 0.
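
One way to check this locally is a small probe along the following
lines.  This is only a sketch under assumptions: the function name
v4_reaches_v6_loopback() is made up, it uses TCP on an ephemeral port,
and it needs a host where both IPv4 and IPv6 loopback are available.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical probe, not part of passt.
 *
 * Return: 1 if an IPv4 connect() to 127.0.0.1 reaches a listener bound to ::1
 *         with IPV6_V6ONLY cleared, 0 if it doesn't, -1 if setup failed
 */
int v4_reaches_v6_loopback(void)
{
	struct sockaddr_in6 a6 = { .sin6_family = AF_INET6,
				   .sin6_addr = IN6ADDR_LOOPBACK_INIT };
	struct sockaddr_in a4 = { .sin_family = AF_INET };
	struct timeval tv = { .tv_sec = 1 };
	socklen_t sl = sizeof(a6);
	int l, s = -1, off = 0, ret = -1;

	l = socket(AF_INET6, SOCK_STREAM, 0);
	if (l < 0)
		return -1;

	/* Listen on [::1], kernel-assigned port, IPV6_V6ONLY explicitly 0 */
	if (setsockopt(l, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off)) ||
	    bind(l, (struct sockaddr *)&a6, sizeof(a6)) ||
	    listen(l, 1) ||
	    getsockname(l, (struct sockaddr *)&a6, &sl))
		goto out;
	assert(sl == sizeof(a6));

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s < 0)
		goto out;

	/* Bound the connect() attempt, in case nothing answers at all */
	setsockopt(s, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));

	a4.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	a4.sin_port = a6.sin6_port;	/* already in network byte order */

	ret = connect(s, (struct sockaddr *)&a4, sizeof(a4)) == 0;
out:
	if (s >= 0)
		close(s);
	close(l);
	return ret;
}
```

If the behaviour described above holds, the probe returns 0, that is, the
IPv4 connection to 127.0.0.1 does not reach the ::1 listener.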

> > We drop the '_sa' suffix while we're at it - it exists because this used
> > to be an internal version with a sock_l4() wrapper.  The wrapper no longer
> > exists so the '_sa' is no longer useful.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  flow.c |  6 ++----
> >  pif.c  | 10 +++-------
> >  util.c | 27 +++++++++++++++++++++++----
> >  util.h |  8 +++++---
> >  4 files changed, 33 insertions(+), 18 deletions(-)
> > 
> > diff --git a/flow.c b/flow.c
> > index 9926f408..fd530ddb 100644
> > --- a/flow.c
> > +++ b/flow.c
> > @@ -186,8 +186,7 @@ static int flowside_sock_splice(void *arg)
> >  
> >  	ns_enter(a->c);
> >  
> > -	a->fd = sock_l4_sa(a->c, a->type, a->sa, NULL,
> > -	                   a->sa->sa_family == AF_INET6, a->data);
> > +	a->fd = sock_l4(a->c, a->type, a->sa, NULL, a->data);
> >  	a->err = errno;
> >  
> >  	return 0;
> > @@ -222,8 +221,7 @@ int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
> >  		else if (sa.sa_family == AF_INET6)
> >  			ifname = c->ip6.ifname_out;
> >  
> > -		return sock_l4_sa(c, type, &sa, ifname,
> > -				  sa.sa_family == AF_INET6, data);
> > +		return sock_l4(c, type, &sa, ifname, data);
> >  
> >  	case PIF_SPLICE: {
> >  		struct flowside_sock_args args = {
> > diff --git a/pif.c b/pif.c
> > index 31723b29..5fb1f455 100644
> > --- a/pif.c
> > +++ b/pif.c
> > @@ -75,11 +75,7 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
> >  		const union inany_addr *addr, const char *ifname,
> >  		in_port_t port, uint32_t data)
> >  {
> > -	union sockaddr_inany sa = {
> > -		.sa6.sin6_family = AF_INET6,
> > -		.sa6.sin6_addr = in6addr_any,
> > -		.sa6.sin6_port = htons(port),
> > -	};
> > +	union sockaddr_inany sa;
> >  
> >  	ASSERT(pif_is_socket(pif));
> >  
> > @@ -90,8 +86,8 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
> >  	}
> >  
> >  	if (!addr)
> > -		return sock_l4_sa(c, type, &sa, ifname, false, data);
> > +		return sock_l4_dualstack(c, type, port, ifname, data);
> >  
> >  	pif_sockaddr(c, &sa, pif, addr, port);
> > -	return sock_l4_sa(c, type, &sa, ifname, sa.sa_family == AF_INET6, data);
> > +	return sock_l4(c, type, &sa, ifname, data);
> >  }
> > diff --git a/util.c b/util.c
> > index 976fcabe..c94efae4 100644
> > --- a/util.c
> > +++ b/util.c
> > @@ -40,7 +40,7 @@
> >  #endif
> >  
> >  /**
> > - * sock_l4_sa() - Create and bind socket to socket address, add to epoll list
> > + * sock_l4_() - Create and bind socket to socket address, add to epoll list
> >   * @c:		Execution context
> >   * @type:	epoll type
> >   * @sa:		Socket address to bind to
> > @@ -50,9 +50,9 @@
> >   *
> >   * Return: newly created socket, negative error code on failure
> >   */
> > -int sock_l4_sa(const struct ctx *c, enum epoll_type type,
> > -	       const union sockaddr_inany *sa, const char *ifname,
> > -	       bool v6only, uint32_t data)
> > +static int sock_l4_(const struct ctx *c, enum epoll_type type,
> > +		    const union sockaddr_inany *sa, const char *ifname,
> > +		    bool v6only, uint32_t data)
> >  {
> >  	sa_family_t af = sa->sa_family;
> >  	union epoll_ref ref = { .type = type, .data = data };
> > @@ -182,6 +182,25 @@ int sock_l4_sa(const struct ctx *c, enum epoll_type type,
> >  	return fd;
> >  }
> >  
> > +int sock_l4(const struct ctx *c, enum epoll_type type,
> > +	    const union sockaddr_inany *sa, const char *ifname,
> > +	    uint32_t data)
> 
> Not extremely useful but it saves one "lookup":
> 
> /**
>  * sock_l4() - Create and bind socket to given address, add to epoll list
>  * @c:		Execution context
>  * @type:	epoll type
>  * @sa:		Socket address to bind to
>  * @ifname:	Interface for binding, NULL for any
>  *
>  * Return: newly created socket, negative error code on failure
>  */

Oops, I meant to go back and add function comments here, but I
obviously forgot.  Fixed.

While I was there, I removed the "add to epoll list" wording, which is
no longer correct.


> > +{
> > +	return sock_l4_(c, type, sa, ifname, sa->sa_family == AF_INET6, data);
> > +}
> > +
> > +int sock_l4_dualstack(const struct ctx *c, enum epoll_type type,
> > +		      in_port_t port, const char *ifname, uint32_t data)
> 
> ...same here, and the comment might be used to clarify the
> functionality.

Done.

> 
> > +{
> > +	union sockaddr_inany sa = {
> > +		.sa6.sin6_family = AF_INET6,
> > +		.sa6.sin6_addr = in6addr_any,
> > +		.sa6.sin6_port = htons(port),
> > +	};
> > +
> > +	return sock_l4_(c, type, &sa, ifname, 0, data);
> > +}
> > +
> >  /**
> >   * sock_unix() - Create and bind AF_UNIX socket
> >   * @sock_path:	Socket path. If empty, set on return (UNIX_SOCK_PATH as prefix)
> > diff --git a/util.h b/util.h
> > index e1a1ebc9..7f0cf686 100644
> > --- a/util.h
> > +++ b/util.h
> > @@ -203,9 +203,11 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
> >  struct ctx;
> >  union sockaddr_inany;
> >  
> > -int sock_l4_sa(const struct ctx *c, enum epoll_type type,
> > -	       const union sockaddr_inany *sa, const char *ifname,
> > -	       bool v6only, uint32_t data);
> > +int sock_l4(const struct ctx *c, enum epoll_type type,
> > +	    const union sockaddr_inany *sa, const char *ifname,
> > +	    uint32_t data);
> > +int sock_l4_dualstack(const struct ctx *c, enum epoll_type type,
> > +		      in_port_t port, const char *ifname, uint32_t data);
> >  int sock_unix(char *sock_path);
> >  void sock_probe_mem(struct ctx *c);
> >  long timespec_diff_ms(const struct timespec *a, const struct timespec *b);
> 
> -- 
> Stefano
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

