From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v3 2/8] util, flow, pif: Simplify sock_l4_sa() interface
Date: Fri, 14 Nov 2025 10:21:46 +1100 [thread overview]
Message-ID: <aRZoCiFLcIpIADBB@zatzit> (raw)
In-Reply-To: <20251113073313.1287b4dc@elisabeth>
[-- Attachment #1: Type: text/plain, Size: 8279 bytes --]
On Thu, Nov 13, 2025 at 07:33:13AM +0100, Stefano Brivio wrote:
> On Wed, 29 Oct 2025 17:26:22 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > sock_l4_sa() has a somewhat confusing 'v6only' option controlling whether
> > to set the IPV6_V6ONLY socket option. Usually it's set when the given
> > address is IPv6, but not when we want to create a dual stack listening
> > socket. The latter only makes sense when the address is :: however.
> >
> > Clarify this by only keeping the v6only option in an internal helper
> > sock_l4_(). External users will call either sock_l4() which always creates
> > a socket bound to a specific IP version, or sock_l4_dualstack() which
> > creates a dual stack socket, but takes only a port not an address.
>
> I'm not sure if we'll ever need anything different, but I guess that
> this is not the only obvious semantic of sock_l4_dualstack(), as it
> could take a sockaddr_inany eventually, and bind() IPv6 address and its
> v4-mapped equivalent (...does that even work?).
Do you mean that if we have a v4-mapped address, then using an IPv6
"dual stack" socket will listen both for IPv4 traffic and for IPv6
traffic actually using that v4-mapped address on the wire (presumably
as a result of a router translating to a local IPv6-only network)? I
think that will work, though I haven't tested.
In that case we can determine that we need IPV6_V6ONLY from the
address. The only case that doesn't cover is if we want to listen for
v4-mapped traffic already translated by a router but *not* native IPv4
traffic. I don't see a lot of reason to ever do that, so it's in the
"refactor if we ever discover we need it" pile.
Otherwise, the only case in which a single dual stack socket actually
listens to traffic from both protocols is for a wildcard. Maybe there
are obscure wildcard addresses other than :: / 0.0.0.0, but that's
also in the "worry about it later" pile.
Note that:
https://github.com/containers/podman/pull/14026/commits/772ead25318dfa340541197e92322bd2346df087
implies some sort of dual stack localhost support (it treats "dual
stack" ::1 as listening on both ::1 and 127.0.0.1). However, AFAICT
that's just not correct. On Linux, listening on ::1 listens only on
::1 even with V6ONLY explicitly set to 0.
> > We drop the '_sa' suffix while we're at it - it exists because this used
> > to be an internal version with a sock_l4() wrapper. The wrapper no longer
> > exists so the '_sa' is no longer useful.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > flow.c | 6 ++----
> > pif.c | 10 +++-------
> > util.c | 27 +++++++++++++++++++++++----
> > util.h | 8 +++++---
> > 4 files changed, 33 insertions(+), 18 deletions(-)
> >
> > diff --git a/flow.c b/flow.c
> > index 9926f408..fd530ddb 100644
> > --- a/flow.c
> > +++ b/flow.c
> > @@ -186,8 +186,7 @@ static int flowside_sock_splice(void *arg)
> >
> > ns_enter(a->c);
> >
> > - a->fd = sock_l4_sa(a->c, a->type, a->sa, NULL,
> > - a->sa->sa_family == AF_INET6, a->data);
> > + a->fd = sock_l4(a->c, a->type, a->sa, NULL, a->data);
> > a->err = errno;
> >
> > return 0;
> > @@ -222,8 +221,7 @@ int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
> > else if (sa.sa_family == AF_INET6)
> > ifname = c->ip6.ifname_out;
> >
> > - return sock_l4_sa(c, type, &sa, ifname,
> > - sa.sa_family == AF_INET6, data);
> > + return sock_l4(c, type, &sa, ifname, data);
> >
> > case PIF_SPLICE: {
> > struct flowside_sock_args args = {
> > diff --git a/pif.c b/pif.c
> > index 31723b29..5fb1f455 100644
> > --- a/pif.c
> > +++ b/pif.c
> > @@ -75,11 +75,7 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
> > const union inany_addr *addr, const char *ifname,
> > in_port_t port, uint32_t data)
> > {
> > - union sockaddr_inany sa = {
> > - .sa6.sin6_family = AF_INET6,
> > - .sa6.sin6_addr = in6addr_any,
> > - .sa6.sin6_port = htons(port),
> > - };
> > + union sockaddr_inany sa;
> >
> > ASSERT(pif_is_socket(pif));
> >
> > @@ -90,8 +86,8 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
> > }
> >
> > if (!addr)
> > - return sock_l4_sa(c, type, &sa, ifname, false, data);
> > + return sock_l4_dualstack(c, type, port, ifname, data);
> >
> > pif_sockaddr(c, &sa, pif, addr, port);
> > - return sock_l4_sa(c, type, &sa, ifname, sa.sa_family == AF_INET6, data);
> > + return sock_l4(c, type, &sa, ifname, data);
> > }
> > diff --git a/util.c b/util.c
> > index 976fcabe..c94efae4 100644
> > --- a/util.c
> > +++ b/util.c
> > @@ -40,7 +40,7 @@
> > #endif
> >
> > /**
> > - * sock_l4_sa() - Create and bind socket to socket address, add to epoll list
> > + * sock_l4_() - Create and bind socket to socket address, add to epoll list
> > * @c: Execution context
> > * @type: epoll type
> > * @sa: Socket address to bind to
> > @@ -50,9 +50,9 @@
> > *
> > * Return: newly created socket, negative error code on failure
> > */
> > -int sock_l4_sa(const struct ctx *c, enum epoll_type type,
> > - const union sockaddr_inany *sa, const char *ifname,
> > - bool v6only, uint32_t data)
> > +static int sock_l4_(const struct ctx *c, enum epoll_type type,
> > + const union sockaddr_inany *sa, const char *ifname,
> > + bool v6only, uint32_t data)
> > {
> > sa_family_t af = sa->sa_family;
> > union epoll_ref ref = { .type = type, .data = data };
> > @@ -182,6 +182,25 @@ int sock_l4_sa(const struct ctx *c, enum epoll_type type,
> > return fd;
> > }
> >
> > +int sock_l4(const struct ctx *c, enum epoll_type type,
> > + const union sockaddr_inany *sa, const char *ifname,
> > + uint32_t data)
>
> Not extremely useful but it saves one "lookup":
>
> /**
> * sock_l4() - Create and bind socket to given address, add to epoll list
> * @c: Execution context
> * @type: epoll type
> * @sa: Socket address to bind to
> * @ifname: Interface for binding, NULL for any
> *
> * Return: newly created socket, negative error code on failure
> */
Oops, I meant to go back and add function comments here, but I
obviously forgot. Fixed.
While there I removed the "add to epoll list" which is no longer
correct.
> > +{
> > + return sock_l4_(c, type, sa, ifname, sa->sa_family == AF_INET6, data);
> > +}
> > +
> > +int sock_l4_dualstack(const struct ctx *c, enum epoll_type type,
> > + in_port_t port, const char *ifname, uint32_t data)
>
> ...same here, and the comment might be used to clarify the
> functionality.
Done.
>
> > +{
> > + union sockaddr_inany sa = {
> > + .sa6.sin6_family = AF_INET6,
> > + .sa6.sin6_addr = in6addr_any,
> > + .sa6.sin6_port = htons(port),
> > + };
> > +
> > + return sock_l4_(c, type, &sa, ifname, 0, data);
> > +}
> > +
> > /**
> > * sock_unix() - Create and bind AF_UNIX socket
> > * @sock_path: Socket path. If empty, set on return (UNIX_SOCK_PATH as prefix)
> > diff --git a/util.h b/util.h
> > index e1a1ebc9..7f0cf686 100644
> > --- a/util.h
> > +++ b/util.h
> > @@ -203,9 +203,11 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
> > struct ctx;
> > union sockaddr_inany;
> >
> > -int sock_l4_sa(const struct ctx *c, enum epoll_type type,
> > - const union sockaddr_inany *sa, const char *ifname,
> > - bool v6only, uint32_t data);
> > +int sock_l4(const struct ctx *c, enum epoll_type type,
> > + const union sockaddr_inany *sa, const char *ifname,
> > + uint32_t data);
> > +int sock_l4_dualstack(const struct ctx *c, enum epoll_type type,
> > + in_port_t port, const char *ifname, uint32_t data);
> > int sock_unix(char *sock_path);
> > void sock_probe_mem(struct ctx *c);
> > long timespec_diff_ms(const struct timespec *a, const struct timespec *b);
>
> --
> Stefano
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2025-11-14 0:26 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-29 6:26 [PATCH v3 0/8] Reduce differences between inbound and outbound socket binding David Gibson
2025-10-29 6:26 ` [PATCH v3 1/8] inany: Let length of sockaddr_inany be implicit from the family David Gibson
2025-11-13 6:33 ` Stefano Brivio
2025-11-13 22:53 ` David Gibson
2025-10-29 6:26 ` [PATCH v3 2/8] util, flow, pif: Simplify sock_l4_sa() interface David Gibson
2025-11-13 6:33 ` Stefano Brivio
2025-11-13 23:21 ` David Gibson [this message]
2025-11-18 0:19 ` Stefano Brivio
2025-11-18 3:34 ` David Gibson
2025-11-19 11:42 ` Stefano Brivio
2025-11-20 0:05 ` David Gibson
2025-11-20 2:22 ` David Gibson
2025-10-29 6:26 ` [PATCH v3 3/8] tcp: Merge tcp_ns_sock_init[46]() into tcp_sock_init_one() David Gibson
2025-10-29 6:26 ` [PATCH v3 4/8] udp: Unify some more inbound/outbound parts of udp_sock_init() David Gibson
2025-11-13 6:33 ` Stefano Brivio
2025-11-13 23:33 ` David Gibson
2025-10-29 6:26 ` [PATCH v3 5/8] udp: Move udp_sock_init() special case to its caller David Gibson
2025-10-29 6:26 ` [PATCH v3 6/8] util: Fix setting of IPV6_V6ONLY socket option David Gibson
2025-11-13 6:33 ` Stefano Brivio
2025-11-14 0:24 ` David Gibson
2025-11-18 0:19 ` Stefano Brivio
2025-10-29 6:26 ` [PATCH v3 7/8] tcp, udp: Remove fallback if creating dual stack socket fails David Gibson
2025-10-29 6:26 ` [PATCH v3 8/8] [RFC, DO NOT APPLY] tcp, udp: Bind outbound listening sockets by interface instead of address David Gibson
2025-10-30 3:58 ` [PATCH v3 0/8] Reduce differences between inbound and outbound socket binding David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aRZoCiFLcIpIADBB@zatzit \
--to=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).