From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id B26ED5A0275 for ; Wed, 28 Feb 2024 03:30:11 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1709087408; bh=gnMibEFScvkqogf9UTeOGcQWDNHV2P0Yij2T+nainmg=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=B5DN/bkAaTC3LRnxMUe1bH07qHJ2ebNEH85hQmWP8wq6AwIHzWxBXRU00U7wev/CD HYv+uYsnft7pFeWqNk7T5Lu5XhHXjt8IgpjSd9CIptf6EWzyjCrK420dfkUiX0dhnn xBHuWSHgwPMK79/QL9K4LouBI8QhmdnoD9q1AgBsLOjwR5f+Fbq/wJAMy009R+ChX0 1XrIs+PxUawpiJ9a9A22FwhhXA7Be4bG6lDhfH3NHv5j1YYbcxaYrTsrWX+mfBbdMR ra4AagZlwptnka3MoE9n7uNOp2pWjDyuCyeobZq+Quo65ot1IZVQmr3uM/oVP3+8sk GijRvcL4/JZaA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4TkyxX4Ycyz4wxs; Wed, 28 Feb 2024 13:30:08 +1100 (AEDT) Date: Wed, 28 Feb 2024 13:30:05 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 4/7] udp: Separate bound sockets from flags Message-ID: References: <20240221232115.1376333-1-david@gibson.dropbear.id.au> <20240221232115.1376333-5-david@gibson.dropbear.id.au> <20240227125659.6505dae9@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="HHcA7ShsvVdaMtBX" Content-Disposition: inline In-Reply-To: <20240227125659.6505dae9@elisabeth> Message-ID-Hash: 6YD4M64DD3DNNW6CBDB7O3BOPTJ2C2SL X-Message-ID-Hash: 6YD4M64DD3DNNW6CBDB7O3BOPTJ2C2SL X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --HHcA7ShsvVdaMtBX Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Feb 27, 2024 at 12:56:59PM +0100, Stefano Brivio wrote: > On Thu, 22 Feb 2024 10:21:12 +1100 > David Gibson wrote: >=20 > > struct udp_tap_port contains both a socket fd and a set of flags. But, > > confusingly, these fields are not actually related to each other at all. > >=20 > > udp_tap_map[?][port].sock contains a host UDP socket bound to 'port' (or > > -1 if we haven't opened such a socket). That is, it is indexed by the > > bound port =3D=3D the host side forwarding port =3D=3D destination for = host->guest > > packets =3D=3D source for guest->host packets. > >=20 > > udp_tap_map[?][port].flags, however, contains flags which control how we > > direct datagrams bound for 'port' from the tap interface. That is, it = is > > indexed by the destination for guest->host packets =3D=3D source for > > host->guest packets =3D=3D host side endpoint port. > >=20 > > Worse both aspects of this share the ts field used to control expiry. = When > > we update this because we've used the socket in slot 'port' we could be > > clobbering the flags timestamp for an unrelated flow of packets that > > happens to have the same endpoint port as our forwarding port. > >=20 > > Clarify this by moving the flags out into a separate array, with its own > > timestamp and expiry path. As a bonus this means that we can now use t= he > > same data structure for tracking the socket fds of "tap" and "splice" p= ath > > sockets. > >=20 > > Signed-off-by: David Gibson > > --- > > udp.c | 132 ++++++++++++++++++++++++++++++---------------------------- > > 1 file changed, 68 insertions(+), 64 deletions(-) > >=20 > > diff --git a/udp.c b/udp.c > > index 4200f849..97518d92 100644 > > --- a/udp.c > > +++ b/udp.c > > @@ -121,40 +121,39 @@ > > #define UDP_MAX_FRAMES 32 /* max # of frames to receive at once */ > > =20 > > /** > > - * struct udp_tap_port - Port tracking based on tap-facing source port > > - * @sock: Socket bound to source port used as index > > + * struct udp_port_flags - Translation options by remote Port tracking= based on tap-facing source port >=20 > I'm not sure what you mean by this -- perhaps "Address translation > options for bound source ports"? Oops, yes. I think I partially rewrote an existing comment, leaving an incoherent fusion of two different sentences. > > * @flags: Flags for local bind, loopback address/unicast address as s= ource > > - * @ts: Activity timestamp from tap, used for socket aging > > + * @ts: Activity timestamp > > */ > > -struct udp_tap_port { > > - int sock; > > - uint8_t flags; > > +struct udp_port_flags { > > #define PORT_LOCAL BIT(0) > > #define PORT_LOOPBACK BIT(1) > > #define PORT_GUA BIT(2) > > - > > + uint8_t flags; > > time_t ts; > > }; > > +static struct udp_port_flags udp_port_flags[IP_VERSIONS][NUM_PORTS]; > > =20 > > /** > > - * struct udp_splice_port - Bound socket for spliced communication > > - * @sock: Socket bound to index port > > + * struct udp_bound_port - Track bound UDP sockets > > + * @sock: Socket bound to index port (or -1) > > * @ts: Activity timestamp > > */ > > -struct udp_splice_port { > > +struct udp_bound_port { > > int sock; > > time_t ts; > > }; > > =20 > > -/* Port tracking, arrays indexed by packet source port (host order) */ > > -static struct udp_tap_port udp_tap_map [IP_VERSIONS][NUM_PORTS]; > > +/* Bound sockets indexed by bound port (host order) */ > > +static struct udp_bound_port udp_tap_sock [IP_VERSIONS][NUM_PORTS]; > > =20 > > -/* "Spliced" sockets indexed by bound port (host order) */ > > -static struct udp_splice_port udp_splice_ns [IP_VERSIONS][NUM_PORTS]; > > -static struct udp_splice_port udp_splice_init[IP_VERSIONS][NUM_PORTS]; > > +/* Bound sockets for splicing indexed by bound port (host order) */ > > +static struct udp_bound_port udp_splice_ns [IP_VERSIONS][NUM_PORTS]; > > +static struct udp_bound_port udp_splice_init [IP_VERSIONS][NUM_PORTS]; > > =20 > > enum udp_act_type { > > - UDP_ACT_TAP, > > + UDP_ACT_TAP_SOCK, > > + UDP_ACT_FLAGS, >=20 > I'm having a hard time reviewing the rest because of this, I think. > Earlier, UDP_ACT_TAP meant that the activity timestamp was observed on > a non-spliced port. Ah! I'd read those as "ACTions we needed to take" in the expiry timers, rather than tying them to specific activity. It amounts to the same thing pre-patch, but it makes more sense to me with that new lens. > Now, we have UDP_ACT_FLAGS which seems to mean that activity was seen > in the direction of (to) the "tap" side, and UDP_ACT_TAP_SOCK meaning > that activity was seen in the direction of (to) the socket. >=20 > But that's somehow the opposite of UDP_ACT_SPLICE_NS and > UDP_ACT_SPLICE_INIT -- even though they don't actually indicate > activity, rather the fact that activity timestamps refer to sockets that > originated in (*from*) the target namespace, or in the initial > namespace. >=20 > Anyway, I'm not quite sure: why do we need two separate flags for "tap" > (from/to) activity? We don't, this is an ugly side effect. The problem is that in scanning for expiries we only have a single port number. But the socket is associated with a particular forwarding port, while the flags are associated with a particular endpoint port. Without flow tracking, we don't have a way to match one to the other. The compromise here is - or is supposed to be - that we have independent timestamps for the socket and the flags. The two timestamps associated with a single flow are supposed to be updated at basically the same time - looks like I missed some things to do that as well. > I mean, as long as we observe activity on a non-spliced flow, we'll > update the related timestamp, and we make sure the activity flag is > set. Can't it simply be UDP_ACT_TAP? >=20 > The rest of the series looks good to me. >=20 > > UDP_ACT_SPLICE_NS, > > UDP_ACT_SPLICE_INIT, > > UDP_ACT_TYPE_MAX, > > @@ -246,7 +245,7 @@ void udp_portmap_clear(void) > > unsigned i; > > =20 > > for (i =3D 0; i < NUM_PORTS; i++) { > > - udp_tap_map[V4][i].sock =3D udp_tap_map[V6][i].sock =3D -1; > > + udp_tap_sock[V4][i].sock =3D udp_tap_sock[V6][i].sock =3D -1; > > udp_splice_ns[V4][i].sock =3D udp_splice_ns[V6][i].sock =3D -1; > > udp_splice_init[V4][i].sock =3D udp_splice_init[V6][i].sock =3D -1; > > } > > @@ -393,16 +392,16 @@ int udp_splice_new(const struct ctx *c, int v6, i= n_port_t src, bool ns) > > union epoll_ref ref =3D { .type =3D EPOLL_TYPE_UDP, > > .udp =3D { .splice =3D true, .v6 =3D v6, .port =3D src } > > }; > > - struct udp_splice_port *sp; > > + struct udp_bound_port *bp; > > int act, s; > > =20 > > if (ns) { > > ref.udp.pif =3D PIF_SPLICE; > > - sp =3D &udp_splice_ns[v6 ? V6 : V4][src]; > > + bp =3D &udp_splice_ns[v6 ? V6 : V4][src]; > > act =3D UDP_ACT_SPLICE_NS; > > } else { > > ref.udp.pif =3D PIF_HOST; > > - sp =3D &udp_splice_init[v6 ? V6 : V4][src]; > > + bp =3D &udp_splice_init[v6 ? V6 : V4][src]; > > act =3D UDP_ACT_SPLICE_INIT; > > } > > =20 > > @@ -437,7 +436,7 @@ int udp_splice_new(const struct ctx *c, int v6, in_= port_t src, bool ns) > > goto fail; > > } > > =20 > > - sp->sock =3D s; > > + bp->sock =3D s; > > bitmap_set(udp_act[v6 ? V6 : V4][act], src); > > =20 > > ev.data.u64 =3D ref.u64; > > @@ -601,15 +600,15 @@ static size_t udp_update_hdr4(const struct ctx *c= , int n, in_port_t dstport, > > } else if (IN4_IS_ADDR_LOOPBACK(&b->s_in.sin_addr) || > > IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr, &c->ip4.addr_seen)) { > > b->iph.saddr =3D c->ip4.gw.s_addr; > > - udp_tap_map[V4][src_port].ts =3D now->tv_sec; > > - udp_tap_map[V4][src_port].flags |=3D PORT_LOCAL; > > + udp_port_flags[V4][src_port].ts =3D now->tv_sec; > > + udp_port_flags[V4][src_port].flags |=3D PORT_LOCAL; > > =20 > > if (IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr.s_addr, &c->ip4.addr_seen)) > > - udp_tap_map[V4][src_port].flags &=3D ~PORT_LOOPBACK; > > + udp_port_flags[V4][src_port].flags &=3D ~PORT_LOOPBACK; > > else > > - udp_tap_map[V4][src_port].flags |=3D PORT_LOOPBACK; > > + udp_port_flags[V4][src_port].flags |=3D PORT_LOOPBACK; > > =20 > > - bitmap_set(udp_act[V4][UDP_ACT_TAP], src_port); > > + bitmap_set(udp_act[V4][UDP_ACT_FLAGS], src_port); > > } else { > > b->iph.saddr =3D b->s_in.sin_addr.s_addr; > > } > > @@ -664,20 +663,20 @@ static size_t udp_update_hdr6(const struct ctx *c= , int n, in_port_t dstport, > > else > > b->ip6h.saddr =3D c->ip6.addr_ll; > > =20 > > - udp_tap_map[V6][src_port].ts =3D now->tv_sec; > > - udp_tap_map[V6][src_port].flags |=3D PORT_LOCAL; > > + udp_port_flags[V6][src_port].ts =3D now->tv_sec; > > + udp_port_flags[V6][src_port].flags |=3D PORT_LOCAL; > > =20 > > if (IN6_IS_ADDR_LOOPBACK(src)) > > - udp_tap_map[V6][src_port].flags |=3D PORT_LOOPBACK; > > + udp_port_flags[V6][src_port].flags |=3D PORT_LOOPBACK; > > else > > - udp_tap_map[V6][src_port].flags &=3D ~PORT_LOOPBACK; > > + udp_port_flags[V6][src_port].flags &=3D ~PORT_LOOPBACK; > > =20 > > if (IN6_ARE_ADDR_EQUAL(src, &c->ip6.addr)) > > - udp_tap_map[V6][src_port].flags |=3D PORT_GUA; > > + udp_port_flags[V6][src_port].flags |=3D PORT_GUA; > > else > > - udp_tap_map[V6][src_port].flags &=3D ~PORT_GUA; > > + udp_port_flags[V6][src_port].flags &=3D ~PORT_GUA; > > =20 > > - bitmap_set(udp_act[V6][UDP_ACT_TAP], src_port); > > + bitmap_set(udp_act[V6][UDP_ACT_FLAGS], src_port); > > } else { > > b->ip6h.daddr =3D c->ip6.addr_seen; > > b->ip6h.saddr =3D b->s_in6.sin6_addr; > > @@ -857,16 +856,16 @@ int udp_tap_handler(struct ctx *c, uint8_t pif, > > s_in.sin_addr =3D c->ip4.dns_host; > > } else if (IN4_ARE_ADDR_EQUAL(&s_in.sin_addr, &c->ip4.gw) && > > !c->no_map_gw) { > > - if (!(udp_tap_map[V4][dst].flags & PORT_LOCAL) || > > - (udp_tap_map[V4][dst].flags & PORT_LOOPBACK)) > > + if (!(udp_port_flags[V4][dst].flags & PORT_LOCAL) || > > + (udp_port_flags[V4][dst].flags & PORT_LOOPBACK)) > > s_in.sin_addr.s_addr =3D htonl(INADDR_LOOPBACK); > > else > > s_in.sin_addr =3D c->ip4.addr_seen; > > } > > =20 > > debug("UDP from tap src=3D%hu dst=3D%hu, s=3D%d", > > - src, dst, udp_tap_map[V4][src].sock); > > - if ((s =3D udp_tap_map[V4][src].sock) < 0) { > > + src, dst, udp_tap_sock[V4][src].sock); > > + if ((s =3D udp_tap_sock[V4][src].sock) < 0) { > > struct in_addr bind_addr =3D IN4ADDR_ANY_INIT; > > union udp_epoll_ref uref =3D { > > .port =3D src, > > @@ -886,11 +885,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif, > > if (s < 0) > > return p->count - idx; > > =20 > > - udp_tap_map[V4][src].sock =3D s; > > - bitmap_set(udp_act[V4][UDP_ACT_TAP], src); > > + udp_tap_sock[V4][src].sock =3D s; > > + bitmap_set(udp_act[V4][UDP_ACT_TAP_SOCK], src); > > } > > =20 > > - udp_tap_map[V4][src].ts =3D now->tv_sec; > > + udp_tap_sock[V4][src].ts =3D now->tv_sec; > > } else { > > s_in6 =3D (struct sockaddr_in6) { > > .sin6_family =3D AF_INET6, > > @@ -907,10 +906,10 @@ int udp_tap_handler(struct ctx *c, uint8_t pif, > > s_in6.sin6_addr =3D c->ip6.dns_host; > > } else if (IN6_ARE_ADDR_EQUAL(daddr, &c->ip6.gw) && > > !c->no_map_gw) { > > - if (!(udp_tap_map[V6][dst].flags & PORT_LOCAL) || > > - (udp_tap_map[V6][dst].flags & PORT_LOOPBACK)) > > + if (!(udp_port_flags[V6][dst].flags & PORT_LOCAL) || > > + (udp_port_flags[V6][dst].flags & PORT_LOOPBACK)) > > s_in6.sin6_addr =3D in6addr_loopback; > > - else if (udp_tap_map[V6][dst].flags & PORT_GUA) > > + else if (udp_port_flags[V6][dst].flags & PORT_GUA) > > s_in6.sin6_addr =3D c->ip6.addr; > > else > > s_in6.sin6_addr =3D c->ip6.addr_seen; > > @@ -918,7 +917,7 @@ int udp_tap_handler(struct ctx *c, uint8_t pif, > > bind_addr =3D &c->ip6.addr_ll; > > } > > =20 > > - if ((s =3D udp_tap_map[V6][src].sock) < 0) { > > + if ((s =3D udp_tap_sock[V6][src].sock) < 0) { > > union udp_epoll_ref uref =3D { > > .v6 =3D 1, > > .port =3D src, > > @@ -939,11 +938,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif, > > if (s < 0) > > return p->count - idx; > > =20 > > - udp_tap_map[V6][src].sock =3D s; > > - bitmap_set(udp_act[V6][UDP_ACT_TAP], src); > > + udp_tap_sock[V6][src].sock =3D s; > > + bitmap_set(udp_act[V6][UDP_ACT_TAP_SOCK], src); > > } > > =20 > > - udp_tap_map[V6][src].ts =3D now->tv_sec; > > + udp_tap_sock[V6][src].ts =3D now->tv_sec; > > } > > =20 > > for (i =3D 0; i < (int)p->count - idx; i++) { > > @@ -1015,7 +1014,7 @@ int udp_sock_init(const struct ctx *c, int ns, sa= _family_t af, > > r4 =3D s =3D sock_l4(c, AF_INET, IPPROTO_UDP, addr, > > ifname, port, uref.u32); > > =20 > > - udp_tap_map[V4][uref.port].sock =3D s < 0 ? -1 : s; > > + udp_tap_sock[V4][uref.port].sock =3D s < 0 ? -1 : s; > > udp_splice_init[V4][port].sock =3D s < 0 ? -1 : s; > > } else { > > struct in_addr loopback =3D IN4ADDR_LOOPBACK_INIT; > > @@ -1033,7 +1032,7 @@ int udp_sock_init(const struct ctx *c, int ns, sa= _family_t af, > > r6 =3D s =3D sock_l4(c, AF_INET6, IPPROTO_UDP, addr, > > ifname, port, uref.u32); > > =20 > > - udp_tap_map[V6][uref.port].sock =3D s < 0 ? -1 : s; > > + udp_tap_sock[V6][uref.port].sock =3D s < 0 ? -1 : s; > > udp_splice_init[V6][port].sock =3D s < 0 ? -1 : s; > > } else { > > r6 =3D s =3D sock_l4(c, AF_INET6, IPPROTO_UDP, > > @@ -1086,32 +1085,37 @@ static void udp_splice_iov_init(void) > > static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type typ= e, > > in_port_t port, const struct timespec *now) > > { > > - struct udp_splice_port *sp; > > - struct udp_tap_port *tp; > > + struct udp_bound_port *bp; > > + struct udp_port_flags *pf; > > int *sockp =3D NULL; > > =20 > > switch (type) { > > - case UDP_ACT_TAP: > > - tp =3D &udp_tap_map[v6 ? V6 : V4][port]; > > + case UDP_ACT_TAP_SOCK: > > + bp =3D &udp_tap_sock[v6 ? V6 : V4][port]; > > =20 > > - if (now->tv_sec - tp->ts > UDP_CONN_TIMEOUT) { > > - sockp =3D &tp->sock; > > - tp->flags =3D 0; > > - } > > + if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT) > > + sockp =3D &bp->sock; > > + > > + break; > > + case UDP_ACT_FLAGS: > > + pf =3D &udp_port_flags[v6 ? V6 : V4][port]; > > + > > + if (now->tv_sec - pf->ts > UDP_CONN_TIMEOUT) > > + pf->flags =3D 0; > > =20 > > break; > > case UDP_ACT_SPLICE_INIT: > > - sp =3D &udp_splice_init[v6 ? V6 : V4][port]; > > + bp =3D &udp_splice_init[v6 ? V6 : V4][port]; > > =20 > > - if (now->tv_sec - sp->ts > UDP_CONN_TIMEOUT) > > - sockp =3D &sp->sock; > > + if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT) > > + sockp =3D &bp->sock; > > =20 > > break; > > case UDP_ACT_SPLICE_NS: > > - sp =3D &udp_splice_ns[v6 ? V6 : V4][port]; > > + bp =3D &udp_splice_ns[v6 ? V6 : V4][port]; > > =20 > > - if (now->tv_sec - sp->ts > UDP_CONN_TIMEOUT) > > - sockp =3D &sp->sock; > > + if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT) > > + sockp =3D &bp->sock; > > =20 > > break; > > default: > > @@ -1141,7 +1145,7 @@ static void udp_port_rebind(struct ctx *c, bool o= utbound) > > =3D outbound ? c->udp.fwd_out.f.map : c->udp.fwd_in.f.map; > > const uint8_t *rmap > > =3D outbound ? c->udp.fwd_in.f.map : c->udp.fwd_out.f.map; > > - struct udp_splice_port (*socks)[NUM_PORTS] > > + struct udp_bound_port (*socks)[NUM_PORTS] > > =3D outbound ? udp_splice_ns : udp_splice_init; > > unsigned port; > > =20 >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --HHcA7ShsvVdaMtBX Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmXemqwACgkQzQJF27ox 2GdcSQ//fuw/TnwPhSJ5VIMFoEdsYKTLMz2KkircIX2vGTFB3VRnZR/lkP5tJusX ghkB+bYOPiinvsNB0NXCw6IeFeeKhQOnCfeXgo7wkaJ24+JNBk4qkSs2A+9450Ag LHdfhcJeZtNaMkXYE6u6OE5obzPdWuLXKC/CPSv9Xp23i0xrkqBWbwFLDcLl6z/o waFYlbgPjIi+JqUKOq2yw6S+REJxLH2lbdRdHn8dykS7t9BLb+OBvW9SWZktvsmO mINcNZgF959NFwp4ElWpE0YTwMxaxg10x3BNC+jjGGHgVbPGXfFC5Pd+thvS6cf+ 0RakNRX1Mc6+FJq/UJJCkYRCdORYQlg5XazHF6aitQv541rTfXfuqDngkGUm0fh7 QIoOFeDC+fcQappHLgleVnBZE4SvjzGaxOj0X4u1cdt2byDqn4TX/DUlJ833oF6i QpY8JEiKM1q7brY7iHP4Gbc17MQVeQRjh6NnsoLABepnm5XQhIqbAFcxrEW6wsc3 bXsl+4csOwVRXY7soMHkarEB+1RYVF7uZt3FP1DNnR6p/Zk4/pqKuINCHR0elxmz Z2zOFLebcvEd59wLjZPCCQ9xmf6SwGqWrffrNUgxffTL5z7DyUHL49GrVwnfGGrI 2kwEi7e8W92sL8pe2wvwCSZk6xIKvnaKvwwJPjwY861yhCxgZsQ= =wUje -----END PGP SIGNATURE----- --HHcA7ShsvVdaMtBX--