From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTP id A7EF95A026F for ; Tue, 27 Feb 2024 12:57:39 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1709035058; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vaoNlcKlKzXOOKMjmEAsIjhjoeDoWkyA8DRO5r5pREg=; b=OF+2HStkyA9kK6EiSZgoCrRYHeQ68oAzxZEx+baGGGixhr+XZyJrSNzKLkFArtWqgg279V zAHo+zyxy6D62oYmDURUy40p6kfbMLfX+p3g9kCQvDrr1KJJTMhKpMTyZA0jAyUyOiGBGk Q+Rzq1T+jthHeSulHAWPBxhRxTEGJCo= Received: from mail-ej1-f72.google.com (mail-ej1-f72.google.com [209.85.218.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-541-YD3GbUhXM0GCzPXUxvbUaw-1; Tue, 27 Feb 2024 06:57:37 -0500 X-MC-Unique: YD3GbUhXM0GCzPXUxvbUaw-1 Received: by mail-ej1-f72.google.com with SMTP id a640c23a62f3a-a4314ed2638so124162666b.3 for ; Tue, 27 Feb 2024 03:57:36 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709035054; x=1709639854; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=vaoNlcKlKzXOOKMjmEAsIjhjoeDoWkyA8DRO5r5pREg=; b=UQnGWXFsdLQ4ShzD9rz2ha3Deg5jm8pW0Vp/nwph0Rnt1P3/5EmzA8+JM7+1e+NNgE WCVK+kQ1q9BI8jJcRkHrlR5670qnJWdrgJpJXJfdpzWQrKDU/mfkxCC8jUf9akAXabUi AaKdgyRKELWJ7iPEqPe/MO5IFabHmUGy5jmT8zbEn6tspe6NAvTbdNxJZZ8vJWv/FcJR N3LlAscTeXbzkQXBdaLW/AjC4ppcHrric/PwEqP6akBWW6tbNURfHXqqD/1uMgfo4SRt /PPUT6/FAOle/asXpWBczqr9/RlNFrQJQVKZoQen74mtmL+GLyYM/3y3KQi+w9ko62v1 oFyA== X-Gm-Message-State: AOJu0YxeH5fZJ4yyQeE37m7cFVen0P0Su3iR0MKvXAQjVNxSjiBC0T1D cHp7HHx0grb9e/ruFNhB8De8q9f7LfjDJ/J5s/7LJEXyNk69odehLwafadnRoZM5NA0Qx+B/Vls hP+qSioxDjevCQIUoc46pufzCQWo+WGLGLGh/1a+Wsel3DaivX2SDXD9wz/j7 X-Received: by 2002:a17:906:f854:b0:a43:21fb:d0db with SMTP id ks20-20020a170906f85400b00a4321fbd0dbmr4526322ejb.10.1709035054477; Tue, 27 Feb 2024 03:57:34 -0800 (PST) X-Google-Smtp-Source: AGHT+IEzFLVYszkRt9vPBPZi08BA7PcRjrWAUpz4Ee2/Q35tIFqLsFQWswu1SGM3YoN+wRLokehPmQ== X-Received: by 2002:a17:906:f854:b0:a43:21fb:d0db with SMTP id ks20-20020a170906f85400b00a4321fbd0dbmr4526313ejb.10.1709035054011; Tue, 27 Feb 2024 03:57:34 -0800 (PST) Received: from maya.cloud.tilaa.com (maya.cloud.tilaa.com. [164.138.29.33]) by smtp.gmail.com with ESMTPSA id o22-20020a1709061b1600b00a3f8cd4db25sm688235ejg.196.2024.02.27.03.57.33 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 27 Feb 2024 03:57:33 -0800 (PST) Date: Tue, 27 Feb 2024 12:56:59 +0100 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 4/7] udp: Separate bound sockets from flags Message-ID: <20240227125659.6505dae9@elisabeth> In-Reply-To: <20240221232115.1376333-5-david@gibson.dropbear.id.au> References: <20240221232115.1376333-1-david@gibson.dropbear.id.au> <20240221232115.1376333-5-david@gibson.dropbear.id.au> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.36; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: 4BM7SRWLQE3XBT7WSRB6EASXT5G5Y7SG X-Message-ID-Hash: 4BM7SRWLQE3XBT7WSRB6EASXT5G5Y7SG X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Thu, 22 Feb 2024 10:21:12 +1100 David Gibson wrote: > struct udp_tap_port contains both a socket fd and a set of flags. But, > confusingly, these fields are not actually related to each other at all. > > udp_tap_map[?][port].sock contains a host UDP socket bound to 'port' (or > -1 if we haven't opened such a socket). That is, it is indexed by the > bound port == the host side forwarding port == destination for host->guest > packets == source for guest->host packets. > > udp_tap_map[?][port].flags, however, contains flags which control how we > direct datagrams bound for 'port' from the tap interface. That is, it is > indexed by the destination for guest->host packets == source for > host->guest packets == host side endpoint port. > > Worse both aspects of this share the ts field used to control expiry. When > we update this because we've used the socket in slot 'port' we could be > clobbering the flags timestamp for an unrelated flow of packets that > happens to have the same endpoint port as our forwarding port. > > Clarify this by moving the flags out into a separate array, with its own > timestamp and expiry path. As a bonus this means that we can now use the > same data structure for tracking the socket fds of "tap" and "splice" path > sockets. > > Signed-off-by: David Gibson > --- > udp.c | 132 ++++++++++++++++++++++++++++++---------------------------- > 1 file changed, 68 insertions(+), 64 deletions(-) > > diff --git a/udp.c b/udp.c > index 4200f849..97518d92 100644 > --- a/udp.c > +++ b/udp.c > @@ -121,40 +121,39 @@ > #define UDP_MAX_FRAMES 32 /* max # of frames to receive at once */ > > /** > - * struct udp_tap_port - Port tracking based on tap-facing source port > - * @sock: Socket bound to source port used as index > + * struct udp_port_flags - Translation options by remote Port tracking based on tap-facing source port I'm not sure what you mean by this -- perhaps "Address translation options for bound source ports"? > * @flags: Flags for local bind, loopback address/unicast address as source > - * @ts: Activity timestamp from tap, used for socket aging > + * @ts: Activity timestamp > */ > -struct udp_tap_port { > - int sock; > - uint8_t flags; > +struct udp_port_flags { > #define PORT_LOCAL BIT(0) > #define PORT_LOOPBACK BIT(1) > #define PORT_GUA BIT(2) > - > + uint8_t flags; > time_t ts; > }; > +static struct udp_port_flags udp_port_flags[IP_VERSIONS][NUM_PORTS]; > > /** > - * struct udp_splice_port - Bound socket for spliced communication > - * @sock: Socket bound to index port > + * struct udp_bound_port - Track bound UDP sockets > + * @sock: Socket bound to index port (or -1) > * @ts: Activity timestamp > */ > -struct udp_splice_port { > +struct udp_bound_port { > int sock; > time_t ts; > }; > > -/* Port tracking, arrays indexed by packet source port (host order) */ > -static struct udp_tap_port udp_tap_map [IP_VERSIONS][NUM_PORTS]; > +/* Bound sockets indexed by bound port (host order) */ > +static struct udp_bound_port udp_tap_sock [IP_VERSIONS][NUM_PORTS]; > > -/* "Spliced" sockets indexed by bound port (host order) */ > -static struct udp_splice_port udp_splice_ns [IP_VERSIONS][NUM_PORTS]; > -static struct udp_splice_port udp_splice_init[IP_VERSIONS][NUM_PORTS]; > +/* Bound sockets for splicing indexed by bound port (host order) */ > +static struct udp_bound_port udp_splice_ns [IP_VERSIONS][NUM_PORTS]; > +static struct udp_bound_port udp_splice_init [IP_VERSIONS][NUM_PORTS]; > > enum udp_act_type { > - UDP_ACT_TAP, > + UDP_ACT_TAP_SOCK, > + UDP_ACT_FLAGS, I'm having a hard time reviewing the rest because of this, I think. Earlier, UDP_ACT_TAP meant that the activity timestamp was observed on a non-spliced port. Now, we have UDP_ACT_FLAGS which seems to mean that activity was seen in the direction of (to) the "tap" side, and UDP_ACT_TAP_SOCK meaning that activity was seen in the direction of (to) the socket. But that's somehow the opposite of UDP_ACT_SPLICE_NS and UDP_ACT_SPLICE_INIT -- even though they don't actually indicate activity, rather the fact that activity timestamps refer to sockets that originated in (*from*) the target namespace, or in the initial namespace. Anyway, I'm not quite sure: why do we need two separate flags for "tap" (from/to) activity? I mean, as long as we observe activity on a non-spliced flow, we'll update the related timestamp, and we make sure the activity flag is set. Can't it simply be UDP_ACT_TAP? The rest of the series looks good to me. > UDP_ACT_SPLICE_NS, > UDP_ACT_SPLICE_INIT, > UDP_ACT_TYPE_MAX, > @@ -246,7 +245,7 @@ void udp_portmap_clear(void) > unsigned i; > > for (i = 0; i < NUM_PORTS; i++) { > - udp_tap_map[V4][i].sock = udp_tap_map[V6][i].sock = -1; > + udp_tap_sock[V4][i].sock = udp_tap_sock[V6][i].sock = -1; > udp_splice_ns[V4][i].sock = udp_splice_ns[V6][i].sock = -1; > udp_splice_init[V4][i].sock = udp_splice_init[V6][i].sock = -1; > } > @@ -393,16 +392,16 @@ int udp_splice_new(const struct ctx *c, int v6, in_port_t src, bool ns) > union epoll_ref ref = { .type = EPOLL_TYPE_UDP, > .udp = { .splice = true, .v6 = v6, .port = src } > }; > - struct udp_splice_port *sp; > + struct udp_bound_port *bp; > int act, s; > > if (ns) { > ref.udp.pif = PIF_SPLICE; > - sp = &udp_splice_ns[v6 ? V6 : V4][src]; > + bp = &udp_splice_ns[v6 ? V6 : V4][src]; > act = UDP_ACT_SPLICE_NS; > } else { > ref.udp.pif = PIF_HOST; > - sp = &udp_splice_init[v6 ? V6 : V4][src]; > + bp = &udp_splice_init[v6 ? V6 : V4][src]; > act = UDP_ACT_SPLICE_INIT; > } > > @@ -437,7 +436,7 @@ int udp_splice_new(const struct ctx *c, int v6, in_port_t src, bool ns) > goto fail; > } > > - sp->sock = s; > + bp->sock = s; > bitmap_set(udp_act[v6 ? V6 : V4][act], src); > > ev.data.u64 = ref.u64; > @@ -601,15 +600,15 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport, > } else if (IN4_IS_ADDR_LOOPBACK(&b->s_in.sin_addr) || > IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr, &c->ip4.addr_seen)) { > b->iph.saddr = c->ip4.gw.s_addr; > - udp_tap_map[V4][src_port].ts = now->tv_sec; > - udp_tap_map[V4][src_port].flags |= PORT_LOCAL; > + udp_port_flags[V4][src_port].ts = now->tv_sec; > + udp_port_flags[V4][src_port].flags |= PORT_LOCAL; > > if (IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr.s_addr, &c->ip4.addr_seen)) > - udp_tap_map[V4][src_port].flags &= ~PORT_LOOPBACK; > + udp_port_flags[V4][src_port].flags &= ~PORT_LOOPBACK; > else > - udp_tap_map[V4][src_port].flags |= PORT_LOOPBACK; > + udp_port_flags[V4][src_port].flags |= PORT_LOOPBACK; > > - bitmap_set(udp_act[V4][UDP_ACT_TAP], src_port); > + bitmap_set(udp_act[V4][UDP_ACT_FLAGS], src_port); > } else { > b->iph.saddr = b->s_in.sin_addr.s_addr; > } > @@ -664,20 +663,20 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport, > else > b->ip6h.saddr = c->ip6.addr_ll; > > - udp_tap_map[V6][src_port].ts = now->tv_sec; > - udp_tap_map[V6][src_port].flags |= PORT_LOCAL; > + udp_port_flags[V6][src_port].ts = now->tv_sec; > + udp_port_flags[V6][src_port].flags |= PORT_LOCAL; > > if (IN6_IS_ADDR_LOOPBACK(src)) > - udp_tap_map[V6][src_port].flags |= PORT_LOOPBACK; > + udp_port_flags[V6][src_port].flags |= PORT_LOOPBACK; > else > - udp_tap_map[V6][src_port].flags &= ~PORT_LOOPBACK; > + udp_port_flags[V6][src_port].flags &= ~PORT_LOOPBACK; > > if (IN6_ARE_ADDR_EQUAL(src, &c->ip6.addr)) > - udp_tap_map[V6][src_port].flags |= PORT_GUA; > + udp_port_flags[V6][src_port].flags |= PORT_GUA; > else > - udp_tap_map[V6][src_port].flags &= ~PORT_GUA; > + udp_port_flags[V6][src_port].flags &= ~PORT_GUA; > > - bitmap_set(udp_act[V6][UDP_ACT_TAP], src_port); > + bitmap_set(udp_act[V6][UDP_ACT_FLAGS], src_port); > } else { > b->ip6h.daddr = c->ip6.addr_seen; > b->ip6h.saddr = b->s_in6.sin6_addr; > @@ -857,16 +856,16 @@ int udp_tap_handler(struct ctx *c, uint8_t pif, > s_in.sin_addr = c->ip4.dns_host; > } else if (IN4_ARE_ADDR_EQUAL(&s_in.sin_addr, &c->ip4.gw) && > !c->no_map_gw) { > - if (!(udp_tap_map[V4][dst].flags & PORT_LOCAL) || > - (udp_tap_map[V4][dst].flags & PORT_LOOPBACK)) > + if (!(udp_port_flags[V4][dst].flags & PORT_LOCAL) || > + (udp_port_flags[V4][dst].flags & PORT_LOOPBACK)) > s_in.sin_addr.s_addr = htonl(INADDR_LOOPBACK); > else > s_in.sin_addr = c->ip4.addr_seen; > } > > debug("UDP from tap src=%hu dst=%hu, s=%d", > - src, dst, udp_tap_map[V4][src].sock); > - if ((s = udp_tap_map[V4][src].sock) < 0) { > + src, dst, udp_tap_sock[V4][src].sock); > + if ((s = udp_tap_sock[V4][src].sock) < 0) { > struct in_addr bind_addr = IN4ADDR_ANY_INIT; > union udp_epoll_ref uref = { > .port = src, > @@ -886,11 +885,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif, > if (s < 0) > return p->count - idx; > > - udp_tap_map[V4][src].sock = s; > - bitmap_set(udp_act[V4][UDP_ACT_TAP], src); > + udp_tap_sock[V4][src].sock = s; > + bitmap_set(udp_act[V4][UDP_ACT_TAP_SOCK], src); > } > > - udp_tap_map[V4][src].ts = now->tv_sec; > + udp_tap_sock[V4][src].ts = now->tv_sec; > } else { > s_in6 = (struct sockaddr_in6) { > .sin6_family = AF_INET6, > @@ -907,10 +906,10 @@ int udp_tap_handler(struct ctx *c, uint8_t pif, > s_in6.sin6_addr = c->ip6.dns_host; > } else if (IN6_ARE_ADDR_EQUAL(daddr, &c->ip6.gw) && > !c->no_map_gw) { > - if (!(udp_tap_map[V6][dst].flags & PORT_LOCAL) || > - (udp_tap_map[V6][dst].flags & PORT_LOOPBACK)) > + if (!(udp_port_flags[V6][dst].flags & PORT_LOCAL) || > + (udp_port_flags[V6][dst].flags & PORT_LOOPBACK)) > s_in6.sin6_addr = in6addr_loopback; > - else if (udp_tap_map[V6][dst].flags & PORT_GUA) > + else if (udp_port_flags[V6][dst].flags & PORT_GUA) > s_in6.sin6_addr = c->ip6.addr; > else > s_in6.sin6_addr = c->ip6.addr_seen; > @@ -918,7 +917,7 @@ int udp_tap_handler(struct ctx *c, uint8_t pif, > bind_addr = &c->ip6.addr_ll; > } > > - if ((s = udp_tap_map[V6][src].sock) < 0) { > + if ((s = udp_tap_sock[V6][src].sock) < 0) { > union udp_epoll_ref uref = { > .v6 = 1, > .port = src, > @@ -939,11 +938,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif, > if (s < 0) > return p->count - idx; > > - udp_tap_map[V6][src].sock = s; > - bitmap_set(udp_act[V6][UDP_ACT_TAP], src); > + udp_tap_sock[V6][src].sock = s; > + bitmap_set(udp_act[V6][UDP_ACT_TAP_SOCK], src); > } > > - udp_tap_map[V6][src].ts = now->tv_sec; > + udp_tap_sock[V6][src].ts = now->tv_sec; > } > > for (i = 0; i < (int)p->count - idx; i++) { > @@ -1015,7 +1014,7 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af, > r4 = s = sock_l4(c, AF_INET, IPPROTO_UDP, addr, > ifname, port, uref.u32); > > - udp_tap_map[V4][uref.port].sock = s < 0 ? -1 : s; > + udp_tap_sock[V4][uref.port].sock = s < 0 ? -1 : s; > udp_splice_init[V4][port].sock = s < 0 ? -1 : s; > } else { > struct in_addr loopback = IN4ADDR_LOOPBACK_INIT; > @@ -1033,7 +1032,7 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af, > r6 = s = sock_l4(c, AF_INET6, IPPROTO_UDP, addr, > ifname, port, uref.u32); > > - udp_tap_map[V6][uref.port].sock = s < 0 ? -1 : s; > + udp_tap_sock[V6][uref.port].sock = s < 0 ? -1 : s; > udp_splice_init[V6][port].sock = s < 0 ? -1 : s; > } else { > r6 = s = sock_l4(c, AF_INET6, IPPROTO_UDP, > @@ -1086,32 +1085,37 @@ static void udp_splice_iov_init(void) > static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type, > in_port_t port, const struct timespec *now) > { > - struct udp_splice_port *sp; > - struct udp_tap_port *tp; > + struct udp_bound_port *bp; > + struct udp_port_flags *pf; > int *sockp = NULL; > > switch (type) { > - case UDP_ACT_TAP: > - tp = &udp_tap_map[v6 ? V6 : V4][port]; > + case UDP_ACT_TAP_SOCK: > + bp = &udp_tap_sock[v6 ? V6 : V4][port]; > > - if (now->tv_sec - tp->ts > UDP_CONN_TIMEOUT) { > - sockp = &tp->sock; > - tp->flags = 0; > - } > + if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT) > + sockp = &bp->sock; > + > + break; > + case UDP_ACT_FLAGS: > + pf = &udp_port_flags[v6 ? V6 : V4][port]; > + > + if (now->tv_sec - pf->ts > UDP_CONN_TIMEOUT) > + pf->flags = 0; > > break; > case UDP_ACT_SPLICE_INIT: > - sp = &udp_splice_init[v6 ? V6 : V4][port]; > + bp = &udp_splice_init[v6 ? V6 : V4][port]; > > - if (now->tv_sec - sp->ts > UDP_CONN_TIMEOUT) > - sockp = &sp->sock; > + if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT) > + sockp = &bp->sock; > > break; > case UDP_ACT_SPLICE_NS: > - sp = &udp_splice_ns[v6 ? V6 : V4][port]; > + bp = &udp_splice_ns[v6 ? V6 : V4][port]; > > - if (now->tv_sec - sp->ts > UDP_CONN_TIMEOUT) > - sockp = &sp->sock; > + if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT) > + sockp = &bp->sock; > > break; > default: > @@ -1141,7 +1145,7 @@ static void udp_port_rebind(struct ctx *c, bool outbound) > = outbound ? c->udp.fwd_out.f.map : c->udp.fwd_in.f.map; > const uint8_t *rmap > = outbound ? c->udp.fwd_in.f.map : c->udp.fwd_out.f.map; > - struct udp_splice_port (*socks)[NUM_PORTS] > + struct udp_bound_port (*socks)[NUM_PORTS] > = outbound ? udp_splice_ns : udp_splice_init; > unsigned port; > -- Stefano