From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 6085D5A0275 for ; Fri, 8 Mar 2024 01:49:44 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1709858977; bh=4IblH6b7EJhe0GCM/GlfsV/bkrmHsD5YgGqvhDWFs7o=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=oi/QFnfdsZbn/h9noA52zNJnkSMger/WVZ3BWnyxu/VAyrUpA2nWn2VNbVTqgGjmB Nirar5to1x0mL+6kXzxn8Cf5zPduI/wL30/8odapSqtKMBlQVzM9oE3vlsKLmqra5L 8fkGwayRSTHxwwYHuaQm1fLrm81oUufEhYo8IWZELo7d5HWzXw9i1cUXLXQn52niLz eDWzGBXrzyS/kHI41FvTFB7oocke2gx+9g9ivIzfFHOSwKAdF5M6mQjZIPg4ONmy4q WGkSDXt+7Ub5hjUkW3QsAGW0MsJqiXqGFwW951mEDz+3WeSfGzIz+NYyEcN6Lexorw 5HNqyoaHXrO8w== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4TrSHP631kz4wcK; Fri, 8 Mar 2024 11:49:37 +1100 (AEDT) Date: Fri, 8 Mar 2024 11:48:55 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 1/3] icmp: Store ping socket information in flow table Message-ID: References: <20240229041534.2573559-1-david@gibson.dropbear.id.au> <20240229041534.2573559-2-david@gibson.dropbear.id.au> <20240307095315.36eeef59@elisabeth> <20240308002544.59c2c09e@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="mlGw8iMeeCm0EWtL" Content-Disposition: inline In-Reply-To: <20240308002544.59c2c09e@elisabeth> Message-ID-Hash: 3XBKECBTWXE2H5UNYHL44B3D5W2GCF6Q X-Message-ID-Hash: 3XBKECBTWXE2H5UNYHL44B3D5W2GCF6Q X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --mlGw8iMeeCm0EWtL Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Mar 08, 2024 at 12:25:44AM +0100, Stefano Brivio wrote: > On Thu, 7 Mar 2024 21:24:28 +1100 > David Gibson wrote: >=20 > > On Thu, Mar 07, 2024 at 09:53:15AM +0100, Stefano Brivio wrote: > > > Apologies for the delay, I'm still reviewing 3/3, but I have a questi= on > > > meanwhile: > > >=20 > > > On Thu, 29 Feb 2024 15:15:32 +1100 > > > David Gibson wrote: > > > =20 > > > > Currently icmp_id_map[][] stores information about ping sockets in a > > > > bespoke structure. Move the same information into new types of flow > > > > in the flow table. To match that change, replace the existing ICMP > > > > timer with a flow-based timer for expiring ping sockets. This has = the > > > > advantage that we only need to scan the active flows, not all possi= ble > > > > ids. > > > >=20 > > > > We convert icmp_id_map[][] to point to the flow table entries, rath= er > > > > than containing its own information. We do still use that array for > > > > locating the right ping flows, rather than using a "flow native" fo= rm > > > > of lookup for the time being. > > > >=20 > > > > Signed-off-by: David Gibson > > > > --- > > > > Makefile | 6 +-- > > > > flow.c | 9 ++++ > > > > flow.h | 4 ++ > > > > flow_table.h | 2 + > > > > icmp.c | 143 +++++++++++++++++++++++------------------------= ---- > > > > icmp.h | 2 +- > > > > icmp_flow.h | 31 +++++++++++ > > > > passt.c | 5 -- > > > > 8 files changed, 115 insertions(+), 87 deletions(-) > > > > create mode 100644 icmp_flow.h > > > >=20 > > > > diff --git a/Makefile b/Makefile > > > > index 2d6a5155..47fc5448 100644 > > > > --- a/Makefile > > > > +++ b/Makefile > > > > @@ -54,9 +54,9 @@ SRCS =3D $(PASST_SRCS) $(QRAP_SRCS) > > > > MANPAGES =3D passt.1 pasta.1 qrap.1 > > > > =20 > > > > PASST_HEADERS =3D arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h f= low.h fwd.h \ > > > > - flow_table.h icmp.h inany.h iov.h isolation.h lineread.h log.h nd= p.h \ > > > > - netlink.h packet.h passt.h pasta.h pcap.h pif.h siphash.h tap.h t= cp.h \ > > > > - tcp_conn.h tcp_splice.h udp.h util.h > > > > + flow_table.h icmp.h icmp_flow.h inany.h iov.h isolation.h linerea= d.h \ > > > > + log.h ndp.h netlink.h packet.h passt.h pasta.h pcap.h pif.h sipha= sh.h \ > > > > + tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h > > > > HEADERS =3D $(PASST_HEADERS) seccomp.h > > > > =20 > > > > C :=3D \#include \nstruct tcp_info x =3D { .tcpi_snd_= wnd =3D 0 }; > > > > diff --git a/flow.c b/flow.c > > > > index d7974d59..5835d6c0 100644 > > > > --- a/flow.c > > > > +++ b/flow.c > > > > @@ -21,6 +21,8 @@ const char *flow_type_str[] =3D { > > > > [FLOW_TYPE_NONE] =3D "", > > > > [FLOW_TCP] =3D "TCP connection", > > > > [FLOW_TCP_SPLICE] =3D "TCP connection (spliced)", > > > > + [FLOW_PING4] =3D "ICMP ping sequence", > > > > + [FLOW_PING6] =3D "ICMPv6 ping sequence", > > > > }; > > > > static_assert(ARRAY_SIZE(flow_type_str) =3D=3D FLOW_NUM_TYPES, > > > > "flow_type_str[] doesn't match enum flow_type"); > > > > @@ -28,6 +30,8 @@ static_assert(ARRAY_SIZE(flow_type_str) =3D=3D FL= OW_NUM_TYPES, > > > > const uint8_t flow_proto[] =3D { > > > > [FLOW_TCP] =3D IPPROTO_TCP, > > > > [FLOW_TCP_SPLICE] =3D IPPROTO_TCP, > > > > + [FLOW_PING4] =3D IPPROTO_ICMP, > > > > + [FLOW_PING6] =3D IPPROTO_ICMPV6, > > > > }; > > > > static_assert(ARRAY_SIZE(flow_proto) =3D=3D FLOW_NUM_TYPES, > > > > "flow_proto[] doesn't match enum flow_type"); > > > > @@ -294,6 +298,11 @@ void flow_defer_handler(const struct ctx *c, c= onst struct timespec *now) > > > > if (!closed && timer) > > > > tcp_splice_timer(c, flow); > > > > break; > > > > + case FLOW_PING4: > > > > + case FLOW_PING6: > > > > + if (timer) > > > > + closed =3D icmp_ping_timer(c, flow, now); > > > > + break; > > > > default: > > > > /* Assume other flow types don't need any handling */ > > > > ; > > > > diff --git a/flow.h b/flow.h > > > > index 8b66751b..c943c441 100644 > > > > --- a/flow.h > > > > +++ b/flow.h > > > > @@ -19,6 +19,10 @@ enum flow_type { > > > > FLOW_TCP, > > > > /* A TCP connection between a host socket and ns socket */ > > > > FLOW_TCP_SPLICE, > > > > + /* ICMP echo requests from guest to host and matching replies bac= k */ > > > > + FLOW_PING4, > > > > + /* ICMPv6 echo requests from guest to host and matching replies b= ack */ > > > > + FLOW_PING6, > > > > =20 > > > > FLOW_NUM_TYPES, > > > > }; > > > > diff --git a/flow_table.h b/flow_table.h > > > > index eecf8844..b7e5529a 100644 > > > > --- a/flow_table.h > > > > +++ b/flow_table.h > > > > @@ -8,6 +8,7 @@ > > > > #define FLOW_TABLE_H > > > > =20 > > > > #include "tcp_conn.h" > > > > +#include "icmp_flow.h" > > > > =20 > > > > /** > > > > * struct flow_free_cluster - Information about a cluster of free = entries > > > > @@ -33,6 +34,7 @@ union flow { > > > > struct flow_free_cluster free; > > > > struct tcp_tap_conn tcp; > > > > struct tcp_splice_conn tcp_splice; > > > > + struct icmp_ping_flow ping; > > > > }; > > > > =20 > > > > /* Global Flow Table */ > > > > diff --git a/icmp.c b/icmp.c > > > > index fb2fcafc..1caf791d 100644 > > > > --- a/icmp.c > > > > +++ b/icmp.c > > > > @@ -39,24 +39,17 @@ > > > > #include "siphash.h" > > > > #include "inany.h" > > > > #include "icmp.h" > > > > +#include "flow_table.h" > > > > =20 > > > > #define ICMP_ECHO_TIMEOUT 60 /* s, timeout for ICMP socket activit= y */ > > > > #define ICMP_NUM_IDS (1U << 16) > > > > =20 > > > > -/** > > > > - * struct icmp_id_sock - Tracking information for single ICMP echo= identifier > > > > - * @sock: Bound socket for identifier > > > > - * @seq: Last sequence number sent to tap, host order, -1: not sen= t yet > > > > - * @ts: Last associated activity from tap, seconds > > > > - */ > > > > -struct icmp_id_sock { > > > > - int sock; > > > > - int seq; > > > > - time_t ts; > > > > -}; > > > > +/* Sides of a flow as we use them for ping streams */ > > > > +#define SOCKSIDE 0 > > > > +#define TAPSIDE 1 > > > > =20 > > > > /* Indexed by ICMP echo identifier */ > > > > -static struct icmp_id_sock icmp_id_map[IP_VERSIONS][ICMP_NUM_IDS]; > > > > +static struct icmp_ping_flow *icmp_id_map[IP_VERSIONS][ICMP_NUM_ID= S]; > > > > =20 > > > > /** > > > > * icmp_sock_handler() - Handle new data from ICMP or ICMPv6 socket > > > > @@ -66,8 +59,8 @@ static struct icmp_id_sock icmp_id_map[IP_VERSION= S][ICMP_NUM_IDS]; > > > > */ > > > > void icmp_sock_handler(const struct ctx *c, sa_family_t af, union = epoll_ref ref) > > > > { > > > > - struct icmp_id_sock *const id_sock =3D af =3D=3D AF_INET > > > > - ? &icmp_id_map[V4][ref.icmp.id] : &icmp_id_map[V6][ref.icmp.id]; > > > > + struct icmp_ping_flow *pingf =3D af =3D=3D AF_INET > > > > + ? icmp_id_map[V4][ref.icmp.id] : icmp_id_map[V6][ref.icmp.id]; > > > > const char *const pname =3D af =3D=3D AF_INET ? "ICMP" : "ICMPv6"; > > > > union sockaddr_inany sr; > > > > socklen_t sl =3D sizeof(sr); > > > > @@ -78,6 +71,8 @@ void icmp_sock_handler(const struct ctx *c, sa_fa= mily_t af, union epoll_ref ref) > > > > if (c->no_icmp) > > > > return; > > > > =20 > > > > + ASSERT(pingf); > > > > + > > > > n =3D recvfrom(ref.fd, buf, sizeof(buf), 0, &sr.sa, &sl); > > > > if (n < 0) { > > > > warn("%s: recvfrom() error on ping socket: %s", > > > > @@ -112,10 +107,10 @@ void icmp_sock_handler(const struct ctx *c, s= a_family_t af, union epoll_ref ref) > > > > =20 > > > > /* In PASTA mode, we'll get any reply we send, discard them. */ > > > > if (c->mode =3D=3D MODE_PASTA) { > > > > - if (id_sock->seq =3D=3D seq) > > > > + if (pingf->seq =3D=3D seq) > > > > return; > > > > =20 > > > > - id_sock->seq =3D seq; > > > > + pingf->seq =3D seq; > > > > } > > > > =20 > > > > debug("%s: echo reply to tap, ID: %"PRIu16", seq: %"PRIu16, pname, > > > > @@ -132,16 +127,22 @@ unexpected: > > > > } > > > > =20 > > > > /** > > > > - * icmp_ping_close() - Close and clean up a ping socket > > > > + * icmp_ping_close() - Close and clean up a ping flow > > > > * @c: Execution context > > > > - * @id_sock: Socket number and other info > > > > + * @pingf: ping flow entry to close > > > > */ > > > > -static void icmp_ping_close(const struct ctx *c, struct icmp_id_so= ck *id_sock) > > > > +static void icmp_ping_close(const struct ctx *c, > > > > + const struct icmp_ping_flow *pingf) > > > > { > > > > - epoll_ctl(c->epollfd, EPOLL_CTL_DEL, id_sock->sock, NULL); > > > > - close(id_sock->sock); > > > > - id_sock->sock =3D -1; > > > > - id_sock->seq =3D -1; > > > > + uint16_t id =3D pingf->id; > > > > + > > > > + epoll_ctl(c->epollfd, EPOLL_CTL_DEL, pingf->sock, NULL); > > > > + close(pingf->sock); > > > > + > > > > + if (pingf->f.type =3D=3D FLOW_PING4) > > > > + icmp_id_map[V4][id] =3D NULL; > > > > + else > > > > + icmp_id_map[V6][id] =3D NULL; > > > > } > > > > =20 > > > > /** > > > > @@ -151,17 +152,27 @@ static void icmp_ping_close(const struct ctx = *c, struct icmp_id_sock *id_sock) > > > > * @af: Address family, AF_INET or AF_INET6 > > > > * @id: ICMP id for the new socket > > > > * > > > > - * Return: Newly opened ping socket fd, or -1 on failure > > > > + * Return: Newly opened ping flow, or NULL on failure > > > > */ > > > > -static int icmp_ping_new(const struct ctx *c, struct icmp_id_sock = *id_sock, > > > > - sa_family_t af, uint16_t id) > > > > +static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c, > > > > + struct icmp_ping_flow **id_sock, =20 > > >=20 > > > I'm not quite sure why we still need id_sock passed as parameter, and > > > what it's supposed to contain (you haven't updated the function > > > comment). =20 > >=20 > > Oops. >=20 > I can change it according to your comment below: >=20 > * @id_sock: Pointer to ping flow entry slot in icmp_id_map[] to update >=20 > ? Sounds good. > > > Now that all the information is encapsulated in the flow, and you > > > return the new flow, with a trivial change in icmp_tap_handler(), > > > couldn't we just drop id_sock here? =20 > >=20 > > No, because this is only the "partial" flow table implementation, > > lacking the address information in the common part of the flow. > > Without that, while the information on a single flow is in the flow > > table, we still need the icmp_id_map[] arrays to *find* the relevant > > flow. id_sock passed here is the relevant "slot" in those arrays, so > > we can update them to index the new flow (that's why it's a (ping_flow > > **) not a (ping_flow *)). >=20 > Ah, right, of course, I thought the caller could/should take care of > updating the slot, too, but now I understand the commit message, thanks. >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --mlGw8iMeeCm0EWtL Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmXqYHMACgkQzQJF27ox 2Gdmrg/7BN7+GvTi+jDY1YobSYG9D8vEujOJ6yXP5SYOdJTsoSOq5Ohy6xCen5DD 6JPXcCVScgAbLWy0t+B/XaLSaCdW5Kmk271+TEnu8/JfERF3bHkZh6hhGr86iUeH pgy91mm8/p9ejtp2RRqO2ODAuhb+0c/BR7UPSCgOY1TyLjCBxSQFVdxNmNA7Eb+U WCaU5N+9XbvAiTRs4ppV+ihHoCKpj7/HLbwCkYFqvQl1ntIQPLiwEamtrZnBLX2v 46YbRLVkeiYsUcZaFMiFjs6nAaLOiotLFy54y+96Mq2l2yd7vZLAJ+3kIie4MSlj i8+Me3o/t9CJ+zDoPp9pqRy0nMGcHkz3Bpv5qlNxzwBvA2p+UQv4C8spFt8bKa4q Ewnngf0ijvV9BlVOKTH77RkIn8JZgT8Tnh3BEl27A0TOFp3XuOxuDJIoa2U8wxDO IkxVQ4FCHf5wJfIDy6wDs0eUcZC3Mi1vXcND2tgd8WZH22fD812z8EJ7tC1WnIUn O5tnBBz5HcMdEMae1mLGxXi2UI8aCES+1VJSdO2/LyJWArhWsCiqvbSliSBMpNXt 01HXCgllFVL9lOVgkdLsbXzIDpAzqsdMUCAzHk2UTwgFR0v3CP+dlteDo/X55wli SLnQNne2iTbFvnHNxEpdUWOQCJbHz36ZBkmV6OWhUJdbWdZ+CcI= =NRZK -----END PGP SIGNATURE----- --mlGw8iMeeCm0EWtL--