From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id D17A85A026F for ; Mon, 18 Mar 2024 06:47:47 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1710740860; bh=tGmuO4/4l8kO7nNVICozKL32YyJRryd1Dkc1fTJg+54=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=EhKmfhIoJGoBNnO0fz+5dfPYhnuxY5XGi/j7NE7A7ooPY/hdtCZcMICggi2d5TG1s v7U4D7N8xs0d5swBLIbX87VPLWD69QWgymCbQe1Tp4zXrJ0IE33O1mCnm3ZrU6DnCT TwLAeeZsuXwD6RfyuUqmvyw6mTzCUGqzzfkJiV//5IdHaYu5/10Hw0iKS5MWInSGXN j7Fb96efOumJ08dZ6BA5p0WtbXgpej7Te38ZqTsRPdaPaeZQyTNGDzAhnSFptIqrB9 qhAXHHDRvp6U5ryv0Uw+6M42txUqr6Nm/5z6JNvz6lKQm79f3EKVJKZdA9kIsTRn2+ /Gp+UhFaVbu3w== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4TykQh18wKz4wck; Mon, 18 Mar 2024 16:47:40 +1100 (AEDT) Date: Mon, 18 Mar 2024 16:47:34 +1100 From: David Gibson To: Laurent Vivier Subject: Re: [RFC v2] tcp: Replace TCP buffer structure by an iovec array Message-ID: References: <20240315175119.1618550-1-lvivier@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="0EPG35Ou21ymlSNW" Content-Disposition: inline In-Reply-To: <20240315175119.1618550-1-lvivier@redhat.com> Message-ID-Hash: QLEYM2PKDBDPS5BCHPRQ4IKCGG4QUV5I X-Message-ID-Hash: QLEYM2PKDBDPS5BCHPRQ4IKCGG4QUV5I X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --0EPG35Ou21ymlSNW Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Mar 15, 2024 at 06:51:19PM +0100, Laurent Vivier wrote: > To be able to provide pointers to TCP headers and IP headers without > worrying about alignment in the structure, split the structure into > several arrays and point to each part of the frame using an iovec array. >=20 > Using iovec also allows us to simply ignore the first entry when the > vnet length header is not needed. >=20 > Signed-off-by: Laurent Vivier > --- >=20 > Notes: > v2: > - rebased on top of David's series > "Some improvements to the tap send path" Thanks, and sorry for the disruption. > - no performance improvement anymore Well, that's unfortunate. I think we need to drill down and find out what actually made the difference here, since it's a pretty big change =66rom something rather subtle. > - remove the iovec functions of v1 introduced in tap.c to use new > functions from David > - don't use an array of array of iovec Huh. Is there a specific reason for that? I actually changed *to* using a 2d array in my in-progress patches which do something similar in the UDP code, based on your example. > - fix comments > - re-introduce MSS4 and MSS6 > - address comments from David and Stefano > =20 > I have tested only passt, not pasta Is there a reason for that? If you're having trouble running the testsuite, we really need to fix that... >=20 > tap.c | 3 +- > tcp.c | 479 ++++++++++++++++++++++++++++------------------------------ > 2 files changed, 233 insertions(+), 249 deletions(-) >=20 > diff --git a/tap.c b/tap.c > index 13e4da79d690..605b472b5ca2 100644 > --- a/tap.c > +++ b/tap.c > @@ -424,8 +424,7 @@ size_t tap_send_frames(const struct ctx *c, const str= uct iovec *iov, > debug("tap: failed to send %zu frames of %zu", > nframes - m, nframes); > =20 > - pcap_multiple(iov, bufs_per_frame, m, > - c->mode =3D=3D MODE_PASST ? sizeof(uint32_t) : 0); > + pcap_multiple(iov, bufs_per_frame, m, 0); If you're no longer changing the semantics of tap_send_frames() from those in my patch, I don't think you want this change any more either. > return m; > } > diff --git a/tcp.c b/tcp.c > index a1860d10b15f..d5c09dbf53ab 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -318,39 +318,8 @@ > =20 > /* MSS rounding: see SET_MSS() */ > #define MSS_DEFAULT 536 > - > -struct tcp4_l2_head { /* For MSS4 macro: keep in sync with tcp4_l2_buf_t= */ > -#ifdef __AVX2__ > - uint8_t pad[26]; > -#else > - uint8_t pad[2]; > -#endif > - struct tap_hdr taph; > - struct iphdr iph; > - struct tcphdr th; > -#ifdef __AVX2__ > -} __attribute__ ((packed, aligned(32))); > -#else > -} __attribute__ ((packed, aligned(__alignof__(unsigned int)))); > -#endif > - > -struct tcp6_l2_head { /* For MSS6 macro: keep in sync with tcp6_l2_buf_t= */ > -#ifdef __AVX2__ > - uint8_t pad[14]; > -#else > - uint8_t pad[2]; > -#endif > - struct tap_hdr taph; > - struct ipv6hdr ip6h; > - struct tcphdr th; > -#ifdef __AVX2__ > -} __attribute__ ((packed, aligned(32))); > -#else > -} __attribute__ ((packed, aligned(__alignof__(unsigned int)))); > -#endif > - > -#define MSS4 ROUND_DOWN(USHRT_MAX - sizeof(struct tcp4_l2_head), 4) > -#define MSS6 ROUND_DOWN(USHRT_MAX - sizeof(struct tcp6_l2_head), 4) > +#define MSS4 ROUND_DOWN(ETH_MAX_MTU - ETH_HLEN - sizeof(struct tcphdr= ) - sizeof(struct iphdr), sizeof(uint32_t)) > +#define MSS6 ROUND_DOWN(ETH_MAX_MTU - ETH_HLEN - sizeof(struct tcphdr= ) - sizeof(struct ipv6hdr), sizeof(uint32_t)) I'd be inclined to base these static constants on only the IP level restriction (16-bit L3 packet length), i.e. replace (ETH_MAX_MTU - ETH_HLEN) with USHRT_MAX (or a new L3_MAX_LENGTH constant with the same value). The L2 MTU restriction, ought to be applied at runtime, to properly handle the --mtu option. > #define WINDOW_DEFAULT 14600 /* RFC 6928 */ > #ifdef HAS_SND_WND > @@ -445,133 +414,111 @@ struct tcp_buf_seq_update { > }; > =20 > /* Static buffers */ > - > /** > - * tcp4_l2_buf_t - Pre-cooked IPv4 packet buffers for tap connections > - * @pad: Align TCP header to 32 bytes, for AVX2 checksum calculation only > - * @taph: Tap-level headers (partially pre-filled) > - * @iph: Pre-filled IP header (except for tot_len and saddr) > - * @uh: Headroom for TCP header > - * @data: Storage for TCP payload > + * struct tcp_l2_payload_t - TCP header and data to send data Pre-existing problem, but without a typedef, I don't think the _t suffix is useful. I also don't think the "l2" really adds any useful information here. > + * 32 bytes aligned to be able to use AVX2 checksum > + * @th: TCP header > + * @data: TCP data > */ > -static struct tcp4_l2_buf_t { > -#ifdef __AVX2__ > - uint8_t pad[26]; /* 0, align th to 32 bytes */ > -#else > - uint8_t pad[2]; /* align iph to 4 bytes 0 */ > -#endif > - struct tap_hdr taph; /* 26 2 */ > - struct iphdr iph; /* 44 20 */ > - struct tcphdr th; /* 64 40 */ > - uint8_t data[MSS4]; /* 84 60 */ > - /* 65536 65532 */ > +struct tcp_l2_payload_t { > + struct tcphdr th; > + uint8_t data[65536 - sizeof(struct tcphdr)]; > #ifdef __AVX2__ > -} __attribute__ ((packed, aligned(32))) > +} __attribute__ ((packed, aligned(32))); > #else > -} __attribute__ ((packed, aligned(__alignof__(unsigned int)))) > +} __attribute__ ((packed, aligned(__alignof__(unsigned int)))); > #endif > -tcp4_l2_buf[TCP_FRAMES_MEM]; > =20 > -static struct tcp_buf_seq_update tcp4_l2_buf_seq_update[TCP_FRAMES_MEM]; > - > -static unsigned int tcp4_l2_buf_used; > +static_assert(MSS4 <=3D sizeof(((struct tcp_l2_payload_t *)0)->data)); > +static_assert(MSS6 <=3D sizeof(((struct tcp_l2_payload_t *)0)->data)); I think you could avoid the (safe but ugly) explicit 0-dereferences by writing the static_assert()s in terms of the actual global variables once you've defined them, rather than purely the types. > =20 > /** > - * tcp6_l2_buf_t - Pre-cooked IPv6 packet buffers for tap connections > - * @pad: Align IPv6 header for checksum calculation to 32B (AVX2) or 4B > - * @taph: Tap-level headers (partially pre-filled) > - * @ip6h: Pre-filled IP header (except for payload_len and addresses) > - * @th: Headroom for TCP header > - * @data: Storage for TCP payload > + * struct tcp_l2_flags_t - TCP header and data to send option flags Again, I don't think the "l2" really says anything useful. It was a bit unclear what it referred to in the original, but it's even more out of place now that the structure doesn't include L2 headers. > + * @th: TCP header > + * @opts TCP option flags > */ > -struct tcp6_l2_buf_t { > -#ifdef __AVX2__ > - uint8_t pad[14]; /* 0 align ip6h to 32 bytes */ > -#else > - uint8_t pad[2]; /* align ip6h to 4 bytes 0 */ > -#endif > - struct tap_hdr taph; /* 14 2 */ > - struct ipv6hdr ip6h; /* 32 20 */ > - struct tcphdr th; /* 72 60 */ > - uint8_t data[MSS6]; /* 92 80 */ > - /* 65536 65532 */ > +struct tcp_l2_flags_t { > + struct tcphdr th; > + char opts[OPT_MSS_LEN + OPT_WS_LEN + 1]; > #ifdef __AVX2__ > -} __attribute__ ((packed, aligned(32))) > +} __attribute__ ((packed, aligned(32))); > #else > -} __attribute__ ((packed, aligned(__alignof__(unsigned int)))) > +} __attribute__ ((packed, aligned(__alignof__(unsigned int)))); > #endif > -tcp6_l2_buf[TCP_FRAMES_MEM]; > =20 > -static struct tcp_buf_seq_update tcp6_l2_buf_seq_update[TCP_FRAMES_MEM]; > +static uint32_t tcp4_vnet_len [TCP_FRAMES_MEM]; > +/* Ethernet header for IPv4 frames */ > +static struct ethhdr tcp4_eth_src; > + > +/* IPv4 headers */ > +static struct iphdr tcp4_l2_ip[TCP_FRAMES_MEM]; > +/* TCP headers and data for IPv4 frames */ > +static struct tcp_l2_payload_t tcp4_l2_payload[TCP_FRAMES_MEM]; > =20 > +static struct tcp_buf_seq_update tcp4_l2_buf_seq_update[TCP_FRAMES_MEM]; > +static unsigned int tcp4_l2_buf_used; > + > +static uint32_t tcp4_flags_vnet_len [TCP_FRAMES_MEM]; > +/* IPv4 headers for TCP option flags frames */ > +static struct iphdr tcp4_l2_flags_ip[TCP_FRAMES_MEM]; > +/* TCP headers and option flags for IPv4 frames */ > +static struct tcp_l2_flags_t tcp4_l2_flags[TCP_FRAMES_MEM]; > + > +static unsigned int tcp4_l2_flags_buf_used; > + > +static uint32_t tcp6_vnet_len [TCP_FRAMES_MEM]; > +/* Ethernet header for IPv6 frames */ > +static struct ethhdr tcp6_eth_src; > + > +/* IPv6 headers */ > +static struct ipv6hdr tcp6_l2_ip[TCP_FRAMES_MEM]; > +/* TCP headers and data for IPv6 frames */ > +static struct tcp_l2_payload_t tcp6_l2_payload[TCP_FRAMES_MEM]; The v6 vnet_len, ethhdr and payload arrays can be shared with IPv4, but that can and IMO should be a follow on patch. > +static struct tcp_buf_seq_update tcp6_l2_buf_seq_update[TCP_FRAMES_MEM]; > static unsigned int tcp6_l2_buf_used; > =20 > +static uint32_t tcp6_flags_vnet_len [TCP_FRAMES_MEM]; > +/* IPv6 headers for TCP option flags frames */ > +static struct ipv6hdr tcp6_l2_flags_ip[TCP_FRAMES_MEM]; > +/* TCP headers and option flags for IPv6 frames */ > +static struct tcp_l2_flags_t tcp6_l2_flags[TCP_FRAMES_MEM]; > + > +static unsigned int tcp6_l2_flags_buf_used; > + > /* recvmsg()/sendmsg() data for tap */ > static char tcp_buf_discard [MAX_WINDOW]; > static struct iovec iov_sock [TCP_FRAMES_MEM + 1]; > =20 > -static struct iovec tcp4_l2_iov [TCP_FRAMES_MEM]; > -static struct iovec tcp6_l2_iov [TCP_FRAMES_MEM]; > -static struct iovec tcp4_l2_flags_iov [TCP_FRAMES_MEM]; > -static struct iovec tcp6_l2_flags_iov [TCP_FRAMES_MEM]; > +/* > + * enum tcp_iov_parts - I/O vector parts for one TCP frame > + * TCP_IOV_TAP TAP header > + * TCP_IOV_ETH ethernet header > + * TCP_IOV_IP IP (v4/v6) header > + * TCP_IOV_PAYLOAD IP payload (TCP header + data) > + * TCP_IOV_NUM is the number of entries in the iovec array If we have these descriptions adjacent to the actual values, I don't think we need to duplicate them in the overall enum comment. > + */ > +enum tcp_iov_parts { > + /* TAP header */ > + TCP_IOV_TAP =3D 0, > + /* ethernet header */ > + TCP_IOV_ETH =3D 1, > + /* IP (v4/v6) header */ > + TCP_IOV_IP =3D 2, > + /* IP payload (TCP header + data) */ > + TCP_IOV_PAYLOAD =3D 3, > + /* number of entries in the iovec array */ > + TCP_IOV_NUM I think our convention has been to use 'FOO_NUM_XS' rather than 'FOO_X_NUM' for constants like this, to distinguish it a little visually from the individual enum values. (e.g. PIF_NUM_TYPES, FLOW_NUM_TYPES). > +}; > + > +static struct iovec tcp4_l2_iov [TCP_FRAMES_MEM * TCP_IOV_NUM]; > +static struct iovec tcp6_l2_iov [TCP_FRAMES_MEM * TCP_IOV_NUM]; > +static struct iovec tcp4_l2_flags_iov [TCP_FRAMES_MEM * TCP_IOV_NUM]; > +static struct iovec tcp6_l2_flags_iov [TCP_FRAMES_MEM * TCP_IOV_NUM]; > =20 > /* sendmsg() to socket */ > static struct iovec tcp_iov [UIO_MAXIOV]; > =20 > -/** > - * tcp4_l2_flags_buf_t - IPv4 packet buffers for segments without data (= flags) > - * @pad: Align TCP header to 32 bytes, for AVX2 checksum calculation only > - * @taph: Tap-level headers (partially pre-filled) > - * @iph: Pre-filled IP header (except for tot_len and saddr) > - * @th: Headroom for TCP header > - * @opts: Headroom for TCP options > - */ > -static struct tcp4_l2_flags_buf_t { > -#ifdef __AVX2__ > - uint8_t pad[26]; /* 0, align th to 32 bytes */ > -#else > - uint8_t pad[2]; /* align iph to 4 bytes 0 */ > -#endif > - struct tap_hdr taph; /* 26 2 */ > - struct iphdr iph; /* 44 20 */ > - struct tcphdr th; /* 64 40 */ > - char opts[OPT_MSS_LEN + OPT_WS_LEN + 1]; > -#ifdef __AVX2__ > -} __attribute__ ((packed, aligned(32))) > -#else > -} __attribute__ ((packed, aligned(__alignof__(unsigned int)))) > -#endif > -tcp4_l2_flags_buf[TCP_FRAMES_MEM]; > - > -static unsigned int tcp4_l2_flags_buf_used; > - > -/** > - * tcp6_l2_flags_buf_t - IPv6 packet buffers for segments without data (= flags) > - * @pad: Align IPv6 header for checksum calculation to 32B (AVX2) or 4B > - * @taph: Tap-level headers (partially pre-filled) > - * @ip6h: Pre-filled IP header (except for payload_len and addresses) > - * @th: Headroom for TCP header > - * @opts: Headroom for TCP options > - */ > -static struct tcp6_l2_flags_buf_t { > -#ifdef __AVX2__ > - uint8_t pad[14]; /* 0 align ip6h to 32 bytes */ > -#else > - uint8_t pad[2]; /* align ip6h to 4 bytes 0 */ > -#endif > - struct tap_hdr taph; /* 14 2 */ > - struct ipv6hdr ip6h; /* 32 20 */ > - struct tcphdr th /* 72 */ __attribute__ ((aligned(4))); /* 60 */ > - char opts[OPT_MSS_LEN + OPT_WS_LEN + 1]; > -#ifdef __AVX2__ > -} __attribute__ ((packed, aligned(32))) > -#else > -} __attribute__ ((packed, aligned(__alignof__(unsigned int)))) > -#endif > -tcp6_l2_flags_buf[TCP_FRAMES_MEM]; > - > -static unsigned int tcp6_l2_flags_buf_used; > - > #define CONN(idx) (&(FLOW(idx)->tcp)) > =20 > /* Table for lookup from remote address, local port, remote port */ > @@ -967,25 +914,14 @@ static void tcp_update_check_tcp6(struct ipv6hdr *i= p6h, struct tcphdr *th) > } > =20 > /** > - * tcp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addres= ses > + * tcp_update_l2_buf() - Update ethernet header buffers with addresses > * @eth_d: Ethernet destination address, NULL if unchanged > * @eth_s: Ethernet source address, NULL if unchanged > */ > void tcp_update_l2_buf(const unsigned char *eth_d, const unsigned char *= eth_s) > { > - int i; > - > - for (i =3D 0; i < TCP_FRAMES_MEM; i++) { > - struct tcp4_l2_flags_buf_t *b4f =3D &tcp4_l2_flags_buf[i]; > - struct tcp6_l2_flags_buf_t *b6f =3D &tcp6_l2_flags_buf[i]; > - struct tcp4_l2_buf_t *b4 =3D &tcp4_l2_buf[i]; > - struct tcp6_l2_buf_t *b6 =3D &tcp6_l2_buf[i]; > - > - eth_update_mac(&b4->taph.eh, eth_d, eth_s); > - eth_update_mac(&b6->taph.eh, eth_d, eth_s); > - eth_update_mac(&b4f->taph.eh, eth_d, eth_s); > - eth_update_mac(&b6f->taph.eh, eth_d, eth_s); > - } > + eth_update_mac(&tcp4_eth_src, eth_d, eth_s); > + eth_update_mac(&tcp6_eth_src, eth_d, eth_s); > } > =20 > /** > @@ -995,29 +931,42 @@ void tcp_update_l2_buf(const unsigned char *eth_d, = const unsigned char *eth_s) > static void tcp_sock4_iov_init(const struct ctx *c) > { > struct iphdr iph =3D L2_BUF_IP4_INIT(IPPROTO_TCP); > - struct iovec *iov; > int i; > =20 > - for (i =3D 0; i < ARRAY_SIZE(tcp4_l2_buf); i++) { > - tcp4_l2_buf[i] =3D (struct tcp4_l2_buf_t) { > - .taph =3D TAP_HDR_INIT(ETH_P_IP), > - .iph =3D iph, > - .th =3D { .doff =3D sizeof(struct tcphdr) / 4, .ack =3D 1 } > - }; > - } > + tcp4_eth_src.h_proto =3D htons_constant(ETH_P_IP); > + for (i =3D 0; i < TCP_FRAMES_MEM; i++) { > + struct iovec *iov; > =20 > - for (i =3D 0; i < ARRAY_SIZE(tcp4_l2_flags_buf); i++) { > - tcp4_l2_flags_buf[i] =3D (struct tcp4_l2_flags_buf_t) { > - .taph =3D TAP_HDR_INIT(ETH_P_IP), > - .iph =3D L2_BUF_IP4_INIT(IPPROTO_TCP) > - }; > - } > + /* headers */ > + tcp4_l2_ip[i] =3D iph; > + tcp4_l2_payload[i].th.doff =3D sizeof(struct tcphdr) / 4; > + tcp4_l2_payload[i].th.ack =3D 1; > =20 > - for (i =3D 0, iov =3D tcp4_l2_iov; i < TCP_FRAMES_MEM; i++, iov++) > - iov->iov_base =3D tap_frame_base(c, &tcp4_l2_buf[i].taph); > + tcp4_l2_flags_ip[i] =3D iph; > + tcp4_l2_flags[i].th.doff =3D sizeof(struct tcphdr) / 4; > + tcp4_l2_flags[i].th.ack =3D 1; > =20 > - for (i =3D 0, iov =3D tcp4_l2_flags_iov; i < TCP_FRAMES_MEM; i++, iov++) > - iov->iov_base =3D tap_frame_base(c, &tcp4_l2_flags_buf[i].taph); > + /* iovecs */ > + iov =3D &tcp4_l2_iov[i * TCP_IOV_NUM]; > + iov[TCP_IOV_TAP].iov_base =3D &tcp4_vnet_len[i]; > + iov[TCP_IOV_TAP].iov_len =3D c->mode =3D=3D MODE_PASST ? > + sizeof(uint32_t) : 0; > + iov[TCP_IOV_ETH].iov_base =3D &tcp4_eth_src; > + iov[TCP_IOV_ETH].iov_len =3D sizeof(struct ethhdr); I'd prefer to see ..base =3D &x; ..len =3D sizeof(x); rather than repeating the type name. > + iov[TCP_IOV_IP].iov_base =3D &tcp4_l2_ip[i]; > + iov[TCP_IOV_IP].iov_len =3D sizeof(struct iphdr); > + iov[TCP_IOV_PAYLOAD].iov_base =3D &tcp4_l2_payload[i]; > + > + iov =3D &tcp4_l2_flags_iov[i * TCP_IOV_NUM]; > + iov[TCP_IOV_TAP].iov_base =3D &tcp4_flags_vnet_len[i]; > + iov[TCP_IOV_TAP].iov_len =3D c->mode =3D=3D MODE_PASST ? > + sizeof(uint32_t) : 0; > + iov[TCP_IOV_ETH].iov_base =3D &tcp4_eth_src; > + iov[TCP_IOV_ETH].iov_len =3D sizeof(struct ethhdr); > + iov[TCP_IOV_IP].iov_base =3D &tcp4_l2_flags_ip[i]; > + iov[TCP_IOV_IP].iov_len =3D sizeof(struct iphdr); > + iov[TCP_IOV_PAYLOAD].iov_base =3D &tcp4_l2_flags[i]; > + } > } > =20 > /** > @@ -1026,29 +975,43 @@ static void tcp_sock4_iov_init(const struct ctx *c) > */ > static void tcp_sock6_iov_init(const struct ctx *c) > { > - struct iovec *iov; > + struct ipv6hdr ip6 =3D L2_BUF_IP6_INIT(IPPROTO_TCP); > int i; > =20 > - for (i =3D 0; i < ARRAY_SIZE(tcp6_l2_buf); i++) { > - tcp6_l2_buf[i] =3D (struct tcp6_l2_buf_t) { > - .taph =3D TAP_HDR_INIT(ETH_P_IPV6), > - .ip6h =3D L2_BUF_IP6_INIT(IPPROTO_TCP), > - .th =3D { .doff =3D sizeof(struct tcphdr) / 4, .ack =3D 1 } > - }; > - } > + tcp6_eth_src.h_proto =3D htons_constant(ETH_P_IPV6); > + for (i =3D 0; i < TCP_FRAMES_MEM; i++) { > + struct iovec *iov; > =20 > - for (i =3D 0; i < ARRAY_SIZE(tcp6_l2_flags_buf); i++) { > - tcp6_l2_flags_buf[i] =3D (struct tcp6_l2_flags_buf_t) { > - .taph =3D TAP_HDR_INIT(ETH_P_IPV6), > - .ip6h =3D L2_BUF_IP6_INIT(IPPROTO_TCP) > - }; > - } > + /* headers */ > + tcp6_l2_ip[i] =3D ip6; > + tcp6_l2_payload[i].th.doff =3D sizeof(struct tcphdr) / 4; > + tcp6_l2_payload[i].th.ack =3D 1; > =20 > - for (i =3D 0, iov =3D tcp6_l2_iov; i < TCP_FRAMES_MEM; i++, iov++) > - iov->iov_base =3D tap_frame_base(c, &tcp6_l2_buf[i].taph); > + tcp6_l2_flags_ip[i] =3D ip6; > + tcp6_l2_flags[i].th.doff =3D sizeof(struct tcphdr) / 4; > + tcp6_l2_flags[i].th .ack =3D 1; > =20 > - for (i =3D 0, iov =3D tcp6_l2_flags_iov; i < TCP_FRAMES_MEM; i++, iov++) > - iov->iov_base =3D tap_frame_base(c, &tcp6_l2_flags_buf[i].taph); > + /* iovecs */ > + iov =3D &tcp6_l2_iov[i * TCP_IOV_NUM]; > + iov[TCP_IOV_TAP].iov_base =3D &tcp6_vnet_len[i]; > + iov[TCP_IOV_TAP].iov_len =3D c->mode =3D=3D MODE_PASST ? > + sizeof(uint32_t) : 0; > + iov[TCP_IOV_ETH].iov_base =3D &tcp6_eth_src; > + iov[TCP_IOV_ETH].iov_len =3D sizeof(struct ethhdr); > + iov[TCP_IOV_IP].iov_base =3D &tcp6_l2_ip[i]; > + iov[TCP_IOV_IP].iov_len =3D sizeof(struct ipv6hdr); > + iov[TCP_IOV_PAYLOAD].iov_base =3D &tcp6_l2_payload[i]; > + > + iov =3D &tcp6_l2_flags_iov[i * TCP_IOV_NUM]; > + iov[TCP_IOV_TAP].iov_base =3D &tcp6_flags_vnet_len[i]; > + iov[TCP_IOV_TAP].iov_len =3D c->mode =3D=3D MODE_PASST ? > + sizeof(uint32_t) : 0; > + iov[TCP_IOV_ETH].iov_base =3D &tcp6_eth_src; > + iov[TCP_IOV_ETH].iov_len =3D sizeof(struct ethhdr); > + iov[TCP_IOV_IP].iov_base =3D &tcp6_l2_flags_ip[i]; > + iov[TCP_IOV_IP].iov_len =3D sizeof(struct ipv6hdr); > + iov[TCP_IOV_PAYLOAD].iov_base =3D &tcp6_l2_flags[i]; > + } > } > =20 > /** > @@ -1289,10 +1252,12 @@ static void tcp_rst_do(struct ctx *c, struct tcp_= tap_conn *conn); > */ > static void tcp_l2_flags_buf_flush(const struct ctx *c) > { > - tap_send_frames(c, tcp6_l2_flags_iov, 1, tcp6_l2_flags_buf_used); > + tap_send_frames(c, tcp6_l2_flags_iov, TCP_IOV_NUM, > + tcp6_l2_flags_buf_used); > tcp6_l2_flags_buf_used =3D 0; > =20 > - tap_send_frames(c, tcp4_l2_flags_iov, 1, tcp4_l2_flags_buf_used); > + tap_send_frames(c, tcp4_l2_flags_iov, TCP_IOV_NUM, > + tcp4_l2_flags_buf_used); > tcp4_l2_flags_buf_used =3D 0; > } > =20 > @@ -1305,12 +1270,12 @@ static void tcp_l2_data_buf_flush(const struct ct= x *c) > unsigned i; > size_t m; > =20 > - m =3D tap_send_frames(c, tcp6_l2_iov, 1, tcp6_l2_buf_used); > + m =3D tap_send_frames(c, tcp6_l2_iov, TCP_IOV_NUM, tcp6_l2_buf_used); > for (i =3D 0; i < m; i++) > *tcp6_l2_buf_seq_update[i].seq +=3D tcp6_l2_buf_seq_update[i].len; > tcp6_l2_buf_used =3D 0; > =20 > - m =3D tap_send_frames(c, tcp4_l2_iov, 1, tcp4_l2_buf_used); > + m =3D tap_send_frames(c, tcp4_l2_iov, TCP_IOV_NUM, tcp4_l2_buf_used); > for (i =3D 0; i < m; i++) > *tcp4_l2_buf_seq_update[i].seq +=3D tcp4_l2_buf_seq_update[i].len; > tcp4_l2_buf_used =3D 0; > @@ -1433,7 +1398,7 @@ static size_t tcp_fill_headers6(const struct ctx *c, > * tcp_l2_buf_fill_headers() - Fill 802.3, IP, TCP headers in pre-cooked= buffers > * @c: Execution context > * @conn: Connection pointer > - * @p: Pointer to any type of TCP pre-cooked buffer > + * @iov: Pointer to an array of iovec of TCP pre-cooked buffer > * @plen: Payload length (including TCP header options) > * @check: Checksum, if already known > * @seq: Sequence number for this segment > @@ -1442,26 +1407,22 @@ static size_t tcp_fill_headers6(const struct ctx = *c, > */ > static size_t tcp_l2_buf_fill_headers(const struct ctx *c, > const struct tcp_tap_conn *conn, > - void *p, size_t plen, > + struct iovec *iov, size_t plen, > const uint16_t *check, uint32_t seq) > { > const struct in_addr *a4 =3D inany_v4(&conn->faddr); > size_t ip_len, tlen; > =20 > if (a4) { > - struct tcp4_l2_buf_t *b =3D (struct tcp4_l2_buf_t *)p; > - > - ip_len =3D tcp_fill_headers4(c, conn, &b->iph, &b->th, plen, > + ip_len =3D tcp_fill_headers4(c, conn, iov[TCP_IOV_IP].iov_base, > + iov[TCP_IOV_PAYLOAD].iov_base, plen, > check, seq); > - > - tlen =3D tap_frame_len(c, &b->taph, ip_len); > + tlen =3D ip_len - sizeof(struct iphdr); > } else { > - struct tcp6_l2_buf_t *b =3D (struct tcp6_l2_buf_t *)p; > - > - ip_len =3D tcp_fill_headers6(c, conn, &b->ip6h, &b->th, plen, > + ip_len =3D tcp_fill_headers6(c, conn, iov[TCP_IOV_IP].iov_base, > + iov[TCP_IOV_PAYLOAD].iov_base, plen, > seq); > - > - tlen =3D tap_frame_len(c, &b->taph, ip_len); > + tlen =3D ip_len - sizeof(struct ipv6hdr); You're changing from returning the frame length to returning the IP payload length, so the function comment should be updated as well. > } > =20 > return tlen; > @@ -1595,16 +1556,16 @@ static int tcp_send_flag(struct ctx *c, struct tc= p_tap_conn *conn, int flags) > { > uint32_t prev_ack_to_tap =3D conn->seq_ack_to_tap; > uint32_t prev_wnd_to_tap =3D conn->wnd_to_tap; > - struct tcp4_l2_flags_buf_t *b4 =3D NULL; > - struct tcp6_l2_flags_buf_t *b6 =3D NULL; > + struct tcp_l2_flags_t *payload; > struct tcp_info tinfo =3D { 0 }; > socklen_t sl =3D sizeof(tinfo); > int s =3D conn->sock; > + uint32_t *vnet_len; > size_t optlen =3D 0; > - struct iovec *iov; > struct tcphdr *th; > + struct iovec *iov; > + size_t ip_len; > char *data; > - void *p; > =20 > if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap) && > !flags && conn->wnd_to_tap) > @@ -1627,19 +1588,21 @@ static int tcp_send_flag(struct ctx *c, struct tc= p_tap_conn *conn, int flags) > return 0; > =20 > if (CONN_V4(conn)) { > - iov =3D tcp4_l2_flags_iov + tcp4_l2_flags_buf_used; > - p =3D b4 =3D tcp4_l2_flags_buf + tcp4_l2_flags_buf_used++; > - th =3D &b4->th; > - > - /* gcc 11.2 would complain on data =3D (char *)(th + 1); */ > - data =3D b4->opts; > + iov =3D &tcp4_l2_flags_iov[tcp4_l2_flags_buf_used++ * > + TCP_IOV_NUM]; > + vnet_len =3D iov[TCP_IOV_TAP].iov_base; > + *vnet_len =3D sizeof(struct ethhdr) + sizeof(struct iphdr); You're storing a native-endian value in *vnet_len here, then a network endian value later. I think that's extremely fragile and easy to mess up on later changes. > } else { > - iov =3D tcp6_l2_flags_iov + tcp6_l2_flags_buf_used; > - p =3D b6 =3D tcp6_l2_flags_buf + tcp6_l2_flags_buf_used++; > - th =3D &b6->th; > - data =3D b6->opts; > + iov =3D &tcp6_l2_flags_iov[tcp6_l2_flags_buf_used++ * > + TCP_IOV_NUM]; > + vnet_len =3D iov[TCP_IOV_TAP].iov_base; > + *vnet_len =3D sizeof(struct ethhdr) + sizeof(struct ipv6hdr); > } > =20 > + payload =3D iov[TCP_IOV_PAYLOAD].iov_base; > + th =3D &payload->th; > + data =3D payload->opts; > + > if (flags & SYN) { > int mss; > =20 > @@ -1688,8 +1651,10 @@ static int tcp_send_flag(struct ctx *c, struct tcp= _tap_conn *conn, int flags) > th->syn =3D !!(flags & SYN); > th->fin =3D !!(flags & FIN); > =20 > - iov->iov_len =3D tcp_l2_buf_fill_headers(c, conn, p, optlen, > - NULL, conn->seq_to_tap); > + ip_len =3D tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, > + conn->seq_to_tap); > + iov[TCP_IOV_PAYLOAD].iov_len =3D ip_len; > + *vnet_len =3D htonl(*vnet_len + ip_len); Fwiw, in the UDP patches I'm working on, I still called tap_frame_len() to set the vnet_len field, but returned a different length from the fill_headers() equivalent to size the relevant IOV segment. That way fill_headers() does fill all the headers, including the vnet_len. > if (th->ack) { > if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap)) > @@ -1705,23 +1670,29 @@ static int tcp_send_flag(struct ctx *c, struct tc= p_tap_conn *conn, int flags) > if (th->fin || th->syn) > conn->seq_to_tap++; > =20 > - if (CONN_V4(conn)) { > - if (flags & DUP_ACK) { > - memcpy(b4 + 1, b4, sizeof(*b4)); > - (iov + 1)->iov_len =3D iov->iov_len; > - tcp4_l2_flags_buf_used++; > + if (flags & DUP_ACK) { > + struct iovec *dup_iov; > + int i; > + > + if (CONN_V4(conn)) > + dup_iov =3D &tcp4_l2_flags_iov[tcp4_l2_flags_buf_used++ * > + TCP_IOV_NUM]; > + else > + dup_iov =3D &tcp6_l2_flags_iov[tcp6_l2_flags_buf_used++ * > + TCP_IOV_NUM]; > + > + for (i =3D 0; i < TCP_IOV_NUM; i++) { > + memcpy(dup_iov[i].iov_base, iov[i].iov_base, > + iov[i].iov_len); > + dup_iov[i].iov_len =3D iov[i].iov_len; IIUC, the iov_len for everything except TCP_IOV_PAYLOAD should be uniform, so most of these won't change anything. I get nervous about fragility if we write to values which are supposed to be readonly after initialisation. > } > + } > =20 > - if (tcp4_l2_flags_buf_used > ARRAY_SIZE(tcp4_l2_flags_buf) - 2) > + if (CONN_V4(conn)) { > + if (tcp4_l2_flags_buf_used > TCP_FRAMES_MEM - 2) > tcp_l2_flags_buf_flush(c); > } else { > - if (flags & DUP_ACK) { > - memcpy(b6 + 1, b6, sizeof(*b6)); > - (iov + 1)->iov_len =3D iov->iov_len; > - tcp6_l2_flags_buf_used++; > - } > - > - if (tcp6_l2_flags_buf_used > ARRAY_SIZE(tcp6_l2_flags_buf) - 2) > + if (tcp6_l2_flags_buf_used > TCP_FRAMES_MEM - 2) > tcp_l2_flags_buf_flush(c); > } > =20 > @@ -2168,30 +2139,44 @@ static void tcp_data_to_tap(const struct ctx *c, = struct tcp_tap_conn *conn, > ssize_t plen, int no_csum, uint32_t seq) > { > uint32_t *seq_update =3D &conn->seq_to_tap; > + uint32_t *vnet_len; > struct iovec *iov; > + size_t ip_len; > =20 > if (CONN_V4(conn)) { > - struct tcp4_l2_buf_t *b =3D &tcp4_l2_buf[tcp4_l2_buf_used]; > - const uint16_t *check =3D no_csum ? &(b - 1)->iph.check : NULL; > + struct iovec *iov_prev =3D &tcp4_l2_iov[(tcp4_l2_buf_used - 1) * TCP_I= OV_NUM]; > + const uint16_t *check =3D NULL; > + > + if (no_csum) { > + struct iphdr *iph =3D iov_prev[TCP_IOV_IP].iov_base; > + check =3D &iph->check; > + } > =20 > tcp4_l2_buf_seq_update[tcp4_l2_buf_used].seq =3D seq_update; > tcp4_l2_buf_seq_update[tcp4_l2_buf_used].len =3D plen; > =20 > - iov =3D tcp4_l2_iov + tcp4_l2_buf_used++; > - iov->iov_len =3D tcp_l2_buf_fill_headers(c, conn, b, plen, > - check, seq); > - if (tcp4_l2_buf_used > ARRAY_SIZE(tcp4_l2_buf) - 1) > + iov =3D &tcp4_l2_iov[tcp4_l2_buf_used++ * TCP_IOV_NUM]; > + ip_len =3D tcp_l2_buf_fill_headers(c, conn, iov, plen, check, > + seq); > + iov[TCP_IOV_PAYLOAD].iov_len =3D ip_len; > + vnet_len =3D iov[TCP_IOV_TAP].iov_base; > + *vnet_len =3D htonl(sizeof(struct ethhdr) + > + sizeof(struct iphdr) + > + ip_len); > + if (tcp4_l2_buf_used > TCP_FRAMES_MEM - 1) > tcp_l2_data_buf_flush(c); > } else if (CONN_V6(conn)) { > - struct tcp6_l2_buf_t *b =3D &tcp6_l2_buf[tcp6_l2_buf_used]; > - > tcp6_l2_buf_seq_update[tcp6_l2_buf_used].seq =3D seq_update; > tcp6_l2_buf_seq_update[tcp6_l2_buf_used].len =3D plen; > =20 > - iov =3D tcp6_l2_iov + tcp6_l2_buf_used++; > - iov->iov_len =3D tcp_l2_buf_fill_headers(c, conn, b, plen, > - NULL, seq); > - if (tcp6_l2_buf_used > ARRAY_SIZE(tcp6_l2_buf) - 1) > + iov =3D &tcp6_l2_iov[tcp6_l2_buf_used++ * TCP_IOV_NUM]; > + ip_len =3D tcp_l2_buf_fill_headers(c, conn, iov, plen, NULL, seq); > + iov[TCP_IOV_PAYLOAD].iov_len =3D ip_len; > + vnet_len =3D iov[TCP_IOV_TAP].iov_base; > + *vnet_len =3D htonl(sizeof(struct ethhdr) + > + sizeof(struct ipv6hdr) + > + ip_len); > + if (tcp6_l2_buf_used > TCP_FRAMES_MEM - 1) > tcp_l2_data_buf_flush(c); > } > } > @@ -2247,8 +2232,8 @@ static int tcp_data_from_sock(struct ctx *c, struct= tcp_tap_conn *conn) > iov_sock[0].iov_base =3D tcp_buf_discard; > iov_sock[0].iov_len =3D already_sent; > =20 > - if (( v4 && tcp4_l2_buf_used + fill_bufs > ARRAY_SIZE(tcp4_l2_buf)) || > - (!v4 && tcp6_l2_buf_used + fill_bufs > ARRAY_SIZE(tcp6_l2_buf))) { > + if (( v4 && tcp4_l2_buf_used + fill_bufs > TCP_FRAMES_MEM) || > + (!v4 && tcp6_l2_buf_used + fill_bufs > TCP_FRAMES_MEM)) { > tcp_l2_data_buf_flush(c); > =20 > /* Silence Coverity CWE-125 false positive */ > @@ -2257,9 +2242,9 @@ static int tcp_data_from_sock(struct ctx *c, struct= tcp_tap_conn *conn) > =20 > for (i =3D 0, iov =3D iov_sock + 1; i < fill_bufs; i++, iov++) { > if (v4) > - iov->iov_base =3D &tcp4_l2_buf[tcp4_l2_buf_used + i].data; > + iov->iov_base =3D &tcp4_l2_payload[tcp4_l2_buf_used + i].data; > else > - iov->iov_base =3D &tcp6_l2_buf[tcp6_l2_buf_used + i].data; > + iov->iov_base =3D &tcp6_l2_payload[tcp6_l2_buf_used + i].data; > iov->iov_len =3D mss; > } > if (iov_rem) --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --0EPG35Ou21ymlSNW Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmX31XUACgkQzQJF27ox 2GetOw//UiQICllH/kdE4rWLmPnp5+kC6dz+sJ2t2BPJPaTbUXkto7dQmN/4ScAs bC3INlX0ElEfO6gQ8N14EI+BFW326oMJbe1P36Ftw0mD5I2EXkpQvmpqYVuMBWdO MVXkUQiROmfsKUL9W5og0dS5T5QtC8DuhS9lSgLjRJrNYf021Y6toEFlCzeDL6v+ QZ5Gu8HaOP4fEdN/gjR+IixpEKznz2Kq7ajn0Yn/9nsGv2bbA0eAD/5ld5V+DAvn 5z1ekmPjMAAq/aOz8tNadowUH3h3UT+OH3jGonqPO/2SMI7j95RLlEHQoaegyJVG yiA08ordixdkEFA6pkbuQukxz5uodkGY/UjPFVEahsAJy93ZgPbFYph3B83tKYez oPXF7ER2642X/XXQOGrEze+dCVJTOM0q6Ko/tly8oGv01PLGYUnbH5TScqLX65iN PV6jcKhazzwennWSAP/U9dLsJH7i/agd1QGSr2s44+K1lA4UgMt6GRnN5HwlTg2L lnLWqylafSTMyqrC08pa0NEnffaNnMwSp08lV1kLgSmb2oSrDY38A9B4QayUO3sk 2cv6GRi6HkL+B4shYsoaCJBMmD4kOgu3EuF8VjzdKwDEy79cokA00b3wyZQTnUdM NNqikiVVeIxqSuJmn8a+29Eyxc2WUu2RSLfPhS94iRIH+hDuhQo= =WcmW -----END PGP SIGNATURE----- --0EPG35Ou21ymlSNW--