From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 2434C5A0272 for ; Tue, 6 Feb 2024 01:33:26 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1707179599; bh=tAz1IgsjIUDjUv5yVjAt95H5skRYeNeZrncTt1S5FZY=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=XBl9e9l6QoGS4tt9PJR5MGyRLNJQpnfScuYFkS3CYR8rGA2WG0MTfLY/scMwOD9nD gR3ksss9V6q6Z5QzgG2wxJ+TyMilFUzOtHbD2pk354GMu9Ew8+7iEKbiQv/W27U9fX YCk99OdvYpPDXYCSIe1WZmw0e5Qv+TI1ggiGFYizMCDDAYV7cdynTAZdqXBsLfymN0 Wn9IPnuQu2kXN8SNd4uHKWvse91xvYNhH4ndqc61No0wz3QC3+3Q0lxkzsbNxoNsdg lSg/n1kUYgc9DDVWfBeHVV8piyo+/s0DY+mn2syPAxTlwOFM9n5aFP1wqVJ51AHk6G E0o4iCt/ESIHQ== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4TTPNv3VTcz4wd0; Tue, 6 Feb 2024 11:33:19 +1100 (AEDT) Date: Tue, 6 Feb 2024 11:24:15 +1100 From: David Gibson To: Laurent Vivier Subject: Re: [PATCH 08/24] tcp: extract buffer management from tcp_send_flag() Message-ID: References: <20240202141151.3762941-1-lvivier@redhat.com> <20240202141151.3762941-9-lvivier@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="QFBht4xQoNXjRQGL" Content-Disposition: inline In-Reply-To: <20240202141151.3762941-9-lvivier@redhat.com> Message-ID-Hash: AIJFPM4PTIGZP4NHCLH4VWHWZEWONLCM X-Message-ID-Hash: AIJFPM4PTIGZP4NHCLH4VWHWZEWONLCM X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --QFBht4xQoNXjRQGL Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Feb 02, 2024 at 03:11:35PM +0100, Laurent Vivier wrote: This definitely needs a commit message to explain what you're trying to achieve here. Without further context "buffer management" suggests to me allocation / freeing / placement of buffers, whereas what's actually being moved here is code related to the construction of headers within the buffers. > Signed-off-by: Laurent Vivier > --- > tcp.c | 224 +++++++++++++++++++++++++++++++++------------------------- > 1 file changed, 129 insertions(+), 95 deletions(-) >=20 > diff --git a/tcp.c b/tcp.c > index 2fd6bc2eda53..20ad8a4e5271 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -1320,87 +1320,98 @@ void tcp_defer_handler(struct ctx *c) > tcp_l2_data_buf_flush(c); > } > =20 Function comment. > +static void tcp_set_tcp_header(struct tcphdr *th, > + const struct tcp_tap_conn *conn, uint32_t seq) > +{ > + th->source =3D htons(conn->fport); > + th->dest =3D htons(conn->eport); > + th->seq =3D htonl(seq); > + th->ack_seq =3D htonl(conn->seq_ack_to_tap); > + if (conn->events & ESTABLISHED) { > + th->window =3D htons(conn->wnd_to_tap); > + } else { > + unsigned wnd =3D conn->wnd_to_tap << conn->ws_to_tap; > + > + th->window =3D htons(MIN(wnd, USHRT_MAX)); > + } > +} > + > /** > - * tcp_l2_buf_fill_headers() - Fill 802.3, IP, TCP headers in pre-cooked= buffers > + * ipv4_fill_headers() - Fill 802.3, IPv4, TCP headers in pre-cooked buf= fers > * @c: Execution context > * @conn: Connection pointer > - * @p: Pointer to any type of TCP pre-cooked buffer > + * @iph: Pointer to IPv4 header, immediately followed by a TCP header > * @plen: Payload length (including TCP header options) > * @check: Checksum, if already known > * @seq: Sequence number for this segment > * > - * Return: frame length including L2 headers, host order > + * Return: IP frame length including L2 headers, host order This is an odd case where adding an extra descriptor is making it less clear to me. Without context, "frame" suggests to me an entire L2 frame containing whatever inner protocol. But I'm not immediately sure what "IP frame" means: is it an L2 frame which happens to have IP inside, or does it mean just the IP packet without the L2 headers. The rest of the sentence clarifies it's the first, but it still throws me for a moment. > */ > -static size_t tcp_l2_buf_fill_headers(const struct ctx *c, > - const struct tcp_tap_conn *conn, > - void *p, size_t plen, > - const uint16_t *check, uint32_t seq) > + > +static size_t ipv4_fill_headers(const struct ctx *c, > + const struct tcp_tap_conn *conn, > + struct iphdr *iph, size_t plen, > + const uint16_t *check, uint32_t seq) I really like this re-organization of the header filling code. I wonder if separating it into a separate patch might make the remainder patch easier to follow. > { > + struct tcphdr *th =3D (void *)(iph + 1); > const struct in_addr *a4 =3D inany_v4(&conn->faddr); > - size_t ip_len, tlen; > - > -#define SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq) \ > -do { \ > - b->th.source =3D htons(conn->fport); \ > - b->th.dest =3D htons(conn->eport); \ > - b->th.seq =3D htonl(seq); \ > - b->th.ack_seq =3D htonl(conn->seq_ack_to_tap); \ > - if (conn->events & ESTABLISHED) { \ > - b->th.window =3D htons(conn->wnd_to_tap); \ > - } else { \ > - unsigned wnd =3D conn->wnd_to_tap << conn->ws_to_tap; \ > - \ > - b->th.window =3D htons(MIN(wnd, USHRT_MAX)); \ > - } \ > -} while (0) > - > - if (a4) { > - struct tcp4_l2_buf_t *b =3D (struct tcp4_l2_buf_t *)p; > - > - ip_len =3D plen + sizeof(struct iphdr) + sizeof(struct tcphdr); > - b->iph.tot_len =3D htons(ip_len); > - b->iph.saddr =3D a4->s_addr; > - b->iph.daddr =3D c->ip4.addr_seen.s_addr; > - > - b->iph.check =3D check ? *check : > - ipv4_hdr_checksum(&b->iph, IPPROTO_TCP); > - > - SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); > - > - b->th.check =3D tcp_update_check_tcp4(&b->iph); > - > - tlen =3D tap_iov_len(c, &b->taph, ip_len); > - } else { > - struct tcp6_l2_buf_t *b =3D (struct tcp6_l2_buf_t *)p; > + size_t ip_len =3D plen + sizeof(struct iphdr) + sizeof(struct tcphdr); > =20 > - ip_len =3D plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr); > + iph->tot_len =3D htons(ip_len); > + iph->saddr =3D a4->s_addr; > + iph->daddr =3D c->ip4.addr_seen.s_addr; > =20 > - b->ip6h.payload_len =3D htons(plen + sizeof(struct tcphdr)); > - b->ip6h.saddr =3D conn->faddr.a6; > - if (IN6_IS_ADDR_LINKLOCAL(&b->ip6h.saddr)) > - b->ip6h.daddr =3D c->ip6.addr_ll_seen; > - else > - b->ip6h.daddr =3D c->ip6.addr_seen; > + iph->check =3D check ? *check : ipv4_hdr_checksum(iph, IPPROTO_TCP); > + > + tcp_set_tcp_header(th, conn, seq); > + > + th->check =3D tcp_update_check_tcp4(iph); > + > + return ip_len; > +} > + > +/** > + * ipv6_fill_headers() - Fill 802.3, IPv6, TCP headers in pre-cooked buf= fers > + * @c: Execution context > + * @conn: Connection pointer > + * @ip6h: Pointer to IPv6 header, immediately followed by a TCP header > + * @plen: Payload length (including TCP header options) > + * @check: Checksum, if already known > + * @seq: Sequence number for this segment > + * > + * Return: IP frame length including L2 headers, host order > + */ > + > +static size_t ipv6_fill_headers(const struct ctx *c, > + const struct tcp_tap_conn *conn, > + struct ipv6hdr *ip6h, size_t plen, > + uint32_t seq) > +{ > + struct tcphdr *th =3D (void *)(ip6h + 1); > + size_t ip_len =3D plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr); > =20 > - memset(b->ip6h.flow_lbl, 0, 3); > + ip6h->payload_len =3D htons(plen + sizeof(struct tcphdr)); > + ip6h->saddr =3D conn->faddr.a6; > + if (IN6_IS_ADDR_LINKLOCAL(&ip6h->saddr)) > + ip6h->daddr =3D c->ip6.addr_ll_seen; > + else > + ip6h->daddr =3D c->ip6.addr_seen; > =20 > - SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); > + memset(ip6h->flow_lbl, 0, 3); > =20 > - b->th.check =3D tcp_update_check_tcp6(&b->ip6h); > + tcp_set_tcp_header(th, conn, seq); > =20 > - b->ip6h.hop_limit =3D 255; > - b->ip6h.version =3D 6; > - b->ip6h.nexthdr =3D IPPROTO_TCP; > + th->check =3D tcp_update_check_tcp6(ip6h); > =20 > - b->ip6h.flow_lbl[0] =3D (conn->sock >> 16) & 0xf; > - b->ip6h.flow_lbl[1] =3D (conn->sock >> 8) & 0xff; > - b->ip6h.flow_lbl[2] =3D (conn->sock >> 0) & 0xff; > + ip6h->hop_limit =3D 255; > + ip6h->version =3D 6; > + ip6h->nexthdr =3D IPPROTO_TCP; > =20 > - tlen =3D tap_iov_len(c, &b->taph, ip_len); > - } > -#undef SET_TCP_HEADER_COMMON_V4_V6 > + ip6h->flow_lbl[0] =3D (conn->sock >> 16) & 0xf; > + ip6h->flow_lbl[1] =3D (conn->sock >> 8) & 0xff; > + ip6h->flow_lbl[2] =3D (conn->sock >> 0) & 0xff; > =20 > - return tlen; > + return ip_len; > } > =20 > /** > @@ -1520,27 +1531,21 @@ static void tcp_update_seqack_from_tap(const stru= ct ctx *c, > } > =20 > /** > - * tcp_send_flag() - Send segment with flags to tap (no payload) > + * do_tcp_send_flag() - Send segment with flags to tap (no payload) As a rule, I don't love "do_X" as a function name. I particularly dislike it here because AFAICT this function isn't actually the one that "does" the sending of the flag - that happens with the tcp_l2_flags_buf_flush() in the caller. > * @c: Execution context > * @conn: Connection pointer > * @flags: TCP flags: if not set, send segment only if ACK is due > * > * Return: negative error code on connection reset, 0 otherwise > */ > -static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int f= lags) > + > +static int do_tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, in= t flags, struct tcphdr *th, char *data, size_t optlen) > { > uint32_t prev_ack_to_tap =3D conn->seq_ack_to_tap; > uint32_t prev_wnd_to_tap =3D conn->wnd_to_tap; > - struct tcp4_l2_flags_buf_t *b4 =3D NULL; > - struct tcp6_l2_flags_buf_t *b6 =3D NULL; > struct tcp_info tinfo =3D { 0 }; > socklen_t sl =3D sizeof(tinfo); > int s =3D conn->sock; > - size_t optlen =3D 0; > - struct iovec *iov; > - struct tcphdr *th; > - char *data; > - void *p; > =20 > if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap) && > !flags && conn->wnd_to_tap) > @@ -1562,26 +1567,9 @@ static int tcp_send_flag(struct ctx *c, struct tcp= _tap_conn *conn, int flags) > if (!tcp_update_seqack_wnd(c, conn, flags, &tinfo) && !flags) > return 0; > =20 > - if (CONN_V4(conn)) { > - iov =3D tcp4_l2_flags_iov + tcp4_l2_flags_buf_used; > - p =3D b4 =3D tcp4_l2_flags_buf + tcp4_l2_flags_buf_used++; > - th =3D &b4->th; > - > - /* gcc 11.2 would complain on data =3D (char *)(th + 1); */ > - data =3D b4->opts; > - } else { > - iov =3D tcp6_l2_flags_iov + tcp6_l2_flags_buf_used; > - p =3D b6 =3D tcp6_l2_flags_buf + tcp6_l2_flags_buf_used++; > - th =3D &b6->th; > - data =3D b6->opts; > - } > - > if (flags & SYN) { > int mss; > =20 > - /* Options: MSS, NOP and window scale (8 bytes) */ > - optlen =3D OPT_MSS_LEN + 1 + OPT_WS_LEN; > - > *data++ =3D OPT_MSS; > *data++ =3D OPT_MSS_LEN; > =20 > @@ -1624,9 +1612,6 @@ static int tcp_send_flag(struct ctx *c, struct tcp_= tap_conn *conn, int flags) > th->syn =3D !!(flags & SYN); > th->fin =3D !!(flags & FIN); > =20 > - iov->iov_len =3D tcp_l2_buf_fill_headers(c, conn, p, optlen, > - NULL, conn->seq_to_tap); > - > if (th->ack) { > if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap)) > conn_flag(c, conn, ~ACK_TO_TAP_DUE); > @@ -1641,8 +1626,38 @@ static int tcp_send_flag(struct ctx *c, struct tcp= _tap_conn *conn, int flags) > if (th->fin || th->syn) > conn->seq_to_tap++; > =20 > + return 1; > +} > + > +static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int f= lags) > +{ > + size_t optlen =3D 0; > + struct iovec *iov; > + size_t ip_len; > + int ret; > + > + /* Options: MSS, NOP and window scale (8 bytes) */ > + if (flags & SYN) > + optlen =3D OPT_MSS_LEN + 1 + OPT_WS_LEN; > + > if (CONN_V4(conn)) { > + struct tcp4_l2_flags_buf_t *b4; > + > + iov =3D tcp4_l2_flags_iov + tcp4_l2_flags_buf_used; > + b4 =3D tcp4_l2_flags_buf + tcp4_l2_flags_buf_used++; > + > + ret =3D do_tcp_send_flag(c, conn, flags, &b4->th, b4->opts, > + optlen); > + if (ret <=3D 0) > + return ret; > + > + ip_len =3D ipv4_fill_headers(c, conn, &b4->iph, optlen, > + NULL, conn->seq_to_tap); > + > + iov->iov_len =3D tap_iov_len(c, &b4->taph, ip_len); > + > if (flags & DUP_ACK) { > + > memcpy(b4 + 1, b4, sizeof(*b4)); > (iov + 1)->iov_len =3D iov->iov_len; > tcp4_l2_flags_buf_used++; > @@ -1651,6 +1666,21 @@ static int tcp_send_flag(struct ctx *c, struct tcp= _tap_conn *conn, int flags) > if (tcp4_l2_flags_buf_used > ARRAY_SIZE(tcp4_l2_flags_buf) - 2) > tcp_l2_flags_buf_flush(c); > } else { > + struct tcp6_l2_flags_buf_t *b6; > + > + iov =3D tcp6_l2_flags_iov + tcp6_l2_flags_buf_used; > + b6 =3D tcp6_l2_flags_buf + tcp6_l2_flags_buf_used++; > + > + ret =3D do_tcp_send_flag(c, conn, flags, &b6->th, b6->opts, > + optlen); > + if (ret <=3D 0) > + return ret; > + > + ip_len =3D ipv6_fill_headers(c, conn, &b6->ip6h, optlen, > + conn->seq_to_tap); > + > + iov->iov_len =3D tap_iov_len(c, &b6->taph, ip_len); > + > if (flags & DUP_ACK) { > memcpy(b6 + 1, b6, sizeof(*b6)); > (iov + 1)->iov_len =3D iov->iov_len; > @@ -2050,6 +2080,7 @@ static void tcp_data_to_tap(const struct ctx *c, st= ruct tcp_tap_conn *conn, > { > uint32_t *seq_update =3D &conn->seq_to_tap; > struct iovec *iov; > + size_t ip_len; > =20 > if (CONN_V4(conn)) { > struct tcp4_l2_buf_t *b =3D &tcp4_l2_buf[tcp4_l2_buf_used]; > @@ -2058,9 +2089,11 @@ static void tcp_data_to_tap(const struct ctx *c, s= truct tcp_tap_conn *conn, > tcp4_l2_buf_seq_update[tcp4_l2_buf_used].seq =3D seq_update; > tcp4_l2_buf_seq_update[tcp4_l2_buf_used].len =3D plen; > =20 > + ip_len =3D ipv4_fill_headers(c, conn, &b->iph, plen, > + check, seq); > + > iov =3D tcp4_l2_iov + tcp4_l2_buf_used++; > - iov->iov_len =3D tcp_l2_buf_fill_headers(c, conn, b, plen, > - check, seq); > + iov->iov_len =3D tap_iov_len(c, &b->taph, ip_len); > if (tcp4_l2_buf_used > ARRAY_SIZE(tcp4_l2_buf) - 1) > tcp_l2_data_buf_flush(c); > } else if (CONN_V6(conn)) { > @@ -2069,9 +2102,10 @@ static void tcp_data_to_tap(const struct ctx *c, s= truct tcp_tap_conn *conn, > tcp6_l2_buf_seq_update[tcp6_l2_buf_used].seq =3D seq_update; > tcp6_l2_buf_seq_update[tcp6_l2_buf_used].len =3D plen; > =20 > + ip_len =3D ipv6_fill_headers(c, conn, &b->ip6h, plen, seq); > + > iov =3D tcp6_l2_iov + tcp6_l2_buf_used++; > - iov->iov_len =3D tcp_l2_buf_fill_headers(c, conn, b, plen, > - NULL, seq); > + iov->iov_len =3D tap_iov_len(c, &b->taph, ip_len); > if (tcp6_l2_buf_used > ARRAY_SIZE(tcp6_l2_buf) - 1) > tcp_l2_data_buf_flush(c); > } --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --QFBht4xQoNXjRQGL Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmXBfB4ACgkQzQJF27ox 2Gd+8g//f7rgghVxGuwH2PJp69zwr4UJ4vzYtpYzqfr1LwfNZSx/fjAJugEbOArE Hap6/HcFGZBrqkpQ9rTCxFNTeiGr9XOefcXuCTMFs+1ZdYYiJ8MEcZAhlbzVR4dN jIANi1kXkwA0x1m/6TJoXImvJD2Y2XXdRl3Jp7m0bSFFPQ1uA3EU5kTuIT08fej1 GYINHaUUKbdI8XXZE7QCPmW/vNuo4qhqsb5wh/FYuqoKc+WjayhnUnBD+Ij8nu8a zPr6HJrrIUtqYmcRc6f6PoKtgF2Of1T1Ss1GS5J1GSjFAqEB5gOdU9dwAAZ+VRxo wKc84QNchxZ2nVDtXVP87zZuKCEMCvE2x5+2P3ER0Zj71OMRDFifQ7DAn5sTZFHn CWEKPg3BKzaMaDkXu3Rnt17ZSilF+R7DLGyRspVFDBZtTj/aqUbMSy4nrSiHhjRH awWNwTRgBcjgOzNQHPuiD9fg3Qmy9xcEXA9uf9R3wGaUKeBfCCnsAv2/7er7028e DQ2Aw1Mr5OHQ23DRkKIQzsvOPQJsrX4KTufbGIaHgo9lUyRL6WkW6WDhKyDYa2TN Ldm4CnsCycYFVwhvbNHBTB9N2cwhFw7qqShh5BnC23lEJfCOpLGrYijJ9mOMkDRj RV09O1jaAXgjufU6kpw7yGyhhwTUmyaoCadi09QsBiXvFEEd1dE= =Xhm2 -----END PGP SIGNATURE----- --QFBht4xQoNXjRQGL--