From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202502 header.b=Ps/CJeLG; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 5D2A05A0634 for ; Mon, 03 Mar 2025 07:42:35 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202502; t=1740984147; bh=2E4APHkKa8zgReLHiWwDiJmuLxbd3iuzWEutYlkCUys=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Ps/CJeLGhYKYvd53rVTKJUft/kKhnIUc6jJEaIlQV5pMA4F+Uo86ppcW36mT4BB/Z a3bsiNQOM/0qCgNZ6hrBfuBuMFHiol8ejm3z8j6ffh0DgnlzXLsak/Gyg8U9RaPZmy oFdq+J9tk0GZMjs+7oO5LOkzGZvAtufmTra5Puvjg0DWBO5hq/0okrOzc/loDFB0u1 O5zWNB62xguyo7u8AdfbrfT62j6VNKj4ySvasioLu55gH/LqBJiMYEah0opeEJmDWq KcUwjX0XlPdDB2xCQmUIMUKGw5cqDEkqaoiUr16z4oeMZP/MuZxqJB6X/DEF7hszCJ jbqvwQj5+j5rg== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4Z5q4M0bGTz4whq; Mon, 3 Mar 2025 17:42:27 +1100 (AEDT) Date: Mon, 3 Mar 2025 17:42:20 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 1/1] tcp: Send RST in response to guest packets that match no connection Message-ID: References: <20250228044844.861635-1-david@gibson.dropbear.id.au> <20250228044844.861635-2-david@gibson.dropbear.id.au> <20250228141022.25d54fc5@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="ql9CnfuS90DfTHKg" Content-Disposition: inline In-Reply-To: <20250228141022.25d54fc5@elisabeth> Message-ID-Hash: OANAHFBXVBFK627O6PMP4VV6HSUIDR5V X-Message-ID-Hash: OANAHFBXVBFK627O6PMP4VV6HSUIDR5V X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --ql9CnfuS90DfTHKg Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Feb 28, 2025 at 02:10:22PM +0100, Stefano Brivio wrote: > I had just nits so I originally planned to ignore some and fix some up > on merge but then I realised something else, so here are nits and > concern: >=20 > On Fri, 28 Feb 2025 15:48:44 +1100 > David Gibson wrote: >=20 > > Currently, if a non-SYN TCP packet arrives which doesn't match any exis= ting > > connection, we simply ignore it. However RFC 9293, section 3.10.7.1 sa= ys > > we should respond with an RST to a non-SYN, non-RST packet that's for a > > CLOSED (i.e. non-existent) connection. > >=20 > > This can arise in practice with migration, in cases where some error me= ans > > we have to discard a connection. We destroy the connection with tcp_rs= t() > > in that case, but because the guest is stopped, we may not be able to > > deliver the RST packet on the tap interface immediately. This change > > ensures an RST will be sent if the guest tries to use the connection ag= ain. > >=20 > > A similar situation can arise if a passt/pasta instance is killed or > > crashes, but is then replaced with another attached to the same guest. > > This can leave the guest with stale connections that the new passt inst= ance > > isn't aware of. It's better to send an RST so the guest knows quickly > > these are broken, rather than letting them linger until they time out. > >=20 > > Signed-off-by: David Gibson > > --- > > tap.c | 13 +++++------ > > tap.h | 6 +++++ > > tcp.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > 3 files changed, 84 insertions(+), 7 deletions(-) > >=20 > > diff --git a/tap.c b/tap.c > > index 44b0fc0f..3c6a3e33 100644 > > --- a/tap.c > > +++ b/tap.c > > @@ -122,7 +122,7 @@ const struct in6_addr *tap_ip6_daddr(const struct c= tx *c, > > * > > * Return: pointer at which to write the packet's payload > > */ > > -static void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t pro= to) > > +void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t proto) > > { > > struct ethhdr *eh =3D (struct ethhdr *)buf; > > =20 > > @@ -143,8 +143,8 @@ static void *tap_push_l2h(const struct ctx *c, void= *buf, uint16_t proto) > > * > > * Return: pointer at which to write the packet's payload > > */ > > -static void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src, > > - struct in_addr dst, size_t l4len, uint8_t proto) > > +void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src, > > + struct in_addr dst, size_t l4len, uint8_t proto) > > { > > uint16_t l3len =3D l4len + sizeof(*ip4h); > > =20 > > @@ -229,10 +229,9 @@ void tap_icmp4_send(const struct ctx *c, struct in= _addr src, struct in_addr dst, > > * > > * Return: pointer at which to write the packet's payload > > */ > > -static void *tap_push_ip6h(struct ipv6hdr *ip6h, > > - const struct in6_addr *src, > > - const struct in6_addr *dst, > > - size_t l4len, uint8_t proto, uint32_t flow) > > +void *tap_push_ip6h(struct ipv6hdr *ip6h, > > + const struct in6_addr *src, const struct in6_addr *dst, > > + size_t l4len, uint8_t proto, uint32_t flow) > > { > > ip6h->payload_len =3D htons(l4len); > > ip6h->priority =3D 0; > > diff --git a/tap.h b/tap.h > > index a476a125..390ac127 100644 > > --- a/tap.h > > +++ b/tap.h > > @@ -42,6 +42,9 @@ static inline void tap_hdr_update(struct tap_hdr *thd= r, size_t l2len) > > thdr->vnet_len =3D htonl(l2len); > > } > > =20 > > +void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t proto); > > +void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src, > > + struct in_addr dst, size_t l4len, uint8_t proto); > > void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t = sport, > > struct in_addr dst, in_port_t dport, > > const void *in, size_t dlen); > > @@ -49,6 +52,9 @@ void tap_icmp4_send(const struct ctx *c, struct in_ad= dr src, struct in_addr dst, > > const void *in, size_t l4len); > > const struct in6_addr *tap_ip6_daddr(const struct ctx *c, > > const struct in6_addr *src); > > +void *tap_push_ip6h(struct ipv6hdr *ip6h, > > + const struct in6_addr *src, const struct in6_addr *dst, > > + size_t l4len, uint8_t proto, uint32_t flow); > > void tap_udp6_send(const struct ctx *c, > > const struct in6_addr *src, in_port_t sport, > > const struct in6_addr *dst, in_port_t dport, > > diff --git a/tcp.c b/tcp.c > > index b3aa9a2c..50670547 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -1868,6 +1868,76 @@ static void tcp_conn_from_sock_finish(const stru= ct ctx *c, > > tcp_data_from_sock(c, conn); > > } > > =20 > > +/** > > + * tcp_no_conn() - Respond to a non-SYN packet matching no connection >=20 > The name makes sense when seen from the perspective of the caller, but > not so much as a stand-alone thing. >=20 > I don't have great ideas and tcp_no_conn() sounds fine anyway. >=20 > I was thinking of tcp_reply_spurious(), tcp_rst_unrelated(), > tcp_rst_invalid() (in netfilter terms), tcp_handle_stray() or things > like that. I've gone with tcp_rst_no_conn() for the next spin, which I hope is a little better. >=20 > > + * @c: Execution context > > + * @af: Address family, AF_INET or AF_INET6 > > + * @saddr: Source address of the packet we're responding to > > + * @daddr: Destination address of the packet we're responding to > > + * @th: TCP header of the packet we're responding to* >=20 > "to*" Fixed. > > + * @l4len: Packet length, including TCP header > > + */ > > +void tcp_no_conn(const struct ctx *c, int af, > > + const void *saddr, const void *daddr, > > + const struct tcphdr *th, size_t l4len) > > +{ > > + struct iov_tail payload =3D IOV_TAIL(NULL, 0, 0); > > + struct tcphdr *rsth; > > + char buf[USHRT_MAX]; > > + uint32_t psum =3D 0; > > + size_t rst_l2len; > > + > > + /* Don't respond to RSTs without a connection */ > > + if (th->rst) > > + return; > > + > > + if (af =3D=3D AF_INET) { > > + struct iphdr *ip4h =3D tap_push_l2h(c, buf, ETH_P_IP); > > + const struct in_addr *rst_src =3D daddr; > > + const struct in_addr *rst_dst =3D saddr; > > + > > + rsth =3D tap_push_ip4h(ip4h, *rst_src, *rst_dst, > > + sizeof(*rsth), IPPROTO_TCP); > > + psum =3D proto_ipv4_header_psum(sizeof(*rsth), IPPROTO_TCP, > > + *rst_src, *rst_dst); > > + > > + } else { > > + struct ipv6hdr *ip6h =3D tap_push_l2h(c, buf, ETH_P_IPV6); > > + const struct in6_addr *rst_src =3D daddr; > > + const struct in6_addr *rst_dst =3D saddr; > > + > > + /* FIXME: do we need to take the flow id from the IPv6 header > > + * we're responding to? > > + */ >=20 > I wanted to check if we actually need to, then I realised my test won't > actually tell, but let me share the steps because it can at least be > used to check the mechanism as a whole: >=20 > $ ./pasta -p http.pcap --config-net -- wget http://example.com -O /dev/nu= ll 2>/dev/null > $ tshark -r http.pcap -Y 'http.request.version' -w http_get.pcap > $ ./pasta -p replay.pcap --config-net -I i -- tcpreplay -Wi i http_get.pc= ap > Saving packet capture to replay.pcap > Actual: 1 packets (200 bytes) sent in 0.000003 seconds > Rated: 66666666.6 Bps, 533.33 Mbps, 333333.33 pps > Statistics for network device: i > Successful packets: 1 > Failed packets: 0 > Truncated packets: 0 > Retried packets (ENOBUFS): 0 > Retried packets (EAGAIN): 0 >=20 > and now we can have a look at the RST: >=20 > $ tshark -r replay.pcap -Y tcp > 3 0.099072 2a01:4f8:222:904::2 =E2=86=92 2600:1408:ec00:36::1736:7f= 31 HTTP 200 GET / HTTP/1.1=20 > 4 0.099111 2600:1408:ec00:36::1736:7f31 =E2=86=92 2a01:4f8:222:904:= :2 TCP 74 80 =E2=86=92 53378 [RST] Seq=3D1 Win=3D0 Len=3D0 >=20 > Back to the flow label: tcp_v6_send_reset(), which handles this case in > the kernel, copies it from the message we're replying to, so we probably > should as well (other than making obvious RFC 6437 sense). Right. I figured we really should, it was just trickier to implement, because we don't have obvious access to the IPv6 header at this point. > Judging from __inet6_lookup_skb(), __inet6_lookup_established(), > inet6_match(), etc., called in turn from tcp_v6_rcv(), it doesn't > _seem_ to be strictly necessary. >=20 > Conversely, passt commit 16b08367a57f ("tap: Fill the IPv6 flow label fie= ld > to represent flow association") was needed because tcp_v6_syn_recv_sock() > actually checks that. >=20 > So, on one hand, I think it's much safer to do that, because the kernel > could decide to check the label on every packet at some point, and I would > call it a feature rather than user breakage. >=20 > On the other hand, it adds some complexity (should tcp_tap_handler() just > dump that only as needed? Should we make that part of the flow lookup > logic instead...?) that we don't strictly need right now and we can take > care of it as a follow-up, so I don't have a particular preference. >=20 > I just wanted to point out that we don't need to, but I think we should > eventually copy the label. I'm fine either way for this fix, though. I had a look into this... and discovered it raised several additional questions. So I'll respin without changing that behaviour and we can maybe look at this later. > > + rsth =3D tap_push_ip6h(ip6h, rst_src, rst_dst, > > + sizeof(*rsth), IPPROTO_TCP, 0); > > + psum =3D proto_ipv6_header_psum(sizeof(*rsth), IPPROTO_TCP, > > + rst_src, rst_dst); > > + } > > + > > + memset(rsth, 0, sizeof(*rsth)); > > + > > + rsth->source =3D th->dest; > > + rsth->dest =3D th->source; > > + rsth->rst =3D 1; > > + rsth->doff =3D sizeof(*rsth) / 4UL; > > + > > + /* Sequence matching logic from RFC9293 section 3.10.7.1 */ >=20 > $ grep "RFC [0-9]" *.c | wc -l > 24 > $ grep "RFC[0-9]" *.c | wc -l > 1 >=20 > Call me a troglodyte but I would never find the reference without the > space. Sure, easily fixed. Troglodyte ;-p. > > + if (th->ack) { > > + rsth->seq =3D th->ack_seq; > > + } else { > > + size_t dlen =3D l4len - th->doff * 4UL; > > + uint32_t ack =3D ntohl(th->seq) + dlen; > > + > > + rsth->ack_seq =3D htonl(ack); > > + rsth->ack =3D 1; > > + } > > + > > + tcp_update_csum(psum, rsth, &payload); > > + rst_l2len =3D ((char *)rsth - buf) + sizeof(*rsth); > > + tap_send_single(c, buf, rst_l2len); > > +} > > + > > /** > > * tcp_tap_handler() - Handle packets from tap and state transitions > > * @c: Execution context > > @@ -1915,6 +1985,8 @@ int tcp_tap_handler(const struct ctx *c, uint8_t = pif, sa_family_t af, > > if (opts && th->syn && !th->ack) > > tcp_conn_from_tap(c, af, saddr, daddr, th, > > opts, optlen, now); > > + else > > + tcp_no_conn(c, af, saddr, daddr, th, len); > > return 1; > > } > > =20 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --ql9CnfuS90DfTHKg Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmfFT0sACgkQzQJF27ox 2GfsWxAArBck2b8q1FRsRClcRPghn6qMq9MlMDUXosebsHz7UFUqYwQ4kZpl+BLJ Scoj36nXjJUUNjmJsVQ5h6l08jNOrzohx+OW4qVjPumZpwlud8avAnIq5deVfCM8 BN5QJIBJ6VOw+18J67bGvje65B2o5q7wKmJO6EkZ8eTlAZtw6yWf3v4vZ5L6Rmg3 6o3n7EqH8m1VRYERZIYoqkUj66IvBfXFwW/dqftbPzaRNZJ6uQwPQHJJx62M4jhZ JnsrSZbHXw1MJXdnhG0CkdEuQ6lZmqLBBkGV0Dnjifqs1NdHkdnjX4bxfPITkyCg HJojFWlG4MwHZcrnf8BKFuUMudh84rcXFye8NB8FzaKNxLyWiiP20de6a11RON69 tCtI2Bn6Zr30lLmwAvrlpWywPIoxOadf3ghtBYX7GxN7ca1Tq5HM7y0/RvtlYXBb Dvvif7GSb8OX8DOC4RonvgTVpvRkS72j51Y3BeUOnZcNv+a1XysnzTeFUKFf5ZdF xz0yELMW5eq87zFggknuD5XmQP1xjU5GcEHJqBW+ey4kR0dfrSaP9m6Q7HS1fm3S B1XVTvfvUmtTVjfmUVGppB1fESHnA0823+H59E8VDe3Re0TC71GWqAVyHzb3/9Wh oQeT4PTicclZH47gnhP121kvruELZ/mi10+N3YG2hTQo8Ho++fM= =ttkd -----END PGP SIGNATURE----- --ql9CnfuS90DfTHKg--