From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=aQUSdcGB; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id CB1115A0272 for ; Fri, 28 Feb 2025 14:10:29 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1740748228; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PQouvQjAzCOHR3GcnL7DzaG/kBXV5MN4ThE6jUhajFk=; b=aQUSdcGBW2UCvXT02qfaOVvPNyw/M7m+OzBT3NeZjcGCmOpMbm5kXO/f4BOO/TZB2ZMGSX oL/jCUTVz28LyzDWHYRYsTnSvCCBFewluX1M7kB7vp3VJ4jyIq5szVzgVXOwXu7YzWGSQy 1R/M9aQSPvTRLRUOgA6c61ndqj4izbw= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-326-86A4xJA7PY68ffJhOXhCYw-1; Fri, 28 Feb 2025 08:10:27 -0500 X-MC-Unique: 86A4xJA7PY68ffJhOXhCYw-1 X-Mimecast-MFC-AGG-ID: 86A4xJA7PY68ffJhOXhCYw_1740748226 Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-390eefb2913so405869f8f.0 for ; Fri, 28 Feb 2025 05:10:26 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1740748225; x=1741353025; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=moRfpxXW0BvrBlHnA/orI8um8B9977e53pHAlmLaKH0=; b=ptzZ6rJqMHiwVv2DWpg6bWWORRPYSs4SsKiVZljn2VZSyt4kXygCbNcgMzaS7xc7nC 6DFKNll25xH64vJXg6ZIq39RaOQ/uLbkCCgzd/tQbmCs6dLQFZNkoaXPofX8+nnvdMfm BAblVp8wGMVRkq5O+McRnaREgnKpFzUlYR2fzDMgCqDWsfQ716ckpRvNzeR3AL8ovgPb zwc9ZLltG+hp7hMNvSxxq4ViJkJM7ivnl19NiDdsoByxGyzODmKRZBWpUmuU5DPCMvcL rDAeqq76QErR/VGXFuh3Wg+ilquSI/WRtpxtbDF8kcVL2Mk8rcc9nKN3Fv0da+mrsh/8 KckQ== X-Gm-Message-State: AOJu0YxgY8b+GKIf1Rr6Ay+3PpNTo3tfXqQC2JpRO8+Wc7mrrq0BCHLx DPqLcsZX4iRus8IQ+g7Qy1jpcxW/RLkQf/Nn3XWZsqFsViYMNIqNeZG8l9tDxsi0ArcbqXfpbXv 1BPu0jFsZ9bTLG7JgKIOif195WzgyuwAMgmk0R5Di565UrOjkGivTfLEjMHt0 X-Gm-Gg: ASbGncuQhD2a/ji8CIN+KtySi6UAnw5tqrrMBoUhJZYpF9r+Op9dstjecSx2TEYZb7r JL6lWOM83nM7zv1+IT6hzZqKp1OXXj3LqG4e91dqXG8SHnUKElNxTIXEKTNqBEBqe62wuD8bbuA gCfaST6g1mxSxdnrrRnhlBAYUQgHsUP+/lJA6WNAZB6yqTera+J5fl5iQPq5n17OX1OpJa6nArq LeRtSg6fEEYsqvUgVlb2QdoB7Re4Yqy9A9Rp/iK9hsn30Xxk7USTQAM1MMhuQ1WDXy2f46JQpK3 jAKazAeUe6vtD4vQ9zdHlbQW8qY= X-Received: by 2002:a5d:47c3:0:b0:38d:e584:81ea with SMTP id ffacd0b85a97d-390eca803edmr2590976f8f.45.1740748225027; Fri, 28 Feb 2025 05:10:25 -0800 (PST) X-Google-Smtp-Source: AGHT+IHAwWHdmikCRC97Dt8zMMXLfTYBb9hJgDwk9MH18pOXVAl63rZX5vepoC765hDdh7TdA4UyXA== X-Received: by 2002:a5d:47c3:0:b0:38d:e584:81ea with SMTP id ffacd0b85a97d-390eca803edmr2590957f8f.45.1740748224577; Fri, 28 Feb 2025 05:10:24 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e4844a16sm5184506f8f.79.2025.02.28.05.10.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 28 Feb 2025 05:10:24 -0800 (PST) Date: Fri, 28 Feb 2025 14:10:22 +0100 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 1/1] tcp: Send RST in response to guest packets that match no connection Message-ID: <20250228141022.25d54fc5@elisabeth> In-Reply-To: <20250228044844.861635-2-david@gibson.dropbear.id.au> References: <20250228044844.861635-1-david@gibson.dropbear.id.au> <20250228044844.861635-2-david@gibson.dropbear.id.au> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.41; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 0V2VQkJpUmKTrxEAcY7QTJq_GGz82EbJIMePjYgHB20_1740748226 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Message-ID-Hash: VJXYBHVHU2JS7IL3BRGS7XPEEQFJBH3P X-Message-ID-Hash: VJXYBHVHU2JS7IL3BRGS7XPEEQFJBH3P X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: I had just nits so I originally planned to ignore some and fix some up on merge but then I realised something else, so here are nits and concern: On Fri, 28 Feb 2025 15:48:44 +1100 David Gibson wrote: > Currently, if a non-SYN TCP packet arrives which doesn't match any existi= ng > connection, we simply ignore it. However RFC 9293, section 3.10.7.1 says > we should respond with an RST to a non-SYN, non-RST packet that's for a > CLOSED (i.e. non-existent) connection. >=20 > This can arise in practice with migration, in cases where some error mean= s > we have to discard a connection. We destroy the connection with tcp_rst(= ) > in that case, but because the guest is stopped, we may not be able to > deliver the RST packet on the tap interface immediately. This change > ensures an RST will be sent if the guest tries to use the connection agai= n. >=20 > A similar situation can arise if a passt/pasta instance is killed or > crashes, but is then replaced with another attached to the same guest. > This can leave the guest with stale connections that the new passt instan= ce > isn't aware of. It's better to send an RST so the guest knows quickly > these are broken, rather than letting them linger until they time out. >=20 > Signed-off-by: David Gibson > --- > tap.c | 13 +++++------ > tap.h | 6 +++++ > tcp.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 84 insertions(+), 7 deletions(-) >=20 > diff --git a/tap.c b/tap.c > index 44b0fc0f..3c6a3e33 100644 > --- a/tap.c > +++ b/tap.c > @@ -122,7 +122,7 @@ const struct in6_addr *tap_ip6_daddr(const struct ctx= *c, > * > * Return: pointer at which to write the packet's payload > */ > -static void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t proto= ) > +void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t proto) > { > =09struct ethhdr *eh =3D (struct ethhdr *)buf; > =20 > @@ -143,8 +143,8 @@ static void *tap_push_l2h(const struct ctx *c, void *= buf, uint16_t proto) > * > * Return: pointer at which to write the packet's payload > */ > -static void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src, > -=09=09=09 struct in_addr dst, size_t l4len, uint8_t proto) > +void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src, > +=09=09 struct in_addr dst, size_t l4len, uint8_t proto) > { > =09uint16_t l3len =3D l4len + sizeof(*ip4h); > =20 > @@ -229,10 +229,9 @@ void tap_icmp4_send(const struct ctx *c, struct in_a= ddr src, struct in_addr dst, > * > * Return: pointer at which to write the packet's payload > */ > -static void *tap_push_ip6h(struct ipv6hdr *ip6h, > -=09=09=09 const struct in6_addr *src, > -=09=09=09 const struct in6_addr *dst, > -=09=09=09 size_t l4len, uint8_t proto, uint32_t flow) > +void *tap_push_ip6h(struct ipv6hdr *ip6h, > +=09=09 const struct in6_addr *src, const struct in6_addr *dst, > +=09=09 size_t l4len, uint8_t proto, uint32_t flow) > { > =09ip6h->payload_len =3D htons(l4len); > =09ip6h->priority =3D 0; > diff --git a/tap.h b/tap.h > index a476a125..390ac127 100644 > --- a/tap.h > +++ b/tap.h > @@ -42,6 +42,9 @@ static inline void tap_hdr_update(struct tap_hdr *thdr,= size_t l2len) > =09=09thdr->vnet_len =3D htonl(l2len); > } > =20 > +void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t proto); > +void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src, > +=09=09 struct in_addr dst, size_t l4len, uint8_t proto); > void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sp= ort, > =09=09 struct in_addr dst, in_port_t dport, > =09=09 const void *in, size_t dlen); > @@ -49,6 +52,9 @@ void tap_icmp4_send(const struct ctx *c, struct in_addr= src, struct in_addr dst, > =09=09 const void *in, size_t l4len); > const struct in6_addr *tap_ip6_daddr(const struct ctx *c, > =09=09=09=09 const struct in6_addr *src); > +void *tap_push_ip6h(struct ipv6hdr *ip6h, > +=09=09 const struct in6_addr *src, const struct in6_addr *dst, > +=09=09 size_t l4len, uint8_t proto, uint32_t flow); > void tap_udp6_send(const struct ctx *c, > =09=09 const struct in6_addr *src, in_port_t sport, > =09=09 const struct in6_addr *dst, in_port_t dport, > diff --git a/tcp.c b/tcp.c > index b3aa9a2c..50670547 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -1868,6 +1868,76 @@ static void tcp_conn_from_sock_finish(const struct= ctx *c, > =09tcp_data_from_sock(c, conn); > } > =20 > +/** > + * tcp_no_conn() - Respond to a non-SYN packet matching no connection The name makes sense when seen from the perspective of the caller, but not so much as a stand-alone thing. I don't have great ideas and tcp_no_conn() sounds fine anyway. I was thinking of tcp_reply_spurious(), tcp_rst_unrelated(), tcp_rst_invalid() (in netfilter terms), tcp_handle_stray() or things like that. > + * @c:=09=09Execution context > + * @af:=09=09Address family, AF_INET or AF_INET6 > + * @saddr:=09Source address of the packet we're responding to > + * @daddr:=09Destination address of the packet we're responding to > + * @th:=09=09TCP header of the packet we're responding to* "to*" > + * @l4len:=09Packet length, including TCP header > + */ > +void tcp_no_conn(const struct ctx *c, int af, > +=09=09 const void *saddr, const void *daddr, > +=09=09 const struct tcphdr *th, size_t l4len) > +{ > +=09struct iov_tail payload =3D IOV_TAIL(NULL, 0, 0); > +=09struct tcphdr *rsth; > +=09char buf[USHRT_MAX]; > +=09uint32_t psum =3D 0; > +=09size_t rst_l2len; > + > +=09/* Don't respond to RSTs without a connection */ > +=09if (th->rst) > +=09=09return; > + > +=09if (af =3D=3D AF_INET) { > +=09=09struct iphdr *ip4h =3D tap_push_l2h(c, buf, ETH_P_IP); > +=09=09const struct in_addr *rst_src =3D daddr; > +=09=09const struct in_addr *rst_dst =3D saddr; > + > +=09=09rsth =3D tap_push_ip4h(ip4h, *rst_src, *rst_dst, > +=09=09=09=09 sizeof(*rsth), IPPROTO_TCP); > +=09=09psum =3D proto_ipv4_header_psum(sizeof(*rsth), IPPROTO_TCP, > +=09=09=09=09=09 *rst_src, *rst_dst); > + > +=09} else { > +=09=09struct ipv6hdr *ip6h =3D tap_push_l2h(c, buf, ETH_P_IPV6); > +=09=09const struct in6_addr *rst_src =3D daddr; > +=09=09const struct in6_addr *rst_dst =3D saddr; > + > +=09=09/* FIXME: do we need to take the flow id from the IPv6 header > +=09=09 * we're responding to? > +=09=09 */ I wanted to check if we actually need to, then I realised my test won't actually tell, but let me share the steps because it can at least be used to check the mechanism as a whole: $ ./pasta -p http.pcap --config-net -- wget http://example.com -O /dev/null= 2>/dev/null $ tshark -r http.pcap -Y 'http.request.version' -w http_get.pcap $ ./pasta -p replay.pcap --config-net -I i -- tcpreplay -Wi i http_get.pcap Saving packet capture to replay.pcap Actual: 1 packets (200 bytes) sent in 0.000003 seconds Rated: 66666666.6 Bps, 533.33 Mbps, 333333.33 pps Statistics for network device: i =09Successful packets: 1 =09Failed packets: 0 =09Truncated packets: 0 =09Retried packets (ENOBUFS): 0 =09Retried packets (EAGAIN): 0 and now we can have a look at the RST: $ tshark -r replay.pcap -Y tcp 3 0.099072 2a01:4f8:222:904::2 =E2=86=92 2600:1408:ec00:36::1736:7f31= HTTP 200 GET / HTTP/1.1=20 4 0.099111 2600:1408:ec00:36::1736:7f31 =E2=86=92 2a01:4f8:222:904::2= TCP 74 80 =E2=86=92 53378 [RST] Seq=3D1 Win=3D0 Len=3D0 Back to the flow label: tcp_v6_send_reset(), which handles this case in the kernel, copies it from the message we're replying to, so we probably should as well (other than making obvious RFC 6437 sense). Judging from __inet6_lookup_skb(), __inet6_lookup_established(), inet6_match(), etc., called in turn from tcp_v6_rcv(), it doesn't _seem_ to be strictly necessary. Conversely, passt commit 16b08367a57f ("tap: Fill the IPv6 flow label field to represent flow association") was needed because tcp_v6_syn_recv_sock() actually checks that. So, on one hand, I think it's much safer to do that, because the kernel could decide to check the label on every packet at some point, and I would call it a feature rather than user breakage. On the other hand, it adds some complexity (should tcp_tap_handler() just dump that only as needed? Should we make that part of the flow lookup logic instead...?) that we don't strictly need right now and we can take care of it as a follow-up, so I don't have a particular preference. I just wanted to point out that we don't need to, but I think we should eventually copy the label. I'm fine either way for this fix, though. > +=09=09rsth =3D tap_push_ip6h(ip6h, rst_src, rst_dst, > +=09=09=09=09 sizeof(*rsth), IPPROTO_TCP, 0); > +=09=09psum =3D proto_ipv6_header_psum(sizeof(*rsth), IPPROTO_TCP, > +=09=09=09=09=09 rst_src, rst_dst); > +=09} > + > +=09memset(rsth, 0, sizeof(*rsth)); > + > +=09rsth->source =3D th->dest; > +=09rsth->dest =3D th->source; > +=09rsth->rst =3D 1; > +=09rsth->doff =3D sizeof(*rsth) / 4UL; > + > +=09/* Sequence matching logic from RFC9293 section 3.10.7.1 */ $ grep "RFC [0-9]" *.c | wc -l 24 $ grep "RFC[0-9]" *.c | wc -l 1 Call me a troglodyte but I would never find the reference without the space. > +=09if (th->ack) { > +=09=09rsth->seq =3D th->ack_seq; > +=09} else { > +=09=09size_t dlen =3D l4len - th->doff * 4UL; > +=09=09uint32_t ack =3D ntohl(th->seq) + dlen; > + > +=09=09rsth->ack_seq =3D htonl(ack); > +=09=09rsth->ack =3D 1; > +=09} > + > +=09tcp_update_csum(psum, rsth, &payload); > +=09rst_l2len =3D ((char *)rsth - buf) + sizeof(*rsth); > +=09tap_send_single(c, buf, rst_l2len); > +} > + > /** > * tcp_tap_handler() - Handle packets from tap and state transitions > * @c:=09=09Execution context > @@ -1915,6 +1985,8 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pi= f, sa_family_t af, > =09=09if (opts && th->syn && !th->ack) > =09=09=09tcp_conn_from_tap(c, af, saddr, daddr, th, > =09=09=09=09=09 opts, optlen, now); > +=09=09else > +=09=09=09tcp_no_conn(c, af, saddr, daddr, th, len); > =09=09return 1; > =09} > =20 --=20 Stefano