From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=Jk1KZlKq; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id 174275A0621 for ; Wed, 19 Feb 2025 06:37:34 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739943453; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fPQQqtWGCQPCOc4F7VCu234A+on9ztgBZzAsL8uEuic=; b=Jk1KZlKqNfVj2Dj9TgUuxytb05+9PjqQArbg2QrWIh2U7TwB6wU6dzzvxh8IHgi22GPlM+ BUubWic1a5zZUrMPWZYxRTEYf6ab7NilOjSOHooLRrf1fiBYNCZ2hnLw8MP2tjdJYBden8 YSXtCu/i15L4rnAL9XMq78/JEbhFVFQ= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-652-I9IxtM7ROVCo3YIuRnB24Q-1; Wed, 19 Feb 2025 00:37:32 -0500 X-MC-Unique: I9IxtM7ROVCo3YIuRnB24Q-1 X-Mimecast-MFC-AGG-ID: I9IxtM7ROVCo3YIuRnB24Q_1739943451 Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-38f2c0aa6d6so187484f8f.0 for ; Tue, 18 Feb 2025 21:37:32 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739943451; x=1740548251; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=zoG1+yv+ls1LVUEHn90fwV4Mc33csViFPinokyrt0Cc=; b=DeybOO1wQwv4Jr3StEVlg6TdXfhLTJuADQEtD5v7VeitAoOCWML00fam7gIjw2Pu7P RJPE48FtYDFugOr7QU02cvnPF7TxPDOwOdOZIX18aFra+UCBAlU3xYjwyjpezeb0iizy WJKqJ5iNBT7Mtlb3OmnhONftru92lKSXXGNZjbGkcxW7Xtl9571kD3e3ki4cLpMNmBm+ RoxSzleogPKSMF9znMMX0pfKyanmAylTkAiUMCnJDVnzbUMuRtdx1g0yzGMu1vxAT4T0 TICdUMrwYzRR+SKl8V7ps20XB6k+N3A47rmRx7symE7TwdShCV1897lpFCNRLART/Rgj 0Xmg== X-Gm-Message-State: AOJu0YxfgHaxHjhuNT6a1ok9I3NfUbux3uEk+5D4a3W52rW8kwRE2Tal cF0FCtQxlMeY76QaN9aV03Lln259ZPkMFsUvKSVNzhi0XjYkwxKe4npCW5jSM2YWKKnFrDl12yk YOZ77o1L53mjtRxm1iqKbo8JgtP7gbYBbtbIh9S5sIrSklZCTHpMaFDId4Q== X-Gm-Gg: ASbGnctGFlDHTAr/C0ycs3dEsUlPhodEBULPWHFvA1CEl0HQRCYUZ8TQTBAkMI8glAn 8sYHaMFBPhYS4EKUDqG4QMQ+WyDi4zG9H8bwRFglZQLW9MoALaE3fJrgmvsdupeD9RRjBlZDZXo bLAJDwRByS7rJbqURr27Gn2WR3JFiNLjzAf/k+bUR/WUFc+JRaGexCm5vVArOw3jQUlPiJ32M4Y N+HJVA7QL1aDKL+BbPFb/ijcdgAe7v+tInSOqoHGA9RTjNN23oRyrw8bE5eqCa/gj4sPXqXKu6B 4Ypf0gSc0qnN8gMt X-Received: by 2002:a05:6000:1a87:b0:38f:4ffd:c757 with SMTP id ffacd0b85a97d-38f57bd9167mr1908655f8f.2.1739943450724; Tue, 18 Feb 2025 21:37:30 -0800 (PST) X-Google-Smtp-Source: AGHT+IHJ4hczut/xEaPmQaKwasqOxWt+hVLLJx+dcf/ETWcdehqSks0aOTwnpYPPP2XYfs2e+wJWHA== X-Received: by 2002:a05:6000:1a87:b0:38f:4ffd:c757 with SMTP id ffacd0b85a97d-38f57bd9167mr1908644f8f.2.1739943450333; Tue, 18 Feb 2025 21:37:30 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38f258dcc45sm16948808f8f.33.2025.02.18.21.37.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Feb 2025 21:37:29 -0800 (PST) Date: Wed, 19 Feb 2025 06:37:28 +0100 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 3/3] conf: Be more precise about minimum MTUs Message-ID: <20250219063728.309bf1ac@elisabeth> In-Reply-To: <20250219031429.3708026-4-david@gibson.dropbear.id.au> References: <20250219031429.3708026-1-david@gibson.dropbear.id.au> <20250219031429.3708026-4-david@gibson.dropbear.id.au> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.41; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: fcMI5fz3Af7ImHTeLd2s7Y5N9Or3yeIylkYbegjzeqw_1739943451 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Message-ID-Hash: CYDXSNQYCBLB5IZ5YL4MCQLAPPR537E6 X-Message-ID-Hash: CYDXSNQYCBLB5IZ5YL4MCQLAPPR537E6 X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Wed, 19 Feb 2025 14:14:29 +1100 David Gibson wrote: > Currently we reject the -m option if given a value less than ETH_MAX_MTU ETH_MIN_MTU > (68). That define is derived from the kernel, but its name is misleading= : > it doesn't really have anything to do with Ethernet per se, but is rather > the minimum payload any L2 link must be able to handle in order to carry > IPv4. Yes, that should be IPV4_MIN_MTU instead, but it was only added as recently as 4.14 kernels, so I opted for ETH_MIN_MTU. A misnomer as you pointed out, but safe. > For IPv6, it's not sufficient: that requires an MTU of at least > 1280. >=20 > Furthermore, the value of 68 is the minimum IP *fragment* size the link > must be able to carry. Since we don't support IP fragmentation, it's not > sufficient for us. Instead we should clamp the MTU to 576 for IPv4 - the > minimum IP datagram size that all hosts must be able to accept. First off, the only assumption in RFC 791 terms we can _perhaps_ make is that we are some kind of "module" (also called "node", could be host or router), not a (full) host. Maybe not even a module. So, with that regard, we don't need to be prepared to _accept_ (for ourselves as destination) any particular datagram size. Second, even if all hosts need to be able to accept 576-byte datagrams, that doesn't mean that all links need to be able to carry them. The MTU refers _to the link_, not to what a host is able to accept. And that's the reason why you can set 68 bytes as MTU on most network interfaces on Linux. We set sub-576 values ourselves in tests: $ grep -rn "mtu 256" * passt_tcp:95:guest=09ip link set dev __IFNAME__ mtu 256 passt_vu_tcp:95:guest=09ip link set dev __IFNAME__ mtu 256 That is, indeed, all hosts (not "modules") need to be able to accept (not "forward") datagram sizes of at least 576 bytes... but that's only assuming you can deliver those datagrams to them. This is not just a theoretical matter. As late as 2018, I was made aware of a setup with several (local!) nodes with links between them having ~380 bytes as MTU. Sure enough, the reason why I know about this was an issue coming from the same flawed assumption made in kernel commit c9fefa08190f ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit"), and fixed by 82a40777de12 ("ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit"). See also commit b4331a681822 ("vti6: Change minimum MTU to IPV4_MIN_MTU, vti6 can carry IPv4 too") on the subject of what links can carry vs. what endpoints should be able to forward. > Move the verification of the MTU's lower bound to logic specific to the I= P > versions and correct those errors. >=20 > Signed-off-by: David Gibson > --- > conf.c | 20 +++++++++++++++----- > ip.h | 7 +++++++ > util.h | 3 --- > 3 files changed, 22 insertions(+), 8 deletions(-) >=20 > diff --git a/conf.c b/conf.c > index c5ee07b0..e127acc1 100644 > --- a/conf.c > +++ b/conf.c > @@ -1663,9 +1663,9 @@ void conf(struct ctx *c, int argc, char **argv) > =09=09=09if (errno || *e) > =09=09=09=09die("Invalid MTU: %s", optarg); > =20 > -=09=09=09if (mtu && (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)) { > -=09=09=09=09die("MTU %lu out of range (%u..%u)", mtu, > -=09=09=09=09 ETH_MIN_MTU, ETH_MAX_MTU); > +=09=09=09if (mtu > ETH_MAX_MTU) { > +=09=09=09=09die("MTU %lu too large (max %u)", > +=09=09=09=09 mtu, ETH_MAX_MTU); > =09=09=09} > =20 > =09=09=09c->mtu =3D mtu; > @@ -1838,10 +1838,20 @@ void conf(struct ctx *c, int argc, char **argv) > =09log_conf_parsed =3D true;=09=09/* Stop printing everything */ > =20 > =09nl_sock_init(c, false); > -=09if (!v6_only) > +=09if (!v6_only) { > +=09=09if (c->mtu < IPV4_MINMAX_DATAGRAM) { Now, if you want to make this symmetric with the IPv6 case, we could also move this here... it just unnecessarily adds lines of code, and this function is already (necessarily) rather long. > +=09=09=09die("MTU %"PRIu16" is too small for IPv4 (minimum %u)", > +=09=09=09 c->mtu, IPV4_MINMAX_DATAGRAM); > +=09=09} > =09=09c->ifi4 =3D conf_ip4(ifi4, &c->ip4); > -=09if (!v4_only) > +=09} > +=09if (!v4_only) { > +=09=09if (c->mtu < IPV6_MIN_MTU) { > +=09=09=09die("MTU %"PRIu16" is too small for IPv6 (minimum %u)", > +=09=09=09 c->mtu, IPV6_MIN_MTU); Does the fact that we don't disable IPv6 imply that IPv6 must be working at all times? In my opinion not. It's also rather convenient to be able to specify '-m 200' (for whatever test) without having to give '-4' explicitly. >From a functionality perspective, I think warn() would be a better choice. > +=09=09} > =09=09c->ifi6 =3D conf_ip6(ifi6, &c->ip6); > +=09} > =09if ((*c->ip4.ifname_out && !c->ifi4) || > =09 (*c->ip6.ifname_out && !c->ifi6)) > =09=09die("External interface not usable"); > diff --git a/ip.h b/ip.h > index 1544dbf4..8f5262fa 100644 > --- a/ip.h > +++ b/ip.h > @@ -104,4 +104,11 @@ static const struct in6_addr in6addr_ll_all_nodes = =3D { > /* IPv4 Limited Broadcast (RFC 919, Section 7), 255.255.255.255 */ > static const struct in_addr in4addr_broadcast =3D { 0xffffffff }; > =20 > +/* Minimum IP datagram size all hosts must be prepared to accept (RFC 79= 1) */ > +#define IPV4_MINMAX_DATAGRAM=09576 > + > +#ifndef IPV6_MIN_MTU > +#define IPV6_MIN_MTU=09=091280 > +#endif > + > #endif /* IP_H */ > diff --git a/util.h b/util.h > index 50e96d32..bdca5ee6 100644 > --- a/util.h > +++ b/util.h > @@ -34,9 +34,6 @@ > #ifndef ETH_MAX_MTU > #define ETH_MAX_MTU=09=09=09USHRT_MAX > #endif > -#ifndef ETH_MIN_MTU > -#define ETH_MIN_MTU=09=09=0968 > -#endif I would be fine with using IPV4_MIN_MTU by the way and adding a fallback for it (there's no such thing as IP_MIN_MTU). > #ifndef IP_MAX_MTU > #define IP_MAX_MTU=09=09=09USHRT_MAX > #endif --=20 Stefano