From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 20 Feb 2025 07:45:40 +0100
From: Stefano Brivio
To: David Gibson
Subject: Re: [PATCH 3/3] conf: Be more precise about minimum MTUs
Message-ID: <20250220074540.318bee27@elisabeth>
In-Reply-To:
References: <20250219031429.3708026-1-david@gibson.dropbear.id.au>
 <20250219031429.3708026-4-david@gibson.dropbear.id.au>
 <20250219063728.309bf1ac@elisabeth>
Organization: Red Hat
X-MailFrom: sbrivio@redhat.com
CC: passt-dev@passt.top
List-Id: Development discussion and patches for passt

On Thu, 20 Feb 2025 14:55:30 +1100
David Gibson wrote:

> On Wed, Feb 19, 2025 at 06:37:28AM +0100, Stefano Brivio wrote:
> > On Wed, 19 Feb 2025 14:14:29 +1100
> > David Gibson wrote:
> >
> > > Currently we reject the -m option if given a value less than ETH_MAX_MTU
> >
> > ETH_MIN_MTU
> >
> > > (68).
> > > That define is derived from the kernel, but its name is misleading:
> > > it doesn't really have anything to do with Ethernet per se, but is rather
> > > the minimum payload any L2 link must be able to handle in order to carry
> > > IPv4.
> >
> > Yes, that should be IPV4_MIN_MTU instead, but it was only added as
> > recently as 4.14 kernels, so I opted for ETH_MIN_MTU. A misnomer as you
> > pointed out, but safe.
>
> Ah, thanks, I hadn't realised that newer kernels had better named
> constants. When I respin I'll use matching names.
>
> > > For IPv6, it's not sufficient: that requires an MTU of at least
> > > 1280.
> > >
> > > Furthermore, the value of 68 is the minimum IP *fragment* size the link
> > > must be able to carry. Since we don't support IP fragmentation, it's not
> > > sufficient for us. Instead we should clamp the MTU to 576 for IPv4 - the
> > > minimum IP datagram size that all hosts must be able to accept.
> >
> > First off, the only assumption in RFC 791 terms we can _perhaps_ make is
> > that we are some kind of "module" (also called "node", could be host or
> > router), not a (full) host. Maybe not even a module. So, with that
> > regard, we don't need to be prepared to _accept_ (for ourselves as
> > destination) any particular datagram size.
> >
> > Second, even if all hosts need to be able to accept 576-byte datagrams,
> > that doesn't mean that all links need to be able to carry them. The MTU
> > refers _to the link_, not to what a host is able to accept.
>
> Ah... yes. I was thinking that that requirement implied that a link
> which can't fragment was useless if it couldn't carry 576-byte
> datagrams, but thinking over your examples here I realise I was
> mistaken.
>
> > And that's the reason why you can set 68 bytes as MTU on most network
> > interfaces on Linux.
> > We set sub-576 values ourselves in tests:
> >
> >   $ grep -rn "mtu 256" *
> >   passt_tcp:95:guest ip link set dev __IFNAME__ mtu 256
> >   passt_vu_tcp:95:guest ip link set dev __IFNAME__ mtu 256
> >
> > That is, indeed, all hosts (not "modules") need to be able to accept
> > (not "forward") datagram sizes of at least 576 bytes... but that's only
> > assuming you can deliver those datagrams to them.
> >
> > This is not just a theoretical matter. As late as 2018, I was made
> > aware of a setup with several (local!) nodes with links between them
> > having ~380 bytes as MTU.
> >
> > Sure enough, the reason why I know about this was an issue coming from
> > the same flawed assumption made in kernel commit c9fefa08190f
> > ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit"), and fixed by
> > 82a40777de12 ("ip6_tunnel: use the right value for ipv4 min mtu check
> > in ip6_tnl_xmit").
> >
> > See also commit b4331a681822 ("vti6: Change minimum MTU to IPV4_MIN_MTU,
> > vti6 can carry IPv4 too") on the subject of what links can carry vs.
> > what endpoints should be able to forward.
> >
> > > Move the verification of the MTU's lower bound to logic specific to the IP
> > > versions and correct those errors.
> > >
> > > Signed-off-by: David Gibson
> > > ---
> > >  conf.c | 20 +++++++++++++++-----
> > >  ip.h   |  7 +++++++
> > >  util.h |  3 ---
> > >  3 files changed, 22 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/conf.c b/conf.c
> > > index c5ee07b0..e127acc1 100644
> > > --- a/conf.c
> > > +++ b/conf.c
> > > @@ -1663,9 +1663,9 @@ void conf(struct ctx *c, int argc, char **argv)
> > >  			if (errno || *e)
> > >  				die("Invalid MTU: %s", optarg);
> > >
> > > -			if (mtu && (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)) {
> > > -				die("MTU %lu out of range (%u..%u)", mtu,
> > > -				    ETH_MIN_MTU, ETH_MAX_MTU);
> > > +			if (mtu > ETH_MAX_MTU) {
> > > +				die("MTU %lu too large (max %u)",
> > > +				    mtu, ETH_MAX_MTU);
> > >  			}
> > >
> > >  			c->mtu = mtu;
> > > @@ -1838,10 +1838,20 @@ void conf(struct ctx *c, int argc, char **argv)
> > >  	log_conf_parsed = true;		/* Stop printing everything */
> > >
> > >  	nl_sock_init(c, false);
> > > -	if (!v6_only)
> > > +	if (!v6_only) {
> > > +		if (c->mtu < IPV4_MINMAX_DATAGRAM) {
> >
> > Now, if you want to make this symmetric with the IPv6 case, we could
> > also move this here... it just unnecessarily adds lines of code, and
> > this function is already (necessarily) rather long.
>
> Sorry, I'm not following what change you're suggesting (or discussing?).

The exact change I quoted: moving the check on the minimum MTU to here:

	if (c->mtu < IPV4_MINMAX_DATAGRAM) {

compared to doing it earlier in conf().

> > > +			die("MTU %"PRIu16" is too small for IPv4 (minimum %u)",
> > > +			    c->mtu, IPV4_MINMAX_DATAGRAM);
> > > +		}
> > >  		c->ifi4 = conf_ip4(ifi4, &c->ip4);
> > > -	if (!v4_only)
> > > +	}
> > > +	if (!v4_only) {
> > > +		if (c->mtu < IPV6_MIN_MTU) {
> > > +			die("MTU %"PRIu16" is too small for IPv6 (minimum %u)",
> > > +			    c->mtu, IPV6_MIN_MTU);
> >
> > Does the fact that we don't disable IPv6 imply that IPv6 must be
> > working at all times? In my opinion not.
> >
> > It's also rather convenient to be able to specify '-m 200' (for
> > whatever test) without having to give '-4' explicitly.
> >
> > From a functionality perspective, I think warn() would be a better
> > choice.
>
> warn() and disable the relevant protocol. That makes sense, I'll make
> that change.

I don't think it makes sense to disable IPv4, highlighting quote:

> > Does the fact that we don't disable IPv6 imply that IPv6 must be
> > working at all times? In my opinion not.

...you can advertise a small MTU for whatever reason. The guest might
configure it or not. The guest might change it later on. We have no way
to re-enable IPv6 once it's disabled, though.

So let's just do what the user says, I would suggest, and warn them
that it *might* not work. There is zero functionality gained by
disabling IPv6.

-- 
Stefano
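
The warn-and-carry-on behaviour suggested above could look roughly like
the standalone sketch below. It is an illustration only, not passt code:
mtu_check(), mtu_check_example(), the MTU_WARN_* flags and the SKETCH_*
constants are hypothetical names, and treating an MTU of 0 as "not set"
is an assumption based on the "mtu &&" test in the removed lines of the
quoted diff. Only the values 68 and 1280 come from the discussion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Link minimums named in the thread, defined locally (hypothetical names)
 * so that the sketch builds without kernel headers.
 */
#define SKETCH_IPV4_MIN_MTU	68U	/* smallest MTU of a link carrying IPv4 */
#define SKETCH_IPV6_MIN_MTU	1280U	/* smallest MTU of a link carrying IPv6 */

/* Hypothetical result flags, not part of passt */
#define MTU_WARN_V4	(1U << 0)
#define MTU_WARN_V6	(1U << 1)

/**
 * mtu_check() - Warn, without failing, if the MTU is too small for a protocol
 * @mtu:	MTU given by the user, 0 assumed to mean "not set"
 * @v4:		Non-zero if IPv4 is enabled
 * @v6:		Non-zero if IPv6 is enabled
 *
 * Return: bitmask of MTU_WARN_* flags for protocols that might not work
 */
static unsigned mtu_check(uint16_t mtu, int v4, int v6)
{
	unsigned ret = 0;

	if (!mtu)
		return 0;

	if (v4 && mtu < SKETCH_IPV4_MIN_MTU) {
		fprintf(stderr,
			"MTU %u might be too small for IPv4 (minimum %u)\n",
			(unsigned)mtu, SKETCH_IPV4_MIN_MTU);
		ret |= MTU_WARN_V4;
	}

	if (v6 && mtu < SKETCH_IPV6_MIN_MTU) {
		fprintf(stderr,
			"MTU %u might be too small for IPv6 (minimum %u)\n",
			(unsigned)mtu, SKETCH_IPV6_MIN_MTU);
		ret |= MTU_WARN_V6;
	}

	/* Neither protocol is disabled: the caller keeps what the user asked */
	return ret;
}

/* Usage: '-m 200' without '-4' warns about IPv6 only, and carries on */
static void mtu_check_example(void)
{
	assert(mtu_check(200, 1, 1) == MTU_WARN_V6);
}
```

The function only reports which protocols might not work and leaves both
enabled, so a guest that later picks a larger MTU is not stuck with IPv6
switched off.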