* [PATCH 0/3] Improve validation of --mtu option @ 2025-02-19 3:14 David Gibson 2025-02-19 3:14 ` [PATCH 1/3] conf: More thorough error checking when parsing " David Gibson ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: David Gibson @ 2025-02-19 3:14 UTC (permalink / raw) To: Stefano Brivio, passt-dev; +Cc: David Gibson Here are some of the more straightforward parts of my patches making various MTU related fixes. These ones improve how we validate the --mtu option. David Gibson (3): conf: More thorough error checking when parsing --mtu option conf: Use 0 instead of -1 as "unassigned" mtu value conf: Be more precise about minimum MTUs conf.c | 40 ++++++++++++++++++++++++++-------------- dhcp.c | 2 +- ip.h | 7 +++++++ ndp.c | 2 +- passt.h | 3 ++- pasta.c | 2 +- tcp.c | 2 +- util.h | 3 --- 8 files changed, 39 insertions(+), 22 deletions(-) -- 2.48.1 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/3] conf: More thorough error checking when parsing --mtu option 2025-02-19 3:14 [PATCH 0/3] Improve validation of --mtu option David Gibson @ 2025-02-19 3:14 ` David Gibson 2025-02-19 6:56 ` Stefano Brivio 2025-02-19 3:14 ` [PATCH 2/3] conf: Use 0 instead of -1 as "unassigned" mtu value David Gibson 2025-02-19 3:14 ` [PATCH 3/3] conf: Be more precise about minimum MTUs David Gibson 2 siblings, 1 reply; 11+ messages in thread From: David Gibson @ 2025-02-19 3:14 UTC (permalink / raw) To: Stefano Brivio, passt-dev; +Cc: David Gibson We're a bit sloppy with parsing MTU which can lead to some surprising, though fairly harmless, results: * Passing a non-number like '-m xyz' will not give an error and act like -m 0 * Junk after a number (e.g. '-m 1500pqr') will be ignored rather than giving an error * We parse the MTU as a long, then immediately assign to an int, so on some platforms certain ludicrously out of bounds values will be silently truncated, rather than giving an error Be a bit more thorough with the error checking to avoid that. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- conf.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/conf.c b/conf.c index 18017f51..335f37c9 100644 --- a/conf.c +++ b/conf.c @@ -1652,20 +1652,29 @@ void conf(struct ctx *c, int argc, char **argv) die("Invalid PID file: %s", optarg); break; - case 'm': + case 'm': { + unsigned long mtu; + char *e; + errno = 0; - c->mtu = strtol(optarg, NULL, 0); + mtu = strtoul(optarg, &e, 0); + + if (errno || *e) + die("Invalid MTU: %s", optarg); - if (!c->mtu) { + if (!mtu) { c->mtu = -1; break; } - if (c->mtu < ETH_MIN_MTU || c->mtu > (int)ETH_MAX_MTU || - errno) - die("Invalid MTU: %s", optarg); + if (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU) { + die("MTU %lu out of range (%u..%u)", mtu, + ETH_MIN_MTU, ETH_MAX_MTU); + } + c->mtu = mtu; break; + } case 'a': if (inet_pton(AF_INET6, optarg, &c->ip6.addr) && !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr) && -- @@ -1652,20 +1652,29 @@ void conf(struct ctx *c, int argc, char **argv) die("Invalid PID file: %s", optarg); break; - case 'm': + case 'm': { + unsigned long mtu; + char *e; + errno = 0; - c->mtu = strtol(optarg, NULL, 0); + mtu = strtoul(optarg, &e, 0); + + if (errno || *e) + die("Invalid MTU: %s", optarg); - if (!c->mtu) { + if (!mtu) { c->mtu = -1; break; } - if (c->mtu < ETH_MIN_MTU || c->mtu > (int)ETH_MAX_MTU || - errno) - die("Invalid MTU: %s", optarg); + if (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU) { + die("MTU %lu out of range (%u..%u)", mtu, + ETH_MIN_MTU, ETH_MAX_MTU); + } + c->mtu = mtu; break; + } case 'a': if (inet_pton(AF_INET6, optarg, &c->ip6.addr) && !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr) && -- 2.48.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] conf: More thorough error checking when parsing --mtu option 2025-02-19 3:14 ` [PATCH 1/3] conf: More thorough error checking when parsing " David Gibson @ 2025-02-19 6:56 ` Stefano Brivio 0 siblings, 0 replies; 11+ messages in thread From: Stefano Brivio @ 2025-02-19 6:56 UTC (permalink / raw) To: David Gibson; +Cc: passt-dev On Wed, 19 Feb 2025 14:14:27 +1100 David Gibson <david@gibson.dropbear.id.au> wrote: > We're a bit sloppy with parsing MTU which can lead to some surprising, > though fairly harmless, results: > * Passing a non-number like '-m xyz' will not give an error and act like > -m 0 > * Junk after a number (e.g. '-m 1500pqr') will be ignored rather than > giving an error > * We parse the MTU as a long, then immediately assign to an int, so on > some platforms certain ludicrously out of bounds values will be > silently truncated, rather than giving an error > > Be a bit more thorough with the error checking to avoid that. > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Applied. -- Stefano ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/3] conf: Use 0 instead of -1 as "unassigned" mtu value 2025-02-19 3:14 [PATCH 0/3] Improve validation of --mtu option David Gibson 2025-02-19 3:14 ` [PATCH 1/3] conf: More thorough error checking when parsing " David Gibson @ 2025-02-19 3:14 ` David Gibson 2025-02-19 6:56 ` Stefano Brivio 2025-02-19 3:14 ` [PATCH 3/3] conf: Be more precise about minimum MTUs David Gibson 2 siblings, 1 reply; 11+ messages in thread From: David Gibson @ 2025-02-19 3:14 UTC (permalink / raw) To: Stefano Brivio, passt-dev; +Cc: David Gibson On the command line -m 0 means "don't assign an MTU" (letting the guest use its default. However, internally we use (c->mtu == -1) to represent that state. We use (c->mtu == 0) to represent "the user didn't specify on the command line, so use the default" - but this is only used during conf(), never afterwards. This is unnecessarily confusing. We can instead just initialise c->mtu to its default (65520) before parsing options and use 0 on both the command line and internally to represent the "don't assign" special case. This ensures that c->mtu is always 0..65535, so we can store it in a uint16_t which is more natural. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- conf.c | 11 ++--------- dhcp.c | 2 +- ndp.c | 2 +- passt.h | 3 ++- pasta.c | 2 +- tcp.c | 2 +- 6 files changed, 8 insertions(+), 14 deletions(-) diff --git a/conf.c b/conf.c index 335f37c9..c5ee07b0 100644 --- a/conf.c +++ b/conf.c @@ -1413,6 +1413,7 @@ void conf(struct ctx *c, int argc, char **argv) optstring = "+dqfel:hs:F:p:P:m:a:n:M:g:i:o:D:S:H:461t:u:"; } + c->mtu = ROUND_DOWN(ETH_MAX_MTU - ETH_HLEN, sizeof(uint32_t)); c->tcp.fwd_in.mode = c->tcp.fwd_out.mode = FWD_UNSET; c->udp.fwd_in.mode = c->udp.fwd_out.mode = FWD_UNSET; memcpy(c->our_tap_mac, MAC_OUR_LAA, ETH_ALEN); @@ -1662,12 +1663,7 @@ void conf(struct ctx *c, int argc, char **argv) if (errno || *e) die("Invalid MTU: %s", optarg); - if (!mtu) { - c->mtu = -1; - break; - } - - if (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU) { + if (mtu && (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)) { die("MTU %lu out of range (%u..%u)", mtu, ETH_MIN_MTU, ETH_MAX_MTU); } @@ -1980,9 +1976,6 @@ void conf(struct ctx *c, int argc, char **argv) c->no_dhcpv6 = 1; } - if (!c->mtu) - c->mtu = ROUND_DOWN(ETH_MAX_MTU - ETH_HLEN, sizeof(uint32_t)); - get_dns(c); if (!*c->pasta_ifn) { diff --git a/dhcp.c b/dhcp.c index 4a209f10..66a716e3 100644 --- a/dhcp.c +++ b/dhcp.c @@ -417,7 +417,7 @@ int dhcp(const struct ctx *c, const struct pool *p) &c->ip4.guest_gw, sizeof(c->ip4.guest_gw)); } - if (c->mtu != -1) { + if (c->mtu) { opts[26].slen = 2; opts[26].s[0] = c->mtu / 256; opts[26].s[1] = c->mtu % 256; diff --git a/ndp.c b/ndp.c index 37bf7a35..ded2081d 100644 --- a/ndp.c +++ b/ndp.c @@ -256,7 +256,7 @@ static void ndp_ra(const struct ctx *c, const struct in6_addr *dst) ptr = &ra.var[0]; - if (c->mtu != -1) { + if (c->mtu) { struct opt_mtu *mtu = (struct opt_mtu *)ptr; *mtu = (struct opt_mtu) { .header = { diff --git a/passt.h b/passt.h index 1f0dab54..28d13892 100644 --- a/passt.h +++ b/passt.h @@ -274,6 +274,8 @@ struct ctx { int fd_repair; unsigned char our_tap_mac[ETH_ALEN]; unsigned char guest_mac[ETH_ALEN]; + uint16_t mtu; + uint64_t hash_secret[2]; int ifi4; @@ -298,7 +300,6 @@ struct ctx { int no_icmp; struct icmp_ctx icmp; - int mtu; int no_dns; int no_dns_search; int no_dhcp_dns; diff --git a/pasta.c b/pasta.c index 585a51c6..fa3e7de5 100644 --- a/pasta.c +++ b/pasta.c @@ -319,7 +319,7 @@ void pasta_ns_conf(struct ctx *c) if (c->pasta_conf_ns) { unsigned int flags = IFF_UP; - if (c->mtu != -1) + if (c->mtu) nl_link_set_mtu(nl_sock_ns, c->pasta_ifi, c->mtu); if (c->ifi6) /* Avoid duplicate address detection on link up */ diff --git a/tcp.c b/tcp.c index f498f5bc..e3c0a53b 100644 --- a/tcp.c +++ b/tcp.c @@ -1139,7 +1139,7 @@ int tcp_prepare_flags(const struct ctx *c, struct tcp_tap_conn *conn, if (flags & SYN) { int mss; - if (c->mtu == -1) { + if (!c->mtu) { mss = tinfo.tcpi_snd_mss; } else { mss = c->mtu - sizeof(struct tcphdr); -- @@ -1139,7 +1139,7 @@ int tcp_prepare_flags(const struct ctx *c, struct tcp_tap_conn *conn, if (flags & SYN) { int mss; - if (c->mtu == -1) { + if (!c->mtu) { mss = tinfo.tcpi_snd_mss; } else { mss = c->mtu - sizeof(struct tcphdr); -- 2.48.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 2/3] conf: Use 0 instead of -1 as "unassigned" mtu value 2025-02-19 3:14 ` [PATCH 2/3] conf: Use 0 instead of -1 as "unassigned" mtu value David Gibson @ 2025-02-19 6:56 ` Stefano Brivio 0 siblings, 0 replies; 11+ messages in thread From: Stefano Brivio @ 2025-02-19 6:56 UTC (permalink / raw) To: David Gibson; +Cc: passt-dev On Wed, 19 Feb 2025 14:14:28 +1100 David Gibson <david@gibson.dropbear.id.au> wrote: > On the command line -m 0 means "don't assign an MTU" (letting the guest use > its default. However, internally we use (c->mtu == -1) to represent that > state. We use (c->mtu == 0) to represent "the user didn't specify on the > command line, so use the default" - but this is only used during conf(), > never afterwards. > > This is unnecessarily confusing. We can instead just initialise c->mtu to > its default (65520) before parsing options and use 0 on both the command > line and internally to represent the "don't assign" special case. This > ensures that c->mtu is always 0..65535, so we can store it in a uint16_t > which is more natural. > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Applied. -- Stefano ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 3/3] conf: Be more precise about minimum MTUs 2025-02-19 3:14 [PATCH 0/3] Improve validation of --mtu option David Gibson 2025-02-19 3:14 ` [PATCH 1/3] conf: More thorough error checking when parsing " David Gibson 2025-02-19 3:14 ` [PATCH 2/3] conf: Use 0 instead of -1 as "unassigned" mtu value David Gibson @ 2025-02-19 3:14 ` David Gibson 2025-02-19 5:37 ` Stefano Brivio 2 siblings, 1 reply; 11+ messages in thread From: David Gibson @ 2025-02-19 3:14 UTC (permalink / raw) To: Stefano Brivio, passt-dev; +Cc: David Gibson Currently we reject the -m option if given a value less than ETH_MAX_MTU (68). That define is derived from the kernel, but its name is misleading: it doesn't really have anything to do with Ethernet per se, but is rather the minimum payload any L2 link must be able to handle in order to carry IPv4. For IPv6, it's not sufficient: that requires an MTU of at least 1280. Furthermore, the value of 68 is the minimum IP *fragment* size the link must be able to carry. Since we don't support IP fragmentation, it's not sufficient for us. Instead we should clamp the MTU to 576 for IPv4 - the minimum IP datagram size that all hosts must be able to accept. Move the verification of the MTU's lower bound to logic specific to the IP versions and correct those errors. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- conf.c | 20 +++++++++++++++----- ip.h | 7 +++++++ util.h | 3 --- 3 files changed, 22 insertions(+), 8 deletions(-) diff --git a/conf.c b/conf.c index c5ee07b0..e127acc1 100644 --- a/conf.c +++ b/conf.c @@ -1663,9 +1663,9 @@ void conf(struct ctx *c, int argc, char **argv) if (errno || *e) die("Invalid MTU: %s", optarg); - if (mtu && (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)) { - die("MTU %lu out of range (%u..%u)", mtu, - ETH_MIN_MTU, ETH_MAX_MTU); + if (mtu > ETH_MAX_MTU) { + die("MTU %lu too large (max %u)", + mtu, ETH_MAX_MTU); } c->mtu = mtu; @@ -1838,10 +1838,20 @@ void conf(struct ctx *c, int argc, char **argv) log_conf_parsed = true; /* Stop printing everything */ nl_sock_init(c, false); - if (!v6_only) + if (!v6_only) { + if (c->mtu < IPV4_MINMAX_DATAGRAM) { + die("MTU %"PRIu16" is too small for IPv4 (minimum %u)", + c->mtu, IPV4_MINMAX_DATAGRAM); + } c->ifi4 = conf_ip4(ifi4, &c->ip4); - if (!v4_only) + } + if (!v4_only) { + if (c->mtu < IPV6_MIN_MTU) { + die("MTU %"PRIu16" is too small for IPv6 (minimum %u)", + c->mtu, IPV6_MIN_MTU); + } c->ifi6 = conf_ip6(ifi6, &c->ip6); + } if ((*c->ip4.ifname_out && !c->ifi4) || (*c->ip6.ifname_out && !c->ifi6)) die("External interface not usable"); diff --git a/ip.h b/ip.h index 1544dbf4..8f5262fa 100644 --- a/ip.h +++ b/ip.h @@ -104,4 +104,11 @@ static const struct in6_addr in6addr_ll_all_nodes = { /* IPv4 Limited Broadcast (RFC 919, Section 7), 255.255.255.255 */ static const struct in_addr in4addr_broadcast = { 0xffffffff }; +/* Minimum IP datagram size all hosts must be prepared to accept (RFC 791) */ +#define IPV4_MINMAX_DATAGRAM 576 + +#ifndef IPV6_MIN_MTU +#define IPV6_MIN_MTU 1280 +#endif + #endif /* IP_H */ diff --git a/util.h b/util.h index 50e96d32..bdca5ee6 100644 --- a/util.h +++ b/util.h @@ -34,9 +34,6 @@ #ifndef ETH_MAX_MTU #define ETH_MAX_MTU USHRT_MAX #endif -#ifndef ETH_MIN_MTU -#define ETH_MIN_MTU 68 -#endif #ifndef IP_MAX_MTU #define IP_MAX_MTU USHRT_MAX #endif -- @@ -34,9 +34,6 @@ #ifndef ETH_MAX_MTU #define ETH_MAX_MTU USHRT_MAX #endif -#ifndef ETH_MIN_MTU -#define ETH_MIN_MTU 68 -#endif #ifndef IP_MAX_MTU #define IP_MAX_MTU USHRT_MAX #endif -- 2.48.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 3/3] conf: Be more precise about minimum MTUs 2025-02-19 3:14 ` [PATCH 3/3] conf: Be more precise about minimum MTUs David Gibson @ 2025-02-19 5:37 ` Stefano Brivio 2025-02-20 3:55 ` David Gibson 0 siblings, 1 reply; 11+ messages in thread From: Stefano Brivio @ 2025-02-19 5:37 UTC (permalink / raw) To: David Gibson; +Cc: passt-dev On Wed, 19 Feb 2025 14:14:29 +1100 David Gibson <david@gibson.dropbear.id.au> wrote: > Currently we reject the -m option if given a value less than ETH_MAX_MTU ETH_MIN_MTU > (68). That define is derived from the kernel, but its name is misleading: > it doesn't really have anything to do with Ethernet per se, but is rather > the minimum payload any L2 link must be able to handle in order to carry > IPv4. Yes, that should be IPV4_MIN_MTU instead, but it was only added as recently as 4.14 kernels, so I opted for ETH_MIN_MTU. A misnomer as you pointed out, but safe. > For IPv6, it's not sufficient: that requires an MTU of at least > 1280. > > Furthermore, the value of 68 is the minimum IP *fragment* size the link > must be able to carry. Since we don't support IP fragmentation, it's not > sufficient for us. Instead we should clamp the MTU to 576 for IPv4 - the > minimum IP datagram size that all hosts must be able to accept. First off, the only assumption in RFC 791 terms we can _perhaps_ make is that we are some kind of "module" (also called "node", could be host or router), not a (full) host. Maybe not even a module. So, with that regard, we don't need to be prepared to _accept_ (for ourselves as destination) any particular datagram size. Second, even if all hosts need to be able to accept 576-byte datagrams, that doesn't mean that all links need to be able to carry them. The MTU refers _to the link_, not to what a host is able to accept. And that's the reason why you can set 68 bytes as MTU on most network interfaces on Linux. We set sub-576 values ourselves in tests: $ grep -rn "mtu 256" * passt_tcp:95:guest ip link set dev __IFNAME__ mtu 256 passt_vu_tcp:95:guest ip link set dev __IFNAME__ mtu 256 That is, indeed, all hosts (not "modules") need to be able to accept (not "forward") datagram sizes of at least 576 bytes... but that's only assuming you can deliver those datagrams to them. This is not just a theoretical matter. As late as 2018, I was made aware of a setup with several (local!) nodes with links between them having ~380 bytes as MTU. Sure enough, the reason why I know about this was an issue coming from the same flawed assumption made in kernel commit c9fefa08190f ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit"), and fixed by 82a40777de12 ("ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit"). See also commit b4331a681822 ("vti6: Change minimum MTU to IPV4_MIN_MTU, vti6 can carry IPv4 too") on the subject of what links can carry vs. what endpoints should be able to forward. > Move the verification of the MTU's lower bound to logic specific to the IP > versions and correct those errors. > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > --- > conf.c | 20 +++++++++++++++----- > ip.h | 7 +++++++ > util.h | 3 --- > 3 files changed, 22 insertions(+), 8 deletions(-) > > diff --git a/conf.c b/conf.c > index c5ee07b0..e127acc1 100644 > --- a/conf.c > +++ b/conf.c > @@ -1663,9 +1663,9 @@ void conf(struct ctx *c, int argc, char **argv) > if (errno || *e) > die("Invalid MTU: %s", optarg); > > - if (mtu && (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)) { > - die("MTU %lu out of range (%u..%u)", mtu, > - ETH_MIN_MTU, ETH_MAX_MTU); > + if (mtu > ETH_MAX_MTU) { > + die("MTU %lu too large (max %u)", > + mtu, ETH_MAX_MTU); > } > > c->mtu = mtu; > @@ -1838,10 +1838,20 @@ void conf(struct ctx *c, int argc, char **argv) > log_conf_parsed = true; /* Stop printing everything */ > > nl_sock_init(c, false); > - if (!v6_only) > + if (!v6_only) { > + if (c->mtu < IPV4_MINMAX_DATAGRAM) { Now, if you want to make this symmetric with the IPv6 case, we could also move this here... it just unnecessarily adds lines of code, and this function is already (necessarily) rather long. > + die("MTU %"PRIu16" is too small for IPv4 (minimum %u)", > + c->mtu, IPV4_MINMAX_DATAGRAM); > + } > c->ifi4 = conf_ip4(ifi4, &c->ip4); > - if (!v4_only) > + } > + if (!v4_only) { > + if (c->mtu < IPV6_MIN_MTU) { > + die("MTU %"PRIu16" is too small for IPv6 (minimum %u)", > + c->mtu, IPV6_MIN_MTU); Does the fact that we don't disable IPv6 imply that IPv6 must be working at all times? In my opinion not. It's also rather convenient to be able to specify '-m 200' (for whatever test) without having to give '-4' explicitly. From a functionality perspective, I think warn() would be a better choice. > + } > c->ifi6 = conf_ip6(ifi6, &c->ip6); > + } > if ((*c->ip4.ifname_out && !c->ifi4) || > (*c->ip6.ifname_out && !c->ifi6)) > die("External interface not usable"); > diff --git a/ip.h b/ip.h > index 1544dbf4..8f5262fa 100644 > --- a/ip.h > +++ b/ip.h > @@ -104,4 +104,11 @@ static const struct in6_addr in6addr_ll_all_nodes = { > /* IPv4 Limited Broadcast (RFC 919, Section 7), 255.255.255.255 */ > static const struct in_addr in4addr_broadcast = { 0xffffffff }; > > +/* Minimum IP datagram size all hosts must be prepared to accept (RFC 791) */ > +#define IPV4_MINMAX_DATAGRAM 576 > + > +#ifndef IPV6_MIN_MTU > +#define IPV6_MIN_MTU 1280 > +#endif > + > #endif /* IP_H */ > diff --git a/util.h b/util.h > index 50e96d32..bdca5ee6 100644 > --- a/util.h > +++ b/util.h > @@ -34,9 +34,6 @@ > #ifndef ETH_MAX_MTU > #define ETH_MAX_MTU USHRT_MAX > #endif > -#ifndef ETH_MIN_MTU > -#define ETH_MIN_MTU 68 > -#endif I would be fine with using IPV4_MIN_MTU by the way and adding a fallback for it (there's no such thing as IP_MIN_MTU). > #ifndef IP_MAX_MTU > #define IP_MAX_MTU USHRT_MAX > #endif -- Stefano ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/3] conf: Be more precise about minimum MTUs 2025-02-19 5:37 ` Stefano Brivio @ 2025-02-20 3:55 ` David Gibson 2025-02-20 6:45 ` Stefano Brivio 0 siblings, 1 reply; 11+ messages in thread From: David Gibson @ 2025-02-20 3:55 UTC (permalink / raw) To: Stefano Brivio; +Cc: passt-dev [-- Attachment #1: Type: text/plain, Size: 6836 bytes --] On Wed, Feb 19, 2025 at 06:37:28AM +0100, Stefano Brivio wrote: > On Wed, 19 Feb 2025 14:14:29 +1100 > David Gibson <david@gibson.dropbear.id.au> wrote: > > > Currently we reject the -m option if given a value less than ETH_MAX_MTU > > ETH_MIN_MTU > > > (68). That define is derived from the kernel, but its name is misleading: > > it doesn't really have anything to do with Ethernet per se, but is rather > > the minimum payload any L2 link must be able to handle in order to carry > > IPv4. > > Yes, that should be IPV4_MIN_MTU instead, but it was only added as > recently as 4.14 kernels, so I opted for ETH_MIN_MTU. A misnomer as you > pointed out, but safe. Ah, thanks, I hadn't realised that newer kernels had better named constants. When I respin I'll use matching names. > > For IPv6, it's not sufficient: that requires an MTU of at least > > 1280. > > > > Furthermore, the value of 68 is the minimum IP *fragment* size the link > > must be able to carry. Since we don't support IP fragmentation, it's not > > sufficient for us. Instead we should clamp the MTU to 576 for IPv4 - the > > minimum IP datagram size that all hosts must be able to accept. > > First off, the only assumption in RFC 791 terms we can _perhaps_ make is > that we are some kind of "module" (also called "node", could be host or > router), not a (full) host. Maybe not even a module. So, with that > regard, we don't need to be prepared to _accept_ (for ourselves as > destination) any particular datagram size. > > Second, even if all hosts need to be able to accept 576-byte datagrams, > that doesn't mean that all links need to be able to carry them. The MTU > refers _to the link_, not to what a host is able to accept. Ah... yes. I was thinking that that requirement implied that a link which can't fragment was useless if it couldn't carry 576-byte datagrams, but thinking over your examples here I realise I was mistaken. > And that's the reason why you can set 68 bytes as MTU on most network > interfaces on Linux. We set sub-576 values ourselves in tests: > > $ grep -rn "mtu 256" * > passt_tcp:95:guest ip link set dev __IFNAME__ mtu 256 > passt_vu_tcp:95:guest ip link set dev __IFNAME__ mtu 256 > > That is, indeed, all hosts (not "modules") need to be able to accept > (not "forward") datagram sizes of at least 576 bytes... but that's only > assuming you can deliver those datagrams to them. > > This is not just a theoretical matter. As late as 2018, I was made > aware of a setup with several (local!) nodes with links between them > having ~380 bytes as MTU. > > Sure enough, the reason why I know about this was an issue coming from > the same flawed assumption made in kernel commit c9fefa08190f > ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit"), and fixed by > 82a40777de12 ("ip6_tunnel: use the right value for ipv4 min mtu check > in ip6_tnl_xmit"). > > See also commit b4331a681822 ("vti6: Change minimum MTU to IPV4_MIN_MTU, > vti6 can carry IPv4 too") on the subject of what links can carry vs. > what endpoints should be able to forward. > > > Move the verification of the MTU's lower bound to logic specific to the IP > > versions and correct those errors. > > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > > --- > > conf.c | 20 +++++++++++++++----- > > ip.h | 7 +++++++ > > util.h | 3 --- > > 3 files changed, 22 insertions(+), 8 deletions(-) > > > > diff --git a/conf.c b/conf.c > > index c5ee07b0..e127acc1 100644 > > --- a/conf.c > > +++ b/conf.c > > @@ -1663,9 +1663,9 @@ void conf(struct ctx *c, int argc, char **argv) > > if (errno || *e) > > die("Invalid MTU: %s", optarg); > > > > - if (mtu && (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)) { > > - die("MTU %lu out of range (%u..%u)", mtu, > > - ETH_MIN_MTU, ETH_MAX_MTU); > > + if (mtu > ETH_MAX_MTU) { > > + die("MTU %lu too large (max %u)", > > + mtu, ETH_MAX_MTU); > > } > > > > c->mtu = mtu; > > @@ -1838,10 +1838,20 @@ void conf(struct ctx *c, int argc, char **argv) > > log_conf_parsed = true; /* Stop printing everything */ > > > > nl_sock_init(c, false); > > - if (!v6_only) > > + if (!v6_only) { > > + if (c->mtu < IPV4_MINMAX_DATAGRAM) { > > Now, if you want to make this symmetric with the IPv6 case, we could > also move this here... it just unnecessarily adds lines of code, and > this function is already (necessarily) rather long. Sorry, I'm not following what change you're suggesting (or discussing?). > > + die("MTU %"PRIu16" is too small for IPv4 (minimum %u)", > > + c->mtu, IPV4_MINMAX_DATAGRAM); > > + } > > c->ifi4 = conf_ip4(ifi4, &c->ip4); > > - if (!v4_only) > > + } > > + if (!v4_only) { > > + if (c->mtu < IPV6_MIN_MTU) { > > + die("MTU %"PRIu16" is too small for IPv6 (minimum %u)", > > + c->mtu, IPV6_MIN_MTU); > > Does the fact that we don't disable IPv6 imply that IPv6 must be > working at all times? In my opinion not. > > It's also rather convenient to be able to specify '-m 200' (for > whatever test) without having to give '-4' explicitly. > > >From a functionality perspective, I think warn() would be a better > choice. warn() and disable the relevant protocol. That makes sense, I'll make that change. > > > + } > > c->ifi6 = conf_ip6(ifi6, &c->ip6); > > + } > > if ((*c->ip4.ifname_out && !c->ifi4) || > > (*c->ip6.ifname_out && !c->ifi6)) > > die("External interface not usable"); > > diff --git a/ip.h b/ip.h > > index 1544dbf4..8f5262fa 100644 > > --- a/ip.h > > +++ b/ip.h > > @@ -104,4 +104,11 @@ static const struct in6_addr in6addr_ll_all_nodes = { > > /* IPv4 Limited Broadcast (RFC 919, Section 7), 255.255.255.255 */ > > static const struct in_addr in4addr_broadcast = { 0xffffffff }; > > > > +/* Minimum IP datagram size all hosts must be prepared to accept (RFC 791) */ > > +#define IPV4_MINMAX_DATAGRAM 576 > > + > > +#ifndef IPV6_MIN_MTU > > +#define IPV6_MIN_MTU 1280 > > +#endif > > + > > #endif /* IP_H */ > > diff --git a/util.h b/util.h > > index 50e96d32..bdca5ee6 100644 > > --- a/util.h > > +++ b/util.h > > @@ -34,9 +34,6 @@ > > #ifndef ETH_MAX_MTU > > #define ETH_MAX_MTU USHRT_MAX > > #endif > > -#ifndef ETH_MIN_MTU > > -#define ETH_MIN_MTU 68 > > -#endif > > I would be fine with using IPV4_MIN_MTU by the way and adding a > fallback for it (there's no such thing as IP_MIN_MTU). > > > #ifndef IP_MAX_MTU > > #define IP_MAX_MTU USHRT_MAX > > #endif > -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/3] conf: Be more precise about minimum MTUs 2025-02-20 3:55 ` David Gibson @ 2025-02-20 6:45 ` Stefano Brivio 2025-02-20 10:06 ` David Gibson 0 siblings, 1 reply; 11+ messages in thread From: Stefano Brivio @ 2025-02-20 6:45 UTC (permalink / raw) To: David Gibson; +Cc: passt-dev On Thu, 20 Feb 2025 14:55:30 +1100 David Gibson <david@gibson.dropbear.id.au> wrote: > On Wed, Feb 19, 2025 at 06:37:28AM +0100, Stefano Brivio wrote: > > On Wed, 19 Feb 2025 14:14:29 +1100 > > David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > Currently we reject the -m option if given a value less than ETH_MAX_MTU > > > > ETH_MIN_MTU > > > > > (68). That define is derived from the kernel, but its name is misleading: > > > it doesn't really have anything to do with Ethernet per se, but is rather > > > the minimum payload any L2 link must be able to handle in order to carry > > > IPv4. > > > > Yes, that should be IPV4_MIN_MTU instead, but it was only added as > > recently as 4.14 kernels, so I opted for ETH_MIN_MTU. A misnomer as you > > pointed out, but safe. > > Ah, thanks, I hadn't realised that newer kernels had better named > constants. When I respin I'll use matching names. > > > > For IPv6, it's not sufficient: that requires an MTU of at least > > > 1280. > > > > > > Furthermore, the value of 68 is the minimum IP *fragment* size the link > > > must be able to carry. Since we don't support IP fragmentation, it's not > > > sufficient for us. Instead we should clamp the MTU to 576 for IPv4 - the > > > minimum IP datagram size that all hosts must be able to accept. > > > > First off, the only assumption in RFC 791 terms we can _perhaps_ make is > > that we are some kind of "module" (also called "node", could be host or > > router), not a (full) host. Maybe not even a module. So, with that > > regard, we don't need to be prepared to _accept_ (for ourselves as > > destination) any particular datagram size. > > > > Second, even if all hosts need to be able to accept 576-byte datagrams, > > that doesn't mean that all links need to be able to carry them. The MTU > > refers _to the link_, not to what a host is able to accept. > > Ah... yes. I was thinking that that requirement implied that a link > which can't fragment was useless if it couldn't carry 576-byte > datagrams, but thinking over your examples here I realise I was > mistaken. > > > And that's the reason why you can set 68 bytes as MTU on most network > > interfaces on Linux. We set sub-576 values ourselves in tests: > > > > $ grep -rn "mtu 256" * > > passt_tcp:95:guest ip link set dev __IFNAME__ mtu 256 > > passt_vu_tcp:95:guest ip link set dev __IFNAME__ mtu 256 > > > > That is, indeed, all hosts (not "modules") need to be able to accept > > (not "forward") datagram sizes of at least 576 bytes... but that's only > > assuming you can deliver those datagrams to them. > > > > This is not just a theoretical matter. As late as 2018, I was made > > aware of a setup with several (local!) nodes with links between them > > having ~380 bytes as MTU. > > > > Sure enough, the reason why I know about this was an issue coming from > > the same flawed assumption made in kernel commit c9fefa08190f > > ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit"), and fixed by > > 82a40777de12 ("ip6_tunnel: use the right value for ipv4 min mtu check > > in ip6_tnl_xmit"). > > > > See also commit b4331a681822 ("vti6: Change minimum MTU to IPV4_MIN_MTU, > > vti6 can carry IPv4 too") on the subject of what links can carry vs. > > what endpoints should be able to forward. > > > > > Move the verification of the MTU's lower bound to logic specific to the IP > > > versions and correct those errors. > > > > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > > > --- > > > conf.c | 20 +++++++++++++++----- > > > ip.h | 7 +++++++ > > > util.h | 3 --- > > > 3 files changed, 22 insertions(+), 8 deletions(-) > > > > > > diff --git a/conf.c b/conf.c > > > index c5ee07b0..e127acc1 100644 > > > --- a/conf.c > > > +++ b/conf.c > > > @@ -1663,9 +1663,9 @@ void conf(struct ctx *c, int argc, char **argv) > > > if (errno || *e) > > > die("Invalid MTU: %s", optarg); > > > > > > - if (mtu && (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)) { > > > - die("MTU %lu out of range (%u..%u)", mtu, > > > - ETH_MIN_MTU, ETH_MAX_MTU); > > > + if (mtu > ETH_MAX_MTU) { > > > + die("MTU %lu too large (max %u)", > > > + mtu, ETH_MAX_MTU); > > > } > > > > > > c->mtu = mtu; > > > @@ -1838,10 +1838,20 @@ void conf(struct ctx *c, int argc, char **argv) > > > log_conf_parsed = true; /* Stop printing everything */ > > > > > > nl_sock_init(c, false); > > > - if (!v6_only) > > > + if (!v6_only) { > > > + if (c->mtu < IPV4_MINMAX_DATAGRAM) { > > > > Now, if you want to make this symmetric with the IPv6 case, we could > > also move this here... it just unnecessarily adds lines of code, and > > this function is already (necessarily) rather long. > > Sorry, I'm not following what change you're suggesting (or discussing?). The exact change I quoted: moving the check on the minimum MTU to here: if (c->mtu < IPV4_MINMAX_DATAGRAM) { compared to doing it earlier in conf(). > > > + die("MTU %"PRIu16" is too small for IPv4 (minimum %u)", > > > + c->mtu, IPV4_MINMAX_DATAGRAM); > > > + } > > > c->ifi4 = conf_ip4(ifi4, &c->ip4); > > > - if (!v4_only) > > > + } > > > + if (!v4_only) { > > > + if (c->mtu < IPV6_MIN_MTU) { > > > + die("MTU %"PRIu16" is too small for IPv6 (minimum %u)", > > > + c->mtu, IPV6_MIN_MTU); > > > > Does the fact that we don't disable IPv6 imply that IPv6 must be > > working at all times? In my opinion not. > > > > It's also rather convenient to be able to specify '-m 200' (for > > whatever test) without having to give '-4' explicitly. > > > > >From a functionality perspective, I think warn() would be a better > > choice. > > warn() and disable the relevant protocol. That makes sense, I'll make > that change. I don't think it makes sense to disable IPv4, highlighting quote: > > Does the fact that we don't disable IPv6 imply that IPv6 must be > > working at all times? In my opinion not. ...you can advertise a small MTU for whatever reason. The guest might configure it or not. The guest might change it later on. We have no way to re-enable IPv6 once it's disabled, though. So let's just do what the user says, I would suggest, and warn them that it *might* not work. There is zero functionality gained by disabling IPv6. -- Stefano ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/3] conf: Be more precise about minimum MTUs 2025-02-20 6:45 ` Stefano Brivio @ 2025-02-20 10:06 ` David Gibson 2025-02-20 10:14 ` Stefano Brivio 0 siblings, 1 reply; 11+ messages in thread From: David Gibson @ 2025-02-20 10:06 UTC (permalink / raw) To: Stefano Brivio; +Cc: passt-dev [-- Attachment #1: Type: text/plain, Size: 7168 bytes --] On Thu, Feb 20, 2025 at 07:45:40AM +0100, Stefano Brivio wrote: > On Thu, 20 Feb 2025 14:55:30 +1100 > David Gibson <david@gibson.dropbear.id.au> wrote: > > > On Wed, Feb 19, 2025 at 06:37:28AM +0100, Stefano Brivio wrote: > > > On Wed, 19 Feb 2025 14:14:29 +1100 > > > David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > > Currently we reject the -m option if given a value less than ETH_MAX_MTU > > > > > > ETH_MIN_MTU > > > > > > > (68). That define is derived from the kernel, but its name is misleading: > > > > it doesn't really have anything to do with Ethernet per se, but is rather > > > > the minimum payload any L2 link must be able to handle in order to carry > > > > IPv4. > > > > > > Yes, that should be IPV4_MIN_MTU instead, but it was only added as > > > recently as 4.14 kernels, so I opted for ETH_MIN_MTU. A misnomer as you > > > pointed out, but safe. > > > > Ah, thanks, I hadn't realised that newer kernels had better named > > constants. When I respin I'll use matching names. > > > > > > For IPv6, it's not sufficient: that requires an MTU of at least > > > > 1280. > > > > > > > > Furthermore, the value of 68 is the minimum IP *fragment* size the link > > > > must be able to carry. Since we don't support IP fragmentation, it's not > > > > sufficient for us. Instead we should clamp the MTU to 576 for IPv4 - the > > > > minimum IP datagram size that all hosts must be able to accept. > > > > > > First off, the only assumption in RFC 791 terms we can _perhaps_ make is > > > that we are some kind of "module" (also called "node", could be host or > > > router), not a (full) host. Maybe not even a module. So, with that > > > regard, we don't need to be prepared to _accept_ (for ourselves as > > > destination) any particular datagram size. > > > > > > Second, even if all hosts need to be able to accept 576-byte datagrams, > > > that doesn't mean that all links need to be able to carry them. The MTU > > > refers _to the link_, not to what a host is able to accept. > > > > Ah... yes. I was thinking that that requirement implied that a link > > which can't fragment was useless if it couldn't carry 576-byte > > datagrams, but thinking over your examples here I realise I was > > mistaken. > > > > > And that's the reason why you can set 68 bytes as MTU on most network > > > interfaces on Linux. We set sub-576 values ourselves in tests: > > > > > > $ grep -rn "mtu 256" * > > > passt_tcp:95:guest ip link set dev __IFNAME__ mtu 256 > > > passt_vu_tcp:95:guest ip link set dev __IFNAME__ mtu 256 > > > > > > That is, indeed, all hosts (not "modules") need to be able to accept > > > (not "forward") datagram sizes of at least 576 bytes... but that's only > > > assuming you can deliver those datagrams to them. > > > > > > This is not just a theoretical matter. As late as 2018, I was made > > > aware of a setup with several (local!) nodes with links between them > > > having ~380 bytes as MTU. > > > > > > Sure enough, the reason why I know about this was an issue coming from > > > the same flawed assumption made in kernel commit c9fefa08190f > > > ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit"), and fixed by > > > 82a40777de12 ("ip6_tunnel: use the right value for ipv4 min mtu check > > > in ip6_tnl_xmit"). > > > > > > See also commit b4331a681822 ("vti6: Change minimum MTU to IPV4_MIN_MTU, > > > vti6 can carry IPv4 too") on the subject of what links can carry vs. > > > what endpoints should be able to forward. > > > > > > > Move the verification of the MTU's lower bound to logic specific to the IP > > > > versions and correct those errors. > > > > > > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > > > > --- > > > > conf.c | 20 +++++++++++++++----- > > > > ip.h | 7 +++++++ > > > > util.h | 3 --- > > > > 3 files changed, 22 insertions(+), 8 deletions(-) > > > > > > > > diff --git a/conf.c b/conf.c > > > > index c5ee07b0..e127acc1 100644 > > > > --- a/conf.c > > > > +++ b/conf.c > > > > @@ -1663,9 +1663,9 @@ void conf(struct ctx *c, int argc, char **argv) > > > > if (errno || *e) > > > > die("Invalid MTU: %s", optarg); > > > > > > > > - if (mtu && (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)) { > > > > - die("MTU %lu out of range (%u..%u)", mtu, > > > > - ETH_MIN_MTU, ETH_MAX_MTU); > > > > + if (mtu > ETH_MAX_MTU) { > > > > + die("MTU %lu too large (max %u)", > > > > + mtu, ETH_MAX_MTU); > > > > } > > > > > > > > c->mtu = mtu; > > > > @@ -1838,10 +1838,20 @@ void conf(struct ctx *c, int argc, char **argv) > > > > log_conf_parsed = true; /* Stop printing everything */ > > > > > > > > nl_sock_init(c, false); > > > > - if (!v6_only) > > > > + if (!v6_only) { > > > > + if (c->mtu < IPV4_MINMAX_DATAGRAM) { > > > > > > Now, if you want to make this symmetric with the IPv6 case, we could > > > also move this here... it just unnecessarily adds lines of code, and > > > this function is already (necessarily) rather long. > > > > Sorry, I'm not following what change you're suggesting (or discussing?). > > The exact change I quoted: moving the check on the minimum MTU to here: > > if (c->mtu < IPV4_MINMAX_DATAGRAM) { > > compared to doing it earlier in conf(). But... the diff you're commenting on is already doing exactly that. What am I missing? > > > > + die("MTU %"PRIu16" is too small for IPv4 (minimum %u)", > > > > + c->mtu, IPV4_MINMAX_DATAGRAM); > > > > + } > > > > c->ifi4 = conf_ip4(ifi4, &c->ip4); > > > > - if (!v4_only) > > > > + } > > > > + if (!v4_only) { > > > > + if (c->mtu < IPV6_MIN_MTU) { > > > > + die("MTU %"PRIu16" is too small for IPv6 (minimum %u)", > > > > + c->mtu, IPV6_MIN_MTU); > > > > > > Does the fact that we don't disable IPv6 imply that IPv6 must be > > > working at all times? In my opinion not. > > > > > > It's also rather convenient to be able to specify '-m 200' (for > > > whatever test) without having to give '-4' explicitly. > > > > > > >From a functionality perspective, I think warn() would be a better > > > choice. > > > > warn() and disable the relevant protocol. That makes sense, I'll make > > that change. > > I don't think it makes sense to disable IPv4, highlighting quote: > > > > Does the fact that we don't disable IPv6 imply that IPv6 must be > > > working at all times? In my opinion not. > > ...you can advertise a small MTU for whatever reason. The guest might > configure it or not. The guest might change it later on. We have no way > to re-enable IPv6 once it's disabled, though. Ah... good point. > So let's just do what the user says, I would suggest, and warn them > that it *might* not work. There is zero functionality gained by > disabling IPv6. Ok, I'll send a v3 which does that. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/3] conf: Be more precise about minimum MTUs 2025-02-20 10:06 ` David Gibson @ 2025-02-20 10:14 ` Stefano Brivio 0 siblings, 0 replies; 11+ messages in thread From: Stefano Brivio @ 2025-02-20 10:14 UTC (permalink / raw) To: David Gibson; +Cc: passt-dev On Thu, 20 Feb 2025 21:06:52 +1100 David Gibson <david@gibson.dropbear.id.au> wrote: > On Thu, Feb 20, 2025 at 07:45:40AM +0100, Stefano Brivio wrote: > > On Thu, 20 Feb 2025 14:55:30 +1100 > > David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > On Wed, Feb 19, 2025 at 06:37:28AM +0100, Stefano Brivio wrote: > > > > On Wed, 19 Feb 2025 14:14:29 +1100 > > > > David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > > > > Currently we reject the -m option if given a value less than ETH_MAX_MTU > > > > > > > > ETH_MIN_MTU > > > > > > > > > (68). That define is derived from the kernel, but its name is misleading: > > > > > it doesn't really have anything to do with Ethernet per se, but is rather > > > > > the minimum payload any L2 link must be able to handle in order to carry > > > > > IPv4. > > > > > > > > Yes, that should be IPV4_MIN_MTU instead, but it was only added as > > > > recently as 4.14 kernels, so I opted for ETH_MIN_MTU. A misnomer as you > > > > pointed out, but safe. > > > > > > Ah, thanks, I hadn't realised that newer kernels had better named > > > constants. When I respin I'll use matching names. > > > > > > > > For IPv6, it's not sufficient: that requires an MTU of at least > > > > > 1280. > > > > > > > > > > Furthermore, the value of 68 is the minimum IP *fragment* size the link > > > > > must be able to carry. Since we don't support IP fragmentation, it's not > > > > > sufficient for us. Instead we should clamp the MTU to 576 for IPv4 - the > > > > > minimum IP datagram size that all hosts must be able to accept. > > > > > > > > First off, the only assumption in RFC 791 terms we can _perhaps_ make is > > > > that we are some kind of "module" (also called "node", could be host or > > > > router), not a (full) host. Maybe not even a module. So, with that > > > > regard, we don't need to be prepared to _accept_ (for ourselves as > > > > destination) any particular datagram size. > > > > > > > > Second, even if all hosts need to be able to accept 576-byte datagrams, > > > > that doesn't mean that all links need to be able to carry them. The MTU > > > > refers _to the link_, not to what a host is able to accept. > > > > > > Ah... yes. I was thinking that that requirement implied that a link > > > which can't fragment was useless if it couldn't carry 576-byte > > > datagrams, but thinking over your examples here I realise I was > > > mistaken. > > > > > > > And that's the reason why you can set 68 bytes as MTU on most network > > > > interfaces on Linux. We set sub-576 values ourselves in tests: > > > > > > > > $ grep -rn "mtu 256" * > > > > passt_tcp:95:guest ip link set dev __IFNAME__ mtu 256 > > > > passt_vu_tcp:95:guest ip link set dev __IFNAME__ mtu 256 > > > > > > > > That is, indeed, all hosts (not "modules") need to be able to accept > > > > (not "forward") datagram sizes of at least 576 bytes... but that's only > > > > assuming you can deliver those datagrams to them. > > > > > > > > This is not just a theoretical matter. As late as 2018, I was made > > > > aware of a setup with several (local!) nodes with links between them > > > > having ~380 bytes as MTU. > > > > > > > > Sure enough, the reason why I know about this was an issue coming from > > > > the same flawed assumption made in kernel commit c9fefa08190f > > > > ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit"), and fixed by > > > > 82a40777de12 ("ip6_tunnel: use the right value for ipv4 min mtu check > > > > in ip6_tnl_xmit"). > > > > > > > > See also commit b4331a681822 ("vti6: Change minimum MTU to IPV4_MIN_MTU, > > > > vti6 can carry IPv4 too") on the subject of what links can carry vs. > > > > what endpoints should be able to forward. > > > > > > > > > Move the verification of the MTU's lower bound to logic specific to the IP > > > > > versions and correct those errors. > > > > > > > > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > > > > > --- > > > > > conf.c | 20 +++++++++++++++----- > > > > > ip.h | 7 +++++++ > > > > > util.h | 3 --- > > > > > 3 files changed, 22 insertions(+), 8 deletions(-) > > > > > > > > > > diff --git a/conf.c b/conf.c > > > > > index c5ee07b0..e127acc1 100644 > > > > > --- a/conf.c > > > > > +++ b/conf.c > > > > > @@ -1663,9 +1663,9 @@ void conf(struct ctx *c, int argc, char **argv) > > > > > if (errno || *e) > > > > > die("Invalid MTU: %s", optarg); > > > > > > > > > > - if (mtu && (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)) { > > > > > - die("MTU %lu out of range (%u..%u)", mtu, > > > > > - ETH_MIN_MTU, ETH_MAX_MTU); > > > > > + if (mtu > ETH_MAX_MTU) { > > > > > + die("MTU %lu too large (max %u)", > > > > > + mtu, ETH_MAX_MTU); > > > > > } > > > > > > > > > > c->mtu = mtu; > > > > > @@ -1838,10 +1838,20 @@ void conf(struct ctx *c, int argc, char **argv) > > > > > log_conf_parsed = true; /* Stop printing everything */ > > > > > > > > > > nl_sock_init(c, false); > > > > > - if (!v6_only) > > > > > + if (!v6_only) { > > > > > + if (c->mtu < IPV4_MINMAX_DATAGRAM) { > > > > > > > > Now, if you want to make this symmetric with the IPv6 case, we could > > > > also move this here... it just unnecessarily adds lines of code, and > > > > this function is already (necessarily) rather long. > > > > > > Sorry, I'm not following what change you're suggesting (or discussing?). > > > > The exact change I quoted: moving the check on the minimum MTU to here: > > > > if (c->mtu < IPV4_MINMAX_DATAGRAM) { > > > > compared to doing it earlier in conf(). > > But... the diff you're commenting on is already doing exactly that. Right, I said that we could (anyway) move the check here as your patch does, just to make that symmetric with IPv6, regardless of my other considerations. I just think it's unnecessary. > What am I missing? Nothing, I guess. > > > > > + die("MTU %"PRIu16" is too small for IPv4 (minimum %u)", > > > > > + c->mtu, IPV4_MINMAX_DATAGRAM); > > > > > + } > > > > > c->ifi4 = conf_ip4(ifi4, &c->ip4); > > > > > - if (!v4_only) > > > > > + } > > > > > + if (!v4_only) { > > > > > + if (c->mtu < IPV6_MIN_MTU) { > > > > > + die("MTU %"PRIu16" is too small for IPv6 (minimum %u)", > > > > > + c->mtu, IPV6_MIN_MTU); > > > > > > > > Does the fact that we don't disable IPv6 imply that IPv6 must be > > > > working at all times? In my opinion not. > > > > > > > > It's also rather convenient to be able to specify '-m 200' (for > > > > whatever test) without having to give '-4' explicitly. > > > > > > > > >From a functionality perspective, I think warn() would be a better > > > > choice. > > > > > > warn() and disable the relevant protocol. That makes sense, I'll make > > > that change. > > > > I don't think it makes sense to disable IPv4, highlighting quote: > > > > > > Does the fact that we don't disable IPv6 imply that IPv6 must be > > > > working at all times? In my opinion not. > > > > ...you can advertise a small MTU for whatever reason. The guest might > > configure it or not. The guest might change it later on. We have no way > > to re-enable IPv6 once it's disabled, though. > > Ah... good point. > > > So let's just do what the user says, I would suggest, and warn them > > that it *might* not work. There is zero functionality gained by > > disabling IPv6. > > Ok, I'll send a v3 which does that. Okay, thanks. -- Stefano ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-02-20 10:14 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2025-02-19 3:14 [PATCH 0/3] Improve validation of --mtu option David Gibson 2025-02-19 3:14 ` [PATCH 1/3] conf: More thorough error checking when parsing " David Gibson 2025-02-19 6:56 ` Stefano Brivio 2025-02-19 3:14 ` [PATCH 2/3] conf: Use 0 instead of -1 as "unassigned" mtu value David Gibson 2025-02-19 6:56 ` Stefano Brivio 2025-02-19 3:14 ` [PATCH 3/3] conf: Be more precise about minimum MTUs David Gibson 2025-02-19 5:37 ` Stefano Brivio 2025-02-20 3:55 ` David Gibson 2025-02-20 6:45 ` Stefano Brivio 2025-02-20 10:06 ` David Gibson 2025-02-20 10:14 ` Stefano Brivio
Code repositories for project(s) associated with this public inbox https://passt.top/passt This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for IMAP folder(s).