From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=bLTshQPN; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id 787845A061B for ; Tue, 07 Oct 2025 00:33:24 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1759790003; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OIkJi8qBftNgnyFGBzhL/ev/y/Wom0i/4PL9A0vV3UQ=; b=bLTshQPN2otnciHlyEG0wB+z6BdjkJ2MOW2Sm1Dd9qcwOPXW0ZW82CF4kzuAfBEjjVa9iW WEZ4cQs4DWK1WWfWBzU6biQV6X1dnJnPDpBUmsgvWnVRfI41PK8EwB17fRUQZMilvLqYZ3 U1eu0r4K3NcHqxXkWIhgU9bGcZobzJ8= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-571-aoAjEyYCNnO8luOMrSStBw-1; Mon, 06 Oct 2025 18:33:22 -0400 X-MC-Unique: aoAjEyYCNnO8luOMrSStBw-1 X-Mimecast-MFC-AGG-ID: aoAjEyYCNnO8luOMrSStBw_1759790001 Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-3f3c118cbb3so4055835f8f.3 for ; Mon, 06 Oct 2025 15:33:22 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759790000; x=1760394800; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=OIkJi8qBftNgnyFGBzhL/ev/y/Wom0i/4PL9A0vV3UQ=; b=bcI7j9USVxu8pjim/cRXUh4k4PNDSOD4eLc0O7E00edA5EUrbVwnAP0Ap0U0t1415v qTcHCFNeg47EghNE4YkcrkZDuBqyVhCaZIhPzIY/CJnuV4ZQGMTuZTOkWHUkTZS5Wu/l ylqwE2oXuvNtpDQUGqniHUPrhpFbeWPdWb4qtgv+5oY1cLcm2tvw9d3vCN/wVFdtxfhr j2+h0Ji/ZN9UuAPuUBN7T7ObAhE7SNh1z+GuaM/mEqBYhSDHv0foFuMaKQqTNhZV2vl4 KnVotr9b5ukjjI76kPgTlnD3lDgfHDLAkNQeYXJ0wqxzdiKLJo8SFOJpuLKYtJYM88hr lSYQ== X-Forwarded-Encrypted: i=1; AJvYcCUXXrCuswH7t8Hn90PoAp1qamlCSbS+byn/+f5gUPtjzNdk4hcAotvoRWtKpAQYNlHSm6Rw5bk7CPA=@passt.top X-Gm-Message-State: AOJu0Yw8EaVRIUJcV6CLsoGxlP328MJ3vnFwmRhw+vZ1B82wlJCroqJZ MiiXB91xnG8NReFwQRxjh5tLVT/AeQQRKQSkT0QqzGe3jCSGM1vHLKv8Hhcez+BBA8HznYLgO6a L2LelvHiPqZxAoXLWqino1EnliNRuGkBf0oFcOUkOMEhcOlUoESa8WSqYBmQqpQ== X-Gm-Gg: ASbGncshF0ktIzj4Ghxu94uvMAhKLBNA2VWDctu5WohgRQWrVeHYERxmaE08Hej0LrK 1KuOPHSQBjAghdhC7/YE96Y2t2aj3B4N8Vq5WkCqh/jPcYn6kjC6+TGF29UyM0o+6JvY4dkomB1 yfMyRcOv15pDijo68PwFXYuT3JgMRP00sdcdN0j1+rZjy0p7Km8RdbElmhDHJTPZLofP6/8x4Nw AcjR0D1dBSYMwpdzKvdOFfs99hs+XJUZsvracYjb/fG0r8yfSCxVedsy2oTzdtzC1CC+TQm7Rgo TOVCwI5LLEvcKvbn5/SgEUC6oH7FGxVUPPlgkxyz8nF+nlEnzANZDnU6 X-Received: by 2002:a5d:5d08:0:b0:424:2275:63b4 with SMTP id ffacd0b85a97d-425671c5795mr7683039f8f.61.1759790000121; Mon, 06 Oct 2025 15:33:20 -0700 (PDT) X-Google-Smtp-Source: AGHT+IF0FohGbBvMz91bJqFZqCgqGx6qQ5zywDlGaBwiqrCDdXLijTQpK+zUHxbgUi1thZgYpvRjwA== X-Received: by 2002:a5d:5d08:0:b0:424:2275:63b4 with SMTP id ffacd0b85a97d-425671c5795mr7683025f8f.61.1759789999518; Mon, 06 Oct 2025 15:33:19 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-4255d8f0846sm22894740f8f.45.2025.10.06.15.33.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 06 Oct 2025 15:33:18 -0700 (PDT) Date: Tue, 7 Oct 2025 00:33:17 +0200 From: Stefano Brivio To: Jon Maloy Subject: Re: [PATCH v12 2/9] fwd: Add cache table for ARP/NDP contents Message-ID: <20251007003317.5b4dc567@elisabeth> In-Reply-To: <82865ef4-326e-48af-ab9a-6367f2c9e023@redhat.com> References: <20251003003412.588801-1-jmaloy@redhat.com> <20251003003412.588801-3-jmaloy@redhat.com> <82865ef4-326e-48af-ab9a-6367f2c9e023@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 1O0g8dgsVdgGrU0wgrjIjHfLIHnOZCRTTxBx3M6Spa4_1759790001 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: ZXIR7Y2MDVUMWR4WAUXPH6HQPGJTEEHC X-Message-ID-Hash: ZXIR7Y2MDVUMWR4WAUXPH6HQPGJTEEHC X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: David Gibson , dgibson@redhat.com, passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Sun, 5 Oct 2025 11:52:42 -0400 Jon Maloy wrote: > On 2025-10-03 00:31, David Gibson wrote: > > On Thu, Oct 02, 2025 at 08:34:05PM -0400, Jon Maloy wrote: > >> We add a cache table to keep track of the contents of the kernel ARP > >> and NDP tables. The table is fed from the just introduced netlink based > > [...] > > >> +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr, > >> + const uint8_t *mac) > >> +{ > >> + struct neigh_table *t = &neigh_table; > >> + struct neigh_table_entry *e; > >> + ssize_t slot; > >> + > >> + /* MAC address might change sometimes */ > >> + e = fwd_neigh_table_find(c, addr); > >> + if (e) { > >> + if (inany_equals(addr, &inany_from_v4(c->ip4.guest_gw))) > >> + return; > >> + if (inany_equals(addr, (union inany_addr *)&c->ip6.guest_gw)) > >> + return; > > > > This doesn't make sense to me. From the way its looked up in 4/9, the > > IP addresses in the table are as seen by the host, not by the guest > > (we look up the table *after* applying NAT). Which makes guest_gw not > > meaningful here. > > > > You _do_ need to handle the case that addr is loopback (which guest_gw > > might be translated to), and that's handled by your dummy entries. > > > > The other case we might need to consider here is if the (translated) > > address is the host's IP. Do we want to give out_tap_addr for that > > case? Or the host's real MAC on the template interface? If the > > latter will that be in the host ARP table, or would we have to handle > > it specially? > > > This is my current understanding of this: > > 1) Adding the loopback entries to the neigbour table is harmless, > but unnecessary. Loopback entries yes, but I thought you would add the equivalent guest-side entries. > fwd_neigh_mac_get() will always return our_tap_mac > when it doesn't find an entry in the table, which is the case here. Kind of, because while with IPv4: $ pasta sh -c 'ip n a 127.0.0.1 dev lo lladdr 00:ba:db:ad:1d:ea; ip n' 0.0.0.0 dev lo lladdr 00:ba:db:ad:1d:ea PERMANENT that won't match anything, with IPv6: $ pasta sh -c 'ip n a ::1 dev lo lladdr 00:ba:db:ad:1d:ea; ip n' ::1 dev lo lladdr 00:ba:db:ad:1d:ea PERMANENT that would match the loopback address. I don't think we actually need to care about this, though. > This addition was a mistake. > > > 2) Regarding the default gw, we have two cases: > > 2.1) guest_gw IP is the same as the host_gw IP: I'm afraid we're losing a substantial amount of time on ambiguous terminology, so let me clarify what my understanding of these terms is: a. "guest_gw IP" -> any of the two IP addresses, in the guest, possibly mapping to the host ...that's because the default gateway address in the guest doesn't really matter, so I don't think you're referring to the IP address of the default gateway in the guest. b. "host_gw IP" -> the IP address of the default gateway on the host > In this case, we'll have an event trying to inserting an entry > into the table with the host_gw's real mac address. > In v11, this mapping was announced to the guest, but later > contradicted by all ARP/NDP messages he receives, because they all > come from PASTA/PASST and uses our_tap_mac as source mac in the > message header. This is probably harmless, but causes confusion in > the guest and a warning wireshark. > In v12, I therefore chose to add the entry but suppress the > announcements for the gw, and also the updates of > the mac fields from subsequent events. > This is not right either. We either have to be consistent, and > add the real host_gw mac in all messages sent from the purported > gw address, or we just as consistently use our_tap_mac. I think > the latter is simpler with less consequences for the code. I also think we should use our_tap_mac for any message coming from the host itself: they're not coming from the gateway (as seen by the host), which we are effectively shadowing. > I think we should just simply suppress all events from the host > gw in such cases. This is generally fine, I think, but it comes with the additional problem that, should we ever make the map_guest_addr or map_host_loopback parameters configurable at runtime, we'll miss the entry at that point and we'll need to rework this. If it doesn't cause substantial effort, I would suggest that, if entries always refer to IP addresses as seen from the guest (David's suggestion), you would actually add the entries mapping to the host to the table, resolving to our_tap_mac. Those addresses are *not* c->ip[46].guest_gw though. They are c->ip[46].map_guest_addr and c->ip[46].map_host_loopback. See nat_outbound() for an overview of translated addresses. > (Here I have a question: I don't see any NAT mapping making > it possible to communicate between the guest and the the host gw > when guest gw IP and host gw IP are are equal. Shouldn't there be > one?) Right, there's no such parameter at the moment, because the assumption is that, if you need to reach the default gateway of the host, you would specify a non-default --map-host-loopback (possibly 'none'). On the other hand we plan to make this more generic, perhaps as part of https://bugs.passt.top/show_bug.cgi?id=140, perhaps as part of another effort. The flow table should enable arbitrary NATs, and was implemented with that in mind. We just don't have a suitable configuration front-end at the moment, even though command line parameters would probably be enough to cover most common cases. > 2.2) guest_gw IP is different from host_gw IP: > This case is simpler. We just treat the host_gw as any other > local host, add and update its entry at new events, and pass it on > to the guest as an announcement at first contact, consistently > using that host's true mapping. > > 3) Regarding the host address, we also have two cases: > > 3.1) guest IP is the same as the host IP (on the default interface): My interpretation of the terms here is: c. "guest IP" -> one of the configured guest IP addresses (c->ip[46].addr) d. "host IP" -> one of the IP addresses configured on the host (not in struct ctx, but two of those will become c->ip[46].addr by default) > In this case the guest can only reach the host by using the > guest_gw IP (which lands on the host), Not according to my interpretation of "guest_gw IP" (a. above). > some or the other IPs > on the host, or the map_host_loopback IP, as far as I > understand. Yes. And c->ip[46].map_guest_addr on top of them. > The host IP will never enter the neigbour table by ARP/NDP > event, and there is no reason for us to add a dummy for it. If none of the host IP addresses are visible to the guest, right, there's no need for any matching entry. > 3.2) guest IP differs from the host IP. > In this case we let fwd_neigh_table_init() add an entry for the > host as if it were any other host, using the IP and mac > addresses from the template interface. This assumes that we'll use MAC addresses assigned to host interfaces, instead of using our_tap_mac. I think it looks more elegant than the alternative, but mind that there can be 2^32 - 1 interfaces on Linux (see dev_index_reserve() in net/core/dev.c), so that might flood the table right away. Maybe our_tap_mac isn't that bad after all. > Conclusion: > - We use nat_inbound() instead of nat_outbound() before consulting the > table, like you suggest. If this implies standardising on guest-side addresses, it makes sense to me as well. > - We need to manually add an entry representing the host in the case the > host IP differs from the guest IP. This entry is announced. > In the case the IPs are the same we don't add any entry. I don't think host and guest IP addresses are really related to this point. It's rather about the two sets of addresses that might map to the host. > - Local host entries are added by ARP/NDP events, but only if > nat_inbound() doesn't translate the IP address (will that ever > happen?). > We suppress all events from the host gw in case it has the same > IP as the guest gw (unless this is covered by the nat_inbound() > check?). I wouldn't hide any address like that, to make this a bit more future-proof. > - We may want a NAT mapping making it possible to reach the host gw > in the case it has the same IP as the guest gw. This has no effect > on our neighbour table. > > If you agree with my above analysis I can go ahead and post a v13 of > the series early next week, including the above changes and fixes > for your other comments to the series. ...the rest sounds good to me. -- Stefano