From: David Gibson
To: Stefano Brivio, passt-dev@passt.top
Cc: jmaloy@redhat.com, David Gibson
Subject: [PATCH v7 00/27] Unified flow table
Date: Fri, 5 Jul 2024 12:06:57 +1000
Message-ID: <20240705020724.3447719-1-david@gibson.dropbear.id.au>
List-Id: Development discussion and patches for passt

This is the seventh draft of an implementation of more general
"connection" tracking, as described at:
    https://pad.passt.top/p/NewForwardingModel

This series changes
the TCP connection table and hash table into a more general flow
table that can track other protocols as well.  Each flow uniformly
keeps track of all the relevant addresses and ports, which will allow
for more robust control of NAT and port forwarding.

ICMP and UDP are converted to use the new flow table.

This is based on the recent series of UDP flow table preliminaries.

Caveats:
 * We roughly double the size of a connection/flow entry
 * We don't yet record the local address of flows initiated from a
   socket, even in cases where it's bound to a specific address.

Changes since v6:
 * Complete redesign of the UDP flow handling
 * Rebased (handling the change to bind() probing for local addresses
   was surprisingly fiddly)
 * Replace sockaddr_from_inany() with pif_sockaddr() which can
   correctly handle scope_id for different interfaces, and returns
   whether the address is non-trivial for convenience
 * Preserve specific loopback addresses in forwarding logic

Changes since v5:
 * flowside_from_af() is now static
 * Small fixes to state verification
 * Pass protocol specific types into deferred/timer callbacks
 * No longer require complete forwarding address info for the hash
   table (we won't have it for UDP)
 * Fix bugs with logging of flow addresses
 * Make sure to initialise the sin_zero field in sockaddr_from_inany()
 * Added a patch to better type the parameters of flow type specific
   callbacks
 * Terminology change "forwarded side" to "target side"
 * Assorted wording and style tweaks based on Stefano's review
 * Fold introduction of struct flowside and populating the initiating
   side together
 * Manage outbound addresses via the flow table as well
 * Support for UDP
 * Correct type of 'b' in flowside_lookup() (was a signed int)

Changes since v4:
 * flowside_from_af() no longer fills in unspecified addresses when
   passed NULL
 * Split and rename flow hash lookup function
 * Clarified flow state transitions, and enforced where practical
 * Made side 0 always the initiating side of a flow, rather than letting
   the protocol specific code decide
 * Separated pifs from flowside addresses to allow better structure
   packing

Changes since v3:
 * Complex rebase on top of the many things that have happened
   upstream since v2.
 * Assorted other changes.
 * Replace TAPFSIDE() and SOCKFSIDE() macros with local variables.

Changes since v2:
 * Cosmetic fixes based on review
 * Extra doc comments for enum flow_type
 * Rename flowside to flowaddrs which turns out to make more sense in
   light of future changes
 * Fix bug where the socket flowaddrs for tap initiated connections
   wasn't initialised to match the socket address we were using in the
   case of map-gw NAT
 * New flowaddrs_from_sock() helper used in most cases which is
   cleaner and should avoid bugs like the above
 * Using newer centralised workarounds for clang-tidy issue 58992
 * Remove duplicate definition of FLOW_MAX as maximum flow type and
   maximum number of tracked flows
 * Rebased on newer versions of preliminary work (ICMP, flow based
   dispatch and allocation, bind/address cleanups)
 * Unified hash table as well as base flow table
 * Integrated ICMP

Changes since v1:
 * Terminology changes
   - "Endpoint" address/port instead of "correspondent" address/port
   - "flowside" instead of "demiflow"
 * Actually move the connection table to a new flow table structure in
   new files
 * Significant rearrangement of earlier patches on top of that new
   table, to reduce churn

David Gibson (27):
  flow: Common address information for initiating side
  flow: Common address information for target side
  tcp, flow: Remove redundant information, repack connection structures
  tcp: Obtain guest address from flowside
  tcp: Manage outbound address via flow table
  tcp: Simplify endpoint validation using flowside information
  tcp_splice: Eliminate SPLICE_V6 flag
  tcp, flow: Replace TCP specific hash function with general flow hash
  flow, tcp: Generalise TCP hash table to general flow hash table
  tcp: Re-use flow hash for initial sequence number generation
  icmp: Remove redundant id field
    from flow table entry
  icmp: Obtain destination addresses from the flowsides
  icmp: Look up ping flows using flow hash
  icmp: Eliminate icmp_id_map
  flow: Helper to create sockets based on flowside
  icmp: Manage outbound socket address via flow table
  flow, tcp: Flow based NAT and port forwarding for TCP
  flow, icmp: Use general flow forwarding rules for ICMP
  fwd: Update flow forwarding logic for UDP
  udp: Create flows for datagrams from originating sockets
  udp: Handle "spliced" datagrams with per-flow sockets
  udp: Remove obsolete splice tracking
  udp: Find or create flows for datagrams from tap interface
  udp: Direct datagrams from host to guest via flow table
  udp: Remove obsolete socket tracking
  udp: Remove rdelta port forwarding maps
  udp: Rename UDP listening sockets

 Makefile       |    4 +-
 conf.c         |   14 +-
 epoll_type.h   |    6 +-
 flow.c         |  481 +++++++++++++++++++++-
 flow.h         |   47 +++
 flow_table.h   |   57 ++-
 fwd.c          |  184 ++++++++-
 fwd.h          |    9 +
 icmp.c         |  105 ++---
 icmp_flow.h    |    2 -
 inany.h        |    2 -
 passt.c        |   10 +-
 passt.h        |    5 +-
 pif.c          |   45 +++
 pif.h          |   17 +
 tap.c          |   11 -
 tap.h          |    1 -
 tcp.c          |  521 ++++++------------------
 tcp_buf.c      |    6 +-
 tcp_conn.h     |   51 +--
 tcp_internal.h |   10 +-
 tcp_splice.c   |   98 +----
 tcp_splice.h   |    5 +-
 udp.c          | 1055 +++++++++++++++++++-----------------------------
 udp.h          |   33 +-
 udp_flow.h     |   27 ++
 util.c         |    9 +-
 util.h         |    3 +
 28 files changed, 1549 insertions(+), 1269 deletions(-)
 create mode 100644 udp_flow.h

-- 
2.45.2