Date: Wed, 4 Mar 2026 15:28:30 +1100
From: David Gibson
To: passt-dev@passt.top
Subject: Pesto protocol proposals
Most of today and yesterday I've spent thinking about the dynamic update
model and protocol.  I certainly don't have all the details pinned down,
let alone any implementation, but I have come to some conclusions.

# Shadow forward table

On further consideration, I think this is a bad idea.  To avoid
peer-visible disruption, we don't want to destroy and recreate listening
sockets that are associated with a forward rule that's not being
altered.  Doing that with a shadow table would mean we'd need to
essentially diff the two tables as we switch.  That seems moderately
complex, and kind of silly when the client almost certainly has created
the shadow table using specific adds/removes from the original table.

# Rule states / active bit

I think we *do* still want two-stage activation of new rules: if the
first stage fails we're guaranteed we can roll back with no peer-visible
consequences.  The second stage (the actual bind()s and listen()s)
doesn't have that property, but that's unavoidable.

To implement that, I think each rule should have an "active" bit.  Or,
at least an active bit - it's possible that might not be enough, but we
could extend it to a state field, loosely analogous to the state field
in flow table entries.  But let's assume just the active bit,
until/unless a case shows up where it's insufficient.

Entries are always inserted in inactive state.  Entries must be moved to
inactive state before deletion.  fwd_listen_sync() would ignore inactive
entries.  Turning the active flag on triggers the actual bind() /
listen() calls, but requires no rearrangement of the fwd table or socket
array.  Turning it off close()s the associated sockets, but again
requires no rearrangement of the data structures.
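To make the active-bit semantics concrete, here's a minimal sketch.  The
names (pesto_rule, pesto_rule_set_active) are hypothetical, not existing
passt code, and the socket handling is elided to comments:

```c
/* Sketch only: pesto_rule and pesto_rule_set_active() are hypothetical
 * names, not passt's actual fwd table definitions. */
#include <stdbool.h>

struct pesto_rule {
	/* ...rule specification (protocol, addresses, ports)... */
	bool active;	/* inactive rules are skipped by listen sync */
};

/* Toggling the bit only opens or closes sockets; the fwd table and
 * socket array layout are never rearranged by this operation. */
int pesto_rule_set_active(struct pesto_rule *r, bool active)
{
	if (r->active == active)
		return -1;	/* e.g. DEACTIVATE of an inactive rule */

	if (active) {
		/* bind() and listen() the rule's sockets here */
	} else {
		/* close() the rule's sockets here */
	}
	r->active = active;
	return 0;
}
```

The point of the sketch is that activation state lives entirely inside
the rule, so flipping it needs no memmove() of the table and no
epoll_ctl() renumbering.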
# Tentative client operations

## INSERT

Parameters: rule specification + rule index
Returns: error status

Inserts the new rule (inactive) at the given index (moving later rules,
if necessary).  Fails with no effect for a bad index or if there's no
room in the table or socket array.  Does *not* check for conflicts with
other rules.

NOTE: moving rules could mean thousands of epoll_ctl() calls to adjust
rule indices.  We don't expect those to fail, but if they did, what do
we do?

## DELETE

Parameters: rule index
Returns: error status

Deletes the given inactive rule (moving later rules, if necessary).
Fails with no effect if it's a bad index, or the given rule is active.

NOTE: As for INSERT

## ACTIVATE

Parameters: rule index
Returns: error status

Enables the rule, bind()ing all the necessary listening sockets.  Fails
with no effect if the rule conflicts with another active rule.
Completes even if some bind()s fail (see later for handling of this).

## DEACTIVATE

Parameters: rule index
Returns: error status

Disables the rule, close()ing any listening sockets.  Fails with no
effect for a bad index or an already inactive rule.

## STATUS

Parameters: rule index
Returns: active/inactive bit + possible metadata

Indicates whether the rule is currently active.  Could also give limited
metadata about the rule (see below for possible use in bind() error
reporting).

## READ

Parameters: rule index
Returns: rule specification, or error code

Reads out the rule spec and returns it.  Fails for a bad index.  To dump
the whole table, the client can READ each slot starting from 0, until it
gets an error.

# Suggested client workflow

I suggest the client should:

1. Parse all rule modifications
2. INSERT all new rules
   -> On error, DELETE them again
3. DEACTIVATE all removed rules
   -> Should only fail if the client has done something wrong
4. ACTIVATE all new rules
   -> On error (rule conflict):
      DEACTIVATE rules we already ACTIVATEd
      ACTIVATE rules we already DEACTIVATEd
      DELETE rules we INSERTed
5.
   Check for bind errors (see details later)
   If there are failures we can't tolerate:
      DEACTIVATE rules we already ACTIVATEd
      ACTIVATE rules we already DEACTIVATEd
      DELETE rules we INSERTed
6. DELETE rules we DEACTIVATEd
   -> Should only fail if the client has done something wrong

DEACTIVATE comes before ACTIVATE to avoid spurious conflicts between new
rules and rules we're deleting.

I think that gets us close-ish to "as atomic as we can be", at least
from the perspective of peers.  The main case it doesn't catch is that
we don't detect rule conflicts until after we might have removed some
rules.  Is that good enough?

# Bind error handling

Note: in the below I'm considering pasta/passt's command line handling
and conf path as a client, rather than part of the backend.

The distinction between weak and non-weak entries is a bit clunky.  How
many failures is too many is really a question for the client, not the
backend.  So I'm suggesting we remove that concept from the backend.

ACTIVATE completes even if some or all binds fail.  However, we keep a
count in the rule of how many sockets got a bind() or listen() failure,
and it can be retrieved with STATUS.  That lets the client decide
whether to live with it or roll back as best it can.  A client could
also potentially poll later to see if some failures resolved themselves
(we now reattempt bind()s on every fwd_listen_sync()).

NOTE: the meaning of that count is pretty straightforward with regular
rules, but with SCAN rules we'd have to be more careful.

# Concurrent updates

I suggest we prevent concurrent updates by only allowing one client to
connect to the control socket at a time.

# Possible tweaks

Not sure if these are improvements or not, but they're options to
consider.

## Rule conflicts

Currently fwd_rule_add() checks for rules with conflicts and rejects
them.  We can't really report that at INSERT, because we could get a
bogus conflict with a rule we intend to DEACTIVATE/DELETE.
But reporting at ACTIVATE is also a bit clunky.  This could potentially
be sidestepped by removing the notion of rule conflicts entirely.
Instead, overlapping rules are simply allowed, and the first rule to
match a flow wins.

## Rollback in backend

The proposal above has rollback essentially handled by the client.  We
could instead do it in the backend:

- Instead of a single active bit, each rule has an "active now" and an
  "active future" bit
- On client connect, all active future bits are set equal to active now
  bits
- INSERT adds a rule with active now false and active future true
- If we conflict check, we check only against active future rules
- DELETE clears the active future bit
- ACTIVATE/DEACTIVATE no longer exist
- ROLLBACK deletes all !active now rules and sets active future bits to
  active now bits again
- COMMIT does the bind()s and close()s and, on success, sets active now
  bits to active future bits.  On failure... it's fairly complex; we'd
  need to think about it

## Persistent rule IDs

The proposal above uses raw indices into the table to identify rules,
which means INSERT and DELETE change the numbers of other rules.  That
in turn requires a bunch of epoll_ctl()s to update existing sockets.
Here's one way we could avoid some of that with a persistent rule ID:

- Each rule has an ID (say a u32), supplied by the client at INSERT
- Rules still apply in ID order, so the order matters, but not the exact
  values
- INSERT to an existing rule ID is not permitted - you must DELETE first

Internally we still store the table packed, but sorted by ID.  We could
look up a rule by ID either with a binary search, or maybe a radix
lookup table.  If we reduced the ID to a u16 (or so) we could
potentially use a single-level lookup table.  I suspect binary search
might be faster than a lookup table anyway, because of dcache impact.

So, we still need to memmove() things about for INSERT, and maybe update
the lookup table, but that's relatively easy.
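The packed, ID-sorted lookup could be sketched like this (pesto_rule and
pesto_rule_find() are names I've made up for illustration, not existing
passt code):

```c
/* Sketch: look up a rule by persistent ID in a packed table that's
 * kept sorted by ID.  Names here are hypothetical. */
#include <stddef.h>
#include <stdint.h>

struct pesto_rule {
	uint32_t id;	/* persistent ID, supplied by client at INSERT */
	/* ...rule specification... */
};

/* Binary search over the packed table; returns the slot index, or -1
 * if no rule has the given ID. */
static int pesto_rule_find(const struct pesto_rule *table, size_t n,
			   uint32_t id)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (table[mid].id == id)
			return (int)mid;
		if (table[mid].id < id)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}
```

Since rules apply in ID order and the table is stored in that order,
the same search also finds the insertion point for INSERT (where the
loop terminates), before the memmove().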
epoll data holds the persistent ID, so that doesn't need to be altered.

Clients (including the internal conf path) could choose to leave gaps in
the IDs to leave space for future inserts.  And/or certain ranges could
be reserved by convention for different purposes.

-- 
David Gibson (he or they)      | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
                               | around.  http://www.ozlabs.org/~dgibson