From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson
Cc: passt-dev@passt.top
Subject: Re: Pesto Protocol Proposals, imProved
Message-ID: <20260306115813.478c8c81@elisabeth>
Date: Fri, 06 Mar 2026 11:58:14 +0100 (CET)
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Fri, 6 Mar 2026 12:08:07 +1100
David Gibson wrote:

> Stefano convinced me that my earlier proposal for the dynamic update
> protocol was unnecessarily complex. Plus, I saw a much better way of
> handling socket continuity in the context of a "whole table"
> replacement. So here's an entirely revised protocol suggestion.
>
> # Outline
>
> I suggest that each connection to the control socket handles a single
> transaction.
>
> 1. Server hello
>    - Server sends magic number, version
>    - Possibly feature flags / limits (e.g.
>      max number of rules allowed)

Feature flags and limits could be fixed depending on the version, for
simplicity. If pifs are unexpected (somebody trying to forward ports to
a container and touching passt instead) we should find out as part of
3. I can't think of other substantial types of mismatches.

> 2. Client hello
>    - Client sends magic number
>    - Do we need anything else?

As long as we have a version reported by the server, we should be fine.
We'll just increase it if we need something else. Do we want a client
version too?

> 3. Server lists pifs
>    - Server sends the number of pifs, their indices and names

Up to here, I guess we can skip all this for an initial
Podman-side-complete implementation.

> 4. Server lists rules
>    - Server sends the list of rules, one pif at a time

Could this be a fixed-size blob with up to, say, 16 pifs? We'll need to
generalise pifs at some point. I'm not sure if it makes things simpler.
I would defer this to the implementation.

> 5. Client gives new rules
>    - Client sends the new list of rules, one pif at a time
>    - Server loads them into the shadow table, and validates
>      (no socket operations)

Is it one shadow table per pif or one with everything? If it's one per
pif, do we want to have the whole exchange prepended by "load table for
pif x" or "store table for pif y" commands? I would suggest not, at the
moment, as it looks slightly complicated, but eventually in a later
version we could switch to that.

> 6. Server acknowledges
>    - Either reports an error and disconnects, or acks waiting for
>      client
> 7. Client signals apply
>    - Server swaps shadow and active tables, and syncs sockets
>      with new active table
> 8. Server gives error summary
>    - Server reports bind/listen/whatever errors
> 9a. Client signals commit
>    - Shadow table (now the old table) discarded
> or
> 9b. Client signals rollback
>    - Shadow and active tables swapped back, syncs sockets
>    - Discard shadow table (now the "new" table again)
>    - New bind error report?

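To make the table handling concrete, here's a rough C sketch of the
swap-and-rollback in steps 5-9b above. All the names here (rule_table,
apply_rules, sync_sockets) are made up for illustration, and a "rule"
is reduced to a bare port number; this is not an actual passt
interface:

```c
#include <stdbool.h>

#define MAX_RULES 128

/* Illustrative only: a rule is just a port number here */
struct rule_table {
	int count;
	int ports[MAX_RULES];
};

static struct rule_table tables[2];	/* one active, one shadow */
static int active;			/* index of the active table */

/* Load new rules into the shadow table, swap it in, and sync sockets;
 * on failure, swap back so the old table stays in effect. */
static bool apply_rules(const struct rule_table *new_rules,
			bool (*sync_sockets)(const struct rule_table *))
{
	int shadow = !active;

	tables[shadow] = *new_rules;		/* step 5: fill shadow table */
	active = shadow;			/* step 7: swap tables */
	if (!sync_sockets(&tables[active])) {	/* step 8: bind/listen errors */
		active = !active;		/* step 9b: swap back... */
		sync_sockets(&tables[active]);	/* ...and re-sync sockets */
		return false;
	}
	return true;				/* step 9a: commit */
}
```

The nice property is that rollback is just the same swap repeated,
which is also part of why I think the steps can be collapsed.
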
Do we need these as five separate steps? Couldn't the server simply
apply, or try to apply, as soon as the client is done, and acknowledge
or return an error once everything is done?

What about this instead:

5. Client sends new rules (blob of known size)
6. Server receives, loads into shadow table, swaps tables and syncs
   sockets, with rollback to the old table on error
8. Server sends error / success summary (single byte, at least in this
   version)

> 10. Server closes control connection

...if we keep my 8. above, it would be more logical that the client
closes the connection.

> # Client disconnects
>
> A client disconnect before step (7) is straightforward: discard the
> shadow table, nothing has changed.
>
> A client disconnect between (7) and (9) triggers a rollback, same as
> (9b).

In my modified version, a client disconnect during 5. would trigger
discarding of the shadow table that's being filled (a no-op, really). A
disconnect after that doesn't affect the following steps, but the
server won't report an error or success.

> # Error reporting
>
> Error reporting at step (6) is fairly straightforward: we can send an
> error code and/or an error message.
>
> Error reporting at (8) is trickier. As a first cut, we could just
> report "yes" or "no" - taking into account the FWD_WEAK flag. But the
> client might be able to make better decisions or at least better
> messages to the user if we report more detailed information.
> Exactly how detailed is an open question: number of bind failures?
> number of failures per rule? specific ports which failed?

For the moment I would report a single byte. Later, we could probably
send back the list of rules with a success / error indication for each
one of them. Think of just sending the same type of fixed-size table
back and forth.

> # Interim steps
>
> I propose these steps toward implementing this:
>
> i. Merge TCP and UDP rule tables.
>    The protocol above assumes a single rule table per-pif, which I
>    think is an easier model to understand and more extensible for
>    future protocol support.
> ii. Read-only client. Implement steps (1) to (4). Client can query
>     and list the current rules, but not change them.
> iii. Rule updates. Implement remaining protocol steps, but with a
>      "close and re-open" approach on the server, so unaltered
>      listening sockets might briefly disappear.
> iv. Socket continuity. Have the socket sync "steal" sockets from the
>     old table in preference to re-opening them.
>
> If you have any time to work on (ii) while I work on (i), those
> should be parallelizable.

Yes, I'll start adapting the existing draft as soon as possible. I
think ii. could go in parallel with all the other steps: I can just
call some stubs meanwhile.

> # Concurrent updates
>
> Server guarantees that a single transaction as above is atomic in the
> sense that nothing else is allowed to change the rules between (4)
> and (9). The easiest way to do that initially is probably to only
> allow a single client connection at a time.

I would call this a feature...

> If there's a reason to, we could alter that so that concurrent
> connections are allowed, but if another client changed anything after
> step (4), then we give an error on the next op (or maybe just close
> the control socket from the server side).

...even if we go for my modified version.

> # Tweaks / variants
>
> - I'm not sure that step (2) is necessary

I would skip it. The only reason we might want it is to send a client
version, but we can also introduce a client version starting from a
newer server version.

> - I'm not certain that step (7) is necessary, although I do kind of
>   prefer the client getting a chance to see a "so far, so good"
>   before any socket operations happen.

I think it's quite unrealistic that we'll ever manage to build sensible
logic to decide what to do depending on partial failures.
If "so far" is not good, the server should just abort, and the user
will have to fix mistakes and try again.

-- 
Stefano