From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=L1BXrkYg; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id 4FDEA5A0271 for ; Mon, 22 Sep 2025 09:17:28 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1758525447; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ze3Z9c/ZlRvj8q+oQqVIU/tUMvN6YvBHDEDhnHr7UbU=; b=L1BXrkYg9XjIZ+hc0VnSGDbcadxcvo4j90NdFGtGEmc5rKDkJUr6ST1musAzHAYo8OqFN+ rswBDHEZ6yu9XUO6WpSDb2WTruwwS/vNIk9R1yDom+eMRI4p7bKabtnFE8jBKBLGKINxDO OSwFKG49Rs3guSEIIGfCLBNVog9BYoQ= Received: from mail-ed1-f69.google.com (mail-ed1-f69.google.com [209.85.208.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-526-Ag1i-Rs9NTO32gc8ETgyzA-1; Mon, 22 Sep 2025 03:17:25 -0400 X-MC-Unique: Ag1i-Rs9NTO32gc8ETgyzA-1 X-Mimecast-MFC-AGG-ID: Ag1i-Rs9NTO32gc8ETgyzA_1758525444 Received: by mail-ed1-f69.google.com with SMTP id 4fb4d7f45d1cf-62f4f7d7501so3662963a12.0 for ; Mon, 22 Sep 2025 00:17:25 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1758525444; x=1759130244; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ze3Z9c/ZlRvj8q+oQqVIU/tUMvN6YvBHDEDhnHr7UbU=; b=fSpFRAQ5uPlnCEkE/hVwoAhHI4UPI+njnmjpq1hA82ETAFK05/pEdMMMp+eeb+j/lr 7Ux1rKHZ/G8tukS6aZtwWadwPuPTwW2kMVAo4xCfKql1K7q8Xj2uYJKBVGoA29lhfqAT M8IaQSpuxq+OYn7nm8PBgp5t7l7Lf9ybA+zlyW+KcLAJXQdXXMfBlLsBaEir4jV2cLGw goXMqv+TA70mWjCLDpiRkTDHDH+LW36wrF0Q1AMhoiJh9hpFVNsqL2jSWjPLQDVBkW+/ doi9m/zzj5KMUZrcoCrVN0OXWe8xymW3YF1jXFeIN/puJs2x0bYLPkqHeoL73P9C3EED 8ACA== X-Forwarded-Encrypted: i=1; AJvYcCUdsIGRLuxPPbnw2I3xf0cU6HQFk4PnjTtziwuJUiSu3GTOp/+2471LFgbu7LdGBa8eonuMvuHyCI4=@passt.top X-Gm-Message-State: AOJu0Yyjh7MOLTxWA0B/limBJMuyFJOhy4ZYYbHvwv6h/YVvHqFwWA+Y w6/8wDwqE3snlA18j7odB3Y4xPXpBqjQYXAaWVFKKXDgiR9870tMlhJgcNJ6faQjCr4bNuR6OtU 3tw8rkHYKN3lYEHNeRY6g0rqErH9Z3gUvRjdQpqB1RH5NdCf1quHTFM+wL43AP3+GSgyQa0aZaV zfiFkVGaSD+MoTyhMm8jgect+AcRI1 X-Gm-Gg: ASbGncucxRgIFPA354WR5c1MSZ0+XJXI9qfCgfjG4Ly5tR2SLxzMEBYmA3Ey8QBFds1 699zCtwgvCj7QchZH0nFv6bScqBKDJzZQLG5UnMU2r9im9cOndoZvh4GvM1tdSeazBmdjntINmY lElcOJeg8JFOeCanH1orb7sg== X-Received: by 2002:a05:6402:5107:b0:62f:1e7a:f842 with SMTP id 4fb4d7f45d1cf-62fc0abc18amr11989344a12.27.1758525444136; Mon, 22 Sep 2025 00:17:24 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHRPPiJRmU69vwkYP7BAICDLefSHAJ+KG3v4H3fwVrWwui3pvKEXkKneOgERBPePvLYl5e7T+u7wEw35i9JRwA= X-Received: by 2002:a05:6402:5107:b0:62f:1e7a:f842 with SMTP id 4fb4d7f45d1cf-62fc0abc18amr11989321a12.27.1758525443626; Mon, 22 Sep 2025 00:17:23 -0700 (PDT) MIME-Version: 1.0 References: <20250911085519.24395-1-yuhuang@redhat.com> <20250911115425.79eaaac5@elisabeth> <20250915081319.00e72e53@elisabeth> <20250918091714.77192b00@elisabeth> In-Reply-To: From: Yumei Huang Date: Mon, 22 Sep 2025 15:17:12 +0800 X-Gm-Features: AS18NWAHaMCYIxmazTBRnT-uAsWbt9-Z1N3N8_XUNPqWmJxZSNgD15hf8_MDUec Message-ID: Subject: Re: [PATCH] tap: Drop frames if no client connected To: David Gibson X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: q0qiQr6dynrKzYRnzkLeraeKqRQPNc-PjzjuVbLY2YQ_1758525444 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Message-ID-Hash: VIRGCDHO2J66IWIGED5SBUP2ROUK74MI X-Message-ID-Hash: VIRGCDHO2J66IWIGED5SBUP2ROUK74MI X-MailFrom: yuhuang@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Stefano Brivio , passt-dev@passt.top, lvivier@redhat.com X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Fri, Sep 19, 2025 at 9:38=E2=80=AFAM David Gibson wrote: > > On Thu, Sep 18, 2025 at 09:17:14AM +0200, Stefano Brivio wrote: > > On Thu, 18 Sep 2025 14:28:37 +1000 > > David Gibson wrote: > > > > > On Mon, Sep 15, 2025 at 08:13:19AM +0200, Stefano Brivio wrote: > > > > On Fri, 12 Sep 2025 12:01:37 +1000 > > > > David Gibson wrote: > > > > > > > > > On Thu, Sep 11, 2025 at 11:54:25AM +0200, Stefano Brivio wrote: > > > > > > On Thu, 11 Sep 2025 16:55:19 +0800 > > > > > > Yumei Huang wrote: > > > > > > > > > > > > > If no client is attached, discard outgoing frames and report = them as > > > > > > > sent. This mimics the behavior of a physical host with its ne= twork > > > > > > > cable unplugged. > > > > > > > > > > > > > > Suggested-by: David Gibson > > > > > > > Signed-off-by: Yumei Huang > > > > > > > > > > > > Thanks, the fix itself obviously makes sense, but I have a few = questions > > > > > > and comments: > > > > > > > > > > > > - first off, what happens if we don't return early in tap_send_= frames()? > > > > > > Commit messages for fixes (assuming this is a fix) should alw= ays say > > > > > > what concrete problem we had, what is going to be fixed, or i= f we're > > > > > > not aware of any real issue but things are just fragile / wro= ng > > > > > > > > > > Without this we will get an EBADF in either writev() (pasta) or > > > > > sendmsg() (passt). That's basically harmless, but a bit ugly. > > > > > Explicitly catching this case results in behaviour that's probabl= y a > > > > > bit clearer to debug if we hit it. > > > > > > > > > > Putting that context in the commit message would be useful. > > > > > > > > > > > - until a while ago, this couldn't happen at all. We were just = blocking > > > > > > the whole execution as long as the tap / guest / container in= terface > > > > > > wasn't up and running. > > > > > > > > > > > > I wonder when this changed and if it makes sense to go back t= o the > > > > > > previous behaviour. I had just a quick look and I wonder if I > > > > > > accidentally broke this in c9b241346569 ("conf, passt, tap: O= pen > > > > > > socket and PID files before switching UID/GID"). > > > > > > > > > > > > Before that, main() would call tap_sock_init(), which would c= all > > > > > > tap_sock_unix_open(), a blocking function. > > > > > > > > > > > > Should we make the whole thing blocking again? If not, is the= re > > > > > > anything else that's breaking with that? Timers, other inputs= , etc. > > > > > > > > > > I don't think we can quite do that. I'm not sure if it's the onl= y > > > > > reason, but for vhost-user I believe we need the epoll loop up an= d > > > > > running before we have the tap connection fully set up, because w= e > > > > > need it to process the vhost-user control messages. Laurent, can= you > > > > > verify? > > > > > > > > We discussed this in the past, before realising that the execution > > > > continues for whatever reason, and probably before I broke the > > > > assumption that guest connection was blocking. > > > > > > > > Yes, in the vhost-user case, the epoll loop needs to run before we = have > > > > a working connection to the guest, but: > > > > > > > > - we can anyway block until the control socket is set up (we used t= o do > > > > that) > > > > > > The vhost-user control socket? I'm not entirely sure what you mean b= y > > > "block" here. Since we need the epoll loop up, I don't see how we ca= n > > > block in the conventional sense. > > > > Let's rather say "until the data setup is complete". > > > > And by "block", I mean we would ignore any other event, obviously we > > have to listen to the control socket (in the main loop or in a separate > > dedicated loop). > > Ok. We could do that. I don't think the peer visible behaviour would > really be different from what we get now silently dropping frames to > tap. I'm not convinced it's really simpler than the current approach > either: > > * For now, we could just skip all epoll handling if the event type > isn't the control socket, but we'd need to be finer grained about > this if we add anything else that needs handling before guest > connection (e.g. dynamic configuration update mechanism and/or > netlink monitor) > > * Ignoring events in that way could lead to us busy-looping on epoll, > because we might not clear events. So we're back to having to > consider every event type, at least to some extent. > > > I'm not suggesting we do this though (see below). It's just a > > possibility. > > > > > > > > > - the vhost-user implementation autonomously throws data away recei= ved > > > > before that point > > > > > > Right. It doesn't have anywhere to put it, so it doesn't have much > > > choice. > > > > > > > Now, I don't think we necessarily need to stick to that approach, i= t > > > > was the obvious choice when passt was much simpler, and it keeps th= ings > > > > simple in the sense that we don't need to care about cases like the > > > > ones this patch is addressing. > > > > > > > > On the other hand, if we want to switch to a different model, we ne= ed > > > > to have a look at other possible breakages, I guess. > > > > > > > > > There are several different approaches we can take here. I discu= ssed > > > > > some with Yumei and suggested she take this one. Here's some > > > > > reasoning (maybe this would also be useful in the commit message, > > > > > though it's rather bulky) > > > > > > > > > > # Don't listen() until the tap connection is ready > > > > > > > > > > - It's not clear that the host rejecting the connection is bette= r > > > > > than the host accepting, then the connection stalling until th= e > > > > > guest is ready. > > > > > - Would require substantial rework because we currently listen()= as > > > > > we parse the command line and don't store the information we'd= need > > > > > to do it later. > > > > > > > > Right, that looks like a lot of effort for nothing. > > > > > > > > > # Don't accept() until the tap connection is ready > > > > > > > > > > - To the peer, will behave basically the same as this patch - th= e > > > > > host will complete the TCP handshake, then the connection will= stall > > > > > until the guest is ready. > > > > > > > > Same here. > > > > > > > > > - More work to implement, because essentially every sock-side ha= ndler > > > > > has to check fd_tap and abort early > > > > > > > > There's one substantial issue at TCP level, though, that we're keep= ing > > > > with the current approach and with this patch: we'll accept inbound > > > > connections and silently stall them. > > > > > > > > We could mitigate that by making the TCP handler aware of this, and= by > > > > resetting the connection if the guest isn't there. This would at le= ast > > > > be consistent with the case where the guest isn't listening on the = port > > > > (we accept(), fail to connect to it, eventually call tcp_rst()). > > > > > > True. Arguably less consistent with a non-passt-connected peer that'= s > > > not there though. Plus with the silently stall approach we have a > > > chance that the TCP connection will recover if the guest attaches > > > reasonably soon. > > > > > > > If we don't do this, I think we should at least check what happens = in > > > > terms of race conditions between passt starting and the guest appea= ring > > > > and accepting the connection. I guess we'll retry for a bit, which = is > > > > desirable, but we should check that the whole retrying thing actual= ly > > > > works. > > > > > > > > That's because the current approach just happened by accident. > > > > > > Right. I'm not entirely sure what concrete action you're suggesting > > > at this point, though. > > > > What I suggested in Monday's call and seemed to be all agreed upon, and > > also mentioned above: *check what happens*. > > > > Try that case, with this patch. > > > > Does it work to cover situations where users might start passt a bit > > before the guest connects, and try to connect to services right away? > > > > I suggested using ssh which should have a quite long timeout and retry > > connecting for a while. You mentioned you would assist Yumei in testing > > this if needed. > > Ah, yes, you're right and I'd forgotten that. Following up today. I tried both 'ssh' and 'socat'(writing a big file) before a guest connects, they get a 'Connection reset' after 10s, even if the guest connects in ~2s. It's because, when start ssh or socat, passt would try to finish the tcp handshake with the guest. It sends SYN to the guest immediately and waits for SYN-ACK. However, the SYN frame is dropped/lost due to no guest connected. So though the guest connects in seconds, the tcp handshake would timeout, and returns rst via tcp_rst(). Either with or without this patch, they got the same 'connection reset'. Maybe it's something to fix? > > -- > David Gibson (he or they) | I'll have my music baroque, and my code > david AT gibson.dropbear.id.au | minimalist, thank you, not the other wa= y > | around. > http://www.ozlabs.org/~dgibson --=20 Thanks, Yumei Huang