From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202506 header.b=oR4QrPYT; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id E303B5A0282 for ; Wed, 23 Jul 2025 02:22:59 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202506; t=1753230012; bh=F+CL5rgk4C4RcuPgzPIlNrSOFFa7TGjOT2aP0/Ubz9Y=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=oR4QrPYTxBjDpR+UNORBGeACCkkoJY4q7g39kVGOI5Xkei04LVR+U+48Nil2gn98y a+Sd4dY0tw67ZyvSUcl9p/sEv/PZDrG1WiR/IadfBPoqsl7IjQuUkAC3HFQ48zcb+W 8M3Ar+y2qZhhEDMqZcRu8EAEIIxjvYuhyn8iZHPuPl84nb25Ix2oeXFv/Q2eUwVz/a pXYh0nPwcaKsGN45KSAl3k3Cq4bt7cNcyZ7XwuA1Yp+q927XNrz0DaOGtauq17SSED 8kjCXfUhOnjwBJ0dwoCBx7m/5BRCbszYGa88skHpCgPt+Jwz8NHbfnJ27eZ7/mtB8n CNN8k4jzjVOAw== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4bmvsm32kJz4wcd; Wed, 23 Jul 2025 10:20:12 +1000 (AEST) Date: Wed, 23 Jul 2025 10:22:35 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v2] treewide: By default, don't quit source after migration, keep sockets open Message-ID: References: <20250722211244.3129523-1-sbrivio@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="88Kg7dwnN9ulFD+d" Content-Disposition: inline In-Reply-To: <20250722211244.3129523-1-sbrivio@redhat.com> Message-ID-Hash: BRAWGK67H2IK2R6WOKRW24C3F2NLKVBG X-Message-ID-Hash: BRAWGK67H2IK2R6WOKRW24C3F2NLKVBG X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Nir Dothan X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --88Kg7dwnN9ulFD+d Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jul 22, 2025 at 11:12:44PM +0200, Stefano Brivio wrote: > We are hitting an issue in the KubeVirt integration where some data is > still sent to the source instance even after migration is complete. As > we exit, the kernel closes our sockets and resets connections. The > resulting RST segments are sent to peers, effectively terminating > connections that were meanwhile migrated. >=20 > At the moment, this is not done intentionally, but in the future > KubeVirt might enable OVN-Kubernetes features where source and > destination nodes are explicitly getting mirrored traffic for a while, > in order to decrease migration downtime. >=20 > By default, don't quit after migration is completed on the source: the > previous behaviour can be enabled with the new, but deprecated, > --migrate-exit option. >=20 > Also, by default, keep migrated TCP sockets open (in repair mode) as > long as we're running, and ignore events on any epoll descriptor > representing data channels. The previous behaviour can be enabled with > the new, equally deprecated, --migrate-no-linger option. >=20 > By keeping sockets open, and not exiting, we prevent the kernel > running on the source node to send out RST segments if further data > reaches us. >=20 > Reported-by: Nir Dothan > Signed-off-by: Stefano Brivio Reviewed-by: David Gibson > --- > v2: > - assorted changes in commit message > - context variable ignore_linger becomes ignore_no_linger > - new options are deprecated > - don't ignore events on some descriptors, drop them from epoll >=20 > conf.c | 22 ++++++++++++++++++++++ > flow.c | 2 +- > passt.1 | 24 ++++++++++++++++++++++++ > passt.h | 4 ++++ > tcp.c | 9 +++++++-- > tcp_conn.h | 3 ++- > test/lib/setup | 4 ++-- > vhost_user.c | 8 ++++++-- > 8 files changed, 68 insertions(+), 8 deletions(-) >=20 > diff --git a/conf.c b/conf.c > index 6c747aa..f47f48e 100644 > --- a/conf.c > +++ b/conf.c > @@ -864,6 +864,14 @@ static void usage(const char *name, FILE *f, int sta= tus) > FPRINTF(f, > " --repair-path PATH path for passt-repair(1)\n" > " default: append '.repair' to UNIX domain path\n"); > + FPRINTF(f, > + " --migrate-exit DEPRECATED:\n" > + " source quits after migration\n" > + " default: source keeps running after migration\n"); > + FPRINTF(f, > + " --migrate-no-linger DEPRECATED:\n" > + " close sockets on migration\n" > + " default: keep sockets open, ignore events\n"); > } > =20 > FPRINTF(f, > @@ -1470,6 +1478,8 @@ void conf(struct ctx *c, int argc, char **argv) > {"socket-path", required_argument, NULL, 's' }, > {"fqdn", required_argument, NULL, 27 }, > {"repair-path", required_argument, NULL, 28 }, > + {"migrate-exit", no_argument, NULL, 29 }, > + {"migrate-no-linger", no_argument, NULL, 30 }, > { 0 }, > }; > const char *optstring =3D "+dqfel:hs:F:I:p:P:m:a:n:M:g:i:o:D:S:H:461t:u= :T:U:"; > @@ -1686,6 +1696,18 @@ void conf(struct ctx *c, int argc, char **argv) > optarg)) > die("Invalid passt-repair path: %s", optarg); > =20 > + break; > + case 29: > + if (c->mode !=3D MODE_VU) > + die("--migrate-exit is for vhost-user mode only"); > + c->migrate_exit =3D true; > + > + break; > + case 30: > + if (c->mode !=3D MODE_VU) > + die("--migrate-no-linger is for vhost-user mode only"); > + c->migrate_no_linger =3D true; > + > break; > case 'd': > c->debug =3D 1; > diff --git a/flow.c b/flow.c > index 00885f6..feefda3 100644 > --- a/flow.c > +++ b/flow.c > @@ -1091,7 +1091,7 @@ int flow_migrate_source(struct ctx *c, const struct= migrate_stage *stage, > * as EIO). > */ > foreach_established_tcp_flow(flow) { > - rc =3D tcp_flow_migrate_source_ext(fd, &flow->tcp); > + rc =3D tcp_flow_migrate_source_ext(c, fd, &flow->tcp); > if (rc) { > flow_err(flow, "Can't send extended data: %s", > strerror_(-rc)); > diff --git a/passt.1 b/passt.1 > index 60066c2..94363b7 100644 > --- a/passt.1 > +++ b/passt.1 > @@ -439,6 +439,30 @@ Default, for \-\-vhost-user mode only, is to append = \fI.repair\fR to the path > chosen for the hypervisor UNIX domain socket. No socket is created if no= t in > \-\-vhost-user mode. > =20 > +.TP > +.BR \-\-migrate-exit (DEPRECATED) > +Exit after a completed migration as source. By default, \fBpasst\fR keeps > +running and the migrated guest can continue using its connection, or a n= ew guest > +can connect. > + > +Note that this configuration option is \fBdeprecated\fR and will be remo= ved in a > +future version. It is not expected to be of any use, and it simply refle= cts a > +legacy behaviour. If you have any use for this, refer to \fBREPORTING BU= GS\fR > +below. > + > +.TP > +.BR \-\-migrate-no-linger (DEPRECATED) > +Close TCP sockets on the source instance once migration completes. > + > +By default, sockets are kept open, and events on data sockets are ignore= d, so > +that any further message reaching sockets after the source migrated is s= ilently > +ignored, to avoid connection resets in case data is received after migra= tion. > + > +Note that this configuration option is \fBdeprecated\fR and will be remo= ved in a > +future version. It is not expected to be of any use, and it simply refle= cts a > +legacy behaviour. If you have any use for this, refer to \fBREPORTING BU= GS\fR > +below. > + > .TP > .BR \-F ", " \-\-fd " " \fIFD > Pass a pre-opened, connected socket to \fBpasst\fR. Usually the socket i= s opened > diff --git a/passt.h b/passt.h > index 8693794..4cfd6eb 100644 > --- a/passt.h > +++ b/passt.h > @@ -241,6 +241,8 @@ struct ip6_ctx { > * @device_state_fd: Device state migration channel > * @device_state_result: Device state migration result > * @migrate_target: Are we the target, on the next migration request? > + * @migrate_no_linger: Close sockets as we migrate them > + * @migrate_exit: Exit (on source) once migration is complete > */ > struct ctx { > enum passt_modes mode; > @@ -318,6 +320,8 @@ struct ctx { > int device_state_fd; > int device_state_result; > bool migrate_target; > + bool migrate_no_linger; > + bool migrate_exit; > }; > =20 > void proto_update_l2_buf(const unsigned char *eth_d, > diff --git a/tcp.c b/tcp.c > index 2b88466..b10ad8a 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -3286,12 +3286,14 @@ int tcp_flow_migrate_source(int fd, struct tcp_ta= p_conn *conn) > =20 > /** > * tcp_flow_migrate_source_ext() - Dump queues, close sockets, send fina= l data > + * @c: Execution context > * @fd: Descriptor for state migration > * @conn: Pointer to the TCP connection structure > * > * Return: 0 on success, negative (not -EIO) on failure, -EIO on sending= failure > */ > -int tcp_flow_migrate_source_ext(int fd, const struct tcp_tap_conn *conn) > +int tcp_flow_migrate_source_ext(struct ctx *c, > + int fd, const struct tcp_tap_conn *conn) > { > uint32_t peek_offset =3D conn->seq_to_tap - conn->seq_ack_from_tap; > struct tcp_tap_transfer_ext *t =3D &migrate_ext[FLOW_IDX(conn)]; > @@ -3336,7 +3338,10 @@ int tcp_flow_migrate_source_ext(int fd, const stru= ct tcp_tap_conn *conn) > if ((rc =3D tcp_flow_dump_seq(conn, &t->seq_rcv))) > goto fail; > =20 > - close(s); > + if (c->migrate_no_linger) > + close(s); > + else > + epoll_del(c, s); > =20 > /* Adjustments unrelated to FIN segments: sequence numbers we dumped are > * based on the end of the queues. > diff --git a/tcp_conn.h b/tcp_conn.h > index 35d813d..d49ae88 100644 > --- a/tcp_conn.h > +++ b/tcp_conn.h > @@ -236,7 +236,8 @@ int tcp_flow_repair_on(struct ctx *c, const struct tc= p_tap_conn *conn); > int tcp_flow_repair_off(struct ctx *c, const struct tcp_tap_conn *conn); > =20 > int tcp_flow_migrate_source(int fd, struct tcp_tap_conn *conn); > -int tcp_flow_migrate_source_ext(int fd, const struct tcp_tap_conn *conn); > +int tcp_flow_migrate_source_ext(struct ctx *c, int fd, > + const struct tcp_tap_conn *conn); > =20 > int tcp_flow_migrate_target(struct ctx *c, int fd); > int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn= , int fd); > diff --git a/test/lib/setup b/test/lib/setup > index 575bc21..5994598 100755 > --- a/test/lib/setup > +++ b/test/lib/setup > @@ -350,7 +350,7 @@ setup_migrate() { > =20 > sleep 1 > =20 > - __opts=3D"--vhost-user" > + __opts=3D"--vhost-user --migrate-exit --migrate-no-linger" > [ ${PCAP} -eq 1 ] && __opts=3D"${__opts} -p ${LOGDIR}/passt_1.pcap" > [ ${DEBUG} -eq 1 ] && __opts=3D"${__opts} -d" > [ ${TRACE} -eq 1 ] && __opts=3D"${__opts} --trace" > @@ -360,7 +360,7 @@ setup_migrate() { > =20 > context_run_bg passt_repair_1 "./passt-repair ${STATESETUP}/passt_1.soc= ket.repair" > =20 > - __opts=3D"--vhost-user" > + __opts=3D"--vhost-user --migrate-exit --migrate-no-linger" > [ ${PCAP} -eq 1 ] && __opts=3D"${__opts} -p ${LOGDIR}/passt_2.pcap" > [ ${DEBUG} -eq 1 ] && __opts=3D"${__opts} -d" > [ ${TRACE} -eq 1 ] && __opts=3D"${__opts} --trace" > diff --git a/vhost_user.c b/vhost_user.c > index e7fb049..4bfa941 100644 > --- a/vhost_user.c > +++ b/vhost_user.c > @@ -1207,7 +1207,11 @@ void vu_control_handler(struct vu_dev *vdev, int f= d, uint32_t events) > if (vmsg.hdr.request =3D=3D VHOST_USER_CHECK_DEVICE_STATE && > vdev->context->device_state_result =3D=3D 0 && > !vdev->context->migrate_target) { > - info("Migration complete, exiting"); > - _exit(EXIT_SUCCESS); > + if (vdev->context->migrate_exit) { > + info("Migration complete, exiting"); > + _exit(EXIT_SUCCESS); > + } > + > + info("Migration complete"); > } > } --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --88Kg7dwnN9ulFD+d Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmiAKzwACgkQzQJF27ox 2Gfwsg/8CIFXFgFzzr7QTioC7u9190EH8+aUw6sJBhi/L5C6Taj73fZToJs2FY40 pX7iEI4cHc/HL1QS2GgnUz80/BOprSJ9/ghPHkanvKwDZ7V2tkTrSqYKVikeSH4K X3IpRVFKVgnS/nYkjSeXKSJBGPpIycnSvcex9sSVS4iGfm+edyHbUSvluPhbjEBd IYP2H13qTUkhJh8LAvFlxHhnpOkmptk1BguIdq4vBgZIRmSaXTIPvoBc5XgPFJDG eUaOvOIyvCZO2cZ2fJOna9mxQS+0cndSSVaPIhRVzamlaB7Ur7ATRbN9iV+Gb4T9 BUew/vkBhkEY3/QClwpEf0LCJNdzmY9hHKKvdQgSk2hZj/AYyU5deYHRdbKMbeTt RQDQcZ4Mj+M2+01DygS6kekd2wlwOpIqcKuGQAgV/ylQg4j0nPlPhVxgYs6focSQ 2aiZoBtBYTwgJyKjTBHPqstcAgvZzhtdQrJuuKqcy3jymOIDJWsYGJkfweZlbZBC XnuxUfwZ42KpForb/+U6Hmk7pAX9y0RI/mIaLxK/+pljPHyTygyqkDZddbXfYFd3 wd/TqUH+I/G/TjRYcZNJY/dY1ygagQegkf7Yn7w05jA3sFBuix8HNFA8NJc+vSIU VDzMEr2RCD0GqmBZ8rho+1LfHdBG/zkusEF1q3RuNfT4FT50LW8= =R/zG -----END PGP SIGNATURE----- --88Kg7dwnN9ulFD+d--