On Fri, Nov 25, 2022 at 06:29:16PM +1100, David Gibson wrote: > udp_sock_handler_splice() reads a whole batch of datagrams at once with > recvmmsg(). It then forwards them all via a single socket on the other > side, based on the source port. > > However, it's entirely possible that the datagrams in the set have > different source ports, and thus ought to be forwarded via different > sockets on the destination side. In fact this situation arises with the > iperf -P4 throughput tests in our own test suite. AFAICT we only get away > with this because iperf3 is strictly one way and doesn't send reply packets > which would be misdirected because of the incorrect source ports. > > Alter udp_sock_handler_splice() to split the packets it receives into > batches with the same source address and send each batch with a separate > sendmmsg(). > > For now we only look for already contiguous batches, which means that if > there are multiple active flows interleaved this is likely to degenerate > to batches of size 1. For now this is the simplest way to correct the > behaviour and we can try to optimize later. > > Signed-off-by: David Gibson Oops, replied to the wrong version. v3 still has the nasty bug, do not apply until I send a v4. > --- > udp.c | 27 +++++++++++++++++++-------- > 1 file changed, 19 insertions(+), 8 deletions(-) > > diff --git a/udp.c b/udp.c > index 899dcbb..eaf7a52 100644 > --- a/udp.c > +++ b/udp.c > @@ -571,9 +571,9 @@ static void udp_splice_sendfrom(const struct ctx *c, struct mmsghdr *mmh, int n, > static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref, > uint32_t events, const struct timespec *now) > { > - in_port_t src, dst = ref.r.p.udp.udp.port; > + in_port_t dst = ref.r.p.udp.udp.port; > + int v6 = ref.r.p.udp.udp.v6, n, i, m; > struct mmsghdr *mmh_recv, *mmh_send; > - int v6 = ref.r.p.udp.udp.v6, n, i; > > if (!(events & EPOLLIN)) > return; > @@ -609,12 +609,23 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref, > }); > } > > - for (i = 0; i < n; i++) > - mmh_send[i].msg_hdr.msg_iov->iov_len = mmh_recv[i].msg_len; > - > - src = sa_port(v6, mmh_recv[0].msg_hdr.msg_name); > - udp_splice_sendfrom(c, mmh_send, n, src, ref.r.p.udp.udp.port, > - v6, ref.r.p.udp.udp.ns, ref.r.p.udp.udp.orig, now); > + for (i = 0; i < n; i += m) { > + const struct mmsghdr *mmh = &mmh_recv[i]; > + in_port_t src = sa_port(v6, mmh->msg_hdr.msg_name); > + > + m = 0; > + do { > + mmh_send[i + m].msg_hdr.msg_iov->iov_len = mmh->msg_len; > + mmh++; > + m++; > + } while (sa_port(v6, mmh->msg_hdr.msg_name) == src); Buffer overflow here, because I didn't check i + m against n. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson