From: Jon Maloy <jmaloy@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com,
dgibson@redhat.com
Subject: Re: [PATCH] udp: create and send ICMPv4 to local peer when applicable
Date: Wed, 12 Feb 2025 16:39:07 -0500 [thread overview]
Message-ID: <db4d70c5-0295-430b-be82-2d0fcfbc5634@redhat.com> (raw)
In-Reply-To: <Z6qf5DdO_XbkLKZd@zatzit>
On 2025-02-10 19:55, David Gibson wrote:
> On Sun, Feb 09, 2025 at 12:00:56PM -0500, Jon Maloy wrote:
>> When a local peer sends a UDP message to a non-existing port on an
>> existing remote host, that host will return an ICMP message containing
>> the error code ICMP_PORT_UNREACH, plus the header and the first eight
>> bytes of the original message. If the sender socket has been connected,
>> it uses this message to issue a "Connection Refused" event to the user.
[...]
>> +void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
>> + struct in_addr dst, size_t l4len, uint8_t proto)
>
> As discussed on our call, it probably makes sense to alter this to set
> the Don't Fragment bit always.
I have been studying the Linux code, but this is handled in so many
places that it it hard to get a full grip on it. Wireshark logs seem
to indicate that it is always set, and googling a bit on the subject
indicates the same. I think it is safe to do it, so if you agree I
can post a prepending patch setting this.
>
>> {
>> uint16_t l3len = l4len + sizeof(*ip4h);
>>
>> diff --git a/tap.h b/tap.h
>> index dfbd8b9..3ba00c1 100644
>> --- a/tap.h
>> +++ b/tap.h
>> @@ -44,6 +44,8 @@ static inline void tap_hdr_update(struct tap_hdr *thdr, size_t l2len)
>> thdr->vnet_len = htonl(l2len);
>> }
>>
[...]
>> + * udp_send_conn_fail_icmp4() - Construct and send ICMP to local peer
>> + * @c: Execution context
>> + * @ee: Extended error descriptor
>> + * @ref: epoll reference
>> + * @iov: First bytes (max 8) of original UDP message body
>
> I don't love passing this as an iov, because that tends to imply there
> might be multiple buffers, which is not the case here (both the caller
> and callee require a single buffer).
The real reason is that csum_udp4() requires an iov, so if we don't
pass it here we will have to recreate it inside the new function.
>
>> + */
>> +static void udp_send_conn_fail_icmp4(const struct ctx *c,
>> + const struct sock_extended_err *ee,
>> + union epoll_ref ref,
>> + struct iovec *iov)
>
>
>
>> +{
>> + flow_sidx_t tosidx = flow_sidx_opposite(ref.flowside);
>
> This is subtly, but badly wrong. ref.flowside is only valid if the
> ref is of a type which uses the flowside field. That's true for UDP
> reply sockets, but *not* for UDP listening sockets, and
> udp_sock_errs() is called on both. I think this function needs to
> take an explicit flowside.
Yes, I was a little uncertain about this. Can you give me small code
snippet of how this should be done?
>
>> + const struct flowside *toside = flowside_at_sidx(tosidx);
>> + struct in_addr oaddr = toside->oaddr.v4mapped.a4;
>> + struct in_addr eaddr = toside->eaddr.v4mapped.a4;
>> + struct iov_tail data = IOV_TAIL(iov, 1, 0);
>> + struct {
>> + struct icmphdr icmp4h;
>> + struct iphdr ip4h;
>> + struct udphdr uh;
>> + char data[8];
>> + } msg;
>
> Needs a ((packed)) attribute to ensure we don't get compiler padding.
>
ok.
> It took me a minute to work out what was going on here. Specifically
> that ip4h and uh here aren't the headers of the packet we're sending
> now, but the reconstructed headers of the packet that prompted the
> ICMP error.
Yes, I try to re-create the original message as closely as possible.
>
>> + size_t udplen = sizeof(msg.uh) + iov->iov_len;
>> + size_t msglen = sizeof(msg.icmp4h) + sizeof(msg.ip4h) + udplen;
>> +
>> + memset(&msg, 0, sizeof(msg));
>> + msg.icmp4h.type = ee->ee_type;
>> + msg.icmp4h.code = ee->ee_code;
>> + tap_push_ip4h(&msg.ip4h, eaddr, oaddr, udplen, IPPROTO_UDP);
>
> This isn't quite a natural fit. It does work, but the tap_push_()
> functions are kind of designed to work with a byte buffer where we
> only work out the positions of headers as we construct them. Probably
> a clean up for some other day.
>
>> + msg.uh.source = htons(toside->eport);
>> + msg.uh.dest = htons(toside->oport);
>> + msg.uh.len = htons(udplen);
>> + csum_udp4(&msg.uh, oaddr, eaddr, &data);
>
> It might make sense to split a tap_push_uh4() function out from
> tap_udp4_send() which you could re-use here.
Good point.
>
>> + memcpy(&msg.data, iov->iov_base, iov->iov_len);
>> + tap_icmp4_send(c, oaddr, eaddr, &msg, msglen);
>> +}
>> +
>> /**
>> * udp_sock_recverr() - Receive and clear an error from a socket
>> - * @s: Socket to receive from
>> + * @c: Execution context
>> + * @ref: epoll reference
>> *
>> * Return: 1 if error received and processed, 0 if no more errors in queue, < 0
>> * if there was an error reading the queue
>> *
>> * #syscalls recvmsg
>> */
>> -static int udp_sock_recverr(int s)
>> +static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
>
> Similarly udp_sock_recverr() is used both on "listening" and reply
> sockets, so it's not really safe to use the ref here. You can
> explicitly pass the fd easily enough. The flow is trickier...
I can pass *both* the fd and ref, alternativedly the flowside, but
it looks redundant and adds extra code.
If I add a flowside struct instead of ref I will need to declare
that at the all the multiple locations where udp_sock_errs().
Do you have a good suggestion?
>
>> {
>> const struct sock_extended_err *ee;
>> const struct cmsghdr *hdr;
>> char buf[CMSG_SPACE(sizeof(*ee))];
>> + char udp_data[8];
>> + int s = ref.fd;
>> + struct iovec iov = {
>> + .iov_base = udp_data,
>> + .iov_len = sizeof(udp_data)
[...]
>> if (rc < 0)
>> @@ -558,7 +607,7 @@ static void udp_buf_listen_sock_handler(const struct ctx *c,
>> const socklen_t sasize = sizeof(udp_meta[0].s_in);
>> int n, i;
>>
>> - if (udp_sock_errs(c, ref.fd, events) < 0) {
>> + if (udp_sock_errs(c, ref, events) < 0) {
>
> ... because in the listening socket case we don't know the flow until
> we've looked at the packet. I'd be tempted as a first cut to only do
> the ICMP thing for errors on reply sockets.
I can try that, but it would feel a little reduced.
>
> Listening sockets will be trickier. I think we'll have to actually
> check msg_name in the received error in order to work out which flow
> it belongs to.
Actually, I did that first, until I realized I could extract the source
IP address from the flow object. But how does that help?
///jon
next prev parent reply other threads:[~2025-02-12 21:39 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-09 17:00 [PATCH] udp: create and send ICMPv4 to local peer when applicable Jon Maloy
2025-02-10 10:27 ` Stefano Brivio
2025-02-11 0:55 ` David Gibson
2025-02-12 21:39 ` Jon Maloy [this message]
2025-02-12 23:21 ` David Gibson
2025-02-13 19:07 ` Jon Maloy
2025-02-13 19:17 ` Stefano Brivio
2025-02-13 19:28 ` Jon Maloy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=db4d70c5-0295-430b-be82-2d0fcfbc5634@redhat.com \
--to=jmaloy@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=dgibson@redhat.com \
--cc=lvivier@redhat.com \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).