From: Jon Maloy <jmaloy@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com,
dgibson@redhat.com
Subject: Re: [PATCH] udp: create and send ICMPv4 to local peer when applicable
Date: Thu, 13 Feb 2025 14:07:22 -0500 [thread overview]
Message-ID: <f745f11b-85ad-411f-9165-827c4244b088@redhat.com> (raw)
In-Reply-To: <db4d70c5-0295-430b-be82-2d0fcfbc5634@redhat.com>
On 2025-02-12 16:39, Jon Maloy wrote:
>
>
> On 2025-02-10 19:55, David Gibson wrote:
>> On Sun, Feb 09, 2025 at 12:00:56PM -0500, Jon Maloy wrote:
>>> When a local peer sends a UDP message to a non-existing port on an
>>> existing remote host, that host will return an ICMP message containing
>>> the error code ICMP_PORT_UNREACH, plus the header and the first eight
>>> bytes of the original message. If the sender socket has been connected,
>>> it uses this message to issue a "Connection Refused" event to the user.
>
> [...]
>
>>> +void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
>>> + struct in_addr dst, size_t l4len, uint8_t proto)
>>
>> As discussed on our call, it probably makes sense to alter this to set
>> the Don't Fragment bit always.
>
> I have been studying the Linux code, but this is handled in so many
> places that it it hard to get a full grip on it. Wireshark logs seem
> to indicate that it is always set, and googling a bit on the subject
> indicates the same. I think it is safe to do it, so if you agree I
> can post a prepending patch setting this.
>
>>
>>> {
>>> uint16_t l3len = l4len + sizeof(*ip4h);
>>> diff --git a/tap.h b/tap.h
>>> index dfbd8b9..3ba00c1 100644
>>> --- a/tap.h
>>> +++ b/tap.h
>>> @@ -44,6 +44,8 @@ static inline void tap_hdr_update(struct tap_hdr
>>> *thdr, size_t l2len)
>>> thdr->vnet_len = htonl(l2len);
>>> }
>
> [...]
>>> + * udp_send_conn_fail_icmp4() - Construct and send ICMP to local peer
>>> + * @c: Execution context
>>> + * @ee: Extended error descriptor
>>> + * @ref: epoll reference
>>> + * @iov: First bytes (max 8) of original UDP message body
>>
>> I don't love passing this as an iov, because that tends to imply there
>> might be multiple buffers, which is not the case here (both the caller
>> and callee require a single buffer).
>
> The real reason is that csum_udp4() requires an iov, so if we don't
> pass it here we will have to recreate it inside the new function.
>
>>
>>> + */
>>> +static void udp_send_conn_fail_icmp4(const struct ctx *c,
>>> + const struct sock_extended_err *ee,
>>> + union epoll_ref ref,
>>> + struct iovec *iov)
>>
>>
>>
>>> +{
>>> + flow_sidx_t tosidx = flow_sidx_opposite(ref.flowside);
>>
>> This is subtly, but badly wrong. ref.flowside is only valid if the
>> ref is of a type which uses the flowside field. That's true for UDP
>> reply sockets, but *not* for UDP listening sockets, and
>> udp_sock_errs() is called on both. I think this function needs to
>> take an explicit flowside.
>
> Yes, I was a little uncertain about this. Can you give me small code
> snippet of how this should be done?
>
>>
>>> + const struct flowside *toside = flowside_at_sidx(tosidx);
>>> + struct in_addr oaddr = toside->oaddr.v4mapped.a4;
>>> + struct in_addr eaddr = toside->eaddr.v4mapped.a4;
>>> + struct iov_tail data = IOV_TAIL(iov, 1, 0);
>>> + struct {
>>> + struct icmphdr icmp4h;
>>> + struct iphdr ip4h;
>>> + struct udphdr uh;
>>> + char data[8];
>>> + } msg;
>>
>> Needs a ((packed)) attribute to ensure we don't get compiler padding.
I get
udp.c:437:23: warning: taking address of packed member of ‘struct
<anonymous>’ may result in an unaligned pointer value
[-Waddress-of-packed-member]
437 | tap_push_ip4h(&msg.ip4h, eaddr, oaddr, udplen,
IPPROTO_UDP);
Also, the sizes of these members are 8+20+8+8, so I cannot see how
this struct can possibly be packed or how there can be an unaligned
pointer.
In short: ((packed)) seems unnecessary, and only causes problems.
/jon
>>
> ok.
>> It took me a minute to work out what was going on here. Specifically
>> that ip4h and uh here aren't the headers of the packet we're sending
>> now, but the reconstructed headers of the packet that prompted the
>> ICMP error.
>
> Yes, I try to re-create the original message as closely as possible.
>
>>
>>> + size_t udplen = sizeof(msg.uh) + iov->iov_len;
>>> + size_t msglen = sizeof(msg.icmp4h) + sizeof(msg.ip4h) + udplen;
>>> +
>>> + memset(&msg, 0, sizeof(msg));
>>> + msg.icmp4h.type = ee->ee_type;
>>> + msg.icmp4h.code = ee->ee_code;
>>> + tap_push_ip4h(&msg.ip4h, eaddr, oaddr, udplen, IPPROTO_UDP);
>>
>> This isn't quite a natural fit. It does work, but the tap_push_()
>> functions are kind of designed to work with a byte buffer where we
>> only work out the positions of headers as we construct them. Probably
>> a clean up for some other day.
>>
>>> + msg.uh.source = htons(toside->eport);
>>> + msg.uh.dest = htons(toside->oport);
>>> + msg.uh.len = htons(udplen);
>>> + csum_udp4(&msg.uh, oaddr, eaddr, &data);
>>
>> It might make sense to split a tap_push_uh4() function out from
>> tap_udp4_send() which you could re-use here.
>
> Good point.
>>
>>> + memcpy(&msg.data, iov->iov_base, iov->iov_len);
>>> + tap_icmp4_send(c, oaddr, eaddr, &msg, msglen);
>>> +}
>>> +
>>> /**
>>> * udp_sock_recverr() - Receive and clear an error from a socket
>>> - * @s: Socket to receive from
>>> + * @c: Execution context
>>> + * @ref: epoll reference
>>> *
>>> * Return: 1 if error received and processed, 0 if no more errors
>>> in queue, < 0
>>> * if there was an error reading the queue
>>> *
>>> * #syscalls recvmsg
>>> */
>>> -static int udp_sock_recverr(int s)
>>> +static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
>>
>> Similarly udp_sock_recverr() is used both on "listening" and reply
>> sockets, so it's not really safe to use the ref here. You can
>> explicitly pass the fd easily enough. The flow is trickier...
>
> I can pass *both* the fd and ref, alternativedly the flowside, but
> it looks redundant and adds extra code.
> If I add a flowside struct instead of ref I will need to declare
> that at the all the multiple locations where udp_sock_errs().
> Do you have a good suggestion?
>
>>
>>> {
>>> const struct sock_extended_err *ee;
>>> const struct cmsghdr *hdr;
>>> char buf[CMSG_SPACE(sizeof(*ee))];
>>> + char udp_data[8];
>>> + int s = ref.fd;
>>> + struct iovec iov = {
>>> + .iov_base = udp_data,
>>> + .iov_len = sizeof(udp_data)
>
> [...]
>
>>> if (rc < 0)
>>> @@ -558,7 +607,7 @@ static void udp_buf_listen_sock_handler(const
>>> struct ctx *c,
>>> const socklen_t sasize = sizeof(udp_meta[0].s_in);
>>> int n, i;
>>> - if (udp_sock_errs(c, ref.fd, events) < 0) {
>>> + if (udp_sock_errs(c, ref, events) < 0) {
>>
>> ... because in the listening socket case we don't know the flow until
>> we've looked at the packet. I'd be tempted as a first cut to only do
>> the ICMP thing for errors on reply sockets.
>
> I can try that, but it would feel a little reduced.
>
>>
>> Listening sockets will be trickier. I think we'll have to actually
>> check msg_name in the received error in order to work out which flow
>> it belongs to.
>
> Actually, I did that first, until I realized I could extract the source
> IP address from the flow object. But how does that help?
>
> ///jon
>
>
next prev parent reply other threads:[~2025-02-13 19:07 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-09 17:00 [PATCH] udp: create and send ICMPv4 to local peer when applicable Jon Maloy
2025-02-10 10:27 ` Stefano Brivio
2025-02-11 0:55 ` David Gibson
2025-02-12 21:39 ` Jon Maloy
2025-02-12 23:21 ` David Gibson
2025-02-13 19:07 ` Jon Maloy [this message]
2025-02-13 19:17 ` Stefano Brivio
2025-02-13 19:28 ` Jon Maloy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f745f11b-85ad-411f-9165-827c4244b088@redhat.com \
--to=jmaloy@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=dgibson@redhat.com \
--cc=lvivier@redhat.com \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).