public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Jon Maloy <jmaloy@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com,
	dgibson@redhat.com
Subject: Re: [PATCH] udp: create and send ICMPv4 to local peer when applicable
Date: Thu, 13 Feb 2025 14:07:22 -0500	[thread overview]
Message-ID: <f745f11b-85ad-411f-9165-827c4244b088@redhat.com> (raw)
In-Reply-To: <db4d70c5-0295-430b-be82-2d0fcfbc5634@redhat.com>



On 2025-02-12 16:39, Jon Maloy wrote:
> 
> 
> On 2025-02-10 19:55, David Gibson wrote:
>> On Sun, Feb 09, 2025 at 12:00:56PM -0500, Jon Maloy wrote:
>>> When a local peer sends a UDP message to a non-existing port on an
>>> existing remote host, that host will return an ICMP message containing
>>> the error code ICMP_PORT_UNREACH, plus the header and the first eight
>>> bytes of the original message. If the sender socket has been connected,
>>> it uses this message to issue a "Connection Refused" event to the user.
> 
> [...]
> 
>>> +void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
>>> +            struct in_addr dst, size_t l4len, uint8_t proto)
>>
>> As discussed on our call, it probably makes sense to alter this to set
>> the Don't Fragment bit always.
> 
> I have been studying the Linux code, but this is handled in so many 
> places that it it hard to get a full grip on it. Wireshark logs seem
> to indicate that it is always set, and googling a bit on the subject
> indicates the same. I think it is safe to do it, so if you agree I
> can post a prepending patch setting this.
> 
>>
>>>   {
>>>       uint16_t l3len = l4len + sizeof(*ip4h);
>>> diff --git a/tap.h b/tap.h
>>> index dfbd8b9..3ba00c1 100644
>>> --- a/tap.h
>>> +++ b/tap.h
>>> @@ -44,6 +44,8 @@ static inline void tap_hdr_update(struct tap_hdr 
>>> *thdr, size_t l2len)
>>>           thdr->vnet_len = htonl(l2len);
>>>   }
> 
> [...]
>>> + * udp_send_conn_fail_icmp4() - Construct and send ICMP to local peer
>>> + * @c:              Execution context
>>> + * @ee:      Extended error descriptor
>>> + * @ref:      epoll reference
>>> + * @iov:      First bytes (max 8) of original UDP message body
>>
>> I don't love passing this as an iov, because that tends to imply there
>> might be multiple buffers, which is not the case here (both the caller
>> and callee require a single buffer).
> 
> The real reason is that csum_udp4() requires an iov, so if we don't
> pass it here we will have to recreate it inside the new function.
> 
>>
>>> + */
>>> +static void udp_send_conn_fail_icmp4(const struct ctx *c,
>>> +                     const struct sock_extended_err *ee,
>>> +                     union epoll_ref ref,
>>> +                     struct iovec *iov)
>>
>>
>>
>>> +{
>>> +    flow_sidx_t tosidx = flow_sidx_opposite(ref.flowside);
>>
>> This is subtly, but badly wrong.  ref.flowside is only valid if the
>> ref is of a type which uses the flowside field.  That's true for UDP
>> reply sockets, but *not* for UDP listening sockets, and
>> udp_sock_errs() is called on both.  I think this function needs to
>> take an explicit flowside.
> 
> Yes, I was a little uncertain about this. Can you give me small code
> snippet of how this should be done?
> 
>>
>>> +    const struct flowside *toside = flowside_at_sidx(tosidx);
>>> +    struct in_addr oaddr = toside->oaddr.v4mapped.a4;
>>> +    struct in_addr eaddr = toside->eaddr.v4mapped.a4;
>>> +    struct iov_tail data = IOV_TAIL(iov, 1, 0);
>>> +    struct {
>>> +        struct icmphdr icmp4h;
>>> +        struct iphdr ip4h;
>>> +        struct udphdr uh;
>>> +        char data[8];
>>> +    } msg;
>>
>> Needs a ((packed)) attribute to ensure we don't get compiler padding.

I get
udp.c:437:23: warning: taking address of packed member of ‘struct 
<anonymous>’ may result in an unaligned pointer value 
[-Waddress-of-packed-member]
   437 |         tap_push_ip4h(&msg.ip4h, eaddr, oaddr, udplen, 
IPPROTO_UDP);

Also, the sizes of these members are 8+20+8+8, so I cannot see how
this struct can possibly be packed or how there can be an unaligned
pointer.
In short: ((packed)) seems unnecessary, and only causes problems.

/jon

>>
> ok.
>> It took me a minute to work out what was going on here.  Specifically
>> that ip4h and uh here aren't the headers of the packet we're sending
>> now, but the reconstructed headers of the packet that prompted the
>> ICMP error.
> 
> Yes, I try to re-create the original message as closely as possible.
> 
>>
>>> +    size_t udplen = sizeof(msg.uh) + iov->iov_len;
>>> +    size_t msglen = sizeof(msg.icmp4h) + sizeof(msg.ip4h) + udplen;
>>> +
>>> +    memset(&msg, 0, sizeof(msg));
>>> +    msg.icmp4h.type = ee->ee_type;
>>> +    msg.icmp4h.code = ee->ee_code;
>>> +    tap_push_ip4h(&msg.ip4h, eaddr, oaddr, udplen, IPPROTO_UDP);
>>
>> This isn't quite a natural fit.  It does work, but the tap_push_()
>> functions are kind of designed to work with a byte buffer where we
>> only work out the positions of headers as we construct them.  Probably
>> a clean up for some other day.
>>
>>> +    msg.uh.source = htons(toside->eport);
>>> +    msg.uh.dest = htons(toside->oport);
>>> +    msg.uh.len = htons(udplen);
>>> +    csum_udp4(&msg.uh, oaddr, eaddr, &data);
>>
>> It might make sense to split a tap_push_uh4() function out from
>> tap_udp4_send() which you could re-use here.
> 
> Good point.
>>
>>> +    memcpy(&msg.data, iov->iov_base, iov->iov_len);
>>> +    tap_icmp4_send(c, oaddr, eaddr, &msg, msglen);
>>> +}
>>> +
>>>   /**
>>>    * udp_sock_recverr() - Receive and clear an error from a socket
>>> - * @s:        Socket to receive from
>>> + * @c:              Execution context
>>> + * @ref:      epoll reference
>>>    *
>>>    * Return: 1 if error received and processed, 0 if no more errors 
>>> in queue, < 0
>>>    *         if there was an error reading the queue
>>>    *
>>>    * #syscalls recvmsg
>>>    */
>>> -static int udp_sock_recverr(int s)
>>> +static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
>>
>> Similarly udp_sock_recverr() is used both on "listening" and reply
>> sockets, so it's not really safe to use the ref here.  You can
>> explicitly pass the fd easily enough.  The flow is trickier...
> 
> I can pass *both* the fd and ref, alternativedly the flowside, but
> it looks redundant and adds extra code.
> If I add a flowside struct instead of ref I will need to declare
> that at the all the multiple locations where udp_sock_errs().
> Do you have a good suggestion?
> 
>>
>>>   {
>>>       const struct sock_extended_err *ee;
>>>       const struct cmsghdr *hdr;
>>>       char buf[CMSG_SPACE(sizeof(*ee))];
>>> +    char udp_data[8];
>>> +    int s = ref.fd;
>>> +    struct iovec iov = {
>>> +        .iov_base = udp_data,
>>> +        .iov_len = sizeof(udp_data)
> 
> [...]
> 
>>>       if (rc < 0)
>>> @@ -558,7 +607,7 @@ static void udp_buf_listen_sock_handler(const 
>>> struct ctx *c,
>>>       const socklen_t sasize = sizeof(udp_meta[0].s_in);
>>>       int n, i;
>>> -    if (udp_sock_errs(c, ref.fd, events) < 0) {
>>> +    if (udp_sock_errs(c, ref, events) < 0) {
>>
>> ... because in the listening socket case we don't know the flow until
>> we've looked at the packet.  I'd be tempted as a first cut to only do
>> the ICMP thing for errors on reply sockets.
> 
> I can try that, but it would feel a little reduced.
> 
>>
>> Listening sockets will be trickier.  I think we'll have to actually
>> check msg_name in the received error in order to work out which flow
>> it belongs to.
> 
> Actually, I did that first, until I realized I could extract the source
> IP address from the flow object. But how does that help?
> 
> ///jon
> 
> 


  parent reply	other threads:[~2025-02-13 19:07 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-09 17:00 [PATCH] udp: create and send ICMPv4 to local peer when applicable Jon Maloy
2025-02-10 10:27 ` Stefano Brivio
2025-02-11  0:55 ` David Gibson
2025-02-12 21:39   ` Jon Maloy
2025-02-12 23:21     ` David Gibson
2025-02-13 19:07     ` Jon Maloy [this message]
2025-02-13 19:17       ` Stefano Brivio
2025-02-13 19:28         ` Jon Maloy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f745f11b-85ad-411f-9165-827c4244b088@redhat.com \
    --to=jmaloy@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=dgibson@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).