public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Jon Maloy <jmaloy@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com,
	dgibson@redhat.com
Subject: Re: [PATCH] udp: create and send ICMPv4 to local peer when applicable
Date: Wed, 12 Feb 2025 16:39:07 -0500	[thread overview]
Message-ID: <db4d70c5-0295-430b-be82-2d0fcfbc5634@redhat.com> (raw)
In-Reply-To: <Z6qf5DdO_XbkLKZd@zatzit>



On 2025-02-10 19:55, David Gibson wrote:
> On Sun, Feb 09, 2025 at 12:00:56PM -0500, Jon Maloy wrote:
>> When a local peer sends a UDP message to a non-existing port on an
>> existing remote host, that host will return an ICMP message containing
>> the error code ICMP_PORT_UNREACH, plus the header and the first eight
>> bytes of the original message. If the sender socket has been connected,
>> it uses this message to issue a "Connection Refused" event to the user.

[...]

>> +void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
>> +		    struct in_addr dst, size_t l4len, uint8_t proto)
> 
> As discussed on our call, it probably makes sense to alter this to set
> the Don't Fragment bit always.

I have been studying the Linux code, but this is handled in so many 
places that it it hard to get a full grip on it. Wireshark logs seem
to indicate that it is always set, and googling a bit on the subject
indicates the same. I think it is safe to do it, so if you agree I
can post a prepending patch setting this.

> 
>>   {
>>   	uint16_t l3len = l4len + sizeof(*ip4h);
>>   
>> diff --git a/tap.h b/tap.h
>> index dfbd8b9..3ba00c1 100644
>> --- a/tap.h
>> +++ b/tap.h
>> @@ -44,6 +44,8 @@ static inline void tap_hdr_update(struct tap_hdr *thdr, size_t l2len)
>>   		thdr->vnet_len = htonl(l2len);
>>   }
>>   

[...]
>> + * udp_send_conn_fail_icmp4() - Construct and send ICMP to local peer
>> + * @c:	          Execution context
>> + * @ee:	  Extended error descriptor
>> + * @ref:	  epoll reference
>> + * @iov:	  First bytes (max 8) of original UDP message body
> 
> I don't love passing this as an iov, because that tends to imply there
> might be multiple buffers, which is not the case here (both the caller
> and callee require a single buffer).

The real reason is that csum_udp4() requires an iov, so if we don't
pass it here we will have to recreate it inside the new function.

> 
>> + */
>> +static void udp_send_conn_fail_icmp4(const struct ctx *c,
>> +				     const struct sock_extended_err *ee,
>> +				     union epoll_ref ref,
>> +				     struct iovec *iov)
> 
> 
> 
>> +{
>> +	flow_sidx_t tosidx = flow_sidx_opposite(ref.flowside);
> 
> This is subtly, but badly wrong.  ref.flowside is only valid if the
> ref is of a type which uses the flowside field.  That's true for UDP
> reply sockets, but *not* for UDP listening sockets, and
> udp_sock_errs() is called on both.  I think this function needs to
> take an explicit flowside.

Yes, I was a little uncertain about this. Can you give me small code
snippet of how this should be done?

> 
>> +	const struct flowside *toside = flowside_at_sidx(tosidx);
>> +	struct in_addr oaddr = toside->oaddr.v4mapped.a4;
>> +	struct in_addr eaddr = toside->eaddr.v4mapped.a4;
>> +	struct iov_tail data = IOV_TAIL(iov, 1, 0);
>> +	struct {
>> +		struct icmphdr icmp4h;
>> +		struct iphdr ip4h;
>> +		struct udphdr uh;
>> +		char data[8];
>> +	} msg;
> 
> Needs a ((packed)) attribute to ensure we don't get compiler padding.
> 
ok.
> It took me a minute to work out what was going on here.  Specifically
> that ip4h and uh here aren't the headers of the packet we're sending
> now, but the reconstructed headers of the packet that prompted the
> ICMP error.

Yes, I try to re-create the original message as closely as possible.

> 
>> +	size_t udplen = sizeof(msg.uh) + iov->iov_len;
>> +	size_t msglen = sizeof(msg.icmp4h) + sizeof(msg.ip4h) + udplen;
>> +
>> +	memset(&msg, 0, sizeof(msg));
>> +	msg.icmp4h.type = ee->ee_type;
>> +	msg.icmp4h.code = ee->ee_code;
>> +	tap_push_ip4h(&msg.ip4h, eaddr, oaddr, udplen, IPPROTO_UDP);
> 
> This isn't quite a natural fit.  It does work, but the tap_push_()
> functions are kind of designed to work with a byte buffer where we
> only work out the positions of headers as we construct them.  Probably
> a clean up for some other day.
> 
>> +	msg.uh.source = htons(toside->eport);
>> +	msg.uh.dest = htons(toside->oport);
>> +	msg.uh.len = htons(udplen);
>> +	csum_udp4(&msg.uh, oaddr, eaddr, &data);
> 
> It might make sense to split a tap_push_uh4() function out from
> tap_udp4_send() which you could re-use here.

Good point.
> 
>> +	memcpy(&msg.data, iov->iov_base, iov->iov_len);
>> +	tap_icmp4_send(c, oaddr, eaddr, &msg, msglen);
>> +}
>> +
>>   /**
>>    * udp_sock_recverr() - Receive and clear an error from a socket
>> - * @s:		Socket to receive from
>> + * @c:	          Execution context
>> + * @ref:	  epoll reference
>>    *
>>    * Return: 1 if error received and processed, 0 if no more errors in queue, < 0
>>    *         if there was an error reading the queue
>>    *
>>    * #syscalls recvmsg
>>    */
>> -static int udp_sock_recverr(int s)
>> +static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
> 
> Similarly udp_sock_recverr() is used both on "listening" and reply
> sockets, so it's not really safe to use the ref here.  You can
> explicitly pass the fd easily enough.  The flow is trickier...

I can pass *both* the fd and ref, alternativedly the flowside, but
it looks redundant and adds extra code.
If I add a flowside struct instead of ref I will need to declare
that at the all the multiple locations where udp_sock_errs().
Do you have a good suggestion?

> 
>>   {
>>   	const struct sock_extended_err *ee;
>>   	const struct cmsghdr *hdr;
>>   	char buf[CMSG_SPACE(sizeof(*ee))];
>> +	char udp_data[8];
>> +	int s = ref.fd;
>> +	struct iovec iov = {
>> +		.iov_base = udp_data,
>> +		.iov_len = sizeof(udp_data)

[...]

>>   	if (rc < 0)
>> @@ -558,7 +607,7 @@ static void udp_buf_listen_sock_handler(const struct ctx *c,
>>   	const socklen_t sasize = sizeof(udp_meta[0].s_in);
>>   	int n, i;
>>   
>> -	if (udp_sock_errs(c, ref.fd, events) < 0) {
>> +	if (udp_sock_errs(c, ref, events) < 0) {
> 
> ... because in the listening socket case we don't know the flow until
> we've looked at the packet.  I'd be tempted as a first cut to only do
> the ICMP thing for errors on reply sockets.

I can try that, but it would feel a little reduced.

> 
> Listening sockets will be trickier.  I think we'll have to actually
> check msg_name in the received error in order to work out which flow
> it belongs to.

Actually, I did that first, until I realized I could extract the source
IP address from the flow object. But how does that help?

///jon



  reply	other threads:[~2025-02-12 21:39 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-09 17:00 [PATCH] udp: create and send ICMPv4 to local peer when applicable Jon Maloy
2025-02-10 10:27 ` Stefano Brivio
2025-02-11  0:55 ` David Gibson
2025-02-12 21:39   ` Jon Maloy [this message]
2025-02-12 23:21     ` David Gibson
2025-02-13 19:07     ` Jon Maloy
2025-02-13 19:17       ` Stefano Brivio
2025-02-13 19:28         ` Jon Maloy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=db4d70c5-0295-430b-be82-2d0fcfbc5634@redhat.com \
    --to=jmaloy@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=dgibson@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).