public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: Jon Maloy <jmaloy@redhat.com>, Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org,
	passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com
Subject: Re: [RFC net-next] tcp: add support for read with offset when using MSG_PEEK
Date: Sun, 21 Jan 2024 23:16:15 +0100	[thread overview]
Message-ID: <20240121231615.13029448@elisabeth> (raw)
In-Reply-To: <595d89f1-15b1-537d-f876-0ac4627db535@redhat.com>

On Thu, 18 Jan 2024 17:22:52 -0500
Jon Maloy <jmaloy@redhat.com> wrote:

> On 2024-01-16 05:49, Paolo Abeni wrote:
> > On Thu, 2024-01-11 at 18:00 -0500, jmaloy@redhat.com wrote:  
> >> From: Jon Maloy <jmaloy@redhat.com>
> >>
> >> When reading received messages from a socket with MSG_PEEK, we may want
> >> to read the contents with an offset, like we can do with pread/preadv()
> >> when reading files. Currently, it is not possible to do that.  
> [...]
> >> +				err = -EINVAL;
> >> +				goto out;
> >> +			}
> >> +			peek_offset = msg->msg_iter.__iov[0].iov_len;
> >> +			msg->msg_iter.__iov = &msg->msg_iter.__iov[1];
> >> +			msg->msg_iter.nr_segs -= 1;
> >> +			msg->msg_iter.count -= peek_offset;
> >> +			len -= peek_offset;
> >> +			*seq += peek_offset;
> >> +		}  
> > IMHO this does not look like the correct interface to expose such
> > functionality. Doing the same with a different protocol should cause a
> > SIGSEG or the like, right?  
>
> I would expect doing the same thing with a different protocol to cause 
> an EFAULT, as it should. But I haven't tried it.

So, out of curiosity, I actually tried: the current behaviour is
recvmsg() failing with EFAULT, only as data is received (!), for TCP
and UDP with AF_INET, and for AF_UNIX (both datagram and stream).

EFAULT, however, is not in the list of "shall fail", nor "may fail"
conditions described by POSIX.1-2008, so there isn't really anything
that mandates it API-wise.

Likewise, POSIX doesn't require any signal to be delivered (and no
signals are delivered on Linux in any case: note that iov_base is not
dereferenced).

For TCP sockets only, passing a NULL buffer is already supported by
recv() with MSG_TRUNC (same here, Linux extension). This change would
finally make recvmsg() consistent with that TCP-specific bit.

> This is a change to TCP only, at least until somebody decides to 
> implement it elsewhere (why not?)

Side note, I can't really think of a reasonable use case for UDP -- it
doesn't quite fit with the notion of message boundaries.

Even letting alone the fact that passt(1) and pasta(1) don't need this
for UDP (no acknowledgement means no need to keep unacknowledged data
anywhere), if another application wants to do something conceptually
similar, we should probably target recvmmsg().

> > What about using/implementing SO_PEEK_OFF support instead?
>
> I looked at SO_PEEK_OFF, and it honestly looks both awkward and limited.

I think it's rather intended to skip headers with fixed size or
suchlike.

> We would have to make frequent calls to setsockopt(), something that 
> would beat much of the purpose of this feature.

...right, we would need to reset the SO_PEEK_OFF value at every
recvmsg(), which is probably even worse than the current overhead.

> I stand by my opinion here.
> This feature is simple, non-intrusive, totally backwards compatible and 
> implies no changes to the API or BPI.

My thoughts as well, plus the advantage for our user-mode networking
case is quite remarkable given how simple the change is.

> I would love to hear other opinions on this, though.
> 
> Regards
> /jon
> 
> >
> > Cheers,
> >
> > Paolo

-- 
Stefano


  reply	other threads:[~2024-01-21 22:16 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-11 23:00 [RFC net-next] tcp: add support for read with offset when using MSG_PEEK jmaloy
2024-01-16 10:49 ` Paolo Abeni
2024-01-18 22:22   ` Jon Maloy
2024-01-21 22:16     ` Stefano Brivio [this message]
2024-01-22 16:22       ` Jon Maloy
  -- strict thread matches above, loose matches on Subject: below --
2024-01-11 22:22 jmaloy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240121231615.13029448@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=davem@davemloft.net \
    --cc=dgibson@redhat.com \
    --cc=jmaloy@redhat.com \
    --cc=kuba@kernel.org \
    --cc=lvivier@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).