public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v2 19/22] tcp: Validate TCP endpoint addresses
Date: Fri, 23 Feb 2024 14:56:46 +1100	[thread overview]
Message-ID: <ZdgXfnt6w37B27kD@zatzit> (raw)
In-Reply-To: <20240222134544.0c5a8bc3@elisabeth>

[-- Attachment #1: Type: text/plain, Size: 11405 bytes --]

On Thu, Feb 22, 2024 at 01:45:44PM +0100, Stefano Brivio wrote:
> On Tue,  6 Feb 2024 12:17:31 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > TCP connections should typically not have wildcard addresses (0.0.0.0 or
> > ::) nor a zero port number for either endpoint.
> 
> This is only true for non-local connections, see also:
>   https://archives.passt.top/passt-dev/20240119105630.089c5d34@elisabeth/
> 
> Just looking at RFC 6890, which should be sufficient:
> 
>               +----------------------+----------------------------+
>               | Attribute            | Value                      |
>               +----------------------+----------------------------+
>               | Address Block        | 0.0.0.0/8                  |
>               | Name                 | "This host on this network"|
>               | RFC                  | [RFC1122], Section 3.2.1.3 |
>               | Allocation Date      | September 1981             |
>               | Termination Date     | N/A                        |
>               | Source               | True                       |
>               | Destination          | False                      |
>               | Forwardable          | False                      |
>               | Global               | False                      |
>               | Reserved-by-Protocol | True                       |
>               +----------------------+----------------------------+

It's unclear to me from either rfc 6890 or rfc 1122 whether this is
describing something meaningful on the wire, or only something
meaningful in APIs.

> ...it can be used as source, but surely not by a non-loopback
> interface, so not by the guest.
> 
> About using it as destination: the table says it can't be done, but:
> 
>   $ nc -l -p 8818 & echo abcd | nc 0.0.0.0 8818
>   [2] 2624647
>   abcd
>
> and:
> 
>   # tcpdump -ni lo tcp port 8818
>   tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
>   listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
>   13:24:51.379436 IP 127.0.0.1.58574 > 127.0.0.1.8818: Flags [S], seq 3285989360, win 65495, options [mss 65495,sackOK,TS val 1253719321 ecr 0,nop,wscale 7], length 0
>   13:24:51.379447 IP 127.0.0.1.8818 > 127.0.0.1.58574: Flags [S.], seq 3128422158, ack 3285989361, win 65483, options [mss 65495,sackOK,TS val 1253719321 ecr 1253719321,nop,wscale 7], length 0
>   13:24:51.379457 IP 127.0.0.1.58574 > 127.0.0.1.8818: Flags [.], ack 1, win 512, options [nop,nop,TS val 1253719321 ecr 1253719321], length 0
>   13:24:51.379508 IP 127.0.0.1.58574 > 127.0.0.1.8818: Flags [P.], seq 1:6, ack 1, win 512, options [nop,nop,TS val 1253719321 ecr 1253719321], length 5
>   13:24:51.379513 IP 127.0.0.1.8818 > 127.0.0.1.58574: Flags [.], ack 6, win 512, options [nop,nop,TS val 1253719321 ecr 1253719321], length 0
> 
> it's not effectively used, but the kernel understands what I mean by
> 0.0.0.0:

Right: the 0.0.0.0 means something at the API level, but not AFAICT,
at the wire level.  At least for the "from tap" side of this change,
it's the wire level that I'm checking.

>   connect(3, {sa_family=AF_INET, sin_port=htons(8818),
>   sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EINPROGRESS (Operation now in progress)
> 
> So I wouldn't exclude its usage in tcp_listen_handler() -- even though
> sure, the kernel translates that, so I don't think it can ever become a
> practical issue.

Hmm.  In listen_handler() it's not "on the wire", but reading from an
API still seems different from sending to an API to me.  At the very
least 0.0.0.0 doesn't have the same meaning in all API contexts, which
makes it, IMO, unsafe to carry around from one API to another.

> If it annoys you in the perspective of the flow table, maybe we can
> turn it into a loopback address, when we see it's used as source or
> destination?

It appears the kernel already does that (IMO, correctly).  If we do
see a 0.0.0.0 here, I think that would be the kernel indicating some
sort of special case that would need special handling.  So, again, I
don't think we should just blindly copy this address to other APIs.

> If that's problematic as well, I can totally live with this patch,
> though.
> 
> About IPv6:
> 
> 
>                  +----------------------+---------------------+
>                  | Attribute            | Value               |
>                  +----------------------+---------------------+
>                  | Address Block        | ::/128              |
>                  | Name                 | Unspecified Address |
>                  | RFC                  | [RFC4291]           |
>                  | Allocation Date      | February 2006       |
>                  | Termination Date     | N/A                 |
>                  | Source               | True                |
>                  | Destination          | False               |
>                  | Forwardable          | False               |
>                  | Global               | False               |
>                  | Reserved-by-Protocol | True                |
>                  +----------------------+---------------------+
> 
> ...this is more specific, and its usage as source address is exemplified
> in RFC 4291, 2.5.2:
> 
>    One example of its use is in the Source Address field of
>    any IPv6 packets sent by an initializing host before it has learned
>    its own address.
> 
> ...which should never be the case for any flow passt has to handle, so
> I think it's fine to refuse it in tcp_listen_handler().

Well, depends how broadly you define "flow".  We may need to handle
this for NDP and/or DHCPv6.  But this is specifically for TCP.

> The rest of the series, up to 22/22, looks good to me.
> 
> > It's not entirely clear
> > (at least to me) if it's strictly against the RFCs to do so, but at any
> > rate the socket interfaces often treat those values specially[1], so it's
> > not really possible to manipulate such connections.  Likewise they should
> > not have broadcast or multicast addresses for either endpoint.
> > 
> > However, nothing prevents a guest from creating a SYN packet with such
> > values, and it's not entirely clear what the effect on passt would be.  To
> > ensure sane behaviour, explicitly check for this case and drop such
> > packets, logging a debug warning (we don't want a higher level, because
> > that would allow a guest to spam the logs).
> > 
> > We never expect such an address on an accept()ed socket either, but just
> > in case, check for it as well.
> > 
> > [1] Depending on context as "unknown", "match any" or "kernel, pick
> >     something for me"
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  tcp.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 67 insertions(+), 7 deletions(-)
> > 
> > diff --git a/tcp.c b/tcp.c
> > index 31d4e87b..236a8d23 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -284,6 +284,7 @@
> >  #include <sys/types.h>
> >  #include <sys/uio.h>
> >  #include <time.h>
> > +#include <arpa/inet.h>
> >  
> >  #include <linux/tcp.h> /* For struct tcp_info */
> >  
> > @@ -1930,27 +1931,59 @@ static void tcp_conn_from_tap(struct ctx *c, sa_family_t af,
> >  			      const struct tcphdr *th, const char *opts,
> >  			      size_t optlen, const struct timespec *now)
> >  {
> > +	in_port_t srcport = ntohs(th->source);
> > +	in_port_t dstport = ntohs(th->dest);
> >  	struct sockaddr_in addr4 = {
> >  		.sin_family = AF_INET,
> > -		.sin_port = th->dest,
> > +		.sin_port = htons(dstport),
> >  		.sin_addr = *(struct in_addr *)daddr,
> >  	};
> >  	struct sockaddr_in6 addr6 = {
> >  		.sin6_family = AF_INET6,
> > -		.sin6_port = th->dest,
> > +		.sin6_port = htons(dstport),
> >  		.sin6_addr = *(struct in6_addr *)daddr,
> >  	};
> >  	const struct sockaddr *sa;
> >  	struct tcp_tap_conn *conn;
> >  	union flow *flow;
> > +	int s = -1, mss;
> >  	socklen_t sl;
> > -	int s, mss;
> > -
> > -	(void)saddr;
> >  
> >  	if (!(flow = flow_alloc()))
> >  		return;
> >  
> > +	if (af == AF_INET) {
> > +		if (IN4_IS_ADDR_UNSPECIFIED(saddr) ||
> > +		    IN4_IS_ADDR_BROADCAST(saddr) ||
> > +		    IN4_IS_ADDR_MULTICAST(saddr) || srcport == 0 ||
> > +		    IN4_IS_ADDR_UNSPECIFIED(daddr) ||
> > +		    IN4_IS_ADDR_BROADCAST(daddr) ||
> > +		    IN4_IS_ADDR_MULTICAST(daddr) || dstport == 0) {
> > +			char sstr[INET_ADDRSTRLEN], dstr[INET_ADDRSTRLEN];
> > +
> > +			debug("Invalid endpoint in TCP SYN: %s:%hu -> %s:%hu",
> > +			      inet_ntop(AF_INET, saddr, sstr, sizeof(sstr)),
> > +			      srcport,
> > +			      inet_ntop(AF_INET, daddr, dstr, sizeof(dstr)),
> > +			      dstport);
> > +			goto cancel;
> > +		}
> > +	} else if (af == AF_INET6) {
> > +		if (IN6_IS_ADDR_UNSPECIFIED(saddr) ||
> > +		    IN6_IS_ADDR_MULTICAST(saddr) || srcport == 0 ||
> > +		    IN6_IS_ADDR_UNSPECIFIED(daddr) ||
> > +		    IN6_IS_ADDR_MULTICAST(daddr) || dstport == 0) {
> > +			char sstr[INET6_ADDRSTRLEN], dstr[INET6_ADDRSTRLEN];
> > +
> > +			debug("Invalid endpoint in TCP SYN: %s:%hu -> %s:%hu",
> > +			      inet_ntop(AF_INET6, saddr, sstr, sizeof(sstr)),
> > +			      srcport,
> > +			      inet_ntop(AF_INET6, daddr, dstr, sizeof(dstr)),
> > +			      dstport);
> > +			goto cancel;
> > +		}
> > +	}
> > +
> >  	if ((s = tcp_conn_sock(c, af)) < 0)
> >  		goto cancel;
> >  
> > @@ -2001,8 +2034,8 @@ static void tcp_conn_from_tap(struct ctx *c, sa_family_t af,
> >  		sl = sizeof(addr6);
> >  	}
> >  
> > -	conn->fport = ntohs(th->dest);
> > -	conn->eport = ntohs(th->source);
> > +	conn->fport = dstport;
> > +	conn->eport = srcport;
> >  
> >  	conn->seq_init_from_tap = ntohl(th->seq);
> >  	conn->seq_from_tap = conn->seq_init_from_tap + 1;
> > @@ -2732,6 +2765,33 @@ void tcp_listen_handler(struct ctx *c, union epoll_ref ref,
> >  	if (s < 0)
> >  		goto cancel;
> >  
> > +	if (sa.sa_family == AF_INET) {
> > +		const struct in_addr *addr = &sa.sa4.sin_addr;
> > +		in_port_t port = sa.sa4.sin_port;
> > +
> > +		if (IN4_IS_ADDR_UNSPECIFIED(addr) ||
> > +		    IN4_IS_ADDR_BROADCAST(addr) ||
> > +		    IN4_IS_ADDR_MULTICAST(addr) || port == 0) {
> > +			char str[INET_ADDRSTRLEN];
> > +
> > +			err("Invalid endpoint from TCP accept(): %s:%hu",
> > +			    inet_ntop(AF_INET, addr, str, sizeof(str)), port);
> > +			goto cancel;
> > +		}
> > +	} else if (sa.sa_family == AF_INET6) {
> > +		const struct in6_addr *addr = &sa.sa6.sin6_addr;
> > +		in_port_t port = sa.sa6.sin6_port;
> > +
> > +		if (IN6_IS_ADDR_UNSPECIFIED(addr) ||
> > +		    IN6_IS_ADDR_MULTICAST(addr) || port == 0) {
> > +			char str[INET6_ADDRSTRLEN];
> > +
> > +			err("Invalid endpoint from TCP accept(): %s:%hu",
> > +			    inet_ntop(AF_INET6, addr, str, sizeof(str)), port);
> > +			goto cancel;
> > +		}
> > +	}
> > +
> >  	if (tcp_splice_conn_from_sock(c, ref.tcp_listen.pif,
> >  				      ref.tcp_listen.port, flow, s, &sa))
> >  		return;
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2024-02-23  4:42 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-06  1:17 [PATCH v2 00/22] More flow table preliminaries: address handling improvements David Gibson
2024-02-06  1:17 ` [PATCH v2 01/22] treewide: Use sa_family_t for address family variables David Gibson
2024-02-06  1:17 ` [PATCH v2 02/22] inany: Helper to test for various address types David Gibson
2024-02-18 20:58   ` Stefano Brivio
2024-02-19  1:48     ` David Gibson
2024-02-06  1:17 ` [PATCH v2 03/22] inany: Add inany_ntop() helper David Gibson
2024-02-06  1:17 ` [PATCH v2 04/22] inany: Provide more conveniently typed constants for special addresses David Gibson
2024-02-06  1:17 ` [PATCH v2 05/22] inany: Introduce union sockaddr_inany David Gibson
2024-02-06  1:17 ` [PATCH v2 06/22] util: Allow IN4_IS_* macros to operate on untyped addresses David Gibson
2024-02-06  1:17 ` [PATCH v2 07/22] tcp, udp: Don't precompute port remappings in epoll references David Gibson
2024-02-06  1:17 ` [PATCH v2 08/22] flow: Add helper to determine a flow's protocol David Gibson
2024-02-06  1:17 ` [PATCH v2 09/22] tcp_splice: Simplify clean up logic David Gibson
2024-02-18 20:59   ` Stefano Brivio
2024-02-19  1:50     ` David Gibson
2024-02-06  1:17 ` [PATCH v2 10/22] tcp_splice: Don't use flow_trace() before setting flow type David Gibson
2024-02-06  1:17 ` [PATCH v2 11/22] flow: Clarify flow entry life cycle, introduce uniform logging David Gibson
2024-02-18 21:00   ` Stefano Brivio
2024-02-19  1:58     ` David Gibson
2024-02-06  1:17 ` [PATCH v2 12/22] tcp, tcp_splice: Helpers for getting sockets from the pools David Gibson
2024-02-18 21:00   ` Stefano Brivio
2024-02-19  1:51     ` David Gibson
2024-02-06  1:17 ` [PATCH v2 13/22] tcp_splice: More specific variable names in new splice path David Gibson
2024-02-18 21:00   ` Stefano Brivio
2024-02-19  1:53     ` David Gibson
2024-02-06  1:17 ` [PATCH v2 14/22] tcp_splice: Merge tcp_splice_new() into its caller David Gibson
2024-02-06  1:17 ` [PATCH v2 15/22] tcp_splice: Make tcp_splice_connect() create its own sockets David Gibson
2024-02-06  1:17 ` [PATCH v2 16/22] tcp_splice: Improve error reporting on connect path David Gibson
2024-02-18 21:01   ` Stefano Brivio
2024-02-19  3:23     ` David Gibson
2024-02-06  1:17 ` [PATCH v2 17/22] tcp_splice: Improve logic deciding when to splice David Gibson
2024-02-06  1:17 ` [PATCH v2 18/22] tcp, tcp_splice: Parse listening socket epoll ref in tcp_listen_handler() David Gibson
2024-02-06  1:17 ` [PATCH v2 19/22] tcp: Validate TCP endpoint addresses David Gibson
2024-02-22 12:45   ` Stefano Brivio
2024-02-23  3:56     ` David Gibson [this message]
2024-02-06  1:17 ` [PATCH v2 20/22] tap: Disallow loopback addresses on tap interface David Gibson
2024-02-06  1:17 ` [PATCH v2 21/22] port_fwd: Fix copypasta error in port_fwd_scan_udp() comments David Gibson
2024-02-06  1:17 ` [PATCH v2 22/22] fwd: Rename port_fwd.[ch] and their contents David Gibson
2024-02-27 14:22 ` [PATCH v2 00/22] More flow table preliminaries: address handling improvements Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZdgXfnt6w37B27kD@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).