From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202606 header.b=OQ3nbJSE; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id CB4885A0262 for ; Wed, 01 Jul 2026 05:03:57 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202606; t=1782875034; bh=+uIKNtADZa6WgTua7gDyz3PgXP32UW2mTCqOJEejmWI=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=OQ3nbJSENimcedwZ0d2cwcSwKjRgJP65O3VPSfYDc7WrvfsFVTCOU/JwgQItYcZjE G5jjkcKrLE14KCKe3qP7ht9OC3tb60De3jHDWE4CHwIPbhJgtcp9s26EI36hTa5pU1 zz2hh1Jmx/k/pUEwddP9wJxe2JFXfg9nM+ycYHlyKenM/OQ6FKy6Q74fvt4Vw+/Reb 3WyWobm+eEUdeTNlihdQv2ct2qtFO49lyNJGhmaIFeI7olwBX43jIZIw5qVOuUUJAt P/sU0SAlOpi1g5fkpqBgSHi5uJuV+iXTGvGKb1nRw6mj4BkVb2IRlW1BzEpdZyPApS 5hWXovbrM1cEA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4gqlGL3WJcz4wTv; Wed, 01 Jul 2026 13:03:54 +1000 (AEST) Date: Wed, 1 Jul 2026 11:36:14 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 03/12] parse: Start splitting out parsing helpers Message-ID: References: <20260626071003.3472194-1-david@gibson.dropbear.id.au> <20260626071003.3472194-4-david@gibson.dropbear.id.au> <20260701020711.7b43a1c6@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="P+P5D3KcI67WXdr9" Content-Disposition: inline In-Reply-To: <20260701020711.7b43a1c6@elisabeth> Message-ID-Hash: Z5S37W42P2DAFTRUBKADAS2GXCYL76WV X-Message-ID-Hash: Z5S37W42P2DAFTRUBKADAS2GXCYL76WV X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --P+P5D3KcI67WXdr9 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jul 01, 2026 at 02:07:12AM +0200, Stefano Brivio wrote: > On Fri, 26 Jun 2026 17:09:54 +1000 > David Gibson wrote: >=20 > > As we add more complexity to what forwarding rules are allowed, our > > existing ad-hoc C parsing is starting to become quite awkward. We alre= ady > > have some parts that resemble a very simple recursive[0] descent parser, > > with composable subfunctions that consume as much input as they need, > > rather than pre-splitting based on known delimiters. > >=20 > > Start extending this approach, by creating parse.[ch] with parsing help= ers > > with a uniform interface. Initially we start with very simple cases: t= he > > parse_keyword() function from fwd_rule.c and another helper to check th= at > > we've reached the end of input. > >=20 > > Rename parse_keyword() to parse_literal(), because it's not just useful > > for "keywords" as such. We can use it in a bunch of additional places = for > > parsing delimiters and other symbols. This doesn't gain us a lot now b= ut > > will make things clearer as we use more such parser helpers. > >=20 > > [0] Except that the grammars we have aren't actually recursive, so neit= her > > is the code. > >=20 > > Signed-off-by: David Gibson > > --- > > Makefile | 17 ++++++++------ > > conf.c | 5 +++- > > fwd_rule.c | 33 +++++---------------------- > > parse.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > parse.h | 14 ++++++++++++ > > 5 files changed, 101 insertions(+), 35 deletions(-) > > create mode 100644 parse.c > > create mode 100644 parse.h > >=20 > > diff --git a/Makefile b/Makefile > > index 5ed0f702..bd729c75 100644 > > --- a/Makefile > > +++ b/Makefile > > @@ -36,12 +36,13 @@ BASE_CFLAGS +=3D -pedantic -Wall -Wextra -Wno-forma= t-zero-length -Wformat-security > > PASST_SRCS =3D arch.c arp.c bitmap.c checksum.c conf.c dhcp.c dhcpv6.c= \ > > epoll_ctl.c flow.c fwd.c fwd_rule.c icmp.c igmp.c inany.c iov.c ip.c \ > > isolation.c lineread.c log.c mld.c ndp.c netlink.c migrate.c packet.c= \ > > - passt.c pasta.c pcap.c pif.c repair.c serialise.c tap.c tcp.c \ > > + parse.c passt.c pasta.c pcap.c pif.c repair.c serialise.c tap.c tcp.c= \ > > tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \ > > vhost_user.c virtio.c vu_common.c > > QRAP_SRCS =3D qrap.c > > PASST_REPAIR_SRCS =3D passt-repair.c > > -PESTO_SRCS =3D pesto.c bitmap.c fwd_rule.c inany.c ip.c lineread.c ser= ialise.c > > +PESTO_SRCS =3D pesto.c bitmap.c fwd_rule.c inany.c ip.c lineread.c par= se.c \ > > + serialise.c > > SRCS =3D $(PASST_SRCS) $(QRAP_SRCS) $(PASST_REPAIR_SRCS) $(PESTO_SRCS) > > =20 > > MANPAGES =3D passt.1 pasta.1 pesto.1 qrap.1 passt-repair.1 > > @@ -49,13 +50,14 @@ MANPAGES =3D passt.1 pasta.1 pesto.1 qrap.1 passt-r= epair.1 > > PASST_HEADERS =3D arch.h arp.h bitmap.h checksum.h conf.h dhcp.h dhcpv= 6.h \ > > epoll_ctl.h flow.h fwd.h fwd_rule.h flow_table.h icmp.h icmp_flow.h \ > > inany.h iov.h ip.h isolation.h lineread.h log.h migrate.h ndp.h \ > > - netlink.h packet.h passt.h pasta.h pcap.h pif.h repair.h serialise.h \ > > - siphash.h tap.h tcp.h tcp_buf.h tcp_conn.h tcp_internal.h tcp_splice.= h \ > > - tcp_vu.h udp.h udp_flow.h udp_internal.h udp_vu.h util.h vhost_user.h= \ > > - virtio.h vu_common.h > > + netlink.h packet.h parse.h passt.h pasta.h pcap.h pif.h repair.h \ > > + serialise.h siphash.h tap.h tcp.h tcp_buf.h tcp_conn.h tcp_internal.h= \ > > + tcp_splice.h tcp_vu.h udp.h udp_flow.h udp_internal.h udp_vu.h util.h= \ > > + vhost_user.h virtio.h vu_common.h > > QRAP_HEADERS =3D arp.h ip.h passt.h util.h > > PASST_REPAIR_HEADERS =3D linux_dep.h > > -PESTO_HEADERS =3D bitmap.h common.h fwd_rule.h inany.h ip.h log.h pest= o.h serialise.h > > +PESTO_HEADERS =3D bitmap.h common.h fwd_rule.h inany.h ip.h log.h pars= e.h \ > > + pesto.h serialise.h > > =20 > > C :=3D \#include \nint main(){int a=3Dgetrandom(0, 0, 0)= ;} > > ifeq ($(shell printf "$(C)" | $(CC) -S -xc - -o - >/dev/null 2>&1; ech= o $$?),0) > > @@ -223,6 +225,7 @@ pesto.cppcheck: CPPCHECK_FLAGS +=3D --suppress=3Dun= usedFunction:bitmap.c > > pesto.cppcheck: CPPCHECK_FLAGS +=3D --suppress=3DunusedFunction:inany.h > > pesto.cppcheck: CPPCHECK_FLAGS +=3D --suppress=3DunusedFunction:inany.c > > pesto.cppcheck: CPPCHECK_FLAGS +=3D --suppress=3DunusedFunction:ip.h > > +pesto.cppcheck: CPPCHECK_FLAGS +=3D --suppress=3DunusedFunction:parse.c > > pesto.cppcheck: CPPCHECK_FLAGS +=3D --suppress=3DunusedFunction:serial= ise.c > > pesto.cppcheck: CPPCHECK_FLAGS +=3D --suppress=3DstaticFunction:fwd_ru= le.c > > pesto.cppcheck: $(PESTO_SRCS) $(PESTO_HEADERS) seccomp_pesto.h > > diff --git a/conf.c b/conf.c > > index 6ab8efec..ca6c859c 100644 > > --- a/conf.c > > +++ b/conf.c > > @@ -52,6 +52,7 @@ > > #include "conf.h" > > #include "pesto.h" > > #include "serialise.h" > > +#include "parse.h" > > =20 > > #define NETNS_RUN_DIR "/run/netns" > > =20 > > @@ -1028,7 +1029,9 @@ static void conf_ugid(char *runas, uid_t *uid, gi= d_t *gid) > > static void conf_nat(const char *arg, struct in_addr *addr4, > > struct in6_addr *addr6, int *no_map_gw) > > { > > - if (strcmp(arg, "none") =3D=3D 0) { > > + const char *p =3D arg; > > + > > + if (parse_literal(&p, "none") && parse_eoi(p)) { >=20 > I think that !*p would be so much clearer here compared to: >=20 > - parse_eoi(p)... must be "input"? >=20 > - [takes a look at parse_eoi()] yes, it's input, great [goes back to > here]. Ah, wait, does it return false if *p is NULL? Or true? Like all the parse_*() functions, it returns true if it *did* see the thing it was looking for, in this case end of string. > - [takes a second look at parse_eoi()] good, it returns true, so this > is *!p... >=20 > I understand it's somewhat consistent with parse_literal() this way, so > I'm not strongly against it, just a slight preference overall. Right, that's the reason for it. Because later on composing these parse helpers has a bit of its own style, it seemed helpful to continue that style, rather than poking directly into the string. >=20 > > *addr4 =3D in4addr_any; > > *addr6 =3D in6addr_any; > > if (no_map_gw) > > diff --git a/fwd_rule.c b/fwd_rule.c > > index 04a0101d..d0e90402 100644 > > --- a/fwd_rule.c > > +++ b/fwd_rule.c > > @@ -24,6 +24,7 @@ > > #include "fwd_rule.h" > > #include "lineread.h" > > #include "log.h" > > +#include "parse.h" > > #include "serialise.h" > > =20 > > /* Ephemeral port range: values from RFC 6335 */ > > @@ -427,28 +428,6 @@ static int parse_port_range(const char *s, const c= har **endptr, > > return 0; > > } > > =20 > > -/** > > - * parse_keyword() - Parse a literal keyword > > - * @s: String to parse > > - * @endptr: Update to the character after the keyword > > - * @kw: Keyword to accept > > - * > > - * Return: 0, if @s starts with @kw, -EINVAL if it does not > > - */ > > -static int parse_keyword(const char *s, const char **endptr, const cha= r *kw) > > -{ > > - size_t len =3D strlen(kw); > > - > > - if (strlen(s) < len) > > - return -EINVAL; > > - > > - if (memcmp(s, kw, len)) > > - return -EINVAL; > > - > > - *endptr =3D s + len; > > - return 0; > > -} > > - > > /** > > * fwd_rule_range_except() - Set up forwarding for a range of ports mi= nus a > > * bitmap of exclusions > > @@ -568,7 +547,7 @@ static void fwd_rule_parse_ports(struct fwd_table *= fwd, bool del, uint8_t proto, > > continue; > > } > > =20 > > - if (parse_keyword(p, &p, "auto") =3D=3D 0) { > > + if (parse_literal(&p, "auto")) { > > if (p !=3D ep) /* Garbage after the keyword */ > > goto bad; > > =20 > > @@ -582,9 +561,8 @@ static void fwd_rule_parse_ports(struct fwd_table *= fwd, bool del, uint8_t proto, > > } > > =20 > > /* Should be an exclude range */ > > - if (*p !=3D '~') > > + if (!parse_literal(&p, "~")) > > goto bad; > > - p++; > > =20 > > if (parse_port_range(p, &p, &xrange)) > > goto bad; > > @@ -616,8 +594,9 @@ static void fwd_rule_parse_ports(struct fwd_table *= fwd, bool del, uint8_t proto, > > if (parse_port_range(p, &p, &orig_range)) > > goto bad; > > =20 > > - if (*p =3D=3D ':') { /* There's a range to map to as well */ > > - if (parse_port_range(p + 1, &p, &mapped_range)) > > + if (parse_literal(&p, ":")) { > > + /* There's a range to map to as well */ > > + if (parse_port_range(p, &p, &mapped_range)) > > goto bad; > > if ((mapped_range.last - mapped_range.first) !=3D > > (orig_range.last - orig_range.first)) > > diff --git a/parse.c b/parse.c > > new file mode 100644 > > index 00000000..c2a92422 > > --- /dev/null > > +++ b/parse.c > > @@ -0,0 +1,67 @@ > > +// SPDX-License-Identifier: GPL-2.0-or-later > > + > > +/* PASST - Plug A Simple Socket Transport > > + * for qemu/UNIX domain socket mode > > + * > > + * PASTA - Pack A Subtle Tap Abstraction > > + * for network namespace/tap device mode > > + * > > + * parse.c - Composable parsing helpers > > + * > > + * Copyright Red Hat > > + * Author: David Gibson > > + */ > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#include "common.h" > > + > > +/** > > + * DOC: Theory of Operation > > + * > > + * These are a number of primitives which can be combined into parsers= for > > + * moderately complex input. For simpler composability, they have com= mon > > + * conventions. > > + * > > + * - Functions return a bool indicating whether they successfully pars= ed. >=20 > As I reached 7/12 (converting parse_port_range()), this became rather > confusing. >=20 > It makes sense for parse_eoi() (because in some sense it's a > parse_is_eoi() function), but for the rest I'd expect almost anything > that doesn't have "is" or similar in its name to return 0 on success, > and a negative error code on failure (or -1). Huh. So an earlier draft did that, making them more similar to strtoul() in interface. I changed it because I was finding it confusing. Even knowing the convention, I found it hard to remember to rea= d: if (!parse_foo() && !parse_bar()) =09 as meaning we *succeeded* in parsing a foo and a bar. That would not be true of parse_foo() =3D=3D 0, but that's bulkier. > I'm not sure if I'm ignoring some particular advantage for the > composability you mention above, if it's clearly better for whatever > reason so be it, but otherwise I'd suggest re-considering this. No technical advantage, I just found it more readable. > I mean: >=20 > - if (parse_port_range(p, &p, &xrange)) > + if (!parse_port_range(&p, &xrange)) >=20 > ...is quite telling. It is? I'm not sure what it's telling me. > > + * - Any specific output from the parse is as output parameters > > + * - First argument is always @cursor, a const char **. > > + * - On entry, *@cursor has the point to start parsing. > > + * - On successful exit, *@cursor is updated to the next character aft= er the > > + parsed portion of the input > > + * - On failure, *@cursor and any output arguments are not modified > > + * > > + * For brevity the common parameters and return information are omitte= d from the > > + * individual function documentation comments. >=20 > The reason why we have "@c: Execution context" all over the > place isn't so much human readability, it's rather automated usage, and > the day we manage to run Sphinx on the whole codebase: >=20 > https://bugs.passt.top/show_bug.cgi?id=3D35 >=20 > it would be nice to have a clean output instead of having to go through > a pile of warnings. >=20 > So, actually, I'd keep @cursor "described" everywhere. We can also skip > it for the moment but I'm afraid we'll start following this example and > it will be more work later. Yeah, that's reasonable. Now that I'm not churning so much on the interfaces it's not too hard to duplicate the descriptions. >=20 > > + */ > > + > > +/** > > + * parse_literal() - Parse a specified literal string > > + * @lit: Keyword to accept > > + */ > > +bool parse_literal(const char **cursor, const char *lit) > > +{ > > + size_t len =3D strlen(lit); > > + > > + if (strlen(*cursor) < len || memcmp(*cursor, lit, len)) > > + return false; > > + > > + *cursor +=3D len; > > + return true; > > +} > > + > > +/** > > + * parse_eoi() - Parse end of input > > + * @cursor: Current parse pointer > > + * > > + * Return: true if @p is at End of Input (\0), false otherwise > > + */ > > +bool parse_eoi(const char *cursor) > > +{ > > + return !(*cursor); > > +} > > diff --git a/parse.h b/parse.h > > new file mode 100644 > > index 00000000..eb51a469 > > --- /dev/null > > +++ b/parse.h > > @@ -0,0 +1,14 @@ > > +/* SPDX-License-Identifier: GPL-2.0-or-later > > + * Copyright Red Hat > > + * Author: David Gibson > > + */ > > + > > +#ifndef PARSE_H > > +#define PARSE_H > > + > > +#include > > + > > +bool parse_literal(const char **cursor, const char *lit); > > +bool parse_eoi(const char *cursor); > > + > > +#endif /* _PARSE_H */ >=20 > --=20 > Stefano >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --P+P5D3KcI67WXdr9 Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmpEbwQACgkQzQJF27ox 2GcUWg/7BZmpviHc8GqwIJNrNTZslJvyN5VqlqtPFyaV6Ffi9087CYHpjGnO9GlL 1cjIhDezm+KXJuHXu3l053ht2g4i+vaIlvdOLRlPn1QGERdpcq/tLI24oSJ3+0SG gK7YZbVTQHNanHtTd+JEml38U2bW+4/rXf2U7/FC+HXIKYz/zxrOjcBnDnx7hIiL w8JE+CE3eVLbB9gizujePWRPlh0nl57TIplvOMmnvY+wrefPWh3NWcKtNByjYclY UoRXVbJTiqirzGKtirgC1d1oUE3VkS3ULwAC3Bb/PtwjfC0zZJFlvBaHR4HdIk8n tkQX1bmQDHuIDLtHflwB3U7u02BwJs+jynPlG86MHwJsRu9pn/uo2V4tu6wWUovp zFjY4GQmQ1INUXMRUuoGvx6lJQQCDSZQJT8Y7hZZyNwiweu/liEBr/sOe4sUoOwj W9dCsIP1EvpxBDjmOUGXtw0t7ecBOe5Suyqf/3job+K7n/NORZrjGN5VvbB8cpPZ cIA5scGXMjsf0L3L2G9F3/LNmXdgLXs3qfjAvt0T0DpJgPhnY93bQkCd4ELWexYf eHTTfJXVrqbEBSIO+tqE2aX4w5Fp72SihDuYCEVVFVUI6RhABdN0D8MhEBjki9hA U17rzjV9OLfflHfi11A+6jscv0BqLog5x/6ZeterGUUApjvvh7Q= =vJYu -----END PGP SIGNATURE----- --P+P5D3KcI67WXdr9--