public inbox for passt-dev@passt.top
From: Stefano Brivio <sbrivio@redhat.com>
To: passt-dev@passt.top
Subject: Re: [PATCH 1/4] Add cleaner line-by-line reading primitives
Date: Thu, 23 Jun 2022 11:31:43 +0200
Message-ID: <20220623113143.538e9266@elisabeth>
In-Reply-To: <20220617031026.498690-2-david@gibson.dropbear.id.au>

On Fri, 17 Jun 2022 13:10:23 +1000
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> Two places in passt need to read files line by line (one parsing
> resolv.conf, the other parsing /proc/net/*).  They can't use fgets()
> because in glibc that can allocate memory.  Instead they use an
> implementation line_read() in util.c.  This has some problems:
> 
>  * It has two completely separate modes of operation, one buffering
>    and one not, the relation between these and how they're activated
>    is subtle and confusing
>  * At least in non-buffered mode, it will mishandle empty lines,
>    folding them onto the start of the next non-empty line
>  * In non-buffered mode it will use lseek() which prevents using this
>    on non-regular files (we don't need that at present, but it's a
>    surprising limitation)
>  * It has a lot of difficult-to-read pointer mangling
> 
> Add a new cleaner implementation of allocation-free line-by-line
> reading in lineread.c.  This one always buffers, using a state
> structure to keep track of what we need.  This is larger than I'd
> like, but it turns out handling all the edge cases of line-by-line
> reading in C is surprisingly hard.

Still much simpler (albeit a bit more verbose) than the original
version, thanks. :)

> This just adds the code; subsequent patches will change the existing
> users of line_read() to the new implementation.
> 
> Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
> ---
>  Makefile   |   8 ++--
>  lineread.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  lineread.h |  23 ++++++++++++
>  3 files changed, 135 insertions(+), 4 deletions(-)
>  create mode 100644 lineread.c
>  create mode 100644 lineread.h
> 
> diff --git a/Makefile b/Makefile
> index b0dde68..d059efb 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -32,16 +32,16 @@ CFLAGS += -DRLIMIT_STACK_VAL=$(RLIMIT_STACK_VAL)
>  CFLAGS += -DARCH=\"$(TARGET_ARCH)\"
>  
>  PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c icmp.c igmp.c \
> -	mld.c ndp.c netlink.c packet.c passt.c pasta.c pcap.c siphash.c \
> -	tap.c tcp.c tcp_splice.c udp.c util.c
> +	lineread.c mld.c ndp.c netlink.c packet.c passt.c pasta.c pcap.c \
> +	siphash.c tap.c tcp.c tcp_splice.c udp.c util.c
>  QRAP_SRCS = qrap.c
>  SRCS = $(PASST_SRCS) $(QRAP_SRCS)
>  
>  MANPAGES = passt.1 pasta.1 qrap.1
>  
>  PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h icmp.h \
> -	ndp.h netlink.h packet.h passt.h pasta.h pcap.h siphash.h \
> -	tap.h tcp.h tcp_splice.h udp.h util.h
> +	lineread.h ndp.h netlink.h packet.h passt.h pasta.h pcap.h \
> +	siphash.h tap.h tcp.h tcp_splice.h udp.h util.h
>  HEADERS = $(PASST_HEADERS)
>  
>  # On gcc 11.2, with -O2 and -flto, tcp_hash() and siphash_20b(), if inlined,
> diff --git a/lineread.c b/lineread.c
> new file mode 100644
> index 0000000..3e263cf
> --- /dev/null
> +++ b/lineread.c
> @@ -0,0 +1,108 @@
> +// SPDX-License-Identifier: AGPL-3.0-or-later
> +
> +/* PASST - Plug A Simple Socket Transport
> + *  for qemu/UNIX domain socket mode
> + *
> + * PASTA - Pack A Subtle Tap Abstraction
> + *  for network namespace/tap device mode
> + *
> + * lineread.c - Allocation free line-by-line buffered file input
> + *
> + * Copyright Red Hat
> + * Author: David Gibson <david(a)gibson.dropbear.id.au>
> + */
> +
> +#include <stddef.h>
> +#include <fcntl.h>
> +#include <string.h>
> +#include <stdbool.h>
> +#include <assert.h>
> +#include <unistd.h>
> +
> +#include "lineread.h"
> +
> +/**
> + * lineread_init() - Prepare for line by line file reading without allocation
> + * @lr:		Line reader state structure to initialize
> + * @fd:		File handle to read lines from

I think by "handle" most people mean "FILE" stream handles. I would
drop "handle" here or replace it with "descriptor".

> + */
> +void lineread_init(struct lineread *lr, int fd)
> +{
> +	lr->fd = fd;
> +	lr->next_line = lr->count = 0;
> +}
> +
> +static int peek_line(struct lineread *lr, bool eof)

This lacks a description in kerneldoc style, I would add:

/**
 * peek_line() - Find and NUL-terminate next line in buffer
 * @lr:		Line reader state structure
 * @eof:	Caller indicates end-of-file was already found by read()
 *
 * Return: length of line in bytes, -1 if no line was found
 */

By the way, if you're wondering why you're introducing the first usage
of 'bool', I switched to C99 just recently. :)

> +{
> +	char *nl;
> +
> +	/* Sanity checks (which also document invariants) */
> +	assert(lr->count >= 0);
> +	assert(lr->next_line >= 0);
> +	assert(lr->next_line + lr->count >= lr->next_line);
> +	assert(lr->next_line + lr->count <= LINEREAD_BUFFER_SIZE);
> +
> +	nl = memchr(lr->buf + lr->next_line, '\n', lr->count);
> +
> +	if (nl) {
> +		*nl = '\0';
> +		return nl - lr->buf - lr->next_line + 1;

clang-tidy complains about the else-after-return here
(llvm-else-after-return, readability-else-after-return), and, at a
second glance, so do I :) I would drop the else branches if you're
fine with it.
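
To illustrate dropping the else-after-return, here is a rough
standalone sketch. The struct and the buffer size are copied from
lineread.h so that it builds on its own, and the function is left
non-static for the same reason (it is static in the patch). The logic
is meant to be unchanged, only the control flow is flattened:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LINEREAD_BUFFER_SIZE	8192

/* Same layout as struct lineread in the patch */
struct lineread {
	int fd;
	int next_line;	/* start of next unread line in buffer */
	int count;	/* number of unreturned bytes in buffer */
	char buf[LINEREAD_BUFFER_SIZE + 1];
};

/* Same logic as peek_line() in the patch, without else after return.
 * Non-static here only so that the sketch compiles standalone.
 */
int peek_line(struct lineread *lr, bool eof)
{
	char *nl;

	/* Sanity checks (which also document invariants) */
	assert(lr->count >= 0);
	assert(lr->next_line >= 0);
	assert(lr->next_line + lr->count >= lr->next_line);
	assert(lr->next_line + lr->count <= LINEREAD_BUFFER_SIZE);

	nl = memchr(lr->buf + lr->next_line, '\n', lr->count);
	if (nl) {
		*nl = '\0';
		return nl - lr->buf - lr->next_line + 1;
	}

	if (eof) {
		/* No trailing newline, so treat all remaining bytes
		 * as the last line
		 */
		lr->buf[lr->next_line + lr->count] = '\0';
		return lr->count;
	}

	return -1;
}
```

Each early return makes the following branch unreachable anyway, so
the chain of else keywords carries no information.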

> +	} else if (eof) {
> +		lr->buf[lr->next_line + lr->count] = '\0';
> +		/*
> +		 * No trailing newline, so treat all remaining bytes
> +		 * as the last line
> +		 */
> +		return lr->count;
> +	} else {
> +		return -1;
> +	}
> +}
> +
> +/**
> + * lineread_get() - Read a single line from file (no allocation)
> + * @lr:		Line reader state structure
> + * @line:	Place a pointer to the next line in this variable
> + *
> + * Return:	Length of line read on success, 0 on EOF, negative on error
> + */
> +int lineread_get(struct lineread *lr, char **line)
> +{
> +	bool eof = false;
> +	int line_len;
> +
> +	while ((line_len = peek_line(lr, eof)) < 0) {
> +		int rc;
> +
> +		if ((lr->next_line + lr->count) == LINEREAD_BUFFER_SIZE) {
> +			/* No space at end */
> +			if (lr->next_line == 0) {
> +				/*

Nit: elsewhere, I used "net" kernel style comments, which are the same
as every other area of the Linux kernel except that there's no opening
newline:

				/* Buffer is full, which means we've

I would change it here and in the comment above.

> +				 * Buffer is full, which means we've
> +				 * hit a line too long for us to
> +				 * process.  FIXME: report error
> +				 * better
> +				 */
> +				return -1;
> +			}
> +			memmove(lr->buf, lr->buf + lr->next_line, lr->count);
> +			lr->next_line = 0;
> +		}
> +		

Stray tabs here, dropped.


> +		/* Read more data into the end of buffer */
> +		rc = read(lr->fd, lr->buf + lr->next_line + lr->count,
> +			  LINEREAD_BUFFER_SIZE - lr->next_line - lr->count);
> +		if (rc < 0) {
> +			return rc;
> +		} else if (rc == 0) {
> +			eof = true;
> +		} else {
> +			lr->count += rc;
> +		}
> +	}
> +
> +	*line = lr->buf + lr->next_line;
> +	lr->next_line += line_len;
> +	lr->count -= line_len;
> +	return line_len;
> +}
> diff --git a/lineread.h b/lineread.h
> new file mode 100644
> index 0000000..972dc51
> --- /dev/null
> +++ b/lineread.h
> @@ -0,0 +1,23 @@
> +/* SPDX-License-Identifier: AGPL-3.0-or-later
> + * Copyright Red Hat
> + * Author: David Gibson <david(a)gibson.dropbear.id.au>
> + */
> +
> +#ifndef LINEREAD_H
> +#define LINEREAD_H
> +
> +#define LINEREAD_BUFFER_SIZE	8192
> +

I would also stick to a kerneldoc-style comment here:

/**
 * struct lineread - Line reader state
 * @fd:		File descriptor lines are read from
 * @next_line:	...

> +struct lineread {
> +	int fd;
> +	int next_line; /* start of next unread line in buffer */
> +	int count; /* number of unreturned bytes in buffer */
> +
> +	/* One extra byte for possible trailing \0 */
> +	char buf[LINEREAD_BUFFER_SIZE+1];
> +};
> +
> +void lineread_init(struct lineread *lr, int fd);
> +int lineread_get(struct lineread *lr, char **line);
> +
> +#endif /* LINEREAD_H */
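
For reference, a rough sketch of how I would expect a caller to use
these two functions. The implementation is condensed from the patch so
that the example builds on its own, and count_lines() is a made-up
helper for illustration, not something from this series. Note that the
return value of lineread_get() counts the line terminator, so an empty
line yields 1 and only EOF yields 0:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

#define LINEREAD_BUFFER_SIZE	8192

struct lineread {
	int fd;
	int next_line;	/* start of next unread line in buffer */
	int count;	/* number of unreturned bytes in buffer */
	char buf[LINEREAD_BUFFER_SIZE + 1];
};

void lineread_init(struct lineread *lr, int fd)
{
	lr->fd = fd;
	lr->next_line = lr->count = 0;
}

/* Condensed from the patch: length of next line, -1 if none is buffered */
static int peek_line(struct lineread *lr, bool eof)
{
	char *nl;

	assert(lr->next_line + lr->count <= LINEREAD_BUFFER_SIZE);

	nl = memchr(lr->buf + lr->next_line, '\n', lr->count);
	if (nl) {
		*nl = '\0';
		return nl - lr->buf - lr->next_line + 1;
	}
	if (eof) {
		lr->buf[lr->next_line + lr->count] = '\0';
		return lr->count;
	}
	return -1;
}

int lineread_get(struct lineread *lr, char **line)
{
	bool eof = false;
	int line_len;

	while ((line_len = peek_line(lr, eof)) < 0) {
		int rc;

		if (lr->next_line + lr->count == LINEREAD_BUFFER_SIZE) {
			if (lr->next_line == 0)
				return -1;	/* line too long for buffer */
			memmove(lr->buf, lr->buf + lr->next_line, lr->count);
			lr->next_line = 0;
		}

		/* Read more data into the end of buffer */
		rc = read(lr->fd, lr->buf + lr->next_line + lr->count,
			  LINEREAD_BUFFER_SIZE - lr->next_line - lr->count);
		if (rc < 0)
			return rc;
		if (rc == 0)
			eof = true;
		else
			lr->count += rc;
	}

	*line = lr->buf + lr->next_line;
	lr->next_line += line_len;
	lr->count -= line_len;
	return line_len;
}

/* Hypothetical caller, not part of the series: count all lines read
 * from fd, and how many of them are empty. Returns -1 on error.
 */
int count_lines(int fd, int *empty)
{
	static struct lineread lr;	/* static: keep 8 KiB off the stack */
	char *line;
	int n = 0, len;

	*empty = 0;
	lineread_init(&lr, fd);
	while ((len = lineread_get(&lr, &line)) > 0) {
		n++;
		if (line[0] == '\0')
			(*empty)++;
	}

	return len < 0 ? -1 : n;
}
```

Feeding this "a\n\nb" through a pipe should report three lines, one of
them empty, which covers both the empty-line case and the missing
trailing newline from the commit message.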

I can change stuff on merge, let me know.

-- 
Stefano


Thread overview: 7+ messages
2022-06-17  3:10 [PATCH 0/4] New line reading implementation David Gibson
2022-06-17  3:10 ` [PATCH 1/4] Add cleaner line-by-line reading primitives David Gibson
2022-06-23  9:31   ` Stefano Brivio [this message]
2022-06-24  2:12     ` David Gibson
2022-06-17  3:10 ` [PATCH 2/4] Parse resolv.conf with new lineread implementation David Gibson
2022-06-17  3:10 ` [PATCH 3/4] Use new lineread implementation for procfs_scan_listen() David Gibson
2022-06-17  3:10 ` [PATCH 4/4] Remove unused line_read() David Gibson

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt