On Fri, 17 Jun 2022 13:10:23 +1000
David Gibson wrote:

> Two places in passt need to read files line by line (one parsing
> resolv.conf, the other parsing /proc/net/*. They can't use fgets()
> because in glibc that can allocate memory. Instead they use an
> implementation line_read() in util.c. This has some problems:
>
> * It has two completely separate modes of operation, one buffering
>   and one not, the relation between these and how they're activated
>   is subtle and confusing
> * At least in non-buffered mode, it will mishandle an empty line,
>   folding them onto the start of the next non-empty line
> * In non-buffered mode it will use lseek() which prevents using this
>   on non-regular files (we don't need that at present, but it's a
>   surprising limitation)
> * It has a lot of difficult to read pointer mangling
>
> Add a new cleaner implementation of allocation-free line-by-line
> reading in lineread.c. This one already buffers, using a state
> structure to keep track of what we need. This is larger than I'd
> like, but it turns out handling all the edge cases of line-by-line
> reading in C is surprisingly hard.

Still much simpler (albeit a bit more verbose) than the original
version, thanks. :)

> This just adds the code, subsequent patches will change the existing
> users of line_read() to the new implementation.
>
> Signed-off-by: David Gibson
> ---
>  Makefile   |   8 ++--
>  lineread.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  lineread.h |  23 ++++++++++++
>  3 files changed, 135 insertions(+), 4 deletions(-)
>  create mode 100644 lineread.c
>  create mode 100644 lineread.h
>
> diff --git a/Makefile b/Makefile
> index b0dde68..d059efb 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -32,16 +32,16 @@ CFLAGS += -DRLIMIT_STACK_VAL=$(RLIMIT_STACK_VAL)
>  CFLAGS += -DARCH=\"$(TARGET_ARCH)\"
>
>  PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c icmp.c igmp.c \
> -	mld.c ndp.c netlink.c packet.c passt.c pasta.c pcap.c siphash.c \
> -	tap.c tcp.c tcp_splice.c udp.c util.c
> +	lineread.c mld.c ndp.c netlink.c packet.c passt.c pasta.c pcap.c \
> +	siphash.c tap.c tcp.c tcp_splice.c udp.c util.c
>  QRAP_SRCS = qrap.c
>  SRCS = $(PASST_SRCS) $(QRAP_SRCS)
>
>  MANPAGES = passt.1 pasta.1 qrap.1
>
>  PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h icmp.h \
> -	ndp.h netlink.h packet.h passt.h pasta.h pcap.h siphash.h \
> -	tap.h tcp.h tcp_splice.h udp.h util.h
> +	lineread.h ndp.h netlink.h packet.h passt.h pasta.h pcap.h \
> +	siphash.h tap.h tcp.h tcp_splice.h udp.h util.h
>  HEADERS = $(PASST_HEADERS)
>
>  # On gcc 11.2, with -O2 and -flto, tcp_hash() and siphash_20b(), if inlined,
> diff --git a/lineread.c b/lineread.c
> new file mode 100644
> index 0000000..3e263cf
> --- /dev/null
> +++ b/lineread.c
> @@ -0,0 +1,108 @@
> +// SPDX-License-Identifier: AGPL-3.0-or-later
> +
> +/* PASST - Plug A Simple Socket Transport
> + *  for qemu/UNIX domain socket mode
> + *
> + * PASTA - Pack A Subtle Tap Abstraction
> + *  for network namespace/tap device mode
> + *
> + * lineread.c - Allocation free line-by-line buffered file input
> + *
> + * Copyright Red Hat
> + * Author: David Gibson
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "lineread.h"
> +
> +/**
> + * lineread_init() - Prepare for line by line file reading
without allocation
> + * @lr:	Line reader state structure to initialize
> + * @fd:	File handle to read lines from

I think by "handle" most people refer to "FILE" stream handles, I would
drop "handle" here or replace by "descriptor".

> + */
> +void lineread_init(struct lineread *lr, int fd)
> +{
> +	lr->fd = fd;
> +	lr->next_line = lr->count = 0;
> +}
> +
> +static int peek_line(struct lineread *lr, bool eof)

This lacks a description in kerneldoc style, I would add:

/**
 * peek_line() - Find and NULL-terminate next line in buffer
 * @lr:		Line reader state structure
 * @eof:	Caller indicates end-of-file was already found by read()
 *
 * Return: length of line in bytes, -1 if no line was found
 */

By the way, if you're wondering why you're introducing the first usage
of 'bool', I switched to C99 just recently. :)

> +{
> +	char *nl;
> +
> +	/* Sanity checks (which also document invariants) */
> +	assert(lr->count >= 0);
> +	assert(lr->next_line >= 0);
> +	assert(lr->next_line + lr->count >= lr->next_line);
> +	assert(lr->next_line + lr->count <= LINEREAD_BUFFER_SIZE);
> +
> +	nl = memchr(lr->buf + lr->next_line, '\n', lr->count);
> +
> +	if (nl) {
> +		*nl = '\0';
> +		return nl - lr->buf - lr->next_line + 1;

clang-tidy complains about the else-after-return here
(llvm-else-after-return, readability-else-after-return), and at a
second glance me too :) I would drop them if you're fine with it.
> +	} else if (eof) {
> +		lr->buf[lr->next_line + lr->count] = '\0';
> +		/*
> +		 * No trailing newline, so treat all remaining bytes
> +		 * as the last line
> +		 */
> +		return lr->count;
> +	} else {
> +		return -1;
> +	}
> +}
> +
> +/**
> + * lineread_get() - Read a single line from file (no allocation)
> + * @lr:	Line reader state structure
> + * @line:	Place a pointer to the next line in this variable
> + *
> + * Return: Length of line read on success, 0 on EOF, negative on error
> + */
> +int lineread_get(struct lineread *lr, char **line)
> +{
> +	bool eof = false;
> +	int line_len;
> +
> +	while ((line_len = peek_line(lr, eof)) < 0) {
> +		int rc;
> +
> +		if ((lr->next_line + lr->count) == LINEREAD_BUFFER_SIZE) {
> +			/* No space at end */
> +			if (lr->next_line == 0) {
> +				/*

Nit: elsewhere, I used "net" kernel style comments, which are the same
as every other area of the Linux kernel except that there's no opening
newline:

				/* Buffer is full, which means we've

I would change it here and in the comment above.

> +				 * Buffer is full, which means we've
> +				 * hit a line too long for us to
> +				 * process. FIXME: report error
> +				 * better
> +				 */
> +				return -1;
> +			}
> +			memmove(lr->buf, lr->buf + lr->next_line, lr->count);
> +			lr->next_line = 0;
> +		}
> +

Stray tabs here, dropped.
> +		/* Read more data into the end of buffer */
> +		rc = read(lr->fd, lr->buf + lr->next_line + lr->count,
> +			  LINEREAD_BUFFER_SIZE - lr->next_line - lr->count);
> +		if (rc < 0) {
> +			return rc;
> +		} else if (rc == 0) {
> +			eof = true;
> +		} else {
> +			lr->count += rc;
> +		}
> +	}
> +
> +	*line = lr->buf + lr->next_line;
> +	lr->next_line += line_len;
> +	lr->count -= line_len;
> +	return line_len;
> +}
> diff --git a/lineread.h b/lineread.h
> new file mode 100644
> index 0000000..972dc51
> --- /dev/null
> +++ b/lineread.h
> @@ -0,0 +1,23 @@
> +/* SPDX-License-Identifier: AGPL-3.0-or-later
> + * Copyright Red Hat
> + * Author: David Gibson
> + */
> +
> +#ifndef LINEREAD_H
> +#define LINEREAD_H
> +
> +#define LINEREAD_BUFFER_SIZE	8192
> +

I would also stick to kerneldoc style comment here:

/**
 * struct lineread - Line reader state
 * @fd:		File descriptor lines are read from
 * @next_line:	...

> +struct lineread {
> +	int fd;
> +	int next_line; /* start of next unread line in buffer */
> +	int count; /* number of unreturned bytes in buffer */
> +
> +	/* One extra byte for possible trailing \0 */
> +	char buf[LINEREAD_BUFFER_SIZE+1];
> +};
> +
> +void lineread_init(struct lineread *lr, int fd);
> +int lineread_get(struct lineread *lr, char **line);
> +
> +#endif /* _LINEREAD_H */

I can change stuff on merge, let me know.

-- 
Stefano
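
For reference, a minimal sketch of how peek_line() might read with the
else-after-return branches dropped, the kerneldoc comment suggested
earlier in this review added, and the "net" comment style applied. The
struct and LINEREAD_BUFFER_SIZE are copied from the quoted lineread.h,
and the standard headers are my own pick, only so that the snippet
compiles on its own. This illustrates the review comments; it is not
necessarily what ends up being merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LINEREAD_BUFFER_SIZE	8192

/* Copied from the quoted lineread.h so that this sketch is self-contained */
struct lineread {
	int fd;
	int next_line;	/* start of next unread line in buffer */
	int count;	/* number of unreturned bytes in buffer */

	/* One extra byte for possible trailing \0 */
	char buf[LINEREAD_BUFFER_SIZE + 1];
};

/**
 * peek_line() - Find and NULL-terminate next line in buffer
 * @lr:		Line reader state structure
 * @eof:	Caller indicates end-of-file was already found by read()
 *
 * Return: length of line in bytes, -1 if no line was found
 */
static int peek_line(struct lineread *lr, bool eof)
{
	char *nl;

	/* Sanity checks (which also document invariants) */
	assert(lr->count >= 0);
	assert(lr->next_line >= 0);
	assert(lr->next_line + lr->count >= lr->next_line);
	assert(lr->next_line + lr->count <= LINEREAD_BUFFER_SIZE);

	nl = memchr(lr->buf + lr->next_line, '\n', lr->count);
	if (nl) {
		/* Replace the newline, count it in the returned length */
		*nl = '\0';
		return nl - lr->buf - lr->next_line + 1;
	}

	if (eof) {
		/* No trailing newline, so treat all remaining bytes
		 * as the last line
		 */
		lr->buf[lr->next_line + lr->count] = '\0';
		return lr->count;
	}

	/* No complete line buffered yet, caller needs to read more */
	return -1;
}
```

The behaviour is meant to be the same as in the quoted patch: each
branch already returns, so removing the "else" keywords only flattens
the control flow.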