From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 6 Dec 2023 16:08:08 +0100
From: Stefano Brivio
To: Jon Maloy
Subject: Re: tcp.c: leverage MSG_PEEK with offset kernel capability when available
Message-ID: <20231206160808.3d312733@elisabeth>
In-Reply-To: <20231206155940.51047ac1@elisabeth>
References: <20231205233604.1491317-1-jmaloy@redhat.com> <20231206155940.51047ac1@elisabeth>
Organization: Red Hat
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="MP_/MkkgX64kF8I6QtIMByZDRPp"
X-MailFrom: sbrivio@redhat.com
CC: lvivier@redhat.com, dgibson@redhat.com, passt-dev@passt.top
Precedence: list
List-Id: Development discussion and patches for passt

--MP_/MkkgX64kF8I6QtIMByZDRPp
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

On Wed, 6 Dec 2023 15:59:40 +0100
Stefano Brivio wrote:

> [...]
>
> but on a kernel with your patch, I get ENOTCONN on recvmsg(). If I
> replace that by a simple recv():
>
>   sendto(5, "ab", 2, 0, NULL, 0) = 2
>   recvfrom(6, "ab", 10, 0, NULL, NULL) = 2
>
> ...so I don't think it's a fundamental issue with this approach, rather
> something with your patch, but I'm not yet sure what. :)

Oops, my bad, I got the order of fields in struct msghdr wrong. New
version attached, this one works.

-- 
Stefano

--MP_/MkkgX64kF8I6QtIMByZDRPp
Content-Type: text/x-c++src
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=pkt_selfie.c

#define _GNU_SOURCE
/* The header names did not survive in this archived copy; the set below is
 * an assumption covering what the code uses, not necessarily the original
 * list.
 */
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <net/if.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* ===> from passt's Makefile and code...
 */
#define RLIMIT_STACK_VAL	8192
#define NS_FN_STACK_SIZE	(RLIMIT_STACK_VAL * 1024 / 8)

int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
	     void *arg)
{
#ifdef __ia64__
	return __clone2(fn, stack_area + stack_size / 2, stack_size / 2,
			flags, arg);
#else
	return clone(fn, stack_area + stack_size / 2, flags, arg);
#endif
}

static int nl_sock;

static int nl_sock_init_do(void *arg)
{
	struct sockaddr_nl addr = { .nl_family = AF_NETLINK, };
	int *s = &nl_sock;
#ifdef NETLINK_GET_STRICT_CHK
	int y = 1;
#endif

	*s = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
	if (*s < 0 || bind(*s, (struct sockaddr *)&addr, sizeof(addr))) {
		*s = -1;
		return 0;
	}

	return 0;
}

/**
 * nl_send() - Prepare and send netlink request
 * @s:		Netlink socket
 * @req:	Request (will fill netlink header)
 * @type:	Request type
 * @flags:	Extra request flags (NLM_F_REQUEST and NLM_F_ACK assumed)
 * @len:	Request length
 *
 * Return: sequence number of request on success, terminates on error
 */
static uint32_t nl_send(int s, void *req, uint16_t type, uint16_t flags,
			ssize_t len)
{
	struct nlmsghdr *nh;
	ssize_t n;

	nh = (struct nlmsghdr *)req;
	nh->nlmsg_type = type;
	nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;
	nh->nlmsg_len = len;
	nh->nlmsg_seq = 1;
	nh->nlmsg_pid = 0;

	n = send(s, req, len, 0);

	return nh->nlmsg_seq;
}

int nl_link_up(int s, unsigned int ifi, int mtu)
{
	struct req_t {
		struct nlmsghdr nlh;
		struct ifinfomsg ifm;
		struct rtattr rta;
		unsigned int mtu;
	} req = {
		.ifm.ifi_family	  = AF_UNSPEC,
		.ifm.ifi_index	  = ifi,
		.ifm.ifi_flags	  = IFF_UP,
		.ifm.ifi_change	  = IFF_UP,

		.rta.rta_type	  = IFLA_MTU,
		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
		.mtu		  = mtu,
	};
	ssize_t len = sizeof(req);

	if (!mtu)
		/* Shorten request to drop MTU attribute */
		len = offsetof(struct req_t, rta);

	return nl_send(s, &req, RTM_NEWLINK, 0, len); /* was nl_do() */
}
/* <=== ...until here */

/* Runs in the new namespace: bring up loopback, then create both sockets */
static int tcp_probe_sockets(void *arg)
{
	int *s = (int *)arg;

	nl_sock_init_do(NULL);
	nl_link_up(nl_sock, 1 /* lo */, 0);

	s[0] = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	s[1] = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, IPPROTO_TCP);

	return 0;
}

int main(int argc, char **argv)
{
	char ns_fn_stack[NS_FN_STACK_SIZE], b;
	/* First vector has a NULL base and length 1, second one is a real
	 * one-byte buffer: the probe counts as successful if recvmsg()
	 * returns exactly 1.
	 */
	struct iovec iov[2] = { { NULL, 1 }, { &b, 1 }, };
	struct sockaddr a = { AF_INET, };
	/* Positional order: msg_name, msg_namelen, msg_iov, msg_iovlen */
	struct msghdr msg = { NULL, 0, iov, 2, };
	int s[2], s_nl, s_recv;
	ssize_t len;

	do_clone(tcp_probe_sockets, ns_fn_stack, sizeof(ns_fn_stack),
		 CLONE_NEWNET | CLONE_NEWUSER | CLONE_VM | CLONE_VFORK |
		 CLONE_FILES | SIGCHLD, (void *)s);

	bind(s[0], &a, sizeof(a));
	getsockname(s[0], &a, &((socklen_t){ sizeof(a) }));
	listen(s[0], 0);
	connect(s[1], &a, sizeof(a));
	s_recv = accept(s[0], NULL, NULL);

	send(s[1], (char *)("ab"), 2, 0);
	len = recvmsg(s_recv, &msg, MSG_PEEK);

	printf("MSG_PEEK with offset %ssupported\n", len == 1 ? "" : "not ");

	close(s_recv);
	close(s[1]);
	close(s[0]);

	return 0;
}

--MP_/MkkgX64kF8I6QtIMByZDRPp--
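
The field-order slip mentioned in the message is the kind of mistake that designated initialisers rule out. What follows is a minimal sketch under stated assumptions, not code from passt or from the patch under discussion: peek_two_bytes() and peek_demo() are hypothetical names. The demo runs on an AF_UNIX stream pair, so it only exercises an ordinary MSG_PEEK scatter read, not the NULL-base behaviour the attached probe checks for.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper, not part of passt: peek two bytes from fd into two
 * separate one-byte buffers, leaving them queued on the socket.
 */
ssize_t peek_two_bytes(int fd, char *first, char *second)
{
	struct iovec iov[2] = {
		{ .iov_base = first,  .iov_len = 1 },
		{ .iov_base = second, .iov_len = 1 },
	};
	/* Designated initialisers: correct whatever the member order is,
	 * remaining members (msg_name, msg_control, ...) are zeroed.
	 */
	struct msghdr msg = {
		.msg_iov    = iov,
		.msg_iovlen = 2,
	};

	assert(first && second);

	return recvmsg(fd, &msg, MSG_PEEK);
}

/* Self-check on a local socket pair: returns 0 on success, -1 otherwise */
int peek_demo(void)
{
	char x = 0, y = 0, buf[2] = { 0 };
	int sv[2], ret = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv))
		return -1;

	if (send(sv[0], "ab", 2, 0) == 2 &&
	    peek_two_bytes(sv[1], &x, &y) == 2 && x == 'a' && y == 'b' &&
	    /* Peeked data is still there for a regular read */
	    recv(sv[1], buf, sizeof(buf), 0) == 2 &&
	    buf[0] == 'a' && buf[1] == 'b')
		ret = 0;

	close(sv[0]);
	close(sv[1]);

	return ret;
}
```

Calling peek_demo() from a small main() is enough to try it. The same initialiser style could be applied to the msghdr in pkt_selfie.c without changing what the probe does.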