kernel-hacking-2024-linux-steffo

unimore/kernel-hacking-2024-linux-steffo

Author	SHA1	Message	Date
Yan Zhen	aeddf8e6c5	sunrpc: xprtrdma: Use ERR_CAST() to return Using ERR_CAST() is more reasonable and safer, When it is necessary to convert the type of an error pointer and return it. Signed-off-by: Yan Zhen <yanzhen@vivo.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Thorsten Blum	2869b3a00e	NFSD: Annotate struct pnfs_block_deviceaddr with __counted_by() Add the __counted_by compiler attribute to the flexible array member volumes to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Use struct_size() instead of manually calculating the number of bytes to allocate for a pnfs_block_deviceaddr with a single volume. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Guoqing Jiang	d078cbf5c3	nfsd: call cache_put if xdr_reserve_space returns NULL If not enough buffer space available, but idmap_lookup has triggered lookup_fn which calls cache_get and returns successfully. Then we missed to call cache_put here which pairs with cache_get. Fixes: `ddd1ea5636` ("nfsd4: use xdr_reserve_space in attribute encoding") Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviwed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Jeff Layton	ba017fd391	nfsd: add more nfsd_cb tracepoints Add some tracepoints in the callback client RPC operations. Also add a tracepoint to nfsd4_cb_getattr_done. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Jeff Layton	c1c9f3ea74	nfsd: track the main opcode for callbacks Keep track of the "main" opcode for the callback, and display it in the tracepoint. This makes it simpler to discern what's happening when there is more than one callback in flight. The one special case is the CB_NULL RPC. That's not a CB_COMPOUND opcode, so designate the value 0 for that. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Jeff Layton	e8581a9124	nfsd: add more info to WARN_ON_ONCE on failed callbacks Currently, you get the warning and stack trace, but nothing is printed about the relevant error codes. Add that in. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Li Lingfeng	76a3f3f164	nfsd: fix some spelling errors in comments Fix spelling errors in comments of nfsd4_release_lockowner and nfs4_set_delegation. Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Li Lingfeng	eb059a413c	nfsd: remove unused parameter of nfsd_file_mark_find_or_create Commit `427f5f83a3` ("NFSD: Ensure nf_inode is never dereferenced") passes inode directly to nfsd_file_mark_find_or_create instead of getting it from nf, so there is no need to pass nf. Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Hongbo Li	c2feb7ee39	nfsd: use LIST_HEAD() to simplify code list_head can be initialized automatically with LIST_HEAD() instead of calling INIT_LIST_HEAD(). Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Li Lingfeng	340e61e44c	nfsd: map the EBADMSG to nfserr_io to avoid warning Ext4 will throw -EBADMSG through ext4_readdir when a checksum error occurs, resulting in the following WARNING. Fix it by mapping EBADMSG to nfserr_io. nfsd_buffered_readdir iterate_dir // -EBADMSG -74 ext4_readdir // .iterate_shared ext4_dx_readdir ext4_htree_fill_tree htree_dirblock_to_tree ext4_read_dirblock __ext4_read_dirblock ext4_dirblock_csum_verify warn_no_space_for_csum __warn_no_space_for_csum return ERR_PTR(-EFSBADCRC) // -EBADMSG -74 nfserrno // WARNING [ 161.115610] ------------[ cut here ]------------ [ 161.116465] nfsd: non-standard errno: -74 [ 161.117315] WARNING: CPU: 1 PID: 780 at fs/nfsd/nfsproc.c:878 nfserrno+0x9d/0xd0 [ 161.118596] Modules linked in: [ 161.119243] CPU: 1 PID: 780 Comm: nfsd Not tainted 5.10.0-00014-g79679361fd5d #138 [ 161.120684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qe mu.org 04/01/2014 [ 161.123601] RIP: 0010:nfserrno+0x9d/0xd0 [ 161.124676] Code: 0f 87 da 30 dd 00 83 e3 01 b8 00 00 00 05 75 d7 44 89 ee 48 c7 c7 c0 57 24 98 89 44 24 04 c6 05 ce 2b 61 03 01 e8 99 20 d8 00 <0f> 0b 8b 44 24 04 eb b5 4c 89 e6 48 c7 c7 a0 6d a4 99 e8 cc 15 33 [ 161.127797] RSP: 0018:ffffc90000e2f9c0 EFLAGS: 00010286 [ 161.128794] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 161.130089] RDX: 1ffff1103ee16f6d RSI: 0000000000000008 RDI: fffff520001c5f2a [ 161.131379] RBP: 0000000000000022 R08: 0000000000000001 R09: ffff8881f70c1827 [ 161.132664] R10: ffffed103ee18304 R11: 0000000000000001 R12: 0000000000000021 [ 161.133949] R13: 00000000ffffffb6 R14: ffff8881317c0000 R15: ffffc90000e2fbd8 [ 161.135244] FS: 0000000000000000(0000) GS:ffff8881f7080000(0000) knlGS:0000000000000000 [ 161.136695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 161.137761] CR2: 00007fcaad70b348 CR3: 0000000144256006 CR4: 0000000000770ee0 [ 161.139041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 161.140291] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 161.141519] PKRU: 55555554 [ 161.142076] Call Trace: [ 161.142575] ? __warn+0x9b/0x140 [ 161.143229] ? nfserrno+0x9d/0xd0 [ 161.143872] ? report_bug+0x125/0x150 [ 161.144595] ? handle_bug+0x41/0x90 [ 161.145284] ? exc_invalid_op+0x14/0x70 [ 161.146009] ? asm_exc_invalid_op+0x12/0x20 [ 161.146816] ? nfserrno+0x9d/0xd0 [ 161.147487] nfsd_buffered_readdir+0x28b/0x2b0 [ 161.148333] ? nfsd4_encode_dirent_fattr+0x380/0x380 [ 161.149258] ? nfsd_buffered_filldir+0xf0/0xf0 [ 161.150093] ? wait_for_concurrent_writes+0x170/0x170 [ 161.151004] ? generic_file_llseek_size+0x48/0x160 [ 161.151895] nfsd_readdir+0x132/0x190 [ 161.152606] ? nfsd4_encode_dirent_fattr+0x380/0x380 [ 161.153516] ? nfsd_unlink+0x380/0x380 [ 161.154256] ? override_creds+0x45/0x60 [ 161.155006] nfsd4_encode_readdir+0x21a/0x3d0 [ 161.155850] ? nfsd4_encode_readlink+0x210/0x210 [ 161.156731] ? write_bytes_to_xdr_buf+0x97/0xe0 [ 161.157598] ? __write_bytes_to_xdr_buf+0xd0/0xd0 [ 161.158494] ? lock_downgrade+0x90/0x90 [ 161.159232] ? nfs4svc_decode_voidarg+0x10/0x10 [ 161.160092] nfsd4_encode_operation+0x15a/0x440 [ 161.160959] nfsd4_proc_compound+0x718/0xe90 [ 161.161818] nfsd_dispatch+0x18e/0x2c0 [ 161.162586] svc_process_common+0x786/0xc50 [ 161.163403] ? nfsd_svc+0x380/0x380 [ 161.164137] ? svc_printk+0x160/0x160 [ 161.164846] ? svc_xprt_do_enqueue.part.0+0x365/0x380 [ 161.165808] ? nfsd_svc+0x380/0x380 [ 161.166523] ? rcu_is_watching+0x23/0x40 [ 161.167309] svc_process+0x1a5/0x200 [ 161.168019] nfsd+0x1f5/0x380 [ 161.168663] ? nfsd_shutdown_threads+0x260/0x260 [ 161.169554] kthread+0x1c4/0x210 [ 161.170224] ? kthread_insert_work_sanity_check+0x80/0x80 [ 161.171246] ret_from_fork+0x1f/0x30 Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Li Lingfeng	2039c5da5d	NFSD: remove redundant assignment operation Commit `5826e09bf3` ("NFSD: OP_CB_RECALL_ANY should recall both read and write delegations") added a new assignment statement to add RCA4_TYPE_MASK_WDATA_DLG to ra_bmval bitmask of OP_CB_RECALL_ANY. So the old one should be removed. Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Chuck Lever	ecbf849405	.mailmap: Add an entry for my work email address Collect a few very old previous employers as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Chuck Lever	202f39039a	NFSD: Fix NFSv4's PUTPUBFH operation According to RFC 8881, all minor versions of NFSv4 support PUTPUBFH. Replace the XDR decoder for PUTPUBFH with a "noop" since we no longer want the minorversion check, and PUTPUBFH has no arguments to decode. (Ideally nfsd4_decode_noop should really be called nfsd4_decode_void). PUTPUBFH should now behave just like PUTROOTFH. Reported-by: Cedric Blancher <cedric.blancher@gmail.com> Fixes: `e1a90ebd8b` ("NFSD: Combine decode operations for v4 and v4.1") Cc: Dan Shelton <dan.f.shelton@gmail.com> Cc: Roland Mainz <roland.mainz@nrubsig.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Mark Grimes	32b34fa485	nfsd: Add quotes to client info 'callback address' The 'callback address' in client_info_show is output without quotes causing yaml parsers to fail on processing IPv6 addresses. Adding quotes to 'callback address' also matches that used by the 'address' field. Signed-off-by: Mark Grimes <mark.grimes@ixsystems.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Chuck Lever	c4de97f7c4	svcrdma: Handle device removal outside of the CM event handler Synchronously wait for all disconnects to complete to ensure the transports have divested all hardware resources before the underlying RDMA device can safely be removed. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
NeilBrown	438f81e0e9	nfsd: move error choice for incorrect object types to version-specific code. If an NFS operation expects a particular sort of object (file, dir, link, etc) but gets a file handle for a different sort of object, it must return an error. The actual error varies among NFS versions in non-trivial ways. For v2 and v3 there are ISDIR and NOTDIR errors and, for NFSv4 only, INVAL is suitable. For v4.0 there is also NFS4ERR_SYMLINK which should be used if a SYMLINK was found when not expected. This take precedence over NOTDIR. For v4.1+ there is also NFS4ERR_WRONG_TYPE which should be used in preference to EINVAL when none of the specific error codes apply. When nfsd_mode_check() finds a symlink where it expected a directory it needs to return an error code that can be converted to NOTDIR for v2 or v3 but will be SYMLINK for v4. It must be different from the error code returns when it finds a symlink but expects a regular file - that must be converted to EINVAL or SYMLINK. So we introduce an internal error code nfserr_symlink_not_dir which each version converts as appropriate. nfsd_check_obj_isreg() is similar to nfsd_mode_check() except that it is only used by NFSv4 and only for OPEN. NFSERR_INVAL is never a suitable error if the object is the wrong time. For v4.0 we use nfserr_symlink for non-dirs even if not a symlink. For v4.1 we have nfserr_wrong_type. We handle this difference in-place in nfsd_check_obj_isreg() as there is nothing to be gained by delaying the choice to nfsd4_map_status(). As a result of these changes, nfsd_mode_check() doesn't need an rqstp arg any more. Note that NFSv4 operations are actually performed in the xdr code(!!!) so to the only place that we can map the status code successfully is in nfsd4_encode_operation(). Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
NeilBrown	36ffa3d0de	nfsd: be more systematic about selecting error codes for internal use. Rather than using ad hoc values for internal errors (30000, 11000, ...) use 'enum' to sequentially allocate numbers starting from the first known available number - now visible as NFS4ERR_FIRST_FREE. The goal is values that are distinct from all be32 error codes. To get those we must first select integers that are not already used, then convert them with cpu_to_be32(). Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
NeilBrown	1459ad5767	nfsd: Move error code mapping to per-version proc code. There is code scattered around nfsd which chooses an error status based on the particular version of nfs being used. It is cleaner to have the version specific choices in version specific code. With this patch common code returns the most specific error code possible and the version specific code maps that if necessary. Both v2 (nfsproc.c) and v3 (nfs3proc.c) now have a "map_status()" function which is called to map the resp->status before each non-trivial nfsd_proc_* or nfsd3_proc_* function returns. NFS4ERR_SYMLINK and NFS4ERR_WRONG_TYPE introduce extra complications and are left for a later patch. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
NeilBrown	ef7f6c4904	nfsd: move V4ROOT version check to nfsd_set_fh_dentry() This further centralizes version number checks. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
NeilBrown	c689bdd3bf	nfsd: further centralize protocol version checks. With this patch the only places that test ->rq_vers against a specific version are nfsd_v4client() and nfsd_set_fh_dentry(). The latter sets some flags in the svc_fh, which now includes: fh_64bit_cookies fh_use_wgather Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
NeilBrown	4f67d24f72	nfsd: use nfsd_v4client() in nfsd_breaker_owns_lease() nfsd_breaker_owns_lease() currently open-codes the same test that nfsd_v4client() performs. With this patch we use nfsd_v4client() instead. Also as i_am_nfsd() is only used in combination with kthread_data(), replace it with nfsd_current_rqst() which combines the two and returns a valid svc_rqst, or NULL. The test for NULL is moved into nfsd_v4client() for code clarity. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
NeilBrown	9fd45c16f3	nfsd: Pass 'cred' instead of 'rqstp' to some functions. nfsd_permission(), exp_rdonly(), nfsd_setuser(), and nfsexp_flags() only ever need the cred out of rqstp, so pass it explicitly instead of the whole rqstp. This makes the interfaces cleaner. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
NeilBrown	c55aeef776	nfsd: Don't pass all of rqst into rqst_exp_find() Rather than passing the whole rqst, pass the pieces that are actually needed. This makes the inputs to rqst_exp_find() more obvious. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Sagi Grimberg	11673b2a91	nfsd: don't assume copy notify when preprocessing the stateid Move the stateid handling to nfsd4_copy_notify. If nfs4_preprocess_stateid_op did not produce an output stateid, error out. Copy notify specifically does not permit the use of special stateids, so enforce that outside generic stateid pre-processing. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
NeilBrown	3391fc92db	sunrpc: allow svc threads to fail initialisation cleanly If an svc thread needs to perform some initialisation that might fail, it has no good way to handle the failure. Before the thread can exit it must call svc_exit_thread(), but that requires the service mutex to be held. The thread cannot simply take the mutex as that could deadlock if there is a concurrent attempt to shut down all threads (which is unlikely, but not impossible). nfsd currently call svc_exit_thread() unprotected in the unlikely event that unshare_fs_struct() fails. We can clean this up by introducing svc_thread_init_status() by which an svc thread can report whether initialisation has succeeded. If it has, it continues normally into the action loop. If it has not, svc_thread_init_status() immediately aborts the thread. svc_start_kthread() waits for either of these to happen, and calls svc_exit_thread() (under the mutex) if the thread aborted. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
NeilBrown	59f3b13816	sunrpc: merge svc_rqst_alloc() into svc_prepare_thread() The only caller of svc_rqst_alloc() is svc_prepare_thread(). So merge the one into the other and simplify. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
NeilBrown	9dcbc4e070	sunrpc: don't take ->sv_lock when updating ->sv_nrthreads. As documented in svc_xprt.c, sv_nrthreads is protected by the service mutex, and it does not need ->sv_lock. (->sv_lock is needed only for sv_permsocks, sv_tempsocks, and sv_tmpcnt). So remove the unnecessary locking. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
NeilBrown	60749cbe3d	sunrpc: change sp_nrthreads from atomic_t to unsigned int. sp_nrthreads is only ever accessed under the service mutex nlmsvc_mutex nfs_callback_mutex nfsd_mutex so these is no need for it to be an atomic_t. The fact that all code using it is single-threaded means that we can simplify svc_pool_victim and remove the temporary elevation of sp_nrthreads. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
NeilBrown	16ef80eedc	sunrpc: document locking rules for svc_exit_thread() The locking required for svc_exit_thread() is not obvious, so document it in a kdoc comment. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
NeilBrown	73598a0cfb	nfsd: don't allocate the versions array. Instead of using kmalloc to allocate an array for storing active version info, just declare an array to the max size - it is only 5 or so. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:29:23 -04:00
NeilBrown	c9f10f811c	nfsd: move nfsd_pool_stats_open into nfsctl.c nfsd_pool_stats_open() is used in nfsctl.c, so move it there. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-01 10:04:57 -04:00
NeilBrown	f2b27e1d72	SUNRPC: make various functions static, or not exported. Various functions are only used within the sunrpc module, and several are only use in the one file. So clean up: These are marked static, and any EXPORT is removed. svc_rcpb_setup() svc_rqst_alloc() svc_rqst_free() - also moved before first use svc_rpcbind_set_version() svc_drop() - also moved to svc.c These are now not EXPORTed, but are not static. svc_authenticate() svc_sock_update_bufs() Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-01 10:04:56 -04:00
NeilBrown	4ed9ef3260	lockd: discard nlmsvc_timeout nlmsvc_timeout always has the same value as (nlm_timeout * HZ), so use that in the one place that nlmsvc_timeout is used. In truth it might not always be the same as nlmsvc_timeout is only set when lockd is started while nlm_timeout can be set at anytime via sysctl. I think this difference it not helpful so removing it is good. Also remove the test for nlm_timout being 0. This is not possible - unless a module parameter is used to set the minimum timeout to 0, and if that happens then it probably should be honoured. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-01 10:04:56 -04:00
NeilBrown	8203ab8a9d	nfsd: don't EXPORT_SYMBOL nfsd4_ssc_init_umount_work() nfsd4_ssc_init_umount_work() is only used in the nfsd module, so there is no need to EXPORT it. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-01 10:04:56 -04:00
Chen Hanxiao	cef48236df	NFS: trace: show TIMEDOUT instead of 0x6e __nfs_revalidate_inode may return ETIMEDOUT. print symbol of ETIMEDOUT in nfs trace: before: cat-5191 [005] 119.331127: nfs_revalidate_inode_exit: error=-110 (0x6e) after: cat-1738 [004] 44.365509: nfs_revalidate_inode_exit: error=-110 (TIMEDOUT) Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-01 10:04:55 -04:00
Youzhong Yang	4b84551a35	nfsd: use system_unbound_wq for nfsd_file_gc_worker() After many rounds of changes in filecache.c, the fix by commit ce7df055(NFSD: Make the file_delayed_close workqueue UNBOUND) is gone, now we are getting syslog messages like these: [ 1618.186688] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 4 times, consider switching to WQ_UNBOUND [ 1638.661616] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 8 times, consider switching to WQ_UNBOUND [ 1665.284542] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 16 times, consider switching to WQ_UNBOUND [ 1759.491342] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 32 times, consider switching to WQ_UNBOUND [ 3013.012308] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 64 times, consider switching to WQ_UNBOUND [ 3154.172827] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 128 times, consider switching to WQ_UNBOUND [ 3422.461924] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 256 times, consider switching to WQ_UNBOUND [ 3963.152054] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 512 times, consider switching to WQ_UNBOUND Consider use system_unbound_wq instead of system_wq for nfsd_file_gc_worker(). Signed-off-by: Youzhong Yang <youzhong@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-01 10:04:55 -04:00
Jeff Layton	700bb4ff91	nfsd: count nfsd_file allocations We already count the frees (via nfsd_file_releases). Count the allocations as well. Also switch the direct call to nfsd_file_slab_free in nfsd_file_do_acquire to nfsd_file_free, so that the allocs and releases match up. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-01 10:04:54 -04:00
Jeff Layton	8a79261763	nfsd: fix refcount leak when file is unhashed after being found If we wait_for_construction and find that the file is no longer hashed, and we're going to retry the open, the old nfsd_file reference is currently leaked. Put the reference before retrying. Fixes: `c6593366c0` ("nfsd: don't kill nfsd_files because of lease break error") Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Youzhong Yang <youzhong@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-01 10:04:54 -04:00
Jeff Layton	81a95c2b1d	nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire Given that we do the search and insertion while holding the i_lock, I don't think it's possible for us to get EEXIST here. Remove this case. Fixes: `c6593366c0` ("nfsd: don't kill nfsd_files because of lease break error") Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Youzhong Yang <youzhong@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-01 10:04:54 -04:00
Youzhong Yang	8e6e2ffa65	nfsd: add list_head nf_gc to struct nfsd_file nfsd_file_put() in one thread can race with another thread doing garbage collection (running nfsd_file_gc() -> list_lru_walk() -> nfsd_file_lru_cb()): * In nfsd_file_put(), nf->nf_ref is 1, so it tries to do nfsd_file_lru_add(). * nfsd_file_lru_add() returns true (with NFSD_FILE_REFERENCED bit set) * garbage collector kicks in, nfsd_file_lru_cb() clears REFERENCED bit and returns LRU_ROTATE. * garbage collector kicks in again, nfsd_file_lru_cb() now decrements nf->nf_ref to 0, runs nfsd_file_unhash(), removes it from the LRU and adds to the dispose list [list_lru_isolate_move(lru, &nf->nf_lru, head)] * nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so it tries to remove the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))]. The 'nf' has been added to the 'dispose' list by nfsd_file_lru_cb(), so nfsd_file_lru_remove(nf) simply treats it as part of the LRU and removes it, which leads to its removal from the 'dispose' list. * At this moment, 'nf' is unhashed with its nf_ref being 0, and not on the LRU. nfsd_file_put() continues its execution [if (refcount_dec_and_test(&nf->nf_ref))], as nf->nf_ref is already 0, nf->nf_ref is set to REFCOUNT_SATURATED, and the 'nf' gets no chance of being freed. nfsd_file_put() can also race with nfsd_file_cond_queue(): * In nfsd_file_put(), nf->nf_ref is 1, so it tries to do nfsd_file_lru_add(). * nfsd_file_lru_add() sets REFERENCED bit and returns true. * Some userland application runs 'exportfs -f' or something like that, which triggers __nfsd_file_cache_purge() -> nfsd_file_cond_queue(). * In nfsd_file_cond_queue(), it runs [if (!nfsd_file_unhash(nf))], unhash is done successfully. * nfsd_file_cond_queue() runs [if (!nfsd_file_get(nf))], now nf->nf_ref goes to 2. * nfsd_file_cond_queue() runs [if (nfsd_file_lru_remove(nf))], it succeeds. * nfsd_file_cond_queue() runs [if (refcount_sub_and_test(decrement, &nf->nf_ref))] (with "decrement" being 2), so the nf->nf_ref goes to 0, the 'nf' is added to the dispose list [list_add(&nf->nf_lru, dispose)] * nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so it tries to remove the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))], although the 'nf' is not in the LRU, but it is linked in the 'dispose' list, nfsd_file_lru_remove() simply treats it as part of the LRU and removes it. This leads to its removal from the 'dispose' list! * Now nf->ref is 0, unhashed. nfsd_file_put() continues its execution and set nf->nf_ref to REFCOUNT_SATURATED. As shown in the above analysis, using nf_lru for both the LRU list and dispose list can cause the leaks. This patch adds a new list_head nf_gc in struct nfsd_file, and uses it for the dispose list. This does not fix the nfsd_file leaking issue completely. Signed-off-by: Youzhong Yang <youzhong@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-01 10:04:53 -04:00
Linus Torvalds	431c1646e1	Linux 6.11-rc6	2024-09-01 19:46:02 +12:00
Linus Torvalds	6b9ffc4595	four cifs.ko client fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmbTuKMACgkQiiy9cAdy T1GsHwwAnrVfxJ+ZiAH0wbfyFcgRLOAePeADcedn4QWQaPbmyjqqQbHfiwRwDa5X sICpnxCS+3MM9aahA7G4FOZNle/DexmFUODScESmYMfdqt4hMGzGbi9KhA4l7TY8 rcewHNpbAiPW3S0y/VtOBoXXskURMEL6+KCaBwE3u990jimJtCxPie4PQbfI/V6O 4Qjqc8qjryPo70ru4g72h/LfJdaDKxV/JYymDyhhu5/Gf7PPbv0QKZ9hhxhpc6Y4 81IcJ7S4JnLA8V9nrglrbV3ymvOCXNH0UQRHOa4Hc6H7MmrVj1aE5nu0/nfgVaOh iaaKfuuv6ItDQBWqUg6tHqM8DSPONJkbhuFkXqL/rOmrl7B0G5T1UBlt3ZqNZEy5 bEX1VCqCDQRsr1nUCxC7t5r03teXeNq59nWg/JWBBbLohWLp4Dw4eKW0xlKyo3VT Oxho3E8DnVXRu8MdTF/OeFJllp71KY3ujt2wm8uu+f5H45vz9mBN0UEUAx6hoh3c SsxufLuG =l4NV -----END PGP SIGNATURE----- Merge tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - copy_file_range fix - two read fixes including read past end of file rc fix and read retry crediting fix - falloc zero range fix * tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region cifs: Fix copy offload to flush destination region netfs, cifs: Fix handling of short DIO read cifs: Fix lack of credit renegotiation on read retry	2024-09-01 15:49:26 +12:00
Linus Torvalds	a4c763129f	bcachefs fixes for 6.11-rc6 - Fix a rare data corruption in the rebalance path, caught as a nonce inconsistency on encrypted filesystems - Revert lockless buffered write path - Mark more errors as autofix -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmbT0+8ACgkQE6szbY3K bnafeg/+KQroY9Ig1Rn9qSnVKZOjkyDeqRq8sgvfOI5exDyuqcTgM69HU6HJbzzk wCFwVNoscx0PMMrHMLtnVKohevGnATHXqCMz0tZ1YIslFlPsHlQToYfDmae3keZQ ZX6crRCxIGxXUfx5VVf8tPn02ZFEqTkilHoZteCzp24w5d6dpjtlJwYzCJ5k+gTK 1lDcQp9IerwbbbFAvg0yu3BObTG6t2aHvtE0rHJ8gzlsVeDvxhnYRPRi4QJ5lar+ Zwpcp48559j4dl3lYh6y7rU4UfHEecxSu0blKF79D8h0u4dxzu0szyDZiZluVK84 uEI4/hNVDmL6W75mRbkjzzbwJqBdgIB35FomaziJ7Z2VFlaZf5YPWWRQE28NcMD6 nKGMtEc/ryFQKffqTHupAtp9cTZBXEQE9mZGcqWLX8mr7ClVztJLmJUCvicwAwBC sUKzhWiD6HgpAJYsDvukHNJEUGN/NBa4lp3x2lUu13n0zHRZkqY0+3b9EkDrO1KE 24ueRbD3l6g1SIRZmvCjiFCSSlOm5wpqzEYKrQndAyU3fXai/mCCncFT/fqs2zJs nH7TCR9pGvW3ln0GuyZyc8+lgcdZegPalAWLHtpNzy9xQWxbn19O4mCmRGhWCbKF irtL7Pn3+EKuUnhagIOp/ImDIH9po9yX9h5PmVndeJ9Dl6YhOF0= =LTM8 -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs Push bcachefs fixes from Kent Overstreet: "The data corruption in the buffered write path is troubling; inode lock should not have been able to cause that... - Fix a rare data corruption in the rebalance path, caught as a nonce inconsistency on encrypted filesystems - Revert lockless buffered write path - Mark more errors as autofix" * tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs: bcachefs: Mark more errors as autofix bcachefs: Revert lockless buffered IO path bcachefs: Fix bch2_extents_match() false positive bcachefs: Fix failure to return error in data_update_index_update()	2024-09-01 15:23:20 +12:00
Kent Overstreet	3d3020c461	bcachefs: Mark more errors as autofix errors that are known to always be safe to fix should be autofix: this should be most errors even at this point, but that will need some thorough review. note that errors are still logged in the superblock, so we'll still know that they happened. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-31 19:27:01 -04:00
Kent Overstreet	e3e6940940	bcachefs: Revert lockless buffered IO path We had a report of data corruption on nixos when building installer images. https://github.com/NixOS/nixpkgs/pull/321055#issuecomment-2184131334 It seems that writes are being dropped, but only when issued by QEMU, and possibly only in snapshot mode. It's undetermined if it's write calls are being dropped or dirty folios. Further testing, via minimizing the original patch to just the change that skips the inode lock on non appends/truncates, reveals that it really is just not taking the inode lock that causes the corruption: it has nothing to do with the other logic changes for preserving write atomicity in corner cases. It's also kernel config dependent: it doesn't reproduce with the minimal kernel config that ktest uses, but it does reproduce with nixos's distro config. Bisection the kernel config initially pointer the finger at page migration or compaction, but it appears that was erroneous; we haven't yet determined what kernel config option actually triggers it. Sadly it appears this will have to be reverted since we're getting too close to release and my plate is full, but we'd _really_ like to fully debug it. My suspicion is that this patch is exposing a preexisting bug - the inode lock actually covers very little in IO paths, and we have a different lock (the pagecache add lock) that guards against races with truncate here. Fixes: `7e64c86cdc` ("bcachefs: Buffered write path now can avoid the inode lock") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-31 19:26:08 -04:00
Linus Torvalds	6cd90e5ea7	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull misc fixes from Guenter Roeck. These are fixes for regressions that Guenther has been reporting, and the maintainers haven't picked up and sent in. With rc6 fairly imminent, I'm taking them directly from Guenter. * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: apparmor: fix policy_unpack_test on big endian systems Revert "MIPS: csrc-r4k: Apply verification clocksource flags" microblaze: don't treat zero reserved memory regions as error	2024-09-01 09:18:48 +12:00
Linus Torvalds	8463be8448	power sequencing fixes for v6.11-rc6 - set the direction of the wlan-enable GPIO to output after requesting it as-is -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmbTcCkACgkQEacuoBRx 13K6eA//W5/lgfuwUPaO8mwHFiMWLaWTg+DGNDf3TUde4PdtDdfxDiUsvhU7zLSA n4624Ykic3uKTZm98jDbUrDr68BJDBf1ilQT9EZ6Z6R0P1B+yMiS9v7kte2j6ZVq JjeSojG35tiexOsS+UehngzK4IFkyTKwyu3Q+AYGgN6gUO5uzagCqTitO8tbfAZn nI17GOiPrYDdRTPD/uU+ine1tAYSl6KE7tX3fHBRGCNVWkUlJCNHIKBYavp/RDey SgNjC97RMP2ZW1w3jhq3WgjDvhmLTHF0CprWxTL6sLuO5/t+jXMyzJyKrRF8Y7vo pUV6R6yqofx+Ops3FMxxsiEWa+FSTvU5ukkPg9tJ33TPU8Tu6ZOcoKm/u23lBP8w 4UQzTQh29Fy3vjfbbJa4VPkpcCjI0Gb1LTijKyAo09O2kDpZqRFQr/3gYO86UmpG vpw+Cxzm5L7yxxyB48zBjNXxrUljh0uyMe0mNq/NbiQk+jA37DYjmFaPu3JLsFpi xFqNH3IAnkGsGLXrfn9+yevJcwZ1b/LVQnIYcHG2LgWVbNgHsqAnKn8CqIiRLwbm m8BdIgtz/jCxuM+j0XBcSd6nD2KBf1PxfIUuMpn9dEzmZi8PfeRLagF/EJQTrD1P ycYNFp/n1lhG4ptc1Xe4GmkXuK9R6YTqEJZCwCjoAKkVSqnNdqY= =VM4v -----END PGP SIGNATURE----- Merge tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull power sequencing fix from Bartosz Golaszewski: "A follow-up fix for the power sequencing subsystem. It turned out the previous fix for this driver was incomplete and broke the WLAN support on some platforms. This addresses the issue. - set the direction of the wlan-enable GPIO to output after requesting it as-is" * tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: power: sequencing: qcom-wcn: set the wlan-enable GPIO to output	2024-09-01 09:07:44 +12:00
Bartosz Golaszewski	d8b762070c	power: sequencing: qcom-wcn: set the wlan-enable GPIO to output Commit `a9aaf1ff88` ("power: sequencing: request the WLAN enable GPIO as-is") broke WLAN on boards on which the wlan-enable GPIO enabling the wifi module isn't in output mode by default. We need to set direction to output while retaining the value that was already set to keep the ath module on if it's already started. Fixes: `a9aaf1ff88` ("power: sequencing: request the WLAN enable GPIO as-is") Link: https://lore.kernel.org/r/20240823115500.37280-1-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>	2024-08-31 21:32:19 +02:00
Linus Torvalds	e8784b0aef	USB fixes for 6.11-rc6 Here are some small USB fixes for 6.11-rc6. Included in here are: - dwc3 driver fixes for reported issues - MAINTAINER file update, marking a driver as unsupported :( - cdnsp driver fixes - USB gadget driver fix - USB sysfs fix - other tiny fixes - new device ids for usb serial driver All of these have been in linux-next this week with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZtNVTw8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ymqWwCgmITg5Owdigw+ejnkLJ4Q/CHYfRkAoLh7nkcm kaX7IqZLgVDr4rvgVcmR =azoE -----END PGP SIGNATURE----- Merge tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB fixes for 6.11-rc6. Included in here are: - dwc3 driver fixes for reported issues - MAINTAINER file update, marking a driver as unsupported :( - cdnsp driver fixes - USB gadget driver fix - USB sysfs fix - other tiny fixes - new device ids for usb serial driver All of these have been in linux-next this week with no reported issues" * tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: serial: option: add MeiG Smart SRM825L usb: cdnsp: fix for Link TRB with TC usb: dwc3: st: add missing depopulate in probe error path usb: dwc3: st: fix probed platform device ref count on probe error path usb: dwc3: ep0: Don't reset resource alloc flag (including ep0) usb: core: sysfs: Unmerge @usb3_hardware_lpm_attr_group in remove_power_attributes() usb: typec: fsa4480: Relax CHIP_ID check usb: dwc3: xilinx: add missing depopulate in probe error path usb: dwc3: omap: add missing depopulate in probe error path dt-bindings: usb: microchip,usb2514: Fix reference USB device schema usb: gadget: uvc: queue pump work in uvcg_video_enable() cdc-acm: Add DISABLE_ECHO quirk for GE HealthCare UI Controller usb: cdnsp: fix incorrect index in cdnsp_get_hw_deq function usb: dwc3: core: Prevent USB core invalid event buffer address access MAINTAINERS: Mark UVC gadget driver as orphan	2024-09-01 07:06:28 +12:00
Linus Torvalds	770b0ffe28	SCSI fixes on 20240831 Minor fixes only. The sd.c one ignores a sync cache request if format is in progress which can happen if formatting a drive across suspend/resume. Signed-off-by: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZtM61CYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishZ+yAQDz47pp ZEr9bOMueiIYs+ZpmK1KRBHviFApXr9JJGzoHAD/WsUN43+5+y12QhGWSA0/M4qo Tog2K3HXvb4kIsDoXEg= =n2gm -----END PGP SIGNATURE----- Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Minor fixes only. The sd.c one ignores a sync cache request if format is in progress which can happen if formatting a drive across suspend/resume" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: Ignore command SYNCHRONIZE CACHE error if format in progress scsi: aacraid: Fix double-free on probe failure scsi: lpfc: Fix overflow build issue	2024-09-01 07:00:38 +12:00

1 2 3 4 5 ...

1296173 commits