- Apr 23, 2020
-
-
Greg Kroah-Hartman authored
-
Juergen Gross authored
commit d6f34f4c upstream. Commit 2f62f36e ("x86/xen: Make the boot CPU idle task reliable") introduced a regression for booting 32 bit Xen PV guests: the address of the initial stack needs to be a virtual one. Fixes: 2f62f36e ("x86/xen: Make the boot CPU idle task reliable") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20200409070001.16675-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Borkmann authored
[ no upstream commit ] Switch the comparison, so that is_branch_taken() will recognize that below branch is never taken: [...] 17: [...] R1_w=inv0 [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...] 17: (67) r8 <<= 32 18: [...] R8_w=inv(id=0,smax_value=-4294967296,umin_value=9223372036854775808,umax_value=18446744069414584320,var_off=(0x8000000000000000; 0x7fffffff00000000)) [...] 18: (c7) r8 s>>= 32 19: [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...] 19: (6d) if r1 s> r8 goto pc+16 [...] R1_w=inv0 [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...] [...] Currently we check for is_branch_taken() only if either K is source, or source is a scalar value that is const. For upstream it would be good to extend this properly to check whether dst is const and src not. For the sake of the test_verifier, it is probably not needed here: # ./test_verifier 101 #101/p bpf_get_stack return R0 within range OK Summary: 1 PASSED, 0 SKIPPED, 0 FAILED I haven't seen this issue in test_progs* though, they are passing fine: # ./test_progs-no_alu32 -t get_stack Switching to flavor 'no_alu32' subdirectory... #20 get_stack_raw_tp:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED # ./test_progs -t get_stack #20 get_stack_raw_tp:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
John Fastabend authored
commit d2db08c7 upstream. Before this series the verifier would clamp return bounds of bpf_get_stack() to [0, X] and this led the verifier to believe that a JMP_JSLT 0 would be false and so would prune that path. The result is anything hidden behind that JSLT would be unverified. Add a test to catch this case by hiding an goto pc-1 behind the check which will cause an infinite loop if not rejected. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158560423908.10843.11783152347709008373.stgit@john-Precision-5820-Tower Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
John Fastabend authored
commit 9ac26e99 upstream. With current ALU32 subreg handling and retval refine fix from last patches we see an expected failure in test_verifier. With verbose verifier state being printed at each step for clarity we have the following relavent lines [I omit register states that are not necessarily useful to see failure cause], #101/p bpf_get_stack return R0 within range FAIL Failed to load prog 'Success'! [..] 14: (85) call bpf_get_stack#67 R0_w=map_value(id=0,off=0,ks=8,vs=48,imm=0) R3_w=inv48 15: R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) 15: (b7) r1 = 0 16: R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) R1_w=inv0 16: (bf) r8 = r0 17: R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) R1_w=inv0 R8_w=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) 17: (67) r8 <<= 32 18: R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) R1_w=inv0 R8_w=inv(id=0,smax_value=9223372032559808512, umax_value=18446744069414584320, var_off=(0x0; 0xffffffff00000000), s32_min_value=0, s32_max_value=0, u32_max_value=0, var32_off=(0x0; 0x0)) 18: (c7) r8 s>>= 32 19 R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) R1_w=inv0 R8_w=inv(id=0,smin_value=-2147483648, smax_value=2147483647, var32_off=(0x0; 0xffffffff)) 19: (cd) if r1 s< r8 goto pc+16 R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) R1_w=inv0 R8_w=inv(id=0,smin_value=-2147483648, smax_value=0, var32_off=(0x0; 0xffffffff)) 20: R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) R1_w=inv0 R8_w=inv(id=0,smin_value=-2147483648, smax_value=0, R9=inv48 20: (1f) r9 -= r8 21: (bf) r2 = r7 22: R2_w=map_value(id=0,off=0,ks=8,vs=48,imm=0) 22: (0f) r2 += r8 value -2147483648 makes map_value pointer be out of bounds After call bpf_get_stack() on line 14 and some moves we have at line 16 an r8 bound with max_value 48 but an unknown min value. This is to be expected bpf_get_stack call can only return a max of the input size but is free to return any negative error in the 32-bit register space. The C helper is returning an int so will use lower 32-bits. Lines 17 and 18 clear the top 32 bits with a left/right shift but use ARSH so we still have worst case min bound before line 19 of -2147483648. At this point the signed check 'r1 s< r8' meant to protect the addition on line 22 where dst reg is a map_value pointer may very well return true with a large negative number. Then the final line 22 will detect this as an invalid operation and fail the program. What we want to do is proceed only if r8 is positive non-error. So change 'r1 s< r8' to 'r1 s> r8' so that we jump if r8 is negative. Next we will throw an error because we access past the end of the map value. The map value size is 48 and sizeof(struct test_val) is 48 so we walk off the end of the map value on the second call to get bpf_get_stack(). Fix this by changing sizeof(struct test_val) to 24 by using 'sizeof(struct test_val) / 2'. After this everything passes as expected. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/158560426019.10843.3285429543232025187.stgit@john-Precision-5820-Tower Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Borkmann authored
[ no upstream commit ] See the glory details in 10060503 ("bpf: Verifier, do_refine_retval_range may clamp umin to 0 incorrectly") for why 849fa506 ("bpf/verifier: refine retval R0 state for bpf_get_stack helper") is buggy. The whole series however is not suitable for stable since it adds significant amount [0] of verifier complexity in order to add 32bit subreg tracking. Something simpler is needed. Unfortunately, reverting 849fa506 ("bpf/verifier: refine retval R0 state for bpf_get_stack helper") or just cherry-picking 10060503 ("bpf: Verifier, do_refine_retval_range may clamp umin to 0 incorrectly") is not an option since it will break existing tracing programs badly (at least those that are using bpf_get_stack() and bpf_probe_read_str() helpers). Not fixing it in stable is also not an option since on 4.19 kernels an error will cause a soft-lockup due to hitting dead-code sanitized branch since we don't hard-wire such branches in old kernels yet. But even then for 5.x 849fa506 ("bpf/verifier: refine retval R0 state for bpf_get_stack helper") would cause wrong bounds on the verifier simluation when an error is hit. In one of the earlier iterations of mentioned patch series for upstream there was the concern that just using smax_value in do_refine_retval_range() would nuke bounds by subsequent <<32 >>32 shifts before the comparison against 0 [1] which eventually led to the 32bit subreg tracking in the first place. While I initially went for implementing the idea [1] to pattern match the two shift operations, it turned out to be more complex than actually needed, meaning, we could simply treat do_refine_retval_range() similarly to how we branch off verification for conditionals or under speculation, that is, pushing a new reg state to the stack for later verification. This means, instead of verifying the current path with the ret_reg in [S32MIN, msize_max_value] interval where later bounds would get nuked, we split this into two: i) for the success case where ret_reg can be in [0, msize_max_value], and ii) for the error case with ret_reg known to be in interval [S32MIN, -1]. Latter will preserve the bounds during these shift patterns and can match reg < 0 test. test_progs also succeed with this approach. [0] https://lore.kernel.org/bpf/158507130343.15666.8018068546764556975.stgit@john-Precision-5820-Tower/ [1] https://lore.kernel.org/bpf/158015334199.28573.4940395881683556537.stgit@john-XPS-13-9370/T/#m2e0ad1d5949131014748b6daa48a3495e7f0456d Fixes: 849fa506 ("bpf/verifier: refine retval R0 state for bpf_get_stack helper") Reported-by: Lorenzo Fontana <fontanalorenz@gmail.com> Reported-by: Leonardo Di Donato <leodidonato@gmail.com> Reported-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Tested-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Waiman Long authored
commit d3ec10aa upstream. A lockdep circular locking dependency report was seen when running a keyutils test: [12537.027242] ====================================================== [12537.059309] WARNING: possible circular locking dependency detected [12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - - [12537.125253] ------------------------------------------------------ [12537.153189] keyctl/25598 is trying to acquire lock: [12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0 [12537.208365] [12537.208365] but task is already holding lock: [12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220 [12537.270476] [12537.270476] which lock already depends on the new lock. [12537.270476] [12537.307209] [12537.307209] the existing dependency chain (in reverse order) is: [12537.340754] [12537.340754] -> #3 (&type->lock_class){++++}: [12537.367434] down_write+0x4d/0x110 [12537.385202] __key_link_begin+0x87/0x280 [12537.405232] request_key_and_link+0x483/0xf70 [12537.427221] request_key+0x3c/0x80 [12537.444839] dns_query+0x1db/0x5a5 [dns_resolver] [12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs] [12537.496731] cifs_reconnect+0xe04/0x2500 [cifs] [12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs] [12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs] [12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs] [12537.601045] kthread+0x30c/0x3d0 [12537.617906] ret_from_fork+0x3a/0x50 [12537.636225] [12537.636225] -> #2 (root_key_user.cons_lock){+.+.}: [12537.664525] __mutex_lock+0x105/0x11f0 [12537.683734] request_key_and_link+0x35a/0xf70 [12537.705640] request_key+0x3c/0x80 [12537.723304] dns_query+0x1db/0x5a5 [dns_resolver] [12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs] [12537.775607] cifs_reconnect+0xe04/0x2500 [cifs] [12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs] [12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs] [12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs] [12537.873477] kthread+0x30c/0x3d0 [12537.890281] ret_from_fork+0x3a/0x50 [12537.908649] [12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}: [12537.935225] __mutex_lock+0x105/0x11f0 [12537.954450] cifs_call_async+0x102/0x7f0 [cifs] [12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs] [12538.000659] cifs_readpages+0x120a/0x1e50 [cifs] [12538.023920] read_pages+0xf5/0x560 [12538.041583] __do_page_cache_readahead+0x41d/0x4b0 [12538.067047] ondemand_readahead+0x44c/0xc10 [12538.092069] filemap_fault+0xec1/0x1830 [12538.111637] __do_fault+0x82/0x260 [12538.129216] do_fault+0x419/0xfb0 [12538.146390] __handle_mm_fault+0x862/0xdf0 [12538.167408] handle_mm_fault+0x154/0x550 [12538.187401] __do_page_fault+0x42f/0xa60 [12538.207395] do_page_fault+0x38/0x5e0 [12538.225777] page_fault+0x1e/0x30 [12538.243010] [12538.243010] -> #0 (&mm->mmap_sem){++++}: [12538.267875] lock_acquire+0x14c/0x420 [12538.286848] __might_fault+0x119/0x1b0 [12538.306006] keyring_read_iterator+0x7e/0x170 [12538.327936] assoc_array_subtree_iterate+0x97/0x280 [12538.352154] keyring_read+0xe9/0x110 [12538.370558] keyctl_read_key+0x1b9/0x220 [12538.391470] do_syscall_64+0xa5/0x4b0 [12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf [12538.435535] [12538.435535] other info that might help us debug this: [12538.435535] [12538.472829] Chain exists of: [12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class [12538.472829] [12538.524820] Possible unsafe locking scenario: [12538.524820] [12538.551431] CPU0 CPU1 [12538.572654] ---- ---- [12538.595865] lock(&type->lock_class); [12538.613737] lock(root_key_user.cons_lock); [12538.644234] lock(&type->lock_class); [12538.672410] lock(&mm->mmap_sem); [12538.687758] [12538.687758] *** DEADLOCK *** [12538.687758] [12538.714455] 1 lock held by keyctl/25598: [12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220 [12538.770573] [12538.770573] stack backtrace: [12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G [12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015 [12538.881963] Call Trace: [12538.892897] dump_stack+0x9a/0xf0 [12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279 [12538.932891] ? save_trace+0xd6/0x250 [12538.948979] check_prev_add.constprop.32+0xc36/0x14f0 [12538.971643] ? keyring_compare_object+0x104/0x190 [12538.992738] ? check_usage+0x550/0x550 [12539.009845] ? sched_clock+0x5/0x10 [12539.025484] ? sched_clock_cpu+0x18/0x1e0 [12539.043555] __lock_acquire+0x1f12/0x38d0 [12539.061551] ? trace_hardirqs_on+0x10/0x10 [12539.080554] lock_acquire+0x14c/0x420 [12539.100330] ? __might_fault+0xc4/0x1b0 [12539.119079] __might_fault+0x119/0x1b0 [12539.135869] ? __might_fault+0xc4/0x1b0 [12539.153234] keyring_read_iterator+0x7e/0x170 [12539.172787] ? keyring_read+0x110/0x110 [12539.190059] assoc_array_subtree_iterate+0x97/0x280 [12539.211526] keyring_read+0xe9/0x110 [12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0 [12539.249076] keyctl_read_key+0x1b9/0x220 [12539.266660] do_syscall_64+0xa5/0x4b0 [12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf One way to prevent this deadlock scenario from happening is to not allow writing to userspace while holding the key semaphore. Instead, an internal buffer is allocated for getting the keys out from the read method first before copying them out to userspace without holding the lock. That requires taking out the __user modifier from all the relevant read methods as well as additional changes to not use any userspace write helpers. That is, 1) The put_user() call is replaced by a direct copy. 2) The copy_to_user() call is replaced by memcpy(). 3) All the fault handling code is removed. Compiling on a x86-64 system, the size of the rxrpc_read() function is reduced from 3795 bytes to 2384 bytes with this patch. Fixes: ^1da177e4 ("Linux-2.6.12-rc2") Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wen Yang authored
commit 49c64df8 upstream. The variable 'name' is released multiple times in the error path, which may cause double free issues. This problem is avoided by adding a goto label to release the memory uniformly. And this change also makes the code a bit more cleaner. Fixes: 4f678a58 ("mtd: fix memory leaks in phram_setup") Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Cc: Joern Engel <joern@lazybastard.org> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Richard Weinberger <richard@nod.at> Cc: Vignesh Raghavendra <vigneshr@ti.com> Cc: linux-mtd@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20200318153156.25612-1-wenyang@linux.alibaba.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
commit 4da0ea71 upstream. This function is only called from lpddr_probe(). We free "lpddr" both here and in the caller, so it's a double free. The best place to free "lpddr" is in lpddr_probe() so let's delete this one. Fixes: 8dc00439 ("[MTD] LPDDR qinfo probing.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20200228092554.o57igp3nqhyvf66t@kili.mountain Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jonathan Neuschäfer authored
commit fb251124 upstream. cmdlinepart.c has been moved to drivers/mtd/parsers/. Fixes: a3f12a35 ("mtd: parsers: Move CMDLINE parser") Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Frieder Schrempf authored
commit 621a7b78 upstream. When writing the bad block marker to the OOB area the access mode should be set to MTD_OPS_RAW as it is done for reading the marker. Currently this only works because req.mode is initialized to MTD_OPS_PLACE_OOB (0) and spinand_write_to_cache_op() checks for req.mode != MTD_OPS_AUTO_OOB. Fix this by explicitly setting req.mode to MTD_OPS_RAW. Fixes: 7529df46 ("mtd: nand: Add core infrastructure to support SPI NANDs") Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20200218100432.32433-3-frieder.schrempf@kontron.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christophe Kerello authored
commit 00926460 upstream. This patch releases the resources allocated in nanddev_init function. Fixes: a7ab085d ("mtd: rawnand: Initialize the nand_device object") Signed-off-by: Christophe Kerello <christophe.kerello@st.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/1579767768-32295-1-git-send-email-christophe.kerello@st.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paul E. McKenney authored
commit 80c503e0 upstream. The __torture_print_stats() function in locktorture.c carefully initializes local variable "min" to statp[0].n_lock_acquired, but then compares it to statp[i].n_lock_fail. Given that the .n_lock_fail field should normally be zero, and given the initialization, it seems reasonable to display the maximum and minimum number acquisitions instead of miscomputing the maximum and minimum number of failures. This commit therefore switches from failures to acquisitions. And this turns out to be not only a day-zero bug, but entirely my own fault. I hate it when that happens! Fixes: 0af3fe1e ("locktorture: Add a lock-torture kernel module") Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Colin Ian King authored
commit 9960c709 upstream. A null pointer deference on pdata can occur if the allocation of pdata fails. Fix this by adding a null pointer check and handle the -ENOMEM failure in the caller. Addresses-Coverity: ("Dereference null return value") Fixes: 3ce85cc4 ("iio: st_sensors: get platform data from device tree") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Rothwell authored
commit 3670664b upstream. ev_byte_channel_send() assumes that its third argument is a 16 byte array. Some places where it is called it may not be (or we can't easily tell if it is). Newer compilers have started producing warnings about this, so make sure we actually pass a 16 byte array. There may be more elegant solutions to this, but the driver is quite old and hasn't been updated in many years. The warnings (from a powerpc allyesconfig build) are: In file included from include/linux/byteorder/big_endian.h:5, from arch/powerpc/include/uapi/asm/byteorder.h:14, from include/asm-generic/bitops/le.h:6, from arch/powerpc/include/asm/bitops.h:250, from include/linux/bitops.h:29, from include/linux/kernel.h:12, from include/asm-generic/bug.h:19, from arch/powerpc/include/asm/bug.h:109, from include/linux/bug.h:5, from include/linux/mmdebug.h:5, from include/linux/gfp.h:5, from include/linux/slab.h:15, from drivers/tty/ehv_bytechan.c:24: drivers/tty/ehv_bytechan.c: In function ‘ehv_bc_udbg_putc’: arch/powerpc/include/asm/epapr_hcalls.h:298:20: warning: array subscript 1 is outside array bounds of ‘const char[1]’ [-Warray-bounds] 298 | r6 = be32_to_cpu(p[1]); include/uapi/linux/byteorder/big_endian.h:40:51: note: in definition of macro ‘__be32_to_cpu’ 40 | #define __be32_to_cpu(x) ((__force __u32)(__be32)(x)) | ^ arch/powerpc/include/asm/epapr_hcalls.h:298:7: note: in expansion of macro ‘be32_to_cpu’ 298 | r6 = be32_to_cpu(p[1]); | ^~~~~~~~~~~ drivers/tty/ehv_bytechan.c:166:13: note: while referencing ‘data’ 166 | static void ehv_bc_udbg_putc(char c) | ^~~~~~~~~~~~~~~~ Fixes: dcd83aaf ("tty/powerpc: introduce the ePAPR embedded hypervisor byte channel driver") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> [mpe: Trim warnings from change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200109183912.5fcb52aa@canb.auug.org.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nathan Chancellor authored
commit 93166f5f upstream. Clang warns: ../drivers/video/fbdev/core/fbmem.c:665:3: warning: misleading indentation; statement is not part of the previous 'else' [-Wmisleading-indentation] if (fb_logo.depth > 4 && depth > 4) { ^ ../drivers/video/fbdev/core/fbmem.c:661:2: note: previous statement is here else ^ ../drivers/video/fbdev/core/fbmem.c:1075:3: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] return ret; ^ ../drivers/video/fbdev/core/fbmem.c:1072:2: note: previous statement is here if (!ret) ^ 2 warnings generated. This warning occurs because there are spaces before the tabs on these lines. Normalize the indentation in these functions so that it is consistent with the Linux kernel coding style and clang no longer warns. Fixes: 1692b37c ("fbdev: Fix logo if logo depth is less than framebuffer depth") Link: https://github.com/ClangBuiltLinux/linux/issues/825 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191218030025.10064-1-natechancellor@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Maxime Roussin-Bélanger authored
commit 328b50e9 upstream. The chip is configured in 24 bit mode. The values read from it must always be treated as is. This fixes the issue by replacing the previous 16 bits value by a 24 bits buffer. This changes affects the value output by previous version of the driver, since the least significant byte was missing. The upper half of 16 bit values previously output are now the upper half of a 24 bit value. Fixes: e01e7eaf ("iio: light: introduce si1133") Reported-by: Simon Goyette <simon.goyette@gmail.com> Co-authored-by: Guillaume Champagne <champagne.guillaume.c@gmail.com> Signed-off-by: Maxime Roussin-Bélanger <maxime.roussinbelanger@gmail.com> Signed-off-by: Guillaume Champagne <champagne.guillaume.c@gmail.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jernej Skrabec authored
commit da180322 upstream. As it can be seen from DE2 manual, clock range is 0x10000. Fix it. Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Fixes: 73f122c8 ("ARM: dts: sun8i: a83t: Add display pipeline") Fixes: 05a43a26 ("ARM: dts: sun8i: r40: Add HDMI pipeline") Fixes: 21b29920 ("ARM: sun8i: v3s: add device nodes for DE2 display pipeline") Fixes: d8c6f1f0 ("ARM: sun8i: h3/h5: add DE2 CCU device node for H3") [wens@csie.org: added fixes tags] Signed-off-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
commit d3d19d6f upstream. The "fix" struct has a 2 byte hole after ->ywrapstep and the "fix = info->fix;" assignment doesn't necessarily clear it. It depends on the compiler. The solution is just to replace the assignment with an memcpy(). Fixes: 1f5e31d7 ("fbmem: don't call copy_from/to_user() with mutex held") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Peter Rosin <peda@axentia.se> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200113100132.ixpaymordi24n3av@kili.mountain Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Grygorii Strashko authored
commit 9bb50ed7 upstream. The commit 2e05ea5c ("dma-mapping: implement dma_map_single_attrs using dma_map_page_attrs") removed "dma_debug_page" enum, but missed to update type2name string table. This causes incorrect displaying of dma allocation type. Fix it by removing "page" string from type2name string table and switch to use named initializers. Before (dma_alloc_coherent()): k3-ringacc 4b800000.ringacc: scather-gather idx 2208 P=d1140000 N=d114 D=d1140000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable k3-ringacc 4b800000.ringacc: scather-gather idx 2216 P=d1150000 N=d115 D=d1150000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable After: k3-ringacc 4b800000.ringacc: coherent idx 2208 P=d1140000 N=d114 D=d1140000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable k3-ringacc 4b800000.ringacc: coherent idx 2216 P=d1150000 N=d115 D=d1150000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable Fixes: 2e05ea5c ("dma-mapping: implement dma_map_single_attrs using dma_map_page_attrs") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Aurelien Aptel authored
commit e79b0332 upstream. Fix tcon use-after-free and NULL ptr deref. Customer system crashes with the following kernel log: [462233.169868] CIFS VFS: Cancelling wait for mid 4894753 cmd: 14 => a QUERY DIR [462233.228045] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 [462233.305922] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 [462233.306205] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 [462233.347060] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 [462233.347107] CIFS VFS: Close unmatched open [462233.347113] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 ... [exception RIP: cifs_put_tcon+0xa0] (this is doing tcon->ses->server) #6 [...] smb2_cancelled_close_fid at ... [cifs] #7 [...] process_one_work at ... #8 [...] worker_thread at ... #9 [...] kthread at ... The most likely explanation we have is: * When we put the last reference of a tcon (refcount=0), we close the cached share root handle. * If closing a handle is interrupted, SMB2_close() will queue a SMB2_close() in a work thread. * The queued object keeps a tcon ref so we bump the tcon refcount, jumping from 0 to 1. * We reach the end of cifs_put_tcon(), we free the tcon object despite it now having a refcount of 1. * The queued work now runs, but the tcon, ses & server was freed in the meantime resulting in a crash. THREAD 1 ======== cifs_put_tcon => tcon refcount reach 0 SMB2_tdis close_shroot_lease close_shroot_lease_locked => if cached root has lease && refcount = 0 smb2_close_cached_fid => if cached root valid SMB2_close => retry close in a thread if interrupted smb2_handle_cancelled_close __smb2_handle_cancelled_close => !! tcon refcount bump 0 => 1 !! INIT_WORK(&cancelled->work, smb2_cancelled_close_fid); queue_work(cifsiod_wq, &cancelled->work) => queue work tconInfoFree(tcon); ==> freed! cifs_put_smb_ses(ses); ==> freed! THREAD 2 (workqueue) ======== smb2_cancelled_close_fid SMB2_close(0, cancelled->tcon, ...); => use-after-free of tcon cifs_put_tcon(cancelled->tcon); => tcon refcount reach 0 second time *CRASH* Fixes: d9191319 ("CIFS: Close cached root handle only if it has a lease") Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Florian Fainelli authored
commit d0802dc4 upstream. Commit f949a12f ("net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc") tried to fix the some user controlled buffer overflows in bcm_sf2_cfp_rule_set() and bcm_sf2_cfp_rule_del() but the fix was using CFP_NUM_RULES, which while it is correct not to overflow the bitmaps, is not representative of what the device actually supports. Correct that by using bcm_sf2_cfp_rule_size() instead. The latter subtracts the number of rules by 1, so change the checks from greater than or equal to greater than accordingly. Fixes: f949a12f ("net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ben Skeggs authored
[ Upstream commit 028a12f5 ] Certain boards with GP107/GP108 chipsets hang (often, but randomly) for unknown reasons during GR initialisation. The first tell-tale symptom of this issue is: nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 409800 [ TIMEOUT ] appearing in dmesg, likely followed by many other failures being logged. Karol found this WAR for the issue a while back, but efforts to isolate the root cause and proper fix have not yielded success so far. I've modified the original patch to include a few more details, limit it to GP107/GP108 by default, and added a config option to override this choice. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Yicheng Li authored
[ Upstream commit 42cd0ab4 ] RO and RW of EC may have different EC protocol version. If EC transitions between RO and RW, but AP does not reboot (this is true for fingerprint microcontroller / cros_fp, but not true for main ec / cros_ec), the AP still uses the protocol version queried before transition, which can cause problems. In the case of fingerprint microcontroller, this causes AP to send the wrong version of EC_CMD_GET_NEXT_EVENT to RO in the interrupt handler, which in turn prevents RO to clear the interrupt line to AP, in an infinite loop. Once an EC_HOST_EVENT_INTERFACE_READY is received, we know that there might have been a transition between RO and RW, so re-query the protocol. Signed-off-by: Yicheng Li <yichengli@chromium.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Gwendal Grignou <gwendal@chromium.org> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Chao Yu authored
[ Upstream commit dc5a9412 ] There is a race condition that we may miss to wait for all node pages writeback, fix it. - fsync() - shrink - f2fs_do_sync_file - __write_node_page - set_page_writeback(page#0) : remove DIRTY/TOWRITE flag - f2fs_fsync_node_pages : won't find page #0 as TOWRITE flag was removeD - f2fs_wait_on_node_pages_writeback : wont' wait page #0 writeback as it was not in fsync_node_list list. - f2fs_add_fsync_node_entry Fixes: 50fa53ec ("f2fs: fix to avoid broken of dnode block list") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Eric Biggers authored
[ Upstream commit 7fa6d598 ] When the compressed data of a cluster doesn't end on a page boundary, the remainder of the last page must be zeroed in order to avoid leaking uninitialized memory to disk. Fixes: 4c8ff709 ("f2fs: support data compression") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Adrian Huang authored
[ Upstream commit c20f3653 ] The SPA of the GCR3 table root pointer[51:31] masks 20 bits. However, this requires 21 bits (Please see the AMD IOMMU specification). This leads to the potential failure when the bit 51 of SPA of the GCR3 table root pointer is 1'. Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Fixes: 52815b75 ("iommu/amd: Add support for IOMMUv2 domain mode") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Dan Carpenter authored
[ Upstream commit f84afbdd ] The "cmd" comes from the user and it can be up to 255. It it's more than the number of bits in long, it results out of bounds read when we check test_bit(cmd, &cmd_mask). The highest valid value for "cmd" is ND_CMD_CALL (10) so I added a compare against that. Fixes: 62232e45 ("libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20200225162055.amtosfy7m35aivxg@kili.mountain Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Jeffery Miller authored
[ Upstream commit e42fe5b2 ] The Intel Compute Stick `STK1A32SC` can have a system vendor of "Intel(R) Client Systems". Broaden the Intel Compute Stick DMI checks so that they match "Intel Corporation" as well as "Intel(R) Client Systems". This fixes an issue where the STK1A32SC compute sticks were still exposing a battery with the existing blacklist entry. Signed-off-by: Jeffery Miller <jmiller@neverware.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Guo Ren authored
[ Upstream commit 12879bda ] WARNING: vmlinux.o(.text+0x2366): Section mismatch in reference from the function csky_start_secondary() to the function .init.text:init_fpu() The function csky_start_secondary() references the function __init init_fpu(). This is often because csky_start_secondary lacks a __init annotation or the annotation of init_fpu is wrong. Reported-by: Lu Chongzhi <chongzhi.lcz@alibaba-inc.com> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Yuantian Tang authored
[ Upstream commit cbe259fd ] Qoriq thermal driver is used by both PowerPC and ARM architecture. When built for PowerPC architecture, it reports error: undefined reference to `.__devm_regmap_init_mmio_clk' To fix it, select config REGMAP_MMIO. Fixes: 4316237b (thermal: qoriq: Convert driver to use regmap API) Signed-off-by: Yuantian Tang <andy.tang@nxp.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200303084641.35687-1-andy.tang@nxp.com Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Chuck Lever authored
[ Upstream commit 4047aa90 ] xdr_buf_read_mic() tries to find unused contiguous space in a received xdr_buf in order to linearize the checksum for the call to gss_verify_mic. However, the corner cases in this code are numerous and we seem to keep missing them. I've just hit yet another buffer overrun related to it. This overrun is at the end of xdr_buf_read_mic(): 1284 if (buf->tail[0].iov_len != 0) 1285 mic->data = buf->tail[0].iov_base + buf->tail[0].iov_len; 1286 else 1287 mic->data = buf->head[0].iov_base + buf->head[0].iov_len; 1288 __read_bytes_from_xdr_buf(&subbuf, mic->data, mic->len); 1289 return 0; This logic assumes the transport has set the length of the tail based on the size of the received message. base + len is then supposed to be off the end of the message but still within the actual buffer. In fact, the length of the tail is set by the upper layer when the Call is encoded so that the end of the tail is actually the end of the allocated buffer itself. This causes the logic above to set mic->data to point past the end of the receive buffer. The "mic->data = head" arm of this if statement is no less fragile. As near as I can tell, this has been a problem forever. I'm not sure that minimizing au_rslack recently changed this pathology much. So instead, let's use a more straightforward approach: kmalloc a separate buffer to linearize the checksum. This is similar to how gss_validate() currently works. Coming back to this code, I had some trouble understanding what was going on. So I've cleaned up the variable naming and added a few comments that point back to the XDR definition in RFC 2203 to help guide future spelunkers, including myself. As an added clean up, the functionality that was in xdr_buf_read_mic() is folded directly into gss_unwrap_resp_integ(), as that is its only caller. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Jan Kara authored
[ Upstream commit 32302085 ] Fix a debug-only build error in ext2/xattr.c: When building without extra debugging, (and with another patch that uses no_printk() instead of <empty> for the ext2-xattr debug-print macros, this build error happens: ../fs/ext2/xattr.c: In function ‘ext2_xattr_cache_insert’: ../fs/ext2/xattr.c:869:18: error: ‘ext2_xattr_cache’ undeclared (first use in this function); did you mean ‘ext2_xattr_list’? atomic_read(&ext2_xattr_cache->c_entry_count)); Fix the problem by removing cached entry count from the debug message since otherwise we'd have to export the mbcache structure just for that. Fixes: be0726d3 ("ext2: convert to mbcache2") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Jacob Pan authored
[ Upstream commit 52355fb1 ] Intel VT-d might support PRS (Page Reqest Support) when it's running in the scalable mode. Each page request descriptor occupies 32 bytes and is 32-bytes aligned. The page request descriptor offset mask should be 32-bytes aligned. Fixes: 5b438f4b ("iommu/vt-d: Support page request in scalable mode") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Qian Cai authored
[ Upstream commit c6f4ebde ] dmar_find_atsr() calls list_for_each_entry_rcu() outside of an RCU read side critical section but with dmar_global_lock held. Silence this false positive. drivers/iommu/intel-iommu.c:4504 RCU-list traversed in non-reader section!! 1 lock held by swapper/0/1: #0: ffffffff9755bee8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x1a6/0xe19 Call Trace: dump_stack+0xa4/0xfe lockdep_rcu_suspicious+0xeb/0xf5 dmar_find_atsr+0x1ab/0x1c0 dmar_parse_one_atsr+0x64/0x220 dmar_walk_remapping_entries+0x130/0x380 dmar_table_init+0x166/0x243 intel_iommu_init+0x1ab/0xe19 pci_iommu_init+0x1a/0x44 do_one_initcall+0xae/0x4d0 kernel_init_freeable+0x412/0x4c5 kernel_init+0x19/0x193 Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Jaegeuk Kim authored
[ Upstream commit 2bac0763 ] This fixes skipping GC when segment is full in large section. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Chao Yu authored
[ Upstream commit 1a67cbe1 ] por_fsstress reports inconsistent status in orphan inode, the root cause of this is in f2fs_write_raw_pages() we decrease i_compr_blocks incorrectly due to wrong calculation in f2fs_compressed_blocks(). So this patch exposes below two functions based on __f2fs_cluster_blocks: - f2fs_compressed_blocks: get count of compressed blocks in compressed cluster - f2fs_cluster_blocks: get count of valid blocks (including reserved blocks) in compressed cluster. Then use f2fs_compress_blocks() to get correct compressed blocks count in f2fs_write_raw_pages(). sanity_check_inode: inode (ino=ad80) hash inconsistent i_compr_blocks:2, i_blocks:1, run fsck to fix Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Randy Dunlap authored
[ Upstream commit 44a52022 ] When EXT2_ATTR_DEBUG is not defined, modify the 2 debug macros to use the no_printk() macro instead of <nothing>. This fixes gcc warnings when -Wextra is used: ../fs/ext2/xattr.c:252:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] ../fs/ext2/xattr.c:258:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] ../fs/ext2/xattr.c:330:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] ../fs/ext2/xattr.c:872:45: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body] I have verified that the only object code change (with gcc 7.5.0) is the reversal of some instructions from 'cmp a,b' to 'cmp b,a'. Link: https://lore.kernel.org/r/e18a7395-61fb-2093-18e8-ed4f8cf56248@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jan Kara <jack@suse.com> Cc: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
David Hildenbrand authored
[ Upstream commit 5a6b4cc5 ] Commit 71994620 ("virtio_balloon: replace oom notifier with shrinker") changed the behavior when deflation happens automatically. Instead of deflating when called by the OOM handler, the shrinker is used. However, the balloon is not simply some slab cache that should be shrunk when under memory pressure. The shrinker does not have a concept of priorities, so this behavior cannot be configured. There was a report that this results in undesired side effects when inflating the balloon to shrink the page cache. [1] "When inflating the balloon against page cache (i.e. no free memory remains) vmscan.c will both shrink page cache, but also invoke the shrinkers -- including the balloon's shrinker. So the balloon driver allocates memory which requires reclaim, vmscan gets this memory by shrinking the balloon, and then the driver adds the memory back to the balloon. Basically a busy no-op." The name "deflate on OOM" makes it pretty clear when deflation should happen - after other approaches to reclaim memory failed, not while reclaiming. This allows to minimize the footprint of a guest - memory will only be taken out of the balloon when really needed. Especially, a drop_slab() will result in the whole balloon getting deflated - undesired. While handling it via the OOM handler might not be perfect, it keeps existing behavior. If we want a different behavior, then we need a new feature bit and document it properly (although, there should be a clear use case and the intended effects should be well described). Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because this has no such side effects. Always register the shrinker with VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free pages that are still to be processed by the guest. The hypervisor takes care of identifying and resolving possible races between processing a hinting request and the guest reusing a page. In contrast to pre commit 71994620 ("virtio_balloon: replace oom notifier with shrinker"), don't add a moodule parameter to configure the number of pages to deflate on OOM. Can be re-added if really needed. Also, pay attention that leak_balloon() returns the number of 4k pages - convert it properly in virtio_balloon_oom_notify(). Note1: using the OOM handler is frowned upon, but it really is what we need for this feature. Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we could actually skip sending deflation requests to our hypervisor, making the OOM path *very* simple. Besically freeing pages and updating the balloon. If the communication with the host ever becomes a problem on this call path. [1] https://www.spinics.net/lists/linux-virtualization/msg40863.html Test report by Tyler Sanderson: Test setup: VM with 16 CPU, 64GB RAM. Running Debian 10. We have a 42 GB file full of random bytes that we continually cat to /dev/null. This fills the page cache as the file is read. Meanwhile we trigger the balloon to inflate, with a target size of 53 GB. This setup causes the balloon inflation to pressure the page cache as the page cache is also trying to grow. Afterwards we shrink the balloon back to zero (so total deflate = total inflate). Without patch (kernel 4.19.0-5): Inflation never reaches the target until we stop the "cat file > /dev/null" process. Total inflation time was 542 seconds. The longest period that made no net forward progress was 315 seconds (see attached graph). Result of "grep balloon /proc/vmstat" after the test: balloon_inflate 154828377 balloon_deflate 154828377 With patch (kernel 5.6.0-rc4+): Total inflation duration was 63 seconds. No deflate-queue activity occurs when pressuring the page-cache. Result of "grep balloon /proc/vmstat" after the test: balloon_inflate 12968539 balloon_deflate 12968539 Conclusion: This patch fixes the issue. In the test it reduced inflate/deflate activity by 12x, and reduced inflation time by 8.6x. But more importantly, if we hadn't killed the "grep balloon /proc/vmstat" process then, without the patch, the inflation process would never reach the target. Attached [1] is a png of a graph showing the problematic behavior without this patch. It shows deflate-queue activity increasing linearly while balloon size stays constant over the course of more than 8 minutes of the test. [1] https://lore.kernel.org/linux-mm/CAJuQAmphPcfew1v_EOgAdSFiprzjiZjmOf3iJDmFX0gD6b9TYQ@mail.gmail.com/2-without_patch.png Full test report and discussion [2]: [2] https://lore.kernel.org/r/CAJuQAmphPcfew1v_EOgAdSFiprzjiZjmOf3iJDmFX0gD6b9TYQ@mail.gmail.com Tested-by: Tyler Sanderson <tysand@google.com> Reported-by: Tyler Sanderson <tysand@google.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Wei Wang <wei.w.wang@intel.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Nadav Amit <namit@vmware.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200205163402.42627-4-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Olga Kornievskaia authored
[ Upstream commit df513a77 ] Ever since commit 2c94b8ec ("SUNRPC: Use au_rslack when computing reply buffer size"). It changed how "req->rq_rcvsize" is calculated. It used to use au_cslack value which was nice and large and changed it to au_rslack value which turns out to be too small. Since 5.1, v3 mount with sec=krb5p fails against an Ontap server because client's receive buffer it too small. For gss krb5p, we need to account for the mic token in the verifier, and the wrap token in the wrap token. RFC 4121 defines: mic token Octet no Name Description -------------------------------------------------------------- 0..1 TOK_ID Identification field. Tokens emitted by GSS_GetMIC() contain the hex value 04 04 expressed in big-endian order in this field. 2 Flags Attributes field, as described in section 4.2.2. 3..7 Filler Contains five octets of hex value FF. 8..15 SND_SEQ Sequence number field in clear text, expressed in big-endian order. 16..last SGN_CKSUM Checksum of the "to-be-signed" data and octet 0..15, as described in section 4.2.4. that's 16bytes (GSS_KRB5_TOK_HDR_LEN) + chksum wrap token Octet no Name Description -------------------------------------------------------------- 0..1 TOK_ID Identification field. Tokens emitted by GSS_Wrap() contain the hex value 05 04 expressed in big-endian order in this field. 2 Flags Attributes field, as described in section 4.2.2. 3 Filler Contains the hex value FF. 4..5 EC Contains the "extra count" field, in big- endian order as described in section 4.2.3. 6..7 RRC Contains the "right rotation count" in big- endian order, as described in section 4.2.5. 8..15 SND_SEQ Sequence number field in clear text, expressed in big-endian order. 16..last Data Encrypted data for Wrap tokens with confidentiality, or plaintext data followed by the checksum for Wrap tokens without confidentiality, as described in section 4.2.4. Also 16bytes of header (GSS_KRB5_TOK_HDR_LEN), encrypted data, and cksum (other things like padding) RFC 3961 defines known cksum sizes: Checksum type sumtype checksum section or value size reference --------------------------------------------------------------------- CRC32 1 4 6.1.3 rsa-md4 2 16 6.1.2 rsa-md4-des 3 24 6.2.5 des-mac 4 16 6.2.7 des-mac-k 5 8 6.2.8 rsa-md4-des-k 6 16 6.2.6 rsa-md5 7 16 6.1.1 rsa-md5-des 8 24 6.2.4 rsa-md5-des3 9 24 ?? sha1 (unkeyed) 10 20 ?? hmac-sha1-des3-kd 12 20 6.3 hmac-sha1-des3 13 20 ?? sha1 (unkeyed) 14 20 ?? hmac-sha1-96-aes128 15 20 [KRB5-AES] hmac-sha1-96-aes256 16 20 [KRB5-AES] [reserved] 0x8003 ? [GSS-KRB5] Linux kernel now mainly supports type 15,16 so max cksum size is 20bytes. (GSS_KRB5_MAX_CKSUM_LEN) Re-use already existing define of GSS_KRB5_MAX_SLACK_NEEDED that's used for encoding the gss_wrap tokens (same tokens are used in reply). Fixes: 2c94b8ec ("SUNRPC: Use au_rslack when computing reply buffer size") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-