- Dec 11, 2020
-
-
Roman Gushchin authored
commit becaba65 upstream. Commit 10befea9 ("mm: memcg/slab: use a single set of kmem_caches for all allocations") introduced a regression into the handling of the obj_cgroup_charge() return value. If a non-zero value is returned (indicating of exceeding one of memory.max limits), the allocation should fail, instead of falling back to non-accounted mode. To make the code more readable, move memcg_slab_pre_alloc_hook() and memcg_slab_post_alloc_hook() calling conditions into bodies of these hooks. Fixes: 10befea9 ("mm: memcg/slab: use a single set of kmem_caches for all allocations") Signed-off-by:
Roman Gushchin <guro@fb.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Reviewed-by:
Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201127161828.GD840171@carbon.dhcp.thefacebook.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Suravee Suthikulpanit authored
commit 4165bf01 upstream. According to the AMD IOMMU spec, the commit 73db2fc5 ("iommu/amd: Increase interrupt remapping table limit to 512 entries") also requires the interrupt table length (IntTabLen) to be set to 9 (power of 2) in the device table mapping entry (DTE). Fixes: 73db2fc5 ("iommu/amd: Increase interrupt remapping table limit to 512 entries") Reported-by:
Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by:
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by:
Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20201207091920.3052-1-suravee.suthikulpanit@amd.com Signed-off-by:
Will Deacon <will@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Deucher authored
This patch should not have been applied to stable. It depends on changes in newer drivers. This reverts commit 756fec06. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1402 Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: Sasha Levin <sashal@kernel.org> Cc: stable@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mike Kravetz authored
commit 7a5bde37 upstream. Adrian Moreno was ruuning a kubernetes 1.19 + containerd/docker workload using hugetlbfs. In this environment the issue is reproduced by: - Start a simple pod that uses the recently added HugePages medium feature (pod yaml attached) - Start a DPDK app. It doesn't need to run successfully (as in transfer packets) nor interact with real hardware. It seems just initializing the EAL layer (which handles hugepage reservation and locking) is enough to trigger the issue - Delete the Pod (or let it "Complete"). This would result in a kworker thread going into a tight loop (top output): 1425 root 20 0 0 0 0 R 99.7 0.0 5:22.45 kworker/28:7+cgroup_destroy 'perf top -g' reports: - 63.28% 0.01% [kernel] [k] worker_thread - 49.97% worker_thread - 52.64% process_one_work - 62.08% css_killed_work_fn - hugetlb_cgroup_css_offline 41.52% _raw_spin_lock - 2.82% _cond_resched rcu_all_qs 2.66% PageHuge - 0.57% schedule - 0.57% __schedule We are spinning in the do-while loop in hugetlb_cgroup_css_offline. Worse yet, we are holding the master cgroup lock (cgroup_mutex) while infinitely spinning. Little else can be done on the system as the cgroup_mutex can not be acquired. Do note that the issue can be reproduced by simply offlining a hugetlb cgroup containing pages with reservation counts. The loop in hugetlb_cgroup_css_offline is moving page counts from the cgroup being offlined to the parent cgroup. This is done for each hstate, and is repeated until hugetlb_cgroup_have_usage returns false. The routine moving counts (hugetlb_cgroup_move_parent) is only moving 'usage' counts. The routine hugetlb_cgroup_have_usage is checking for both 'usage' and 'reservation' counts. Discussion about what to do with reservation counts when reparenting was discussed here: https://lore.kernel.org/linux-kselftest/CAHS8izMFAYTgxym-Hzb_JmkTK1N_S9tGN71uS6MFV+R7swYu5A@mail.gmail.com/ The decision was made to leave a zombie cgroup for with reservation counts. Unfortunately, the code checking reservation counts was incorrectly added to hugetlb_cgroup_have_usage. To fix the issue, simply remove the check for reservation counts. While fixing this issue, a related bug in hugetlb_cgroup_css_offline was noticed. The hstate index is not reinitialized each time through the do-while loop. Fix this as well. Fixes: 1adc4d41 ("hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations") Reported-by:
Adrian Moreno <amorenoz@redhat.com> Signed-off-by:
Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Tested-by:
Adrian Moreno <amorenoz@redhat.com> Reviewed-by:
Shakeel Butt <shakeelb@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201203220242.158165-1-mike.kravetz@oracle.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Qian Cai authored
commit b11a76b3 upstream. We can't call kvfree() with a spin lock held, so defer it. Fixes a might_sleep() runtime warning. Fixes: 873d7bcf ("mm/swapfile.c: use kvzalloc for swap_info_struct allocation") Signed-off-by:
Qian Cai <qcai@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Reviewed-by:
Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201202151549.10350-1-qcai@redhat.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Yang Shi authored
commit 8199be00 upstream. When investigating a slab cache bloat problem, significant amount of negative dentry cache was seen, but confusingly they neither got shrunk by reclaimer (the host has very tight memory) nor be shrunk by dropping cache. The vmcore shows there are over 14M negative dentry objects on lru, but tracing result shows they were even not scanned at all. Further investigation shows the memcg's vfs shrinker_map bit is not set. So the reclaimer or dropping cache just skip calling vfs shrinker. So we have to reboot the hosts to get the memory back. I didn't manage to come up with a reproducer in test environment, and the problem can't be reproduced after rebooting. But it seems there is race between shrinker map bit clear and reparenting by code inspection. The hypothesis is elaborated as below. The memcg hierarchy on our production environment looks like: root / \ system user The main workloads are running under user slice's children, and it creates and removes memcg frequently. So reparenting happens very often under user slice, but no task is under user slice directly. So with the frequent reparenting and tight memory pressure, the below hypothetical race condition may happen: CPU A CPU B reparent dst->nr_items == 0 shrinker: total_objects == 0 add src->nr_items to dst set_bit return SHRINK_EMPTY clear_bit child memcg offline replace child's kmemcg_id with parent's (in memcg_offline_kmem()) list_lru_del() between shrinker runs see parent's kmemcg_id dec dst->nr_items reparent again dst->nr_items may go negative due to concurrent list_lru_del() The second run of shrinker: read nr_items without any synchronization, so it may see intermediate negative nr_items then total_objects may return 0 coincidently keep the bit cleared dst->nr_items != 0 skip set_bit add scr->nr_item to dst After this point dst->nr_item may never go zero, so reparenting will not set shrinker_map bit anymore. And since there is no task under user slice directly, so no new object will be added to its lru to set the shrinker map bit either. That bit is kept cleared forever. How does list_lru_del() race with reparenting? It is because reparenting replaces children's kmemcg_id to parent's without protecting from nlru->lock, so list_lru_del() may see parent's kmemcg_id but actually deleting items from child's lru, but dec'ing parent's nr_items, so the parent's nr_items may go negative as commit 2788cf0c ("memcg: reparent list_lrus and free kmemcg_id on css offline") says. Since it is impossible that dst->nr_items goes negative and src->nr_items goes zero at the same time, so it seems we could set the shrinker map bit iff src->nr_items != 0. We could synchronize list_lru_count_one() and reparenting with nlru->lock, but it seems checking src->nr_items in reparenting is the simplest and avoids lock contention. Fixes: fae91d6d ("mm/list_lru.c: set bit in memcg shrinker bitmap on first list_lru item appearance") Suggested-by:
Roman Gushchin <guro@fb.com> Signed-off-by:
Yang Shi <shy828301@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Reviewed-by:
Roman Gushchin <guro@fb.com> Reviewed-by:
Shakeel Butt <shakeelb@google.com> Acked-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> [4.19] Link: https://lkml.kernel.org/r/20201202171749.264354-1-shy828301@gmail.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Menglong Dong authored
commit 2bf509d9 upstream. 'format_corename()' will splite 'core_pattern' on spaces when it is in pipe mode, and take helper_argv[0] as the path to usermode executable. It works fine in most cases. However, if there is a space between '|' and '/file/path', such as '| /usr/lib/systemd/systemd-coredump %P %u %g', then helper_argv[0] will be parsed as '', and users will get a 'Core dump to | disabled'. It is not friendly to users, as the pattern above was valid previously. Fix this by ignoring the spaces between '|' and '/file/path'. Fixes: 315c6926 ("coredump: split pipe command whitespace before expanding template") Signed-off-by:
Menglong Dong <dong.menglong@zte.com.cn> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Cc: Paul Wise <pabs3@bonedaddy.net> Cc: Jakub Wilk <jwilk@jwilk.net> [https://bugs.debian.org/924398] Cc: Neil Horman <nhorman@tuxdriver.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/5fb62870.1c69fb81.8ef5d.af76@mx.google.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Masami Hiramatsu authored
commit 4e9a5ae8 upstream. Since insn.prefixes.nbytes can be bigger than the size of insn.prefixes.bytes[] when a prefix is repeated, the proper check must be insn.prefixes.bytes[i] != 0 and i < 4 instead of using insn.prefixes.nbytes. Introduce a for_each_insn_prefix() macro for this purpose. Debugged by Kees Cook <keescook@chromium.org>. [ bp: Massage commit message, sync with the respective header in tools/ and drop "we". ] Fixes: 2b144498 ("uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints") Reported-by:
<syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com> Signed-off-by:
Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/160697103739.3146288.7437620795200799020.stgit@devnote2 Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mike Snitzer authored
commit bde3808b upstream. Fixes sparse warnings: drivers/md/dm.c:508:12: warning: context imbalance in 'dm_prepare_ioctl' - wrong count at exit drivers/md/dm.c:543:13: warning: context imbalance in 'dm_unprepare_ioctl' - wrong count at exit Fixes: 971888c4 ("dm: hold DM table for duration of ioctl rather than use blkdev_get") Cc: stable@vger.kernel.org Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mike Snitzer authored
commit f05c4403 upstream. Remove redundant dm_put_live_table() in dm_dax_zero_page_range() error path to fix sparse warning: drivers/md/dm.c:1208:9: warning: context imbalance in 'dm_dax_zero_page_range' - unexpected unlock Fixes: cdf6cdcd ("dm,dax: Add dax zero_page_range operation") Cc: stable@vger.kernel.org Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sergei Shtepa authored
commit 89478335 upstream. The dm_get_live_table() function makes RCU read lock so dm_put_live_table() must be called even if dm_table map is not found. Fixes: e76239a3 ("block: add a report_zones method") Cc: stable@vger.kernel.org Signed-off-by:
Sergei Shtepa <sergei.shtepa@veeam.com> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Laurent Vivier authored
commit 9ea69a55 upstream. With virtio multiqueue, normally each queue IRQ is mapped to a CPU. Commit 0d9f0a52 ("virtio_scsi: use virtio IRQ affinity") exposed an existing shortcoming of the arch code by moving virtio_scsi to the automatic IRQ affinity assignment. The affinity is correctly computed in msi_desc but this is not applied to the system IRQs. It appears the affinity is correctly passed to rtas_setup_msi_irqs() but lost at this point and never passed to irq_domain_alloc_descs() (see commit 06ee6d57 ("genirq: Add affinity hint to irq allocation")) because irq_create_mapping() doesn't take an affinity parameter. Use the new irq_create_mapping_affinity() function, which allows to forward the affinity setting from rtas_setup_msi_irqs() to irq_domain_alloc_descs(). With this change, the virtqueues are correctly dispatched between the CPUs on pseries. Fixes: e75eafb9 ("genirq/msi: Switch to new irq spreading infrastructure") Signed-off-by:
Laurent Vivier <lvivier@redhat.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Greg Kurz <groug@kaod.org> Acked-by:
Michael Ellerman <mpe@ellerman.id.au> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201126082852.1178497-3-lvivier@redhat.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Laurent Vivier authored
commit bb4c6910 upstream. There is currently no way to convey the affinity of an interrupt via irq_create_mapping(), which creates issues for devices that expect that affinity to be managed by the kernel. In order to sort this out, rename irq_create_mapping() to irq_create_mapping_affinity() with an additional affinity parameter that can be passed down to irq_domain_alloc_descs(). irq_create_mapping() is re-implemented as a wrapper around irq_create_mapping_affinity(). No functional change. Fixes: e75eafb9 ("genirq/msi: Switch to new irq spreading infrastructure") Signed-off-by:
Laurent Vivier <lvivier@redhat.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Greg Kurz <groug@kaod.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201126082852.1178497-2-lvivier@redhat.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nicholas Piggin authored
commit a1ee2811 upstream. This can be hit by an HPT guest running on an HPT host and bring down the host, so it's quite important to fix. Fixes: 7290f3b3 ("powerpc/64s/powernv: machine check dump SLB contents") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by:
Nicholas Piggin <npiggin@gmail.com> Acked-by:
Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201128070728.825934-2-npiggin@gmail.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mikulas Patocka authored
commit 67aa3ec3 upstream. Advance the maximum number of arguments to 16. This fixes issue where certain operations, combined with table configured args, exceed 10 arguments. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Fixes: 48debafe ("dm: add writecache target") Cc: stable@vger.kernel.org # v4.18+ Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mikulas Patocka authored
commit e5d41cbc upstream. When reporting the "max_age" value the number of arguments must advance by two. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Fixes: 3923d485 ("dm writecache: implement gradual cleanup") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
commit 2d280bc8 upstream. __io_compat_recvmsg_copy_hdr() with REQ_F_BUFFER_SELECT reads out iov len but never assigns it to iov/fast_iov, leaving sr->len with garbage. Hopefully, following io_buffer_select() truncates it to the selected buffer size, but the value is still may be under what was specified. Cc: <stable@vger.kernel.org> # 5.7 Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Suganath Prabu S authored
commit 42f68703 upstream. Commit c1a6c5ac ("scsi: mpt3sas: For NVME device, issue a protocol level reset") modified the ioctl path 'timeout' variable type to u8 from unsigned long, limiting the maximum timeout value that the driver can support to 255 seconds. If the management application is requesting a higher value the resulting timeout will be zero. The operation times out immediately and the ioctl request fails. Change datatype back to unsigned long. Link: https://lore.kernel.org/r/20201125094838.4340-1-suganath-prabu.subramani@broadcom.com Fixes: c1a6c5ac ("scsi: mpt3sas: For NVME device, issue a protocol level reset") Cc: <stable@vger.kernel.org> #v4.18+ Signed-off-by:
Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Greg Kurz authored
commit f54db39f upstream. Commit 062cfab7 ("KVM: PPC: Book3S HV: XIVE: Make VP block size configurable") updated kvmppc_xive_vcpu_id_valid() in a way that allows userspace to trigger an assertion in skiboot and crash the host: [ 696.186248988,3] XIVE[ IC 08 ] eq_blk != vp_blk (0 vs. 1) for target 0x4300008c/0 [ 696.186314757,0] Assert fail: hw/xive.c:2370:0 [ 696.186342458,0] Aborting! xive-kvCPU 0043 Backtrace: S: 0000000031e2b8f0 R: 0000000030013840 .backtrace+0x48 S: 0000000031e2b990 R: 000000003001b2d0 ._abort+0x4c S: 0000000031e2ba10 R: 000000003001b34c .assert_fail+0x34 S: 0000000031e2ba90 R: 0000000030058984 .xive_eq_for_target.part.20+0xb0 S: 0000000031e2bb40 R: 0000000030059fdc .xive_setup_silent_gather+0x2c S: 0000000031e2bc20 R: 000000003005a334 .opal_xive_set_vp_info+0x124 S: 0000000031e2bd20 R: 00000000300051a4 opal_entry+0x134 --- OPAL call token: 0x8a caller R1: 0xc000001f28563850 --- XIVE maintains the interrupt context state of non-dispatched vCPUs in an internal VP structure. We allocate a bunch of those on startup to accommodate all possible vCPUs. Each VP has an id, that we derive from the vCPU id for efficiency: static inline u32 kvmppc_xive_vp(struct kvmppc_xive *xive, u32 server) { return xive->vp_base + kvmppc_pack_vcpu_id(xive->kvm, server); } The KVM XIVE device used to allocate KVM_MAX_VCPUS VPs. This was limitting the number of concurrent VMs because the VP space is limited on the HW. Since most of the time, VMs run with a lot less vCPUs, commit 062cfab7 ("KVM: PPC: Book3S HV: XIVE: Make VP block size configurable") gave the possibility for userspace to tune the size of the VP block through the KVM_DEV_XIVE_NR_SERVERS attribute. The check in kvmppc_pack_vcpu_id() was changed from cpu < KVM_MAX_VCPUS * xive->kvm->arch.emul_smt_mode to cpu < xive->nr_servers * xive->kvm->arch.emul_smt_mode The previous check was based on the fact that the VP block had KVM_MAX_VCPUS entries and that kvmppc_pack_vcpu_id() guarantees that packed vCPU ids are below KVM_MAX_VCPUS. We've changed the size of the VP block, but kvmppc_pack_vcpu_id() has nothing to do with it and it certainly doesn't ensure that the packed vCPU ids are below xive->nr_servers. kvmppc_xive_vcpu_id_valid() might thus return true when the VM was configured with a non-standard VSMT mode, even if the packed vCPU id is higher than what we expect. We end up using an unallocated VP id, which confuses OPAL. The assert in OPAL is probably abusive and should be converted to a regular error that the kernel can handle, but we shouldn't really use broken VP ids in the first place. Fix kvmppc_xive_vcpu_id_valid() so that it checks the packed vCPU id is below xive->nr_servers, which is explicitly what we want. Fixes: 062cfab7 ("KVM: PPC: Book3S HV: XIVE: Make VP block size configurable") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by:
Greg Kurz <groug@kaod.org> Reviewed-by:
Cédric Le Goater <clg@kaod.org> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/160673876747.695514.1809676603724514920.stgit@bahia.lan Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chris Wilson authored
commit 777a7717 upstream. Ville noticed that the last mocs entry is used unconditionally by the HW when it performs cache evictions, and noted that while the value is not meant to be writable by the driver, we should program it to a reasonable value nevertheless. As it turns out, we can change the value of mocs:63 and the value we were programming into it would cause hard hangs in conjunction with atomic operations. v2: Add details from bspec about how it is used by HW Suggested-by:
Ville Syrjälä <ville.syrjala@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2707 Fixes: 3bbaba0c ("drm/i915: Added Programming of the MOCS") Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: <stable@vger.kernel.org> # v4.3+ Reviewed-by:
Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201126140841.1982-1-chris@chris-wilson.co.uk (cherry picked from commit 977933b5) Signed-off-by:
Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chris Wilson authored
commit aff76ab7 upstream. We treat idling the GT (intel_rps_park) as a downclock event, and reduce the frequency we intend to restart the GT with. Since the two workloads are likely related (e.g. a compositor rendering every 16ms), we want to carry the frequency and load information from across the idling. However, we do also need to update the frequencies so that workloads that run for less than 1ms are autotuned by RPS (otherwise we leave compositors running at max clocks, draining excess power). Conversely, if we try to run too slowly, the next workload has to run longer. Since there is a hysteresis in the power graph, below a certain frequency running a short workload for longer consumes more energy than running it slightly higher for less time. The exact balance point is unknown beforehand, but measurements with 30fps media playback indicate that RPe is a better choice. Reported-by:
Edward Baker <edward.baker@intel.com> Tested-by:
Edward Baker <edward.baker@intel.com> Fixes: 043cd2d1 ("drm/i915/gt: Leave rps->cur_freq on unpark") Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Cc: Edward Baker <edward.baker@intel.com> Cc: Andi Shyti <andi.shyti@intel.com> Cc: Lyude Paul <lyude@redhat.com> Cc: <stable@vger.kernel.org> # v5.8+ Reviewed-by:
Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by:
Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201124183521.28623-1-chris@chris-wilson.co.uk (cherry picked from commit f7ed83cc) Signed-off-by:
Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Venkata Ramana Nayana authored
commit 78b2eb8a upstream. As we use a shmemfs file to hold the context state, when not in use it may be swapped out, such as across suspend. Since we wrote into the shmemfs without marking the pages as dirty, the contents may be dropped instead of being written back to swap. On re-using the shmemfs file, such as creating a new context after resume, the contents of that file were likely garbage and so the new context could then hang the GPU. Simply mark the page as being written when copying into the shmemfs file, and it the new contents will be retained across swapout. Fixes: be1cb55a ("drm/i915/gt: Keep a no-frills swappable copy of the default context state") Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Signed-off-by:
CQ Tang <cq.tang@intel.com> Signed-off-by:
Venkata Ramana Nayana <venkata.ramana.nayana@intel.com> Reviewed-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Cc: <stable@vger.kernel.org> # v5.8+ Link: https://patchwork.freedesktop.org/patch/msgid/20201127120718.454037-161-matthew.auld@intel.com (cherry picked from commit a9d71f76) Signed-off-by:
Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Boyuan Zhang authored
commit efd6d85a upstream. Port from VCN2.5 SCRATCH2 is used to keep decode wptr as a workaround which fix a hardware DPG decode wptr update bug for vcn2.5 beforehand. Signed-off-by:
Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by:
James Zhu <James.Zhu@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 5.9.x Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Boyuan Zhang authored
commit ac2db948 upstream. Port from VCN2.5 Add vcn dpg harware synchronization to fix race condition issue between vcn driver and hardware. Signed-off-by:
Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by:
James Zhu <James.Zhu@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 5.9.x Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tomi Valkeinen authored
commit fd4e788e upstream. When the SDI output was converted to DRM bridge, the atomic versions of enable and disable funcs were used. This was not intended, as that would require implementing other atomic funcs too. This leads to: WARNING: CPU: 0 PID: 18 at drivers/gpu/drm/drm_bridge.c:708 drm_atomic_helper_commit_modeset_enables+0x134/0x268 and display not working. Fix this by using the legacy enable/disable funcs. Fixes: 8bef8a6d ("drm/omap: sdi: Register a drm_bridge") Reported-by:
Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by:
Tomi Valkeinen <tomi.valkeinen@ti.com> Tested-by:
Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> Tested-by:
Aaro Koskinen <aaro.koskinen@iki.fi> Reviewed-by:
Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: stable@vger.kernel.org # v5.7+ Link: https://patchwork.freedesktop.org/patch/msgid/20201127085241.848461-1-tomi.valkeinen@ti.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mika Westerberg authored
commit 600c0849 upstream. Paulian reported a crash that happens when a dock is unplugged during hibernation: [78436.228217] thunderbolt 0-1: device disconnected [78436.228365] BUG: kernel NULL pointer dereference, address: 00000000000001e0 ... [78436.228397] RIP: 0010:icm_free_unplugged_children+0x109/0x1a0 ... [78436.228432] Call Trace: [78436.228439] icm_rescan_work+0x24/0x30 [78436.228444] process_one_work+0x1a3/0x3a0 [78436.228449] worker_thread+0x30/0x370 [78436.228454] ? process_one_work+0x3a0/0x3a0 [78436.228457] kthread+0x13d/0x160 [78436.228461] ? kthread_park+0x90/0x90 [78436.228465] ret_from_fork+0x1f/0x30 This happens because remove_unplugged_switch() calls tb_switch_remove() that releases the memory pointed by sw so the following lines reference to a memory that might be released already. Fix this by saving pointer to the parent device before calling tb_switch_remove(). Reported-by:
Paulian Bogdan Marinca <paulian@marinca.net> Fixes: 4f7c2e0d ("thunderbolt: Make sure device runtime resume completes before taking domain lock") Cc: stable@vger.kernel.org Signed-off-by:
Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steven Rostedt (VMware) authored
commit bcee5278 upstream. When the instances were able to use their own options, the userstacktrace option was left hardcoded for the top level. This made the instance userstacktrace option bascially into a nop, and will confuse users that set it, but nothing happens (I was confused when it happened to me!) Cc: stable@vger.kernel.org Fixes: 16270145 ("tracing: Add trace options for core options to instances") Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christian Eggers authored
commit 61e6fe59 upstream. If arbitration is lost, the master automatically changes to slave mode. I2SR_IBB may or may not be reset by hardware. Raising a STOP condition by resetting I2CR_MSTA has no effect and will not clear I2SR_IBB. So calling i2c_imx_bus_busy() is not required and would busy-wait until timeout. Signed-off-by:
Christian Eggers <ceggers@arri.de> Tested (not extensively) on Vybrid VF500 (Toradex VF50): Tested-by:
Krzysztof Kozlowski <krzk@kernel.org> Acked-by:
Oleksij Rempel <o.rempel@pengutronix.de> Cc: stable@vger.kernel.org # Requires trivial backporting, simple remove # the 3rd argument from the calls to # i2c_imx_bus_busy(). Signed-off-by:
Wolfram Sang <wsa@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christian Eggers authored
commit 1de67a3d upstream. Arbitration Lost (IAL) can happen after every single byte transfer. If arbitration is lost, the I2C hardware will autonomously switch from master mode to slave. If a transfer is not aborted in this state, consecutive transfers will not be executed by the hardware and will timeout. Signed-off-by:
Christian Eggers <ceggers@arri.de> Tested (not extensively) on Vybrid VF500 (Toradex VF50): Tested-by:
Krzysztof Kozlowski <krzk@kernel.org> Acked-by:
Oleksij Rempel <o.rempel@pengutronix.de> Cc: stable@vger.kernel.org Signed-off-by:
Wolfram Sang <wsa@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christian Eggers authored
commit 384a9565 upstream. According to the "VFxxx Controller Reference Manual" (and the comment block starting at line 97), Vybrid requires writing a one for clearing an interrupt flag. Syncing the method for clearing I2SR_IIF in i2c_imx_isr(). Signed-off-by:
Christian Eggers <ceggers@arri.de> Fixes: 4b775022 ("i2c: imx: add struct to hold more configurable quirks") Reviewed-by:
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by:
Oleksij Rempel <o.rempel@pengutronix.de> Cc: stable@vger.kernel.org Signed-off-by:
Wolfram Sang <wsa@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexander Gordeev authored
commit a2bd4097 upstream. The directed MSIs are delivered to CPUs whose address is written to the MSI message address. The current code assumes that a CPU logical number (as it is seen by the kernel) is also the CPU address. The above assumption is not correct, as the CPU address is rather the value returned by STAP instruction. That value does not necessarily match the kernel logical CPU number. Fixes: e979ce7b ("s390/pci: provide support for CPU directed interrupts") Cc: <stable@vger.kernel.org> # v5.2+ Signed-off-by:
Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by:
Halil Pasic <pasic@linux.ibm.com> Reviewed-by:
Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by:
Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by:
Heiko Carstens <hca@linux.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andreas Gruenbacher authored
commit dd0ecf54 upstream. In gfs2_create_inode and gfs2_inode_lookup, make sure to cancel any pending delete work before taking the inode glock. Otherwise, gfs2_cancel_delete_work may block waiting for delete_work_func to complete, and delete_work_func may block trying to acquire the inode glock in gfs2_inode_lookup. Reported-by:
Alexander Aring <aahringo@redhat.com> Fixes: a0e3cc65 ("gfs2: Turn gl_delete into a delayed work") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andreas Gruenbacher authored
commit 82e938bd upstream. Commit 20f82999 ("gfs2: Rework read and page fault locking") lifted the glock lock taking from the low-level ->readpage and ->readahead address space operations to the higher-level ->read_iter file and ->fault vm operations. The glocks are still taken in LM_ST_SHARED mode only. On filesystems mounted without the noatime option, ->read_iter sometimes needs to update the atime as well, though. Right now, this leads to a failed locking mode assertion in gfs2_dirty_inode. Fix that by introducing a new update_time inode operation. There, if the glock is held non-exclusively, upgrade it to an exclusive lock. Reported-by:
Alexander Aring <aahringo@redhat.com> Fixes: 20f82999 ("gfs2: Rework read and page fault locking") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Aurelien Aptel authored
commit 59463eb8 upstream. In some scenarios (DFS and BAD_NETWORK_NAME) set_root_set() can be called with a NULL ses->tcon_ipc. Signed-off-by:
Aurelien Aptel <aaptel@suse.com> Reviewed-by:
Paulo Alcantara (SUSE) <pc@cjr.nz> CC: Stable <stable@vger.kernel.org> Signed-off-by:
Steve French <stfrench@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ronnie Sahlberg authored
commit ea64370b upstream. When mounting with "idsfromsid" mount option, Azure corrupted the owner SIDs due to excessive padding caused by placing the owner fields at the end of the security descriptor on create. Placing owners at the front of the security descriptor (rather than the end) is also safer, as the number of ACEs (that follow it) are variable. Signed-off-by:
Ronnie Sahlberg <lsahlber@redhat.com> Suggested-by:
Rohith Surabattula <rohiths@microsoft.com> CC: Stable <stable@vger.kernel.org> # v5.8 Signed-off-by:
Steve French <stfrench@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paulo Alcantara authored
commit 21225336 upstream. This patch fixes a potential use-after-free bug in cifs_echo_request(). For instance, thread 1 -------- cifs_demultiplex_thread() clean_demultiplex_info() kfree(server) thread 2 (workqueue) -------- apic_timer_interrupt() smp_apic_timer_interrupt() irq_exit() __do_softirq() run_timer_softirq() call_timer_fn() cifs_echo_request() <- use-after-free in server ptr Signed-off-by:
Paulo Alcantara (SUSE) <pc@cjr.nz> CC: Stable <stable@vger.kernel.org> Reviewed-by:
Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by:
Steve French <stfrench@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paulo Alcantara authored
commit 6988a619 upstream. A customer has reported that several files in their multi-threaded app were left with size of 0 because most of the read(2) calls returned -EINTR and they assumed no bytes were read. Obviously, they could have fixed it by simply retrying on -EINTR. We noticed that most of the -EINTR on read(2) were due to real-time signals sent by glibc to process wide credential changes (SIGRT_1), and its signal handler had been established with SA_RESTART, in which case those calls could have been automatically restarted by the kernel. Let the kernel decide to whether or not restart the syscalls when there is a signal pending in __smb_send_rqst() by returning -ERESTARTSYS. If it can't, it will return -EINTR anyway. Signed-off-by:
Paulo Alcantara (SUSE) <pc@cjr.nz> CC: Stable <stable@vger.kernel.org> Reviewed-by:
Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by:
Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by:
Steve French <stfrench@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Naveen N. Rao authored
commit 49a962c0 upstream. DYNAMIC_FTRACE_WITH_DIRECT_CALLS should depend on DYNAMIC_FTRACE_WITH_REGS since we need ftrace_regs_caller(). Link: https://lkml.kernel.org/r/fc4b257ea8689a36f086d2389a9ed989496ca63a.1606412433.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: 763e34e7 ("ftrace: Add register_ftrace_direct()") Signed-off-by:
Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Naveen N. Rao authored
commit 4c75b0ff upstream. On powerpc, kprobe-direct.tc triggered FTRACE_WARN_ON() in ftrace_get_addr_new() followed by the below message: Bad trampoline accounting at: 000000004222522f (wake_up_process+0xc/0x20) (f0000001) The set of steps leading to this involved: - modprobe ftrace-direct-too - enable_probe - modprobe ftrace-direct - rmmod ftrace-direct <-- trigger The problem turned out to be that we were not updating flags in the ftrace record properly. From the above message about the trampoline accounting being bad, it can be seen that the ftrace record still has FTRACE_FL_TRAMP set though ftrace-direct module is going away. This happens because we are checking if any ftrace_ops has the FTRACE_FL_TRAMP flag set _before_ updating the filter hash. The fix for this is to look for any _other_ ftrace_ops that also needs FTRACE_FL_TRAMP. Link: https://lkml.kernel.org/r/56c113aa9c3e10c19144a36d9684c7882bf09af5.1606412433.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: a124692b ("ftrace: Enable trampoline when rec count returns back to one") Signed-off-by:
Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steven Rostedt (VMware) authored
commit 68e10d5f upstream. The current ring buffer logic checks to see if the updating of the event buffer was interrupted, and if it is, it will try to fix up the before stamp with the write stamp to make them equal again. This logic is flawed, because if it is not interrupted, the two are guaranteed to be different, as the current event just updated the before stamp before allocation. This guarantees that the next event (this one or another interrupting one) will think it interrupted the time updates of a previous event and inject an absolute time stamp to compensate. The correct logic is to always update the timestamps when traversing to a new sub buffer. Cc: stable@vger.kernel.org Fixes: a389d86f ("ring-buffer: Have nested events still record running time stamp") Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-