Hangs / RCU stalls on 6.6.52+
Since recent kernel updates, the phone randomly becomes completely unresponsive. Sometimes something like this gets printed on the serial console (though often the hang is completely silent):
[ 212.950612] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 212.956927] rcu: 0-...!: (5976 ticks this GP) idle=c60c/1/0x4000000000000004 softirq=12845/12846 fqs=817
[ 212.966646] rcu: (t=5250 jiffies g=18261 q=190 ncpus=4)
[ 212.972055] rcu: rcu_preempt kthread starved for 3494 jiffies! g18261 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 212.982418] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 212.991630] rcu: RCU grace-period kthread stack dump:
[ 212.996770] task:rcu_preempt state:R running task stack:0 pid:17 ppid:2 flags:0x00000008
[ 213.006910] Call trace:
[ 213.009428] __switch_to+0x1d0/0x2e0
[ 213.013232] __schedule+0x87c/0x1b88
[ 213.016904] schedule+0x10c/0x278
[ 213.020315] schedule_timeout+0x128/0x400
[ 213.024444] rcu_gp_fqs_loop+0x1a0/0xc90
[ 213.028521] rcu_gp_kthread+0x438/0x5c8
[ 213.032465] kthread+0x2c0/0x350
[ 213.035828] ret_from_fork+0x10/0x20
[ 213.039527] rcu: Stack dump where RCU GP kthread last ran:
[ 213.045133] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.0-1-librem5 #1
[ 213.051944] Hardware name: Purism Librem 5r4 (DT)
[ 213.056730] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 213.063792] pc : memset+0x54/0x78
[ 213.067212] lr : memset+0x4c/0x78
[ 213.070597] sp : ffff800080007700
[ 213.074014] x29: ffff800080007700 x28: ffff00000114dc80 x27: dfff800000000000
[ 213.081279] x26: ffff800080007a10 x25: ffff800080007908 x24: 0000000000000000
[ 213.088580] x23: ffff8000800078d0 x22: ffff8000835acc50 x21: 0000000000000000
[ 213.095835] x20: ffff00000114dca8 x19: ffff800080007c40 x18: 0000000000000000
[ 213.103092] x17: 1fffe00011d490d8 x16: 0000000000000001 x15: 1fffe00011d490ea
[ 213.110353] x14: 0000000000000007 x13: ffff8000835ad480 x12: ffff700010000f24
[ 213.117637] x11: 1ffff00010000f23 x10: ffff700010000f23 x9 : dfff800000000000
[ 213.124919] x8 : ffff800080007920 x7 : 0000000000000000 x6 : 000000000000000a
[ 213.132225] x5 : ffff8000800078d0 x4 : 0000000000000000 x3 : 0000000000000010
[ 213.139477] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff8000800078d0
[ 213.146785] Call trace:
[ 213.149304] memset+0x54/0x78
[ 213.152342] update_sd_lb_stats.constprop.0+0x204/0x2ca0
[ 213.157785] find_busiest_group+0xc4/0xbc8
[ 213.161999] load_balance+0x280/0x1ba0
[ 213.165849] rebalance_domains+0x474/0x928
[ 213.170047] _nohz_idle_balance.isra.0+0x3a8/0x5e8
[ 213.174943] run_rebalance_domains+0xd8/0x148
[ 213.179375] handle_softirqs+0x27c/0x9f8
[ 213.183437] __do_softirq+0x1c/0x28
[ 213.187019] ____do_softirq+0x18/0x30
[ 213.190754] call_on_irq_stack+0x24/0x58
[ 213.194774] do_softirq_own_stack+0x24/0x38
[ 213.199057] irq_exit_rcu+0x1a0/0x248
[ 213.202830] el1_interrupt+0x38/0x58
[ 213.206499] el1h_64_irq_handler+0x18/0x28
[ 213.210700] el1h_64_irq+0x64/0x68
[ 213.214183] arch_local_irq_enable+0x4/0x8
[ 213.218419] cpuidle_enter+0x60/0xb8
[ 213.222125] do_idle+0x338/0x480
[ 213.225438] cpu_startup_entry+0x64/0x80
[ 213.229494] rest_init+0x168/0x190
[ 213.233009] arch_call_rest_init+0x1c/0x28
[ 213.237245] start_kernel+0x2d8/0x380
[ 213.240983] __primary_switched+0xc0/0xd0
[ 213.245152] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.0-1-librem5 #1
[ 213.251920] Hardware name: Purism Librem 5r4 (DT)
[ 213.256672] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 213.263726] pc : memset+0x54/0x78
[ 213.267141] lr : memset+0x4c/0x78
[ 213.270556] sp : ffff800080007700
[ 213.273915] x29: ffff800080007700 x28: ffff00000114dc80 x27: dfff800000000000
[ 213.281199] x26: ffff800080007a10 x25: ffff800080007908 x24: 0000000000000000
[ 213.288481] x23: ffff8000800078d0 x22: ffff8000835acc50 x21: 0000000000000000
[ 213.295764] x20: ffff00000114dca8 x19: ffff800080007c40 x18: 0000000000000000
[ 213.303048] x17: 1fffe00011d490d8 x16: 0000000000000001 x15: 1fffe00011d490ea
[ 213.310337] x14: 0000000000000007 x13: ffff8000835ad480 x12: ffff700010000f24
[ 213.317618] x11: 1ffff00010000f23 x10: ffff700010000f23 x9 : dfff800000000000
[ 213.324930] x8 : ffff800080007920 x7 : 0000000000000000 x6 : 000000000000000a
[ 213.332180] x5 : ffff8000800078d0 x4 : 0000000000000000 x3 : 0000000000000010
[ 213.339459] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff8000800078d0
[ 213.346739] Call trace:
[ 213.349254] memset+0x54/0x78
[ 213.352350] update_sd_lb_stats.constprop.0+0x204/0x2ca0
[ 213.357742] find_busiest_group+0xc4/0xbc8
[ 213.361966] load_balance+0x280/0x1ba0
[ 213.365816] rebalance_domains+0x474/0x928
[ 213.369986] _nohz_idle_balance.isra.0+0x3a8/0x5e8
[ 213.374881] run_rebalance_domains+0xd8/0x148
[ 213.379339] handle_softirqs+0x27c/0x9f8
[ 213.383362] __do_softirq+0x1c/0x28
[ 213.386932] ____do_softirq+0x18/0x30
[ 213.390684] call_on_irq_stack+0x24/0x58
[ 213.394680] do_softirq_own_stack+0x24/0x38
[ 213.398988] irq_exit_rcu+0x1a0/0x248
[ 213.402719] el1_interrupt+0x38/0x58
[ 213.406392] el1h_64_irq_handler+0x18/0x28
[ 213.410592] el1h_64_irq+0x64/0x68
[ 213.414066] arch_local_irq_enable+0x4/0x8
[ 213.418271] cpuidle_enter+0x60/0xb8
[ 213.421979] do_idle+0x338/0x480
[ 213.425285] cpu_startup_entry+0x64/0x80
[ 213.429342] rest_init+0x168/0x190
[ 213.432823] arch_call_rest_init+0x1c/0x28
[ 213.437032] start_kernel+0x2d8/0x380
[ 213.440796] __primary_switched+0xc0/0xd0
[ 315.034594] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 315.041216] rcu: 0-....: (3 ticks this GP) idle=116c/1/0x4000000000000004 softirq=13037/13037 fqs=1082
[ 315.050775] rcu: (detected by 2, t=5253 jiffies, g=18573, q=247 ncpus=4)
[ 315.057677] Task dump for CPU 0:
[ 315.060993] task:swapper/0 state:R running task stack:0 pid:0 ppid:0 flags:0x0000000a
[ 315.071104] Call trace:
[ 315.073635] __switch_to+0x1d0/0x2e0
[ 315.077444] 0xdfff800000000000
Note that I first noticed this after updating from 6.6.40, skipping some kernel versions in between. I have now verified that these hangs occur on 6.6.52 onwards.
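To narrow down which of the skipped point releases introduced the hang, the range between the two versions could be bisected. Below is a minimal sketch of that search, assuming 6.6.40 really is good, 6.6.52 is bad, and the behaviour changes exactly once in between. The helper names `first_bad` and `ask` are made up for illustration. Since the hangs are random, a "no hang" answer is only as reliable as the time spent testing that kernel.

```python
from typing import Callable


def first_bad(good: int, bad: int, is_bad: Callable[[int], bool]) -> int:
    """Binary-search the patch levels in (good, bad] for the first bad one.

    Assumes `good` is known good, `bad` is known bad, and that every
    release after the first bad one is bad as well.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return bad


def ask(patch: int) -> bool:
    """Hypothetical manual check: boot 6.6.<patch>, then report the result."""
    answer = input(f"Does 6.6.{patch} hang? [y/n] ")
    return answer.strip().lower().startswith("y")
```

Running `first_bad(40, 52, ask)` would prompt for at most four kernels to test before naming the first release that hangs.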