I noticed that cpufreq almost never gets the CPU into lower frequency states (even when idle). Do you consider this related, or should we use a different issue?
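One way to put numbers on the "almost never gets into lower frequency states" observation is the cpufreq statistics file. This is a minimal sketch, assuming the kernel is built with CONFIG_CPU_FREQ_STAT so that /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state exists; each line of that file is a "<frequency in kHz> <time>" pair. The function name and the sample values in the usage note are made up for illustration.

```python
def parse_time_in_state(text):
    """Return {freq_khz: share_of_time} from the contents of time_in_state.

    Each input line is expected to be "<freq_khz> <time_ticks>".
    Lines that do not have exactly two fields are skipped.
    """
    ticks = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        ticks[int(parts[0])] = int(parts[1])
    total = sum(ticks.values())
    # Avoid dividing by zero when no time has been accounted yet.
    return {freq: (t / total if total else 0.0) for freq, t in ticks.items()}
```

Usage would be something like `parse_time_in_state(open("/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state").read())`; a hypothetical input of `"800000 100\n1000000 300\n"` gives a 25% / 75% split.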
The solution (workaround) here will involve Linux and ATF. As an update, here is what I plan to do (hopefully by the 25th of June): test the latest upstream RFC patches that are being discussed. I prepared the forks/branches:
Also, I created the Librem5/trusted-firmware-a repo for us to use and put possible out-of-tree changes in (librem5 branch).
So far it's not clear whether cpuidle on imx8mq will be supported by mainline Linux. That shouldn't be a big deal, at least as long as it's actively discussed ( https://lists.infradead.org/pipermail/linux-arm-kernel/2019-June/658494.html ). Worst case will be that any serious imx8mq user will need an out-of-tree change anyways.
Worst case will be that any serious imx8mq user will need an out-of-tree change anyways.
Which would be pretty bad since lots of people expect to run mainline. We should join the upstream discussion once we know if there's a reliable way to wake up (which doesn't seem to be clear yet).
I'd be happy about hints on how to test this better! I look at interrupts and what powertop says, but powertop doesn't really work on ARM, and I'm not even sure whether we are supposed to see real "CPU wake" interrupts with Abel's workarounds.
So there's one solution posted by Abel from March, and one from June. The older one I have "tested" too, but that's not pushed into my trees. But what I notice is the same for both versions:
of course, "powertop" then has a cpuidle driver and tries to report "WFI" stats: cpu-sleep: 0%, WFI: 99+%. But again, I'm not sure if that is accurate... temperature recordings don't really show a difference.
the heating-curve of the workaround:
looks exactly the same as without it:
(30s units on the x axis, different amount of samples, the details don't matter)
So to the code path: arch/arm64/kernel/process.c's cpu_do_idle() is called constantly:
With the workaround ca. once every 1ms
Without it ca. every 100us
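To make the two rough intervals comparable, they can be turned into call rates. This is only arithmetic on the "ca." figures above, not a measurement; the function name is made up for illustration.

```python
def calls_per_second(period_us):
    """Convert an approximate call period in microseconds to calls per second."""
    return 1_000_000 / period_us

with_workaround = calls_per_second(1000)    # ca. once every 1 ms
without_workaround = calls_per_second(100)  # ca. once every 100 us

# How many times more often cpu_do_idle() is entered without the workaround.
ratio = without_workaround / with_workaround
```

So, taking the rough numbers at face value, the idle path is entered about ten times less often with the workaround.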
And we use __cpu_do_idle() which goes straight to the "wait for interrupt" assembler instruction.
Without the workaround, is the WFI call simply not implemented and just returns?
So while we don't see any difference, and even more interrupts (arch_timer, see above), can it be that this workaround actually does what it should do but we simply constantly interrupt all cpus (due to other bugs or missing things)?
Apparently, Anson is still working on this, and TIMER_IMX_SYS_CTR needs to be configured in (see https://lkml.org/lkml/2019/6/21/55 ), plus the necessary support added (see https://lkml.org/lkml/2019/6/21/103 ). Simply grabbing these doesn't give me a cooler SoC yet, though; something (a clock definition?) might still be missing...
We won't see a difference in the heating-curve (according to Abel, see https://lkml.org/lkml/2019/6/28/237 ). What I tried to do now (don't laugh :) this should be more of a side-note): is to use powertop on my Laptop, with the devkit connected, and compare power-consumption -.-:
I do see a slight difference (but really, these are still not hard numbers you can read anything into): with the workaround, consumption increases by ca. 4.4 W. Without it (using our Librem5/linux-next), consumption increases by ca. 5.1 W. I did a few iterations, and there were some in which I also saw a ca. 5 W increase with the workaround. But on average, it seems to consume less power, if I can trust my battery and all.
I asked Abel about "arch_timer" and showed him our idlestat numbers; I hope he can quickly confirm that this makes sense to him.
but "cpu-sleep" is never used, see idlestat output above. Actually, it is, but only during startup, about up until DCSS/DRM loading:
[ 0.922674] dcss-core 32e00000.dcss: dcss_submodules_init 178.: ret: 0
[ 0.923621] cpu_pm_enter
[ 0.923638] cpu_pm_enter
[ 0.923643] cpu_pm_enter return val: 0
[ 0.923646] arm_cpuidle_suspend index 1 cpu 1
[ 0.929480] cpu_pm_enter return val: 0
[ 0.929483] arm_cpuidle_suspend index 1 cpu 2
[ 0.929531] Registering platform device 'imx-dcss-crtc.0'. Parent at 32e000000.dcss
[ 0.929535] device: 'imx-dcss-crtc.0': device_add
[ 0.929546] bus: 'platform': add device imx-dcss-crtc.0
[ 0.929573] PM: Adding info for platform:imx-dcss-crtc.0
The irq-imx-gpcv2 driver is loaded correctly.
(only until ca. that point during startup) arch/arm64/kernel/cpuidle.c's
arm_cpuidle_suspend() is called via kernel/cpu_pm.c's cpu_pm_enter(). Never again afterwards.
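Whether "cpu-sleep" is still entered after boot can also be cross-checked from userspace without extra printk calls. The following is a sketch, assuming the standard cpuidle sysfs layout (/sys/devices/system/cpu/cpuN/cpuidle/stateM/ with "name" and "usage" files, where "usage" counts how often the state was entered). The function names are made up for illustration.

```python
from pathlib import Path

def read_cpuidle_usage(root="/sys/devices/system/cpu"):
    """Return {(cpu, state_name): entry_count} from the cpuidle sysfs files."""
    counts = {}
    for state_dir in sorted(Path(root).glob("cpu[0-9]*/cpuidle/state[0-9]*")):
        cpu = state_dir.parent.parent.name          # e.g. "cpu0"
        name = (state_dir / "name").read_text().strip()
        counts[(cpu, name)] = int((state_dir / "usage").read_text())
    return counts

def usage_delta(before, after):
    """Return how often each state was entered between two snapshots."""
    return {key: after[key] - before.get(key, 0) for key in after}
```

Taking one snapshot, waiting a few seconds on an idle system, and passing a second snapshot to `usage_delta()` should show a delta of 0 for every "cpu-sleep" entry if that state really is never used after startup.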
My guess right now would be that we mess things up somewhere in our out-of-tree (drm) changes.
Given that we don't even have busfreq and the dcss driver will look totally different in mainline anyway (it's basically the last part of the display stack we copied mostly verbatim from NXP), I'd say it's o.k. to disable.
Which one should I build against in order to merge this?
Roughly, this will make the devkit (cat /sys/class/thermal/thermal_zone*/temp) heat up to ca. 66 C without a display connected, zero load (instead of the 75 in the diagrams).
with the display connected, after using some apps (and having 85 C) it cools down to ca. 72 C here. It all depends though.
my first test-run failed at 77 degrees for whatever reason :(
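For recording these temperatures repeatedly instead of reading the `cat` output by hand, something like the following could be used. It is a sketch that relies only on the sysfs thermal files mentioned above, which report millidegrees Celsius; the function names are made up for illustration.

```python
from pathlib import Path

def millidegrees_to_celsius(raw):
    """Convert the raw content of a thermal_zone 'temp' file to degrees Celsius."""
    return int(raw.strip()) / 1000.0

def read_thermal_zones(root="/sys/class/thermal"):
    """Return {zone_name: temperature_in_celsius} for all readable thermal zones."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            temps[zone.name] = millidegrees_to_celsius((zone / "temp").read_text())
        except (OSError, ValueError):
            # Some zones can fail to read or return garbage; skip them.
            continue
    return temps
```

Calling `read_thermal_zones()` in a loop with a fixed sleep in between gives a heating curve with a known sample interval, which makes two runs easier to compare.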
TODO
more testing! (this is actually not yet really tested!)
try to port this over for using NXP ATF regardless of what NXP says?
identify the Linux patches from devfreq that make us use NXP ATF now
identify the ATF patches (from NXP) that might need to be ported over to mainline ATF for devfreq. Pasting what Angus followed up with on this:
There are 3 things the ATF busfreq implements:
DDR training based on tables passed in by u-boot that are dependent on the SOC rev.
Store the DDR training tables ( not sure where these get stored but I assume OCRAM )
A method for the kernel ( or u-boot ) to change the DDR frequency.
tested and seems to work fine. let's close this issue when we switch back to mainline ATF for our kernels. That is now blocked by devfreq: https://source.puri.sm/Librem5/linux-next/issues/17 so here there's currently nothing to do.
booting from emmc doesn't work when having ARM_PSCI_CPUIDLE enabled:
[ 1.638207] imx-cpufreq-dt imx-cpufreq-dt: cpu speed grade 3 mkt segment 0 supported-hw 0x8 0x1
[ 1.683487] mmc1: SDHCI controller on 30b50000.mmc [30b50000.mmc] using ADMA
[ 1.695528] input: gpio-keys as /devices/platform/gpio-keys/input/input0
[ 1.708037] input: bd718xx-pwrkey as /devices/platform/soc@0/soc@0:bus@30800000/30a20000.i2c/i2c-0/0-004b/gpio-keys.0.auto/input/input1
[ 1.721939] snvs_rtc 30370000.snvs:snvs-rtc-lp: setting system clock to 1970-01-01T00:00:00 UTC (0)
[ 1.723543] mmc1: new high speed SDIO card at address fffd
but the psci checker seems to be ok:
[ 1.717281] imx-cpufreq-dt imx-cpufreq-dt: cpu speed grade 3 mkt segment 0 supported-hw 0x8 0x1
[ 1.763172] mmc1: SDHCI controller on 30b50000.mmc [30b50000.mmc] using ADMA
[ 1.775368] input: gpio-keys as /devices/platform/gpio-keys/input/input1
[ 1.784397] input: bd718xx-pwrkey as /devices/platform/soc@0/soc@0:bus@30800000/30a20000.i2c/i2c-0/0-004b/gpio-keys.0.auto/input/input2
[ 1.798160] snvs_rtc 30370000.snvs:snvs-rtc-lp: setting system clock to 1970-01-01T00:00:00 UTC (0)
[ 1.807668] psci_checker: PSCI checker started using 4 CPUs
[ 1.813500] psci_checker: Starting hotplug tests
[ 1.818351] psci_checker: Trying to turn off and on again all CPUs
[ 1.826388] IRQ 6: no longer affine to CPU0
[ 1.826805] CPU0: shutdown
[ 1.834060] psci: CPU0 killed.
[ 1.840096] CPU1: shutdown
[ 1.842938] psci: CPU1 killed.
[ 1.848633] CPU2: shutdown
[ 1.851500] psci: CPU2 killed.
[ 1.856376] Detected VIPT I-cache on CPU0
[ 1.856407] GICv3: CPU0: found redistributor 0 region 0:0x0000000038880000
[ 1.856459] CPU0: Booted secondary processor 0x0000000000 [0x410fd034]
[ 1.862897] mmc1: new high speed SDIO card at address fffd
[ 1.882136] Detected VIPT I-cache on CPU1
[ 1.882155] GICv3: CPU1: found redistributor 1 region 0:0x00000000388a0000
[ 1.882186] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
[ 1.902604] Detected VIPT I-cache on CPU2
[ 1.902624] GICv3: CPU2: found redistributor 2 region 0:0x00000000388c0000
[ 1.902653] CPU2: Booted secondary processor 0x0000000002 [0x410fd034]
[ 1.921604] psci_checker: Trying to turn off and on again group 0 (CPUs 0-3)
[ 1.930565] IRQ 6: no longer affine to CPU0
[ 1.930691] CPU0: shutdown
[ 1.937961] psci: CPU0 killed.
[ 1.942402] IRQ 6: no longer affine to CPU1
[ 1.942518] CPU1: shutdown
[ 1.949759] psci: CPU1 killed.
[ 1.954370] CPU2: shutdown
[ 1.957249] psci: CPU2 killed.
[ 1.961582] Detected VIPT I-cache on CPU0
[ 1.961600] GICv3: CPU0: found redistributor 0 region 0:0x0000000038880000
[ 1.961632] CPU0: Booted secondary processor 0x0000000000 [0x410fd034]
[ 1.981892] Detected VIPT I-cache on CPU1
[ 1.981910] GICv3: CPU1: found redistributor 1 region 0:0x00000000388a0000
[ 1.981941] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
[ 2.002301] Detected VIPT I-cache on CPU2
[ 2.002319] GICv3: CPU2: found redistributor 2 region 0:0x00000000388c0000
[ 2.002348] CPU2: Booted secondary processor 0x0000000002 [0x410fd034]
[ 2.021288] psci_checker: Hotplug tests passed OK
[ 2.026241] psci_checker: Starting suspend tests (10 cycles per state)
[ 2.033683] psci_checker: CPU 1 entering suspend cycles, states 1 through 1
[ 2.033685] psci_checker: CPU 3 entering suspend cycles, states 1 through 1
[ 2.033687] psci_checker: CPU 0 entering suspend cycles, states 1 through 1
[ 2.033689] psci_checker: CPU 2 entering suspend cycles, states 1 through 1
[ 2.091607] psci_checker: CPU 0 suspend test results: success 10, shallow states 0, errors 0
[ 2.100497] psci_checker: CPU 1 suspend test results: success 10, shallow states 0, errors 0
[ 2.109361] psci_checker: CPU 2 suspend test results: success 10, shallow states 0, errors 0
[ 2.118227] psci_checker: CPU 3 suspend test results: success 10, shallow states 0, errors 0
[ 2.127106] psci_checker: Suspend tests passed OK
[ 2.132030] psci_checker: PSCI checker completed
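When comparing several boots, the per-CPU suspend results can be pulled out of the log automatically rather than read by eye. This is a sketch that assumes the psci_checker result lines have exactly the wording seen in the log above; the function names are made up for illustration.

```python
import re

# Matches lines such as:
# "psci_checker: CPU 0 suspend test results: success 10, shallow states 0, errors 0"
RESULT_RE = re.compile(
    r"psci_checker: CPU (\d+) suspend test results: "
    r"success (\d+), shallow states (\d+), errors (\d+)"
)

def parse_suspend_results(log_text):
    """Return {cpu: {"success": n, "shallow": n, "errors": n}} from dmesg text."""
    return {
        int(cpu): {"success": int(ok), "shallow": int(shallow), "errors": int(err)}
        for cpu, ok, shallow, err in RESULT_RE.findall(log_text)
    }

def no_suspend_errors(results):
    """True if at least one CPU reported results and none reported errors."""
    return bool(results) and all(r["errors"] == 0 for r in results.values())
```

Feeding it the captured `dmesg` output of a boot should give one entry per CPU; for the log above, all four CPUs report 10 successes and 0 errors.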
issue resolved. again: let's close this issue when we switch back to mainline ATF for our kernels. That is now blocked by devfreq: https://source.puri.sm/Librem5/linux-next/issues/17 so here there's currently nothing to do.