    perf/x86: Add Intel RAPL PMU support · 4788e5b4
    Stephane Eranian authored
    
    
    This patch adds a new uncore PMU to expose the Intel
    RAPL energy consumption counters. Up to 3 counters,
    each counting a particular RAPL event, are exposed.
    
    The RAPL counters are available on Intel SandyBridge,
    IvyBridge and Haswell. The server SKUs add a third counter.
    
    The following events are available and exposed in sysfs:
    
      - power/energy-cores: energy consumed by all cores on the socket
      - power/energy-pkg: energy consumed by all cores + the LLC (last-level cache)
      - power/energy-dram: energy consumed by DRAM (servers only)
    
    For each event, both the unit (Joules) and the scale (2^-32 J)
    are exposed in sysfs for use by perf stat and other tools.
    The files are:
    
    	/sys/devices/power/events/energy-*.unit
    	/sys/devices/power/events/energy-*.scale
    
    The RAPL PMU is uncore by nature and is implemented such
    that it only works in system-wide mode. Measuring only
    one CPU per socket is sufficient. The /sys/devices/power/cpumask
    file can be used by tools to figure out which CPUs to monitor
    by default. For instance, on a 2-socket system, 2 CPUs
    (one on each socket) will be shown.
    
    All the counters measure in the same unit (exposed via sysfs).
    The perf_events API exposes all RAPL counters as 64-bit integers
    counting in units of 2^-32 Joules (about 0.23 nJ). User-level tools
    must multiply the counts by 2^-32 to obtain Joules. This
    fixed-point representation keeps floating-point math out of the
    kernel, which avoids it whenever possible because it is expensive
    (the user floating-point state must be saved). There is no loss
    of precision, because the hardware energy unit is itself a
    power-of-two fraction of a Joule no finer than 2^-32 J.
    Thanks to PeterZ for suggesting this approach.
    
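    To convert a raw count C, accumulated over an interval of
    length time (in seconds), to Watts:
       W = C * 2.3 / (1e10 * time)   (approximate, since 2^-32 ~= 2.3283e-10)
    or, exactly, W = ldexp(C, -32) / time.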
    
    The RAPL PMU is a new standalone PMU that registers with the
    perf_event core subsystem. The PMU type (attr->type) is
    dynamically allocated and is available from /sys/devices/power/type.
    
    Sampling is not supported by the RAPL PMU. There is no
    privilege level filtering either.
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
    Reviewed-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Cc: acme@redhat.com
    Cc: jolsa@redhat.com
    Cc: zheng.z.yan@intel.com
    Cc: bp@alien8.de
    Link: http://lkml.kernel.org/r/1384275531-10892-4-git-send-email-eranian@google.com
    
    
    Signed-off-by: Ingo Molnar <mingo@kernel.org>