1. 12 Dec, 2018 1 commit
    • seccomp: add a return code to trap to userspace · 6a21cc50
      Tycho Andersen authored
      This patch introduces a means for syscalls matched in seccomp to notify
      some other task that a particular filter has been triggered.
      The motivation for this is primarily for use with containers. For example,
      if a container does an init_module(), we obviously don't want to load this
      untrusted code, which may be compiled for the wrong version of the kernel
      anyway. Instead, we could parse the module image, figure out which module
      the container is trying to load and load it on the host.
      As another example, containers cannot mount() in general since various
      filesystems assume a trusted image. However, if an orchestrator knows that
      e.g. a particular block device has not been exposed to a container for
      writing, it may want to allow the container to mount that block device (that
      is, handle the mount for it).
      This patch adds functionality that is already possible via at least two
      other means that I know about, both of which involve ptrace(): first, one
      could ptrace attach, and then iterate through syscalls via PTRACE_SYSCALL.
      Unfortunately this is slow, so a faster version would be to install a
      filter that does SECCOMP_RET_TRACE, which triggers a PTRACE_EVENT_SECCOMP.
      Since ptrace allows only one tracer, if the container runtime is that
      tracer, users inside the container (or outside) trying to debug it will not
      be able to use ptrace, which is annoying. It also means that older
      distributions based on Upstart cannot boot inside containers using ptrace,
      since upstart itself uses ptrace to monitor services while starting.
      The actual implementation of this is fairly small, although getting the
      synchronization right was/is slightly complex.
      Finally, it's worth noting that the classic seccomp TOCTOU of reading
      memory data from the task still applies here, but can be avoided with
      careful design of the userspace handler: if the userspace handler reads all
      of the task memory that is necessary before applying its security policy,
      the tracee's subsequent memory edits will not be read by the tracer.
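The read-then-decide pattern described above can be sketched as follows. This is a minimal illustration, not code from the patch: task memory is simulated with a `bytearray`, and `handle_request` and `ALLOWED_DEVICES` are hypothetical names. A real handler would read the task's memory through whatever mechanism it has available.

```python
from typing import Tuple

# Hypothetical policy: block devices the container is allowed to mount.
ALLOWED_DEVICES = {"/dev/vdb"}


def handle_request(task_memory: bytearray, offset: int, length: int) -> Tuple[bool, str]:
    """Decide on a request whose argument lives in (simulated) task memory."""
    # Step 1: copy everything the policy needs out of task memory, once.
    snapshot = bytes(task_memory[offset:offset + length])
    device = snapshot.decode("ascii")

    # Step 2: apply the security policy to the private copy only.
    allowed = device in ALLOWED_DEVICES

    # Step 3: act on the same copy. Task memory is never read again, so
    # later edits by the task cannot change what was checked or used.
    return allowed, device


memory = bytearray(b"/dev/vdb")
allowed, device = handle_request(memory, 0, 8)
memory[:] = b"/dev/sda"  # the task rewrites its buffer after the check
print(allowed, device)   # the handler still holds the value it checked
```

The point of the design is that the policy check and the later action both use `snapshot`; re-reading `task_memory` at step 3 would reintroduce the race.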
      Signed-off-by: Tycho Andersen <tycho@tycho.ws>
      CC: Kees Cook <keescook@chromium.org>
      CC: Andy Lutomirski <luto@amacapital.net>
      CC: Oleg Nesterov <oleg@redhat.com>
      CC: Eric W. Biederman <ebiederm@xmission.com>
      CC: "Serge E. Hallyn" <serge@hallyn.com>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      CC: Christian Brauner <christian@brauner.io>
      CC: Tyler Hicks <tyhicks@canonical.com>
      CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
      Signed-off-by: Kees Cook <keescook@chromium.org>
  2. 28 Nov, 2018 1 commit
    • x86/speculation: Add prctl() control for indirect branch speculation · 9137bb27
      Thomas Gleixner authored
      Extend the PR_GET_SPECULATION_CTRL and PR_SET_SPECULATION_CTRL prctls to
      allow fine-grained per-task control of indirect branch speculation via
      STIBP and IBPB.
       Check indirect branch speculation status with
       Enable indirect branch speculation with
       Disable indirect branch speculation with
       Force disable indirect branch speculation with
      See Documentation/userspace-api/spec_ctrl.rst.
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.de
  3. 09 May, 2018 1 commit
  4. 08 May, 2018 1 commit
    • seccomp: Don't special case audited processes when logging · 326bee02
      Tyler Hicks authored
      Seccomp logging for "handled" actions such as RET_TRAP, RET_TRACE, or
      RET_ERRNO can be very noisy for processes that are being audited. This
      patch modifies the seccomp logging behavior to treat processes that are
      being inspected via the audit subsystem the same as processes that
      aren't under inspection. Handled actions will no longer be logged just
      because the process is being inspected. Since v4.14, applications have
      the ability to request logging of handled actions by using the
      SECCOMP_FILTER_FLAG_LOG flag when loading seccomp filters.
      With this patch, the logic for deciding if an action will be logged is:
        if action == RET_ALLOW:
          do not log
        else if action not in actions_logged:
          do not log
        else if action == RET_KILL:
          log
        else if action == RET_LOG:
          log
        else if filter-requests-logging:
          log
        else:
          do not log
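The decision logic above can be written as a small function. This is a sketch under assumptions: actions are represented by their sysctl names ("kill", "log", and so on), `should_log` is a hypothetical name, and the outcomes of the RET_KILL and RET_LOG branches reflect a reading of this commit series rather than the kernel source itself.

```python
def should_log(action: str, actions_logged: set, filter_requests_logging: bool) -> bool:
    """Return True if a seccomp action would be logged under the logic above."""
    if action == "allow":
        return False                  # RET_ALLOW is never logged
    if action not in actions_logged:
        return False                  # the administrator disabled logging for it
    if action in ("kill", "log"):
        return True                   # logged whenever permitted by the sysctl
    # Handled actions (trap, errno, trace) are logged only on request,
    # regardless of whether the process is being audited.
    return filter_requests_logging


logged = {"kill", "trap", "errno", "trace", "log"}
print(should_log("errno", logged, filter_requests_logging=False))
print(should_log("errno", logged, filter_requests_logging=True))
```

Note that no argument describes the audit state of the process, which is the behavioral change this commit makes.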
      Reported-by: Steve Grubb <sgrubb@redhat.com>
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  5. 04 May, 2018 1 commit
    • prctl: Add force disable speculation · 356e4bff
      Thomas Gleixner authored
      For certain use cases it is desired to enforce mitigations so they cannot
      be undone afterwards. That's important for loader stubs which want to
      prevent a child from disabling the mitigation again. This will also be
      used for seccomp(). The extra state preserving of the prctl state for SSB
      is a preparatory step for EBPF dynamic speculation control.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. 03 May, 2018 1 commit
    • prctl: Add speculation control prctls · b617cfc8
      Thomas Gleixner authored
      Add two new prctls to control aspects of speculation related vulnerabilities
      and their mitigations to provide finer grained control over performance
      impacting mitigations.
      PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
      which is selected with arg2 of prctl(2). The return value uses bits 0-2 with
      the following meaning:
      Bit  Define           Description
      0    PR_SPEC_PRCTL    Mitigation can be controlled per task by
                            PR_SET_SPECULATION_CTRL
      1    PR_SPEC_ENABLE   The speculation feature is enabled, mitigation is
                            disabled
      2    PR_SPEC_DISABLE  The speculation feature is disabled, mitigation is
                            enabled
      If all bits are 0 the CPU is not affected by the speculation misfeature.
      If PR_SPEC_PRCTL is set, then the per task control of the mitigation is
      available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
      misfeature will fail.
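As an illustration of the bit layout described above, the following sketch decodes a PR_GET_SPECULATION_CTRL return value. The bit positions come from the table in this commit message; `decode_spec_ctrl` and the dictionary keys are hypothetical names chosen for the example, and the value is passed in as a plain integer rather than obtained from a real prctl(2) call.

```python
# Bit positions as listed in the table above.
PR_SPEC_PRCTL = 1 << 0    # per-task control is available
PR_SPEC_ENABLE = 1 << 1   # speculation enabled, mitigation disabled
PR_SPEC_DISABLE = 1 << 2  # speculation disabled, mitigation enabled


def decode_spec_ctrl(value: int) -> dict:
    """Interpret bits 0-2 of a PR_GET_SPECULATION_CTRL return value."""
    bits = value & (PR_SPEC_PRCTL | PR_SPEC_ENABLE | PR_SPEC_DISABLE)
    if bits == 0:
        # All bits clear: the CPU is not affected by the misfeature.
        return {"affected": False, "per_task_control": False, "speculation": None}
    if bits & PR_SPEC_ENABLE:
        speculation = "enabled"
    elif bits & PR_SPEC_DISABLE:
        speculation = "disabled"
    else:
        speculation = None
    return {
        "affected": True,
        "per_task_control": bool(bits & PR_SPEC_PRCTL),
        "speculation": speculation,
    }


print(decode_spec_ctrl(PR_SPEC_PRCTL | PR_SPEC_ENABLE))
```

A caller would typically check `per_task_control` before attempting PR_SET_SPECULATION_CTRL, since the set operation fails when that bit is clear.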
      PR_SET_SPECULATION_CTRL controls, per task, the speculation misfeature
      selected by arg2 of prctl(2). arg3 is used to hand in the control value,
      i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE.
      The common return values are:
      EINVAL  prctl is not implemented by the architecture or the unused prctl()
              arguments are not 0
      ENODEV  arg2 is selecting a not supported speculation misfeature
      PR_SET_SPECULATION_CTRL has these additional return values:
      ERANGE  arg3 is incorrect, i.e. it's not either PR_SPEC_ENABLE or PR_SPEC_DISABLE
      ENXIO   prctl control of the selected speculation misfeature is disabled
      The first supported controllable speculation misfeature is
      PR_SPEC_STORE_BYPASS. Add the define so this can be shared between
      architectures.
      Based on an initial patch from Tim Chen and mostly rewritten.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  7. 14 Aug, 2017 5 commits
    • seccomp: Implement SECCOMP_RET_KILL_PROCESS action · 0466bdb9
      Kees Cook authored
      Right now, SECCOMP_RET_KILL_THREAD (née SECCOMP_RET_KILL) kills the
      current thread. There have been a few requests for this to kill the entire
      process (the thread group). This cannot be just changed (discovered when
      adding coredump support since coredumping kills the entire process)
      because there are userspace programs depending on the thread-kill behavior.
      Instead, implement SECCOMP_RET_KILL_PROCESS, which is 0x80000000, and can
      be processed as "-1" by the kernel, below the existing RET_KILL that is
      ABI-set to "0". For userspace, SECCOMP_RET_ACTION_FULL is added to expand
      the mask to the signed bit. Old userspace using the SECCOMP_RET_ACTION
      mask will see SECCOMP_RET_KILL_PROCESS as 0 still, but this would only
      be visible when examining the siginfo in a core dump from a RET_KILL_*,
      where it will think it was thread-killed instead of process-killed.
      Attempts to introduce this behavior via other ways (filter flags,
      seccomp struct flags, masked RET_DATA bits) all come with weird
      side-effects and baggage. This change preserves the central behavioral
      expectations of the seccomp filter engine without putting too great
      a burden on changes needed in userspace to use the new action.
      The new action is discoverable by userspace through either the new
      actions_avail sysctl or through the SECCOMP_GET_ACTION_AVAIL seccomp
      operation. If used without checking for availability, old kernels
      will treat RET_KILL_PROCESS as RET_KILL_THREAD (since the old mask
      will produce RET_KILL_THREAD).
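The mask arithmetic behind this compatibility argument can be checked directly. In the sketch below, 0x80000000 is the value given in the commit message; the two mask values are taken from the kernel's uapi seccomp header as an assumption, since the commit text does not spell them out, and `as_s32` is a hypothetical helper that mimics a signed 32-bit interpretation.

```python
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_KILL_THREAD = 0x00000000

# Assumed mask values (from the uapi header, not from the commit text).
SECCOMP_RET_ACTION = 0x7FFF0000       # old mask, excludes the sign bit
SECCOMP_RET_ACTION_FULL = 0xFFFF0000  # new mask, includes the sign bit


def as_s32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Old userspace or old kernels using the old mask see RET_KILL_THREAD.
old_view = SECCOMP_RET_KILL_PROCESS & SECCOMP_RET_ACTION
# The full mask keeps the sign bit, so the new action is distinguishable.
new_view = SECCOMP_RET_KILL_PROCESS & SECCOMP_RET_ACTION_FULL

print(hex(old_view), hex(new_view), as_s32(new_view))
```

Because the signed value is negative, the new action sorts below RET_KILL_THREAD (0), which is what lets it take precedence over every existing action.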
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Fabricio Voznika <fvoznika@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
    • seccomp: Rename SECCOMP_RET_KILL to SECCOMP_RET_KILL_THREAD · fd76875c
      Kees Cook authored
      In preparation for adding SECCOMP_RET_KILL_PROCESS, rename SECCOMP_RET_KILL
      to the more accurate SECCOMP_RET_KILL_THREAD.
      The existing selftest values are intentionally left as SECCOMP_RET_KILL
      just to be sure we're exercising the alias.
      Signed-off-by: Kees Cook <keescook@chromium.org>
    • seccomp: Action to log before allowing · 59f5cf44
      Tyler Hicks authored
      Add a new action, SECCOMP_RET_LOG, that logs a syscall before allowing
      the syscall. At the implementation level, this action is identical to
      the existing SECCOMP_RET_ALLOW action. However, it can be very useful when
      initially developing a seccomp filter for an application. The developer
      can set the default action to be SECCOMP_RET_LOG, maybe mark any
      obviously needed syscalls with SECCOMP_RET_ALLOW, and then put the
      application through its paces. A list of syscalls that triggered the
      default action (SECCOMP_RET_LOG) can be easily gleaned from the logs and
      that list can be used to build the syscall whitelist. Finally, the
      developer can change the default action to the desired value.
      This provides a more friendly experience than seeing the application get
      killed, then updating the filter and rebuilding the app, seeing the
      application get killed due to a different syscall, then updating the
      filter and rebuilding the app, etc.
      The functionality is similar to what's supported by the various LSMs.
      SELinux has permissive mode, AppArmor has complain mode, SMACK has
      bring-up mode, etc.
      SECCOMP_RET_LOG is given a lower value than SECCOMP_RET_ALLOW as allow
      while logging is slightly more restrictive than quietly allowing.
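To illustrate why the numeric ordering matters, the sketch below picks the winning action when several filters return different results, assuming the lowest action value takes precedence. The numeric values are quoted from the kernel's uapi seccomp header as an assumption (they are not listed in this commit message), and `most_restrictive` is a hypothetical helper.

```python
# Assumed action values, ordered from most to least restrictive.
ACTION_VALUES = {
    "kill": 0x00000000,
    "trap": 0x00030000,
    "errno": 0x00050000,
    "trace": 0x7FF00000,
    "log": 0x7FFC0000,
    "allow": 0x7FFF0000,
}


def most_restrictive(results: list) -> str:
    """Return the action name with the lowest value among filter results."""
    return min(results, key=lambda name: ACTION_VALUES[name])


# One filter allows quietly, another asks for logging: logging wins.
print(most_restrictive(["allow", "log"]))
# Any handled action outranks both log and allow.
print(most_restrictive(["log", "errno", "allow"]))
```

Placing SECCOMP_RET_LOG just below SECCOMP_RET_ALLOW means it only overrides a plain allow and never masks a more restrictive result.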
      Unfortunately, the tests added for SECCOMP_RET_LOG are not capable of
      inspecting the audit log to verify that the syscall was logged.
      With this patch, the logic for deciding if an action will be logged is:
      if action == RET_ALLOW:
        do not log
      else if action == RET_KILL && RET_KILL in actions_logged:
        log
      else if action == RET_LOG && RET_LOG in actions_logged:
        log
      else if filter-requests-logging && action in actions_logged:
        log
      else if audit_enabled && process-is-being-audited:
        log
      else:
        do not log
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
    • seccomp: Sysctl to configure actions that are allowed to be logged · 0ddec0fc
      Tyler Hicks authored
      Administrators can write to this sysctl to set the seccomp actions that
      are allowed to be logged. Any actions not found in this sysctl will not
      be logged.
      For example, all SECCOMP_RET_KILL, SECCOMP_RET_TRAP, and
      SECCOMP_RET_ERRNO actions would be loggable if "kill trap errno" were
      written to the sysctl. SECCOMP_RET_TRACE actions would not be logged
      since its string representation ("trace") wasn't present in the sysctl.
      The path to the sysctl is:
      The actions_avail sysctl can be read to discover the valid action names
      that can be written to the actions_logged sysctl with the exception of
      "allow". SECCOMP_RET_ALLOW actions cannot be configured for logging.
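The rules above for a value written to actions_logged can be sketched as a validation step. This is an illustration only: `parse_actions_logged` is a hypothetical helper operating on plain strings, not the kernel's parser, and it does not touch the sysctl files themselves.

```python
def parse_actions_logged(value: str, actions_avail: str) -> set:
    """Validate a space-separated actions_logged value and return it as a set."""
    available = set(actions_avail.split())
    names = value.split()
    for name in names:
        if name == "allow":
            # SECCOMP_RET_ALLOW cannot be configured for logging.
            raise ValueError("'allow' cannot be logged")
        if name not in available:
            raise ValueError("unknown action: " + name)
    return set(names)


loggable = parse_actions_logged("kill trap errno", "kill trap errno trace allow")
print("trace" in loggable)  # trace was not written, so it is not loggable
```

With the example from the commit message, "kill", "trap", and "errno" end up loggable while "trace" does not.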
      The default setting for the sysctl is to allow all actions to be logged
      except SECCOMP_RET_ALLOW. While only SECCOMP_RET_KILL actions are
      currently logged, an upcoming patch will allow applications to request
      additional actions to be logged.
      There's one important exception to this sysctl. If a task is
      specifically being audited, meaning that an audit context has been
      allocated for the task, seccomp will log all actions other than
      SECCOMP_RET_ALLOW despite the value of actions_logged. This exception
      preserves the existing auditing behavior of tasks with an allocated
      audit context.
      With this patch, the logic for deciding if an action will be logged is:
      if action == RET_ALLOW:
        do not log
      else if action == RET_KILL && RET_KILL in actions_logged:
        log
      else if audit_enabled && task-is-being-audited:
        log
      else:
        do not log
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
    • seccomp: Sysctl to display available actions · 8e5f1ad1
      Tyler Hicks authored
      This patch creates a read-only sysctl containing an ordered list of
      seccomp actions that the kernel supports. The ordering, from left to
      right, is the lowest action value (kill) to the highest action value
      (allow). Currently, a read of the sysctl file would return "kill trap
      errno trace allow". The contents of this sysctl file can be useful for
      userspace code as well as the system administrator.
      The path to the sysctl is:
      libseccomp and other userspace code can easily determine which actions
      the current kernel supports. The set of actions supported by the current
      kernel may be different than the set of action macros found in kernel
      headers that were installed where the userspace code was built.
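A userspace check along these lines might look like the sketch below. It works on the sysctl contents as a string (the example value is the one quoted in this commit message) and leaves reading the file to the caller; `unsupported_actions` is a hypothetical helper, not a libseccomp function.

```python
def unsupported_actions(wanted: list, actions_avail_contents: str) -> list:
    """Return the wanted action names that the running kernel does not list."""
    available = actions_avail_contents.split()
    return [name for name in wanted if name not in available]


# Contents a kernel at this commit would report, per the message above.
contents = "kill trap errno trace allow"
# A program built against newer headers might also want "log".
print(unsupported_actions(["kill", "log", "allow"], contents))
```

A non-empty result tells the program to fall back to an action the kernel does support instead of relying on macros from its build-time headers.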
      In addition, this sysctl will allow system administrators to know which
      actions are supported by the kernel and make it easier to configure
      exactly what seccomp logs through the audit subsystem. Support for this
      level of logging configuration will come in a future patch.
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
  8. 18 May, 2017 3 commits
  9. 02 Apr, 2017 2 commits