    hugetlb: add support for gigantic page allocation at runtime · 944d9fec
    Luiz Capitulino authored
    
    
    HugeTLB is limited to allocating hugepages whose size is less than
    MAX_ORDER order.  This is so because HugeTLB allocates hugepages via the
    buddy allocator.  Gigantic pages (that is, pages whose size is greater
    than MAX_ORDER order) have to be allocated at boottime.
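To make the size limit concrete, here is a small userspace sketch of the order arithmetic. It assumes 4 KiB base pages and a MAX_ORDER of 11; neither value is stated in this commit message, and both depend on the architecture and kernel configuration. The helper names are hypothetical.

```python
# Userspace illustration only; not kernel code.
PAGE_SHIFT = 12  # assumption: 4 KiB base pages
MAX_ORDER = 11   # assumption: value depends on kernel configuration


def page_order(size_bytes):
    """Return the allocation order (log2 of the number of base pages)."""
    pages = size_bytes >> PAGE_SHIFT
    if pages == 0 or pages & (pages - 1):
        raise ValueError("size must be a power-of-two multiple of the base page")
    return pages.bit_length() - 1


def is_gigantic(size_bytes):
    """A page is gigantic when its order is not below MAX_ORDER."""
    return page_order(size_bytes) >= MAX_ORDER


if __name__ == "__main__":
    for label, size in (("2 MiB", 2 << 20), ("1 GiB", 1 << 30)):
        print(label, "order", page_order(size), "gigantic:", is_gigantic(size))
```

Under these assumptions a 2 MiB hugepage is order 9 and can come from the buddy allocator, while a 1 GiB page is order 18 and cannot.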
    
    However, boottime allocation has at least two serious problems.  First,
    it doesn't support NUMA and second, gigantic pages allocated at boottime
    can't be freed.
    
    This commit solves both issues by adding support for allocating gigantic
    pages during runtime.  It works just like regular sized hugepages,
    meaning that the interface in sysfs is the same, it supports NUMA, and
    gigantic pages can be freed.
    
    For example, on x86_64 gigantic pages are 1GB big. To allocate two 1G
    gigantic pages on node 1, one can do:
    
     # echo 2 > \
       /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
    
    And to free them all:
    
     # echo 0 > \
       /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
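The same sysfs writes can be scripted. The following is a minimal sketch, not part of this commit: the function names and the `sysfs_root` parameter are hypothetical, and the read-back assumes the file reports the number of pages actually in the pool, which may be lower than the number requested.

```python
from pathlib import Path


def nr_hugepages_path(node, size_kb, sysfs_root="/sys"):
    """Build the per-node nr_hugepages path for a given hugepage size."""
    return (
        Path(sysfs_root)
        / "devices/system/node"
        / f"node{node}"
        / "hugepages"
        / f"hugepages-{size_kb}kB"
        / "nr_hugepages"
    )


def set_nr_hugepages(node, size_kb, count, sysfs_root="/sys"):
    """Request `count` pages on `node`, then return the value read back."""
    path = nr_hugepages_path(node, size_kb, sysfs_root)
    path.write_text(f"{count}\n")
    return int(path.read_text())
```

For the example above, `set_nr_hugepages(1, 1048576, 2)` run as root would request two 1G pages on node 1, and `set_nr_hugepages(1, 1048576, 0)` would free them. Comparing the returned value with the requested count is one way to notice a partial allocation.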
    
    The one problem with gigantic page allocation at runtime is that it
    can't be serviced by the buddy allocator.  To overcome that problem,
    this commit scans all zones from a node looking for a large enough
    contiguous region.  When one is found, it's allocated by using CMA, that
    is, we call alloc_contig_range() to do the actual allocation.  For
    example, on x86_64 we scan all zones looking for a 1GB contiguous
    region.  When one is found, it's allocated by alloc_contig_range().
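The scan can be modelled in userspace as follows. This is a simplified sketch under stated assumptions, not the kernel implementation: zones are plain `(start_pfn, end_pfn)` tuples with an exclusive end, candidates are assumed to be aligned to the gigantic page size, and the `try_alloc` callback is a hypothetical stand-in for alloc_contig_range().

```python
def align_up(pfn, nr_pages):
    """Round pfn up to the next multiple of nr_pages."""
    return (pfn + nr_pages - 1) // nr_pages * nr_pages


def find_gigantic_range(zones, nr_pages, try_alloc):
    """Return the first pfn of an allocated range, or None.

    zones     -- iterable of (start_pfn, end_pfn), end exclusive
    nr_pages  -- number of base pages in one gigantic page
    try_alloc -- callable(start_pfn, end_pfn) -> bool, stand-in for
                 alloc_contig_range() succeeding on that range
    """
    for zone_start, zone_end in zones:
        pfn = align_up(zone_start, nr_pages)
        # Only consider candidates that fit entirely inside the zone.
        while pfn + nr_pages <= zone_end:
            if try_alloc(pfn, pfn + nr_pages):
                return pfn
            pfn += nr_pages
    return None


if __name__ == "__main__":
    busy = {300}  # pfns that cannot be migrated in this toy model
    zones = [(0, 100), (100, 1000)]
    free = lambda start, end: not any(start <= b < end for b in busy)
    print(find_gigantic_range(zones, 256, free))
```

In this toy run the first zone is too small, the candidate at pfn 256 overlaps a busy page, and the candidate at pfn 512 succeeds.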
    
    One expected issue with that approach is that such gigantic contiguous
    regions tend to vanish as runtime goes by.  The best way to avoid this
    for now is to make gigantic page allocations very early during system
    boot, say from an init script.  Other possible optimizations include using
    compaction, which is supported by CMA but is not explicitly used by this
    commit.
    
    It's also important to note the following:
    
     1. Gigantic pages allocated at boottime by the hugepages= command-line
        option can be freed at runtime just fine
    
     2. This commit adds support for gigantic pages only to x86_64. The
        reason is that I don't have access to nor experience with other archs.
        The code is arch independent though, so it should be simple to add
        support to different archs
    
     3. I didn't add support for hugepage overcommit, that is, allocating
        a gigantic page on demand when
        /proc/sys/vm/nr_overcommit_hugepages > 0. The reason is that I don't
        think it's reasonable to do the hard and long work required for
        allocating a gigantic page at fault time. But it should be simple
        to add this if wanted
    
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
    Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
    Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Yinghai Lu <yinghai@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>