aliasing.txt 8.49 KB
Newer Older
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114
	         MEMORY ATTRIBUTE ALIASING ON IA-64

			   Bjorn Helgaas
		       <bjorn.helgaas@hp.com>
			    May 4, 2006


MEMORY ATTRIBUTES

    Itanium supports several attributes for virtual memory references.
    The attribute is part of the virtual translation, i.e., it is
    contained in the TLB entry.  The ones of most interest to the Linux
    kernel are:

	WB		Write-back (cacheable)
	UC		Uncacheable
	WC		Write-coalescing

    System memory typically uses the WB attribute.  The UC attribute is
    used for memory-mapped I/O devices.  The WC attribute is uncacheable
    like UC is, but writes may be delayed and combined to increase
    performance for things like frame buffers.

    The Itanium architecture requires that we avoid accessing the same
    page with both a cacheable mapping and an uncacheable mapping[1].

    The design of the chipset determines which attributes are supported
    on which regions of the address space.  For example, some chipsets
    support either WB or UC access to main memory, while others support
    only WB access.

MEMORY MAP

    Platform firmware describes the physical memory map and the
    supported attributes for each region.  At boot-time, the kernel uses
    the EFI GetMemoryMap() interface.  ACPI can also describe memory
    devices and the attributes they support, but Linux/ia64 currently
    doesn't use this information.

    The kernel uses the efi_memmap table returned from GetMemoryMap() to
    learn the attributes supported by each region of physical address
    space.  Unfortunately, this table does not completely describe the
    address space because some machines omit some or all of the MMIO
    regions from the map.

    The kernel maintains another table, kern_memmap, which describes the
    memory Linux is actually using and the attribute for each region.
    This contains only system memory; it does not contain MMIO space.

    The kern_memmap table typically contains only a subset of the system
    memory described by the efi_memmap.  Linux/ia64 can't use all memory
    in the system because of constraints imposed by the identity mapping
    scheme.

    The efi_memmap table is preserved unmodified because the original
    boot-time information is required for kexec.

KERNEL IDENTITY MAPPINGS

    Linux/ia64 identity mappings are done with large pages, currently
    either 16MB or 64MB, referred to as "granules."  Cacheable mappings
    are speculative[2], so the processor can read any location in the
    page at any time, independent of the programmer's intentions.  This
    means that to avoid attribute aliasing, Linux can create a cacheable
    identity mapping only when the entire granule supports cacheable
    access.

    Therefore, kern_memmap contains only full granule-sized regions that
    can referenced safely by an identity mapping.

    Uncacheable mappings are not speculative, so the processor will
    generate UC accesses only to locations explicitly referenced by
    software.  This allows UC identity mappings to cover granules that
    are only partially populated, or populated with a combination of UC
    and WB regions.

USER MAPPINGS

    User mappings are typically done with 16K or 64K pages.  The smaller
    page size allows more flexibility because only 16K or 64K has to be
    homogeneous with respect to memory attributes.

POTENTIAL ATTRIBUTE ALIASING CASES

    There are several ways the kernel creates new mappings:

    mmap of /dev/mem

	This uses remap_pfn_range(), which creates user mappings.  These
	mappings may be either WB or UC.  If the region being mapped
	happens to be in kern_memmap, meaning that it may also be mapped
	by a kernel identity mapping, the user mapping must use the same
	attribute as the kernel mapping.

	If the region is not in kern_memmap, the user mapping should use
	an attribute reported as being supported in the EFI memory map.

	Since the EFI memory map does not describe MMIO on some
	machines, this should use an uncacheable mapping as a fallback.

    mmap of /sys/class/pci_bus/.../legacy_mem

	This is very similar to mmap of /dev/mem, except that legacy_mem
	only allows mmap of the one megabyte "legacy MMIO" area for a
	specific PCI bus.  Typically this is the first megabyte of
	physical address space, but it may be different on machines with
	several VGA devices.

	"X" uses this to access VGA frame buffers.  Using legacy_mem
	rather than /dev/mem allows multiple instances of X to talk to
	different VGA cards.

	The /dev/mem mmap constraints apply.

115 116 117 118 119 120 121 122 123 124 125 126
    mmap of /proc/bus/pci/.../??.?

    	This is an MMIO mmap of PCI functions, which additionally may or
	may not be requested as using the WC attribute.

	If WC is requested, and the region in kern_memmap is either WC
	or UC, and the EFI memory map designates the region as WC, then
	the WC mapping is allowed.

	Otherwise, the user mapping must use the same attribute as the
	kernel mapping.

127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142
    read/write of /dev/mem

	This uses copy_from_user(), which implicitly uses a kernel
	identity mapping.  This is obviously safe for things in
	kern_memmap.

	There may be corner cases of things that are not in kern_memmap,
	but could be accessed this way.  For example, registers in MMIO
	space are not in kern_memmap, but could be accessed with a UC
	mapping.  This would not cause attribute aliasing.  But
	registers typically can be accessed only with four-byte or
	eight-byte accesses, and the copy_from_user() path doesn't allow
	any control over the access size, so this would be dangerous.

    ioremap()

143
	This returns a mapping for use inside the kernel.
144 145

	If the region is in kern_memmap, we should use the attribute
146 147 148 149 150 151 152 153 154 155 156
	specified there.

	If the EFI memory map reports that the entire granule supports
	WB, we should use that (granules that are partially reserved
	or occupied by firmware do not appear in kern_memmap).

	If the granule contains non-WB memory, but we can cover the
	region safely with kernel page table mappings, we can use
	ioremap_page_range() as most other architectures do.

	Failing all of the above, we have to fall back to a UC mapping.
157 158 159 160 161 162 163 164 165 166 167 168

PAST PROBLEM CASES

    mmap of various MMIO regions from /dev/mem by "X" on Intel platforms

      The EFI memory map may not report these MMIO regions.

      These must be allowed so that X will work.  This means that
      when the EFI memory map is incomplete, every /dev/mem mmap must
      succeed.  It may create either WB or UC user mappings, depending
      on whether the region is in kern_memmap or the EFI memory map.

169
    mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
170 171 172 173 174 175 176 177 178 179

      The EFI memory map reports the following attributes:
        0x00000-0x9FFFF WB only
        0xA0000-0xBFFFF UC only (VGA frame buffer)
        0xC0000-0xFFFFF WB only

      This mmap is done with user pages, not kernel identity mappings,
      so it is safe to use WB mappings.

      The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
180 181 182 183
      which uses a granule-sized UC mapping.  This granule will cover some
      WB-only memory, but since UC is non-speculative, the processor will
      never generate an uncacheable reference to the WB-only areas unless
      the driver explicitly touches them.
184 185 186

    mmap of 0x0-0xFFFFF legacy_mem by "X"

187 188 189 190
      If the EFI memory map reports that the entire range supports the
      same attributes, we can allow the mmap (and we will prefer WB if
      supported, as is the case with HP sx[12]000 machines with VGA
      disabled).
191

192 193 194
      If EFI reports the range as partly WB and partly UC (as on sx[12]000
      machines with VGA enabled), we must fail the mmap because there's no
      safe attribute to use.
195

196 197 198
      If EFI reports some of the range but not all (as on Intel firmware
      that doesn't report the VGA frame buffer at all), we should fail the
      mmap and force the user to map just the specific region of interest.
199 200 201 202 203 204 205 206 207

    mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled

      The EFI memory map reports the following attributes:
        0x00000-0xFFFFF WB only (no VGA MMIO hole)

      This is a special case of the previous case, and the mmap should
      fail for the same reason as above.

208 209 210 211 212 213 214 215 216 217
    read of /sys/devices/.../rom

      For VGA devices, this may cause an ioremap() of 0xC0000.  This
      used to be done with a UC mapping, because the VGA frame buffer
      at 0xA0000 prevents use of a WB granule.  The UC mapping causes
      an MCA on HP sx[12]000 chipsets.

      We should use WB page table mappings to avoid covering the VGA
      frame buffer.

218 219 220 221
NOTES

    [1] SDM rev 2.2, vol 2, sec 4.4.1.
    [2] SDM rev 2.2, vol 2, sec 4.4.6.