===========
NTB Drivers
===========

NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
the separate memory systems of two or more computers to the same PCI-Express
fabric. Existing NTB hardware supports a common feature set: doorbell
registers and memory translation windows, as well as non-common features like
scratchpad and message registers. Scratchpad registers are readable and writable
registers that are accessible from either side of the device, so that peers can
exchange a small amount of information at a fixed address. Message registers can
be utilized for the same purpose. Additionally, they are provided with
special status bits to make sure the information isn't rewritten by another
peer. Doorbell registers provide a way for peers to send interrupt events.
Memory windows allow translated read and write access to the peer memory.

NTB Core Driver (ntb)
=====================

The NTB core driver defines an API wrapping the common feature set, and allows
clients interested in NTB features to discover the NTB devices supported by
hardware drivers.  The term "client" is used here to mean an upper layer
component making use of the NTB API.  The term "driver," or "hardware driver,"
is used here to mean a driver for a specific vendor and model of NTB hardware.

NTB Client Drivers
==================

NTB client drivers should register with the NTB core driver.  After
registering, the client probe and remove functions will be called appropriately
as NTB hardware, or hardware drivers, are inserted and removed.  The
registration uses the Linux Device framework, so it should feel familiar to
anyone who has written a PCI driver.

NTB Typical client driver implementation
----------------------------------------

The primary purpose of NTB is to share a piece of memory between at least two
systems. So the NTB device features like Scratchpad/Message registers are
mainly used to perform the proper memory window initialization. Typically
there are two types of memory window interfaces supported by the NTB API:
inbound translation configured on the local NTB port and outbound translation
configured by the peer, on the peer NTB port. The first type is
depicted in the following figure:

Inbound translation:
 Memory:              Local NTB Port:      Peer NTB Port:      Peer MMIO:
  ____________
 | dma-mapped |-ntb_mw_set_trans(addr)  |
 | memory     |        _v____________   |   ______________
 | (addr)     |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
 |------------|       |--------------|  |  |--------------|

So a typical scenario of the first type of memory window initialization is:
1) allocate a memory region, 2) put the translated address into the NTB config,
3) somehow notify the peer device of the performed initialization, 4) the peer
device maps the corresponding outbound memory window so as to have access to
the shared memory region.

The second type of interface, which implies that the shared windows are
initialized by the peer device, is depicted in the following figure:

Outbound translation:
 Memory:        Local NTB Port:    Peer NTB Port:      Peer MMIO:
  ____________                      ______________
 | dma-mapped |                |   | MW base addr |<== memory-mapped IO
 | memory     |                |   |--------------|
 | (addr)     |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
 |------------|                |   |--------------|

A typical scenario of the second type of interface initialization would be:
1) allocate a memory region, 2) somehow deliver the translated address to the
peer device, 3) the peer puts the translated address into the NTB config,
4) the peer device maps the outbound memory window so as to have access to the
shared memory region.

As one can see, the described scenarios can be combined into one portable
algorithm:
 Local device:
  1) Allocate memory for a shared window
  2) Initialize the memory window with the translated address of the allocated
     region (it may fail if local memory window initialization is unsupported)
  3) Send the translated address and memory window index to the peer device
 Peer device:
  1) Initialize the memory window with the retrieved address of the memory
     region allocated by the other device (it may fail if peer memory window
     initialization is unsupported)
  2) Map the outbound memory window
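The portable algorithm above can be modelled in user space as a minimal sketch. The structure `mock_mw` and the functions `mock_mw_set_trans()`, `mock_peer_mw_set_trans()` and `portable_mw_setup()` are hypothetical stand-ins, not the kernel NTB API. They only mimic the failure behaviour described above, so that the fallback logic is visible: the window counts as initialized if either side manages to program the translation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one memory window shared by two NTB ports. */
struct mock_mw {
	bool local_xlat_ok;  /* hardware allows local ntb_mw_set_trans()      */
	bool peer_xlat_ok;   /* hardware allows ntb_peer_mw_set_trans() on peer */
	uint64_t xlat_addr;  /* translated address currently programmed       */
	bool xlat_valid;     /* true once some side programmed the window     */
};

/* Stand-in for the local ntb_mw_set_trans() call (inbound translation). */
static int mock_mw_set_trans(struct mock_mw *mw, uint64_t addr)
{
	if (!mw->local_xlat_ok)
		return -EINVAL;
	mw->xlat_addr = addr;
	mw->xlat_valid = true;
	return 0;
}

/* Stand-in for ntb_peer_mw_set_trans() executed by the peer (outbound). */
static int mock_peer_mw_set_trans(struct mock_mw *mw, uint64_t addr)
{
	if (!mw->peer_xlat_ok)
		return -EINVAL;
	mw->xlat_addr = addr;
	mw->xlat_valid = true;
	return 0;
}

/*
 * Portable setup: the local side tries first, then the address is "sent"
 * to the peer, which tries too.  A failure on one side is harmless as long
 * as the other side succeeded; only a double failure is an error.
 */
static int portable_mw_setup(struct mock_mw *mw, uint64_t dma_addr)
{
	int local_rc = mock_mw_set_trans(mw, dma_addr);

	/* Local step 3: in a real client this value would travel through
	 * scratchpad or message registers. */
	uint64_t received = dma_addr;
	int peer_rc = mock_peer_mw_set_trans(mw, received);

	return (local_rc == 0 || peer_rc == 0) ? 0 : -EINVAL;
}
```

For example, hardware that supports only inbound translation succeeds through the local call, and hardware that supports only outbound translation succeeds through the peer call. Both cases are handled by the same code path.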

In accordance with this scenario, the NTB Memory Window API can be used as
follows:
 Local device:
  1) ntb_mw_count(pidx) - retrieve the number of memory ranges which can
     be allocated for memory windows between the local device and the peer
     device of the port with the specified index.
  2) ntb_mw_get_align(pidx, midx) - retrieve the parameters restricting the
     shared memory region alignment and size. Then memory can be properly
     allocated.
  3) Allocate a physically contiguous memory region in compliance with the
     restrictions retrieved in 2).
  4) ntb_mw_set_trans(pidx, midx) - try to set the translation address of
     the memory window with the specified index for the defined peer device
     (it may fail if local translated address setting is not supported)
  5) Send the translated base address (usually together with the memory
     window number) to the peer device using, for instance, scratchpad or
     message registers.
 Peer device:
  1) ntb_peer_mw_set_trans(pidx, midx) - try to set the translated address
     received from the other device (related to pidx) for the specified
     memory window. It may fail if the retrieved address, for instance,
     exceeds the maximum possible address or isn't properly aligned.
  2) ntb_peer_mw_get_addr(widx) - retrieve the MMIO address to map the memory
     window so as to have access to the shared memory.
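Local steps 2), 3) and 5) involve some simple arithmetic, sketched below. The sketch assumes that the alignment values reported by ntb_mw_get_align() are powers of two. It also assumes one possible convention for step 5: a 64-bit translated address is passed through two 32-bit scratchpads as low and high halves. All helper names are hypothetical, and the code runs in user space rather than in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a multiple of align (assumed to be a power of two). */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: size to allocate for a memory window, given the
 * requested size and the size_align/size_max limits of the window.
 * Returns 0 if the rounded size does not fit into the window.
 */
static uint64_t mw_alloc_size(uint64_t want, uint64_t size_align,
			      uint64_t size_max)
{
	uint64_t size = round_up_pow2(want, size_align);

	return size <= size_max ? size : 0;
}

/* Check that an allocated address honours addr_align (power of two). */
static bool mw_addr_ok(uint64_t addr, uint64_t addr_align)
{
	assert(addr_align != 0 && (addr_align & (addr_align - 1)) == 0);
	return (addr & (addr_align - 1)) == 0;
}

/* Scratchpads hold 32-bit values, so split the address into two words. */
static void split_addr(uint64_t addr, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)addr;
	*hi = (uint32_t)(addr >> 32);
}

/* Peer side: reassemble the address read back from the two scratchpads. */
static uint64_t join_addr(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}
```

For example, with a 4 KiB size alignment, a 5000-byte request is rounded up to 8192 bytes before allocation.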

It is also worth noting that ntb_mw_count(pidx) should return the
same value as ntb_peer_mw_count() on the peer with port index pidx.

NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)
------------------------------------------------------------------

The primary client for NTB is the Transport client, used in tandem with NTB
Netdev.  These drivers function together to create a logical link to the peer,
across the NTB, to exchange packets of network data.  The Transport client
establishes a logical link to the peer, and creates queue pairs to exchange
messages and data.  The NTB Netdev then creates an Ethernet device using a
Transport queue pair.  Network data is copied between socket buffers and the
Transport queue pair buffer.  The Transport client may be used for other things
besides Netdev, however no other applications have yet been written.

NTB Ping Pong Test Client (ntb\_pingpong)
-----------------------------------------

The Ping Pong test client serves as a demonstration to exercise the doorbell
and scratchpad registers of NTB hardware, and as a simple example NTB client.
Ping Pong enables the link when started, waits for the NTB link to come up, and
then proceeds to read and write the doorbell and scratchpad registers of the NTB.
The peers interrupt each other using a bit mask of doorbell bits, which is
shifted by one in each round, to test the behavior of multiple doorbell bits
and interrupt vectors.  The Ping Pong driver also reads the first local
scratchpad, and writes the value plus one to the first peer scratchpad, each
round before writing the peer doorbell register.
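The round-to-round behaviour described above can be modelled as follows. This is a hypothetical user-space sketch, not the ntb\_pingpong source. It assumes that the doorbell mask is shifted towards the more significant bits, and that `db_valid_mask` (an illustrative name) marks the doorbell bits the hardware provides. `init_db` plays the role of the module parameter of the same name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the doorbell sequence: each round the mask is
 * shifted by one bit; once every bit has been shifted out of the valid
 * range, a new series starts again from init_db.
 */
static uint64_t pp_next_db(uint64_t db, uint64_t init_db,
			   uint64_t db_valid_mask)
{
	db = (db << 1) & db_valid_mask;
	return db ? db : (init_db & db_valid_mask);
}

/* Value written to the first peer scratchpad: first local scratchpad + 1. */
static uint32_t pp_next_spad(uint32_t local_spad0)
{
	return local_spad0 + 1;
}
```

With four valid doorbell bits and init_db = 0x3, the sequence is 0x3, 0x6, 0xC, 0x8, then back to 0x3.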

Module Parameters:

* unsafe - Some hardware has known issues with scratchpad and doorbell
	registers.  By default, Ping Pong will not attempt to exercise such
	hardware.  You may override this behavior at your own risk by setting
	unsafe=1.
* delay\_ms - Specify the delay between receiving a doorbell
	interrupt event and setting the peer doorbell register for the next
	round.
* init\_db - Specify the doorbell bits to start a new series of rounds.  A new
	series begins once all the doorbell bits have been shifted out of
	range.
* dyndbg - It is suggested to specify dyndbg=+p when loading this module, and
	then to observe debugging output on the console.

NTB Tool Test Client (ntb\_tool)
--------------------------------

The Tool test client serves primarily for debugging NTB hardware and drivers.
The Tool provides access through debugfs for reading, setting, and clearing the
NTB doorbell, and reading and writing scratchpads.

The Tool does not currently have any module parameters.

Debugfs Files:

* *debugfs*/ntb\_tool/*hw*/
	A directory in debugfs will be created for each
	NTB device probed by the tool.  This directory is shortened to *hw*
	below.
* *hw*/db
	This file is used to read, set, and clear the local doorbell.  Not
	all operations may be supported by all hardware.  To read the doorbell,
	read the file.  To set the doorbell, write `s` followed by the bits to
	set (e.g. `echo 's 0x0101' > db`).  To clear the doorbell, write `c`
	followed by the bits to clear.
* *hw*/mask
	This file is used to read, set, and clear the local doorbell mask.
	See *db* for details.
* *hw*/peer\_db
	This file is used to read, set, and clear the peer doorbell.
	See *db* for details.
* *hw*/peer\_mask
	This file is used to read, set, and clear the peer doorbell
	mask.  See *db* for details.
* *hw*/spad
	This file is used to read and write local scratchpads.  To read
	the values of all scratchpads, read the file.  To write values, write a
	series of pairs of scratchpad number and value
	(e.g. `echo '4 0x123 7 0xabc' > spad` to set scratchpads `4` and `7` to
	`0x123` and `0xabc`, respectively).
* *hw*/peer\_spad
	This file is used to read and write peer scratchpads.  See
	*spad* for details.
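The write formats accepted by *db* and *spad* can be illustrated with a small user-space parser. This is a hypothetical sketch of the documented input syntax, not the parsing code of ntb\_tool itself. `tool_db_cmd()` and `tool_spad_cmd()` are illustrative names, and the in-memory `db` and `spad` variables stand in for the hardware registers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical parser for doorbell commands such as "s 0x0101" (set the
 * given bits) or "c 0x0100" (clear the given bits), applied to *db.
 */
static int tool_db_cmd(const char *buf, uint64_t *db)
{
	char *end;
	uint64_t bits;

	if ((buf[0] != 's' && buf[0] != 'c') || buf[1] != ' ')
		return -EINVAL;
	errno = 0;
	bits = strtoull(buf + 2, &end, 0);  /* base 0 accepts 0x... and decimal */
	if (errno || end == buf + 2)
		return -EINVAL;
	if (buf[0] == 's')
		*db |= bits;
	else
		*db &= ~bits;
	return 0;
}

/*
 * Hypothetical parser for scratchpad writes such as "4 0x123 7 0xabc":
 * a series of (index, value) pairs stored into spad[].  Parsing stops at
 * the first token that is not a number.  Returns the number of pairs
 * written, or -EINVAL for a missing value or an out-of-range index
 * (pairs before the bad one have already been stored).
 */
static int tool_spad_cmd(const char *buf, uint32_t *spad, int spad_count)
{
	const char *p = buf;
	char *end;
	int n = 0;

	for (;;) {
		unsigned long idx, val;

		idx = strtoul(p, &end, 0);
		if (end == p)
			break;                  /* no more pairs */
		p = end;
		val = strtoul(p, &end, 0);
		if (end == p || idx >= (unsigned long)spad_count)
			return -EINVAL;
		p = end;
		spad[idx] = (uint32_t)val;      /* scratchpads are 32 bits wide */
		n++;
	}
	return n;
}
```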

NTB Hardware Drivers
====================

NTB hardware drivers should register devices with the NTB core driver.  After
registering, the clients' probe and remove functions will be called.

NTB Intel Hardware Driver (ntb\_hw\_intel)
------------------------------------------

The Intel hardware driver supports NTB on Xeon and Atom CPUs.

Module Parameters:

* b2b\_mw\_idx
	If the peer NTB is to be accessed via a memory window, then use
	this memory window to access the peer NTB.  A value of zero or positive
	starts from the first memory window index, and a negative value starts
	from the last memory window index.  Both sides MUST set the same value
	here!  The default value is `-1`.
216 217
* b2b\_mw\_share
	If the peer NTB is to be accessed via a memory window, and if
	the memory window is large enough, still allow the client to use the
	second half of the memory window for address translation to the peer.
220 221
* xeon\_b2b\_usd\_bar2\_addr64
	If using B2B topology on Xeon hardware, use
	this 64 bit address on the bus between the NTB devices for the window
	at BAR2, on the upstream side of the link.
* xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar2\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
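The index selection described for b2b\_mw\_idx can be sketched as follows. This is a hypothetical helper, not the ntb\_hw\_intel implementation. In particular, rejecting an out-of-range value with -EINVAL is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical: map the b2b_mw_idx parameter to a concrete memory window
 * index.  Non-negative values count from the first window, negative
 * values count from the last one (-1 selects the last window).
 */
static int b2b_resolve_mw_idx(int b2b_mw_idx, int mw_count)
{
	int idx = b2b_mw_idx >= 0 ? b2b_mw_idx : mw_count + b2b_mw_idx;

	if (idx < 0 || idx >= mw_count)
		return -EINVAL;  /* parameter does not name an existing window */
	return idx;
}
```

With three memory windows, the default value `-1` selects window 2. Because the mapping depends only on the parameter and the window count, both sides arrive at the same window only if they are given the same value.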