=======================================================
Semantics and Behavior of Atomic and Bitmask Operations
=======================================================

:Author: David S. Miller

This document is intended to serve as a guide to Linux port
maintainers on how to implement atomic counter, bitops, and spinlock
interfaces properly.

Atomic Type And Operations
==========================

The atomic_t type should be defined as a signed integer and
the atomic_long_t type as a signed long integer.  Also, they should
be made opaque such that any kind of cast to a normal C integer type
will fail.  Something like the following should suffice::

	typedef struct { int counter; } atomic_t;
	typedef struct { long counter; } atomic_long_t;

Historically, counter has been declared volatile.  This is now discouraged.
See :ref:`Documentation/process/volatile-considered-harmful.rst
<volatile_considered_harmful>` for the complete rationale.

local_t is very similar to atomic_t. If the counter is per CPU and only
updated by one CPU, local_t is probably more appropriate. Please see
:ref:`Documentation/core-api/local_ops.rst <local_ops>` for the semantics of
local_t.

The first operations to implement for atomic_t's are the initializers and
plain writes. ::

	#define ATOMIC_INIT(i)		{ (i) }
	#define atomic_set(v, i)	((v)->counter = (i))

The first macro is used in definitions, such as::

	static atomic_t my_counter = ATOMIC_INIT(1);

The initializer is atomic in that the return values of the atomic operations
are guaranteed to correctly reflect the initialized value if the
initializer is used before runtime.  If the initializer is used at runtime, a
proper implicit or explicit read memory barrier is needed before reading the
value with atomic_read from another thread.

As with all of the ``atomic_`` interfaces, replace the leading ``atomic_``
with ``atomic_long_`` to operate on atomic_long_t.
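
For example, the long-sized counterpart is used in the same way (a brief
sketch; ``total_bytes`` is only an illustrative name)::

	static atomic_long_t total_bytes = ATOMIC_LONG_INIT(0);

	atomic_long_set(&total_bytes, 0);
	atomic_long_inc(&total_bytes);
	pr_info("total_bytes=%ld\n", atomic_long_read(&total_bytes));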

The second interface can be used at runtime, as in::

	struct foo { atomic_t counter; };
	...

	struct foo *k;

	k = kmalloc(sizeof(*k), GFP_KERNEL);
	if (!k)
		return -ENOMEM;
	atomic_set(&k->counter, 0);

The setting is atomic in that the return values of the atomic operations by
all threads are guaranteed to correctly reflect either the value that has
been set with this operation or the value set with another operation.  A
proper implicit or explicit memory barrier is needed before the value set
with the operation is guaranteed to be readable with atomic_read from
another thread.

Next, we have::

	#define atomic_read(v)	((v)->counter)

which simply reads the counter value currently visible to the calling thread.
The read is atomic in that the return value is guaranteed to be one of the
values initialized or modified with the interface operations if a proper
implicit or explicit memory barrier is used after possible runtime
initialization by any other thread and the value is modified only with the
interface operations.  atomic_read does not guarantee that the runtime
initialization by any other thread is visible yet, so the user of the
interface must take care of that with a proper implicit or explicit memory
barrier.

.. warning::

	``atomic_read()`` and ``atomic_set()`` DO NOT IMPLY BARRIERS!

	Some architectures may choose to use the volatile keyword, barriers, or
	inline assembly to guarantee some degree of immediacy for atomic_read()
	and atomic_set().  This is not uniformly guaranteed, and may change in
	the future, so all users of atomic_t should treat atomic_read() and
	atomic_set() as simple C statements that may be reordered or optimized
	away entirely by the compiler or processor, and explicitly invoke the
	appropriate compiler and/or memory barrier for each use case.  Failure
	to do so will result in code that may suddenly break when used with
	different architectures or compiler optimizations, or even changes in
	unrelated code which changes how the compiler optimizes the section
	accessing atomic_t variables.

Properly aligned pointers, longs, ints, and chars (and unsigned
equivalents) may be atomically loaded from and stored to in the same
sense as described for atomic_read() and atomic_set().  The READ_ONCE()
and WRITE_ONCE() macros should be used to prevent the compiler from using
optimizations that might otherwise optimize accesses out of existence on
the one hand, or that might create unsolicited accesses on the other.

For example, consider the following code::

	while (a > 0)
		do_something();

If the compiler can prove that do_something() does not store to the
variable a, then the compiler is within its rights to transform this into
the following::

	if (a > 0)
		for (;;)
			do_something();

If you don't want the compiler to do this (and you probably don't), then
you should use something like the following::

	while (READ_ONCE(a) > 0)
		do_something();

Alternatively, you could place a barrier() call in the loop.
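
That alternative might look like the following (a sketch; note that
barrier() only constrains the compiler, it is not a CPU memory barrier)::

	while (a > 0) {
		do_something();
		barrier();	/* force the compiler to re-read "a" */
	}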

For another example, consider the following code::

	tmp_a = a;
	do_something_with(tmp_a);
	do_something_else_with(tmp_a);

If the compiler can prove that do_something_with() does not store to the
variable a, then the compiler is within its rights to manufacture an
additional load as follows::

	tmp_a = a;
	do_something_with(tmp_a);
	tmp_a = a;
	do_something_else_with(tmp_a);

This could fatally confuse your code if it expected the same value
to be passed to do_something_with() and do_something_else_with().

The compiler would be likely to manufacture this additional load if
do_something_with() was an inline function that made very heavy use
of registers: reloading from variable a could save a flush to the
stack and later reload.  To prevent the compiler from attacking your
code in this manner, write the following::

	tmp_a = READ_ONCE(a);
	do_something_with(tmp_a);
	do_something_else_with(tmp_a);

For a final example, consider the following code, assuming that the
variable a is set at boot time before the second CPU is brought online
and never changed later, so that memory barriers are not needed::

	if (a)
		b = 9;
	else
		b = 42;

The compiler is within its rights to manufacture an additional store
by transforming the above code into the following::

	b = 42;
	if (a)
		b = 9;

This could come as a fatal surprise to other code running concurrently
that expected b to never have the value 42 if a was zero.  To prevent
the compiler from doing this, write something like::

	if (a)
		WRITE_ONCE(b, 9);
	else
		WRITE_ONCE(b, 42);

Don't even -think- about doing this without proper use of memory barriers,
locks, or atomic operations if variable a can change at runtime!

.. warning::

	``READ_ONCE()`` AND ``WRITE_ONCE()`` DO NOT IMPLY BARRIERS!

Now, we move on to the atomic operation interfaces typically implemented
with the help of assembly code. ::

	void atomic_add(int i, atomic_t *v);
	void atomic_sub(int i, atomic_t *v);
	void atomic_inc(atomic_t *v);
	void atomic_dec(atomic_t *v);

These four routines add and subtract integral values to/from the given
atomic_t value.  The first two routines pass explicit integers by
which to make the adjustment, whereas the latter two use an implicit
adjustment value of "1".

One very important aspect of these routines is that they DO NOT
require any explicit memory barriers.  They need only perform the
atomic_t counter update in an SMP safe manner.
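
This makes them a good fit for counters where only the final count matters
and no ordering with surrounding code is relied upon, as in this
illustrative sketch::

	static atomic_t nr_events = ATOMIC_INIT(0);

	static void record_event(void)
	{
		/* SMP safe update, but deliberately provides no ordering
		 * with respect to surrounding memory operations.
		 */
		atomic_inc(&nr_events);
	}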

Next, we have::

	int atomic_inc_return(atomic_t *v);
	int atomic_dec_return(atomic_t *v);

These routines add 1 and subtract 1, respectively, from the given
atomic_t and return the new counter value after the operation is
performed.

Unlike the above routines, it is required that these primitives
include explicit memory barriers that are performed before and after
the operation.  It must be done such that all memory operations before
and after the atomic operation calls are strongly ordered with respect
to the atomic operation itself.

For example, it should behave as if a smp_mb() call existed both
before and after the atomic operation.

If the atomic instructions used in an implementation provide explicit
memory barrier semantics which satisfy the above requirements, that is
fine as well.
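
Conceptually, on an architecture whose atomic instructions do not order
surrounding accesses by themselves, such a primitive can be pictured like
this (a mental model only, not a real implementation;
arch_unordered_inc_return() is a hypothetical helper)::

	static inline int atomic_inc_return(atomic_t *v)
	{
		int ret;

		smp_mb();	/* order all earlier memory operations */
		ret = arch_unordered_inc_return(v);	/* hypothetical */
		smp_mb();	/* order all later memory operations */
		return ret;
	}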

Let's move on::

	int atomic_add_return(int i, atomic_t *v);
	int atomic_sub_return(int i, atomic_t *v);

These behave just like atomic_{inc,dec}_return() except that an
explicit counter adjustment is given instead of the implicit "1".
This means that like atomic_{inc,dec}_return(), the memory barrier
semantics are required.

Next::

	int atomic_inc_and_test(atomic_t *v);
	int atomic_dec_and_test(atomic_t *v);

These two routines increment and decrement by 1, respectively, the
given atomic counter.  They return a boolean indicating whether the
resulting counter value was zero or not.
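
A typical use is the release side of reference counting, along the lines of
the larger example later in this document (obj_put() here is just an
illustrative wrapper)::

	static void obj_put(struct obj *obj)
	{
		/* The implied barriers order all prior accesses to *obj
		 * before the decrement that permits destruction.
		 */
		if (atomic_dec_and_test(&obj->refcnt))
			obj_destroy(obj);
	}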

Again, these primitives provide explicit memory barrier semantics around
the atomic operation::

	int atomic_sub_and_test(int i, atomic_t *v);

This is identical to atomic_dec_and_test() except that an explicit
decrement is given instead of the implicit "1".  This primitive must
provide explicit memory barrier semantics around the operation::

	int atomic_add_negative(int i, atomic_t *v);

The given increment is added to the given atomic counter value.  A boolean
is returned which indicates whether the resulting counter value is negative.
This primitive must provide explicit memory barrier semantics around
the operation.

Then::

	int atomic_xchg(atomic_t *v, int new);

This performs an atomic exchange operation on the atomic variable v, setting
the given new value.  It returns the old value that the atomic variable v had
just before the operation.
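
For example, atomic_xchg() can be used to atomically claim pending work so
that only one thread acts on it (an illustrative sketch; process_work() is
hypothetical)::

	static atomic_t work_pending = ATOMIC_INIT(0);

	/* producer side */
	atomic_set(&work_pending, 1);

	/* consumer side: only one caller observes the 1 -> 0 transition */
	if (atomic_xchg(&work_pending, 0))
		process_work();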

atomic_xchg must provide explicit memory barriers around the operation. ::

	int atomic_cmpxchg(atomic_t *v, int old, int new);

This performs an atomic compare exchange operation on the atomic value v,
with the given old and new values. Like all atomic_xxx operations,
atomic_cmpxchg will only satisfy its atomicity semantics as long as all
other accesses of \*v are performed through atomic_xxx operations.

atomic_cmpxchg must provide explicit memory barriers around the operation,
although if the comparison fails then no memory ordering guarantees are
required.

The semantics for atomic_cmpxchg are the same as those defined for 'cas'
below.
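
For example, a compare-and-exchange loop can build operations the interface
does not provide directly, such as an atomic maximum (a sketch)::

	static void atomic_max(atomic_t *v, int new)
	{
		int old;

		for (;;) {
			old = atomic_read(v);
			if (old >= new)
				break;
			/* only succeeds if *v still holds "old" */
			if (atomic_cmpxchg(v, old, new) == old)
				break;
		}
	}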

Finally::

	int atomic_add_unless(atomic_t *v, int a, int u);

If the atomic value v is not equal to u, this function adds a to v, and
returns non-zero. If v is equal to u then it returns zero. This is done as
an atomic operation.

atomic_add_unless must provide explicit memory barriers around the
operation unless it fails (returns 0).

atomic_inc_not_zero(v) is equivalent to atomic_add_unless(v, 1, 0).
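
A common use of atomic_inc_not_zero() is taking a new reference only when
the object's count has not already dropped to zero (a sketch; obj_tryget()
is just an illustrative wrapper)::

	/* Returns non-zero if a reference was obtained. */
	static int obj_tryget(struct obj *obj)
	{
		return atomic_inc_not_zero(&obj->refcnt);
	}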

If a caller requires memory barrier semantics around an atomic_t
operation which does not return a value, a set of interfaces are
defined which accomplish this::

	void smp_mb__before_atomic(void);
	void smp_mb__after_atomic(void);

Preceding a non-value-returning read-modify-write atomic operation with
smp_mb__before_atomic() and following it with smp_mb__after_atomic()
provides the same full ordering that is provided by value-returning
read-modify-write atomic operations.

For example, smp_mb__before_atomic() can be used like so::

	obj->dead = 1;
	smp_mb__before_atomic();
	atomic_dec(&obj->ref_count);

It makes sure that all memory operations preceding the atomic_dec()
call are strongly ordered with respect to the atomic counter
operation.  In the above example, it guarantees that the assignment of
"1" to obj->dead will be globally visible to other cpus before the
atomic counter decrement.

Without the explicit smp_mb__before_atomic() call, the
implementation could legally allow the atomic counter update to become
visible to other cpus before the "obj->dead = 1;" assignment.

Missing memory barriers in the cases where they are required by the
atomic_t implementation above can have disastrous results.  Here is
an example, which follows a pattern occurring frequently in the Linux
kernel.  It is the use of atomic counters to implement reference
counting, and it works such that once the counter falls to zero it can
be guaranteed that no other entity can be accessing the object::

	static void obj_list_add(struct obj *obj, struct list_head *head)
	{
		obj->active = 1;
		list_add(&obj->list, head);
	}

	static void obj_list_del(struct obj *obj)
	{
		list_del(&obj->list);
		obj->active = 0;
	}

	static void obj_destroy(struct obj *obj)
	{
		BUG_ON(obj->active);
		kfree(obj);
	}

	struct obj *obj_list_peek(struct list_head *head)
	{
		if (!list_empty(head)) {
			struct obj *obj;

			obj = list_entry(head->next, struct obj, list);
			atomic_inc(&obj->refcnt);
			return obj;
		}
		return NULL;
	}

	void obj_poke(void)
	{
		struct obj *obj;

		spin_lock(&global_list_lock);
		obj = obj_list_peek(&global_list);
		spin_unlock(&global_list_lock);

		if (obj) {
			obj->ops->poke(obj);
			if (atomic_dec_and_test(&obj->refcnt))
				obj_destroy(obj);
		}
	}

	void obj_timeout(struct obj *obj)
	{
		spin_lock(&global_list_lock);
		obj_list_del(obj);
		spin_unlock(&global_list_lock);

		if (atomic_dec_and_test(&obj->refcnt))
			obj_destroy(obj);
	}

.. note::

	This is a simplification of the ARP queue management in the generic
	neighbour discovery code of the networking subsystem.  Olaf Kirch found
	a bug with respect to memory barriers in kfree_skb() that exposed the
	atomic_t memory barrier requirements quite clearly.

Given the above scheme, it must be the case that the obj->active
update done by the obj list deletion be visible to other processors
before the atomic counter decrement is performed.

Otherwise, the counter could fall to zero, yet obj->active would still
be set, thus triggering the assertion in obj_destroy().  The error
sequence looks like this::

	cpu 0				cpu 1
	obj_poke()			obj_timeout()
	obj = obj_list_peek();
	... gains ref to obj, refcnt=2
					obj_list_del(obj);
					obj->active = 0 ...
					... visibility delayed ...
					atomic_dec_and_test()
					... refcnt drops to 1 ...
	atomic_dec_and_test()
	... refcount drops to 0 ...
	obj_destroy()
	BUG() triggers since obj->active
	still seen as one
					obj->active update visibility occurs

With the memory barrier semantics required of the atomic_t operations
which return values, the above sequence of memory visibility can never
happen.  Specifically, in the above case the atomic_dec_and_test()
counter decrement would not become globally visible until the
obj->active update does.

As a historical note, 32-bit Sparc used to only allow usage of
24 bits of its atomic_t type.  This was because it used 8 bits
as a spinlock for SMP safety.  Sparc32 lacked a "compare and swap"
type instruction.  However, 32-bit Sparc has since been moved over
to a "hash table of spinlocks" scheme, that allows the full 32-bit
counter to be realized.  Essentially, an array of spinlocks are
indexed into based upon the address of the atomic_t being operated
on, and that lock protects the atomic operation.  Parisc uses the
same scheme.

Another note is that the atomic_t operations returning values are
extremely slow on an old 386.


Atomic Bitmask
==============

We will now cover the atomic bitmask operations.  You will find that
their SMP and memory barrier semantics are similar in shape and scope
to the atomic_t ops above.

Native atomic bit operations are defined to operate on objects aligned
to the size of an "unsigned long" C data type, and are at least of that
size.  The endianness of the bits within each "unsigned long" is the
native endianness of the cpu. ::

	void set_bit(unsigned long nr, volatile unsigned long *addr);
	void clear_bit(unsigned long nr, volatile unsigned long *addr);
	void change_bit(unsigned long nr, volatile unsigned long *addr);

These routines set, clear, and change, respectively, the bit number
indicated by "nr" on the bit mask pointed to by "addr".
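
For example (a sketch; DECLARE_BITMAP() is the usual way to declare such a
bitmask, and the flag names are illustrative)::

	#define OBJ_READY	0
	#define OBJ_DIRTY	1

	static DECLARE_BITMAP(obj_flags, 2);

	set_bit(OBJ_READY, obj_flags);
	change_bit(OBJ_DIRTY, obj_flags);
	clear_bit(OBJ_READY, obj_flags);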

They must execute atomically, yet there are no implicit memory barrier
semantics required of these interfaces. ::

	int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
	int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
	int test_and_change_bit(unsigned long nr, volatile unsigned long *addr);

Like the above, except that these routines return a boolean which
indicates whether the changed bit was set _BEFORE_ the atomic bit
operation.


.. warning::
        It is incredibly important that the value be a boolean, ie. "0" or "1".
        Do not try to be fancy and save a few instructions by declaring the
        above to return "long" and just returning something like "old_val &
        mask" because that will not work.

For one thing, this return value gets truncated to int in many code
paths using these interfaces, so on 64-bit if the bit is set in the
upper 32-bits then testers will never see that.

One great example of where this problem crops up is the thread_info
flag operations.  Routines such as test_and_set_ti_thread_flag() chop
the return value into an int.  There are other places where things
like this occur as well.

These routines, like the atomic_t counter operations returning values,
must provide explicit memory barrier semantics around their execution.
All memory operations before the atomic bit operation call must be
made visible globally before the atomic bit operation is made visible.
Likewise, the atomic bit operation must be visible globally before any
subsequent memory operation is made visible.  For example::

	obj->dead = 1;
	if (test_and_set_bit(0, &obj->flags))
		/* ... */;
	obj->killed = 1;

The implementation of test_and_set_bit() must guarantee that
"obj->dead = 1;" is visible to cpus before the atomic memory operation
done by test_and_set_bit() becomes visible.  Likewise, the atomic
memory operation done by test_and_set_bit() must become visible before
"obj->killed = 1;" is visible.

Finally, there is the basic operation::

	int test_bit(unsigned long nr, __const__ volatile unsigned long *addr);

This returns a boolean indicating whether bit "nr" is set in the bitmask
pointed to by "addr".

If explicit memory barriers are required around {set,clear}_bit() (which do
not return a value, and thus do not need to provide memory barrier
semantics), two interfaces are provided::

	void smp_mb__before_atomic(void);
	void smp_mb__after_atomic(void);

They are used as follows, and are akin to their atomic_t operation
brothers::

	/* All memory operations before this call will
	 * be globally visible before the clear_bit().
	 */
	smp_mb__before_atomic();
	clear_bit( ... );

	/* The clear_bit() will be visible before all
	 * subsequent memory operations.
	 */
	smp_mb__after_atomic();

There are two special bitops with lock barrier semantics (acquire/release,
the same as spinlocks).  These operate in the same way as their variants
without the _lock/_unlock postfix, except that they provide acquire/release
semantics, respectively.  This means they can be used for bit_spin_trylock and
bit_spin_unlock type operations without specifying any more barriers. ::

	int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
	void clear_bit_unlock(unsigned long nr, unsigned long *addr);
	void __clear_bit_unlock(unsigned long nr, unsigned long *addr);

The __clear_bit_unlock version is non-atomic; however, it still implements
unlock barrier semantics.  This can be useful if the lock itself is protecting
the other bits in the word.
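
For instance, bit 0 of a word can serve as a tiny lock protecting the
remaining bits, which is the idea behind bit_spin_lock()-style usage (a
sketch; the helper names are illustrative)::

	#define OBJ_LOCK_BIT	0

	static void obj_bit_lock(unsigned long *word)
	{
		/* spin until the lock bit was previously clear */
		while (test_and_set_bit_lock(OBJ_LOCK_BIT, word))
			cpu_relax();
	}

	static void obj_bit_unlock(unsigned long *word)
	{
		/* The other bits in *word are protected by the lock bit,
		 * so the non-atomic unlock variant is sufficient here.
		 */
		__clear_bit_unlock(OBJ_LOCK_BIT, word);
	}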

Finally, there are non-atomic versions of the bitmask operations
provided.  They are used in contexts where some other higher-level SMP
locking scheme is being used to protect the bitmask, and thus less
expensive non-atomic operations may be used in the implementation.
They have names similar to the above bitmask operation interfaces,
except that two underscores are prefixed to the interface name. ::

	void __set_bit(unsigned long nr, volatile unsigned long *addr);
	void __clear_bit(unsigned long nr, volatile unsigned long *addr);
	void __change_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_change_bit(unsigned long nr, volatile unsigned long *addr);

These non-atomic variants also do not require any special memory
barrier semantics.
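
For example, when a spinlock already serializes every access to the
bitmask, the cheaper variants are sufficient (a sketch; the names are
illustrative)::

	static DEFINE_SPINLOCK(map_lock);
	static DECLARE_BITMAP(slot_map, 64);

	static void mark_slot(int slot)
	{
		spin_lock(&map_lock);
		__set_bit(slot, slot_map);	/* only touched under map_lock */
		spin_unlock(&map_lock);
	}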

The routines xchg() and cmpxchg() must provide the same exact
memory-barrier semantics as the atomic and bit operations returning
values.

.. note::

	If someone wants to use xchg(), cmpxchg() and their variants,
	linux/atomic.h should be included rather than asm/cmpxchg.h, unless the
	code is in arch/* and can take care of itself.

Spinlocks and rwlocks have memory barrier expectations as well.
The rule to follow is simple:

1) When acquiring a lock, the implementation must make it globally
   visible before any subsequent memory operation.

2) When releasing a lock, the implementation must make it such that
   all previous memory operations are globally visible before the
   lock release.

Which finally brings us to _atomic_dec_and_lock().  There is an
architecture-neutral version implemented in lib/dec_and_lock.c,
but most platforms will wish to optimize this in assembler. ::

	int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);

Atomically decrement the given counter, and if it will drop to zero,
atomically acquire the given spinlock and perform the decrement of the
counter to zero.  If it does not drop to zero, do nothing with the
spinlock.
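
A typical caller, reusing the names from the earlier reference-counting
example, might look like this (a sketch)::

	if (_atomic_dec_and_lock(&obj->refcnt, &global_list_lock)) {
		/* the count hit zero and global_list_lock is now held */
		list_del(&obj->list);
		spin_unlock(&global_list_lock);
		obj_destroy(obj);
	}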

It is actually pretty simple to get the memory barrier correct.
Simply satisfy the spinlock grab requirements, which is to make
sure the spinlock operation is globally visible before any
subsequent memory operation.

We can demonstrate this operation more clearly if we define
an abstract atomic operation::

	long cas(long *mem, long old, long new);

"cas" stands for "compare and swap".  It atomically:

1) Compares "old" with the value currently at "mem".
2) If they are equal, "new" is written to "mem".
3) Regardless, the current value at "mem" is returned.

As an example usage, here is what an atomic counter update
might look like::

	void example_atomic_inc(long *counter)
	{
		long old, new, ret;

		while (1) {
			old = *counter;
			new = old + 1;

			ret = cas(counter, old, new);
			if (ret == old)
				break;
		}
	}

Let's use cas() in order to build a pseudo-C atomic_dec_and_lock()::

	int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
	{
		long old, new, ret;
		int went_to_zero;

		went_to_zero = 0;
		while (1) {
			old = atomic_read(atomic);
			new = old - 1;
			if (new == 0) {
				went_to_zero = 1;
				spin_lock(lock);
			}
			ret = cas(atomic, old, new);
			if (ret == old)
				break;
			if (went_to_zero) {
				spin_unlock(lock);
				went_to_zero = 0;
			}
		}

		return went_to_zero;
	}

Now, as far as memory barriers go, as long as spin_lock()
strictly orders all subsequent memory operations (including
the cas()) with respect to itself, things will be fine.

Said another way, _atomic_dec_and_lock() must guarantee that
a counter dropping to zero is never made visible before the
spinlock being acquired.

.. note::

	Note that this also means that for the case where the counter is not
	dropping to zero, there are no memory ordering requirements.