An implementation of the C11 <stdatomic.h>
interface
Table of Contents
Jens Gustedt
The implementation of the C11 atomic interface typically sits between the implementation of the core language by the C compiler and the implementation of the C library. It needs compiler support for the individual atomic operations and library supports for the cases where no low-level atomic instruction is available and a lock must be taken.
- This implementation builds entirely on the two gcc ABIs for atomics. It doesn't even attempt to go down to assembly level by itself.
- We provide all function interfaces that the two gcc ABIs and the C standard need.
- For compilers that don't offer the direct language support for atomics this provides a syntactically reduced but fully functional approach to atomic operations.
The sources of this library can be found at https://gforge.inria.fr/projects/stdatomic
A short description of the new futex-lock algorithm that is implemented for the Linux operating system has been accepted for SAC'16. A long version of that article can be found here: https://hal.inria.fr/hal-01236734
1 Implemented library features
We distinguish between the implementation of library functions and
the <stdatomic.h>
header file.
The latter already contains a lot of intelligence, because it has to provide type generic interfaces. This is more involved than usual C library header files.
The header file is optional, the created function interface should be compatible with the header files that gcc and clang may provide.
But you should be careful with gcc's header file. Up to now (Dec 2015) that header file has an unfixed bug: addition and subtraction of atomic types only does the increments in bytes, and not in elements of the base type of the pointer. Avoid their header, as long as this bug isn't fixed. If you are interested you can follow their bug at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64843
1.1 Type, constants and function interfaces
These are the types and proper functions that are foreseen by the standard:
atomic_flag
and its four functions- the
memory_order
enumeration type - fences
- object-like macros to test for lock-freeness and similar things
typedef
for atomic integer and pointer types.
All of these are provided in forms that are compatible with gcc and clang.
1.2 Type generic functions
These are all implemented as macros, and should in many cases result in optimized inlined assembler instructions, and not in library calls. Library calls are only needed as fall back, when there is no reasonable instruction set available.
This implementation uses predefined macros of the form
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_X
where X
can be one of 1, 2, 4, 8 or 16. All versions of gcc and
clang since at least ten years implement these macros and the
underlying operations consistently.
If that macro exists, we suppose that the compiler is able to
synthesize all corresponding memory functions for types of size X
and all necessary arithmetic function for integer types of that
size, as lock-free (= stateless) instructions.
This doesn't mean that there are direct assembler instruction for
all these operations. They can well be implemented as an unbounded
loop that uses a compare_and_swap
(CAS) primitive for atomic
exchange. Gcc typically does this for the less common atomic
arithmetic instructions such as atomic_fetch_and
, for
example. Lock-free doesn't mean a bounded number of instructions.
Relevant for C11 are 9 operations
fetch_add
for integer and pointer addition, returning the previous valuefetch_sub
for integer and pointer subtraction, returning the previous valuefetch_or
for bitwise or, returning the previous valuefetch_and
for bitwise and, returning the previous valuefetch_xor
for bitwise xor, returning the previous valueload
for an atomic load operationstore
for an atomic store operationexchange
for an atomic exchange operation, equivalent to astore
that returns the previous valuecompare_exchange
for an atomic compare and exchange operation, equivalent to a conditionalstore
that also saves the previous value, and returnsfalse
ortrue
according to the success of the condition and the possiblestore
operation. Two semantically different interfaces are provided. Astrong
version that returnsfalse
andtrue
as described, and aweak
version that may fail eventually, even if the condition of equality was fulfilled.
According to recent precision given by the C standards committee
the store
, exchange
and compare_exchange
functions have
memory semantics, that is they are done as-if using memcpy
and
memcmp
not as-if using equality and assignment operations. This
distinction is important for two cases
- If the corresponding type has padding bits or bytes,
compare_exchange
can fail even if the abstract value of two objects would compare equal. This is because the padding, if different for the objects, enters into the comparison. - Floating point
NaN
seen as values always compares false, even if twoNaN
are compared. When compared as bit pattern throughmemcmp
they could compare equal.
These operations in C11 generally have two variants, one that uses sequential consistency and another one where a more detailed consistency can be requested through additional parameters. So in total we have to provide 20 different type generic functional interfaces.
For the operations that cannot be mapped to built-in compiler support
the compilers inserts calls to external functions. The names for
these functions are typically composed of the operation and
prefixed either by __sync_
(the older gcc ABI) or __atomic_
(the newer gcc ABI). The names of these calls can be suffixed by
_X
for X
as above if this concerns an operation on a type of
the corresponding width.
All external functions that the gcc ABI's require are provided.
1.2.1 The __atomic_
ABI
is already close to the C11 call interface as given above.
In addition to the more or less obvious operands, the built-in
functions take one or two additional parameters that reflect an
eventual requirement for the memory_order
of the operation. So
the functions represent the C11 "explicit" features such as
atomic_fetch_add_explicit
. The non-explicit versions have to be
mapped to the explicit version by providing memory_order_seq_cst
as parameter(s).
Observe that the built-in functions only foresee one interface
compare_exchange
.
- The distinction between
weak
andstrong
versions of these built-in functions are ruled through an additional parameter, not through a different function interface. The function symbol fall-back
__atomic_compare_exchange
confusingly has a different semantic and prototype than the built-in function. It misses the parameter to chose between the "weak" and the "strong" version, and solely corresponds to the C11 operationatomic_compare_exchange_strong_explicit
As said, load
, store
and compare_exchange
operations have
memory semantics. The implementation may use =
or ==
operators
in some places for optimization, but it then does so with objects
of uintXX_t
, so every bit is accounted for.
Function call interfaces for the arithmetic operations are only
generated if we can suppose that an integer type for the
corresponding size exists. We can reasonably assume that there are
always types uint8_t
, uint16_t
, uint32_t
and uint64_t
, so
the variants for 1, 2, 4 and 8 can always be generated.
For a 128 bit type these are only generated if __SIZEOF_INT128__
or __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
exist. If so, we assume
that __uint128_t
is such an integer type and known to the
compiler.
Arithmetic operations can safely use these uintXX_t
types
internally, since the standard imposes two's complement
representation for signed atomic types and also enforces that
atomic operations may not produce traps on overflow.
Additionally to the operations that have generic function interfaces in the C11 standard, gcc additionally implements six other built-ins, namely
__atomic_add_fetch
for integer or pointer addition, returning the updated value__atomic_sub_fetch
for integer or pointer subtraction, returning the updated value__atomic_or_fetch
for bitwise or, returning the updated value__atomic_and_fetch
for bitwise and, returning the updated value__atomic_xor_fetch
for bitwise xor, returning the updated value__atomic_fetch_nand
for bitwise nand (x = ~(x & v)
), returning the previous value__atomic_nand_fetch
for bitwise nand (x = ~(x & v)
), returning the updated value
For the completeness of the library interface we supply analogous
functions with the _X
suffix for these. They might be called by
the compiler if the user code uses assign and add or similar
operators on atomic integers. The __atomic_add_fetch
and
__atomic_sub_fetch
functions may also eventually be used by the
compiler to implement an atomic prefix increment or decrement
operation (++x
and --x
). This would e.g happen if x
is an
object of type __int128_t
and the platform doesn't implement
lock-free atomics for types of size 16.
1.2.2 Clang's __c11_atomic
built-ins
Clang has gone a different path for the built-ins that implement
C11 atomics, prefixed with __c11_atomic
. These are a directly
feature equivalent to the C11 generic functions that have
memory_order
arguments (_explicit
suffix).
For the cases that no atomic instructions can be synthesized,
clang falls back to the same external calls as described for gcc's
__atomic
ABI.
1.2.3 The __sync
ABI
It dates back long before the C11 atomic interface had been designed and thus cannot be directly conforming to it. It has basically the same built-ins for arithmetic types as above, only that
- The functions are named a bit differently.
- They only implement sequential consistency.
- There are no
load
,store
orexchange
features. - The
nand
operations changed their meaning from version 4.4 onward. Therefore this operation cannot be used portably in an environment that might use different versions of compilers. So we don't implement these function interfaces and we deprecate the use of this built-in.
Additionally this interface also implements a test_and_set
functionality that is used to implement the atomic_flag
functions. This built-in is documented to have acquire-release
consistency. If used with sequential consistency, an additional
fence is inserted to ensure that.
These features are sufficient to provide a decent implementation of C11 atomics.
1.2.4 The lock-full fallback functions
In absence of proper architecture support, all fallbacks (for
the three built-in families) with _X
suffix use the ones without
suffix underneath. These external interfaces receive the size of
the data type as an additional, leading parameter:
__atomic_load
__atomic_store
__atomic_exchange
__atomic_compare_exchange
They have pure memory semantics and their basic operations are
memcpy
and memcmp
for load, store and comparison.
These functions cannot be called directly from within your code, because the compiler cannot distinguish them from the gcc built-ins, and they have different prototypes than these.
We implement these functions as critical sections that are protected with a lock, similar to a mutex. This implementations uses a table of locks and a hash function to choose one of the entries that only depends on the address of the atomic object.
At the moment, this implementation has several address-hash functions that can be chosen a library-compile time. Any function that mixes the bits of the address should perform reasonably well.
More important for performance is the choice of the lock. Such a lock can be relatively simple, since C11 atomics that are not lock-free don't have to be asynchronous signal safe.
There are several possibilities, in order of preference:
- An OS specific light-weighted lock with non-active waits. The
integration into
musl
uses Linux'futex
underneath to do an efficient wait. If by coincidence these are called in an un-threaded process, they are close to non-ops. - C11's
mtx_t
type has an shallow interface that should allow it to be implemented a bit simpler and efficient than OS specific mutexes that implement a lot of functionality. This solution should be portable to all platforms that implement this part of C11. In a relatively near future these could be all POSIX and Windows platforms. This approach has the disadvantage that a table ofmtx_t
must be initialized at process startup becausemtx_t
doesn't guarantee static initialization. - POSIX'
pthread_mutex_t
is a little less portable, but allows for static initialization. A spinlock similar to
atomic_flag
. Such an approach is portable to all platforms that implement atomics and allows for static initialization. This is the only choice when compiled without OS or library support.The wait functionality is an active wait, that burns CPU cycles and memory bandwidth. In many circumstances this should do well, the critical sections that are protected by this are nice and small.
2 The <stdatomic.h>
header file
2.1 Full C11 support
Versions of gcc and clang that fully implement the C11 atomics interface will not need a special header file but can use their own that is shipped with the compiler:
- gcc starting with version 4.9
- clang starting with version 3.6
This full support of atomics allows to use atomic objects just as other objects it whatever operations the base type supports.
These default operations on atomics use sequential consistency. That is, each such an operation will enforce a full memory transfer and the perceived effect is as if all these operations, even if issued in different threads, have been done one after another. Thus, thread parallelism can only play between such operations:
atomics operations are expensive
The functional interfaces with different memory_order
arguments
(_explicit
suffix to the name) that we described above may be used
to milder the memory effect that atomic operations have. The
possible gain of such different memory consistency models are very
architecture dependent. E.g on the x86 platforms they offer almost
no advantage, whereas on ARM platforms acquire/release semantics may
bring some noticeable gain.
But beware that this gain is bought with a sensible complexification of the code. Only use this if the atomic operations are a measurable performance bottleneck and you already have reduced the number of these operations to a minimum.
2.2 Partial C11 atomics support
A series of compiler versions offers partial atomics support that already implements most of the C11 semantic:
- gcc versions 4.7 and 4.8
- clang versions 3.2 to 3.5
These versions provide the built-in functions as described above but lack full compiler support for atomic types and operations.
With the <stdatomic.h>
header that we supply for these compilers,
application code can use the functional interfaces. A macro
_Atomic(T)
is provided that can be used to issue emulated
declarations of atomic types that should be forward compatible to
platforms with complete C11 atomics support. Example:
// global variables _Atomic(size_t) thread_inside_count = ATOMIC_VAR_INIT(0); _Atomic(size_t) thread_total_count = ATOMIC_VAR_INIT(1); int my_thread_function(void* arg) { atomic_fetch_add(&thread_inside_count, 1); atomic_fetch_add(&thread_total_count, 1); // do something complicated here // at the end atomic_fetch_sub(&thread_inside_count, 1); }
Underneath such emulated atomic objects are implemented as arrays of
volatile
base type of size 1. This has the following sought
effects:
- They can't be assigned to.
- They evaluate to a pointer in almost any context.
- Operations with them cannot be reordered by the compiler.
So you should be relatively safe from programming errors that would access such objects without passing through the type generic atomic functions. The compiler will error out on improper usage of such atomic objects, but the diagnostics may be a bit crude.
2.2.1 Issues
Since this approach may reinterpret data through pointer casts, it could potentially be dangerous. So let us discuss the possible issues.
- The generic fallbacks for memory access only use
memcpy
andmemcmp
to access the data itself. So the access of the data is within the constraints of the standard. - The generic fallbacks for memory access ensure that their arguments have compatible base types (if a pointer is passed in) or are assignment compatible with the base type of the atomic (if a value is passed in). So data that is copied across can never be misinterpreted as being of a wrong type because the two target types are compatible.
- The specialized functions with
_X
suffix may reinterpret their data as the correspondinguintXX_t
for the size. Copying or comparing such data is always guaranteed to use all bits, so in that sense it is equivalent tomemcpy
andmemcmp
. - The arithmetic operations that are executed then are operations on an unsigned integer type that has no padding bits. This arithmetic is compatible for all integer types that have no padding bits and, for the signed types, are represented with two's complement.
- An emulated atomic with this approach is implemented as an array to the base type, and so in the user code the base type of the object remains visible to the compiler. As a consequence this approach has no effect on the aliasing rules, the compiler always has complete information about the type of each object.
The only potential problem for our approach that remains is
alignment. Since the stub functions that are provided may use
casts to uintXX_t
of "atomic" objects you have to ensure that
these objects are at least aligned as these types would be. This
should not be a problem, if the base type is an integer type,
too. Integer types with same size should have the same alignment.
If you encounter problems with a user defined type that has a size that is a small power of two you could force alignment
_Alignas(sizeof(toto)) _Atomic(toto) toto1; __attribute__((__aligned__(sizeof(toto)))) _Atomic(toto) toto2;
with whatever of the two constructs works for you.
I am currently struggling to provide a version of the _Atomic(T)
macro that ensures that automatically. It seems to be possible but
produces a lot of noise for function parameters that are pointers
to atomics.
2.3 Basic atomics support
Even older versions of gcc and clang implement the __sync
built-in
functions and can thereby made to accept the same <stdatomic.h>
header as discussed above. Since, as their names indicate, these
built-ins only have fully synchronizing versions, they will not be
able to take advantage of the different consistency models. But
implementing atomics with stronger consistency than required, here
sequential consistency, only, is conforming to the C standard.
3 The implementation
3.1 Requirements
3.1.1 Compilers
You should be able to compile this implementation with any version of modern gcc and clang. (Versions are hard to tell, gcc should work for 4.1) The quality of the resulting binary will depend on the implementation of atomic support by the compiler.
There are three different implementations, for modern clang and gcc,
and one for those compilers that only support the __sync_
built-ins. They are only tested with clang and gcc, but might work
with other compilers that implement one of the sets of built-ins and
is otherwise compatible to some gcc extensions:
- compound expressions with
({ })
__typeof__
__attribute__((__unused__))
__builtin_choose_expr
for the__sync
version as a precursor of C11's_Generic
#pragma redefine_extname
to rename the external symbols that are produced
If aligment happens to be an issue you might also need
__attribute__((__aligned__(something)))
__alignof__
or the equivalent C11 features _Alignas
and _Alignof
.
There are some heuristics in place to decide at compile time which
case applies, namely __clang__
to detect clang, __ATOMIC_...
macros to detect the C11 versions of the built-ins.
3.1.2 OS or C library support
The library may work with different lock constructs, as described
above, that is a futex based support for Linux, C11's mtx_t
,
POSIX' pthread_mutex_t
, and active spinning as a last
resort. You may find a description of the algorithms and some
performance figures in the article
https://hal.inria.fr/hal-01236734
If you would like to see support for other OS or runtime environments, don't hesitate to contact me if you'd like to work on implementing and integrating this.
3.2 Caveats
3.2.1 Symbol renaming
There is one important difficulty when compiling this. The original
__atomic
library interface was developed with C++ in mind and not
C. Therefore it freely uses function overloading for the built-ins
versus the library interface. Since we also use the library
functions as fallbacks in the implementation of some of the _X
variants this naming scheme is not supportable with a C compiler.
We get away with it by using internal names, prefixed with __impl_
for all functions. Then a gcc extension is used to map that internal
name to an external name, e.g
#pragma redefine_extname __impl_load __atomic_load
If your compiler doesn't support this feature, you'd have to use an
external tool such as objcopy
to achieve the same.
3.2.2 Support of 16 byte atomic instructions
The main difference for modern processors that is relevant here is if it supports 16 byte atomic instructions or not. There is no difficulty to detect this at compile time, but if the library is used with code that is compiled with a different compiler or just different compiler options, incompatible binary code may be produced.
My plan is to freeze that feature at compile time of the library
and reflect the capacity in the <stdatomic.h>
that is
provided. This then may result in code that is a bit less
optimized than it could, but that is compatible.
- If the library is not compiled with direct 16 byte support the application may not use it, and thus use a memory implementation for such operations.
- If the library is compiled with direct 16 byte support but the application compiler doesn't support it, the user code should fallback to library calls, but which in turn use the atomic instructions. So such a variant would have a call overhead and would not be able to inline the atomics in the user binary.
I already have a working implementation of such a safety feature in some other code, so this is feasible. But for the moment this is not yet done, here. Be careful when using this preliminary version.
3.3 Leftovers
There are some leftovers that will hopefully disappear.
- There are several hash functions and a instrumentation infrastructure for the hashes. I didn't have enough test cases yet to see what would be best, here.
3.4 Instrumentation and testing
3.4.1 Instrumentation
There is optional instrumentation for the lock functions. Switching it on changes overall performance substantially, and thus I'd expect a noticeable effect by the observation principle. These counters can give qualitative information about what happens, but you shouldn't take the figures verbally. Also these counters are only protected if you test the library with only one lock, using atomics for these counters themselves would have a strong performance impact and the resulting statistics would basically be worthless.
You can switch the instrumentation of the code on by defining the
symbol BENCH
at compile time. A function atomic_summary
can be
used at the end of all operations to print the collected data to
stderr
.
3.4.2 Code injection
To test the behavior of the locking algorithm you may inject a function call just after the acquisition of the lock. Thereby you can e.g force the thread that obtains the lock to be descheduled, and test the worst-case behavior of the locking algorithm.
This feature is switched on by defining the macro ATOMIC_INJECT
at compile time. By that you have a thread local variable
atomic_faulty
and a function interface atomic_inject
at your
disposal, namely atomic_inject
is called iff atomic_faulty
is
true for the calling thread.
There is a "weak" version of atomic_inject
that does nothing. It
can be overwritten by a specific version that you provide
yourself. E.g for the benchmarks using
Modular C in the
article that we mentionned
above, slow path of the algorithm is stressed by simply calling
thrd_yield
.
The variable atomic_faulty
can be used to switch the code
injection on and off, such that you may experiment with different
probabilities of failure.
4 Performance considerations
4.1 Benchmarks
I have run a long series of benchmarks to validate the approach. The code for the benchmark is at the moment integrated in p11 with comes with Modular C, see Cmod. To compile it you'd need
- a C11 compliant library that has C11 threads, I only know of musl, or an implementation of POSIX' threads that can be used to emulate C11 threads.
- a C11 compiler that also has gcc extension. I tested with gcc and clang.
- Cmod
- P99, my old macro library. This one could probably avoided, it is just needed for some parts of p11.
The test in p11 is called p11#test#lifo. It is based on a stack implementation (Last In First Out) that uses an atomic pair of items for the head to avoid the ABA problem.
Please refer to the article for some results of the benchmarks.
4.2 Code inspection
4.2.1 Lower range of thread numbers
For this application the performance in the lower range of is
largely dominated by the fast path, that is by a very small number
of assembler instructions that constitute the good case, when a
thread doesn't encounter congestion. On a x86_64
machine, our
implementation of the four different categories result in the
following memory instructions:
lock | unlock | |
---|---|---|
spinlock | cmpxchgl |
movl |
futex | cmpxchgl |
lock addl |
mutex | cmpxchgl |
movl , xchg |
musl | xchg |
movl , mov , lock orl , mov |
The spinlock and futex implementation here have very similar performance, because they have a minimal number memory instructions.
Musl's internal lock implementation actually looses for the
unlock. It has four different memory instructions. Two of them
originate from the internal macro a_store
, which needs a
synchronization of the mov
instruction to avoid reordering on
the processor. It results in two instructions:
mov eax, (%rdi) lock orl (%rsp)
We observed an improvement whe a_store
is implemented directly
with on atomic instruction, e.g.
xchg %eax, (%rdi)
Such a change could perhaps be integrated into musl at a later stage.
The mutex implementations have two memory instructions for the
unlock functions. One movl
from memory to CPU for a waiters
counter, and one xchg
to manipulate the lock itself.
Our implementation attempts to combine the two instructions for
unlock into one: on the fast path we only need one atomic
addition. By that we are better than the mutex, we save one
movl
instruction for the waiters counter. We may be a bit
worse than the spinlock, because that only has a write to memory
to perform, and doesn't need information from memory to be
returned to the CPU.
5 Installation
As said above it will be important that your compiled library and
your user code agree on the model of the atomics that they have for
16-byte data types. Be careful, compiler options that change the
processor model can change this characteristic, e.g if you compile
with gcc
and the option -march=native
.
If your compiler (gcc
or clang
) already has a working
stdatomic.h
file in place, you have nothing to do from that
part. If not, you should install all files of the form
atomic_.....h
in an include directory where your compiler can find it when you compile application code.
5.1 Musl
5.1.1 direct integration
If you have the sources of musl the easiest is to integrate the library directly into the libc. To achieve that just do
make MUSL=your/path/to/musl musl
This copies all necessary code to a subdirectory of your musl path. Then just compile and install musl as you would do usually.
By default this chooses the futex
implementation of the generic
lock function.
5.1.2 building externally
If you want to compile a standalone libstdatomic.a
library archive
file, you first need to compile one object file that encapsulates
all system calls to futex
. This can be done with the same command
as above
make MUSL=your/path/to/musl musl
and then
make MUSL=your/path/to/musl libstdatomic.a
This will compile the one file inside the musl src
hierarchy and
then assemble all to one "standalone" library.
5.2 Other C libraries
Up to now I only tested with glibc
as other C library. If you
have experience with other libraries please let me know.
For the moment the futex
generic lock only works with musl, so
you have to chose another one when musl is not available. You
choose the version through a define of ATOMIC_GENERIC_LOCK
:
ATOMIC_GENERIC_LOCK_PTHREAD
- uses a
pthread_mutex_t
for the lock. It will be the choice for most, unless you have a C library that already implements C11 threads, then you'd use ATOMIC_GENERIC_LOCK_MTX
- and a
mtx_t
. ATOMIC_GENERIC_LOCK_CMPXCHG
- is the last resort if you have neither of the two above. This implements just a spinlock. It is only suited for applications that don't have a strong congestion on any atomic operation. If there is a lot of congestion, the application will suffer dramatically.
With this choice, you may compile the library with a simple make
make CONFIG='-DATOMIC_GENERIC_LOCK=ATOMIC_GENERIC_LOCK_PTHREAD' libatomic.a
6 Terms
6.1 Copyright and License
The stdatomic library as a whole is licensed under the following standard MIT license: ---------------------------------------------------------------------- Copyright © 2015-2017 Jens Gustedt Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ----------------------------------------------------------------------
6.2 Distribution
This work is distributed at http://stdatomic.gforge.inria.fr/.