Generic errors
Ripple introduces a slightly new SPMD representation, in which the shape of an operation is propagated from its operands. This varies from the traditional SPMD abstraction used for GPUs, in which an “ambient” block shape is associated with a function every time the function is called.
This section goes through errors users have encountered. For each error, we explain the issue and present a solution.
Link-time errors
Undefined symbols reported by the linker traditionally happen when users fail to point clang to a library that contains the symbol definitions. In Ripple, there are two more sources for potential missing symbols, explained below.
Missing ripple_* symbols
Issue: clang complains about any of the following symbols being undefined:
- ripple_id
- ripple_set_block_shape
- ripple_get_block_size
- ripple_parallel
- ripple_parallel_full
Likely cause 1: Ripple is not activated through command-line options, and hence clang does not interpret the aforementioned symbols.
Fix to likely cause 1: Use clang with the -fenable-ripple flag.
For example:
$ clang -fenable-ripple -O2 [other options] my_prog.cpp
Function calls not being vectorized
When a call to a scalar function has a shape,
Ripple tries to find a vector equivalent of the scalar function
that can be used with said shape.
To support this, Ripple comes with a “default” vector library
for architecture-specific functions.
However, users can also define vector libraries for Ripple
(as explained in the Ripple Documentation).
One way these libraries are made available is
by compiling them to the LLVM “bitcode” format (.bc extension).
For instance, you or someone else may have made a my_lib.bc Ripple vector library available
in a folder, for instance /usr/lib/ripple/my_lib.bc.
In that case, it is necessary to tell clang where to look for Ripple libraries,
using the -fripple-lib flag, as follows:
$ clang -fripple-lib=/usr/lib/ripple`
without this flag, Ripple will not detect the vector version to use, and will create sequential calls to the scalar function. If the scalar function is not defined in the input code or in a library, the linker will fail with a undefined symbol error.
Non-deterministic SIMD writes
Effect: Compilation error.
To avoid unintended concurrent write hazards, Ripple doesn’t allow users to write SIMD code in which several block elements explicitly write to the same memory location, as for instance
void bad_write(float A[8], float B[8]) {
ripple_block_t B = ripple_set_block_shape(VEC, 8);
size_t v = ripple_id(B, 0);
A[0] = B[v]; // error! 8 different elements are written to A[0]
}
Solution: Explicitly choose a value to be written out,
as in the following good_write example functions.
void bad_write(float A[8], float B[8]) {
ripple_block_t B = ripple_set_block_shape(VEC, 8);
size_t v = ripple_id(B, 0);
A[0] = ripple_slice(B[v], 3); // We are writing B[3] to A[0]
}
Or in this overly simple case,
void bad_write(float A[8], float B[8]) {
A[0] = B[3]; // We are writing B[3] to A[0]
}
Interplay with the automatic broadcast rule
In the case when the shape of the right-hand-side (the “read”) of an assignment is compatible but smaller than its left-hand-side (the “write”), the automatic broadcast rule applies: the read is first broadcasted to match the shape of the write, and there is no issue. The only problem, in SIMD processing elements, comes when the write has lower dimension than the read. This is illustrated in the following example:
void auto_bcast_example(float a, float B[8]) {
ripple_block_t B = ripple_set_block_shape(VEC, 8);
size_t v = ripple_id(B, 0);
B[v] = a; // a gets auto-broadcasted to [8] before the write to B[v]
}
Interplay with scalar expansion
Ripple includes a convenient mechanism to propagate shapes through scalar
temporary values (cf Ripple Manual’s Implicit Scalar Expansion section).
For instance, in the following function,
scalar variable tmp gets automatically expanded to a 8-element vector.
void auto_expand_example(float A[8], float B[8]) {
ripple_block_t B = ripple_set_block_shape(VEC, 8);
size_t v = ripple_id(B, 0);
float tmp = 2 * B[v]; // the [8] shape of B[v] is propagated through tmp
A[v] = tmp;
}
This automatic expansion mechanism is only valid for temporary scalars. Pointer-based, arrays and data structure writes are typically not subject to automatic expansion. In these cases, it is the programmer’s responsibility to ensure that the shape of a SIMD write has enough dimensions to accept the incoming value.
The following example illustrates a pointer-based write that can’t be expanded:
void no_auto_expand(float * a, float B[8]) {
ripple_block_t B = ripple_set_block_shape(VEC, 8);
size_t v = ripple_id(B, 0);
// *a has a [1] "scalar" shape,
// but it is not a temporary scalar in 'no_auto_expand()`
*a = B[v]; // error
}
Control shape vs value shape
Effect: Correctness (unexpected results)
In Ripple, shape is propagated through values, not through control.
In other terms, the shape of a statement that lies in a control block
(for instance the “then” branch of an “if-then-else” statement)
is not influenced by the shape of the conditional.
This is illustrated in the following cond_shape() example,
in which a scalar computation is performed
under the control of a vector conditional.
cond_shape sets all the elements of A to 1 if any of B[0..7] is positive.
void cond_shape(int A[8], float B[8]) {
ripple_block_t B = ripple_set_block_shape(VEC, 8);
size_t v = ripple_id(B, 0);
int x = 0;
if (B[v] > 0) { // conditional of shape [8]
x = 1; // scalar computation -> shape not influenced by the conditional
}
A[v] = x; // x (0 or 1) is broadcast here to A[0..7]
}
Problem: intuitively, we may think that A[v] contains the result
of checking if B[v] > 0.
Solution 1: Avoid creating statements with smaller-dimensional shapes controlled by conditionals with higher-dimensional shapes, as these can be counter-intuitive.
In the cond_shape example, if we want A[v] to contain
the result of whether B[v] is positive,
we need to give x an explicit [8] shape, as follows:
void cond_shape_fixed(int A[8], float B[8]) {
ripple_block_t B = ripple_set_block_shape(VEC, 8);
size_t v = ripple_id(B, 0);
int x = ripple_broadcast(B, 0b1, 0); // 0 --> [0 0 0 0 0 0 0 0]
if (B[v] > 0) { // [8]-shaped conditional
x = 1; // [8]-shaped computation controlled by a [8]-shaped conditional.
}
A[v] = x; // the shape of x is already [8] -> elementwise write to A[v]
}
ripple_broadcast is often useful to explicitly adjust the shape
of computations that would otherwise have too small a shape.
Some extra ripple_broadcasts in the code are the price we pay for the
ability to mix scalar, vector and tensor computations in the same function.
Solution 2: If what you want is the semantics of cond_shape,
explicitly express that x should be an or redution of the values of B[v],
as in the cond_shape_explicit code below:
void cond_shape(int A[8], float B[8]) {
ripple_block_t B = ripple_set_block_shape(VEC, 8);
size_t v = ripple_id(B, 0);
int x = ripple_reduceor(0b1, B[v] > 0);
A[v] = x; // x is broadcast here to A[0..7]
}
Using block sizes that don’t match hardware vector sizes
Effect: compiler error or incorrect code.
Not all compiler backends are designed to gracefully support the lowering of
target-independent code that doesn’t exactly match full vector computations.
Assume for instance that our target SIMD machine is 512 bits wide,
and that its backend was production-tested only with full vectors.
The following code, fit_my_loop, could cause a lowering issue
for the target’s LLVM lowering backend.
void fit_my_loop(char in[67], char out[67]) {
// Here we make the block size with the data, as opposed to the SIMD hardware
ripple_block_t B = ripple_set_block_shape(VEC, 67);
size_t v = ripple_id(B, 0);
out[v] = - in[v];
}
Problem: compiler internal error or incorrect results when using block sizes that do not match the targeted SIMD hardware’s vector size. In the example above, the hardware target’s vector size is 64 bytes. However, the user requests the computation to be performed on a block (i.e. a vector) of 67 elements. The targeted compiler backend may not be good at lowering that to 64-byte vector code.
Solution: Add the -mllvm -ripple-pad-to-target-simd option to your
clang compilation command line.
This will activate a Ripple behavior, which produces explicit full vector
computations.
This way, even a target that can only lower full vector code
will work with Ripple.
An added bonus is that the performance behavior of the code
becomes more predictable as we increase the Ripple block size.
Incorrect ripple_to_vec and vec_to_ripple handling in c++ object members
We have noticed issues that appear when using aligned vectors with C++ object
member variables.
This issue seems to come from the underlying clang/LLVM compiler,
but it can affects the use of ripple_to_vec and vec_to_ripple
when applied to C++ member variables.
Problem: Some conversions of aligned C++ object members are incorrect.
Solution: Don’t apply the ripple_to_vec and vec_to_ripple conversions
to aligned C++ member variables.