YAKL
This class informs the YAKL parallel_for and parallel_outer routines how to launch kernels.
#include <YAKL_LaunchConfig.h>
Public Member Functions
- LaunchConfig (): The inner looping size defaults to YAKL_DEFAULT_VECTOR_LEN until set_inner_size() is called.
- LaunchConfig (LaunchConfig &&rhs), LaunchConfig (LaunchConfig const &rhs): LaunchConfig objects may be copied or moved.
- ~LaunchConfig ()
- void copyfrom (LaunchConfig const &rhs)
- int get_inner_size () const: Get the inner loop size for hierarchical parallelism.
- Stream get_stream () const: Get the stream in which this launch will run.
- LaunchConfig & operator= (LaunchConfig &&rhs), LaunchConfig & operator= (LaunchConfig const &rhs): LaunchConfig objects may be copied or moved.
- LaunchConfig set_inner_size (int num): Set the actual inner looping size; the template parameter VL sets the maximum inner looping size.
- LaunchConfig set_stream (Stream stream): Set the stream in which this launch will run.
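The member list above suggests a fluent configuration style, since both setters return a LaunchConfig. The following is a minimal, self-contained sketch of a class with the same public interface, intended only to show how the setters compose. It is not YAKL's implementation: LaunchConfigSketch, StreamStandIn, kDefaultVectorLen, and make_example_config are hypothetical stand-ins for LaunchConfig, the Stream type, and YAKL_DEFAULT_VECTOR_LEN (whose real value is set by YAKL).

```cpp
#include <cassert>

// Hypothetical stand-in for YAKL_DEFAULT_VECTOR_LEN; the real value comes from YAKL.
constexpr int kDefaultVectorLen = 128;

// Hypothetical stand-in for the Stream type; id 0 plays the role of the default stream.
struct StreamStandIn {
  int id;
  StreamStandIn(int i = 0) : id(i) {}
};

// Sketch mirroring the documented LaunchConfig interface (not YAKL's code).
template <int VL = kDefaultVectorLen, bool B4B = false>
class LaunchConfigSketch {
 public:
  // Copy and move are both allowed, as documented.
  LaunchConfigSketch() = default;
  LaunchConfigSketch(LaunchConfigSketch const &) = default;
  LaunchConfigSketch(LaunchConfigSketch &&) = default;
  LaunchConfigSketch &operator=(LaunchConfigSketch const &) = default;
  LaunchConfigSketch &operator=(LaunchConfigSketch &&) = default;

  // Like the documented setters, return a config by value so calls can be chained.
  LaunchConfigSketch set_inner_size(int num) {
    inner_size_ = num;
    return *this;
  }
  LaunchConfigSketch set_stream(StreamStandIn stream) {
    stream_ = stream;
    return *this;
  }

  int get_inner_size() const { return inner_size_; }
  StreamStandIn get_stream() const { return stream_; }

 private:
  int inner_size_ = kDefaultVectorLen;  // default until set_inner_size() is called
  StreamStandIn stream_;                // default stream until set_stream() is called
};

// Hypothetical usage: VL = 256, 64 active inner iterations, stream id 3.
inline LaunchConfigSketch<256> make_example_config() {
  return LaunchConfigSketch<256>().set_inner_size(64).set_stream(StreamStandIn(3));
}
```

Whether the real setters also modify the object they are called on, or only the returned copy, is not stated in this reference; chaining on a temporary, as in make_example_config(), works either way.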
LaunchConfig takes two optional template parameters:
(1) VL: When passed to parallel_for, this defines the inner looping size on the device (e.g., the "block size" for CUDA and HIP). When passed to parallel_outer, it defines the maximum inner looping size on the device.
(2) B4B: If this is set to true, parallel_for and parallel_outer run the kernel serially, but only when the CPP macro YAKL_B4B is defined (e.g., with -DYAKL_B4B). This enables bitwise determinism, when desired, for kernels that contain yakl::atomicAdd.
Template Parameters
- VL: For parallel_for, this is the inner looping size. For parallel_outer, this is the maximum inner looping size.
- B4B: If the CPP macro YAKL_B4B is also defined, B4B == true forces the kernel to run serially. This is typically used for kernels that contain yakl::atomicAdd, to maintain bitwise determinism from run to run. If YAKL_B4B is not defined, the kernel runs normally.
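To make the two template parameters concrete, the sketch below uses hypothetical helpers that are not part of YAKL. num_blocks shows the usual CUDA/HIP arithmetic for turning a block size of VL into a grid size for a parallel_for over n iterations; that YAKL computes its grid exactly this way is an assumption. runs_serially encodes the documented rule that B4B == true forces serial execution only when the YAKL_B4B macro is defined.

```cpp
#include <cassert>

// Hypothetical helper (not a YAKL function): with VL threads per block,
// covering n iterations needs ceil(n / VL) blocks.
template <int VL>
constexpr int num_blocks(int n) {
  static_assert(VL > 0, "VL must be positive");
  return (n + VL - 1) / VL;
}

// Mirrors whether the build defines YAKL_B4B (e.g., via -DYAKL_B4B).
#ifdef YAKL_B4B
constexpr bool kB4BMacroDefined = true;
#else
constexpr bool kB4BMacroDefined = false;
#endif

// Hypothetical helper: B4B only forces serial execution when YAKL_B4B is also defined.
template <bool B4B>
constexpr bool runs_serially() {
  return B4B && kB4BMacroDefined;
}
```

Because the decision depends on both the template argument and the macro, a kernel launched with B4B = true runs serially only in builds compiled with -DYAKL_B4B.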
All constructors, the destructor, and the member functions listed under Public Member Functions are declared inline.