YAKL
This class informs YAKL parallel_for and parallel_outer routines how to launch kernels.
#include <YAKL_LaunchConfig.h>
Public Member Functions
| LaunchConfig () | |
| set_inner_size() defaults to YAKL_DEFAULT_VECTOR_LEN | |
| LaunchConfig (LaunchConfig &&rhs) | |
| LaunchConfig objects may be copied or moved. | |
| LaunchConfig (LaunchConfig const &rhs) | |
| LaunchConfig objects may be copied or moved. | |
| ~LaunchConfig () | |
| void | copyfrom (LaunchConfig const &rhs) |
| int | get_inner_size () const |
| Get the inner loop size for hierarchical parallelism. | |
| Stream | get_stream () const |
| Get the stream in which this launch will run. | |
| LaunchConfig & | operator= (LaunchConfig &&rhs) |
| LaunchConfig objects may be copied or moved. | |
| LaunchConfig & | operator= (LaunchConfig const &rhs) |
| LaunchConfig objects may be copied or moved. | |
| LaunchConfig | set_inner_size (int num) |
| This sets the actual inner looping size, whereas the template parameter VL sets the maximum inner looping size. | |
| LaunchConfig | set_stream (Stream stream) |
| Set the stream in which this launch will run. | |
LaunchConfig takes two optional template parameters:
(1) VL: When passed to parallel_for, this defines the inner looping size on the device (e.g. the "block size" for CUDA and HIP). When passed to parallel_outer, this defines the maximum inner looping size on the device.
(2) B4B: If set to true, this tells parallel_for and parallel_outer to run the kernel serially, but only when the YAKL_B4B CPP macro is defined (e.g. via -DYAKL_B4B). This enables bitwise determinism when desired for kernels that contain yakl::atomicAdd.
| VL | For parallel_for, this is the inner looping size. For parallel_outer, this is the maximum inner looping size. |
| B4B | If the CPP macro YAKL_B4B is also defined, B4B == true will force the kernel to run in serial, typically used for kernels that contain yakl::atomicAdd to maintain bitwise determinism run-to-run. If YAKL_B4B is not defined, the kernel runs normally. |
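The interaction between the two template parameters and the YAKL_B4B macro can be sketched with a small stand-in type. This is not YAKL's implementation: SketchLaunchConfig, SKETCH_DEFAULT_VECTOR_LEN (with its placeholder value of 128), and runs_serially are hypothetical names introduced only to illustrate the rule described above.

```cpp
#include <cassert>

// Hypothetical stand-in for illustration only; this is NOT YAKL's code.
// 128 is an assumed placeholder for the value of YAKL_DEFAULT_VECTOR_LEN.
constexpr int SKETCH_DEFAULT_VECTOR_LEN = 128;

template <int VL = SKETCH_DEFAULT_VECTOR_LEN, bool B4B = false>
struct SketchLaunchConfig {
  static constexpr int  max_inner_size = VL;   // VL bounds the inner looping size
  static constexpr bool b4b            = B4B;  // request for serial execution
};

// Mirrors the documented rule: a kernel runs serially only when B4B == true
// AND the YAKL_B4B macro was defined at compile time.
template <int VL, bool B4B>
constexpr bool runs_serially(SketchLaunchConfig<VL, B4B>) {
#ifdef YAKL_B4B
  return B4B;
#else
  return false;
#endif
}
```

Under this sketch, a config with B4B == false never requests serial execution, and a config with B4B == true requests it only in builds compiled with -DYAKL_B4B.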
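| LaunchConfig () | inline |
set_inner_size() defaults to YAKL_DEFAULT_VECTOR_LEN

| ~LaunchConfig () | inline |

| LaunchConfig (LaunchConfig &&rhs) | inline |
LaunchConfig objects may be copied or moved.

| LaunchConfig (LaunchConfig const &rhs) | inline |
LaunchConfig objects may be copied or moved.

| void copyfrom (LaunchConfig const &rhs) | inline |

| int get_inner_size () const | inline |
Get the inner loop size for hierarchical parallelism.

| Stream get_stream () const | inline |
Get the stream in which this launch will run.

| LaunchConfig & operator= (LaunchConfig &&rhs) | inline |
LaunchConfig objects may be copied or moved.

| LaunchConfig & operator= (LaunchConfig const &rhs) | inline |
LaunchConfig objects may be copied or moved.

| LaunchConfig set_inner_size (int num) | inline |
This sets the actual inner looping size, whereas the template parameter VL sets the maximum inner looping size.

| LaunchConfig set_stream (Stream stream) | inline |
Set the stream in which this launch will run.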
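
Because set_inner_size and set_stream return a LaunchConfig by value, calls can plausibly be chained on a temporary. The following is a minimal sketch of that pattern using stand-in types. SketchConfig, SketchStream, and make_config are hypothetical names, the default of 128 is a placeholder, and the assumption that each setter modifies the object and returns a copy of it is inferred from the return types listed above rather than confirmed from YAKL's source.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; NOT YAKL's actual types.
struct SketchStream { int id; };               // placeholder for yakl::Stream

template <int VL = 128>                        // 128 is an assumed default
class SketchConfig {
  int          inner_size = VL;                // actual inner size starts at VL
  SketchStream stream{0};                      // assumed default stream
public:
  // Setters return the config by value, matching the documented
  // "LaunchConfig set_inner_size (int num)" signature, so calls can be chained.
  SketchConfig set_inner_size(int num)    { inner_size = num; return *this; }
  SketchConfig set_stream(SketchStream s) { stream = s;       return *this; }
  int          get_inner_size() const     { return inner_size; }
  SketchStream get_stream()     const     { return stream; }
};

// Build a config with a maximum inner size of 64 but an actual size of 32,
// attached to a stream with the made-up id 7.
inline SketchConfig<64> make_config() {
  return SketchConfig<64>().set_inner_size(32).set_stream(SketchStream{7});
}
```

Here the template argument plays the role of VL (the maximum), while set_inner_size supplies the actual inner looping size. In YAKL itself, the analogous object would be handed to parallel_for or parallel_outer; the exact call signature should be checked against the YAKL headers.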