YAKL
yakl::LaunchConfig< VL, B4B > Struct Template Reference

This class informs YAKL parallel_for and parallel_outer routines how to launch kernels.

#include <YAKL_LaunchConfig.h>

Public Member Functions

 LaunchConfig ()
 Default constructor. The inner size (see set_inner_size()) defaults to YAKL_DEFAULT_VECTOR_LEN.
 
 LaunchConfig (LaunchConfig &&rhs)
 LaunchConfig objects may be copied or moved.
 
 LaunchConfig (LaunchConfig const &rhs)
 LaunchConfig objects may be copied or moved.
 
 ~LaunchConfig ()
 
void copyfrom (LaunchConfig const &rhs)
 
int get_inner_size () const
 Get the inner loop size for hierarchical parallelism.
 
Stream get_stream () const
 Get the stream in which this launch will run.
 
LaunchConfig & operator= (LaunchConfig &&rhs)
 LaunchConfig objects may be copied or moved.
 
LaunchConfig & operator= (LaunchConfig const &rhs)
 LaunchConfig objects may be copied or moved.
 
LaunchConfig set_inner_size (int num)
 Sets the actual inner looping size, whereas the template parameter VL sets the maximum inner looping size.
 
LaunchConfig set_stream (Stream stream)
 Set the stream in which this launch will run.
 

Detailed Description

template<int VL = YAKL_DEFAULT_VECTOR_LEN, bool B4B = false>
struct yakl::LaunchConfig< VL, B4B >

This class informs YAKL parallel_for and parallel_outer routines how to launch kernels.

It takes two optional template parameters:
(1) VL: When passed to parallel_for, this defines the inner looping size on the device (e.g. the "block size" for CUDA and HIP). When passed to parallel_outer, this defines the maximum inner looping size on the device.
(2) B4B: If set to true, this tells parallel_for and parallel_outer to run the kernel serially (only when the -DYAKL_B4B CPP macro is defined), which enables bitwise determinism when desired for kernels that contain yakl::atomicAdd.

Parameters
VL: For parallel_for, this is the inner looping size. For parallel_outer, this is the maximum inner looping size.
B4B: If the CPP macro YAKL_B4B is also defined, B4B == true forces the kernel to run in serial. This is typically used for kernels that contain yakl::atomicAdd, to maintain bitwise determinism from run to run. If YAKL_B4B is not defined, the kernel runs normally.
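
As a rough illustration of how the two template parameters might be consumed, the following self-contained sketch uses a stand-in type rather than YAKL itself. The names TemplateParamSketch, kSketchDefaultVectorLen, SKETCH_B4B, and runs_serially are hypothetical, and the value 128 is only an assumed placeholder for YAKL_DEFAULT_VECTOR_LEN. Only the roles of VL and B4B are taken from the description above.

```cpp
#include <cassert>

// Assumed placeholder for YAKL_DEFAULT_VECTOR_LEN; the real value is chosen by YAKL.
constexpr int kSketchDefaultVectorLen = 128;

// Hypothetical stand-in that carries the two compile-time parameters.
template <int VL = kSketchDefaultVectorLen, bool B4B = false>
struct TemplateParamSketch {
  static constexpr int  vector_len = VL;   // inner size (parallel_for) or its maximum (parallel_outer)
  static constexpr bool b4b        = B4B;  // request for a serial, bitwise-reproducible launch
};

// Hypothetical dispatch rule: B4B only takes effect when the build-time macro
// is defined. SKETCH_B4B plays the role of -DYAKL_B4B here.
template <class Config>
constexpr bool runs_serially() {
#ifdef SKETCH_B4B
  return Config::b4b;
#else
  return false;
#endif
}

// Usage: both parameters are optional and fixed at compile time.
inline void template_param_example() {
  using Default = TemplateParamSketch<>;
  using Atomic  = TemplateParamSketch<64, true>;
  static_assert(Default::vector_len == kSketchDefaultVectorLen, "VL falls back to the default");
  static_assert(!Default::b4b, "B4B is off unless requested");
  assert(Atomic::vector_len == 64);
#ifndef SKETCH_B4B
  assert(!runs_serially<Atomic>());  // without the macro the kernel runs normally
#endif
}
```

Compiling the sketch with -DSKETCH_B4B would make runs_serially report true for the Atomic configuration, which mirrors the documented pairing of the B4B parameter with the YAKL_B4B macro.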

Constructor & Destructor Documentation

◆ LaunchConfig() [1/3]

template<int VL = YAKL_DEFAULT_VECTOR_LEN, bool B4B = false>
yakl::LaunchConfig< VL, B4B >::LaunchConfig ( )
inline

Default constructor. The inner size (see set_inner_size()) defaults to YAKL_DEFAULT_VECTOR_LEN.

◆ ~LaunchConfig()

template<int VL = YAKL_DEFAULT_VECTOR_LEN, bool B4B = false>
yakl::LaunchConfig< VL, B4B >::~LaunchConfig ( )
inline

◆ LaunchConfig() [2/3]

template<int VL = YAKL_DEFAULT_VECTOR_LEN, bool B4B = false>
yakl::LaunchConfig< VL, B4B >::LaunchConfig ( LaunchConfig< VL, B4B > const &  rhs)
inline

LaunchConfig objects may be copied or moved.

◆ LaunchConfig() [3/3]

template<int VL = YAKL_DEFAULT_VECTOR_LEN, bool B4B = false>
yakl::LaunchConfig< VL, B4B >::LaunchConfig ( LaunchConfig< VL, B4B > &&  rhs)
inline

LaunchConfig objects may be copied or moved.

Member Function Documentation

◆ copyfrom()

template<int VL = YAKL_DEFAULT_VECTOR_LEN, bool B4B = false>
void yakl::LaunchConfig< VL, B4B >::copyfrom ( LaunchConfig< VL, B4B > const &  rhs)
inline

◆ get_inner_size()

template<int VL = YAKL_DEFAULT_VECTOR_LEN, bool B4B = false>
int yakl::LaunchConfig< VL, B4B >::get_inner_size ( ) const
inline

Get the inner loop size for hierarchical parallelism.

◆ get_stream()

template<int VL = YAKL_DEFAULT_VECTOR_LEN, bool B4B = false>
Stream yakl::LaunchConfig< VL, B4B >::get_stream ( ) const
inline

Get the stream in which this launch will run.

◆ operator=() [1/2]

template<int VL = YAKL_DEFAULT_VECTOR_LEN, bool B4B = false>
LaunchConfig& yakl::LaunchConfig< VL, B4B >::operator= ( LaunchConfig< VL, B4B > &&  rhs)
inline

LaunchConfig objects may be copied or moved.

◆ operator=() [2/2]

template<int VL = YAKL_DEFAULT_VECTOR_LEN, bool B4B = false>
LaunchConfig& yakl::LaunchConfig< VL, B4B >::operator= ( LaunchConfig< VL, B4B > const &  rhs)
inline

LaunchConfig objects may be copied or moved.

◆ set_inner_size()

template<int VL = YAKL_DEFAULT_VECTOR_LEN, bool B4B = false>
LaunchConfig yakl::LaunchConfig< VL, B4B >::set_inner_size ( int  num)
inline

Sets the actual inner looping size, whereas the template parameter VL sets the maximum inner looping size.
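
The distinction between the compile-time maximum and the run-time size can be sketched with a small stand-in class. InnerSizeSketch and max_inner_size are hypothetical names, the default of 128 is an assumed placeholder, and the choice to start a fresh object at VL is an assumption of this sketch rather than a statement about YAKL's internals. The setter returns by value, as in the documented signature.

```cpp
#include <cassert>

// Hypothetical stand-in; not the YAKL implementation.
template <int VL = 128, bool B4B = false>  // 128 is an assumed default
class InnerSizeSketch {
  int inner_size_ = VL;  // assumption: a fresh config starts at the compile-time bound
public:
  // Returns a config by value, matching the documented return type.
  InnerSizeSketch set_inner_size(int num) { inner_size_ = num; return *this; }
  int get_inner_size() const { return inner_size_; }
  static constexpr int max_inner_size() { return VL; }  // hypothetical helper
};

inline void inner_size_example() {
  // VL = 256 is the compile-time maximum; 64 is the size chosen at run time.
  auto cfg = InnerSizeSketch<256>().set_inner_size(64);
  assert(cfg.get_inner_size() == 64);
  assert(cfg.get_inner_size() <= cfg.max_inner_size());
}
```

Because the setter returns by value, calling it on a temporary and keeping the result is the form that does not depend on whether the original object is modified.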

◆ set_stream()

template<int VL = YAKL_DEFAULT_VECTOR_LEN, bool B4B = false>
LaunchConfig yakl::LaunchConfig< VL, B4B >::set_stream ( Stream  stream)
inline

Set the stream in which this launch will run.
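
Since both setters return a LaunchConfig by value, they can plausibly be chained when building a configuration. The sketch below shows that pattern with stand-in types only: StreamSketch is a hypothetical placeholder for yakl::Stream (modeled here as a plain integer id), StreamConfigSketch is a hypothetical placeholder for LaunchConfig, and 128 is an assumed default vector length.

```cpp
#include <cassert>

// Hypothetical stand-in for yakl::Stream, modeled as a plain id for illustration.
struct StreamSketch { int id = 0; };

// Hypothetical stand-in exposing only the setters and getters documented here.
template <int VL = 128, bool B4B = false>  // 128 is an assumed default
class StreamConfigSketch {
  int          inner_size_ = VL;
  StreamSketch stream_;
public:
  StreamConfigSketch set_inner_size(int num)    { inner_size_ = num; return *this; }
  StreamConfigSketch set_stream(StreamSketch s) { stream_ = s;       return *this; }
  int          get_inner_size() const { return inner_size_; }
  StreamSketch get_stream()     const { return stream_; }
};

inline void stream_example() {
  StreamSketch s;
  s.id = 7;
  // Each setter returns a config by value, so calls can be chained on a temporary.
  auto cfg = StreamConfigSketch<>().set_inner_size(32).set_stream(s);
  assert(cfg.get_stream().id == 7);
  assert(cfg.get_inner_size() == 32);
}
```

In real code the resulting object would be handed to parallel_for or parallel_outer; the exact argument position is not shown on this page, so it is left out of the sketch.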


The documentation for this struct was generated from the following file: