YAKL
yakl Namespace Reference

Namespaces

 c
 Contains the Bounds class and parallel_for() routines using C-style indexing and ordering.
 
 componentwise
 This namespace contains routines that perform element-wise / component-wise operations on Array, SArray, and FSArray objects.
 
 fortran
 Contains the Bounds class and parallel_for() routines using Fortran-style indexing and ordering.
 
 intrinsics
 The intrinsics namespace contains a limited Fortran-like intrinsics library.
 
 simd
 Holds YAKL's Pack class and operators to encourage SIMD vectorization.
 

Classes

class  Array
 This declares the yakl::Array class. Please see the yakl::styleC and yakl::styleFortran template specializations for more detailed information about this class. More...
 
class  Array< T, rank, myMem, styleC >
 This implements the yakl::Array class with yakl::styleC behavior. More...
 
class  Array< T, rank, myMem, styleFortran >
 This implements the yakl::Array class with yakl::styleFortran behavior. More...
 
class  ArrayBase
 This class implements functionality common to both yakl::styleC and yakl::styleFortran Array objects. More...
 
struct  Bnd
 Describes a single bound for creating Fortran-style yakl::Array objects. More...
 
class  Bnds
 This class holds Fortran-style dimensions for use in creating yakl::Array objects. More...
 
class  CSArray
 C-style array on the stack similar in nature to, e.g., float arr[ny][nx]; More...
 
class  Dims
 This class holds C-style dimensions for use in yakl::Array objects. More...
 
struct  Event
 Implements the functionality of an event within a stream. The event is not created until the Event::create() function is called. More...
 
class  FSArray
 Fortran-style array on the stack similar in nature to, e.g., float arr[ny][nx]; More...
 
class  Gator
 YAKL Pool allocator class. More...
 
class  InitConfig
 An object of this class can optionally be passed to yakl::init() to configure the initialization. IMPORTANT: Creating an InitConfig object pings environment variables, making it quite expensive to create. Please do not create a lot of these. More...
 
struct  InnerHandlerEmpty
 This class is necessary for coordination of two-level parallelism. More...
 
struct  LaunchConfig
 This class informs YAKL parallel_for and parallel_outer routines how to launch kernels. More...
 
class  Random
 Non-cryptographic pseudo-random number generator with a very small internal state. More...
 
class  RealFFT1D
 Compute batched real-to-complex forward and inverse FFTs on yakl::Array objects using vendor libraries. More...
 
class  SB
 This specifies a set of bounds for a dimension when declaring a yakl::FSArray. More...
 
class  ScalarLiveOut
 Class to handle scalars that exist before kernels, are written to by kernels, and read after the kernel terminates. More...
 
class  SimpleNetCDF
 Simple way to write yakl::Array objects to NetCDF files. More...
 
class  SimplePNetCDF
 Simple way to write yakl::Array objects to NetCDF files in parallel. More...
 
struct  Stream
 Implements the functionality of a stream for parallel kernel execution. If the Stream::create() method is not called on this object, then it is the default stream. More...
 
struct  StreamList
 Implements a list of Stream objects. Needs to store a pointer to avoid construction on the device since Array objects need to store a list of streams on which they depend. More...
 

Typedefs

using DefaultLaunchConfig = LaunchConfig<>
 This launch configuration sets vector length to the device default and B4B to false. More...
 
using DefaultLaunchConfigB4b = LaunchConfig< YAKL_DEFAULT_VECTOR_LEN, true >
 This launch configuration sets B4B == true with the default VecLen. More...
 
typedef unsigned int index_t
 
typedef struct yakl::InnerHandlerEmpty InnerHandler
 This class is necessary for coordination of two-level parallelism. More...
 
template<int VecLen = YAKL_DEFAULT_VECTOR_LEN>
using LaunchConfigB4b = LaunchConfig< VecLen, true >
 This launch configuration sets B4B == true with a user-specified VecLen. More...
 
template<class T , int rank, unsigned D0, unsigned D1 = 1, unsigned D2 = 1, unsigned D3 = 1>
using SArray = CSArray< T, rank, D0, D1, D2, D3 >
 Most often, codes use the type alias yakl::SArray rather than yakl::CSArray. More...
 
typedef unsigned int uint
 

Functions

void * alloc_device (size_t bytes, char const *label)
 Allocate on the device using YAKL's device allocator. More...
 
template<class T >
YAKL_INLINE void atomicAdd (T &update, T value)
 [NON_B4B] yakl::atomicAdd(update,value) atomically performs update += value More...
 
template<class T >
YAKL_INLINE void atomicMax (T &update, T value)
 yakl::atomicMax(update,value) atomically performs update = max(update,value) More...
 
template<class T >
YAKL_INLINE void atomicMin (T &update, T value)
 yakl::atomicMin(update,value) atomically performs update = min(update,value) More...
 
void check_last_error ()
 Checks to see if an error has occurred on the device. More...
 
Stream create_stream ()
 Create and return a Stream object. It is guaranteed to not be the default stream. More...
 
void fence ()
 Block the host code until all device code has completed. More...
 
YAKL_INLINE void fence_inner (InnerHandler &handler)
 Block inner threads until all inner threads have completed. More...
 
void finalize ()
 Finalize the YAKL runtime. More...
 
void free_device (void *ptr, char const *label)
 Free on the device using YAKL's device deallocator. More...
 
Gator & get_pool ()
 Return a reference to YAKL's pool allocator (a yakl::Gator object). More...
 
void init (InitConfig config=InitConfig())
 Initialize the YAKL runtime. More...
 
bool isInitialized ()
 Determine if the YAKL runtime has been initialized. I.e., yakl::init() has been called without a corresponding call to yakl::finalize(). More...
 
template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void memcpy_device_to_device (T1 *dst, T2 *src, index_t elems, Stream stream=Stream())
 [USE AT YOUR OWN RISK]: memcpy the specified number of elements on the device More...
 
void memcpy_device_to_device_void (void *dst, void *src, size_t bytes, Stream stream=Stream())
 [USE AT YOUR OWN RISK]: memcpy the specified number of bytes on the device More...
 
template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void memcpy_device_to_host (T1 *dst, T2 *src, index_t elems, Stream stream=Stream())
 [USE AT YOUR OWN RISK]: memcpy the specified number of elements from device to host More...
 
template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void memcpy_host_to_device (T1 *dst, T2 *src, index_t elems, Stream stream=Stream())
 [USE AT YOUR OWN RISK]: memcpy the specified number of elements from host to device More...
 
template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void memcpy_host_to_host (T1 *dst, T2 *src, index_t elems)
 [USE AT YOUR OWN RISK]: memcpy the specified number of elements on the host More...
 
void memcpy_host_to_host_void (void *dst, void *src, size_t bytes)
 [USE AT YOUR OWN RISK]: memcpy the specified number of bytes on the host More...
 
template<unsigned int n, class real >
YAKL_INLINE void pentadiagonal (SArray< real, 1, n > const &a, SArray< real, 1, n > const &b, SArray< real, 1, n > const &c, SArray< real, 1, n > const &d, SArray< real, 1, n > const &e, SArray< real, 1, n > const &f, SArray< real, 1, n > &u)
 Performs a small non-periodic pentadiagonal solve. Click for more details. More...
 
template<unsigned int n, class real >
YAKL_INLINE void pentadiagonal_periodic (SArray< real, 1, n > const &a, SArray< real, 1, n > const &b, SArray< real, 1, n > const &c, SArray< real, 1, n > const &d, SArray< real, 1, n > const &e, SArray< real, 1, n > const &f, SArray< real, 1, n > &x)
 Performs a small periodic pentadiagonal solve. Click for more details. More...
 
Event record_event (Stream stream=Stream())
 Create, record, and return an event using the given stream. More...
 
void register_finalize_callback (std::function< void()> callback)
 Add a host-only callback to be called just before YAKL finalizes. This is useful for ensuring allocated global variables are always deallocated before YAKL finalization. More...
 
void set_device_allocator (std::function< void *(size_t)> func)
 Override YAKL's device allocator with the passed function (No Label). More...
 
void set_device_allocator (std::function< void *(size_t, char const *)> func)
 Override YAKL's device allocator with the passed function (WITH Label). More...
 
void set_device_deallocator (std::function< void(void *)> func)
 Override YAKL's device deallocator with the passed function (No Label). More...
 
void set_device_deallocator (std::function< void(void *, char const *)> func)
 Override YAKL's device deallocator with the passed function (WITH Label). More...
 
void set_timer_finalize (std::function< void()> func)
 Override YAKL's default timer finalization routine. More...
 
void set_timer_init (std::function< void()> func)
 Override YAKL's default timer initialization routine. More...
 
void set_timer_start (std::function< void(char const *)> func)
 Override YAKL's default routine to start an individual timer. More...
 
void set_timer_stop (std::function< void(char const *)> func)
 Override YAKL's default routine to stop an individual timer. More...
 
void set_yakl_allocators_to_default ()
 Return all YAKL allocators to their defaults. More...
 
void timer_finalize ()
 Finalize the YAKL timers. More...
 
void timer_init ()
 Initialize the YAKL timers. More...
 
void timer_start (char const *lab)
 Start a timer with the given string label. NOTE: Timers must be perfectly nested. More...
 
void timer_stop (char const *lab)
 Stop a timer with the given string label. NOTE: Timers must be perfectly nested. More...
 
template<class real , unsigned int n>
YAKL_INLINE void tridiagonal (SArray< real, 1, n > const &a, SArray< real, 1, n > const &b, SArray< real, 1, n > &c, SArray< real, 1, n > &d)
 Solves a small non-periodic tridiagonal system. More...
 
template<class real , unsigned int n>
YAKL_INLINE void tridiagonal_periodic (SArray< real, 1, n > const &a, SArray< real, 1, n > &b, SArray< real, 1, n > &c, SArray< real, 1, n > &d)
 Performs a small periodic tridiagonal solve. Click for more details. More...
 
bool use_pool ()
 If true, then the pool allocator is being used for all device allocations. More...
 
bool yakl_mainproc ()
 If true, this is the main MPI process (task number == 0) More...
 
YAKL_INLINE void yakl_throw (const char *msg)
 Throw an error message. Works from the host or device. More...
 

Variables

constexpr int COLON = std::numeric_limits<int>::min()
 This is just a convenience syntax for slicing yakl::Array objects to make it clear in the user level code which dimensions are being sliced. More...
 
constexpr index_t INDEX_MAX = std::numeric_limits<index_t>::max()
 
constexpr int memDefault = memHost
 If the user does not specify a memory space template parameter to yakl::Array, host is the default. More...
 
constexpr int memDevice = 1
 Specifies a device memory address space for a yakl::Array object. More...
 
constexpr int memHost = 2
 Specifies a host memory address space for a yakl::Array object. More...
 
constexpr int NETCDF_MODE_NEW = NC_NOCLOBBER
 Tells NetCDF the created file should not overwrite a file of the same name. More...
 
constexpr int NETCDF_MODE_READ = NC_NOWRITE
 Tells NetCDF the opened file should be opened for reading only. More...
 
constexpr int NETCDF_MODE_REPLACE = NC_CLOBBER
 Tells NetCDF the created file should overwrite a file of the same name. More...
 
constexpr int NETCDF_MODE_WRITE = NC_WRITE
 Tells NetCDF the opened file should be opened for reading and writing. More...
 
constexpr bool streams_enabled = false
 If the CPP Macro YAKL_ENABLE_STREAMS is defined, then this bool is set to true More...
 
constexpr int styleC = 1
 Template parameter for yakl::Array that specifies it should follow C-style behavior. More...
 
constexpr int styleDefault = styleC
 Default style is C-style for yakl::Array objects. More...
 
constexpr int styleFortran = 2
 Template parameter for yakl::Array that specifies it should follow Fortran-style behavior. More...
 

Typedef Documentation

◆ DefaultLaunchConfig

This launch configuration sets vector length to the device default and B4B to false.

◆ DefaultLaunchConfigB4b

This launch configuration sets B4B == true with the default VecLen.

◆ index_t

typedef unsigned int yakl::index_t

◆ InnerHandler

This class is necessary for coordination of two-level parallelism.

A yakl::InnerHandler object must be accepted as a parameter in the functor passed to parallel_outer, and it must be passed as a parameter to parallel_inner, fence_inner, and single_inner. An object of this class should never need to be explicitly created by the user.

◆ LaunchConfigB4b

template<int VecLen = YAKL_DEFAULT_VECTOR_LEN>
using yakl::LaunchConfigB4b = typedef LaunchConfig<VecLen,true>

This launch configuration sets B4B == true with a user-specified VecLen.

◆ SArray

template<class T , int rank, unsigned D0, unsigned D1 = 1, unsigned D2 = 1, unsigned D3 = 1>
using yakl::SArray = typedef CSArray<T,rank,D0,D1,D2,D3>

Most often, codes use the type alias yakl::SArray rather than yakl::CSArray.
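
As a minimal, hypothetical sketch (the dimension values and loop are illustrative only), an SArray is declared with compile-time dimensions and indexed with operator():

yakl::SArray<float,2,2,3> coefs;   // rank-2, 2x3, C-style-ordered array on the stack
for (int j=0; j < 2; j++) {
  for (int i=0; i < 3; i++) {
    coefs(j,i) = j*3 + i;          // C-style: rightmost index varies fastest
  }
}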

◆ uint

typedef unsigned int yakl::uint

Function Documentation

◆ alloc_device()

void* yakl::alloc_device ( size_t  bytes,
char const *  label 
)
inline

Allocate on the device using YAKL's device allocator.

◆ atomicAdd()

template<class T >
YAKL_INLINE void yakl::atomicAdd ( T &  update,
T  value 
)

[NON_B4B] yakl::atomicAdd(update,value) atomically performs update += value

Atomic instructions are needed when multiple parallel threads attempt to read and write the same memory location at the same time. Min, max, and add will read a memory location, perform a local operation, and then write a new value to that location. Atomic instructions ensure that the memory location has not changed between the read and the write.

Atomic min, max, and add are typically needed when you are writing to an array with fewer entries or dimensions than the number of threads in the parallel_for() kernel launch. E.g.:

parallel_for( Bounds<3>(nz,ny,nx) , YAKL_LAMBDA (int k, int j, int i) {
yakl::atomicAdd( average_column(k) , data(k,j,i) / (ny*nx) );
});

IMPORTANT: yakl::atomicAdd() is not bitwise deterministic for floating point (FP) numbers, meaning there is no guarantee of the order in which threads perform the FP additions. Since FP addition is not associative, you cannot guarantee bitwise-reproducible results from one run to the next. To alleviate this, pass yakl::DefaultLaunchConfigB4b to the parallel_for() launcher, and define the CPP macro YAKL_B4B when you want to force bitwise reproducibility. yakl::atomicMin() and yakl::atomicMax() are both bitwise reproducible, so do not worry about those. This applies only to yakl::atomicAdd().
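
As a hedged sketch of the workaround described above (array names and bounds are hypothetical, and this assumes the parallel_for() overload that accepts a trailing launch configuration):

// When the CPP macro YAKL_B4B is defined, this configuration forces a
// deterministic ordering for the atomic FP additions in the kernel.
parallel_for( Bounds<3>(nz,ny,nx) , YAKL_LAMBDA (int k, int j, int i) {
  yakl::atomicAdd( average_column(k) , data(k,j,i) / (ny*nx) );
} , yakl::DefaultLaunchConfigB4b() );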

◆ atomicMax()

template<class T >
YAKL_INLINE void yakl::atomicMax ( T &  update,
T  value 
)

yakl::atomicMax(update,value) atomically performs update = max(update,value)

Atomic instructions are needed when multiple parallel threads attempt to read and write the same memory location at the same time. Min, max, and add will read a memory location, perform a local operation, and then write a new value to that location. Atomic instructions ensure that the memory location has not changed between the read and the write.

Atomic min, max, and add are typically needed when you are writing to an array with fewer entries or dimensions than the number of threads in the parallel_for() kernel launch. E.g.:

parallel_for( Bounds<3>(nz,ny,nx) , YAKL_LAMBDA (int k, int j, int i) {
yakl::atomicAdd( average_column(k) , data(k,j,i) / (ny*nx) );
});

IMPORTANT: yakl::atomicAdd() is not bitwise deterministic for floating point (FP) numbers, meaning there is no guarantee of the order in which threads perform the FP additions. Since FP addition is not associative, you cannot guarantee bitwise-reproducible results from one run to the next. To alleviate this, pass yakl::DefaultLaunchConfigB4b to the parallel_for() launcher, and define the CPP macro YAKL_B4B when you want to force bitwise reproducibility. yakl::atomicMin() and yakl::atomicMax() are both bitwise reproducible, so do not worry about those. This applies only to yakl::atomicAdd().

◆ atomicMin()

template<class T >
YAKL_INLINE void yakl::atomicMin ( T &  update,
T  value 
)

yakl::atomicMin(update,value) atomically performs update = min(update,value)

Atomic instructions are needed when multiple parallel threads attempt to read and write the same memory location at the same time. Min, max, and add will read a memory location, perform a local operation, and then write a new value to that location. Atomic instructions ensure that the memory location has not changed between the read and the write.

Atomic min, max, and add are typically needed when you are writing to an array with fewer entries or dimensions than the number of threads in the parallel_for() kernel launch. E.g.:

parallel_for( Bounds<3>(nz,ny,nx) , YAKL_LAMBDA (int k, int j, int i) {
yakl::atomicAdd( average_column(k) , data(k,j,i) / (ny*nx) );
});

IMPORTANT: yakl::atomicAdd() is not bitwise deterministic for floating point (FP) numbers, meaning there is no guarantee of the order in which threads perform the FP additions. Since FP addition is not associative, you cannot guarantee bitwise-reproducible results from one run to the next. To alleviate this, pass yakl::DefaultLaunchConfigB4b to the parallel_for() launcher, and define the CPP macro YAKL_B4B when you want to force bitwise reproducibility. yakl::atomicMin() and yakl::atomicMax() are both bitwise reproducible, so do not worry about those. This applies only to yakl::atomicAdd().

◆ check_last_error()

void yakl::check_last_error ( )
inline

Checks to see if an error has occurred on the device.

This is a no-op unless the YAKL_DEBUG CPP macro is defined

◆ create_stream()

Stream yakl::create_stream ( )
inline

Create and return a Stream object. It is guaranteed to not be the default stream.

◆ fence()

void yakl::fence ( )
inline

Block the host code until all device code has completed.

◆ fence_inner()

YAKL_INLINE void yakl::fence_inner ( InnerHandler handler)

Block inner threads until all inner threads have completed.

To be called inside yakl::parallel_outer only. Block the inner-level parallelism until all inner threads have reached this point. In CUDA and HIP, this is __syncthreads().

Parameters
handler  The yakl::InnerHandler object created by yakl::parallel_outer
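
A hedged sketch of two-level parallelism using fence_inner (the names nblocks, nteam, stage1, and stage2 are hypothetical; see yakl::c::parallel_outer and yakl::c::parallel_inner for the exact launcher signatures):

parallel_outer( "blocks" , Bounds<1>(nblocks) ,
                YAKL_LAMBDA (int iblk , yakl::InnerHandler handler) {
  // First inner pass over the team of inner threads
  parallel_inner( nteam , [&] (int it) { stage1(iblk,it); } , handler );
  // Ensure every inner thread finished stage1 before stage2 reads its results
  yakl::fence_inner( handler );
  parallel_inner( nteam , [&] (int it) { stage2(iblk,it); } , handler );
} , yakl::LaunchConfig<>().set_inner_size(nteam) );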

◆ finalize()

void yakl::finalize ( )
inline

Finalize the YAKL runtime.

Best practice is to call yakl::isInitialized() to ensure the YAKL runtime is initialized before calling this routine. That said, this routine does check to ensure the runtime is initialized for you. THREAD SAFE!

◆ free_device()

void yakl::free_device ( void *  ptr,
char const *  label 
)
inline

Free on the device using YAKL's device deallocator.

◆ get_pool()

Gator& yakl::get_pool ( )
inline

Return a reference to YAKL's pool allocator (a yakl::Gator object).

◆ init()

void yakl::init ( InitConfig  config = InitConfig())
inline

Initialize the YAKL runtime.

  1. Determine whether the pool allocator is to be used and, if so, its parameters.
  2. Initialize the pool if it is used.
  3. Set the YAKL allocators and deallocators to their defaults.
  4. Initialize YAKL's timer calls to their defaults.
  5. Inspect the optional yakl::InitConfig parameter to override the default allocator, deallocator, and timer calls if requested.
  6. Inform the user with device information.

THREAD SAFE!

Parameters
config  This yakl::InitConfig object allows the user to override YAKL's default allocator, deallocator, and timer calls from the start of the runtime.
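
A minimal sketch of a typical program lifecycle (the MPI calls apply only when running under MPI, i.e., with HAVE_MPI defined; the inner scope is a common convention to guarantee all yakl::Array objects are destroyed before yakl::finalize()):

int main(int argc, char **argv) {
  MPI_Init(&argc,&argv);
  yakl::init();                     // or yakl::init( yakl::InitConfig() ) to customize
  {
    // ... allocate yakl::Array objects, launch kernels, run the application ...
  }                                 // Arrays created in this scope are freed here
  yakl::finalize();
  MPI_Finalize();
  return 0;
}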

◆ isInitialized()

bool yakl::isInitialized ( )
inline

Determine if the YAKL runtime has been initialized. I.e., yakl::init() has been called without a corresponding call to yakl::finalize().

◆ memcpy_device_to_device()

template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void yakl::memcpy_device_to_device ( T1 *  dst,
T2 *  src,
index_t  elems,
Stream  stream = Stream() 
)
inline

[USE AT YOUR OWN RISK]: memcpy the specified number of elements on the device

◆ memcpy_device_to_device_void()

void yakl::memcpy_device_to_device_void ( void *  dst,
void *  src,
size_t  bytes,
Stream  stream = Stream() 
)
inline

[USE AT YOUR OWN RISK]: memcpy the specified number of bytes on the device

◆ memcpy_device_to_host()

template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void yakl::memcpy_device_to_host ( T1 *  dst,
T2 *  src,
index_t  elems,
Stream  stream = Stream() 
)
inline

[USE AT YOUR OWN RISK]: memcpy the specified number of elements from device to host

◆ memcpy_host_to_device()

template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void yakl::memcpy_host_to_device ( T1 *  dst,
T2 *  src,
index_t  elems,
Stream  stream = Stream() 
)
inline

[USE AT YOUR OWN RISK]: memcpy the specified number of elements from host to device
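
A hedged sketch combining the raw allocation and memcpy routines above (the element count n is hypothetical, and no error checking is shown):

float *host_buf = new float[n];
float *dev_buf  = (float *) yakl::alloc_device( n*sizeof(float) , "dev_buf" );
yakl::memcpy_host_to_device( dev_buf , host_buf , n );   // element count, not bytes
// ... launch kernels that read and write dev_buf ...
yakl::memcpy_device_to_host( host_buf , dev_buf , n );
yakl::fence();                                           // wait for the copy to finish
yakl::free_device( dev_buf , "dev_buf" );
delete[] host_buf;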

◆ memcpy_host_to_host()

template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void yakl::memcpy_host_to_host ( T1 *  dst,
T2 *  src,
index_t  elems 
)
inline

[USE AT YOUR OWN RISK]: memcpy the specified number of elements on the host

◆ memcpy_host_to_host_void()

void yakl::memcpy_host_to_host_void ( void *  dst,
void *  src,
size_t  bytes 
)
inline

[USE AT YOUR OWN RISK]: memcpy the specified number of bytes on the host

◆ pentadiagonal()

template<unsigned int n, class real >
YAKL_INLINE void yakl::pentadiagonal ( SArray< real, 1, n > const &  a,
SArray< real, 1, n > const &  b,
SArray< real, 1, n > const &  c,
SArray< real, 1, n > const &  d,
SArray< real, 1, n > const &  e,
SArray< real, 1, n > const &  f,
SArray< real, 1, n > &  u 
)

Performs a small non-periodic pentadiagonal solve. Click for more details.

This is to be performed on a SArray / CSArray object allocated on the stack

Solves for a vector u of length n in the pentadiagonal linear system

a_i u_(i-2) + b_i u_(i-1) + c_i u_i + d_i u_(i+1) + e_i u_(i+2) = f_i

The inputs a, b, c, d, e, and f are not modified.

◆ pentadiagonal_periodic()

template<unsigned int n, class real >
YAKL_INLINE void yakl::pentadiagonal_periodic ( SArray< real, 1, n > const &  a,
SArray< real, 1, n > const &  b,
SArray< real, 1, n > const &  c,
SArray< real, 1, n > const &  d,
SArray< real, 1, n > const &  e,
SArray< real, 1, n > const &  f,
SArray< real, 1, n > &  x 
)

Performs a small periodic pentadiagonal solve. Click for more details.

This is to be performed on a SArray / CSArray object allocated on the stack

Solves for a vector x of length n in the pentadiagonal linear system

a_i u_(i-2) + b_i u_(i-1) + c_i u_i + d_i u_(i+1) + e_i u_(i+2) = f_i

The inputs a, b, c, d, e, and f are not modified.

◆ record_event()

Event yakl::record_event ( Stream  stream = Stream())
inline

Create, record, and return an event using the given stream.
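
A hedged sketch of issuing work on a non-default stream and recording an event behind it (dst_ptr, src_ptr, and nelems are hypothetical; waiting on the event is done through the Stream / Event class interfaces, which are not shown here):

yakl::Stream stream = yakl::create_stream();
yakl::memcpy_device_to_device( dst_ptr , src_ptr , nelems , stream );
yakl::Event  event  = yakl::record_event( stream );   // marks the point after the copy
yakl::fence();                                        // or coordinate via the Stream / Event APIs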

◆ register_finalize_callback()

void yakl::register_finalize_callback ( std::function< void()>  callback)
inline

Add a host-only callback to be called just before YAKL finalizes. This is useful for ensuring allocated global variables are always deallocated before YAKL finalization.

YAKL calls a fence() beforehand and does nothing else until all finalize callbacks are completed. One thing this protects against in particular is the pool being finalized before all variables are deallocated.

Parameters
callback  A void function with no parameters that performs finalization / cleanup / deallocation work before YAKL finalizes.
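
A hedged sketch (the global array name and type are hypothetical): resetting a module-level device Array inside a finalize callback guarantees its memory is returned while the pool still exists.

// Global device array used throughout the application (hypothetical).
yakl::Array<double,2,yakl::memDevice,yakl::styleC> saved_state;

// Registered once after yakl::init(); runs just before YAKL finalizes.
yakl::register_finalize_callback( [] () {
  saved_state = yakl::Array<double,2,yakl::memDevice,yakl::styleC>();   // deallocates
} );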

◆ set_device_allocator() [1/2]

void yakl::set_device_allocator ( std::function< void *(size_t)>  func)
inline

Override YAKL's device allocator with the passed function (No Label).

After overriding one of YAKL's allocators or deallocators, the passed function will be used until the user overrides it again or calls yakl::set_yakl_allocators_to_default(). There are overriding functions that accept labels for bookkeeping and debugging, and there are functions that do not use labels.

◆ set_device_allocator() [2/2]

void yakl::set_device_allocator ( std::function< void *(size_t, char const *)>  func)
inline

Override YAKL's device allocator with the passed function (WITH Label).

After overriding one of YAKL's allocators or deallocators, the passed function will be used until the user overrides it again or calls yakl::set_yakl_allocators_to_default(). There are overriding functions that accept labels for bookkeeping and debugging, and there are functions that do not use labels.
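
A hedged sketch assuming a CUDA backend (cudaMalloc / cudaFree): routing YAKL's labeled device allocations and deallocations through custom functions, then restoring the defaults.

yakl::set_device_allocator( [] (size_t bytes, char const *label) -> void * {
  void *ptr = nullptr;
  cudaMalloc( &ptr , bytes );
  printf("Allocated %zu bytes for '%s'\n", bytes, label);   // label used for bookkeeping
  return ptr;
} );
yakl::set_device_deallocator( [] (void *ptr, char const *label) {
  cudaFree( ptr );
} );
// ... later, return to YAKL's defaults:
yakl::set_yakl_allocators_to_default();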

◆ set_device_deallocator() [1/2]

void yakl::set_device_deallocator ( std::function< void(void *)>  func)
inline

Override YAKL's device deallocator with the passed function (No Label).

After overriding one of YAKL's allocators or deallocators, the passed function will be used until the user overrides it again or calls yakl::set_yakl_allocators_to_default(). There are overriding functions that accept labels for bookkeeping and debugging, and there are functions that do not use labels.

◆ set_device_deallocator() [2/2]

void yakl::set_device_deallocator ( std::function< void(void *, char const *)>  func)
inline

Override YAKL's device deallocator with the passed function (WITH Label).

After overriding one of YAKL's allocators or deallocators, the passed function will be used until the user overrides it again or calls yakl::set_yakl_allocators_to_default(). There are overriding functions that accept labels for bookkeeping and debugging, and there are functions that do not use labels.

◆ set_timer_finalize()

void yakl::set_timer_finalize ( std::function< void()>  func)
inline

Override YAKL's default timer finalization routine.

◆ set_timer_init()

void yakl::set_timer_init ( std::function< void()>  func)
inline

Override YAKL's default timer initialization routine.

◆ set_timer_start()

void yakl::set_timer_start ( std::function< void(char const *)>  func)
inline

Override YAKL's default routine to start an individual timer.

◆ set_timer_stop()

void yakl::set_timer_stop ( std::function< void(char const *)>  func)
inline

Override YAKL's default routine to stop an individual timer.

◆ set_yakl_allocators_to_default()

void yakl::set_yakl_allocators_to_default ( )
inline

Return all YAKL allocators to their defaults.

If the user has not overridden YAKL's default allocators, then this has no effect.

◆ timer_finalize()

void yakl::timer_finalize ( )
inline

Finalize the YAKL timers.

◆ timer_init()

void yakl::timer_init ( )
inline

Initialize the YAKL timers.

◆ timer_start()

void yakl::timer_start ( char const *  lab)
inline

Start a timer with the given string label. NOTE: Timers must be perfectly nested.

◆ timer_stop()

void yakl::timer_stop ( char const *  lab)
inline

Stop a timer with the given string label. NOTE: Timers must be perfectly nested.
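
For example, timers must be stopped in the reverse order they were started (perfect nesting):

yakl::timer_start("time_step");
yakl::timer_start("advection");
// ... advection kernels ...
yakl::timer_stop ("advection");
yakl::timer_start("diffusion");
// ... diffusion kernels ...
yakl::timer_stop ("diffusion");
yakl::timer_stop ("time_step");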

◆ tridiagonal()

template<class real , unsigned int n>
YAKL_INLINE void yakl::tridiagonal ( SArray< real, 1, n > const &  a,
SArray< real, 1, n > const &  b,
SArray< real, 1, n > &  c,
SArray< real, 1, n > &  d 
)

Solves a small non-periodic tridiagional system.

This is to be performed on a SArray / CSArray object allocated on the stack

Solves a tridiagonal system with no boundary conditions of the form:

[ b(0)  c(0)   0      0      0      0     ] [ x(0)   ] = [ d(0)   ]
[ a(1)  b(1)  c(1)    0      0      0     ] [ x(1)   ] = [ d(1)   ]
[  0    a(2)  b(2)   c(2)    0      0     ] [ x(2)   ] = [ d(2)   ]
[  0     0     ..     ..     ..     0     ] [  .     ] = [  .     ]
[  0     0     0      ..     ..     ..    ] [  .     ] = [  .     ]
[  0     0     0    a(n-2) b(n-2) c(n-2)  ] [ x(n-2) ] = [ d(n-2) ]
[  0     0     0      0    a(n-1) b(n-1)  ] [ x(n-1) ] = [ d(n-1) ]

This routine stores the result in d(), and as the signature indicates, it overwrites c and d.

This uses the Thomas algorithm.
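
A hedged sketch of a per-column solve inside a kernel (nz is a compile-time constant; rhs, solution, and the diagonal values are hypothetical):

parallel_for( Bounds<2>(ny,nx) , YAKL_LAMBDA (int j, int i) {
  yakl::SArray<double,1,nz> a, b, c, d;
  for (int k=0; k < nz; k++) {
    a(k) = -1;            // sub-diagonal
    b(k) =  2;            // main diagonal
    c(k) = -1;            // super-diagonal
    d(k) = rhs(k,j,i);    // right-hand side
  }
  yakl::tridiagonal( a , b , c , d );                      // overwrites c and d
  for (int k=0; k < nz; k++) { solution(k,j,i) = d(k); }   // solution returned in d
});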

◆ tridiagonal_periodic()

template<class real , unsigned int n>
YAKL_INLINE void yakl::tridiagonal_periodic ( SArray< real, 1, n > const &  a,
SArray< real, 1, n > &  b,
SArray< real, 1, n > &  c,
SArray< real, 1, n > &  d 
)

Performs a small periodic tridiagional solve. Click for more details.

This is to be performed on a SArray / CSArray object allocated on the stack

Solves a tridiagonal system with periodic boundary conditions of the form:

[ b(0)   c(0)   0      0      0     a(0)   ] [ x(0)   ] = [ d(0)   ]
[ a(1)   b(1)  c(1)    0      0      0     ] [ x(1)   ] = [ d(1)   ]
[  0     a(2)  b(2)   c(2)    0      0     ] [ x(2)   ] = [ d(2)   ]
[  0      0     ..     ..     ..     0     ] [  .     ] = [  .     ]
[  0      0     0      ..     ..     ..    ] [  .     ] = [  .     ]
[  0      0     0    a(n-2) b(n-2) c(n-2)  ] [ x(n-2) ] = [ d(n-2) ]
[ c(n-1)  0     0      0    a(n-1) b(n-1)  ] [ x(n-1) ] = [ d(n-1) ]

This routine stores the result in d(), and as the signature indicates, it overwrites b, c, d

This uses the Thomas algorithm with the Sherman-Morrison formula. The Sherman-Morrison Formula is as follows:

Separate the tridiagonal + periodic matrix, A, into (B + u*v^T), where B is strictly tridiagonal, and u*v^T accounts for the non-tridiagonal periodic BCs:

u = [-b(0) , 0 , ... , 0 , c(n-1) ]^T
v = [1 , 0 , ... , 0 , -a(0)/b(0)]^T

Now we're solving the system (B + u*v^T)*x = d, which is identical to A*x=d.

To get the solution, we solve two systems:

(1) B*y=d
(2) B*q=u

In this code, q is labeled as "tmp". Then, the answer is given by:

x = y - ( (v^T*y) / (1 + v^T*q) ) * q

Unfortunately, periodic boundary conditions roughly double the amount of work in the tridiagonal solve

◆ use_pool()

bool yakl::use_pool ( )
inline

If true, then the pool allocator is being used for all device allocations.

◆ yakl_mainproc()

bool yakl::yakl_mainproc ( )
inline

If true, this is the main MPI process (task number == 0)

If the CPP macro HAVE_MPI is defined, this tests the MPI rank ID. Otherwise, it always returns true.

◆ yakl_throw()

YAKL_INLINE void yakl::yakl_throw ( const char *  msg)

Throw an error message. Works from the host or device.

On the host, this throws an exception. On the device, it prints and then forces the program to halt.
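
For example (the array data and bound n are hypothetical), a device-side sanity check might look like:

parallel_for( Bounds<1>(n) , YAKL_LAMBDA (int i) {
  if ( !(data(i) > 0) ) yakl::yakl_throw("ERROR: non-positive value encountered");
});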

Variable Documentation

◆ COLON

constexpr int yakl::COLON = std::numeric_limits<int>::min()
constexpr

This is just a convenience syntax for slicing yakl::Array objects to make it clear in the user level code which dimensions are being sliced.

◆ INDEX_MAX

constexpr index_t yakl::INDEX_MAX = std::numeric_limits<index_t>::max()
constexpr

◆ memDefault

constexpr int yakl::memDefault = memHost
constexpr

If the user does not specify a memory space template parameter to yakl::Array, host is the default.

◆ memDevice

constexpr int yakl::memDevice = 1
constexpr

Specifies a device memory address space for a yakl::Array object.

◆ memHost

constexpr int yakl::memHost = 2
constexpr

Specifies a host memory address space for a yakl::Array object.
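
For illustration (ny and nx are hypothetical run-time dimensions), the memory-space constants are passed as the third template parameter of yakl::Array:

yakl::Array<float,2,yakl::memDevice,yakl::styleC> device_arr("device_arr",ny,nx);
yakl::Array<float,2,yakl::memHost  ,yakl::styleC> host_arr  ("host_arr"  ,ny,nx);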

◆ NETCDF_MODE_NEW

constexpr int yakl::NETCDF_MODE_NEW = NC_NOCLOBBER
constexpr

Tells NetCDF the created file should not overwrite a file of the same name.

◆ NETCDF_MODE_READ

constexpr int yakl::NETCDF_MODE_READ = NC_NOWRITE
constexpr

Tells NetCDF the opened file should be opened for reading only.

◆ NETCDF_MODE_REPLACE

constexpr int yakl::NETCDF_MODE_REPLACE = NC_CLOBBER
constexpr

Tells NetCDF the created file should overwrite a file of the same name.

◆ NETCDF_MODE_WRITE

constexpr int yakl::NETCDF_MODE_WRITE = NC_WRITE
constexpr

Tells NetCDF the opened file should be opened for reading and writing.
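
A hedged sketch of how these mode constants are typically used together with yakl::SimpleNetCDF (the file name, variable, and dimension names are hypothetical, and the exact SimpleNetCDF method signatures should be checked against that class's documentation):

yakl::SimpleNetCDF nc;
nc.create( "output.nc" , yakl::NETCDF_MODE_REPLACE );     // overwrite any existing file
nc.write ( temperature , "temperature" , {"ny","nx"} );   // write a yakl::Array with named dims
nc.close ();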

◆ streams_enabled

constexpr bool yakl::streams_enabled = false
constexpr

If the CPP Macro YAKL_ENABLE_STREAMS is defined, then this bool is set to true

◆ styleC

constexpr int yakl::styleC = 1
constexpr

Template parameter for yakl::Array that specifies it should follow C-style behavior.

◆ styleDefault

constexpr int yakl::styleDefault = styleC
constexpr

Default style is C-style for yakl::Array objects.

◆ styleFortran

constexpr int yakl::styleFortran = 2
constexpr

Template parameter for yakl::Array that specifies it should follow Fortran-style behavior.