YAKL
yakl Namespace Reference

Namespaces

 c
 Contains the Bounds class and parallel_for() routines using C-style indexing and ordering.
 
 componentwise
 This namespace contains routines that perform element-wise / component-wise operations on Array, SArray, and FSArray objects.
 
 fortran
 Contains the Bounds class and parallel_for() routines using Fortran-style indexing and ordering.
 
 intrinsics
 The intrinsics namespace contains a limited Fortran-like intrinsics library.
 
 simd
 Holds YAKL's Pack class and operators to encourage SIMD vectorization.
 

Classes

class  Array
 This declares the yakl::Array class. Please see the yakl::styleC and yakl::styleFortran template specializations for more detailed information about this class. More...
 
class  Array< T, rank, myMem, styleC >
 This implements the yakl::Array class with yakl::styleC behavior. More...
 
class  Array< T, rank, myMem, styleFortran >
 This implements the yakl::Array class with yakl::styleFortran behavior. More...
 
class  ArrayBase
 This class implements functionality common to both yakl::styleC and yakl::styleFortran Array objects. More...
 
struct  Bnd
 Describes a single bound for creating Fortran-style yakl::Array objects. More...
 
class  Bnds
 This class holds Fortran-style dimensions for use in creating yakl::Array objects. More...
 
class  CSArray
 C-style array on the stack similar in nature to, e.g., float arr[ny][nx]; More...
 
class  Dims
 This class holds C-style dimensions for use in yakl::Array objects. More...
 
struct  Event
 Implements the functionality of an event within a stream. The event is not created until the Event::create() function is called. More...
 
class  FSArray
 Fortran-style array on the stack similar in nature to, e.g., float arr[ny][nx]; More...
 
class  Gator
 YAKL Pool allocator class. More...
 
class  InitConfig
 An object of this class can optionally be passed to yakl::init() to configure the initialization. IMPORTANT: Creating an InitConfig object pings environment variables, making it quite expensive to create. Please do not create a lot of these. More...
 
struct  InnerHandlerEmpty
 This class is necessary for coordination of two-level parallelism. More...
 
struct  LaunchConfig
 This class informs YAKL parallel_for and parallel_outer routines how to launch kernels. More...
 
class  Random
 Non-cryptographic pseudo-random number generator with a very small internal state. More...
 
class  RealFFT1D
 Compute batched real-to-complex forward and inverse FFTs on yakl::Array objects using vendor libraries. More...
 
class  SB
 This specifies a set of bounds for a dimension when declaring a yakl::FSArray. More...
 
class  ScalarLiveOut
 Class to handle scalars that exist before kernels, are written to by kernels, and read after the kernel terminates. More...
 
class  SimpleNetCDF
 Simple way to write yakl::Array objects to NetCDF files. More...
 
class  SimplePNetCDF
 Simple way to write yakl::Array objects to NetCDF files in parallel. More...
 
struct  Stream
 Implements the functionality of a stream for parallel kernel execution. If the Stream::create() method is not called on this object, then it is the default stream. More...
 
struct  StreamList
 Implements a list of Stream objects. Needs to store a pointer to avoid construction on the device since Array objects need to store a list of streams on which they depend. More...
 

Typedefs

using DefaultLaunchConfig = LaunchConfig<>
 This launch configuration sets vector length to the device default and B4B to false. More...
 
using DefaultLaunchConfigB4b = LaunchConfig< YAKL_DEFAULT_VECTOR_LEN, true >
 This launch configuration sets B4B == true with the default VecLen. More...
 
typedef unsigned int index_t
 
typedef struct yakl::InnerHandlerEmpty InnerHandler
 This class is necessary for coordination of two-level parallelism. More...
 
template<int VecLen = YAKL_DEFAULT_VECTOR_LEN>
using LaunchConfigB4b = LaunchConfig< VecLen, true >
 This launch configuration sets B4B == true with a user-specified VecLen. More...
 
template<class T , int rank, unsigned D0, unsigned D1 = 1, unsigned D2 = 1, unsigned D3 = 1>
using SArray = CSArray< T, rank, D0, D1, D2, D3 >
 Most often, codes use the type alias yakl::SArray rather than yakl::CSArray. More...
 
typedef unsigned int uint
 

Functions

void * alloc_device (size_t bytes, char const *label)
 Allocate on the device using YAKL's device allocator. More...
 
template<class T >
YAKL_INLINE void atomicAdd (T &update, T value)
 [NON_B4B] yakl::atomicAdd(update,value) atomically performs update += value More...
 
template<class T >
YAKL_INLINE void atomicMax (T &update, T value)
 yakl::atomicMax(update,value) atomically performs update = max(update,value) More...
 
template<class T >
YAKL_INLINE void atomicMin (T &update, T value)
 yakl::atomicMin(update,value) atomically performs update = min(update,value) More...
 
void check_last_error ()
 Checks to see if an error has occurred on the device. More...
 
Stream create_stream ()
 Create and return a Stream object. It is guaranteed to not be the default stream. More...
 
void fence ()
 Block the host code until all device code has completed. More...
 
YAKL_INLINE void fence_inner (InnerHandler &handler)
 Block inner threads until all inner threads have completed. More...
 
void finalize ()
 Finalize the YAKL runtime. More...
 
void free_device (void *ptr, char const *label)
 Free on the device using YAKL's device deallocator. More...
 
Gator & get_pool ()
 Return a reference to YAKL's pool allocator (a yakl::Gator object). More...
 
void init (InitConfig config=InitConfig())
 Initialize the YAKL runtime. More...
 
bool isInitialized ()
 Determine if the YAKL runtime has been initialized. I.e., yakl::init() has been called without a corresponding call to yakl::finalize(). More...
 
template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void memcpy_device_to_device (T1 *dst, T2 *src, index_t elems, Stream stream=Stream())
 [USE AT YOUR OWN RISK]: memcpy the specified number of elements on the device More...
 
void memcpy_device_to_device_void (void *dst, void *src, size_t bytes, Stream stream=Stream())
 [USE AT YOUR OWN RISK]: memcpy the specified number of bytes on the device More...
 
template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void memcpy_device_to_host (T1 *dst, T2 *src, index_t elems, Stream stream=Stream())
 [USE AT YOUR OWN RISK]: memcpy the specified number of elements from device to host More...
 
template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void memcpy_host_to_device (T1 *dst, T2 *src, index_t elems, Stream stream=Stream())
 [USE AT YOUR OWN RISK]: memcpy the specified number of elements from host to device More...
 
template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void memcpy_host_to_host (T1 *dst, T2 *src, index_t elems)
 [USE AT YOUR OWN RISK]: memcpy the specified number of elements on the host More...
 
void memcpy_host_to_host_void (void *dst, void *src, size_t bytes)
 [USE AT YOUR OWN RISK]: memcpy the specified number of bytes on the host More...
 
template<unsigned int n, class real >
YAKL_INLINE void pentadiagonal (SArray< real, 1, n > const &a, SArray< real, 1, n > const &b, SArray< real, 1, n > const &c, SArray< real, 1, n > const &d, SArray< real, 1, n > const &e, SArray< real, 1, n > const &f, SArray< real, 1, n > &u)
 Performs a small non-periodic pentadiagonal solve. Click for more details. More...
 
template<unsigned int n, class real >
YAKL_INLINE void pentadiagonal_periodic (SArray< real, 1, n > const &a, SArray< real, 1, n > const &b, SArray< real, 1, n > const &c, SArray< real, 1, n > const &d, SArray< real, 1, n > const &e, SArray< real, 1, n > const &f, SArray< real, 1, n > &x)
 Performs a small periodic pentadiagonal solve. Click for more details. More...
 
Event record_event (Stream stream=Stream())
 Create, record, and return an event using the given stream. More...
 
void register_finalize_callback (std::function< void()> callback)
 Add a host-only callback to be called just before YAKL finalizes. This is useful for ensuring allocated global variables are always deallocated before YAKL finalization. More...
 
void set_device_allocator (std::function< void *(size_t)> func)
 Override YAKL's device allocator with the passed function (No Label). More...
 
void set_device_allocator (std::function< void *(size_t, char const *)> func)
 Override YAKL's device allocator with the passed function (WITH Label). More...
 
void set_device_deallocator (std::function< void(void *)> func)
 Override YAKL's device deallocator with the passed function (No Label). More...
 
void set_device_deallocator (std::function< void(void *, char const *)> func)
 Override YAKL's device deallocator with the passed function (WITH Label). More...
 
void set_timer_finalize (std::function< void()> func)
 Override YAKL's default timer finalization routine. More...
 
void set_timer_init (std::function< void()> func)
 Override YAKL's default timer initialization routine. More...
 
void set_timer_start (std::function< void(char const *)> func)
 Override YAKL's default routine to start an individual timer. More...
 
void set_timer_stop (std::function< void(char const *)> func)
 Override YAKL's default routine to stop an individual timer. More...
 
void set_yakl_allocators_to_default ()
 Return all YAKL allocators to their defaults. More...
 
void timer_finalize ()
 Finalize the YAKL timers. More...
 
void timer_init ()
 Initialize the YAKL timers. More...
 
void timer_start (char const *lab)
 Start a timer with the given string label. NOTE: Timers must be perfectly nested. More...
 
void timer_stop (char const *lab)
 Stop a timer with the given string label. NOTE: Timers must be perfectly nested. More...
 
template<class real , unsigned int n>
YAKL_INLINE void tridiagonal (SArray< real, 1, n > const &a, SArray< real, 1, n > const &b, SArray< real, 1, n > &c, SArray< real, 1, n > &d)
 Solves a small non-periodic tridiagonal system. More...
 
template<class real , unsigned int n>
YAKL_INLINE void tridiagonal_periodic (SArray< real, 1, n > const &a, SArray< real, 1, n > &b, SArray< real, 1, n > &c, SArray< real, 1, n > &d)
 Performs a small periodic tridiagonal solve. Click for more details. More...
 
bool use_pool ()
 If true, then the pool allocator is being used for all device allocations. More...
 
bool yakl_mainproc ()
 If true, this is the main MPI process (task number == 0) More...
 
YAKL_INLINE void yakl_throw (const char *msg)
 Throw an error message. Works from the host or device. More...
 

Variables

constexpr int COLON = std::numeric_limits<int>::min()
 This is just a convenience syntax for slicing yakl::Array objects to make it clear in the user level code which dimensions are being sliced. More...
 
constexpr index_t INDEX_MAX = std::numeric_limits<index_t>::max()
 
constexpr int memDefault = memHost
 If the user does not specify a memory space template parameter to yakl::Array, host is the default. More...
 
constexpr int memDevice = 1
 Specifies a device memory address space for a yakl::Array object. More...
 
constexpr int memHost = 2
 Specifies a host memory address space for a yakl::Array object. More...
 
constexpr int NETCDF_MODE_NEW = NC_NOCLOBBER
 Tells NetCDF the created file should not overwrite a file of the same name. More...
 
constexpr int NETCDF_MODE_READ = NC_NOWRITE
 Tells NetCDF the opened file should be opened for reading only. More...
 
constexpr int NETCDF_MODE_REPLACE = NC_CLOBBER
 Tells NetCDF the created file should overwrite a file of the same name. More...
 
constexpr int NETCDF_MODE_WRITE = NC_WRITE
 Tells NetCDF the opened file should be opened for reading and writing. More...
 
constexpr bool streams_enabled = false
 If the CPP Macro YAKL_ENABLE_STREAMS is defined, then this bool is set to true More...
 
constexpr int styleC = 1
 Template parameter for yakl::Array that specifies it should follow C-style behavior. More...
 
constexpr int styleDefault = styleC
 Default style is C-style for yakl::Array objects. More...
 
constexpr int styleFortran = 2
 Template parameter for yakl::Array that specifies it should follow Fortran-style behavior. More...
 

Typedef Documentation

◆ DefaultLaunchConfig

This launch configuration sets vector length to the device default and B4B to false.

◆ DefaultLaunchConfigB4b

This launch configuration sets B4B == true with the default VecLen.

◆ index_t

typedef unsigned int yakl::index_t

◆ InnerHandler

This class is necessary for coordination of two-level parallelism.

A yakl::InnerHandler object must be accepted as a parameter in the functor passed to parallel_outer, and it must be passed as a parameter to parallel_inner, fence_inner, and single_inner. An object of this class should never need to be explicitly created by the user.

◆ LaunchConfigB4b

template<int VecLen = YAKL_DEFAULT_VECTOR_LEN>
using yakl::LaunchConfigB4b = typedef LaunchConfig<VecLen,true>

This launch configuration sets B4B == true with a user-specified VecLen.

◆ SArray

template<class T , int rank, unsigned D0, unsigned D1 = 1, unsigned D2 = 1, unsigned D3 = 1>
using yakl::SArray = typedef CSArray<T,rank,D0,D1,D2,D3>

Most often, codes use the type alias yakl::SArray rather than yakl::CSArray.
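
As a minimal, hypothetical sketch (the dimension values and loop are illustrative only), an SArray is declared with compile-time dimensions and indexed with operator():

yakl::SArray<float,2,2,3> coefs;   // rank-2, 2x3, C-style-ordered array on the stack
for (int j=0; j < 2; j++) {
  for (int i=0; i < 3; i++) {
    coefs(j,i) = j*3 + i;          // C-style: rightmost index varies fastest
  }
}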

◆ uint

typedef unsigned int yakl::uint

Function Documentation

◆ alloc_device()

void* yakl::alloc_device ( size_t  bytes,
char const *  label 
)
inline

Allocate on the device using YAKL's device allocator.

◆ atomicAdd()

template<class T >
YAKL_INLINE void yakl::atomicAdd ( T &  update,
T  value 
)

[NON_B4B] yakl::atomicAdd(update,value) atomically performs update += value

Atomic instructions are needed when multiple parallel threads attempt to read and write the same memory location at the same time. Min, max, and add will read a memory location, perform a local operation, and then write a new value to that location. Atomic instructions ensure that the memory location has not changed between the read and the write.

Atomic min, max, and add are typically needed when you are writing to an array with fewer entries or dimensions than the number of threads in the parallel_for() kernel launch. E.g.:

parallel_for( Bounds<3>(nz,ny,nx) , YAKL_LAMBDA (int k, int j, int i) {
yakl::atomicAdd( average_column(k) , data(k,j,i) / (ny*nx) );
});

IMPORTANT: yakl::atomicAdd() is not bitwise deterministic for floating point (FP) numbers, meaning there is no guarantee of the order in which threads perform the FP additions. Since FP addition is not associative, you cannot guarantee bitwise-reproducible results from one run to the next. To alleviate this, pass yakl::DefaultLaunchConfigB4b to the parallel_for() launcher, and define the CPP macro YAKL_B4B when you want to force bitwise reproducibility. yakl::atomicMin() and yakl::atomicMax() are both bitwise reproducible, so do not worry about those. This applies only to yakl::atomicAdd().
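
As a hedged sketch of the workaround described above (array names and bounds are hypothetical, and this assumes the parallel_for() overload that accepts a trailing launch configuration):

// When the CPP macro YAKL_B4B is defined, this configuration forces a
// deterministic ordering for the atomic FP additions in the kernel.
parallel_for( Bounds<3>(nz,ny,nx) , YAKL_LAMBDA (int k, int j, int i) {
  yakl::atomicAdd( average_column(k) , data(k,j,i) / (ny*nx) );
} , yakl::DefaultLaunchConfigB4b() );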

◆ atomicMax()

template<class T >
YAKL_INLINE void yakl::atomicMax ( T &  update,
T  value 
)

yakl::atomicMax(update,value) atomically performs update = max(update,value)

Atomic instructions are needed when multiple parallel threads attempt to read and write the same memory location at the same time. Min, max, and add will read a memory location, perform a local operation, and then write a new value to that location. Atomic instructions ensure that the memory location has not changed between the read and the write.

Atomic min, max, and add are typically needed when you are writing to an array with fewer entries or dimensions than the number of threads in the parallel_for() kernel launch. E.g.:

parallel_for( Bounds<3>(nz,ny,nx) , YAKL_LAMBDA (int k, int j, int i) {
yakl::atomicAdd( average_column(k) , data(k,j,i) / (ny*nx) );
});

IMPORTANT: yakl::atomicAdd() is not bitwise deterministic for floating point (FP) numbers, meaning there is no guarantee of the order in which threads perform the FP additions. Since FP addition is not associative, you cannot guarantee bitwise-reproducible results from one run to the next. To alleviate this, pass yakl::DefaultLaunchConfigB4b to the parallel_for() launcher, and define the CPP macro YAKL_B4B when you want to force bitwise reproducibility. yakl::atomicMin() and yakl::atomicMax() are both bitwise reproducible, so do not worry about those. This applies only to yakl::atomicAdd().

◆ atomicMin()

template<class T >
YAKL_INLINE void yakl::atomicMin ( T &  update,
T  value 
)

yakl::atomicMin(update,value) atomically performs update = min(update,value)

Atomic instructions are needed when multiple parallel threads attempt to read and write the same memory location at the same time. Min, max, and add will read a memory location, perform a local operation, and then write a new value to that location. Atomic instructions ensure that the memory location has not changed between the read and the write.

Atomic min, max, and add are typically needed when you are writing to an array with fewer entries or dimensions than the number of threads in the parallel_for() kernel launch. E.g.:

parallel_for( Bounds<3>(nz,ny,nx) , YAKL_LAMBDA (int k, int j, int i) {
yakl::atomicAdd( average_column(k) , data(k,j,i) / (ny*nx) );
});

IMPORTANT: yakl::atomicAdd() is not bitwise deterministic for floating point (FP) numbers, meaning there is no guarantee of the order in which threads perform the FP additions. Since FP addition is not associative, you cannot guarantee bitwise-reproducible results from one run to the next. To alleviate this, pass yakl::DefaultLaunchConfigB4b to the parallel_for() launcher, and define the CPP macro YAKL_B4B when you want to force bitwise reproducibility. yakl::atomicMin() and yakl::atomicMax() are both bitwise reproducible, so do not worry about those. This applies only to yakl::atomicAdd().

◆ check_last_error()

void yakl::check_last_error ( )
inline

Checks to see if an error has occurred on the device.

This is a no-op unless the YAKL_DEBUG CPP macro is defined

◆ create_stream()

Stream yakl::create_stream ( )
inline

Create and return a Stream object. It is guaranteed to not be the default stream.

◆ fence()

void yakl::fence ( )
inline

Block the host code until all device code has completed.

◆ fence_inner()

YAKL_INLINE void yakl::fence_inner ( InnerHandler handler)

Block inner threads until all inner threads have completed.

To be called inside yakl::parallel_outer only. Block the inner-level parallelism until all inner threads have reached this point. In CUDA and HIP, this is __syncthreads().

Parameters
handler  The yakl::InnerHandler object created by yakl::parallel_outer
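
A hedged sketch of two-level parallelism using fence_inner (the names nblocks, nteam, stage1, and stage2 are hypothetical; see yakl::c::parallel_outer and yakl::c::parallel_inner for the exact launcher signatures):

parallel_outer( "blocks" , Bounds<1>(nblocks) ,
                YAKL_LAMBDA (int iblk , yakl::InnerHandler handler) {
  // First inner pass over the team of inner threads
  parallel_inner( nteam , [&] (int it) { stage1(iblk,it); } , handler );
  // Ensure every inner thread finished stage1 before stage2 reads its results
  yakl::fence_inner( handler );
  parallel_inner( nteam , [&] (int it) { stage2(iblk,it); } , handler );
} , yakl::LaunchConfig<>().set_inner_size(nteam) );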

◆ finalize()

void yakl::finalize ( )
inline

Finalize the YAKL runtime.

Best practice is to call yakl::isInitialized() to ensure the YAKL runtime is initialized before calling this routine. That said, this routine does check to ensure the runtime is initialized for you. THREAD SAFE!

◆ free_device()

void yakl::free_device ( void *  ptr,
char const *  label 
)
inline

Free on the device using YAKL's device deallocator.

◆ get_pool()

Gator& yakl::get_pool ( )
inline

Return a reference to YAKL's pool allocator (a yakl::Gator object).

◆ init()

void yakl::init ( InitConfig  config = InitConfig())
inline

Initialize the YAKL runtime.

  1. Determine whether the pool allocator is to be used and, if so, its parameters.
  2. Initialize the pool if it is used.
  3. Set the YAKL allocators and deallocators to their defaults.
  4. Initialize YAKL's timer calls to their defaults.
  5. Inspect the optional yakl::InitConfig parameter to override the default allocator, deallocator, and timer calls if requested.
  6. Inform the user with device information.

THREAD SAFE!

Parameters
config  This yakl::InitConfig object allows the user to override YAKL's default allocator, deallocator, and timer calls from the start of the runtime.
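
A minimal sketch of a typical program lifecycle (the MPI calls apply only when running under MPI, i.e., with HAVE_MPI defined; the inner scope is a common convention to guarantee all yakl::Array objects are destroyed before yakl::finalize()):

int main(int argc, char **argv) {
  MPI_Init(&argc,&argv);
  yakl::init();                     // or yakl::init( yakl::InitConfig() ) to customize
  {
    // ... allocate yakl::Array objects, launch kernels, run the application ...
  }                                 // Arrays created in this scope are freed here
  yakl::finalize();
  MPI_Finalize();
  return 0;
}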

◆ isInitialized()

bool yakl::isInitialized ( )
inline

Determine if the YAKL runtime has been initialized. I.e., yakl::init() has been called without a corresponding call to yakl::finalize().

◆ memcpy_device_to_device()

template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void yakl::memcpy_device_to_device ( T1 *  dst,
T2 *  src,
index_t  elems,
Stream  stream = Stream() 
)
inline

[USE AT YOUR OWN RISK]: memcpy the specified number of elements on the device

◆ memcpy_device_to_device_void()

void yakl::memcpy_device_to_device_void ( void *  dst,
void *  src,
size_t  bytes,
Stream  stream = Stream() 
)
inline

[USE AT YOUR OWN RISK]: memcpy the specified number of bytes on the device

◆ memcpy_device_to_host()

template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void yakl::memcpy_device_to_host ( T1 *  dst,
T2 *  src,
index_t  elems,
Stream  stream = Stream() 
)
inline

[USE AT YOUR OWN RISK]: memcpy the specified number of elements from device to host

◆ memcpy_host_to_device()

template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void yakl::memcpy_host_to_device ( T1 *  dst,
T2 *  src,
index_t  elems,
Stream  stream = Stream() 
)
inline

[USE AT YOUR OWN RISK]: memcpy the specified number of elements from host to device
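
A hedged sketch combining the raw allocation and memcpy routines above (the element count n is hypothetical, and no error checking is shown):

float *host_buf = new float[n];
float *dev_buf  = (float *) yakl::alloc_device( n*sizeof(float) , "dev_buf" );
yakl::memcpy_host_to_device( dev_buf , host_buf , n );   // element count, not bytes
// ... launch kernels that read and write dev_buf ...
yakl::memcpy_device_to_host( host_buf , dev_buf , n );
yakl::fence();                                           // wait for the copy to finish
yakl::free_device( dev_buf , "dev_buf" );
delete[] host_buf;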

◆ memcpy_host_to_host()

template<class T1 , class T2 , typename std::enable_if< std::is_same< typename std::remove_cv< T1 >::type, typename std::remove_cv< T2 >::type >::value, int >::type = 0>
void yakl::memcpy_host_to_host ( T1 *  dst,
T2 *  src,
index_t  elems 
)
inline

[USE AT YOUR OWN RISK]: memcpy the specified number of elements on the host

◆ memcpy_host_to_host_void()

void yakl::memcpy_host_to_host_void ( void *  dst,
void *  src,
size_t  bytes 
)
inline

[USE AT YOUR OWN RISK]: memcpy the specified number of bytes on the host

◆ pentadiagonal()

template<unsigned int n, class real >
YAKL_INLINE void yakl::pentadiagonal ( SArray< real, 1, n > const &  a,
SArray< real, 1, n > const &  b,
SArray< real, 1, n > const &  c,
SArray< real, 1, n > const &  d,
SArray< real, 1, n > const &  e,
SArray< real, 1, n > const &  f,
SArray< real, 1, n > &  u 
)

Performs a small non-periodic pentadiagonal solve. Click for more details.

This is to be performed on a SArray / CSArray object allocated on the stack

Solves for a vector u of length n in the pentadiagonal linear system

a_i u_(i-2) + b_i u_(i-1) + c_i u_i + d_i u_(i+1) + e_i u_(i+2) = f_i

The inputs a, b, c, d, e, and f are not modified.

◆ pentadiagonal_periodic()

template<unsigned int n, class real >
YAKL_INLINE void yakl::pentadiagonal_periodic ( SArray< real, 1, n > const &  a,
SArray< real, 1, n > const &  b,
SArray< real, 1, n > const &  c,
SArray< real, 1, n > const &  d,
SArray< real, 1, n > const &  e,
SArray< real, 1, n > const &  f,
SArray< real, 1, n > &  x 
)

Performs a small periodic pentadiagonal solve. Click for more details.

This is to be performed on a SArray / CSArray object allocated on the stack

Solves for a vector x of length n in the pentadiagonal linear system

a_i u_(i-2) + b_i u_(i-1) + c_i u_i + d_i u_(i+1) + e_i u_(i+2) = f_i

The inputs a, b, c, d, e, and f are not modified.

◆ record_event()

Event yakl::record_event ( Stream  stream = Stream())
inline

Create, record, and return an event using the given stream.
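
A hedged sketch of issuing work on a non-default stream and recording an event behind it (dst_ptr, src_ptr, and nelems are hypothetical; waiting on the event is done through the Stream / Event class interfaces, which are not shown here):

yakl::Stream stream = yakl::create_stream();
yakl::memcpy_device_to_device( dst_ptr , src_ptr , nelems , stream );
yakl::Event  event  = yakl::record_event( stream );   // marks the point after the copy
yakl::fence();                                        // or coordinate via the Stream / Event APIs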

◆ register_finalize_callback()

void yakl::register_finalize_callback ( std::function< void()>  callback)
inline

Add a host-only callback to be called just before YAKL finalizes. This is useful for ensuring allocated global variables are always deallocated before YAKL finalization.

YAKL calls a fence() beforehand and does nothing else until all finalize callbacks are completed. One thing this protects against in particular is the pool being finalized before all variables are deallocated.

Parameters
callback  A void function with no parameters that performs finalization / cleanup / deallocation work before YAKL finalizes.
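
A hedged sketch (the global array name and type are hypothetical): resetting a module-level device Array inside a finalize callback guarantees its memory is returned while the pool still exists.

// Global device array used throughout the application (hypothetical).
yakl::Array<double,2,yakl::memDevice,yakl::styleC> saved_state;

// Registered once after yakl::init(); runs just before YAKL finalizes.
yakl::register_finalize_callback( [] () {
  saved_state = yakl::Array<double,2,yakl::memDevice,yakl::styleC>();   // deallocates
} );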

◆ set_device_allocator() [1/2]

void yakl::set_device_allocator ( std::function< void *(size_t)>  func)
inline

Override YAKL's device allocator with the passed function (No Label).

After overriding one of YAKL's allocators or deallocators, the passed function will be used until the user overrides it again or calls yakl::set_yakl_allocators_to_default(). There are overriding functions that accept labels for bookkeeping and debugging, and there are functions that do not use labels.

◆ set_device_allocator() [2/2]

void yakl::set_device_allocator ( std::function< void *(size_t, char const *)>  func)
inline

Override YAKL's device allocator with the passed function (WITH Label).

After overriding one of YAKL's allocators or deallocators, the passed function will be used until the user overrides it again or calls yakl::set_yakl_allocators_to_default(). There are overriding functions that accept labels for bookkeeping and debugging, and there are functions that do not use labels.
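
A hedged sketch assuming a CUDA backend (cudaMalloc / cudaFree): routing YAKL's labeled device allocations and deallocations through custom functions, then restoring the defaults.

yakl::set_device_allocator( [] (size_t bytes, char const *label) -> void * {
  void *ptr = nullptr;
  cudaMalloc( &ptr , bytes );
  printf("Allocated %zu bytes for '%s'\n", bytes, label);   // label used for bookkeeping
  return ptr;
} );
yakl::set_device_deallocator( [] (void *ptr, char const *label) {
  cudaFree( ptr );
} );
// ... later, return to YAKL's defaults:
yakl::set_yakl_allocators_to_default();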

◆ set_device_deallocator() [1/2]

void yakl::set_device_deallocator ( std::function< void(void *)>  func)
inline

Override YAKL's device deallocator with the passed function (No Label).

After overriding one of YAKL's allocators or deallocators, the passed function will be used until the user overrides it again or calls yakl::set_yakl_allocators_to_default(). There are overriding functions that accept labels for bookkeeping and debugging, and there are functions that do not use labels.

◆ set_device_deallocator() [2/2]

void yakl::set_device_deallocator ( std::function< void(void *, char const *)>  func)
inline

Override YAKL's device deallocator with the passed function (WITH Label).

After overriding one of YAKL's allocators or deallocators, the passed function will be used until the user overrides it again or calls yakl::set_yakl_allocators_to_default(). There are overriding functions that accept labels for bookkeeping and debugging, and there are functions that do not use labels.

◆ set_timer_finalize()

void yakl::set_timer_finalize ( std::function< void()>  func)
inline

Override YAKL's default timer finalization routine.

◆ set_timer_init()

void yakl::set_timer_init ( std::function< void()>  func)
inline

Override YAKL's default timer initialization routine.

◆ set_timer_start()

void yakl::set_timer_start ( std::function< void(char const *)>  func)
inline

Override YAKL's default routine to start an individual timer.

◆ set_timer_stop()

void yakl::set_timer_stop ( std::function< void(char const *)>  func)
inline

Override YAKL's default routine to stop an individual timer.

◆ set_yakl_allocators_to_default()

void yakl::set_yakl_allocators_to_default ( )
inline

Return all YAKL allocators to their defaults.

If the user has not overridden YAKL's default allocators, then this has no effect.

◆ timer_finalize()

void yakl::timer_finalize ( )
inline

Finalize the YAKL timers.

◆ timer_init()

void yakl::timer_init ( )
inline

Initialize the YAKL timers.

◆ timer_start()

void yakl::timer_start ( char const *  lab)
inline

Start a timer with the given string label. NOTE: Timers must be perfectly nested.

◆ timer_stop()

void yakl::timer_stop ( char const *  lab)
inline

Stop a timer with the given string label. NOTE: Timers must be perfectly nested.
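
For example, timers must be stopped in the reverse order they were started (perfect nesting):

yakl::timer_start("time_step");
yakl::timer_start("advection");
// ... advection kernels ...
yakl::timer_stop ("advection");
yakl::timer_start("diffusion");
// ... diffusion kernels ...
yakl::timer_stop ("diffusion");
yakl::timer_stop ("time_step");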

◆ tridiagonal()

template<class real , unsigned int n>
YAKL_INLINE void yakl::tridiagonal ( SArray< real, 1, n > const &  a,
SArray< real, 1, n > const &  b,
SArray< real, 1, n > &  c,
SArray< real, 1, n > &  d 
)

Solves a small non-periodic tridiagional system.

This is to be performed on a SArray / CSArray object allocated on the stack

Solves a tridiagonal system with no boundary conditions of the form:

[ b(0)  c(0)   0      0      0      0     ] [ x(0)   ] = [ d(0)   ]
[ a(1)  b(1)  c(1)    0      0      0     ] [ x(1)   ] = [ d(1)   ]
[  0    a(2)  b(2)   c(2)    0      0     ] [ x(2)   ] = [ d(2)   ]
[  0     0     ..     ..     ..     0     ] [  .     ] = [  .     ]
[  0     0     0      ..     ..     ..    ] [  .     ] = [  .     ]
[  0     0     0    a(n-2) b(n-2) c(n-2)  ] [ x(n-2) ] = [ d(n-2) ]
[  0     0     0      0    a(n-1) b(n-1)  ] [ x(n-1) ] = [ d(n-1) ]

This routine stores the result in d(), and as the signature indicates, it overwrites c and d.

This uses the Thomas algorithm.
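
A hedged sketch of a per-column solve inside a kernel (nz is a compile-time constant; rhs, solution, and the diagonal values are hypothetical):

parallel_for( Bounds<2>(ny,nx) , YAKL_LAMBDA (int j, int i) {
  yakl::SArray<double,1,nz> a, b, c, d;
  for (int k=0; k < nz; k++) {
    a(k) = -1;            // sub-diagonal
    b(k) =  2;            // main diagonal
    c(k) = -1;            // super-diagonal
    d(k) = rhs(k,j,i);    // right-hand side
  }
  yakl::tridiagonal( a , b , c , d );                      // overwrites c and d
  for (int k=0; k < nz; k++) { solution(k,j,i) = d(k); }   // solution returned in d
});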

◆ tridiagonal_periodic()

template<class real , unsigned int n>
YAKL_INLINE void yakl::tridiagonal_periodic ( SArray< real, 1, n > const &  a,
SArray< real, 1, n > &  b,
SArray< real, 1, n > &  c,
SArray< real, 1, n > &  d 
)

Performs a small periodic tridiagional solve. Click for more details.

This is to be performed on a SArray / CSArray object allocated on the stack

Solves a tridiagonal system with periodic boundary conditions of the form:

[ b(0)   c(0)   0      0      0     a(0)   ] [ x(0)   ] = [ d(0)   ]
[ a(1)   b(1)  c(1)    0      0      0     ] [ x(1)   ] = [ d(1)   ]
[  0     a(2)  b(2)   c(2)    0      0     ] [ x(2)   ] = [ d(2)   ]
[  0      0     ..     ..     ..     0     ] [  .     ] = [  .     ]
[  0      0     0      ..     ..     ..    ] [  .     ] = [  .     ]
[  0      0     0    a(n-2) b(n-2) c(n-2)  ] [ x(n-2) ] = [ d(n-2) ]
[ c(n-1)  0     0      0    a(n-1) b(n-1)  ] [ x(n-1) ] = [ d(n-1) ]

This routine stores the result in d(), and as the signature indicates, it overwrites b, c, d

This uses the Thomas algorithm with the Sherman-Morrison formula. The Sherman-Morrison Formula is as follows:

Separate the tridiagonal + periodic matrix, A, into (B + u*v^T), where B is strictly tridiagonal, and u*v^T accounts for the non-tridiagonal periodic BCs:

u = [-b(0) , 0 , ... , 0 , c(n-1) ]^T
v = [1 , 0 , ... , 0 , -a(0)/b(0)]^T

Now we're solving the system (B + u*v^T)*x = d, which is identical to A*x=d.

To get the solution, we solve two systems:

(1) B*y=d
(2) B*q=u

In this code, q is labeled as "tmp". Then, the answer is given by:

x = y - ( (v^T*y) / (1 + v^T*q) ) * q

Unfortunately, periodic boundary conditions roughly double the amount of work in the tridiagonal solve

◆ use_pool()

bool yakl::use_pool ( )
inline

If true, then the pool allocator is being used for all device allocations.

◆ yakl_mainproc()

bool yakl::yakl_mainproc ( )
inline

If true, this is the main MPI process (task number == 0)

If the CPP macro HAVE_MPI is defined, this tests the MPI rank ID. Otherwise, it always returns true.

◆ yakl_throw()

YAKL_INLINE void yakl::yakl_throw ( const char *  msg)

Throw an error message. Works from the host or device.

On the host, this throws an exception. On the device, it prints and then forces the program to halt.
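
For example (the array data and bound n are hypothetical), a device-side sanity check might look like:

parallel_for( Bounds<1>(n) , YAKL_LAMBDA (int i) {
  if ( !(data(i) > 0) ) yakl::yakl_throw("ERROR: non-positive value encountered");
});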

Variable Documentation

◆ COLON

constexpr int yakl::COLON = std::numeric_limits<int>::min()
constexpr

This is just a convenience syntax for slicing yakl::Array objects to make it clear in the user level code which dimensions are being sliced.

◆ INDEX_MAX

constexpr index_t yakl::INDEX_MAX = std::numeric_limits<index_t>::max()
constexpr

◆ memDefault

constexpr int yakl::memDefault = memHost
constexpr

If the user does not specify a memory space template parameter to yakl::Array, host is the default.

◆ memDevice

constexpr int yakl::memDevice = 1
constexpr

Specifies a device memory address space for a yakl::Array object.

◆ memHost

constexpr int yakl::memHost = 2
constexpr

Specifies a host memory address space for a yakl::Array object.
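
For illustration (ny and nx are hypothetical run-time dimensions), the memory-space constants are passed as the third template parameter of yakl::Array:

yakl::Array<float,2,yakl::memDevice,yakl::styleC> device_arr("device_arr",ny,nx);
yakl::Array<float,2,yakl::memHost  ,yakl::styleC> host_arr  ("host_arr"  ,ny,nx);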

◆ NETCDF_MODE_NEW

constexpr int yakl::NETCDF_MODE_NEW = NC_NOCLOBBER
constexpr

Tells NetCDF the created file should not overwrite a file of the same name.

◆ NETCDF_MODE_READ

constexpr int yakl::NETCDF_MODE_READ = NC_NOWRITE
constexpr

Tells NetCDF the opened file should be opened for reading only.

◆ NETCDF_MODE_REPLACE

constexpr int yakl::NETCDF_MODE_REPLACE = NC_CLOBBER
constexpr

Tells NetCDF the created file should overwrite a file of the same name.

◆ NETCDF_MODE_WRITE

constexpr int yakl::NETCDF_MODE_WRITE = NC_WRITE
constexpr

Tells NetCDF the opened file should be opened for reading and writing.
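
A hedged sketch of how these mode constants are typically used together with yakl::SimpleNetCDF (the file name, variable, and dimension names are hypothetical, and the exact SimpleNetCDF method signatures should be checked against that class's documentation):

yakl::SimpleNetCDF nc;
nc.create( "output.nc" , yakl::NETCDF_MODE_REPLACE );     // overwrite any existing file
nc.write ( temperature , "temperature" , {"ny","nx"} );   // write a yakl::Array with named dims
nc.close ();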

◆ streams_enabled

constexpr bool yakl::streams_enabled = false
constexpr

If the CPP Macro YAKL_ENABLE_STREAMS is defined, then this bool is set to true

◆ styleC

constexpr int yakl::styleC = 1
constexpr

Template parameter for yakl::Array that specifies it should follow C-style behavior.

◆ styleDefault

constexpr int yakl::styleDefault = styleC
constexpr

Default style is C-style for yakl::Array objects.

◆ styleFortran

constexpr int yakl::styleFortran = 2
constexpr

Template parameter for yakl::Array that specifies it should follow Fortran-style behavior.