loki.transformations.raw_stack_allocator

Classes

TemporariesRawStackTransformation(block_dim, ...)

Transformation to inject stack arrays at the driver level.

class TemporariesRawStackTransformation(block_dim, horizontal, stack_name='STACK', local_int_var_name_pattern='JD_{name}', directive=None, driver_horizontal=None, **kwargs)

Bases: Transformation

Transformation to inject stack arrays at the driver level. These, as well as their corresponding sizes, are passed on to the kernels. Any temporary arrays with the horizontal dimension as their leading dimension are then allocated as offsets in the stack array.

The transformation needs to be applied in reverse order (callees before their callers, see reverse_traversal), which will do the following for each kernel:

  • Add arguments to the kernel call signature to pass the stack arrays and their (free) size

  • Determine the combined size of all local arrays that are to be allocated on the stack, taking into account calls to nested kernels. This size is stored in the Item’s trafo_data.

  • Replace any access to temporary arrays with the corresponding offsets in the stack array

  • Pass the stack arrays as arguments to any nested kernel calls

In a driver routine, the transformation will:

  • Determine the required scratch space from trafo_data

  • Allocate the stack arrays

  • Insert data sharing clauses into OpenMP or OpenACC pragmas

  • Pass the stack arrays and sizes into the kernel calls
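The reverse-order bookkeeping described in the two lists above can be sketched in plain Python. This is only an illustration and does not use Loki itself: the call graph, the routine names, the `(dtype, kind)` keys, and the rule that a caller needs its own temporaries plus the largest requirement among its callees are all assumptions made for the example.

```python
# Hypothetical call graph: which routine calls which kernels.
calls = {
    "driver": ["kernel_a", "kernel_c"],
    "kernel_a": ["kernel_b"],
    "kernel_b": [],
    "kernel_c": [],
}

# Hypothetical size of each routine's own temporaries, keyed by (dtype, kind)
# and counted in multiples of the horizontal extent.
local_sizes = {
    "kernel_a": {("real", "JPRB"): 3},
    "kernel_b": {("real", "JPRB"): 2, ("integer", "JPIM"): 1},
    "kernel_c": {("real", "JPRB"): 4},
}


def reverse_order(calls, root):
    """Return routines so that every callee appears before its callers."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for callee in calls[name]:
            visit(callee)
        order.append(name)

    visit(root)
    return order


def required_stack_sizes(calls, local_sizes, root):
    """Accumulate per-routine stack requirements, callees first."""
    trafo_data = {}
    for name in reverse_order(calls, root):
        sizes = dict(local_sizes.get(name, {}))
        # Assumption: callees run one after another and reuse the same
        # free stack space, so only the largest requirement counts.
        callee_max = {}
        for callee in calls[name]:
            for key, size in trafo_data[callee].items():
                callee_max[key] = max(callee_max.get(key, 0), size)
        for key, size in callee_max.items():
            sizes[key] = sizes.get(key, 0) + size
        trafo_data[name] = sizes
    return trafo_data


trafo_data = required_stack_sizes(calls, local_sizes, "driver")
# The driver entry is the scratch space it would have to allocate.
print(trafo_data["driver"])
```

Because every callee is processed before its caller, the driver only has to read the already accumulated sizes of its direct successors, which is why the traversal order matters.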

Parameters:
  • block_dim (Dimension) – Dimension object to define the blocking dimension

  • horizontal (Dimension) – Dimension object to define the horizontal dimension

  • stack_name (str, optional) – Name of the scratch space variable that is allocated in the driver (default: 'STACK')

  • local_int_var_name_pattern (str, optional) – Python format string pattern for the name of the integer variable for each temporary (default: 'JD_{name}')

  • directive (str, optional) – Can be 'openmp' or 'openacc'. If given, insert data sharing clauses for the stack arrays, and insert data transfer statements (for OpenACC only).

  • driver_horizontal (str, optional) – Override string if a separate variable name should be used for the horizontal when allocating the stack in the driver.

  • key (str, optional) – Overwrite the key that is used to store analysis results in trafo_data.
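Since local_int_var_name_pattern is described as a Python format string, the effect of the default and of a custom pattern can be previewed with `str.format`. The temporary names `ZTMP` and `ZWORK` and the custom pattern are hypothetical, and whether the transformation expands the pattern in exactly this way is an assumption.

```python
# Default pattern from the class signature.
default_pattern = 'JD_{name}'
# Hypothetical custom pattern a user might pass instead.
custom_pattern = 'IOFF_{name}'

# Hypothetical names of temporary arrays found in a kernel.
temporaries = ['ZTMP', 'ZWORK']

default_names = [default_pattern.format(name=t) for t in temporaries]
custom_names = [custom_pattern.format(name=t) for t in temporaries]

print(default_names)
print(custom_names)
```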

reverse_traversal = True
type_name_dict = {BasicType.LOGICAL: {'driver': 'LL', 'kernel': 'LD'}, BasicType.INTEGER: {'driver': 'I', 'kernel': 'K'}, BasicType.REAL: {'driver': 'Z', 'kernel': 'P'}}
property int_type
transform_subroutine(routine, **kwargs)

Defines the transformation to apply to Subroutine items.

For transformations that modify Subroutine objects, this method should be implemented. It gets called via the dispatch method apply().

Parameters:
  • routine (Subroutine) – The subroutine to be transformed.

  • **kwargs (optional) – Keyword arguments for the transformation.

insert_stack_in_calls(routine, stack_arg_dict, successors)

Insert stack arguments into calls to successor routines.

Parameters:
  • routine (Subroutine) – The routine in which to transform call statements

  • stack_arg_dict (dict) – dict that maps dtype and kind to the sets of stack size variables and their corresponding stack array variables

  • successors (list of Item) – The items corresponding to successor routines called from routine

create_stacks_driver(routine, stack_dict, successors)

Create stack variables in the driver routine, add pragma directives to create the stacks on the device (if self.directive is set), and add the stack variables to the kernel call arguments.

Parameters:
  • routine (Subroutine) – The driver subroutine to get the stack_variables

  • stack_dict (dict) – dict that maps dtype and kind to an expression for the required stack size

  • successors (list of Item) – The items corresponding to successor routines called from routine

create_stacks_kernel(routine, stack_dict, successors)

Create stack variables in the kernel routine, add pragma directives to create the stacks on the device (if self.directive is set), and add the stack variables to the kernel call arguments.

Parameters:
  • routine (Subroutine) – The kernel subroutine to get the stack_variables

  • stack_dict (dict) – dict that maps dtype and kind to an expression for the required stack size

  • successors (list of Item) – The items corresponding to successor routines called from routine

apply_raw_stack_allocator_to_temporaries(routine, item=None)

Apply the raw stack allocator to local temporary arrays.

This appends the relevant argument to the routine’s dummy argument list and creates the assignment for the local copy of the stack type. For all local arrays, a Cray pointer is instantiated and the temporaries are mapped via Cray pointers to the pool-allocated memory region.

The cumulative size of all temporary arrays is determined and returned.

Parameters:
  • routine (Subroutine) – Subroutine object to apply the transformation to

Returns:

stack_dict – dict with required stack size mapped to type and kind

Return type:

dict
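As a rough illustration of the cumulative size that is returned, the following plain-Python sketch assigns each temporary an offset into the stack for its type and kind, following the class description in which temporaries become offsets in the stack array. The temporaries, the tuple keys `(dtype, kind)`, and the convention of counting sizes in multiples of the horizontal extent are assumptions for this example; the actual layout of stack_dict in Loki may differ.

```python
from math import prod

# Hypothetical temporaries: (name, dtype, kind, dimensions after the horizontal).
temporaries = [
    ("ZA", "real", "JPRB", (137,)),     # e.g. ZA(NLON, 137)
    ("ZB", "real", "JPRB", ()),         # e.g. ZB(NLON)
    ("IC", "integer", "JPIM", (2, 3)),  # e.g. IC(NLON, 2, 3)
]


def plan_stack(temporaries):
    """Assign each temporary an offset and sum up the size per (dtype, kind)."""
    offsets = {}
    stack_dict = {}
    for name, dtype, kind, trailing in temporaries:
        key = (dtype, kind)
        # The next free position in this stack is the running total so far.
        offsets[name] = stack_dict.get(key, 0)
        # prod(()) is 1, so a purely horizontal array takes one unit.
        stack_dict[key] = offsets[name] + prod(trailing)
    return offsets, stack_dict


offsets, stack_dict = plan_stack(temporaries)
print(offsets)
print(stack_dict)
```

Keeping one running total per type and kind means arrays of different types never share a stack, which matches the description of stack_dict as a mapping from type and kind to a required size.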