transformations.pool_allocator
Classes
TemporariesPoolAllocatorTransformation: Transformation to inject a pool allocator that allocates a large scratch space per block on the driver and maps temporary arrays in kernels to this scratch space
- class TemporariesPoolAllocatorTransformation(block_dim, allocation_dims=None, stack_module_name='STACK_MOD', stack_type_name='STACK', stack_ptr_name='L', stack_end_name='U', stack_size_name='ISTSZ', stack_storage_name='ZSTACK', stack_argument_name='YDSTACK', stack_local_var_name='YLSTACK', local_ptr_var_name_pattern='IP_{name}', directive=None, check_bounds=True, key=None, **kwargs)
Bases: Transformation
Transformation to inject a pool allocator that allocates a large scratch space per block on the driver and maps temporary arrays in kernels to this scratch space
It is built on top of a derived type declared in a separate Fortran module (by default called stack_mod), which should simply be committed to the target code base and included in the list of source files for transformed targets. It should look similar to this:

    MODULE STACK_MOD
      IMPLICIT NONE
      TYPE STACK
        INTEGER*8 :: L, U
      END TYPE
      PRIVATE
      PUBLIC :: STACK
    END MODULE
It provides two integer variables, L and U, which are used as the stack pointer and the stack end pointer, respectively. Naming is flexible and can be changed via options to the transformation.

The transformation needs to be applied in reverse order (callees before their callers), so that the stack requirements of nested kernels are known when their callers are processed. For each kernel, it will:
  - Import the STACK derived type
  - Add an argument to the kernel call signature to pass the stack derived type
  - Create a local copy of the stack derived type inside the kernel
  - Determine the combined size of all local arrays that are to be allocated by the pool allocator, taking into account calls to nested kernels. This is reported in the Item's trafo_data.
  - Inject Cray pointer assignments and stack pointer increments for all temporaries
  - Pass the local copy of the stack derived type as argument to any nested kernel calls
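The size determination in the steps above has to work bottom-up: a kernel's requirement is only known once the requirements of the kernels it calls are known. The following Python sketch illustrates this on a toy call graph. It rests on an assumption not stated in this documentation: each kernel hands its nested calls a stack pointer that already lies past its own temporaries, and nested calls run one after another and can reuse the same region, so a kernel needs its own temporaries plus the largest requirement among its callees. The helpers `reverse_order` and `stack_sizes` are hypothetical and not part of the transformation's API.

```python
from typing import Dict, List


def reverse_order(call_graph: Dict[str, List[str]], root: str) -> List[str]:
    """Return routines in post-order, so every callee appears before its callers.

    Assumes an acyclic call graph (no recursion).
    """
    order: List[str] = []
    seen = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for callee in call_graph.get(name, []):
            visit(callee)
        order.append(name)

    visit(root)
    return order


def stack_sizes(call_graph: Dict[str, List[str]], local_sizes: Dict[str, int],
                root: str) -> Dict[str, int]:
    """Stack requirement per kernel: own temporaries plus the largest callee requirement."""
    sizes: Dict[str, int] = {}
    for name in reverse_order(call_graph, root):
        # All callees were visited earlier, so their sizes are already available
        callee_max = max((sizes[c] for c in call_graph.get(name, [])), default=0)
        sizes[name] = local_sizes.get(name, 0) + callee_max
    return sizes


# Toy call tree: KERNEL_A calls KERNEL_B and then KERNEL_C
call_graph = {"KERNEL_A": ["KERNEL_B", "KERNEL_C"], "KERNEL_B": [], "KERNEL_C": []}
local_sizes = {"KERNEL_A": 100, "KERNEL_B": 40, "KERNEL_C": 60}  # illustrative units
print(reverse_order(call_graph, "KERNEL_A"))  # ['KERNEL_B', 'KERNEL_C', 'KERNEL_A']
print(stack_sizes(call_graph, local_sizes, "KERNEL_A"))
# {'KERNEL_B': 40, 'KERNEL_C': 60, 'KERNEL_A': 160}
```

Taking the maximum rather than the sum over callees reflects the assumption that sequential calls reuse the same region; if several nested calls could hold their temporaries at the same time, their sizes would have to be added instead.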
By default, all local arrays are allocated by the pool allocator, but this can be restricted to include only those that have at least one dimension matching one of those provided in allocation_dims.

In a driver routine, the transformation will:
  - Determine the required scratch space from trafo_data
  - Allocate the scratch space to that size
  - Insert data transfers (for OpenACC offloading)
  - Insert data sharing clauses into OpenMP or OpenACC pragmas
  - Assign the stack base pointer and end pointer for each block (identified via block_dim)
  - Pass the stack argument to kernel calls
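The per-block pointer assignment in the driver amounts to slicing one flat scratch buffer into equal, non-overlapping chunks. The Python sketch below uses integer offsets in abstract units in place of whatever addresses the generated code works with, numbers blocks from 0 (Fortran block loops would typically start at 1), and assumes every block receives exactly `stack_size` units; `block_pointers` is a hypothetical helper.

```python
from typing import List, Tuple


def block_pointers(stack_size: int, num_blocks: int, base: int = 0) -> List[Tuple[int, int]]:
    """Return (stack pointer, stack end) pairs, one per block, for a flat scratch buffer.

    Block b owns the half-open range [base + b * stack_size, base + (b + 1) * stack_size).
    """
    pointers = []
    for b in range(num_blocks):
        start = base + b * stack_size  # plays the role of the stack pointer (L)
        end = start + stack_size       # plays the role of the stack end pointer (U)
        pointers.append((start, end))
    return pointers


# Scratch space for 3 blocks with 160 units each, as the driver would allocate it
stack_size, num_blocks = 160, 3
print(stack_size * num_blocks)                 # 480
print(block_pointers(stack_size, num_blocks))  # [(0, 160), (160, 320), (320, 480)]
```

Because the chunks do not overlap, blocks processed in parallel cannot overwrite each other's temporaries, as long as each thread or gang works on its own copy of the stack derived type.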
- Parameters:
  - block_dim (Dimension) – Dimension object that defines the blocking dimension; it identifies the blocks in the driver for which stack base and end pointers are assigned.
  - allocation_dims (list of Dimension, optional) – List of Dimension objects that define the dimensions for which temporaries should be allocated by the pool allocator. By default, all local arrays are allocated by the pool allocator.
  - stack_module_name (str, optional) – Name of the Fortran module containing the derived type definition (default: 'STACK_MOD')
  - stack_type_name (str, optional) – Name of the derived type for the stack definition (default: 'STACK')
  - stack_ptr_name (str, optional) – Name of the stack pointer variable inside the derived type (default: 'L')
  - stack_end_name (str, optional) – Name of the stack end pointer variable inside the derived type (default: 'U')
  - stack_size_name (str, optional) – Name of the variable that holds the size of the scratch space in the driver (default: 'ISTSZ')
  - stack_storage_name (str, optional) – Name of the scratch space variable that is allocated in the driver (default: 'ZSTACK')
  - stack_argument_name (str, optional) – Name of the stack argument that is added to kernels (default: 'YDSTACK')
  - stack_local_var_name (str, optional) – Name of the local copy of the stack argument (default: 'YLSTACK')
  - local_ptr_var_name_pattern (str, optional) – Python format string pattern for the name of the Cray pointer variable for each temporary (default: 'IP_{name}')
  - directive (str, optional) – Can be 'openmp' or 'openacc'. If given, insert data sharing clauses for the stack derived type, and insert data transfer statements (for OpenACC only).
  - check_bounds (bool, optional) – Insert bounds checks in the kernel to make sure the allocated stack size is not exceeded (default: True)
  - key (str, optional) – Overwrite the key that is used to store analysis results in trafo_data.
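The effect of allocation_dims and local_ptr_var_name_pattern can be illustrated without the transformation itself. In this Python sketch, temporaries are plain `(name, shape)` pairs whose dimensions are size expressions given as strings, and a dimension counts as matching if its string equals one of the allowed ones (case-insensitively, as in Fortran). The real transformation works on Dimension objects and IR expressions, so `select_temporaries` and `pointer_names` are hypothetical stand-ins.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Temporary = Tuple[str, Sequence[str]]  # (array name, dimension size expressions)


def select_temporaries(temporaries: Iterable[Temporary],
                       allocation_dims: Optional[Sequence[str]] = None) -> List[Temporary]:
    """Keep every temporary by default, or only those with at least one matching dimension."""
    if allocation_dims is None:
        return list(temporaries)
    wanted = {d.upper() for d in allocation_dims}  # Fortran names are case-insensitive
    return [t for t in temporaries if any(d.upper() in wanted for d in t[1])]


def pointer_names(temporaries: Iterable[Temporary], pattern: str = "IP_{name}") -> Dict[str, str]:
    """Derive one Cray pointer name per temporary from the Python format-string pattern."""
    return {name: pattern.format(name=name) for name, _ in temporaries}


temps = [("ZTMP", ["KLON", "KLEV"]), ("ZSCAL", ["5"]), ("ZCOL", ["KLON"])]
selected = select_temporaries(temps, allocation_dims=["klon"])
print([name for name, _ in selected])  # ['ZTMP', 'ZCOL']
print(pointer_names(selected))         # {'ZTMP': 'IP_ZTMP', 'ZCOL': 'IP_ZCOL'}
```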
- transform_subroutine(routine, **kwargs)
Defines the transformation to apply to Subroutine items.
For transformations that modify Subroutine objects, this method should be implemented. It gets called via the dispatch method apply().
- Parameters:
  - routine (Subroutine) – The subroutine to be transformed.
  - **kwargs (optional) – Keyword arguments for the transformation.
- inject_pool_allocator_import(routine)
Add the import statement for the pool allocator’s “stack” type
- apply_pool_allocator_to_temporaries(routine)
Apply pool allocator to local temporary arrays
This appends the relevant argument to the routine’s dummy argument list and creates the assignment for the local copy of the stack type. For all local arrays, a Cray pointer is instantiated and the temporaries are mapped via Cray pointers to the pool-allocated memory region.
The cumulative size of all temporary arrays is determined and returned.
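The mapping of temporaries onto the stack follows a bump-allocation pattern: each temporary's Cray pointer takes the current value of the stack pointer, the pointer is then advanced by the temporary's size, and, with check_bounds, the advanced pointer is compared against the stack end. The Python sketch below models the local stack copy as a small class; sizes are counted in abstract units (an assumption, since the generated code may measure them in bytes), and `Stack`, `allocate` and `map_temporaries` are hypothetical names, not the transformation's implementation.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List, Tuple


@dataclass
class Stack:
    """Toy model of the STACK derived type: stack pointer L and stack end U."""
    L: int
    U: int


def allocate(stack: Stack, size: int, check_bounds: bool = True) -> int:
    """Return the start of a new chunk of `size` units and advance the stack pointer."""
    ptr = stack.L  # the temporary's Cray pointer receives the current stack pointer
    stack.L += size
    if check_bounds and stack.L > stack.U:
        raise MemoryError(f"stack overflow: requested {size} units, only {stack.U - ptr} available")
    return ptr


def map_temporaries(stack: Stack,
                    temporaries: List[Tuple[str, Tuple[int, ...]]]) -> Tuple[Dict[str, int], int]:
    """Map each temporary to an offset and return the offsets plus the cumulative size."""
    start = stack.L
    offsets = {name: allocate(stack, prod(shape)) for name, shape in temporaries}
    return offsets, stack.L - start


local = Stack(L=0, U=160)  # local copy of the stack argument for one block
offsets, used = map_temporaries(local, [("ZTMP", (8, 10)), ("ZCOL", (8,))])
print(offsets, used)  # {'ZTMP': 0, 'ZCOL': 80} 88

# A nested kernel receives a copy, so its allocations do not move the caller's pointer
callee = Stack(local.L, local.U)
allocate(callee, 40)
print(local.L, callee.L)  # 88 128
```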
- create_pool_allocator(routine, stack_size)
Create a pool allocator in the driver
- inject_pool_allocator_into_calls(routine, targets)
Add the pool allocator argument into subroutine calls