transformations.single_column_coalesced_wrapper
Classes
Single Column Coalesced: Direct CPU-to-GPU transformation for block-indexed gridpoint routines.
- class SingleColumnCoalescedTransformation(horizontal, vertical=None, block_dim=None, directive=None, demote_local_arrays=True, hoist_column_arrays=True)
Bases: Transformation
Single Column Coalesced: Direct CPU-to-GPU transformation for block-indexed gridpoint routines.
This transformation will remove individual CPU-style vectorization loops from “kernel” routines and either re-insert the vector loop at the highest possible level (without interfering with subroutine calls), or completely strip it and promote the index variable to the driver if hoist_column_arrays is set.
Unlike the CLAW-targeting SCA extraction, this will leave the block-based array passing structure in place, but pass a thread-local array index into any “kernel” routines. The block-based argument passing should map well to coalesced memory accesses on GPUs.
Note that this requires preprocessing with the DerivedTypeArgumentsTransformation.
- Parameters:
horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.
vertical (Dimension) – Dimension object describing the variable conventions used in code to define the vertical dimension, as needed to decide array privatization.
block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.
directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc' or None.
hoist_column_arrays (bool) – Flag to trigger the more aggressive “column array hoisting” optimization.
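The loop restructuring described above can be pictured with a small toy model in plain Python. This is only a sketch of the idea, not Loki code or its generated output: the nested-list field layout (field[block][column]) and all function names are hypothetical, and the kernel arithmetic is arbitrary. It shows the variant in which the horizontal index is promoted to the driver while the block is still passed whole.

```python
# Toy model of the loop restructuring; every name here is hypothetical.

def kernel_cpu(block):
    """CPU-style kernel: owns the vector loop over the horizontal dimension."""
    return [2.0 * value + 1.0 for value in block]  # vector loop inside the kernel


def driver_cpu(field):
    """CPU-style driver: loops over blocks only."""
    return [kernel_cpu(block) for block in field]


def kernel_scc(block, jl):
    """SCC-style kernel: receives the whole block plus one horizontal index."""
    return 2.0 * block[jl] + 1.0


def driver_scc(field):
    """SCC-style driver: owns both the block loop and the horizontal loop."""
    result = []
    for block in field:                      # outer (block) loop
        out = [0.0] * len(block)
        for jl in range(len(block)):         # horizontal loop, now in the driver
            out[jl] = kernel_scc(block, jl)  # block passed whole, index passed along
        result.append(out)
    return result


field = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
# Both layouts compute the same values; only the loop ownership differs.
assert driver_cpu(field) == driver_scc(field)
```

In the SCC-style layout each (block, jl) pair is an independent unit of work, which is what allows the horizontal loop to be mapped onto GPU threads.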
- transform_subroutine(routine, **kwargs)
Apply transformation to convert a Subroutine to SCC format.
- Parameters:
routine (Subroutine) – Subroutine to apply this transformation to.
role (string) – Role of the subroutine in the call tree; either "driver" or "kernel".
targets (list of strings) – Names of all kernel routines that are to be considered “active” in this call tree and should thus be processed accordingly.
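The role and targets parameters suggest a dispatch on the routine's role. The following is a minimal sketch of such a dispatch using a hypothetical stand-in class (ToyTransformation), with routines represented by plain strings. It is not the Loki implementation; in particular, raising an error for an unknown role is an assumption made for this example.

```python
class ToyTransformation:
    """Hypothetical stand-in used only to illustrate role-based dispatch."""

    def __init__(self):
        self.log = []  # records which handler ran for which routine

    def transform_subroutine(self, routine, **kwargs):
        role = kwargs.get('role')
        targets = kwargs.get('targets') or []
        if role == 'driver':
            self.process_driver(routine, targets=targets)
        elif role == 'kernel':
            self.process_kernel(routine)
        else:
            # Assumption: unknown roles are rejected rather than ignored.
            raise ValueError(f'Unknown role: {role!r}')

    def process_driver(self, routine, targets=None):
        self.log.append(('driver', routine, tuple(targets or ())))

    def process_kernel(self, routine):
        self.log.append(('kernel', routine))


toy = ToyTransformation()
toy.transform_subroutine('driver_routine', role='driver', targets=['kernel_routine'])
toy.transform_subroutine('kernel_routine', role='kernel')
```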
- process_kernel(routine, demote_locals=True)
Applies the SCC loop layout transformation to a “kernel” subroutine. This will primarily strip the innermost vector loops and either re-insert the vector loop at the highest possible level (without interfering with subroutine calls), or completely strip it and promote the index variable to the driver if hoist_column_arrays is set.
In both cases argument arrays are left fully dimensioned, allowing us to use them in recursive subroutine invocations.
- Parameters:
routine (Subroutine) – Subroutine to apply this transformation to.
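The "strip and re-insert" step can be sketched on a toy loop-nest representation. The Stmt and Loop classes and both helper functions below are hypothetical and are not Loki's IR or API. The sketch also ignores subroutine calls and data dependencies, so it is only valid under the assumption that horizontal columns are independent of each other.

```python
from dataclasses import dataclass


@dataclass
class Stmt:
    text: str


@dataclass
class Loop:
    variable: str
    body: list


def strip_loops(nodes, variable):
    """Replace every loop over `variable` by its body, recursing into other loops."""
    stripped = []
    for node in nodes:
        if isinstance(node, Loop):
            inner = strip_loops(node.body, variable)
            if node.variable == variable:
                stripped.extend(inner)  # drop the loop, keep its body in place
            else:
                stripped.append(Loop(node.variable, inner))
        else:
            stripped.append(node)
    return stripped


def to_single_vector_loop(nodes, variable):
    """Strip all loops over `variable`, then wrap the whole body in one such loop."""
    return [Loop(variable, strip_loops(nodes, variable))]


# Kernel body with two separate innermost vector loops over 'jl'.
kernel_body = [
    Loop('jk', [Loop('jl', [Stmt('t(jl,jk) = a(jl,jk) + 1')])]),
    Loop('jl', [Stmt('s(jl) = t(jl,1)')]),
]
restructured = to_single_vector_loop(kernel_body, 'jl')
```

After the rewrite, a single loop over jl encloses the jk loop and the trailing statement. Omitting the final wrapping step corresponds to the variant in which the loop is stripped entirely and the index is supplied by the driver.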
- process_driver(routine, targets=None, item=None)
Process the “driver” routine by inserting the outer-level parallel loops, and optionally hoisting temporary column arrays.
Note that if hoist_column_arrays is set, the driver needs to be processed before any kernels are transformed. This is due to the use of an interprocedural analysis forward pass needed to collect the list of “column arrays”.
- Parameters:
routine (Subroutine) – Subroutine to apply this transformation to.
targets (list or string) – List of subroutines that are to be considered as part of the transformation call tree.
item (Item) – Scheduler work item corresponding to routine.
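When routines are processed manually, one simple way to respect the driver-before-kernels requirement is to order the work list by role before applying the transformation. The helper below is a hypothetical sketch that operates on plain (name, role) tuples; it is not part of Loki and does not model its scheduler.

```python
def order_for_hoisting(items):
    """Return items with drivers first; the stable sort keeps the order within each role."""
    # The key is False for drivers and True for everything else, and False sorts first.
    return sorted(items, key=lambda item: item[1] != 'driver')


items = [
    ('kernel_a', 'kernel'),
    ('driver_routine', 'driver'),
    ('kernel_b', 'kernel'),
]
ordered = order_for_hoisting(items)
```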