loki.transformations.single_column.scc
Module Attributes
SCCVVectorPipeline | The basic Single Column Coalesced (SCC) transformation with vector-level kernel parallelism.
SCCSVectorPipeline | The basic Single Column Coalesced (SCC) transformation with sequential kernels.
SCCVHoistPipeline | SCC-style transformation with "vector-parallel" kernels that additionally hoists local temporary arrays that cannot be demoted to the outer driver call.
SCCSHoistPipeline | SCC-style transformation with sequential kernels that additionally hoists local temporary arrays that cannot be demoted to the outer driver call.
SCCVStackPipeline | SCC-style transformation with "vector-parallel" kernels that additionally pre-allocates a "stack" pool allocator and associates local arrays with preallocated memory.
SCCSStackPipeline | SCC-style transformation with sequential kernels that additionally pre-allocates a "stack" pool allocator and associates local arrays with preallocated memory.
SCCVRawStackPipeline | SCC-style transformation with "vector-parallel" kernels that additionally pre-allocates a "stack" pool allocator and replaces local temporaries with indexed sub-arrays of this preallocated array.
SCCSRawStackPipeline | SCC-style transformation with sequential kernels that additionally pre-allocates a "stack" pool allocator and replaces local temporaries with indexed sub-arrays of this preallocated array.
- SCCVVectorPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.vertical.SCCFuseVerticalLoops'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCVecRevectorTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))
The basic Single Column Coalesced (SCC) transformation with vector-level kernel parallelism.
This transformation will convert kernels with innermost vectorisation along a common horizontal dimension to a GPU-friendly loop layout via loop inversion and local array variable demotion. The resulting kernel remains “vector-parallel”, but with the horizontal loop as the outermost iteration dimension (as far as data dependencies allow). This allows local temporary arrays to be demoted to scalars, where possible.
The outer “driver” loop over blocks is used as the secondary dimension of parallelism, where the outer data indexing dimension (block_dim) is resolved in the first call to a “kernel” routine. This is equivalent to a so-called “gang-vector” parallelisation scheme.
This Pipeline applies the following Transformation classes in sequence:
1. SCCBaseTransformation - Ensure utility variables and resolve problematic code constructs.
2. SCCDevectorTransformation - Remove horizontal vector loops.
3. SCCDemoteTransformation - Demote local temporary array variables where appropriate.
4. SCCVecRevectorTransformation - Re-insert the vector loops outermost, according to identified vector sections.
5. SCCAnnotateTransformation - Annotate loops according to the programming model (directive).
As the classes tuple above shows, SCCFuseVerticalLoops additionally runs before SCCBaseTransformation, and PragmaModelTransformation runs last.
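To make the loop inversion and demotion concrete, the following hypothetical Python emulation (not Loki code; the names kernel_original and kernel_scc are illustrative) mimics what a Fortran kernel computes before and after the transformation. It assumes a simple kernel with no cross-column dependencies, where both layouts must produce identical results.

```python
# Hypothetical illustration, not Loki code: t[jl][jk] holds a field indexed
# by horizontal column jl and vertical level jk.


def kernel_original(t, nlon, nz):
    """Input layout: vertical loop outermost, horizontal (vector) loop innermost."""
    out = [[0.0] * nz for _ in range(nlon)]
    for jk in range(nz):
        tmp = [0.0] * nlon              # local temporary array over the horizontal
        for jl in range(nlon):
            tmp[jl] = 2.0 * t[jl][jk]
        for jl in range(nlon):
            out[jl][jk] = tmp[jl] + 1.0
    return out


def kernel_scc(t, nlon, nz):
    """SCC layout: the horizontal loop is inverted to be outermost.

    Each tmp value is produced and consumed within one column and level,
    so the temporary array is demoted to a scalar.
    """
    out = [[0.0] * nz for _ in range(nlon)]
    for jl in range(nlon):              # re-inserted vector loop, now outermost
        for jk in range(nz):
            tmp = 2.0 * t[jl][jk]       # demoted scalar
            out[jl][jk] = tmp + 1.0
    return out
```

On a GPU, the outer jl loop of kernel_scc is the one that would be mapped to vector lanes, while each lane walks the vertical sequentially.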
- Parameters:
horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.
block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.
directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.
trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector-parallel arrays.
demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.
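Each pipeline attribute is a functools.partial of Pipeline with a fixed classes tuple, so calling it with the parameters above builds the configured pipeline. The sketch below mimics that construction pattern with a hypothetical MiniPipeline and two toy transformations. It assumes, without the source confirming it, that keyword arguments are forwarded to each class only if its constructor accepts them; it is not Loki's Pipeline implementation.

```python
import functools
import inspect


class MiniPipeline:
    """Hypothetical pipeline: builds each transformation in `classes`, in order."""

    def __init__(self, classes=(), **kwargs):
        self.transformations = []
        for cls in classes:
            accepted = inspect.signature(cls).parameters
            # Forward only the keyword arguments this constructor accepts
            options = {key: value for key, value in kwargs.items() if key in accepted}
            self.transformations.append(cls(**options))

    def apply(self, log):
        # Run the transformations in sequence; each appends a trace entry
        for transformation in self.transformations:
            log = transformation.apply(log)
        return log


class ToyDevector:
    def __init__(self, horizontal):
        self.horizontal = horizontal

    def apply(self, log):
        return log + [f"devector:{self.horizontal}"]


class ToyAnnotate:
    def __init__(self, directive=None):
        self.directive = directive

    def apply(self, log):
        return log + [f"annotate:{self.directive}"]


# Same construction pattern as the module attributes: the classes tuple is fixed
ToySCCPipeline = functools.partial(MiniPipeline, classes=(ToyDevector, ToyAnnotate))
```

For example, ToySCCPipeline(horizontal="jl", directive="openacc").apply([]) returns ["devector:jl", "annotate:openacc"]; the directive never reaches ToyDevector because its constructor does not declare it.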
- SCCSVectorPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.vertical.SCCFuseVerticalLoops'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCSeqRevectorTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))
The basic Single Column Coalesced (SCC) transformation with sequential kernels.
This transformation will convert kernels with innermost vectorisation along a common horizontal dimension to a GPU-friendly loop layout via loop inversion and local array variable demotion. The resulting kernel becomes sequential, as the horizontal loop is hoisted to the driver and the loop index becomes an argument to the kernel(s). Moreover, this allows local temporary arrays to be demoted to scalars, where possible.
The outer “driver” loop over blocks is used as the secondary dimension of parallelism, where the outer data indexing dimension (block_dim) is resolved in the first call to a “kernel” routine. This is equivalent to a so-called “gang-vector” parallelisation scheme.
This Pipeline applies the following Transformation classes in sequence:
1. SCCBaseTransformation - Ensure utility variables and resolve problematic code constructs.
2. SCCDevectorTransformation - Remove horizontal vector loops.
3. SCCDemoteTransformation - Demote local temporary array variables where appropriate.
4. SCCSeqRevectorTransformation - Re-insert the vector loops outermost, according to identified vector sections.
5. SCCAnnotateTransformation - Annotate loops according to the programming model (directive).
As the classes tuple above shows, SCCFuseVerticalLoops additionally runs before SCCBaseTransformation, and PragmaModelTransformation runs last.
- Parameters:
horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.
block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.
directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.
trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector-parallel arrays.
demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.
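For the sequential variant, the horizontal loop moves into the driver and the kernel receives the column index as an argument. This hypothetical Python emulation (kernel_seq and driver_seq are illustrative names, not Loki API) shows the resulting gang-vector structure. Blocks form the outer loop and columns form the hoisted horizontal loop, with the kernel itself containing no horizontal loop.

```python
def kernel_seq(jl, t, out, nz):
    """Sequential kernel: processes only the single column jl it is given."""
    for jk in range(nz):
        tmp = 2.0 * t[jl][jk]        # demoted scalar
        out[jl][jk] = tmp + 1.0


def driver_seq(t_blocks, nlon, nz):
    """Driver: blocks are the outer ("gang") loop; the horizontal ("vector")
    loop has been hoisted here out of the kernel."""
    out_blocks = []
    for t in t_blocks:                # loop over blocks (block_dim)
        out = [[0.0] * nz for _ in range(nlon)]
        for jl in range(nlon):        # hoisted horizontal loop
            kernel_seq(jl, t, out, nz)
        out_blocks.append(out)
    return out_blocks
```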
- SCCVHoistPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.vertical.SCCFuseVerticalLoops'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCVecRevectorTransformation'>, <class 'loki.transformations.hoist_variables.HoistTemporaryArraysAnalysis'>, <class 'loki.transformations.single_column.hoist.SCCHoistTemporaryArraysTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))
SCC-style transformation with “vector-parallel” kernels that additionally hoists to the outer driver call any local temporary arrays that cannot be demoted.
For details of the kernel and driver-side transformations, please refer to SCCVVectorPipeline.
In addition, this pipeline will invoke HoistTemporaryArraysAnalysis and SCCHoistTemporaryArraysTransformation before the final annotation step to hoist multi-dimensional local temporary array variables to the “driver” routine, where they will be allocated on device and passed down as arguments.
- Parameters:
horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.
block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.
directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.
trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector-parallel arrays.
demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.
dim_vars (tuple of str, optional) – Variables that must appear within the dimensions of the arrays to be hoisted. If not provided, no checks will be done for the array dimensions in HoistTemporaryArraysAnalysis.
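A temporary that needs its full vertical extent cannot be demoted to a scalar, so the hoist pipelines move it to the driver. The Python sketch below, with hypothetical names, assumes the hoisted array gains a block dimension (consistent with the block_dim parameter) and that each kernel call receives its own block's slice. It only emulates the data flow; it is not the code Loki generates.

```python
def kernel(t, out, tmp, nlon, nz):
    """Kernel whose former local temporary `tmp` (nlon x nz) is now an argument.

    tmp is filled in one vertical sweep and read back in reverse order, so it
    needs the full vertical extent and cannot become a scalar.
    """
    for jl in range(nlon):
        for jk in range(nz):
            tmp[jl][jk] = 2.0 * t[jl][jk]
        for jk in range(nz):
            out[jl][jk] = tmp[jl][nz - 1 - jk]


def driver_hoist(t_blocks, nlon, nz):
    nblocks = len(t_blocks)
    # Hoisted temporary, allocated once with an extra block dimension. The block
    # index comes first here for convenience; in Fortran it would be the last one.
    tmp_all = [[[0.0] * nz for _ in range(nlon)] for _ in range(nblocks)]
    out_blocks = []
    for ibl, t in enumerate(t_blocks):
        out = [[0.0] * nz for _ in range(nlon)]
        kernel(t, out, tmp_all[ibl], nlon, nz)   # pass this block's slice
        out_blocks.append(out)
    return out_blocks
```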
- SCCSHoistPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.vertical.SCCFuseVerticalLoops'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCSeqRevectorTransformation'>, <class 'loki.transformations.hoist_variables.HoistTemporaryArraysAnalysis'>, <class 'loki.transformations.single_column.hoist.SCCHoistTemporaryArraysTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))
SCC-style transformation with sequential kernels that additionally hoists to the outer driver call any local temporary arrays that cannot be demoted.
For details of the kernel and driver-side transformations, please refer to SCCSVectorPipeline.
In addition, this pipeline will invoke HoistTemporaryArraysAnalysis and SCCHoistTemporaryArraysTransformation before the final annotation step to hoist multi-dimensional local temporary array variables to the “driver” routine, where they will be allocated on device and passed down as arguments.
- Parameters:
horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.
block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.
directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.
trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector-parallel arrays.
demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.
dim_vars (tuple of str, optional) – Variables that must appear within the dimensions of the arrays to be hoisted. If not provided, no checks will be done for the array dimensions in HoistTemporaryArraysAnalysis.
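The dim_vars option restricts which arrays qualify for hoisting. The hypothetical filter below illustrates one plausible reading of that check; it assumes that, when dim_vars is given, an array qualifies if at least one of its dimensions is a listed variable. Shapes are simplified to tuples of names or literals, and the function is not part of HoistTemporaryArraysAnalysis.

```python
def select_arrays_to_hoist(local_arrays, dim_vars=None):
    """Hypothetical dim_vars filter for hoisting candidates.

    local_arrays: mapping of array name -> tuple of dimension names/literals.
    Assumption: with dim_vars given, an array is kept only if at least one
    of its dimensions is a variable listed in dim_vars.
    """
    if dim_vars is None:
        return list(local_arrays)  # no dimension check requested
    return [name for name, shape in local_arrays.items()
            if any(dim in dim_vars for dim in shape)]
```

With arrays = {"zt": ("klon", "klev"), "zbuf": ("5",)}, passing dim_vars=("klon", "klev") keeps only "zt", while omitting dim_vars keeps both.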
- SCCVStackPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.vertical.SCCFuseVerticalLoops'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCVecRevectorTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.pool_allocator.TemporariesPoolAllocatorTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))
SCC-style transformation with “vector-parallel” kernels that additionally pre-allocates a “stack” pool allocator and associates local arrays with preallocated memory.
For details of the kernel and driver-side transformations, please refer to SCCVVectorPipeline.
In addition, this pipeline will invoke TemporariesPoolAllocatorTransformation to back the remaining locally allocated arrays from a “stack” pool allocator that is pre-allocated in the driver routine and passed down via arguments.
- Parameters:
horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.
block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.
directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.
trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector-parallel arrays.
demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.
check_bounds (bool, optional) – Insert bounds-checks in the kernel to make sure the allocated stack size is not exceeded (default: True).
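A pool allocator of this kind typically hands out contiguous chunks of one preallocated buffer by advancing an offset. The hypothetical StackPool class below sketches that bump-allocation idea in Python, using memoryview slices to stand in for local arrays associated with preallocated memory, and a check_bounds flag mirroring the parameter above. It is an illustration under those assumptions, not TemporariesPoolAllocatorTransformation's generated code.

```python
from array import array


class StackPool:
    """Hypothetical 'stack' pool: one preallocated buffer handed out in
    contiguous chunks by advancing an offset (bump allocation)."""

    def __init__(self, size, check_bounds=True):
        self.buffer = array("d", [0.0] * size)   # preallocated once, in the "driver"
        self._memory = memoryview(self.buffer)
        self.offset = 0
        self.check_bounds = check_bounds

    def allocate(self, n):
        """Associate a local array of n elements with the next free chunk."""
        end = self.offset + n
        if self.check_bounds and end > len(self.buffer):
            raise MemoryError(f"stack exceeded: need {end} elements, have {len(self.buffer)}")
        chunk = self._memory[self.offset:end]    # a view: writes land in self.buffer
        self.offset = end
        return chunk

    def reset(self):
        """Release every chunk at once by rewinding the offset (kernel exit)."""
        self.offset = 0
```

Because releasing memory is just rewinding the offset, allocation and deallocation cost almost nothing. The price is that the stack must be sized up front, which is what the bounds check guards.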
- SCCSStackPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.vertical.SCCFuseVerticalLoops'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCSeqRevectorTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.pool_allocator.TemporariesPoolAllocatorTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))
SCC-style transformation with sequential kernels that additionally pre-allocates a “stack” pool allocator and associates local arrays with preallocated memory.
For details of the kernel and driver-side transformations, please refer to SCCSVectorPipeline.
In addition, this pipeline will invoke TemporariesPoolAllocatorTransformation to back the remaining locally allocated arrays from a “stack” pool allocator that is pre-allocated in the driver routine and passed down via arguments.
- Parameters:
horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.
block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.
directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.
trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector-parallel arrays.
demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.
check_bounds (bool, optional) – Insert bounds-checks in the kernel to make sure the allocated stack size is not exceeded (default: True).
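Since the stack is pre-allocated in the driver, its size has to be known there. The hypothetical helpers below sketch one way to derive it: sum the element counts of the kernel temporaries for one block, then allocate one independent slice per block. The per-block layout is an assumption made so that concurrently running blocks never share stack memory; the function names are illustrative only.

```python
from math import prod


def required_stack_size(temporaries, dims):
    """Elements one block needs to hold all of a kernel's temporaries.

    temporaries: name -> shape, each dimension an int or a symbolic name
    dims: values for the symbolic dimension names
    """
    return sum(
        prod(dims[d] if isinstance(d, str) else d for d in shape)
        for shape in temporaries.values()
    )


def allocate_driver_stack(temporaries, dims, nblocks):
    """Driver side: one stack slice per block (assumed layout)."""
    size = required_stack_size(temporaries, dims)
    return [[0.0] * size for _ in range(nblocks)]
```

For temporaries {"ztmp": ("klon", "klev"), "zwork": ("klev", 2)} with klon=4 and klev=10, each block needs 4*10 + 10*2 = 60 elements.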
- SCCVRawStackPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCVecRevectorTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.raw_stack_allocator.TemporariesRawStackTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))
SCC-style transformation with “vector-parallel” kernels that additionally pre-allocates a “stack” pool allocator and replaces local temporaries with indexed sub-arrays of this preallocated array.
For details of the kernel and driver-side transformations, please refer to SCCVVectorPipeline.
In addition, this pipeline will invoke TemporariesRawStackTransformation to back the remaining locally allocated arrays from a “stack” pool allocator that is pre-allocated in the driver routine and passed down via arguments.
- Parameters:
horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.
block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.
directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.
trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector-parallel arrays.
demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.
check_bounds (bool, optional) – Insert bounds-checks in the kernel to make sure the allocated stack size is not exceeded (default: True).
driver_horizontal (str, optional) – Override string if a separate variable name should be used for the horizontal when allocating the stack in the driver.
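Replacing temporaries with indexed sub-arrays means every access such as za(jl, jk) becomes an access at some offset into one flat stack array. The hypothetical helpers below sketch that mapping: each temporary gets a contiguous region, and multi-dimensional indices are flattened in Fortran (column-major) order. Indices are 0-based here, and the names are illustrative rather than TemporariesRawStackTransformation's actual output.

```python
def assign_offsets(temporaries):
    """Give each temporary a contiguous, non-overlapping region of the raw stack.

    temporaries: name -> shape (tuple of ints).
    Returns (name -> (offset, shape), total number of elements).
    """
    layout, offset = {}, 0
    for name, shape in temporaries.items():
        n = 1
        for extent in shape:
            n *= extent
        layout[name] = (offset, shape)
        offset += n
    return layout, offset


def flat_index(layout, name, *idx):
    """Column-major (Fortran-order) position of name(idx...) in the raw stack."""
    offset, shape = layout[name]
    if len(idx) != len(shape):
        raise IndexError(f"{name} expects {len(shape)} indices, got {len(idx)}")
    pos, stride = 0, 1
    for i, extent in zip(idx, shape):
        if not 0 <= i < extent:
            raise IndexError(f"{name}{idx} out of bounds for shape {shape}")
        pos += i * stride
        stride *= extent
    return offset + pos
```

For temporaries {"za": (4, 3), "zb": (4,)}, za occupies elements 0 to 11 and zb starts at 12, so za(1, 2) maps to 1 + 2*4 = 9.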
- SCCSRawStackPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCSeqRevectorTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.raw_stack_allocator.TemporariesRawStackTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))
SCC-style transformation with sequential kernels that additionally pre-allocates a “stack” pool allocator and replaces local temporaries with indexed sub-arrays of this preallocated array.
For details of the kernel and driver-side transformations, please refer to SCCSVectorPipeline.
In addition, this pipeline will invoke TemporariesRawStackTransformation to back the remaining locally allocated arrays from a “stack” pool allocator that is pre-allocated in the driver routine and passed down via arguments.
- Parameters:
horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.
block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.
directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.
trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector-parallel arrays.
demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.
check_bounds (bool, optional) – Insert bounds-checks in the kernel to make sure the allocated stack size is not exceeded (default: True).
driver_horizontal (str, optional) – Override string if a separate variable name should be used for the horizontal when allocating the stack in the driver.
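The driver_horizontal option matters when the driver names the horizontal size differently from the kernel (for instance NPROMA in the driver versus KLON in the kernel). The hypothetical helper below sketches the idea as a whole-word, case-insensitive substitution on a stack-size expression. Treating the size as a plain string is a simplification assumed for illustration; Loki operates on expression trees, not text.

```python
import re


def driver_stack_size_expr(kernel_expr, kernel_horizontal, driver_horizontal=None):
    """Rewrite a kernel-side stack-size expression for use in the driver.

    If driver_horizontal is given, whole-word occurrences of the kernel's
    horizontal size variable are replaced (case-insensitively, as in Fortran).
    """
    if driver_horizontal is None:
        return kernel_expr
    pattern = rf"\b{re.escape(kernel_horizontal)}\b"
    return re.sub(pattern, driver_horizontal, kernel_expr, flags=re.IGNORECASE)
```

Matching whole words only keeps similarly named variables such as klon_max untouched.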