loki.transformations.single_column.scc_low_level
Module Attributes
SCCLowLevelCuf: The basic Single Column Coalesced low-level GPU via CUDA-Fortran (SCC-CUF).
SCCLowLevelCufParametrise: The Single Column Coalesced low-level GPU via CUDA-Fortran (SCC-CUF) handling temporaries via parametrisation.
SCCLowLevelCufHoist: The Single Column Coalesced low-level GPU via CUDA-Fortran (SCC-CUF) handling temporaries via hoisting.
SCCLowLevelParametrise: The Single Column Coalesced low-level GPU via low-level C-style kernel language (CUDA, HIP, ...) handling temporaries via parametrisation.
SCCLowLevelHoist: The Single Column Coalesced low-level GPU via low-level C-style kernel language (CUDA, HIP, ...) handling temporaries via hoisting.
Functions
Classes
- SCCLowLevelCufHoist = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCRevectorTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.InjectBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockLoopTransformation'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelLaunchConfiguration'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelDataOffload'>, <class 'loki.transformations.hoist_variables.HoistTemporaryArraysAnalysis'>, <class 'loki.transformations.single_column.scc_cuf.HoistTemporaryArraysDeviceAllocatableTransformation'>))
The Single Column Coalesced low-level GPU via CUDA-Fortran (SCC-CUF) handling temporaries via hoisting.
For details of the kernel and driver-side transformations, please refer to SCCLowLevelCuf.
In addition, this pipeline will invoke HoistTemporaryArraysAnalysis and HoistTemporaryArraysDeviceAllocatableTransformation to hoist temporary arrays.
- Parameters:
horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.
block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.
directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc' or None.
trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.
demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.
derived_types (tuple) – List of relevant derived types.
transformation_type (str) – Kind of transformation/Handling of temporaries/local arrays:
parametrise: parametrising the array dimensions to make the vertical dimension a compile-time constant
hoist: host side hoisting of (relevant) arrays
mode (str) – Mode/language to target:
CUF - CUDA Fortran
CUDA - CUDA C
HIP - HIP
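As a rough illustration of what hoisting a temporary means, the sketch below uses plain Python rather than Loki. A kernel that creates its own scratch array on every call is rewritten so that the driver allocates the scratch storage once, with an extra block dimension, and passes one slice per block into the kernel. All names (`kernel_local`, `kernel_hoisted`, `driver`, `nlev`) are hypothetical, and this is not how the hoisting transformations are implemented.

```python
def kernel_local(column, nlev):
    """Kernel with a local temporary: scratch storage is created on every call."""
    tmp = [0.0] * nlev                      # local temporary array
    for k in range(nlev):
        tmp[k] = 2.0 * column[k]
    return sum(tmp)


def kernel_hoisted(column, nlev, tmp):
    """Same kernel after hoisting: the temporary is now an argument."""
    for k in range(nlev):
        tmp[k] = 2.0 * column[k]
    return sum(tmp)


def driver(columns, nlev):
    """Driver owns the hoisted temporary, extended by a block dimension."""
    nblocks = len(columns)
    tmp = [[0.0] * nlev for _ in range(nblocks)]   # allocated once, up front
    return [kernel_hoisted(columns[b], nlev, tmp[b]) for b in range(nblocks)]


columns = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
results = driver(columns, nlev=3)
```

The hoisted version computes the same values as the original kernel; only the owner of the scratch storage changes.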
- SCCLowLevelCufParametrise = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCRevectorTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.InjectBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockLoopTransformation'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelLaunchConfiguration'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelDataOffload'>, <class 'loki.transformations.parametrise.ParametriseTransformation'>))
The Single Column Coalesced low-level GPU via CUDA-Fortran (SCC-CUF) handling temporaries via parametrisation.
For details of the kernel and driver-side transformations, please refer to SCCLowLevelCuf.
In addition, this pipeline will invoke ParametriseTransformation to parametrise relevant array dimensions to allow having temporary arrays.
- Parameters:
horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.
block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.
directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc' or None.
trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.
demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.
derived_types (tuple) – List of relevant derived types.
transformation_type (str) – Kind of transformation/Handling of temporaries/local arrays:
parametrise: parametrising the array dimensions to make the vertical dimension a compile-time constant
hoist: host side hoisting of (relevant) arrays
mode (str) – Mode/language to target:
CUF - CUDA Fortran
CUDA - CUDA C
HIP - HIP
dic2p (dict) – Dictionary of variable names and corresponding values to be parametrised.
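The effect of a `dic2p` mapping can be pictured with a small stand-alone sketch. It assumes, purely for illustration, that array shapes are tuples of symbolic names and that parametrisation replaces the names listed in `dic2p` with constants. The helper `parametrise_shape` and the values used are hypothetical and are not part of Loki.

```python
def parametrise_shape(shape, dic2p):
    """Replace symbolic dimension names by the constants given in dic2p."""
    return tuple(
        dic2p.get(dim, dim) if isinstance(dim, str) else dim
        for dim in shape
    )


# Hypothetical example: make the vertical extent a compile-time constant
dic2p = {'nlev': 137}
shape = ('nlon', 'nlev')
new_shape = parametrise_shape(shape, dic2p)
```

Here `nlev` becomes a literal constant while `nlon`, which is not listed in `dic2p`, stays symbolic.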
- SCCLowLevelHoist = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.scc_low_level.InlineTransformation'>, <class 'loki.transformations.data_offload.GlobalVariableAnalysis'>, <class 'loki.transformations.data_offload.GlobalVarHoistTransformation'>, <class 'loki.transformations.transform_derived_types.DerivedTypeArgumentsTransformation'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCRevectorTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.InjectBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockLoopTransformation'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelLaunchConfiguration'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelDataOffload'>, <class 'loki.transformations.hoist_variables.HoistTemporaryArraysAnalysis'>, <class 'loki.transformations.single_column.scc_cuf.HoistTemporaryArraysPragmaOffloadTransformation'>))
The Single Column Coalesced low-level GPU via low-level C-style kernel language (CUDA, HIP, …) handling temporaries via hoisting.
This transformation will convert kernels with innermost vectorisation along a common horizontal dimension to a GPU-friendly loop layout via loop inversion and local array variable demotion. The resulting kernel remains “vector-parallel”, but with the horizontal loop as the outermost iteration dimension (as far as data dependencies allow). This allows local temporary arrays to be demoted to scalars, where possible.
Kernels are specified via, e.g., '__global__', and the number of threads that execute the kernel for a given call is specified via the chevron syntax.
This Pipeline applies the following Transformation classes in sequence:
1. InlineTransformation - Inline constants and elemental functions.
2. GlobalVariableAnalysis - Analysis of global variables.
3. GlobalVarHoistTransformation - Hoist global variables to the driver.
4. DerivedTypeArgumentsTransformation - Flatten derived types/remove derived types from procedure signatures by replacing the (relevant) derived type arguments with their member variables.
5. SCCBaseTransformation - Ensure utility variables and resolve problematic code constructs.
6. SCCDevectorTransformation - Remove horizontal vector loops.
7. SCCDemoteTransformation - Demote local temporary array variables where appropriate.
8. SCCRevectorTransformation - Re-insert the vector loops outermost, according to identified vector sections.
9. LowerBlockIndexTransformation - Lower the block index (for array argument definitions).
10. InjectBlockIndexTransformation - Complete the previous step and inject the block index for the relevant arrays.
11. LowerBlockLoopTransformation - Lower the block loop from driver to kernel(s).
12. SccLowLevelLaunchConfiguration - Create launch configuration and related things.
13. SccLowLevelDataOffload - Create/handle data offload and related things.
14. HoistTemporaryArraysAnalysis - Analysis part of hoisting.
15. HoistTemporaryArraysPragmaOffloadTransformation - Synthesis part of hoisting.
- Parameters:
horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.
block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.
directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc' or None.
trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.
demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.
derived_types (tuple) – List of relevant derived types.
transformation_type (str) – Kind of transformation/Handling of temporaries/local arrays:
parametrise: parametrising the array dimensions to make the vertical dimension a compile-time constant
hoist: host side hoisting of (relevant) arrays
mode (str) – Mode/language to target:
CUF - CUDA Fortran
CUDA - CUDA C
HIP - HIP
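The chevron syntax mentioned above takes a grid size and a thread-block size. The following sketch shows one way such a launch configuration could be derived and rendered as text. It assumes, for illustration only, one thread per horizontal index and one thread block per block index, plus a limit of 1024 threads per block. The helpers `launch_config` and `render_call` are hypothetical and do not reflect the code that the launch-configuration transformation actually generates.

```python
def launch_config(nlon, nblocks, max_threads=1024):
    """Return (grid, block) 3-tuples for one thread per horizontal index."""
    if nlon > max_threads:
        raise ValueError('horizontal size exceeds the assumed thread limit')
    block = (nlon, 1, 1)       # threads per block: horizontal index
    grid = (nblocks, 1, 1)     # blocks per grid: block index
    return grid, block


def render_call(name, grid, block, args):
    """Format a CUDA C-style chevron launch as a string."""
    g = ', '.join(str(v) for v in grid)
    b = ', '.join(str(v) for v in block)
    return f"{name}<<<dim3({g}), dim3({b})>>>({', '.join(args)});"


grid, block = launch_config(nlon=128, nblocks=40)
call = render_call('kernel', grid, block, ['q', 'out'])
```

With these inputs the rendered call requests 40 thread blocks of 128 threads each.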
- SCCLowLevelParametrise = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.scc_low_level.InlineTransformation'>, <class 'loki.transformations.data_offload.GlobalVariableAnalysis'>, <class 'loki.transformations.data_offload.GlobalVarHoistTransformation'>, <class 'loki.transformations.transform_derived_types.DerivedTypeArgumentsTransformation'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCRevectorTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.InjectBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockLoopTransformation'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelLaunchConfiguration'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelDataOffload'>, <class 'loki.transformations.parametrise.ParametriseTransformation'>))
The Single Column Coalesced low-level GPU via low-level C-style kernel language (CUDA, HIP, …) handling temporaries via parametrisation.
This transformation will convert kernels with innermost vectorisation along a common horizontal dimension to a GPU-friendly loop layout via loop inversion and local array variable demotion. The resulting kernel remains “vector-parallel”, but with the horizontal loop as the outermost iteration dimension (as far as data dependencies allow). This allows local temporary arrays to be demoted to scalars, where possible.
Kernels are specified via, e.g., '__global__', and the number of threads that execute the kernel for a given call is specified via the chevron syntax.
This Pipeline applies the following Transformation classes in sequence:
1. InlineTransformation - Inline constants and elemental functions.
2. GlobalVariableAnalysis - Analysis of global variables.
3. GlobalVarHoistTransformation - Hoist global variables to the driver.
4. DerivedTypeArgumentsTransformation - Flatten derived types/remove derived types from procedure signatures by replacing the (relevant) derived type arguments with their member variables.
5. SCCBaseTransformation - Ensure utility variables and resolve problematic code constructs.
6. SCCDevectorTransformation - Remove horizontal vector loops.
7. SCCDemoteTransformation - Demote local temporary array variables where appropriate.
8. SCCRevectorTransformation - Re-insert the vector loops outermost, according to identified vector sections.
9. LowerBlockIndexTransformation - Lower the block index (for array argument definitions).
10. InjectBlockIndexTransformation - Complete the previous step and inject the block index for the relevant arrays.
11. LowerBlockLoopTransformation - Lower the block loop from driver to kernel(s).
12. SccLowLevelLaunchConfiguration - Create launch configuration and related things.
13. SccLowLevelDataOffload - Create/handle data offload and related things.
14. ParametriseTransformation - Parametrise according to dic2p.
- Parameters:
horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.
block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.
directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc' or None.
trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.
demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.
derived_types (tuple) – List of relevant derived types.
transformation_type (str) – Kind of transformation/Handling of temporaries/local arrays:
parametrise: parametrising the array dimensions to make the vertical dimension a compile-time constant
hoist: host side hoisting of (relevant) arrays
mode (str) – Mode/language to target:
CUF - CUDA Fortran
CUDA - CUDA C
HIP - HIP
dic2p (dict) – Dictionary of variable names and corresponding values to be parametrised.
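Each attribute on this page is a `functools.partial` that pre-binds a tuple of classes to a pipeline constructor, so callers supply only the configuration parameters listed above. The toy model below mimics that pattern with standard-library Python only. `ToyPipeline`, `Devector` and `Parametrise` are hypothetical stand-ins, and forwarding to each constructor only the keywords it declares is an assumption made for this sketch, not a statement about the real `Pipeline` class.

```python
import inspect
from functools import partial


class ToyPipeline:
    """Hypothetical stand-in: instantiate the classes, then apply them in order."""

    def __init__(self, classes, **kwargs):
        self.steps = []
        for cls in classes:
            accepted = inspect.signature(cls.__init__).parameters
            # Forward only the keyword arguments this constructor declares
            own = {k: v for k, v in kwargs.items() if k in accepted}
            self.steps.append(cls(**own))

    def apply(self, log):
        for step in self.steps:
            step.apply(log)
        return log


class Devector:
    def __init__(self, horizontal):
        self.horizontal = horizontal

    def apply(self, log):
        log.append(f'devector:{self.horizontal}')


class Parametrise:
    def __init__(self, dic2p):
        self.dic2p = dic2p

    def apply(self, log):
        log.append(f'parametrise:{sorted(self.dic2p)}')


# Pre-bind the class list; callers supply only the configuration
ToyParametrisePipeline = partial(ToyPipeline, classes=(Devector, Parametrise))
pipeline = ToyParametrisePipeline(horizontal='jl', dic2p={'nlev': 137})
log = pipeline.apply([])
```

The order of the `classes` tuple fixes the order in which the steps run, which is why the sequence of transformations is part of each pipeline's definition.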
- SCCLowLevelCuf = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCRevectorTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.InjectBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockLoopTransformation'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelLaunchConfiguration'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelDataOffload'>))
The basic Single Column Coalesced low-level GPU via CUDA-Fortran (SCC-CUF).
This transformation will convert kernels with innermost vectorisation along a common horizontal dimension to a GPU-friendly loop layout via loop inversion and local array variable demotion. The resulting kernel remains “vector-parallel”, but with the horizontal loop as the outermost iteration dimension (as far as data dependencies allow). This allows local temporary arrays to be demoted to scalars, where possible.
Kernels are specified via 'GLOBAL', and the number of threads that execute the kernel for a given call is specified via the chevron syntax.
This Pipeline applies the following Transformation classes in sequence:
1. SCCBaseTransformation - Ensure utility variables and resolve problematic code constructs.
2. SCCDevectorTransformation - Remove horizontal vector loops.
3. SCCDemoteTransformation - Demote local temporary array variables where appropriate.
4. SCCRevectorTransformation - Re-insert the vector loops outermost, according to identified vector sections.
5. LowerBlockIndexTransformation - Lower the block index (for array argument definitions).
6. InjectBlockIndexTransformation - Complete the previous step and inject the block index for the relevant arrays.
7. LowerBlockLoopTransformation - Lower the block loop from driver to kernel(s).
8. SccLowLevelLaunchConfiguration - Create launch configuration and related things.
9. SccLowLevelDataOffload - Create/handle data offload and related things.
- Parameters:
horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.
block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.
directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc' or None.
trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.
demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.
derived_types (tuple) – List of relevant derived types.
transformation_type (str) – Kind of transformation/Handling of temporaries/local arrays:
parametrise: parametrising the array dimensions to make the vertical dimension a compile-time constant
hoist: host side hoisting of (relevant) arrays
mode (str) – Mode/language to target:
CUF - CUDA Fortran
CUDA - CUDA C
HIP - HIP
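The loop inversion and demotion described for this pipeline can be pictured with a small stand-alone sketch in plain Python (the real transformations operate on Fortran source, which is not shown here). The first function has the horizontal loop innermost and uses a temporary array over the horizontal dimension. The second has the horizontal loop outermost, so the temporary shrinks to a scalar. The function names and the loop body are hypothetical.

```python
def kernel_vector(q, nlon, nlev):
    """Original layout: the horizontal loop is innermost and tmp is an array."""
    out = [[0.0] * nlev for _ in range(nlon)]
    tmp = [0.0] * nlon                     # temporary over the horizontal
    for jk in range(nlev):
        for jl in range(nlon):
            tmp[jl] = 0.5 * q[jl][jk]
        for jl in range(nlon):
            out[jl][jk] = tmp[jl] + 1.0
    return out


def kernel_scc(q, nlon, nlev):
    """After inversion: the horizontal loop is outermost and tmp is a scalar."""
    out = [[0.0] * nlev for _ in range(nlon)]
    for jl in range(nlon):                 # each jl could map to one GPU thread
        for jk in range(nlev):
            tmp = 0.5 * q[jl][jk]          # demoted to a scalar
            out[jl][jk] = tmp + 1.0
    return out


q = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
before = kernel_vector(q, nlon=3, nlev=2)
after = kernel_scc(q, nlon=3, nlev=2)
```

Both versions produce identical results because no iteration over `jl` depends on another, which is the data-dependency condition the description refers to.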