loki.transformations.single_column.scc_low_level

Module Attributes

SCCLowLevelCuf

The basic Single Column Coalesced low-level GPU pipeline via CUDA-Fortran (SCC-CUF).

SCCLowLevelCufParametrise

The Single Column Coalesced low-level GPU pipeline via CUDA-Fortran (SCC-CUF), handling temporaries via parametrisation.

SCCLowLevelCufHoist

The Single Column Coalesced low-level GPU pipeline via CUDA-Fortran (SCC-CUF), handling temporaries via hoisting.

SCCLowLevelParametrise

The Single Column Coalesced low-level GPU pipeline via a low-level C-style kernel language (CUDA, HIP, ...), handling temporaries via parametrisation.

SCCLowLevelHoist

The Single Column Coalesced low-level GPU pipeline via a low-level C-style kernel language (CUDA, HIP, ...), handling temporaries via hoisting.

Functions

inline_elemental_kernel(routine, **kwargs)

Classes

InlineTransformation()

SCCLowLevelCufHoist = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCRevectorTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.InjectBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockLoopTransformation'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelLaunchConfiguration'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelDataOffload'>, <class 'loki.transformations.hoist_variables.HoistTemporaryArraysAnalysis'>, <class 'loki.transformations.single_column.scc_cuf.HoistTemporaryArraysDeviceAllocatableTransformation'>))

The Single Column Coalesced low-level GPU pipeline via CUDA-Fortran (SCC-CUF), handling temporaries via hoisting.

For details of the kernel and driver-side transformations, please refer to SCCLowLevelCuf.

In addition, this pipeline will invoke HoistTemporaryArraysAnalysis and HoistTemporaryArraysDeviceAllocatableTransformation to hoist temporary arrays.

Parameters:
  • horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.

  • block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.

  • directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc' or None.

  • trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.

  • demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible

  • derived_types (tuple) – List of relevant derived types

  • transformation_type (str) –

    Kind of transformation/Handling of temporaries/local arrays

    • parametrise: parametrising the array dimensions to make the vertical dimension a compile-time constant

    • hoist: host side hoisting of (relevant) arrays

  • mode (str) –

    Mode/language to target

    • CUF - CUDA Fortran

    • CUDA - CUDA C

    • HIP - HIP
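As a rough illustration of how a pipeline defined as functools.partial(Pipeline, classes=(...)) can be instantiated from a single set of keyword arguments, the sketch below uses hypothetical stand-in classes (ToyPipeline, ToyBase, ToyHoist). It assumes the pipeline forwards each keyword argument to the constituent classes whose constructors accept it; this is a simplified model and not Loki's actual implementation.

```python
import functools
import inspect


class ToyPipeline:
    """Hypothetical stand-in: builds each class with the kwargs it accepts."""

    def __init__(self, classes=(), **kwargs):
        self.transformations = []
        for cls in classes:
            # Only pass on the keyword arguments named in the constructor
            params = inspect.signature(cls.__init__).parameters
            accepted = {k: v for k, v in kwargs.items() if k in params}
            self.transformations.append(cls(**accepted))

    def apply(self, item):
        # Apply every transformation in the order given by `classes`
        for trafo in self.transformations:
            item = trafo.apply(item)
        return item


class ToyBase:
    def __init__(self, horizontal):
        self.horizontal = horizontal

    def apply(self, item):
        return item + [f"base({self.horizontal})"]


class ToyHoist:
    def __init__(self, block_dim=None):
        self.block_dim = block_dim

    def apply(self, item):
        return item + [f"hoist({self.block_dim})"]


# The pipeline "type" is a partial with the class sequence already bound
ToyCufHoist = functools.partial(ToyPipeline, classes=(ToyBase, ToyHoist))

pipeline = ToyCufHoist(horizontal="jl", block_dim="ibl")
result = pipeline.apply([])
```

In this toy model, horizontal only reaches ToyBase and block_dim only reaches ToyHoist, while the caller supplies both in one call.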

SCCLowLevelCufParametrise = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCRevectorTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.InjectBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockLoopTransformation'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelLaunchConfiguration'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelDataOffload'>, <class 'loki.transformations.parametrise.ParametriseTransformation'>))

The Single Column Coalesced low-level GPU pipeline via CUDA-Fortran (SCC-CUF), handling temporaries via parametrisation.

For details of the kernel and driver-side transformations, please refer to SCCLowLevelCuf.

In addition, this pipeline will invoke ParametriseTransformation to parametrise the relevant array dimensions, i.e. turn them into compile-time constants, so that kernels can keep their temporary arrays.

Parameters:
  • horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.

  • block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.

  • directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc' or None.

  • trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.

  • demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible

  • derived_types (tuple) – List of relevant derived types

  • transformation_type (str) –

    Kind of transformation/Handling of temporaries/local arrays

    • parametrise: parametrising the array dimensions to make the vertical dimension a compile-time constant

    • hoist: host side hoisting of (relevant) arrays

  • mode (str) –

    Mode/language to target

    • CUF - CUDA Fortran

    • CUDA - CUDA C

    • HIP - HIP

  • dic2p (dict) – Dictionary of variable names and corresponding values to be parametrised.
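To make the role of dic2p more concrete, the following minimal sketch replaces symbolic dimension names in array shapes with constant values. The helper parametrise_shape, the variable names (nlev, nlon, ztmp, zflux) and the value 137 are illustrative assumptions; the real ParametriseTransformation operates on Loki's intermediate representation rather than on plain tuples.

```python
def parametrise_shape(shape, dic2p):
    """Replace symbolic dimension names by constants where dic2p provides one."""
    return tuple(dic2p.get(dim, dim) for dim in shape)


# Hypothetical mapping: the vertical size becomes a compile-time constant
dic2p = {"nlev": 137}

# Hypothetical temporaries with symbolic shapes
temporaries = {"ztmp": ("nlon", "nlev"), "zflux": ("nlev",)}

parametrised = {
    name: parametrise_shape(shape, dic2p)
    for name, shape in temporaries.items()
}
```

Dimensions not listed in dic2p (here nlon) are left symbolic.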

SCCLowLevelHoist = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.scc_low_level.InlineTransformation'>, <class 'loki.transformations.data_offload.GlobalVariableAnalysis'>, <class 'loki.transformations.data_offload.GlobalVarHoistTransformation'>, <class 'loki.transformations.transform_derived_types.DerivedTypeArgumentsTransformation'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCRevectorTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.InjectBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockLoopTransformation'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelLaunchConfiguration'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelDataOffload'>, <class 'loki.transformations.hoist_variables.HoistTemporaryArraysAnalysis'>, <class 'loki.transformations.single_column.scc_cuf.HoistTemporaryArraysPragmaOffloadTransformation'>))

The Single Column Coalesced low-level GPU pipeline via a low-level C-style kernel language (CUDA, HIP, …), handling temporaries via hoisting.

This transformation will convert kernels with innermost vectorisation along a common horizontal dimension to a GPU-friendly loop-layout via loop inversion and local array variable demotion. The resulting kernel remains “vector-parallel”, but with the horizontal loop as the outermost iteration dimension (as far as data dependencies allow). This allows local temporary arrays to be demoted to scalars, where possible.

Kernels are marked with a qualifier such as '__global__', and the number of threads that execute the kernel for a given call is specified via the chevron syntax.

This Pipeline applies the following Transformation classes in sequence:

  1. InlineTransformation - Inline constants and elemental functions.

  2. GlobalVariableAnalysis - Analysis of global variables.

  3. GlobalVarHoistTransformation - Hoist global variables to the driver.

  4. DerivedTypeArgumentsTransformation - Flatten derived types/remove derived types from procedure signatures by replacing the (relevant) derived type arguments by their member variables.

  5. SCCBaseTransformation - Ensure utility variables are present and resolve problematic code constructs.

  6. SCCDevectorTransformation - Remove horizontal vector loops.

  7. SCCDemoteTransformation - Demote local temporary array variables where appropriate.

  8. SCCRevectorTransformation - Re-insert the vector loops outermost, according to identified vector sections.

  9. LowerBlockIndexTransformation - Lower the block index (for array argument definitions).

  10. InjectBlockIndexTransformation - Complete the previous step and inject the block index for the relevant arrays.

  11. LowerBlockLoopTransformation - Lower the block loop from driver to kernel(s).

  12. SccLowLevelLaunchConfiguration - Create launch configuration and related things.

  13. SccLowLevelDataOffload - Create/handle data offload and related things.

  14. HoistTemporaryArraysAnalysis - Analysis part of hoisting.

  15. HoistTemporaryArraysPragmaOffloadTransformation - Synthesis part of hoisting.

Parameters:
  • horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.

  • block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.

  • directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc' or None.

  • trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.

  • demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible

  • derived_types (tuple) – List of relevant derived types

  • transformation_type (str) –

    Kind of transformation/Handling of temporaries/local arrays

    • parametrise: parametrising the array dimensions to make the vertical dimension a compile-time constant

    • hoist: host side hoisting of (relevant) arrays

  • mode (str) –

    Mode/language to target

    • CUF - CUDA Fortran

    • CUDA - CUDA C

    • HIP - HIP

SCCLowLevelParametrise = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.scc_low_level.InlineTransformation'>, <class 'loki.transformations.data_offload.GlobalVariableAnalysis'>, <class 'loki.transformations.data_offload.GlobalVarHoistTransformation'>, <class 'loki.transformations.transform_derived_types.DerivedTypeArgumentsTransformation'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCRevectorTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.InjectBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockLoopTransformation'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelLaunchConfiguration'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelDataOffload'>, <class 'loki.transformations.parametrise.ParametriseTransformation'>))

The Single Column Coalesced low-level GPU pipeline via a low-level C-style kernel language (CUDA, HIP, …), handling temporaries via parametrisation.

This transformation will convert kernels with innermost vectorisation along a common horizontal dimension to a GPU-friendly loop-layout via loop inversion and local array variable demotion. The resulting kernel remains “vector-parallel”, but with the horizontal loop as the outermost iteration dimension (as far as data dependencies allow). This allows local temporary arrays to be demoted to scalars, where possible.

Kernels are marked with a qualifier such as '__global__', and the number of threads that execute the kernel for a given call is specified via the chevron syntax.

This Pipeline applies the following Transformation classes in sequence:

  1. InlineTransformation - Inline constants and elemental functions.

  2. GlobalVariableAnalysis - Analysis of global variables.

  3. GlobalVarHoistTransformation - Hoist global variables to the driver.

  4. DerivedTypeArgumentsTransformation - Flatten derived types/remove derived types from procedure signatures by replacing the (relevant) derived type arguments by their member variables.

  5. SCCBaseTransformation - Ensure utility variables are present and resolve problematic code constructs.

  6. SCCDevectorTransformation - Remove horizontal vector loops.

  7. SCCDemoteTransformation - Demote local temporary array variables where appropriate.

  8. SCCRevectorTransformation - Re-insert the vector loops outermost, according to identified vector sections.

  9. LowerBlockIndexTransformation - Lower the block index (for array argument definitions).

  10. InjectBlockIndexTransformation - Complete the previous step and inject the block index for the relevant arrays.

  11. LowerBlockLoopTransformation - Lower the block loop from driver to kernel(s).

  12. SccLowLevelLaunchConfiguration - Create launch configuration and related things.

  13. SccLowLevelDataOffload - Create/handle data offload and related things.

  14. ParametriseTransformation - Parametrise according to dic2p.

Parameters:
  • horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.

  • block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.

  • directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc' or None.

  • trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.

  • demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible

  • derived_types (tuple) – List of relevant derived types

  • transformation_type (str) –

    Kind of transformation/Handling of temporaries/local arrays

    • parametrise: parametrising the array dimensions to make the vertical dimension a compile-time constant

    • hoist: host side hoisting of (relevant) arrays

  • mode (str) –

    Mode/language to target

    • CUF - CUDA Fortran

    • CUDA - CUDA C

    • HIP - HIP

  • dic2p (dict) – Dictionary of variable names and corresponding values to be parametrised.

SCCLowLevelCuf = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCRevectorTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.InjectBlockIndexTransformation'>, <class 'loki.transformations.block_index_transformations.LowerBlockLoopTransformation'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelLaunchConfiguration'>, <class 'loki.transformations.single_column.scc_cuf.SccLowLevelDataOffload'>))

The basic Single Column Coalesced low-level GPU pipeline via CUDA-Fortran (SCC-CUF).

This transformation will convert kernels with innermost vectorisation along a common horizontal dimension to a GPU-friendly loop-layout via loop inversion and local array variable demotion. The resulting kernel remains “vector-parallel”, but with the horizontal loop as the outermost iteration dimension (as far as data dependencies allow). This allows local temporary arrays to be demoted to scalars, where possible.
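The loop inversion and demotion described above can be illustrated with a small, purely conceptual sketch. The two functions below are hypothetical toy kernels written in plain Python (not generated by Loki): the first has the horizontal loop innermost with a temporary array over the horizontal, the second has the horizontal loop outermost with the temporary demoted to a scalar. Both compute the same result.

```python
def kernel_vector_inner(a, nlev, nlon):
    """CPU-style layout: horizontal loop innermost, temporary is an array."""
    out = [[0.0] * nlon for _ in range(nlev)]
    for jk in range(nlev):
        tmp = [0.0] * nlon              # local temporary over the horizontal
        for jl in range(nlon):          # innermost horizontal vector loop
            tmp[jl] = 2.0 * a[jk][jl]
        for jl in range(nlon):
            out[jk][jl] = tmp[jl] + 1.0
    return out


def kernel_horizontal_outer(a, nlev, nlon):
    """GPU-style layout: horizontal loop outermost, temporary is a scalar."""
    out = [[0.0] * nlon for _ in range(nlev)]
    for jl in range(nlon):              # one column per iteration (or thread)
        for jk in range(nlev):
            tmp = 2.0 * a[jk][jl]       # temporary demoted to a scalar
            out[jk][jl] = tmp + 1.0
    return out


nlev, nlon = 3, 4
a = [[float(10 * jk + jl) for jl in range(nlon)] for jk in range(nlev)]
reference = kernel_vector_inner(a, nlev, nlon)
inverted = kernel_horizontal_outer(a, nlev, nlon)
```

The demotion is possible here because each tmp value is produced and consumed within the same (jk, jl) iteration, so no data dependency crosses columns.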

Kernels are specified via 'GLOBAL' and the number of threads that execute the kernel for a given call is specified via the chevron syntax.

This Pipeline applies the following Transformation classes in sequence:

  1. SCCBaseTransformation - Ensure utility variables are present and resolve problematic code constructs.

  2. SCCDevectorTransformation - Remove horizontal vector loops.

  3. SCCDemoteTransformation - Demote local temporary array variables where appropriate.

  4. SCCRevectorTransformation - Re-insert the vector loops outermost, according to identified vector sections.

  5. LowerBlockIndexTransformation - Lower the block index (for array argument definitions).

  6. InjectBlockIndexTransformation - Complete the previous step and inject the block index for the relevant arrays.

  7. LowerBlockLoopTransformation - Lower the block loop from driver to kernel(s).

  8. SccLowLevelLaunchConfiguration - Create launch configuration and related things.

  9. SccLowLevelDataOffload - Create/handle data offload and related things.

Parameters:
  • horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.

  • block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.

  • directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc' or None.

  • trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.

  • demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible

  • derived_types (tuple) – List of relevant derived types

  • transformation_type (str) –

    Kind of transformation/Handling of temporaries/local arrays

    • parametrise: parametrising the array dimensions to make the vertical dimension a compile-time constant

    • hoist: host side hoisting of (relevant) arrays

  • mode (str) –

    Mode/language to target

    • CUF - CUDA Fortran

    • CUDA - CUDA C

    • HIP - HIP