loki.transformations.single_column.scc

Module Attributes

SCCVVectorPipeline

The basic Single Column Coalesced (SCC) transformation with vector-level kernel parallelism.

SCCSVectorPipeline

The basic Single Column Coalesced (SCC) transformation with sequential kernels.

SCCVHoistPipeline

SCC-style transformation with "vector-parallel" kernels that additionally hoists local temporary arrays that cannot be demoted, moving them to the outer driver call.

SCCSHoistPipeline

SCC-style transformation with sequential kernels that additionally hoists local temporary arrays that cannot be demoted, moving them to the outer driver call.

SCCVStackPipeline

SCC-style transformation with "vector-parallel" kernels that additionally pre-allocates a "stack" pool allocator and associates local arrays with preallocated memory.

SCCSStackPipeline

SCC-style transformation with sequential kernels that additionally pre-allocates a "stack" pool allocator and associates local arrays with preallocated memory.

SCCVRawStackPipeline

SCC-style transformation with "vector-parallel" kernels that additionally pre-allocates a "stack" pool allocator and replaces local temporaries with indexed sub-arrays of this preallocated array.

SCCSRawStackPipeline

SCC-style transformation with sequential kernels that additionally pre-allocates a "stack" pool allocator and replaces local temporaries with indexed sub-arrays of this preallocated array.

SCCVVectorPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.vertical.SCCFuseVerticalLoops'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCVecRevectorTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))

The basic Single Column Coalesced (SCC) transformation with vector-level kernel parallelism.

This transformation converts kernels with innermost vectorisation along a common horizontal dimension to a GPU-friendly loop layout via loop inversion and local array variable demotion. The resulting kernel remains “vector-parallel”, but with the horizontal loop as the outermost iteration dimension (as far as data dependencies allow). This allows local temporary arrays to be demoted to scalars, where possible.
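The loop inversion and demotion described above can be illustrated with a small, self-contained Python sketch. It is not Loki output and does not use Loki; the function names and the loop variables `jl`, `jk`, `nlon` and `nlev` are hypothetical stand-ins for a Fortran kernel.

```python
# Illustrative stand-in for a Fortran kernel; all names are hypothetical.

def kernel_cpu_layout(field, nlon, nlev):
    """Original layout: vertical loop outside, horizontal vector loops innermost."""
    out = [[0.0] * nlev for _ in range(nlon)]
    tmp = [0.0] * nlon                      # local temporary sized by the horizontal
    for jk in range(nlev):
        for jl in range(nlon):              # innermost "vector" loop
            tmp[jl] = 2.0 * field[jl][jk]
        for jl in range(nlon):
            out[jl][jk] = tmp[jl] + 1.0
    return out


def kernel_scc_layout(field, nlon, nlev):
    """SCC-style layout: horizontal loop outermost, temporary demoted to a scalar."""
    out = [[0.0] * nlev for _ in range(nlon)]
    for jl in range(nlon):                  # re-inserted vector loop, now outermost
        for jk in range(nlev):
            tmp = 2.0 * field[jl][jk]       # tmp(nlon) has become a scalar
            out[jl][jk] = tmp + 1.0
    return out


nlon, nlev = 4, 3
field = [[float(jl + 10 * jk) for jk in range(nlev)] for jl in range(nlon)]
reference = kernel_cpu_layout(field, nlon, nlev)
transformed = kernel_scc_layout(field, nlon, nlev)
```

Both layouts compute the same values; only the loop order and the storage of `tmp` differ.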

The outer “driver” loop over blocks is used as the secondary dimension of parallelism, where the outer data indexing dimension (block_dim) is resolved in the first call to a “kernel” routine. This is equivalent to a so-called “gang-vector” parallelisation scheme.

This Pipeline applies the following Transformation classes in sequence:

  1. SCCBaseTransformation - Ensure utility variables and resolve problematic code constructs.

  2. SCCDevectorTransformation - Remove horizontal vector loops.

  3. SCCDemoteTransformation - Demote local temporary array variables where appropriate.

  4. SCCVecRevectorTransformation - Re-insert the vector loops outermost, according to identified vector sections.

  5. SCCAnnotateTransformation - Annotate loops according to programming model (directive).

The classes tuple in the definition above additionally lists SCCFuseVerticalLoops before these steps and PragmaModelTransformation after them.

Parameters:
  • horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.

  • block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.

  • directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.

  • trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.

  • demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.
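Each module attribute is a `functools.partial` that pre-binds the `classes` tuple, so that calling it with the parameters above yields a configured pipeline. The following is a minimal stdlib-only sketch of that pattern, not Loki's implementation: `MiniPipeline`, `Devector` and `Annotate` are hypothetical stand-ins, and forwarding keyword arguments by constructor signature is an assumption made for illustration.

```python
import inspect
from functools import partial


class MiniPipeline:
    """Hypothetical stand-in: builds each class from shared keyword arguments."""

    def __init__(self, classes=(), **kwargs):
        self.transformations = []
        for cls in classes:
            # Pass each class only the keyword arguments its constructor accepts
            accepted = inspect.signature(cls.__init__).parameters
            own = {k: v for k, v in kwargs.items() if k in accepted}
            self.transformations.append(cls(**own))

    def apply(self, source):
        # Run the transformations in the order given by ``classes``
        for trafo in self.transformations:
            source = trafo.apply(source)
        return source


class Devector:
    """Hypothetical step that only needs the horizontal dimension."""

    def __init__(self, horizontal):
        self.horizontal = horizontal

    def apply(self, source):
        return source + [f'devector({self.horizontal})']


class Annotate:
    """Hypothetical step that only needs the directive flavour."""

    def __init__(self, directive=None):
        self.directive = directive

    def apply(self, source):
        return source + [f'annotate({self.directive})']


# Pre-bind the class sequence, mirroring the functools.partial definitions
MiniVectorPipeline = partial(MiniPipeline, classes=(Devector, Annotate))

pipeline = MiniVectorPipeline(horizontal='jl', directive='openacc')
steps = pipeline.apply([])
```

Here `steps` records the order in which the two stand-in steps ran, which follows the order of the `classes` tuple.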

SCCSVectorPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.vertical.SCCFuseVerticalLoops'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCSeqRevectorTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))

The basic Single Column Coalesced (SCC) transformation with sequential kernels.

This transformation converts kernels with innermost vectorisation along a common horizontal dimension to a GPU-friendly loop layout via loop inversion and local array variable demotion. The resulting kernel becomes sequential, as the horizontal loop is hoisted to the driver and the loop index becomes an argument to the kernel(s). Moreover, this allows local temporary arrays to be demoted to scalars, where possible.

The outer “driver” loop over blocks is used as the secondary dimension of parallelism, where the outer data indexing dimension (block_dim) is resolved in the first call to a “kernel” routine. This is equivalent to a so-called “gang-vector” parallelisation scheme.
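As a rough illustration of the sequential-kernel layout, the Python sketch below places both the block loop and the horizontal loop in the driver and passes the horizontal index to the kernel. It does not use Loki; `kernel_seq`, `driver` and all variable names are hypothetical.

```python
def kernel_seq(field, out, jl, nlev):
    """Sequential kernel: handles a single column; the horizontal index is an argument."""
    for jk in range(nlev):
        tmp = 2.0 * field[jl][jk]           # scalar temporary
        out[jl][jk] = tmp + 1.0


def driver(blocks, nlon, nlev):
    result = []
    for field in blocks:                    # outer loop over blocks
        out = [[0.0] * nlev for _ in range(nlon)]
        for jl in range(nlon):              # horizontal loop hoisted into the driver
            kernel_seq(field, out, jl, nlev)
        result.append(out)
    return result


nblocks, nlon, nlev = 2, 3, 2
blocks = [
    [[float(b + jl + jk) for jk in range(nlev)] for jl in range(nlon)]
    for b in range(nblocks)
]
result = driver(blocks, nlon, nlev)
```

In this sketch the two driver loops are the ones that would carry the parallelism, while the kernel body is purely sequential.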

This Pipeline applies the following Transformation classes in sequence:

  1. SCCBaseTransformation - Ensure utility variables and resolve problematic code constructs.

  2. SCCDevectorTransformation - Remove horizontal vector loops.

  3. SCCDemoteTransformation - Demote local temporary array variables where appropriate.

  4. SCCSeqRevectorTransformation - Re-insert the vector loops outermost, according to identified vector sections.

  5. SCCAnnotateTransformation - Annotate loops according to programming model (directive).

The classes tuple in the definition above additionally lists SCCFuseVerticalLoops before these steps and PragmaModelTransformation after them.

Parameters:
  • horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.

  • block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.

  • directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.

  • trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.

  • demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.

SCCVHoistPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.vertical.SCCFuseVerticalLoops'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCVecRevectorTransformation'>, <class 'loki.transformations.hoist_variables.HoistTemporaryArraysAnalysis'>, <class 'loki.transformations.single_column.hoist.SCCHoistTemporaryArraysTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))

SCC-style transformation with “vector-parallel” kernels that additionally hoists local temporary arrays that cannot be demoted, moving them to the outer driver call.

For details of the kernel and driver-side transformations, please refer to SCCVVectorPipeline.

In addition, this pipeline will invoke HoistTemporaryArraysAnalysis and SCCHoistTemporaryArraysTransformation before the final annotation step to hoist multi-dimensional local temporary array variables to the “driver” routine, where they will be allocated on device and passed down as arguments.
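The effect of hoisting can be sketched in plain Python: a two-dimensional temporary that the kernel used to allocate locally becomes an argument, and the driver allocates it once with an additional block dimension. This is an illustration under assumed names (`kernel_local`, `kernel_hoisted`, `driver_hoisted`), not code generated by Loki.

```python
def kernel_local(field, nlon, nlev):
    """Before: the kernel allocates its own two-dimensional temporary on every call."""
    tmp = [[0.0] * nlev for _ in range(nlon)]
    for jl in range(nlon):
        for jk in range(nlev):
            tmp[jl][jk] = 2.0 * field[jl][jk]
    return [sum(tmp[jl]) for jl in range(nlon)]


def kernel_hoisted(field, nlon, nlev, tmp):
    """After: the temporary is an argument supplied by the driver."""
    for jl in range(nlon):
        for jk in range(nlev):
            tmp[jl][jk] = 2.0 * field[jl][jk]
    return [sum(tmp[jl]) for jl in range(nlon)]


def driver_hoisted(blocks, nlon, nlev):
    # Hoisted temporary: allocated once, with an extra leading block dimension
    tmp = [[[0.0] * nlev for _ in range(nlon)] for _ in blocks]
    return [
        kernel_hoisted(field, nlon, nlev, tmp[b])
        for b, field in enumerate(blocks)
    ]


nlon, nlev = 3, 2
blocks = [
    [[float(b + jl + jk) for jk in range(nlev)] for jl in range(nlon)]
    for b in range(2)
]
hoisted = driver_hoisted(blocks, nlon, nlev)
local = [kernel_local(field, nlon, nlev) for field in blocks]
```

Giving each block its own slice of the hoisted array keeps the blocks independent of one another, which is why the sketch adds the block dimension.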

Parameters:
  • horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.

  • block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.

  • directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.

  • trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.

  • demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.

  • dim_vars (tuple of str, optional) – Variables to be within the dimensions of the arrays to be hoisted. If not provided, no checks will be done for the array dimensions in HoistTemporaryArraysAnalysis.

SCCSHoistPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.vertical.SCCFuseVerticalLoops'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCSeqRevectorTransformation'>, <class 'loki.transformations.hoist_variables.HoistTemporaryArraysAnalysis'>, <class 'loki.transformations.single_column.hoist.SCCHoistTemporaryArraysTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))

SCC-style transformation with sequential kernels that additionally hoists local temporary arrays that cannot be demoted, moving them to the outer driver call.

For details of the kernel and driver-side transformations, please refer to SCCSVectorPipeline.

In addition, this pipeline will invoke HoistTemporaryArraysAnalysis and SCCHoistTemporaryArraysTransformation before the final annotation step to hoist multi-dimensional local temporary array variables to the “driver” routine, where they will be allocated on device and passed down as arguments.

Parameters:
  • horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.

  • block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.

  • directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.

  • trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.

  • demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.

  • dim_vars (tuple of str, optional) – Variables to be within the dimensions of the arrays to be hoisted. If not provided, no checks will be done for the array dimensions in HoistTemporaryArraysAnalysis.

SCCVStackPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.vertical.SCCFuseVerticalLoops'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCVecRevectorTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.pool_allocator.TemporariesPoolAllocatorTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))

SCC-style transformation with “vector-parallel” kernels that additionally pre-allocates a “stack” pool allocator and associates local arrays with preallocated memory.

For details of the kernel and driver-side transformations, please refer to SCCVVectorPipeline.

In addition, this pipeline will invoke TemporariesPoolAllocatorTransformation to back the remaining locally allocated arrays from a “stack” pool allocator that is pre-allocated in the driver routine and passed down via arguments.
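The idea of a pre-allocated “stack” pool can be sketched as a flat buffer plus a moving pointer. The Python below is a conceptual illustration only: `StackPool`, its `allocate` method and the `kernel` function are hypothetical and do not reflect the code that TemporariesPoolAllocatorTransformation generates. The `check_bounds` flag mimics the bounds check described under Parameters.

```python
class StackPool:
    """Hypothetical pool: one flat buffer, handed out in slices via a moving pointer."""

    def __init__(self, size, check_bounds=True):
        self.buffer = [0.0] * size
        self.check_bounds = check_bounds

    def allocate(self, pointer, n):
        """Return (start offset, new pointer) for ``n`` elements starting at ``pointer``."""
        end = pointer + n
        if self.check_bounds and end > len(self.buffer):
            raise MemoryError('stack size exceeded')
        return pointer, end


def kernel(field, pool, pointer=0):
    nlev = len(field)
    # The local temporary is backed by pool memory instead of a fresh allocation
    start, pointer = pool.allocate(pointer, nlev)
    for jk in range(nlev):
        pool.buffer[start + jk] = 2.0 * field[jk]
    return sum(pool.buffer[start:start + nlev])


pool = StackPool(size=4)            # pre-allocated once in the "driver"
total = kernel([1.0, 2.0, 3.0], pool)
```

Because the pointer is passed by value, a nested call would start allocating where its caller stopped, and the memory is implicitly released when the call returns.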

Parameters:
  • horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.

  • block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.

  • directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.

  • trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.

  • demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.

  • check_bounds (bool, optional) – Insert bounds-checks in the kernel to make sure the allocated stack size is not exceeded (default: True)

SCCSStackPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.vertical.SCCFuseVerticalLoops'>, <class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCSeqRevectorTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.pool_allocator.TemporariesPoolAllocatorTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))

SCC-style transformation with sequential kernels that additionally pre-allocates a “stack” pool allocator and associates local arrays with preallocated memory.

For details of the kernel and driver-side transformations, please refer to SCCSVectorPipeline.

In addition, this pipeline will invoke TemporariesPoolAllocatorTransformation to back the remaining locally allocated arrays from a “stack” pool allocator that is pre-allocated in the driver routine and passed down via arguments.

Parameters:
  • horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.

  • block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.

  • directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.

  • trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.

  • demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.

  • check_bounds (bool, optional) – Insert bounds-checks in the kernel to make sure the allocated stack size is not exceeded (default: True)

SCCVRawStackPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCVecRevectorTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.raw_stack_allocator.TemporariesRawStackTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))

SCC-style transformation with “vector-parallel” kernels that additionally pre-allocates a “stack” pool allocator and replaces local temporaries with indexed sub-arrays of this preallocated array.

For details of the kernel and driver-side transformations, please refer to SCCVVectorPipeline.

In addition, this pipeline will invoke TemporariesRawStackTransformation to back the remaining locally allocated arrays from a “stack” pool allocator that is pre-allocated in the driver routine and passed down via arguments.
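Replacing temporaries with indexed sub-arrays can be pictured as assigning each temporary a fixed index range in one shared array. The sketch below illustrates that idea in Python under assumed names (`plan_offsets`, `kernel_raw_stack`); it is not the code produced by TemporariesRawStackTransformation.

```python
def plan_offsets(sizes):
    """Assign each named temporary a contiguous index range in the shared stack array."""
    offsets, top = {}, 0
    for name, size in sizes.items():
        offsets[name] = (top, top + size)
        top += size
    return offsets, top


def kernel_raw_stack(field, stack, offsets):
    a0, _ = offsets['a']
    b0, b1 = offsets['b']
    for jk in range(len(field)):
        stack[a0 + jk] = 2.0 * field[jk]           # was: a(jk) = 2 * field(jk)
        stack[b0 + jk] = stack[a0 + jk] + 1.0      # was: b(jk) = a(jk) + 1
    return stack[b0:b1]


field = [1.0, 2.0, 3.0]
offsets, total_size = plan_offsets({'a': len(field), 'b': len(field)})
stack = [0.0] * total_size          # pre-allocated in the "driver"
result = kernel_raw_stack(field, stack, offsets)
```

In this sketch the temporaries `a` and `b` no longer exist as separate arrays; every access goes through an offset into `stack`.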

Parameters:
  • horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.

  • block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.

  • directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.

  • trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.

  • demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.

  • check_bounds (bool, optional) – Insert bounds-checks in the kernel to make sure the allocated stack size is not exceeded (default: True)

  • driver_horizontal (str, optional) – Override string if a separate variable name should be used for the horizontal when allocating the stack in the driver.

SCCSRawStackPipeline = functools.partial(<class 'loki.batch.pipeline.Pipeline'>, classes=(<class 'loki.transformations.single_column.base.SCCBaseTransformation'>, <class 'loki.transformations.single_column.vector.SCCDevectorTransformation'>, <class 'loki.transformations.single_column.vector.SCCDemoteTransformation'>, <class 'loki.transformations.single_column.vector.SCCSeqRevectorTransformation'>, <class 'loki.transformations.single_column.annotate.SCCAnnotateTransformation'>, <class 'loki.transformations.raw_stack_allocator.TemporariesRawStackTransformation'>, <class 'loki.transformations.pragma_model.PragmaModelTransformation'>))

SCC-style transformation with sequential kernels that additionally pre-allocates a “stack” pool allocator and replaces local temporaries with indexed sub-arrays of this preallocated array.

For details of the kernel and driver-side transformations, please refer to SCCSVectorPipeline.

In addition, this pipeline will invoke TemporariesRawStackTransformation to back the remaining locally allocated arrays from a “stack” pool allocator that is pre-allocated in the driver routine and passed down via arguments.

Parameters:
  • horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.

  • block_dim (Dimension) – Optional Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.

  • directive (string or None) – Directives flavour to use for parallelism annotations; either 'openacc', 'omp-gpu' or None.

  • trim_vector_sections (bool) – Flag to trigger trimming of extracted vector sections to remove nodes that are not assignments involving vector parallel arrays.

  • demote_local_arrays (bool) – Flag to trigger local array demotion to scalar variables where possible.

  • check_bounds (bool, optional) – Insert bounds-checks in the kernel to make sure the allocated stack size is not exceeded (default: True)

  • driver_horizontal (str, optional) – Override string if a separate variable name should be used for the horizontal when allocating the stack in the driver.