loki.transformations.single_column.scc_cuf

Single-Column-Coalesced CUDA Fortran (SCC-CUF) transformation.

Functions

device_derived_types(routine, derived_types)

Create device versions of variables of specific derived types, including host-device synchronisation.

device_subroutine_prefix(routine, depth)

Add prefix/specifier ATTRIBUTES(GLOBAL) for kernel subroutines and ATTRIBUTES(DEVICE) for device subroutines.

driver_device_variables(routine[, targets])

Driver device variable versions including

driver_launch_configuration(routine, block_dim)

Launch configuration for kernel calls within the driver with the CUDA Fortran (CUF) specific chevron syntax <<<griddim, blockdim>>>.

dynamic_local_arrays(routine, vertical)

Declare local arrays that use the vertical dimension as dynamically allocated arrays.

increase_heap_size(routine)

Increase the heap size via a call to cudaDeviceSetLimit, as needed for the version with dynamic memory allocation on the device.

is_elemental(routine)

Check whether Subroutine routine is an elemental routine.

kernel_cuf(routine, horizontal, vertical, ...)

For CUDA Fortran (CUF) kernels and device functions: thread mapping, array dimension transformation, transforming (call) arguments, ...

kernel_demote_private_locals(routine, ...)

Demotes all local variables.

remove_pragmas(routine)

Remove all pragmas.

Classes

HoistTemporaryArraysDeviceAllocatableTransformation([...])

Synthesis part for variable/array hoisting for CUDA Fortran (CUF) (transformation).

HoistTemporaryArraysPragmaOffloadTransformation([...])

Synthesis part for variable/array hoisting, with offload via pragmas, e.g. OpenACC.

SccCufTransformation(horizontal, vertical, ...)

Single Column Coalesced CUDA Fortran - SCC-CUF: Direct CPU-to-GPU transformation for block-indexed gridpoint routines.

class SccCufTransformation(horizontal, vertical, block_dim, transformation_type='parametrise', derived_types=None)

Bases: Transformation

Single Column Coalesced CUDA Fortran - SCC-CUF: Direct CPU-to-GPU transformation for block-indexed gridpoint routines.

This transformation removes individual CPU-style vectorization loops from “kernel” routines and distributes the work across GPU threads according to the CUDA programming model, using CUDA Fortran (CUF) syntax.

Note

This requires preprocessing with the DerivedTypeArgumentsTransformation.

Note

Depending on the transformation type transformation_type, further transformations are necessary:

  • transformation_type = 'parametrise' requires a subsequent ParametriseTransformation transformation with the necessary information to parametrise (at least) the vertical size

  • transformation_type = 'hoist' requires subsequent HoistVariablesAnalysis and HoistVariablesTransformation transformations (e.g. HoistTemporaryArraysAnalysis for analysis and HoistTemporaryArraysDeviceAllocatableTransformation for synthesis)

  • transformation_type = 'dynamic' does not require a subsequent transformation
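The dependencies listed in this note can be summarised as a small lookup table. The following is only an illustrative sketch in plain Python: the names `FOLLOW_UPS` and `required_follow_ups` are hypothetical and not part of Loki, and the table merely restates the transformation class names given above.

```python
# Hypothetical lookup table (not part of Loki): for each transformation_type,
# the names of the transformations the note says must be applied afterwards.
FOLLOW_UPS = {
    "parametrise": ("ParametriseTransformation",),
    "hoist": ("HoistVariablesAnalysis", "HoistVariablesTransformation"),
    "dynamic": (),
}


def required_follow_ups(transformation_type):
    """Return the names of the follow-up transformations for the given type."""
    try:
        return FOLLOW_UPS[transformation_type]
    except KeyError:
        valid = ", ".join(sorted(FOLLOW_UPS))
        raise ValueError(
            f"Unknown transformation_type {transformation_type!r}; expected one of: {valid}"
        ) from None
```

A pipeline script could use such a check to fail early when an unsupported transformation_type is configured.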

Parameters:
  • horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.

  • vertical (Dimension) – Dimension object describing the variable conventions used in code to define the vertical dimension, as needed to decide array privatization.

  • block_dim (Dimension) – Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.

  • transformation_type (str) –

    Kind of SCC-CUF transformation. Automatic arrays are currently not supported, so they need to be transformed by one of:

    • parametrise: parametrising the array dimensions to make the vertical dimension a compile-time constant

    • hoist: host side hoisting of (relevant) arrays

    • dynamic: dynamic memory allocation on the device (not recommended for performance reasons)

transform_subroutine(routine, **kwargs)

Defines the transformation to apply to Subroutine items.

For transformations that modify Subroutine objects, this method should be implemented. It gets called via the dispatch method apply().

Parameters:
  • routine (Subroutine) – The subroutine to be transformed.

  • **kwargs (optional) – Keyword arguments for the transformation.

process_routine_kernel(routine, depth=1, targets=None)

Kernel/Device subroutine specific changes/transformations.

Parameters:
  • routine (Subroutine) – The subroutine (kernel/device subroutine) to process

  • depth (int) – The subroutine's depth
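To illustrate how depth could select the prefix added by device_subroutine_prefix, here is a minimal plain-Python sketch. It assumes that depth 1 denotes a kernel called directly from the driver and that any greater depth denotes a device subroutine; this mapping and the helper names `prefix_for_depth` and `prefixed_header` are assumptions for illustration, not Loki's API.

```python
def prefix_for_depth(depth):
    """Pick the CUDA Fortran prefix for a subroutine at the given call depth.

    Assumption: depth 1 is a kernel launched from the driver (GLOBAL),
    anything deeper is called from device code (DEVICE).
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    return "ATTRIBUTES(GLOBAL)" if depth == 1 else "ATTRIBUTES(DEVICE)"


def prefixed_header(name, args, depth):
    """Render a Fortran subroutine header with the selected prefix."""
    return f"{prefix_for_depth(depth)} SUBROUTINE {name}({', '.join(args)})"
```

For example, `prefixed_header("kernel", ["a", "b"], 2)` renders a header starting with ATTRIBUTES(DEVICE).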

process_routine_driver(routine, targets=None)

Driver subroutine specific changes/transformations.

Parameters:
  • routine (Subroutine) – The subroutine (driver) to process
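Part of the driver-side processing is the launch configuration described for driver_launch_configuration, which uses the CUF chevron syntax <<<griddim, blockdim>>>. The sketch below only renders such a call as a string; the helper `chevron_call` and the example kernel and argument names are hypothetical and do not reflect how Loki builds the call internally.

```python
def chevron_call(name, args, griddim="griddim", blockdim="blockdim"):
    """Render a CUDA Fortran kernel call with a chevron launch configuration.

    griddim and blockdim are the names of the variables holding the grid
    and block dimensions (hypothetical defaults).
    """
    return f"CALL {name}<<<{griddim}, {blockdim}>>>({', '.join(args)})"


# Example with made-up kernel and argument names.
call_line = chevron_call("kernel", ["nlon", "nz", "q"])
```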

class HoistTemporaryArraysDeviceAllocatableTransformation(as_kwarguments=False)

Bases: HoistVariablesTransformation

Synthesis part for variable/array hoisting for CUDA Fortran (CUF) (transformation).

driver_variable_declaration(routine, variables)

CUDA Fortran (CUF) Variable/Array device declaration including allocation and de-allocation.

Parameters:
  • routine (Subroutine) – The subroutine to add the variable declaration

  • variables (tuple of Variable) – The variables to be declared
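As a rough picture of what a device declaration "including allocation and de-allocation" amounts to in CUDA Fortran, the following plain-Python sketch renders the three Fortran statements for one hoisted array. The helper `device_allocatable_lines`, the array name, and the dimension names are hypothetical; the exact statements Loki emits may differ.

```python
def device_allocatable_lines(name, dtype, shape):
    """Render declaration, allocation and de-allocation of a device array.

    shape is a list of dimension expressions as strings (hypothetical names).
    """
    deferred = ", ".join(":" for _ in shape)  # deferred-shape spec, e.g. ":, :"
    dims = ", ".join(shape)
    return {
        "declaration": f"{dtype}, ALLOCATABLE, DEVICE :: {name}({deferred})",
        "allocation": f"ALLOCATE({name}({dims}))",
        "deallocation": f"DEALLOCATE({name})",
    }


lines = device_allocatable_lines("tmp", "REAL", ["nlon", "nz", "nb"])
```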

class HoistTemporaryArraysPragmaOffloadTransformation(as_kwarguments=False)

Bases: HoistVariablesTransformation

Synthesis part for variable/array hoisting, with offload via pragmas, e.g. OpenACC.

driver_variable_declaration(routine, variables)

Standard Variable/Array declaration including device offload via pragmas.

Parameters:
  • routine (Subroutine) – The subroutine to add the variable declaration

  • variables (tuple of Variable) – The variables to be declared
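For the pragma-based variant, a standard declaration is combined with offload directives. The sketch below assumes OpenACC unstructured data directives (`enter data create` and `exit data delete`) as one plausible form of such pragmas; the helper `pragma_offload_lines` and all variable names are hypothetical, and Loki's actual output may differ.

```python
def pragma_offload_lines(name, dtype, shape):
    """Render a standard declaration plus OpenACC data directives.

    Assumption: device memory is created on entry and deleted on exit
    with unstructured OpenACC data directives.
    """
    dims = ", ".join(shape)
    return [
        f"{dtype} :: {name}({dims})",
        f"!$acc enter data create({name})",
        f"!$acc exit data delete({name})",
    ]


acc_lines = pragma_offload_lines("tmp", "REAL", ["nlon", "nz", "nb"])
```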