loki.transformations.single_column.scc_cuf

Single-Column-Coalesced CUDA Fortran (SCC-CUF) transformation.

Functions

device_subroutine_prefix(routine, depth)

Add prefix/specifier ATTRIBUTES(GLOBAL) for kernel subroutines and ATTRIBUTES(DEVICE) for device subroutines.

remove_non_loki_pragmas(routine)

Remove all pragmas that are not Loki pragmas.
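As a rough illustration of what these two helpers do, the following is a plain-Python sketch that works on strings rather than on Loki's IR. It is not the Loki implementation: `prefix_for_depth` and `keep_loki_pragmas` are hypothetical names, the rule that depth 1 denotes a kernel is an assumption, and the pragma filter assumes (going by the function name) that only directives with the `loki` keyword are kept.

```python
import re

def prefix_for_depth(depth):
    """Return the CUDA Fortran prefix for a routine at the given call depth.

    Assumption: depth 1 marks a kernel launched from the driver; anything
    deeper is a device routine called from within a kernel.
    """
    return "ATTRIBUTES(GLOBAL)" if depth == 1 else "ATTRIBUTES(DEVICE)"

def keep_loki_pragmas(lines):
    """Drop directive lines (``!$...``) unless their keyword is ``loki``."""
    directive = re.compile(r"^\s*!\$(\w+)", re.IGNORECASE)
    kept = []
    for line in lines:
        match = directive.match(line)
        if match and match.group(1).lower() != "loki":
            continue  # a non-Loki pragma, e.g. !$acc or !$omp
        kept.append(line)
    return kept

# Hypothetical source lines mixing Loki and non-Loki directives.
source = [
    "!$acc parallel loop",
    "!$loki routine seq",
    "x = 1",
    "!$omp parallel",
]
kernel_prefix = prefix_for_depth(1)
device_prefix = prefix_for_depth(2)
filtered = keep_loki_pragmas(source)
```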

Classes

HoistTemporaryArraysDeviceAllocatableTransformation([...])

Synthesis part of the variable/array hoisting transformation for CUDA Fortran (CUF).

HoistTemporaryArraysPragmaOffloadTransformation([...])

Synthesis part of the variable/array hoisting transformation, with offload via pragmas (e.g. OpenACC).

SccLowLevelDataOffload(horizontal, vertical, ...)

Part of the pipeline for generating Single Column Coalesced Low Level GPU (CUDA Fortran, CUDA C, HIP, ...) for block-indexed gridpoint/single-column routines (responsible for the data offload).

SccLowLevelLaunchConfiguration(horizontal, ...)

Part of the pipeline for generating Single Column Coalesced Low Level GPU (CUDA Fortran, CUDA C, HIP, ...) for block-indexed gridpoint/single-column routines (responsible for the launch configuration including the chevron notation).

class HoistTemporaryArraysDeviceAllocatableTransformation(as_kwarguments=False, remap_dimensions=True)

Bases: HoistVariablesTransformation

Synthesis part of the variable/array hoisting transformation for CUDA Fortran (CUF).

driver_variable_declaration(routine, variables)

CUDA Fortran (CUF) Variable/Array device declaration including allocation and de-allocation.

Parameters:
  • routine (Subroutine) – The subroutine to which the variable declarations are added.

  • variables (tuple of Variable) – The variables to be declared.

class HoistTemporaryArraysPragmaOffloadTransformation(as_kwarguments=False, remap_dimensions=True)

Bases: HoistVariablesTransformation

Synthesis part of the variable/array hoisting transformation, with offload via pragmas (e.g. OpenACC).

driver_variable_declaration(routine, variables)

Standard Variable/Array declaration including device offload via pragmas.

Parameters:
  • routine (Subroutine) – The subroutine to which the variable declarations are added.

  • variables (tuple of Variable) – The variables to be declared.
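To make "declaration including device offload via pragmas" more concrete, here is a small sketch of the kind of Fortran lines such a declaration could result in for one hoisted array. It only builds strings and is not taken from Loki: the helper name `pragma_offload_lines`, the `REAL` type, and the use of OpenACC `enter data create` / `exit data delete` are assumptions made for illustration.

```python
def pragma_offload_lines(name, shape, dtype="REAL"):
    """Build illustrative driver-side lines for one hoisted array.

    Assumption: the array is declared on the host as usual and its device
    lifetime is bracketed by OpenACC unstructured data directives.
    """
    dims = ", ".join(shape)
    return [
        f"{dtype} :: {name}({dims})",
        f"!$acc enter data create({name})",
        "! ... kernel calls using the hoisted array ...",
        f"!$acc exit data delete({name})",
    ]

# Hypothetical hoisted temporary with a horizontal and a vertical extent.
lines = pragma_offload_lines("ztmp", ("nlon", "nlev"))
```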

class SccLowLevelLaunchConfiguration(horizontal, vertical, block_dim, transformation_type='parametrise', mode='CUF')

Bases: Transformation

Part of the pipeline for generating Single Column Coalesced Low Level GPU (CUDA Fortran, CUDA C, HIP, …) for block-indexed gridpoint/single-column routines (responsible for the launch configuration including the chevron notation).

transform_subroutine(routine, **kwargs)

Defines the transformation to apply to Subroutine items.

For transformations that modify Subroutine objects, this method should be implemented. It gets called via the dispatch method apply().

Parameters:
  • routine (Subroutine) – The subroutine to be transformed.

  • **kwargs (optional) – Keyword arguments for the transformation.

process_kernel(routine, depth=1, targets=None)

Kernel/device-subroutine-specific changes and transformations.

Parameters:
  • routine (Subroutine) – The subroutine (kernel/device subroutine) to process.

  • depth (int) – The subroutine's depth.

process_driver(routine, targets=None)

Driver-subroutine-specific changes and transformations.

Parameters:
  • routine (Subroutine) – The subroutine (driver) to process.

kernel_cuf(routine, horizontal, vertical, block_dim, depth, targets=None)

static kernel_demote_private_locals(routine, horizontal, vertical)

Demotes all local variables. Array variables whose dimensions include only the vector dimension or known (short) constant dimensions (e.g. local vector or matrix arrays) can be privatized without requiring shared GPU memory. Array variables with dimensions unknown at compile time (e.g. the vertical dimension) cannot be privatized at the vector loop level and should therefore not be demoted here.

Parameters:
  • routine (Subroutine) – The subroutine in which to demote the private locals.

  • horizontal (Dimension) – The dimension object specifying the horizontal vector dimension.

  • vertical (Dimension) – The dimension object specifying the vertical loop dimension.
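The demotion rule described above can be sketched in plain Python. This is a toy model, not Loki code: shapes are tuples of dimension names or integer constants, and `can_demote`, `demoted_shape`, and the array names are hypothetical. It also assumes that demotion means dropping the horizontal (vector) dimension from the array shape.

```python
def can_demote(shape, horizontal_size):
    """True if every dimension is the horizontal size or an integer constant."""
    return all(dim == horizontal_size or isinstance(dim, int) for dim in shape)

def demoted_shape(shape, horizontal_size):
    """Shape after dropping the horizontal dimension (assumed demotion result)."""
    return tuple(dim for dim in shape if dim != horizontal_size)

# Hypothetical local arrays; "nproma" plays the role of the horizontal size
# and "nlev" that of the vertical size, which is unknown at compile time.
local_arrays = {
    "zvec": ("nproma",),
    "zmat": ("nproma", 3),
    "ztmp": ("nproma", "nlev"),
}

demotable = [
    name for name, shape in local_arrays.items() if can_demote(shape, "nproma")
]
new_shapes = {name: demoted_shape(local_arrays[name], "nproma") for name in demotable}
```

In this model `ztmp` keeps its full shape because its second extent is only known at run time, which matches the restriction stated for the vertical dimension.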

driver_launch_configuration(routine, block_dim, targets=None)

Launch configuration for kernel calls within the driver, using the CUDA Fortran (CUF) specific chevron syntax <<<griddim, blockdim>>>.

Parameters:
  • routine (Subroutine) – The subroutine in which to specify the launch configurations for kernel calls.

  • block_dim (Dimension) – The dimension object specifying the block loop dimension.

  • targets (tuple of str) – Tuple of subroutine call names that are processed in this traversal.
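The effect of the chevron syntax on a driver's call statements can be pictured with a string-level sketch. It is only an illustration under stated assumptions: `add_launch_configuration` is a hypothetical helper operating on source lines (Loki works on its IR instead), and the variable names `griddim` and `blockdim` are taken from the syntax shown above without defining how they are computed.

```python
import re

# Matches simple one-line statements of the form "call name(args)".
CALL = re.compile(r"^(\s*)call\s+(\w+)\s*\((.*)\)\s*$", re.IGNORECASE)

def add_launch_configuration(lines, targets, griddim="griddim", blockdim="blockdim"):
    """Insert <<<griddim, blockdim>>> into calls whose name is in ``targets``."""
    wanted = {target.lower() for target in targets}
    rewritten = []
    for line in lines:
        match = CALL.match(line)
        if match and match.group(2).lower() in wanted:
            indent, name, args = match.groups()
            line = f"{indent}call {name}<<<{griddim}, {blockdim}>>>({args})"
        rewritten.append(line)
    return rewritten

# Hypothetical driver body: only "kernel" is listed as a target.
driver = ["call kernel(a, b)", "call helper(c)"]
launched = add_launch_configuration(driver, targets=("kernel",))
```

Calls that are not listed in `targets` are left untouched, mirroring the role of the `targets` parameter.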

class SccLowLevelDataOffload(horizontal, vertical, block_dim, transformation_type='parametrise', derived_types=None, mode='CUF')

Bases: Transformation

Part of the pipeline for generating Single Column Coalesced Low Level GPU (CUDA Fortran, CUDA C, HIP, …) for block-indexed gridpoint/single-column routines (responsible for the data offload).

transform_subroutine(routine, **kwargs)

Defines the transformation to apply to Subroutine items.

For transformations that modify Subroutine objects, this method should be implemented. It gets called via the dispatch method apply().

Parameters:
  • routine (Subroutine) – The subroutine to be transformed.

  • **kwargs (optional) – Keyword arguments for the transformation.

process_driver(routine, targets=None)

Driver-subroutine-specific changes and transformations.

Parameters:
  • routine (Subroutine) – The subroutine (driver) to process.

process_kernel(routine)

Kernel/device-subroutine-specific changes and transformations.

Parameters:
  • routine (Subroutine) – The subroutine (kernel/device subroutine) to process.

kernel_cuf(routine, horizontal, block_dim, transformation_type, derived_type_variables)

device_derived_types(routine, derived_types, targets=None)

Create device versions of variables of specific derived types, including host-device synchronisation.

Parameters:
  • routine (Subroutine) – The subroutine in which to create device versions of the specified derived type variables.

  • derived_types (tuple) – Tuple of derived types within the routine.

  • targets (tuple of str) – Tuple of subroutine call names that are processed in this traversal.

driver_device_variables(routine, targets=None)

Driver device variable versions, including:

  • variable declaration
  • allocation
  • host-device synchronisation
  • de-allocation

Parameters:
  • routine (Subroutine) – The subroutine (driver) to handle the device variables.

  • targets (tuple of str) – Tuple of subroutine call names that are processed in this traversal.
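The four steps listed for a device variable can be sketched as the CUDA Fortran statements they might correspond to. This is a hedged illustration, not output from Loki: the helper `device_variable_steps`, the `_d` suffix for the device copy, and the `REAL` type are assumptions, and synchronisation is shown as plain assignment between host and device arrays, which CUDA Fortran permits.

```python
def device_variable_steps(name, dims, dtype="REAL"):
    """Build illustrative CUDA Fortran statements for one device variable."""
    device_name = f"{name}_d"  # assumed naming scheme for the device copy
    deferred = ", ".join(":" for _ in dims)
    extents = ", ".join(dims)
    return {
        "declaration": f"{dtype}, DEVICE, ALLOCATABLE :: {device_name}({deferred})",
        "allocation": f"ALLOCATE({device_name}({extents}))",
        "host_to_device": f"{device_name} = {name}",
        "device_to_host": f"{name} = {device_name}",
        "deallocation": f"DEALLOCATE({device_name})",
    }

# Hypothetical host array "t" with extents "nlon" and "nlev".
steps = device_variable_steps("t", ("nlon", "nlev"))
```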