loki.transformations.single_column.scc_cuf
Single-Column-Coalesced CUDA Fortran (SCC-CUF) transformation.
Functions
- Add prefix/specifier ATTRIBUTES(GLOBAL) for kernel subroutines and ATTRIBUTES(DEVICE) for device subroutines.
- Remove all pragmas.
Classes
- HoistTemporaryArraysDeviceAllocatableTransformation: Synthesis part for variable/array hoisting for CUDA Fortran (CUF) (transformation).
- HoistTemporaryArraysPragmaOffloadTransformation: Synthesis part for variable/array hoisting, offload via pragmas e.g., OpenACC.
- SccLowLevelDataOffload: Part of the pipeline for generating Single Column Coalesced Low Level GPU (CUDA Fortran, CUDA C, HIP, ...) for block-indexed gridpoint/single-column routines (responsible for the data offload).
- SccLowLevelLaunchConfiguration: Part of the pipeline for generating Single Column Coalesced Low Level GPU (CUDA Fortran, CUDA C, HIP, ...) for block-indexed gridpoint/single-column routines (responsible for the launch configuration including the chevron notation).
- class HoistTemporaryArraysDeviceAllocatableTransformation(as_kwarguments=False, remap_dimensions=True)
Bases:
HoistVariablesTransformation
Synthesis part for variable/array hoisting for CUDA Fortran (CUF) (transformation).
- driver_variable_declaration(routine, variables)
CUDA Fortran (CUF) Variable/Array device declaration including allocation and de-allocation.
- Parameters:
routine (Subroutine) – The subroutine to add the variable declaration to
variables (Variable) – The variables to be declared
- class HoistTemporaryArraysPragmaOffloadTransformation(as_kwarguments=False, remap_dimensions=True)
Bases:
HoistVariablesTransformation
Synthesis part for variable/array hoisting, offload via pragmas e.g., OpenACC.
- driver_variable_declaration(routine, variables)
Standard Variable/Array declaration including device offload via pragmas.
- Parameters:
routine (Subroutine) – The subroutine to add the variable declaration to
variables (Variable) – The variables to be declared
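As an illustration of "device offload via pragmas", the sketch below builds the kind of OpenACC directives that could bracket a hoisted temporary array in the driver. The helper name `acc_offload_pragmas` is hypothetical, and the choice of `enter data create` / `exit data delete` is an assumption; this page does not state which clauses the transformation actually emits.

```python
def acc_offload_pragmas(names):
    """Return (allocate, release) OpenACC directives for the given arrays.

    Hypothetical helper: the directive choice is an assumption, not taken
    from the Loki implementation.
    """
    if not names:
        raise ValueError("at least one variable name is required")
    varlist = ", ".join(names)
    # Unstructured data region: create on device, later delete from device
    enter = f"!$acc enter data create({varlist})"
    leave = f"!$acc exit data delete({varlist})"
    return enter, leave


enter, leave = acc_offload_pragmas(["tmp1", "tmp2"])
print(enter)
print(leave)
```

The two directives would sit after the host declaration and before the end of the driver, respectively, so that the temporaries live on the device for the duration of the kernel calls.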
- class SccLowLevelLaunchConfiguration(horizontal, vertical, block_dim, transformation_type='parametrise', mode='CUF')
Bases:
Transformation
Part of the pipeline for generating Single Column Coalesced Low Level GPU (CUDA Fortran, CUDA C, HIP, …) for block-indexed gridpoint/single-column routines (responsible for the launch configuration including the chevron notation).
- transform_subroutine(routine, **kwargs)
Defines the transformation to apply to Subroutine items.
For transformations that modify Subroutine objects, this method should be implemented. It gets called via the dispatch method apply().
- Parameters:
routine (Subroutine) – The subroutine to be transformed.
**kwargs (optional) – Keyword arguments for the transformation.
- process_kernel(routine, depth=1, targets=None)
Kernel/Device subroutine specific changes/transformations.
- Parameters:
routine (Subroutine) – The subroutine (kernel/device subroutine) to process
depth (int) – The subroutine's depth
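The module summary states that kernel subroutines receive ATTRIBUTES(GLOBAL) and device subroutines receive ATTRIBUTES(DEVICE). A minimal sketch of how the `depth` argument could select between the two is shown here. It assumes, without confirmation from this page, that depth 1 denotes a kernel launched directly from the driver and any greater depth denotes a routine called from device code; `cuf_attribute_prefix` is a hypothetical name.

```python
def cuf_attribute_prefix(depth):
    """Pick the CUDA Fortran prefix for a subroutine at a given call depth.

    Assumption: depth == 1 is a kernel (launched from the driver),
    depth > 1 is a device subroutine (called from a kernel).
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    return "ATTRIBUTES(GLOBAL)" if depth == 1 else "ATTRIBUTES(DEVICE)"


def prefixed_signature(name, depth):
    """Build the opening line of the subroutine with its CUF prefix."""
    return f"{cuf_attribute_prefix(depth)} SUBROUTINE {name}"


print(prefixed_signature("kernel", 1))
print(prefixed_signature("device_routine", 2))
```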
- process_driver(routine, targets=None)
Driver subroutine specific changes/transformations.
- Parameters:
routine (Subroutine) – The subroutine (driver) to process
- kernel_cuf(routine, horizontal, vertical, block_dim, depth, targets=None)
- static kernel_demote_private_locals(routine, horizontal, vertical)
Demotes all local variables. Array variables whose dimensions include only the vector dimension or known (short) constant dimensions (e.g. local vector or matrix arrays) can be privatized without requiring shared GPU memory. Array variables with dimensions unknown at compile time (e.g. the vertical dimension) cannot be privatized at the vector loop level and should therefore not be demoted here.
- Parameters:
routine (Subroutine) – The subroutine to demote the private locals
horizontal (Dimension) – The dimension object specifying the horizontal vector dimension
vertical (Dimension) – The dimension object specifying the vertical loop dimension
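The demotion rule described above can be sketched as a small predicate over array shapes. This is an illustration only: shapes are modelled as tuples of strings (symbolic sizes) and integers (compile-time constants), and the names `can_demote`, `demoted_shape`, `nlon` and `nz` are hypothetical rather than part of Loki.

```python
def can_demote(shape, horizontal_size):
    """True if every dimension is the horizontal size or an integer constant.

    Symbolic sizes other than the horizontal one (e.g. the vertical size)
    are treated as unknown at compile time and block the demotion.
    """
    return all(dim == horizontal_size or isinstance(dim, int) for dim in shape)


def demoted_shape(shape, horizontal_size):
    """Drop the horizontal dimension; leave the shape as-is if not demotable."""
    if not can_demote(shape, horizontal_size):
        return tuple(shape)
    return tuple(dim for dim in shape if dim != horizontal_size)


print(demoted_shape(("nlon",), "nlon"))       # becomes a scalar
print(demoted_shape(("nlon", 3), "nlon"))     # keeps the short constant dimension
print(demoted_shape(("nlon", "nz"), "nlon"))  # vertical size unknown: untouched
```

An empty resulting shape corresponds to a scalar that each thread can hold privately.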
- driver_launch_configuration(routine, block_dim, targets=None)
Launch configuration for kernel calls within the driver with the CUDA Fortran (CUF) specific chevron syntax <<<griddim, blockdim>>>.
- Parameters:
routine (Subroutine) – The subroutine to specify the launch configurations for kernel calls.
block_dim (Dimension) – The dimension object specifying the block loop dimension
targets (tuple of str) – Tuple of subroutine call names that are processed in this traversal
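To make the chevron syntax concrete, the following sketch assembles the CUDA Fortran statements for one kernel launch as plain strings. It is not Loki's implementation: `launch_configuration` is a hypothetical helper, and mapping the grid to the number of blocks and the thread block to the horizontal size is an assumption. `dim3` is the standard CUDA Fortran derived type for launch dimensions.

```python
def launch_configuration(kernel, args, num_blocks, block_size):
    """Return CUF source lines that launch `kernel` with chevron syntax.

    Assumption: one grid entry per block of the block dimension and one
    thread per horizontal point; other mappings are equally possible.
    """
    return [
        "type(dim3) :: griddim, blockdim",
        f"griddim = dim3({num_blocks}, 1, 1)",
        f"blockdim = dim3({block_size}, 1, 1)",
        f"call {kernel}<<<griddim, blockdim>>>({', '.join(args)})",
    ]


for line in launch_configuration("kernel", ["a_d", "b_d"], "ngpblks", "nproma"):
    print(line)
```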
- class SccLowLevelDataOffload(horizontal, vertical, block_dim, transformation_type='parametrise', derived_types=None, mode='CUF')
Bases:
Transformation
Part of the pipeline for generating Single Column Coalesced Low Level GPU (CUDA Fortran, CUDA C, HIP, …) for block-indexed gridpoint/single-column routines (responsible for the data offload).
- transform_subroutine(routine, **kwargs)
Defines the transformation to apply to Subroutine items.
For transformations that modify Subroutine objects, this method should be implemented. It gets called via the dispatch method apply().
- Parameters:
routine (Subroutine) – The subroutine to be transformed.
**kwargs (optional) – Keyword arguments for the transformation.
- process_driver(routine, targets=None)
Driver subroutine specific changes/transformations.
- Parameters:
routine (Subroutine) – The subroutine (driver) to process
- process_kernel(routine)
Kernel/Device subroutine specific changes/transformations.
- Parameters:
routine (Subroutine) – The subroutine (kernel/device subroutine) to process
- kernel_cuf(routine, horizontal, block_dim, transformation_type, derived_type_variables)
- device_derived_types(routine, derived_types, targets=None)
Create device versions of variables of specific derived types, including host-device synchronisation.
- Parameters:
routine (Subroutine) – The subroutine to create device versions of the specified derived type variables.
derived_types (tuple) – Tuple of derived types within the routine
targets (tuple of str) – Tuple of subroutine call names that are processed in this traversal
- driver_device_variables(routine, targets=None)
Driver device variable versions, including:
* variable declaration
* allocation
* host-device synchronisation
* de-allocation
- Parameters:
routine (Subroutine) – The subroutine (driver) to handle the device variables
targets (tuple of str) – Tuple of subroutine call names that are processed in this traversal
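The four steps listed for driver device variables can be illustrated by generating the corresponding CUDA Fortran statements for a single array. This is a sketch under stated assumptions: the helper `device_variable_lifecycle` and the `_d` suffix for the device copy are hypothetical, and the statements Loki actually emits may differ. In CUDA Fortran, an assignment between a host array and a device array performs the data transfer.

```python
def device_variable_lifecycle(name, dtype, shape):
    """Return the CUF statements covering a device copy of a host array.

    `shape` is a tuple of size expressions given as strings. The `_d`
    suffix is an illustrative naming convention, not confirmed by the docs.
    """
    dev = f"{name}_d"
    deferred = ", ".join(":" for _ in shape)  # deferred-shape spec, e.g. ":, :"
    sizes = ", ".join(shape)
    return {
        "declaration": f"{dtype}, allocatable, device :: {dev}({deferred})",
        "allocation": f"allocate({dev}({sizes}))",
        "host_to_device": f"{dev} = {name}",   # implicit copy host -> device
        "device_to_host": f"{name} = {dev}",   # implicit copy device -> host
        "deallocation": f"deallocate({dev})",
    }


for step, statement in device_variable_lifecycle("t", "real", ("nlon", "nz")).items():
    print(f"{step}: {statement}")
```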