loki.transformations.single_column.scc_cuf
Single-Column-Coalesced CUDA Fortran (SCC-CUF) transformation.
Functions
|
Create device versions of variables of specific derived types including host-device-synchronisation. |
|
Add prefix/specifier ATTRIBUTES(GLOBAL) for kernel subroutines and ATTRIBUTES(DEVICE) for device subroutines. |
|
Driver device variable versions including |
|
Launch configuration for kernel calls within the driver with the CUDA Fortran (CUF) specific chevron syntax <<<griddim, blockdim>>>. |
|
Declaring local arrays with the |
|
Increase the heap size via call to cudaDeviceSetLimit needed for version with dynamic memory allocation on the device. |
|
Check whether |
|
For CUDA Fortran (CUF) kernels and device functions: thread mapping, array dimension transformation, transforming (call) arguments, ... |
|
Demotes all local variables. |
|
Remove all pragmas. |
Classes
HoistTemporaryArraysDeviceAllocatableTransformation | Synthesis part for variable/array hoisting for CUDA Fortran (CUF) (transformation). |
HoistTemporaryArraysPragmaOffloadTransformation | Synthesis part for variable/array hoisting, with offload via pragmas, e.g. OpenACC. |
SccCufTransformation | Single Column Coalesced CUDA Fortran - SCC-CUF: Direct CPU-to-GPU transformation for block-indexed gridpoint routines. |
- class SccCufTransformation(horizontal, vertical, block_dim, transformation_type='parametrise', derived_types=None)
Bases: Transformation
Single Column Coalesced CUDA Fortran - SCC-CUF: Direct CPU-to-GPU transformation for block-indexed gridpoint routines.
This transformation removes individual CPU-style vectorization loops from “kernel” routines and distributes the work across GPU threads according to the CUDA programming model, using CUDA Fortran (CUF) syntax.
Note
This requires preprocessing with the DerivedTypeArgumentsTransformation.
Note
Depending on the transformation type transformation_type, further transformations are necessary:
- transformation_type = 'parametrise' requires a subsequent ParametriseTransformation transformation with the necessary information to parametrise (at least) the vertical size
- transformation_type = 'hoist' requires subsequent HoistVariablesAnalysis and HoistVariablesTransformation transformations (e.g. HoistTemporaryArraysAnalysis for analysis and HoistTemporaryArraysDeviceAllocatableTransformation for synthesis)
- transformation_type = 'dynamic' does not require a subsequent transformation
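The follow-up requirements listed in the note can be summarised as a small lookup table. This is a plain-Python sketch for orientation only: `FOLLOW_UP_TRANSFORMATIONS` and `required_follow_ups` are hypothetical names that are not part of Loki, and the strings are simply the class names mentioned in the note.

```python
# Sketch only: a plain-Python lookup table, not part of Loki's API.
# Keys are the documented transformation_type values; values are the names
# of the transformations that the note says must run after SCC-CUF.
FOLLOW_UP_TRANSFORMATIONS = {
    'parametrise': ('ParametriseTransformation',),
    'hoist': ('HoistVariablesAnalysis', 'HoistVariablesTransformation'),
    'dynamic': (),  # no subsequent transformation required
}


def required_follow_ups(transformation_type):
    """Return the names of the transformations to run after SCC-CUF.

    Raises KeyError for a transformation_type that is not documented.
    """
    return FOLLOW_UP_TRANSFORMATIONS[transformation_type]
```

For `'hoist'`, the two entries stand for the analysis and synthesis steps; the note names HoistTemporaryArraysAnalysis as one example of a concrete analysis class.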
- Parameters:
horizontal (Dimension) – Dimension object describing the variable conventions used in code to define the horizontal data dimension and iteration space.
vertical (Dimension) – Dimension object describing the variable conventions used in code to define the vertical dimension, as needed to decide array privatization.
block_dim (Dimension) – Dimension object to define the blocking dimension to use for hoisted column arrays if hoisting is enabled.
transformation_type (str) – Kind of SCC-CUF transformation. Automatic arrays are currently not supported, so they need to be transformed by one of:
- parametrise: parametrising the array dimensions to make the vertical dimension a compile-time constant
- hoist: host-side hoisting of (relevant) arrays
- dynamic: dynamic memory allocation on the device (not recommended for performance reasons)
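As a rough illustration of how these parameters fit together, the sketch below collects them into a keyword dictionary and checks the transformation type. It deliberately avoids importing Loki so that it runs standalone: `DimensionSketch` is a hypothetical stand-in for the Dimension object, `scc_cuf_arguments` is a hypothetical helper, and the Fortran variable names (`nlon`, `jl`, `nz`, `jk`, `ngpblks`, `ibl`) are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionSketch:
    """Hypothetical stand-in for a Dimension object (NOT Loki's class)."""
    name: str   # label of the dimension
    size: str   # variable holding the dimension's extent
    index: str  # loop index variable used to iterate over it


# The three documented values of transformation_type.
VALID_TRANSFORMATION_TYPES = ('parametrise', 'hoist', 'dynamic')


def scc_cuf_arguments(horizontal, vertical, block_dim,
                      transformation_type='parametrise'):
    """Collect the documented constructor arguments after a sanity check."""
    if transformation_type not in VALID_TRANSFORMATION_TYPES:
        raise ValueError(
            f'transformation_type must be one of {VALID_TRANSFORMATION_TYPES}, '
            f'got {transformation_type!r}'
        )
    return {
        'horizontal': horizontal,
        'vertical': vertical,
        'block_dim': block_dim,
        'transformation_type': transformation_type,
    }


# Illustrative variable names only; real code bases use their own conventions.
kwargs = scc_cuf_arguments(
    horizontal=DimensionSketch(name='horizontal', size='nlon', index='jl'),
    vertical=DimensionSketch(name='vertical', size='nz', index='jk'),
    block_dim=DimensionSketch(name='block_dim', size='ngpblks', index='ibl'),
    transformation_type='hoist',
)
```

With Loki installed, the same keywords would presumably be passed to SccCufTransformation with real Dimension objects in place of the stand-ins.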
- transform_subroutine(routine, **kwargs)
Defines the transformation to apply to Subroutine items.
For transformations that modify Subroutine objects, this method should be implemented. It gets called via the dispatch method apply().
- Parameters:
routine (Subroutine) – The subroutine to be transformed.
**kwargs (optional) – Keyword arguments for the transformation.
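The dispatch pattern described here can be sketched in a few lines of plain Python. Everything below is a stand-in rather than Loki code: `SubroutineSketch`, `TransformationSketch` and `RoleRoutingSketch` are hypothetical classes, and the use of a `role` keyword to choose between the driver and kernel paths is an assumption made for this example.

```python
class SubroutineSketch:
    """Hypothetical stand-in for a Subroutine; it only carries a name."""

    def __init__(self, name):
        self.name = name


class TransformationSketch:
    """Hypothetical base class showing apply() -> transform_subroutine()."""

    def transform_subroutine(self, routine, **kwargs):
        # Default hook: does nothing. Subclasses override it.
        pass

    def apply(self, source, **kwargs):
        # Dispatch on the type of the object being transformed.
        if isinstance(source, SubroutineSketch):
            self.transform_subroutine(source, **kwargs)
        else:
            raise TypeError(f'Cannot apply to {type(source).__name__}')


class RoleRoutingSketch(TransformationSketch):
    """Routes a subroutine to a driver or kernel handler (assumed 'role' kwarg)."""

    def __init__(self):
        self.log = []  # records which handler saw which routine

    def transform_subroutine(self, routine, **kwargs):
        role = kwargs.get('role', 'kernel')
        targets = kwargs.get('targets')
        if role == 'driver':
            self.process_routine_driver(routine, targets=targets)
        else:
            self.process_routine_kernel(routine, depth=kwargs.get('depth', 1),
                                        targets=targets)

    def process_routine_driver(self, routine, targets=None):
        self.log.append(('driver', routine.name))

    def process_routine_kernel(self, routine, depth=1, targets=None):
        self.log.append(('kernel', routine.name, depth))


transformation = RoleRoutingSketch()
transformation.apply(SubroutineSketch('driver_routine'), role='driver')
transformation.apply(SubroutineSketch('kernel_routine'), role='kernel', depth=1)
```

The point of the pattern is that callers only ever invoke `apply()`, while the transformation-specific logic lives in the overridden hook.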
- process_routine_kernel(routine, depth=1, targets=None)
Kernel/Device subroutine specific changes/transformations.
- Parameters:
routine (Subroutine) – The subroutine (kernel/device subroutine) to process
depth (int) – The subroutine's depth
- process_routine_driver(routine, targets=None)
Driver subroutine specific changes/transformations.
- Parameters:
routine (Subroutine) – The subroutine (driver) to process
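One of the driver-side changes is the kernel launch with the CUDA Fortran chevron syntax `<<<griddim, blockdim>>>`. The sketch below only formats such a call as text to show its shape; `cuf_kernel_call` is a hypothetical helper, the routine and argument names are invented, and Loki itself works on its internal representation rather than on strings.

```python
def cuf_kernel_call(kernel_name, arguments, griddim='griddim', blockdim='blockdim'):
    """Format a CUDA Fortran kernel launch using the chevron syntax.

    Illustration only: this builds a string and is not how Loki
    generates the call.
    """
    args = ', '.join(arguments)
    return f'call {kernel_name}<<<{griddim}, {blockdim}>>>({args})'


# Invented routine and argument names, for illustration.
launch = cuf_kernel_call('column_kernel', ['nlon', 'nz', 'q', 't'])
print(launch)  # call column_kernel<<<griddim, blockdim>>>(nlon, nz, q, t)
```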
- class HoistTemporaryArraysDeviceAllocatableTransformation(as_kwarguments=False)
Bases: HoistVariablesTransformation
Synthesis part for variable/array hoisting for CUDA Fortran (CUF) (transformation).
- driver_variable_declaration(routine, variables)
CUDA Fortran (CUF) Variable/Array device declaration including allocation and de-allocation.
- Parameters:
routine (Subroutine) – The subroutine to add the variable declaration
var (Variable) – The variable to be declared
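To make "device declaration including allocation and de-allocation" concrete, the sketch below formats the three CUDA Fortran statements that such a hoisted array typically needs. It is an assumption about the general shape of the generated code, not a reproduction of Loki's output: `cuf_device_allocatable` is a hypothetical helper and the array and dimension names are invented.

```python
def cuf_device_allocatable(name, dtype, shape):
    """Return (declaration, allocation, deallocation) as CUDA Fortran text.

    Illustration only: the exact statements Loki emits may differ.
    """
    # Deferred-shape spec with one ':' per dimension, e.g. (:, :, :)
    rank_spec = ', '.join(':' for _ in shape)
    dims = ', '.join(shape)
    declaration = f'{dtype}, allocatable, device :: {name}({rank_spec})'
    allocation = f'allocate({name}({dims}))'
    deallocation = f'deallocate({name})'
    return declaration, allocation, deallocation


# Invented temporary array hoisted to the driver, with a block dimension added.
decl, alloc, dealloc = cuf_device_allocatable('tmp', 'real', ['nlon', 'nz', 'ngpblks'])
```

Here `decl` would go into the driver's specification part, `alloc` before the kernel launch, and `dealloc` after it.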
- class HoistTemporaryArraysPragmaOffloadTransformation(as_kwarguments=False)
Bases: HoistVariablesTransformation
Synthesis part for variable/array hoisting, with offload via pragmas, e.g. OpenACC.
- driver_variable_declaration(routine, variables)
Standard Variable/Array declaration including device offload via pragmas.
- Parameters:
routine (Subroutine) – The subroutine to add the variable declaration
var (Variable) – The variable to be declared