loki.transformations.data_offload.offload_deepcopy
Functions
Create nested dict from derived-type expression.
Return sanitised mapping of dummy argument names to arguments.
Map the root variable of derived-type dummy argument components to the corresponding argument.
Merge nested dicts.
Strip dimensions from array expressions of arbitrary derived-type nesting depth.
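To make the last helper concrete, here is a minimal sketch of stripping dimensions from a derived-type expression. It works on plain strings rather than on Loki expression nodes, and the function name strip_dimensions is a hypothetical stand-in, not the module's actual function.

```python
def strip_dimensions(expr: str) -> str:
    """Drop every parenthesised subscript from a derived-type expression string.

    Hypothetical helper: operates on text, not on Loki expression nodes.
    """
    kept = []
    depth = 0  # current parenthesis nesting depth
    for char in expr:
        if char == '(':
            depth += 1
        elif char == ')':
            depth -= 1
        elif depth == 0:
            # Only characters outside any subscript are kept
            kept.append(char)
    return ''.join(kept)


print(strip_dimensions('a%b(i, j)%c(:, idx(k))'))  # a%b%c
```

Tracking the parenthesis depth, instead of using a single regular expression, keeps the sketch correct when a subscript itself contains a function call or another array reference.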
Classes
|
A transformation pass to analyse the usage of subroutine arguments in a call-tree. |
A transformation that generates a deepcopy of all the arguments to a GPU kernel. |
|
|
Dummy argument intents in Fortran also have implications on memory status, and INTENT(OUT) is therefore fundamentally unsafe for allocatables and pointers. |
- class DataOffloadDeepcopyAnalysis(output_analysis=False)
Bases:
Transformation
A transformation pass to analyse the usage of subroutine arguments in a call-tree.
The resulting analysis is a nested dict, of nesting depth equal to the longest derived-type expression, containing the access mode of all the arguments used in a call-tree. For example, the following assignments:
would yield the following analysis:
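As a minimal sketch of how such a nested analysis can arise, assume the hypothetical assignments a%b%c = a%b%c + 1 and b = d. The helper names, the string-based expressions, and the rule that conflicting accesses collapse to 'readwrite' are all assumptions made for illustration, not Loki's implementation.

```python
def create_nested_dict(expr, mode):
    """Turn 'a%b%c' into {'a': {'b': {'c': mode}}} (hypothetical helper)."""
    nested = mode
    for name in reversed(expr.split('%')):
        nested = {name: nested}
    return nested


def merge_nested_dict(target, source):
    """Recursively merge source into target, in place.

    Simplifying assumption: conflicting entries collapse to 'readwrite'.
    """
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge_nested_dict(target[key], value)
        elif key in target and target[key] != value:
            target[key] = 'readwrite'
        else:
            target[key] = value
    return target


# Accesses implied by: a%b%c = a%b%c + 1 ; b = d
accesses = [('a%b%c', 'read'), ('a%b%c', 'write'), ('b', 'write'), ('d', 'read')]

analysis = {}
for expr, mode in accesses:
    merge_nested_dict(analysis, create_nested_dict(expr, mode))

print(analysis)
# {'a': {'b': {'c': 'readwrite'}}, 'b': 'write', 'd': 'read'}
```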
The analysis is stored in the Item.trafo_data of the Item corresponding to the driver layer Subroutine. It should be noted that the analysis is stored per driver-layer loop. The driver’s Item.trafo_data also contains Scheduler config entries corresponding to the derived-types used throughout the call-tree in a typedef_configs dict.
- Parameters:
output_analysis (bool) – If enabled, the analysis is written to disk as yaml files. For kernels, the files are named routine.name_dataoffload_analysis.yaml. For drivers, the files are named driver_target-name_offload_analysis.yaml, where “target-name” is the name of the first target routine in a given driver loop.
- reverse_traversal = True
Traversal from the leaves upwards
- item_filter = (<class 'loki.batch.item.ProcedureItem'>, <class 'loki.batch.item.TypeDefItem'>)
- process_ignored_items = True
- transform_subroutine(routine, **kwargs)
Defines the transformation to apply to Subroutine items.
For transformations that modify Subroutine objects, this method should be implemented. It gets called via the dispatch method apply().
- Parameters:
routine (Subroutine) – The subroutine to be transformed.
**kwargs (optional) – Keyword arguments for the transformation.
- stringify_dict(_dict)
Stringify expression keys of a nested dict.
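A minimal sketch of what stringifying the keys of a nested dict could look like, for example before writing the analysis to YAML. The Var class is a hypothetical stand-in for an expression node, and stringify_keys is an illustrative name rather than the method's actual body.

```python
class Var:
    """Hypothetical stand-in for an expression node used as a dict key."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


def stringify_keys(nested):
    """Return a copy of a nested dict with every key converted via str()."""
    return {
        str(key): stringify_keys(value) if isinstance(value, dict) else value
        for key, value in nested.items()
    }


analysis = {Var('a'): {Var('b'): 'read'}, Var('d'): 'write'}
print(stringify_keys(analysis))  # {'a': {'b': 'read'}, 'd': 'write'}
```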
- process_driver(routine, item, successors, targets, **kwargs)
- process_kernel(routine, item, successors, **kwargs)
- gather_analysis_from_children(successor_map)
Gather analysis from callees.
- gather_typedef_configs(successors, typedef_configs)
Gather typedef configs from children.
- transform_module(module, **kwargs)
Cache the current type definition config for later reuse.
- class DataOffloadDeepcopyTransformation(mode)
Bases:
Transformation
A transformation that generates a deepcopy of all the arguments to a GPU kernel. It relies on the analysis gathered by the DataOffloadDeepcopyAnalysis transformation, which must therefore be run before this one. Please note that the analysis and deepcopy are per driver-loop, and each driver loop must be wrapped in a !$loki data PragmaRegion.
An underlying assumption of the transformation is that expressions used as lvalues and rvalues are of type BasicType, i.e. the data encompassed by a derived-type variable a with components b and c is only ever accessed or modified via fully qualified derived-type expressions a%b or a%c. The only accepted exceptions to this are memory status checks such as ubound, lbound, size etc.
The encompassing !$loki data PragmaRegion can be used to pass hints to the transformation. Consider the following example:
    !$loki data present(a) write(b)
    do ibl=1,nblks
      call kernel(a, b, ...)
    enddo
    !$loki end data
Marking a as present instructs the transformation to skip the deepcopy generation for it and simply place it in a !$loki structured-data present clause. Marking b as write means the contents of the analysis are overridden and the generated deepcopy for b assumes write-only access. Other hints that can be passed to the deepcopy generation are:
- read: Assume read-only access for the specified variables.
- readwrite: Assume read-write access for the specified variables.
- device_resident: Don’t copy the specified variables back to host and leave the device allocation intact.
- temporary: Wipe the device allocation of the specified variables but don’t copy them back to host.
The transformation supports two modes:
- offload: Generate device-host deepcopy for the arguments passed to the encompassed call-tree.
- set_pointers: Generate the FIELD_API boiler-plate to set host pointers for any argument representing a field.
- Parameters:
mode (str) – Transformation mode, must be either “offload” or “set_pointers”.
- field_array_match_pattern = re.compile('^field_[0-9][a-z][a-z]_array')
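The following sketch shows which names this pattern accepts. The candidate type names are made up for illustration; only the regular expression itself comes from the class attribute above.

```python
import re

# Same pattern as the class attribute documented above
field_array_match_pattern = re.compile('^field_[0-9][a-z][a-z]_array')

# Hypothetical type names used purely for illustration
candidates = [
    'field_3rb_array',      # one digit, two lowercase letters: matches
    'field_2im_array_ptr',  # matches, the pattern has no end anchor
    'field_3RB_array',      # uppercase letters: no match
    'my_field_3rb_array',   # does not start with 'field_': no match
    'field_10rb_array',     # two digits: no match
]

matches = [name for name in candidates if field_array_match_pattern.match(name)]
print(matches)  # ['field_3rb_array', 'field_2im_array_ptr']
```

Since the pattern is case-sensitive and anchored only at the start, names presumably need to be lower-cased before matching, and any suffix after _array is accepted.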
- transform_subroutine(routine, **kwargs)
Defines the transformation to apply to Subroutine items.
For transformations that modify Subroutine objects, this method should be implemented. It gets called via the dispatch method apply().
- Parameters:
routine (Subroutine) – The subroutine to be transformed.
**kwargs (optional) – Keyword arguments for the transformation.
- static update_with_manual_overrides(parameters, analysis, variable_map)
Update analysis with manual overrides specified in the !$loki data pragma.
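A rough sketch of how pragma hints might be folded into an analysis dict. It assumes the pragma parameters arrive as a dict of comma-separated strings and that names are compared in lower case. The function names and return shape are hypothetical, and the device_resident and temporary hints are left out for brevity.

```python
def split_names(value):
    """Split a comma-separated pragma argument into lower-case names."""
    return [name.strip().lower() for name in value.split(',') if name.strip()]


def apply_overrides(parameters, analysis):
    """Return (updated analysis, present variables); hypothetical helper.

    Access-mode hints replace whatever the analysis recorded, and variables
    marked 'present' are removed so no deepcopy is generated for them.
    """
    updated = dict(analysis)
    for mode in ('read', 'write', 'readwrite'):
        for name in split_names(parameters.get(mode, '')):
            updated[name] = mode
    present = split_names(parameters.get('present', ''))
    for name in present:
        updated.pop(name, None)
    return updated, present


# Hints taken from: !$loki data present(a) write(b)
parameters = {'present': 'a', 'write': 'b'}
analysis = {'a': {'x': 'read'}, 'b': 'readwrite', 'c': 'read'}

updated, present = apply_overrides(parameters, analysis)
print(updated)  # {'b': 'write', 'c': 'read'}
print(present)  # ['a']
```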
- static get_pragma_vars(parameters, category)
- insert_deepcopy_instructions(region, mode, copy, host, wipe, present_vars)
Insert the generated deepcopy instructions and wrap the driver loop in a data present pragma region if applicable.
- process_driver(routine, analyses, typedef_configs, targets)
- wrap_in_loopnest(var, body, routine)
Wrap body in loop nest corresponding to the shape of var.
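As an illustration of the idea, the sketch below builds such a loop nest as Fortran source text for an array of known rank. This is an assumption-laden simplification: the real method presumably builds Loki IR nodes from the variable's shape, and build_loopnest, the index names, and the body line are hypothetical.

```python
def build_loopnest(var_name, rank, body_lines):
    """Wrap body_lines in one Fortran do-loop per dimension of var_name.

    Hypothetical text-based helper; the first dimension ends up innermost,
    which suits Fortran's column-major storage order.
    """
    lines = list(body_lines)
    for dim in range(1, rank + 1):
        index = f'j{dim}'
        header = f'do {index} = lbound({var_name}, {dim}), ubound({var_name}, {dim})'
        lines = [header] + ['  ' + line for line in lines] + ['enddo']
    return lines


nest = build_loopnest('a', 2, ['call work(a(j1, j2))'])
print('\n'.join(nest))
```

For a rank-2 array this produces an outer loop over j2 and an inner loop over j1 around the body.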
- static create_memory_status_test(check, var, body, scope)
Wrap a given body in a memory status check.
- static enter_data_copyin(var)
Generate unstructured data copyin instruction.
- static enter_data_create(var)
Generate unstructured data create instruction.
- static enter_data_attach(var)
Generate unstructured data attach instruction.
- static exit_data_detach(var)
Generate unstructured data detach instruction.
- static exit_data_delete(var)
Generate unstructured data delete instruction.
- static update_self(var)
Pull back data to host.
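For orientation, the six helpers above map naturally onto standard OpenACC directives. The sketch below prints those directives as plain text. It is an assumption that the generated pragmas are eventually lowered to this form, and the template table and the directive function are hypothetical names.

```python
# OpenACC spelling of each instruction kind (an illustrative mapping only)
ACC_TEMPLATES = {
    'enter_data_copyin': '!$acc enter data copyin({var})',
    'enter_data_create': '!$acc enter data create({var})',
    'enter_data_attach': '!$acc enter data attach({var})',
    'exit_data_detach': '!$acc exit data detach({var})',
    'exit_data_delete': '!$acc exit data delete({var})',
    'update_self': '!$acc update self({var})',
}


def directive(kind, var):
    """Format the directive text for one instruction kind and variable name."""
    return ACC_TEMPLATES[kind].format(var=var)


# Hypothetical sequence for a write-only pointer component a%b:
# allocate on device, attach, run the kernel, then pull back and clean up.
sequence = ['enter_data_create', 'enter_data_attach',
            'update_self', 'exit_data_detach', 'exit_data_delete']
for kind in sequence:
    print(directive(kind, 'a%b'))
```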
- create_field_api_offload(var, analysis, typedef_config, parent, scope)
- create_dummy_field_array_typedef_config(parent)
The scheduler will never traverse the FIELD_RANKSUFF_ARRAY type definitions, so we create a dummy typedef config here.
- generate_deepcopy(routine, **kwargs)
Recursively traverse the deepcopy analysis to generate the deepcopy instructions.