loki.transformations.data_offload.offload_deepcopy

Functions

create_nested_dict(k, v, variable_map)

Create nested dict from derived-type expression.

get_sanitised_arg_map(arg_map)

Return sanitised mapping of dummy argument names to arguments.

map_derived_type_arguments(arg_map, analysis)

Map the root variable of derived-type dummy argument components to the corresponding argument.

merge_nested_dict(ref_dict, temp_dict[, force])

Merge nested dicts.

strip_nested_dimensions(expr)

Strip dimensions from array expressions of arbitrary derived-type nesting depth.
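strip_nested_dimensions works on Loki expression trees. To illustrate the intended effect only, the sketch below applies the same idea to plain Fortran expression strings. The function name strip_subscripts and the string-based approach are assumptions for illustration, not part of Loki's API. Stripping subscripts presumably lets accesses such as a(1)%b and a(2)%b map to the same analysis entry.

```python
import re

# Matches one innermost parenthesised subscript list, e.g. "(ibl)" or "(:, 1)".
_SUBSCRIPT = re.compile(r"\([^()]*\)")


def strip_subscripts(expr: str) -> str:
    """Drop array subscripts from every level of a derived-type expression string.

    Hypothetical string-based stand-in for strip_nested_dimensions.
    """
    previous = None
    # Repeat until nothing changes so nested subscripts such as a(idx(1))%c vanish too.
    while previous != expr:
        previous = expr
        expr = _SUBSCRIPT.sub("", expr)
    return expr


print(strip_subscripts("a(ibl)%b(:, 1)%c"))  # a%b%c
```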

Classes

DataOffloadDeepcopyAnalysis([output_analysis])

A transformation pass to analyse the usage of subroutine arguments in a call-tree.

DataOffloadDeepcopyTransformation(mode)

A transformation that generates a deepcopy of all the arguments to a GPU kernel.

DeepcopyDataflowAnalysisAttacher(**kwargs)

Dummy argument intents in Fortran also have implications for memory status, and INTENT(OUT) is therefore fundamentally unsafe for allocatables and pointers.

class DataOffloadDeepcopyAnalysis(output_analysis=False)

Bases: Transformation

A transformation pass to analyse the usage of subroutine arguments in a call-tree.

The resulting analysis is a nested dict, of nesting depth equal to the longest derived-type expression, containing the access mode of all the arguments used in a call-tree. For example, the following assignments:

would yield the following analysis:

The analysis is stored in the Item.trafo_data of the Item corresponding to the driver layer Subroutine. It should be noted that the analysis is stored per driver-layer loop. The driver’s Item.trafo_data also contains Scheduler config entries corresponding to the derived-types used throughout the call-tree in a typedef_configs dict.
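Because the example assignments and the resulting analysis are not reproduced here, the following sketch shows one plausible way such a nested access-mode dict could be built and merged, in the spirit of create_nested_dict and merge_nested_dict. It uses plain strings as keys (Loki uses expression objects), and the names create_nested, merge_modes and merge_nested, as well as the rule that any mix of modes collapses to "readwrite", are simplifying assumptions that ignore statement ordering.

```python
def create_nested(expr: str, mode: str) -> dict:
    """Turn e.g. 'a%b%c' with mode 'read' into {'a': {'b': {'c': 'read'}}}."""
    *parents, leaf = expr.lower().split("%")
    result = {leaf: mode}
    for name in reversed(parents):
        result = {name: result}
    return result


def merge_modes(old: str, new: str) -> str:
    """Identical modes are kept; any mix collapses to 'readwrite' (simplification)."""
    return old if old == new else "readwrite"


def merge_nested(ref: dict, new: dict) -> dict:
    """Merge `new` into `ref` in place and return `ref`."""
    for key, value in new.items():
        if key not in ref:
            ref[key] = value
        elif isinstance(ref[key], dict) and isinstance(value, dict):
            merge_nested(ref[key], value)
        elif isinstance(ref[key], str) and isinstance(value, str):
            ref[key] = merge_modes(ref[key], value)
        elif isinstance(value, dict):
            # A leaf meets a deeper expression: keep the deeper one (assumption).
            ref[key] = value
        # Otherwise `ref` already holds the deeper expression and is kept.
    return ref


# Usage: three accesses inside one hypothetical driver loop.
analysis = {}
for expr, mode in [("a%b", "read"), ("a%c%d", "write"), ("a%b", "write")]:
    merge_nested(analysis, create_nested(expr, mode))
print(analysis)  # {'a': {'b': 'readwrite', 'c': {'d': 'write'}}}
```

The nesting depth of the result equals the longest expression seen (here a%c%d), matching the description above.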

Parameters:

output_analysis (bool) – If enabled, the analysis is written to disk as yaml files. For kernels, the files are named routine.name_dataoffload_analysis.yaml. For drivers, the files are named driver_target-name_offload_analysis.yaml, where “target-name” is the name of the first target routine in a given driver loop.

reverse_traversal = True

Traversal from the leaves upwards

item_filter = (<class 'loki.batch.item.ProcedureItem'>, <class 'loki.batch.item.TypeDefItem'>)
process_ignored_items = True
transform_subroutine(routine, **kwargs)

Defines the transformation to apply to Subroutine items.

For transformations that modify Subroutine objects, this method should be implemented. It gets called via the dispatch method apply().

Parameters:
  • routine (Subroutine) – The subroutine to be transformed.

  • **kwargs (optional) – Keyword arguments for the transformation.

stringify_dict(_dict)

Stringify expression keys of a nested dict.
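Since the analysis can be written to disk as YAML (see output_analysis), its expression keys presumably need converting to plain strings first; serialisers generally cannot represent arbitrary objects as mapping keys. A minimal sketch of a recursive key conversion, using a hypothetical Expr class as a stand-in for a Loki expression node:

```python
class Expr:
    """Hypothetical stand-in for a Loki expression node used as a dict key."""

    def __init__(self, name: str):
        self.name = name

    def __str__(self) -> str:
        return self.name


def stringify_keys(nested: dict) -> dict:
    """Return a copy of `nested` in which every key, at every depth, is str(key)."""
    return {
        str(key): stringify_keys(value) if isinstance(value, dict) else value
        for key, value in nested.items()
    }


print(stringify_keys({Expr("a"): {Expr("b"): "read"}}))  # {'a': {'b': 'read'}}
```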

process_driver(routine, item, successors, targets, **kwargs)
process_kernel(routine, item, successors, **kwargs)
gather_analysis_from_children(successor_map)

Gather analysis from callees.
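A callee's analysis is expressed in terms of its own dummy argument names, so gathering it in the caller requires re-rooting it under the actual arguments; this is roughly what get_sanitised_arg_map and map_derived_type_arguments appear to support. The sketch below is a hypothetical illustration on string-keyed dicts: rebase_child_analysis and its "later leaves win" merge rule are assumptions, not Loki's implementation.

```python
def _merge(ref: dict, new: dict) -> dict:
    """Recursively merge `new` into `ref`; on leaf clashes the later value wins."""
    for key, value in new.items():
        if key in ref and isinstance(ref[key], dict) and isinstance(value, dict):
            _merge(ref[key], value)
        else:
            ref[key] = value
    return ref


def rebase_child_analysis(child: dict, arg_map: dict) -> dict:
    """Rename a callee's dummy-argument roots to the caller's actual arguments.

    child:   analysis keyed by dummy names, e.g. {'x': {'y': 'read'}}
    arg_map: dummy name -> actual argument expression, e.g. {'x': 'a%b'}
    """
    result = {}
    for dummy, subtree in child.items():
        actual = arg_map.get(dummy)
        if actual is None:
            continue  # not an argument of this call: nothing to propagate
        # Re-nest the subtree under each component of the actual expression.
        for name in reversed(actual.lower().split("%")):
            subtree = {name: subtree}
        _merge(result, subtree)
    return result


print(rebase_child_analysis({"x": {"y": "read"}, "z": "write"},
                            {"x": "a%b", "z": "a%c"}))
# {'a': {'b': {'y': 'read'}, 'c': 'write'}}
```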

gather_typedef_configs(successors, typedef_configs)

Gather typedef configs from children.

transform_module(module, **kwargs)

Cache the current type definition config for later reuse.

class DataOffloadDeepcopyTransformation(mode)

Bases: Transformation

A transformation that generates a deepcopy of all the arguments to a GPU kernel. It relies on the analysis gathered by the DataOffloadDeepcopyAnalysis transformation, which must therefore be run first. Note that the analysis and the deepcopy are generated per driver loop, and each such loop must be wrapped in a !$loki data PragmaRegion.

An underlying assumption of the transformation is that expressions used as lvalues and rvalues are of type BasicType, i.e. the data encompassed by a derived-type variable a with components b and c is only ever accessed or modified via fully qualified derived-type expressions a%b or a%c. The only accepted exceptions are memory status checks such as ubound, lbound, size, etc.

The encompassing !$loki data PragmaRegion can be used to pass hints to the transformation. Consider the following example:

!$loki data present(a) write(b)
do ibl=1,nblks
   call kernel(a, b, ...)
enddo
!$loki end data

Marking a as present instructs the transformation to skip the deepcopy generation for it and simply place it in a !$loki structured-data present clause. Marking b as write means the contents of the analysis are overridden and the generated deepcopy for b assumes write-only access. Other hints that can be passed to the deepcopy generation are:

  • read: Assume read-only access for the specified variables.

  • readwrite: Assume read-write access for the specified variables.

  • device_resident: Don’t copy the specified variables back to host, and leave the device allocation intact.

  • temporary: Wipe the device allocation of the specified variables without copying them back to host.
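To make the hint syntax concrete, the following hypothetical sketch splits the clause list of a !$loki data pragma into a mapping from hint name to variable names. Loki parses pragma parameters with its own utilities; parse_data_pragma and its regular expressions are illustrative assumptions that only handle simple variable lists without nested parentheses.

```python
import re

# Hypothetical patterns: the pragma header, then clauses of the form name(v1, v2, ...).
_HEADER = re.compile(r"^\s*!\$loki\s+data\b(.*)$", re.IGNORECASE)
_CLAUSE = re.compile(r"([a-z_]+)\(([^)]*)\)")


def parse_data_pragma(pragma: str) -> dict:
    """Map each hint (present, read, write, ...) to the variables it names."""
    match = _HEADER.match(pragma)
    if match is None:
        raise ValueError(f"not a '!$loki data' pragma: {pragma!r}")
    clauses = {}
    for clause, args in _CLAUSE.findall(match.group(1).lower()):
        names = [v.strip() for v in args.split(",") if v.strip()]
        # Repeated clauses accumulate instead of overwriting each other.
        clauses.setdefault(clause, []).extend(names)
    return clauses


print(parse_data_pragma("!$loki data present(a) write(b)"))
# {'present': ['a'], 'write': ['b']}
```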

The transformation supports two modes:
  • offload: Generate device-host deepcopy for the arguments passed to the encompassed call-tree.

  • set_pointers: Generate the FIELD_API boilerplate to set host pointers for any argument representing a field.

Parameters:

mode (str) – Transformation mode, must be either “offload” or “set_pointers”.

field_array_match_pattern = re.compile('^field_[0-9][a-z][a-z]_array')
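The class attribute field_array_match_pattern is presumably used to recognise the FIELD_API FIELD_RANKSUFF_ARRAY type names mentioned under create_dummy_field_array_typedef_config. The helper below is a hypothetical illustration; it lower-cases the name first on the assumption that matching happens on lower-case names, because the pattern itself is case-sensitive.

```python
import re

# Pattern copied verbatim from DataOffloadDeepcopyTransformation.
field_array_match_pattern = re.compile('^field_[0-9][a-z][a-z]_array')


def is_field_array_type(type_name: str) -> bool:
    """Hypothetical helper: does `type_name` look like a FIELD_<rank><suffix>_ARRAY type?"""
    return field_array_match_pattern.match(type_name.lower()) is not None


print(is_field_array_type("FIELD_2RB_ARRAY"))  # True
print(is_field_array_type("FIELD_2RB"))        # False
```

Note that the pattern is anchored at the start but not at the end, so a name such as field_2rb_array_ptr would also match.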
transform_subroutine(routine, **kwargs)

Defines the transformation to apply to Subroutine items.

For transformations that modify Subroutine objects, this method should be implemented. It gets called via the dispatch method apply().

Parameters:
  • routine (Subroutine) – The subroutine to be transformed.

  • **kwargs (optional) – Keyword arguments for the transformation.

static update_with_manual_overrides(parameters, analysis, variable_map)

Update analysis with manual overrides specified in the !$loki data pragma.
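One plausible reading of a manual override is that every access mode recorded under the named variable is replaced by the hinted mode. The sketch below is a hypothetical illustration of that reading; apply_overrides only handles root variable names and is not Loki's update_with_manual_overrides.

```python
def _set_leaves(node, mode: str):
    """Replace every leaf access mode below `node` with `mode`."""
    if isinstance(node, dict):
        return {key: _set_leaves(value, mode) for key, value in node.items()}
    return mode


def apply_overrides(analysis: dict, hints: dict) -> dict:
    """Force the access mode of each variable named in a read/write/readwrite hint."""
    for mode in ("read", "write", "readwrite"):
        for name in hints.get(mode, []):
            # Variables absent from the analysis are simply added with the hinted mode.
            analysis[name] = _set_leaves(analysis.get(name, mode), mode)
    return analysis


print(apply_overrides({"b": {"x": "read", "y": {"z": "readwrite"}}, "a": "read"},
                      {"write": ["b"]}))
# {'b': {'x': 'write', 'y': {'z': 'write'}}, 'a': 'read'}
```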

static get_pragma_vars(parameters, category)
insert_deepcopy_instructions(region, mode, copy, host, wipe, present_vars)

Insert the generated deepcopy instructions and wrap the driver loop in a data present pragma region if applicable.

process_driver(routine, analyses, typedef_configs, targets)
wrap_in_loopnest(var, body, routine)

Wrap body in loop nest corresponding to the shape of var.
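A loop nest is presumably needed when var is an array of derived types, whose elements' components must be copied one element at a time. The following sketch emits such a nest as Fortran source lines; loopnest_lines and the loop index names j1, j2, ... are hypothetical, and Loki builds IR nodes rather than strings.

```python
from typing import List


def loopnest_lines(var: str, rank: int, body: List[str]) -> List[str]:
    """Wrap Fortran `body` lines in one DO loop per dimension of `var`.

    Loop index names j1, j2, ... are hypothetical; the body is expected to use them.
    """
    lines = list(body)
    # Wrap from the first dimension outwards so dimension 1 ends up innermost,
    # matching Fortran's column-major storage order.
    for dim in range(1, rank + 1):
        lines = (
            [f"do j{dim} = lbound({var}, {dim}), ubound({var}, {dim})"]
            + ["  " + line for line in lines]
            + ["end do"]
        )
    return lines


print("\n".join(loopnest_lines("a%b", 2, ["!$acc enter data copyin(a%b(j1, j2)%c)"])))
```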

static create_memory_status_test(check, var, body, scope)

Wrap a given body in a memory status check.
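For allocatable or pointer components, a data instruction is only safe if the component is actually allocated or associated. A minimal string-based sketch of such a guard, using the standard Fortran ALLOCATED and ASSOCIATED intrinsics; the helper name wrap_in_status_check is hypothetical.

```python
from typing import List


def wrap_in_status_check(check: str, var: str, body: List[str]) -> List[str]:
    """Guard Fortran `body` lines with e.g. `if (allocated(var)) then`."""
    if check not in ("allocated", "associated"):
        raise ValueError(f"unsupported memory status check: {check}")
    return [f"if ({check}({var})) then"] + ["  " + line for line in body] + ["end if"]


print("\n".join(wrap_in_status_check("allocated", "a%b", ["!$acc enter data copyin(a%b)"])))
```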

static enter_data_copyin(var)

Generate unstructured data copyin instruction.

static enter_data_create(var)

Generate unstructured data create instruction.

static enter_data_attach(var)

Generate unstructured data attach instruction.

static exit_data_detach(var)

Generate unstructured data detach instruction.

static exit_data_delete(var)

Generate unstructured data delete instruction.

static update_self(var)

Pull back data to host.
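The static methods above correspond closely to OpenACC unstructured data directives. The table below assumes OpenACC spelling purely for illustration; the pragma syntax Loki actually emits is not documented here, and render is a hypothetical helper.

```python
# Assumed OpenACC equivalents of the generated instructions (illustration only).
DIRECTIVES = {
    "enter_data_copyin": "!$acc enter data copyin({var})",
    "enter_data_create": "!$acc enter data create({var})",
    "enter_data_attach": "!$acc enter data attach({var})",
    "exit_data_detach": "!$acc exit data detach({var})",
    "exit_data_delete": "!$acc exit data delete({var})",
    "update_self": "!$acc update self({var})",
}


def render(kind: str, var: str) -> str:
    """Render one data-management instruction for the given variable expression."""
    return DIRECTIVES[kind].format(var=var)


print(render("update_self", "a%b"))  # !$acc update self(a%b)
```

In OpenACC, attach updates a pointer in device memory to refer to the device copy of its target, which is what lets a parent structure copied to the device reach its separately copied components.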

create_field_api_offload(var, analysis, typedef_config, parent, scope)
create_dummy_field_array_typedef_config(parent)

The scheduler will never traverse the FIELD_RANKSUFF_ARRAY type definitions, so we create a dummy typedef config here.

generate_deepcopy(routine, **kwargs)

Recursively traverse the deepcopy analysis to generate the deepcopy instructions.
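The essential property of a recursive deepcopy is ordering: a parent must be on the device before its components are copied, and must be released only after them. The sketch below walks a string-keyed nested access-mode dict and emits copy-in and copy-out lists in that order. deepcopy_sketch is hypothetical, uses assumed OpenACC spelling, and omits the attach/detach instructions, memory status checks and loop nests that the methods above suggest the real transformation also generates.

```python
from typing import List, Tuple


def deepcopy_sketch(analysis: dict, parent: str = "") -> Tuple[List[str], List[str]]:
    """Walk a nested access-mode dict and emit (copy-in, copy-out) directive lists."""
    copy_in: List[str] = []
    copy_out: List[str] = []
    for name, node in analysis.items():
        expr = f"{parent}%{name}" if parent else name
        if isinstance(node, dict):
            # Derived-type member: the container must be on the device first...
            copy_in.append(f"!$acc enter data copyin({expr})")
            child_in, child_out = deepcopy_sketch(node, expr)
            copy_in.extend(child_in)
            # ...and is released only after all of its components.
            copy_out.extend(child_out)
            copy_out.append(f"!$acc exit data delete({expr})")
        else:
            # Leaf: write-only data need no copy-in, read-only data no copy-back.
            clause = "create" if node == "write" else "copyin"
            copy_in.append(f"!$acc enter data {clause}({expr})")
            if node in ("write", "readwrite"):
                copy_out.append(f"!$acc update self({expr})")
            copy_out.append(f"!$acc exit data delete({expr})")
    return copy_in, copy_out


enter, leave = deepcopy_sketch({"a": {"b": "read", "c": "write"}})
print("\n".join(enter + leave))
```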