Transformation pipelines
Important
Loki is still under active development and has not yet seen a stable release. Interfaces can change at any time, objects may be renamed, or concepts may be re-thought. Make sure to sync your work to the current release frequently by rebasing feature branches and upstreaming more general applicable work in the form of pull requests.
Transformations
Transformations are the building blocks of a transformation pipeline in Loki.
They encode the workflow of converting a Sourcefile
or an individual
program unit (such as Module
or Subroutine
) to the desired
output format.
A transformation can encode a single modification, combine multiple steps, or call other transformations to create complex changes. If a transformation depends on another transformation, inheritance can be used to combine them.
Every transformation in a pipeline should implement the interface defined by
Transformation
. It provides generic entry points for transforming
different objects and thus allows for batch processing. To implement a new
transformation, only one or all of the relevant methods
Transformation.transform_subroutine
,
Transformation.transform_module
, or Transformation.transform_file
need to be implemented.
Example: A transformation that inserts a comment at the beginning of every module and subroutine:
class InsertCommentTransformation(Transformation):
def _insert_comment(self, program_unit):
program_unit.spec.prepend(Comment(text='! Processed by Loki'))
def transform_subroutine(self, routine, **kwargs):
self._insert_comment(routine)
def transform_module(self, module, **kwargs):
self._insert_comment(module)
The transformation can be applied by calling apply()
with
the relevant object.
source = Sourcefile(...) # may contain modules and subroutines
transformation = InsertCommentTransformation()
for module in source.modules:
transformation.apply(module)
for routine in source.all_subroutines:
transformation.apply(routine)
Note that we have to apply the transformation separately for every
relevant ProgramUnit
. The transformation can also be modified
such that it is automatically applied to all program units in a file,
despite only implementing logic for transforming modules and subroutines:
class InsertCommentTransformation(Transformation):
# When called on a Sourcefile, automatically apply this to all modules
# in the file
recurse_to_modules = True
# When called on a Sourcefile or Module, automatically apply this to all
# Subroutines in the file or module
recurse_to_procedures = True
def _insert_comment(self, program_unit):
program_unit.spec.prepend(Comment(text='! Processed by Loki'))
def transform_subroutine(self, routine, **kwargs):
self._insert_comment(routine)
def transform_module(self, module, **kwargs):
self._insert_comment(module)
With these two attributes added, we can now apply the transformation to all modules and procedures in a single command:
source = Sourcefile(...) # may contain modules and subroutines
transformation = InsertCommentTransformation()
transformation.apply(source)
Most transformations, however, will only require modifying those parts of a file that are part of the call tree that is to be transformed to avoid unexpected side-effects.
Typically, transformations should be implemented by users to encode the
transformation pipeline for their individual use-case. However, Loki comes
with a growing number of built-in transformations that are implemented in
the loki.transformations
namespace:
Sub-package with supported source code transformation passes. |
This includes also a number of tools for common transformation tasks that are provided as functions that can be readily used when implementing new transformations.
Batch processing large source trees
Transformations can be applied over source trees using the Scheduler
.
It is a work queue manager that automatically discovers source files in a list
of paths and builds a dependency graph from a given starting point.
This dependency graph includes all called procedures and imported modules.
Calling Scheduler.process
on a source tree and providing it with a
Transformation
applies this transformation to all files, modules, or
routines that appear in the dependency graph. The exact traversal
behaviour can be parameterized in the implementation of the Transformation
.
The behaviour modifications include:
limiting the processing only to specific node types in the dependency graph
reversing the traversal direction, i.e., called routines or imported modules are processed before their caller, such that the starting point/root of the dependency is processed last
traversing the file graph, i.e., processing full source files rather than individual routines or modules
automatic recursion into contained program units, e.g., processing also all procedures in a module after the module has been processed
When applying the transformation to an item in the source tree, the scheduler provides certain information about the item to the transformation:
the transformation mode (provided in the scheduler’s config),
the item’s role (e.g.,
'driver'
or'kernel'
, configurable via the scheduler’s config), andtargets (dependencies that are depended on by the currently processed item, and are included in the scheduler’s tree, i.e., are processed, too).
Note
The scheduler’s dependency graph will include all dependency types it discovers. This includes not only control-flow dependencies via procedure calls, but also dependencies on other modules via the import of global variables, or dependencies on derived type definitions.
However, for backwards-compatibility with the original scheduler implementation,
only control-flow dependencies are followed and processed by default, and reported
as items
in Scheduler.items
. To remove this limitation, which is required
e.g., for the GlobalVarOffloadTransformation
, the enable_imports
option
can be set to True
. This can be done in the [default]
block of the config,
or as a constructor argument in the Scheduler
.
The Scheduler’s dependency graph
The Scheduler
builds a dependency graph consisting of Item
instances as nodes. Every item corresponds to a specific node in Loki’s
internal representation.
The name of an item refers to a symbol using a fully-qualified name in the
format: <scope_name>#<local_name>
. The <scope_name>
corresponds to
a Fortran module, in which a subroutine, interface or derived type is
declared. That declaration’s name (e.g., the name of the subroutine)
constitutes the <local_name>
part. For subroutines that are not embedded
into a module, the <scope_name>
is empty, i.e., the item’s name starts with
a dash (#
).
In most cases these IR nodes are scopes and the entry points for transformations:
FileItem
corresponds toSourcefile
ModuleItem
corresponds toModule
ProcedureItem
corresponds toSubroutine
The remaining cases are items corresponding to IR nodes that constitute some form of intermediate dependency, which are required to resolve the indirection to the scope node:
InterfaceItem
corresponding toInterface
, i.e., providing a callable target that resolves to one or multiple procedures that are defined in the interface.ProcedureBindingItem
corresponding to theProcedureSymbol
that is declared in aDeclaration
in a derived type. Similarly to interfaces, these resolve to one or multiple procedures that are defined in the procedure binding inside the derived type.TypeDefItem
corresponding toTypeDef
, which does not introduce a control flow dependency but is crucial to capture as a dependency to enable annotating type information for inter-procedural analysis.
Finally, ExternalItem
denotes items that the scheduler was unable to discover.
The expected item type of the missing item is stored in ExternalItem.origin_cls
.
When batch processing a transformation, the external items are ignored, unless the
config option strict=True
is enabled. In that case, an error will be issued when
an external item is encountered that matches the item_filter
that is provided by
the transformation’s manifest (in Transformation.item_filter
).
To facilitate the creation of the dependency tree, every Item
provides two key properties:
Item.definitions
: A list of all IR nodes that constitute symbols/names that are made available by an item. For aFileItem
, this typically consists of all modules and procedures in that sourcefile, and for aModuleItem
it comprises of procedures, interfaces, global variables and derived type definitions.Item.dependencies
: A list of all IR nodes that introduce a dependency on other items, e.g.,CallStatement
orImport
.
This information is used to populate the scheduler’s dependency graph, which is
constructed by the SGraph
class. Importantly, to improve processing speed
and limit parsing to the minimum of required files, this relies on incremental
parsing using the REGEX
frontend. Starting with only the top-level program
units in every discovered source file and a specified seed, the dependencies of each
item are used to determine the next set of items, which are generated on-demand
from the enclosing scope via partial re-parses. This may incur incremental parsing
with additional RegexParserClass
enabled to discover definitions or dependencies
as required. Only once the full dependency graph has been generated, a full parse
of the source files in the graph is performed, providing the complete internal
representation and automatically enriching type information with inter-procedural annotations.
Pruning the dependency graph
If the intention is not to process some items it is recommended to not
leave them dangling as ExternalItem
. Instead, they should be explicitly
excluded from the dependency graph and the strict
mode enabled.
To exclude specific items, any of the following annotations can be used, resulting in
different behaviour:
disable
: Dependency items matching an entry in this list are treated as if they don’t exist, and their definitions are not searched for or parsed. This is useful, e.g., to exclude frequently used utility routines or modules (such as the yomhook module in IFS), which are not to be transformed.block
: Dependency items matching an entry in this list are not parsed or added to the dependency graph, and therefore excluded from transformations. They are, however, included for reference in the dependency graph visualization produced byScheduler.callgraph
.ignore
: Dependency items matching an entry in this list are parsed and added to the dependency graph. This makes their definitions available for enrichment but they are not processed by default. Transformations can include them during batch processing by enabling theTransformation.process_ignored_items
option. A typical use case for this are dependencies that are part of a separate compilation target (and therefore transformed separately), but analysis passes may need to collect information across an entire call tree (e.g., use of temporary arrays).
These three lists can be supplied globally in the [default]
section of the scheduler
config file, or per routine. The matching of items against entries in these lists is
supports basic patterns (via fnmatch
), and is also effective for entire scopes.
For example, a subroutine my_routine
that is defined in a module my_mod
would be
matched by any of the following:
my_routine
my_mod
my_mod#my_routine
*_routine
By default, all items are expanded during dependency discovery, i.e., for every item
all dependencies are added to the graph, and then dependencies of these dependencies are
added as well. This procedure continues until all dependencies have been included.
For individual items, this expansion can be disabled by setting expand=False
for
them in the scheduler config.
Filtering graph traversals
Often, only specific item types are of interest when traversing the dependency graph.
For that purpose, the SFilter
class provides an iterator for an SGraph
,
which allows specifying an item_filter
or reversing the direction of traversals.
Other traversal modes may be added in the future.
|
Work queue manager to discover and capture dependencies for a given call tree, and apply transformations for batch processing |
|
The dependency graph underpinning the |
|
Filtered iterator over a |
|
Configuration object for the |
Configuration object for |
|
|
|
|
Base class of a work item in the |
|
Item class representing a |
|
Item class representing a |
|
Item class representing a |
|
Item class representing a |
|
Item class representing a Fortran procedure binding |
|
Item class representing a |
Utility class to instantiate instances of |