Loki’s internal representation (IR)

Important

Loki is still under active development and has not yet seen a stable release. Interfaces can change at any time, objects may be renamed, or concepts may be re-thought. Make sure to sync your work to the current release frequently by rebasing feature branches and upstreaming more general applicable work in the form of pull requests.

Loki’s internal representation aims to achieve a balance between usability and general applicability. This means that in places there may be shortcuts taken to ease its use in the context of a source-to-source translation utility but may break with established practices in compiler theory. The IR was developed with Fortran source code in mind and that shows. Where there exist similar concepts in other languages, things are transferable. In other places, Fortran-specific annotations are included for the sole purpose of enabling string reproducibility.

The internal representation is vertically divided into different layers, roughly aligned with high level concepts found in Fortran and other programming languages:

Container data structures

Outermost are container data structures that conceptually translate to Fortran’s program-units, such as modules and subprograms.

Fortran modules are represented by Module objects which comprise a specification part (Module.spec) and a list of Subroutine objects contained in the module.

Subroutines and functions are represented by Subroutine objects that in turn have their own docstring (Subroutine.docstring), specification part (Subroutine.spec), execution part (Subroutine.body), and contained subprograms (Subroutine.members).

To map these programming language concepts to source files and ease input or output operations, any number of these container data structures can be classes.

Available container classes

loki.sourcefile.Sourcefile(path[, ir, ast, ...])

Class to handle and manipulate source files, storing Module and Subroutine objects.

loki.module.Module([name, docstring, spec, ...])

Class to handle and manipulate source modules.

loki.subroutine.Subroutine(name[, args, ...])

Class to handle and manipulate a single subroutine.

Control flow tree

Specification and execution parts of (sub)programs and modules are the central components of container data structures. Each of them is represented by a tree of control flow nodes, with a Section as root node. This tree resembles to some extend a hierarchical control flow graph where each node can have control flow and expression nodes as children. Consequently, this separation on node level is reflected in the internal representation, splitting the tree into two levels:

  1. Control flow (e.g., loops, conditionals, assignments, etc.); the corresponding classes are declared in loki.ir and described in this section.

  2. Expressions (e.g., scalar/array variables, literals, operators, etc.); this is based on Pymbolic with encapsulating classes declared in loki.expression.symbols and described below.

The split in IR levels is meant to control the complexity of individual tree traversals, and separate the symbolic expression layer from the more Fortran-specific elements of the IR tree. This loosely follows the principles outlined in Luporini et al..

All control flow nodes implement the common base class Node and can have an arbitrary number of children that are either control flow nodes or expression nodes. Thus, any control flow node looks in principle like the following:

                Node
                / | \
        +------+  |  +---+
       /          |       \
      /           |        \
Expression   Expression   Node   ...

As an example, consider a basic Fortran DO i=1,n loop: it defines a loop variable (i), a loop range (1:n) and a loop body. The body can be one/multiple statements or other control flow structures and therefore is a subtree of control flow nodes. Loop variable and range, however, are expression nodes.

All control flow nodes fall into one of two categories:

  • InternalNode: nodes that have a body and therefore have other control flow nodes as children.

  • LeafNode: nodes that (generally) do not have any other control flow nodes as children.

Note that InternalNode can have other properties than body in which control flow nodes are contained as children (for example, else_body in Conditional). All Node may, however, have one or multiple expression trees as children.

Note

All actual control flow nodes are implementations of one of the two base classes. Two notable exceptions to the above are the following:

  • MultiConditional (for example, Fortran’s SELECT CASE): It has multiple bodies and thus does not fit the above framework. Conceptually, these could be converted into nested Conditional but it would break string reproducibility. For that reason they are retained as a LeafNode for the time being.

  • TypeDef: This defines a new scope for symbols, which does not include symbols from the enclosing scope. Thus, it behaves like a leaf node although it has technically control flow nodes as children. It is therefore also implemented as a LeafNode.

With this separation into two types of nodes, the schematics of the control flow layer of the internal representation are as follows:

                    InternalNode
                         |
                        body
                       /|||\
      +---------------+ /|\ +-------------+
     /          +------+ | +-----+         \
    /          /         |        \         \
LeafNode InternalNode LeafNode LeafNode InternalNode ...
              |                              |
             body                           body
            /    \                         /    \
           /      \                         ....
     LeafNode  InternalNode
                    |
                   ...

Available control flow nodes

Abstract base classes

loki.ir.Node([source, label])

Base class for all node types in Loki's internal representation.

loki.ir.InternalNode([body, source, label])

Internal representation of a control flow node that has a traversable body property.

loki.ir.LeafNode([source, label])

Internal representation of a control flow node without a body.

Internal node classes

loki.ir.Section([body, source, label])

Internal representation of a single code region.

loki.ir.Associate(associations[, body, ...])

Internal representation of a code region in which names are associated with expressions or variables.

loki.ir.Loop(variable, bounds[, body, ...])

Internal representation of a loop with induction variable and range.

loki.ir.WhileLoop(condition[, body, pragma, ...])

Internal representation of a while loop in source code.

loki.ir.Conditional(condition[, body, ...])

Internal representation of a conditional branching construct.

loki.ir.PragmaRegion([body, pragma, ...])

Internal representation of a block of code defined by two matching pragmas.

loki.ir.Interface([body, abstract, spec, ...])

Internal representation of a Fortran interface block.

Leaf node classes

loki.ir.Assignment(lhs, rhs[, ptr, comment, ...])

Internal representation of a variable assignment.

loki.ir.ConditionalAssignment([lhs, ...])

Internal representation of an inline conditional assignment using a ternary operator.

loki.ir.CallStatement(name[, arguments, ...])

Internal representation of a subroutine call.

loki.ir.Allocation(variables[, data_source, ...])

Internal representation of a variable allocation.

loki.ir.Deallocation(variables[, ...])

Internal representation of a variable deallocation.

loki.ir.Nullify(variables[, source, label])

Internal representation of a pointer nullification.

loki.ir.Comment(text[, source, label])

Internal representation of a single comment.

loki.ir.CommentBlock(comments[, source, label])

Internal representation of a block comment that is formed from multiple single-line comments.

loki.ir.Pragma(keyword[, content, source, label])

Internal representation of a pragma.

loki.ir.PreprocessorDirective([text, ...])

Internal representation of a preprocessor directive.

loki.ir.Import(module[, symbols, nature, ...])

Internal representation of an import.

loki.ir.VariableDeclaration(symbols[, ...])

Internal representation of a variable declaration.

loki.ir.ProcedureDeclaration(symbols[, ...])

Internal representation of a procedure declaration.

loki.ir.DataDeclaration(variable, values[, ...])

Internal representation of a DATA declaration for explicit array value lists.

loki.ir.StatementFunction(variable, ...[, ...])

Internal representation of Fortran statement function statements

loki.ir.TypeDef([name, body, abstract, ...])

Internal representation of a derived type definition.

loki.ir.MultiConditional(expr, values, ...)

Internal representation of a multi-value conditional (eg.

loki.ir.MaskedStatement(conditions, bodies)

Internal representation of a masked array assignment (WHERE clause).

loki.ir.Intrinsic(text[, source, label])

Catch-all generic node for corner-cases.

loki.ir.Enumeration(symbols[, source, label])

Internal representation of an ENUM

Expression tree

Many control flow nodes contain one or multiple expressions, such as the right-hand side of an assignment (loki.ir.Assignment.rhs) or the condition of an IF statement (loki.ir.Conditional.condition). Such expressions are represented by expression trees, comprising a single node (e.g., the left-hand side of an assignment may be just a scalar variable) or a large expression tree consisting of multiple nested sub-expressions.

Loki’s expression representation is based on Pymbolic but encapsulates all classes with bespoke own implementations. This allows to enrich expression nodes by attaching custom metadata, implementing bespoke comparison operators, or store type information.

The base class for all expression nodes is pymbolic.primitives.Expression.

Available expression tree nodes

Typed symbol nodes

loki.expression.symbols.TypedSymbol(*args, ...)

Base class for symbols that carry type information.

loki.expression.symbols.Variable(**kwargs)

Factory class for TypedSymbol or MetaSymbol classes

loki.expression.symbols.DeferredTypeSymbol(name)

Internal representation of symbols with deferred type

loki.expression.symbols.Scalar(name[, ...])

Expression node for scalar variables.

loki.expression.symbols.Array(name[, scope, ...])

Expression node for array variables.

loki.expression.symbols.ProcedureSymbol(name)

Internal representation of a symbol that represents a callable subroutine or function

Literals

loki.expression.symbols.Literal(value, **kwargs)

Factory class to instantiate the best-matching literal node.

loki.expression.symbols.FloatLiteral(value, ...)

A floating point constant in an expression.

loki.expression.symbols.IntLiteral(value, ...)

An integer constant in an expression.

loki.expression.symbols.LogicLiteral(value, ...)

A boolean constant in an expression.

loki.expression.symbols.StringLiteral(value, ...)

A string constant in an expression.

loki.expression.symbols.IntrinsicLiteral(...)

Any literal not represented by a dedicated class.

loki.expression.symbols.LiteralList(values)

A list of constant literals, e.g., as used in Array Initialization Lists.

Mix-ins

loki.expression.symbols.StrCompareMixin()

Mixin to enable comparing expressions to strings.

Expression modules

loki.expression.expr_visitors

Visitor classes for traversing and transforming all expression trees in Loki’s internal representation (IR).

loki.expression.mappers

Mappers for traversing and transforming the Expression tree.

loki.expression.operations

Sub-classes of Pymbolic's native operations that allow us to express niche things like mathematically irrelevant parenthesis that nevertheless change code results.

loki.expression.symbolic

loki.expression.symbols

Expression tree node classes for Expression tree.

Type information and scopes

Every symbol in an expressions tree (TypedSymbol, such as Scalar, Array, ProcedureSymbol) has a type (represented by a DataType) and, possibly, other attributes associated with it. Type and attributes are stored together in a SymbolAttributes object, which is essentially a dict.

Note

Example: An array variable VAR may be declared in Fortran as a subroutine argument in the following way:

INTEGER(4), INTENT(INOUT) :: VAR(10)

This variable has type BasicType.INTEGER and the following additional attributes:

  • KIND=4

  • INTENT=INOUT

  • SHAPE=(10,)

The corresponding SymbolAttributes object can be created as

SymbolAttributes(BasicType.INTEGER, kind=Literal(4), intent='inout', shape=(Literal(10),))

If the variable object is associated with a Scope, then its SymbolAttributes object is stored in the relevant SymbolTable. From there, all expression nodes that represent use of the associated symbol (i.e., the variable object and any others with the same name) query the type information from there. This means, changing the declared attributes of a symbol applies this change for all instances of this symbol.

If the variable is not associated with a Scope, then its SymbolAttributes object is stored locally and not shared by any other variable objects.

Warning

Loki allows to apply changes very freely, which means changing symbol attributes can lead to invalid states.

For example, removing the shape property from the SymbolAttributes object in a symbol table converts the corresponding Array to a Scalar variable. But at this point all expression tree nodes will still be Array, possibly also with subscript operations (represented by the dimensions property).

For plain Array nodes (without subscript), rebuilding the IR will automatically take care of instantiating these objects as Scalar but removing dimensions properties must be done explicitly.

Every object that defines a new scope (e.g., Subroutine, Module, implementing Scope) has an associated symbol table (SymbolTable). The SymbolAttributes of a symbol declared or imported in a scope are stored in the symbol table of that scope. These symbol tables/scopes are organized in a hierarchical fashion, i.e., they are aware of their enclosing scope and allow to recursively look-up entries.

The overall schematics of the scope and type representation are depicted in the following diagram:

   Subroutine | Module | TypeDef | ...
           \      |      /
            \     |     /   <is>
             \    |    /
                Scope
                  |
                  | <has>
                  |
             SymbolTable  - - - - - - - - - - - - TypedSymbol
                  |
                  |  <has entries>
                  |
           SymbolAttributes
        /     |       |      \
       /      |       |       \  <has properties>
      /       |       |        \
DataType | (kind) | (intent) | (...)

Available data types

The DataType of a symbol can be one of

  • BasicType: intrinsic types, such as INTEGER, REAL, etc.

  • DerivedType: derived types defined somewhere

  • ProcedureType: any subroutines or functions declared or imported

Note that this is different from the understanding of types in the Fortran standard, where only intrinsic types and derived types are considered a type. Treating also procedures as types allows us to treat them uniformly when considering external subprograms, procedure pointers and type bound procedures.

BasicType | DerivedType | ProcedureType
         \       |       /
          \      |      /    <implements>
           \     |     /
              DataType

Derived types

Derived type definitions (via TypeDef) create entries in the scope’s symbol table in which they are defined to make the type definition available to declarations.

Imports and deferred type

For imported symbols (via Import) the source module may not be available and thus no information about the symbol. This is indicated by BasicType.DEFERRED. This is also applied to any variable that is instantiated without providing a type and where no type information can be found in the scope’s symbol table (either because no information has been provided previously or because no scope is attached).

loki.scope.Scope([parent])

Scoping object that manages type caching and derivation for typed symbols.

loki.scope.SymbolTable(*args[, case_sensitive])

Lookup table for symbol types that maps symbol names to SymbolAttributes

loki.types.SymbolAttributes(dtype, **kwargs)

Representation of a symbol's attributes, such as data type and declared properties

loki.types.DataType()

Base class for data types a symbol may have

loki.types.BasicType(value)

Representation of intrinsic data types, names taken from the FORTRAN convention.

loki.types.DerivedType([name, typedef])

Representation of derived data types that may have an associated TypeDef

loki.types.ProcedureType([name, ...])

Representation of a function or subroutine type definition.