Adding a new step to iota2

iota2 is based on a sequence of steps. For example, if the purpose is to perform a supervised classification, the steps could roughly be: “prepare data”, “learn models”, “classify”, then “validate classifications”. Steps define iota2’s granularity. Each step must have a well-defined goal.

In the previous example, “prepare data” is too abstract to define an entire step. It can be split until each step has a single coherent goal: “prepare data” could become “prepare raster data” and “prepare input vector data”.

Steps definition

The purpose of a step is to achieve something. In most cases in iota2, a given step has to be repeated many times: over a set of tiles, over dates, etc. In order to provide developers with a generic workflow, steps inherit from Step. The goal is to turn a function call into a homogeneous API where some parameters are fixed and others vary over a range (tile list, dates, etc.).
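
As a minimal sketch of this idea, a function with several parameters can be wrapped so that only the varying parameter remains. The function name, the output path and the tile names below are hypothetical and are not part of iota2.

```python
def compute_statistics(output_directory, tile):
    """Hypothetical processing function: one fixed and one varying parameter."""
    return "{}/{}_stats.txt".format(output_directory, tile)


# Fix the parameter that is the same for every call...
step_function = lambda tile: compute_statistics("/tmp/output", tile)

# ...and let the other one vary over a range (here, a tile list).
tiles = ["T31TCJ", "T31TDJ"]
results = [step_function(tile) for tile in tiles]
print(results)
```

Every call now has the same one-argument shape, whatever the wrapped function does, which is what lets a generic workflow repeat it over any list of inputs.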

In fact, every step is a class derived from the base class Step.IOTA2Step.Step. step_inputs returns the parameters over which the step is repeated, and step_execute returns the function applied to each of them, defined as a lambda. Every step class must contain the following:

class MyNewStep(IOTA2Step.Step):
    def __init__(self):
        """
        """
        pass

    def step_inputs(self):
        """
        ...
        """
        return [1, 2, 3]

    def step_execute(self):
        """
        Return
        ------
        lambda
            the function to execute as a lambda function.
        """
        from MyLib import MyFunction
        step_function = lambda a: MyFunction(1, a)
        return step_function

Existing steps

You can find all iota2’s steps in the Steps directory.

Define a new step and add it to the iota2 workflow

This section will describe how to build a new iota2 step from scratch and how to enable it in iota2.

Define a new step

#!/usr/bin/python
#-*- coding: utf-8 -*-

import IOTA2Step

def awesome_function(arg1, arg2):
    """
    print the first argument
    """
    print(arg1)

class MyStep(IOTA2Step.Step):
    def __init__(self, cfg, cfg_resources_file, workingDirectory=None):
        # initialize the base class
        super(MyStep, self).__init__(cfg, cfg_resources_file)

    def step_description(self):
        """
        function used to print a short description of the step's purpose
        """
        return "This step will print something"

    def step_inputs(self):
        """
        Return
        ------
            the return can be an iterable or a callable
        """
        return range(1, 10)

    def step_execute(self, workingDirectory=None):
        """
        Return
        ------
        lambda
            the function to execute as a lambda function.
        """
        step_function = lambda x: awesome_function(x, "Tile")
        return step_function

    def step_outputs(self):
        """
        function called once the step is finished. This is the place to do some
        clean-up, raise exceptions...
        """
        pass

Note

The constructor of a step takes three arguments:

  • cfg

    absolute path to an iota2 configuration file

  • cfg_resources_file

    absolute path to a configuration file dedicated to resources consumption. It can be set to None

  • workingDirectory

    absolute path to a working directory which will contain all temporary files during the whole step.
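
To illustrate how the four methods of a step fit together, here is a minimal sketch of a sequential driver. FakeStep and run_step are hypothetical names written for this example: they do not use IOTA2Step, and the way iota2 actually schedules a step may differ.

```python
class FakeStep(object):
    """Stand-in exposing the same four methods as the step shown above."""

    def __init__(self):
        self.processed = []

    def step_description(self):
        return "This step records its inputs"

    def step_inputs(self):
        # May be an iterable, or a callable returning one.
        return lambda: ["T31TCJ", "T31TDJ"]

    def step_execute(self):
        # The function applied to each input.
        return lambda tile: self.processed.append(tile)

    def step_outputs(self):
        # Called once, after every input has been processed.
        if len(self.processed) != 2:
            raise RuntimeError("some inputs were not processed")


def run_step(step):
    """Hypothetical sequential driver for a single step."""
    print(step.step_description())
    inputs = step.step_inputs()
    if callable(inputs):
        inputs = inputs()
    function = step.step_execute()
    for parameter in inputs:
        function(parameter)
    step.step_outputs()


step = FakeStep()
run_step(step)
```

In this sketch the inputs are processed one after the other. A parallel driver would distribute the same calls over several workers, which is why the function returned by step_execute should not rely on the order of execution.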

Add it to the step’s sequence

An iota2 step sequence is built using builders. Depending on the aim, there are two possibilities: updating an existing builder or creating a new one.

To enable the step in an existing iota2 builder, append it to the step sequence list s_container.

Note

You can insert the new step at the beginning, at the end, or between two existing steps.

def build_steps(self, cfg, config_ressources=None):
    ...
    from Steps import (...
                      MyStep
                      )
    ...
    step_print = MyStep.MyStep(cfg,
                               config_ressources,
                               self.workingDirectory)
    ...
    # mosaic step
    s_container.append(step_mosaic, "mosaic")

    # append between two steps
    s_container.append(step_print, "mosaic")

    # validation steps
    s_container.append(step_confusions_cmd, "validation")
    ...

Note

The append method needs two arguments, a step and the step group it belongs to.

Available groups are stored in the self.steps_group class attribute. This allows iota2 to restart from a given group and run until another group is reached.
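
The role of the group argument can be sketched with a small container. StepContainer and steps_between are hypothetical and are not the actual iota2 implementation. The group names are only examples, and treating the last group as included is an assumption.

```python
class StepContainer(object):
    """Hypothetical container: keeps each step together with its group."""

    def __init__(self, groups):
        # Group names, in workflow order.
        self.groups = list(groups)
        # (step, group) pairs, in execution order.
        self.entries = []

    def append(self, step, group):
        if group not in self.groups:
            raise ValueError("unknown step group: {}".format(group))
        self.entries.append((step, group))

    def steps_between(self, first_group, last_group):
        """Return the steps from first_group up to last_group (included)."""
        first = self.groups.index(first_group)
        last = self.groups.index(last_group)
        return [step for step, group in self.entries
                if first <= self.groups.index(group) <= last]


s_container = StepContainer(["init", "mosaic", "validation"])
s_container.append("step_mosaic", "mosaic")
s_container.append("step_print", "mosaic")
s_container.append("step_confusions_cmd", "validation")

selected = s_container.steps_between("mosaic", "validation")
print(selected)
```

Because each step carries its group, a subset of the workflow can be selected by group name without knowing the position of individual steps in the sequence.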

About resources

As iota2 is composed of steps, it is convenient to be able to set resource consumption limits (CPU, RAM) per step. This feature is very useful on HPC systems, where resource consumption is a hard constraint.

Furthermore, many OTB applications have a parameter called ram which defines the pipeline size. It is therefore useful to get these parameters through the base class attribute resource.

Reminder

  • output names

    The functions in a step can be launched in parallel using MPI or dask with a master / worker behaviour. It is important to think about the names of the temporary files written to disk. If several workers try to write the same output, issues could appear: files could be corrupted or contain outlier values…

    Please make temporary file names as specific to a given processing as possible.

  • workingDirectory usage

    The parameter workingDirectory is used to access an additional temporary directory. The purpose of this directory is to store additional data and to benefit from better write performance when using an HPC. The last task of a step should be to copy the data needed by the workflow to the output directories. The workingDirectory should not be used to communicate between steps.
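
Both reminders can be illustrated with a short sketch. The function write_tile_result, the file naming scheme, and the tile and date values are hypothetical. The sketch only shows one possible way to build names that are specific to a processing, and to copy the result out of the working directory as the last task.

```python
import os
import shutil
import tempfile


def write_tile_result(tile, date, output_directory, working_directory=None):
    """Hypothetical task: write one result file for a (tile, date) pair."""
    # Include every varying parameter in the name so that two workers
    # never target the same file.
    file_name = "result_{}_{}.txt".format(tile, date)
    # Write in the working directory when one is given, otherwise
    # directly in the output directory.
    write_directory = working_directory or output_directory
    temporary_path = os.path.join(write_directory, file_name)
    with open(temporary_path, "w") as result_file:
        result_file.write("processed {} at {}\n".format(tile, date))
    final_path = os.path.join(output_directory, file_name)
    if temporary_path != final_path:
        # Last task: copy what the workflow needs to the output directory.
        shutil.copy(temporary_path, final_path)
    return final_path


output_directory = tempfile.mkdtemp()
working_directory = tempfile.mkdtemp()
paths = [write_tile_result(tile, "20200101", output_directory, working_directory)
         for tile in ("T31TCJ", "T31TDJ")]
print(paths)
```

The next step would read its inputs from output_directory only, so nothing left in working_directory is needed once the function returns.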