Adding a new step to iota2
iota2 is based on a sequence of steps. For example, if the purpose is to achieve a supervised classification, the steps could roughly be: “prepare data”, “learn models”, “classify”, then “validate classifications”. Steps define iota2’s granularity. Each step must have a well-defined goal.
In the previous example, “prepare data” is too abstract to define an entire step. We could split it until each step has a single coherent goal: “prepare data” could become “prepare raster data” and “prepare input vector data”.
The purpose of a step is to achieve something. In most cases in iota2, a given step
has to be repeated many times: over a set of tiles, over dates, etc.
In order to provide developers with a generic workflow, steps inherit from
Step. The goal is to turn a function call into a homogeneous API in which some parameters are fixed and others can vary over a range (tile list, dates, etc.).
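The idea of fixing some parameters and letting others vary can be sketched in plain Python, independently of iota2. In this minimal sketch, the function build_output_name, the tile names and the output directory are hypothetical and only serve the illustration.

```python
from functools import partial


def build_output_name(output_dir, suffix, tile):
    """Hypothetical processing function: every parameter but `tile` is constant."""
    return f"{output_dir}/{tile}_{suffix}.tif"


# Fix the constant parameters once...
step_function = partial(build_output_name, "/tmp/out", "features")

# ...and let the remaining parameter vary over a range (here, a tile list).
tiles = ["T31TCJ", "T31TDJ"]
results = [step_function(tile) for tile in tiles]
print(results)
```

A lambda, as used in the step classes of this page, plays the same role as partial here: both produce a one-argument callable that can be applied to each element of the varying range.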
In fact, every step is a class derived from the base class IOTA2Step.Step.
step_inputs is the method that provides the step’s input parameters, and step_execute
is the method which contains the lambda definition. Every step class must
contain the following:
    import IOTA2Step


    class MyNewStep(IOTA2Step.Step):
        def __init__(self):
            """ """
            pass

        def step_inputs(self):
            """ ... """
            # the parameters over which the step function is applied
            return [1, 2, 3]

        def step_execute(self):
            """
            Return
            ------
            lambda
                the function to execute as a lambda function.
            """
            from MyLib import MyFunction
            # first argument is fixed, second one varies over step_inputs()
            step_function = lambda a: MyFunction(1, a)
            return step_function
You can find all iota2’s steps in the Steps directory.
Define a new step and add it to the iota2 workflow
This section will describe how to build a new iota2 step from scratch and how to enable it in iota2.
Define a new step
    #!/usr/bin/python
    #-*- coding: utf-8 -*-

    import IOTA2Step


    def awesome_function(arg1, arg2):
        """ """
        print(arg1)


    class MyStep(IOTA2Step.Step):
        def __init__(self, cfg, cfg_resources_file, workingDirectory=None):
            # initialize the base class
            super(MyStep, self).__init__(cfg, cfg_resources_file)

        def step_description(self):
            """
            function used to print a short description of the step's purpose
            """
            return "This step will print something"

        def step_inputs(self):
            """
            Return
            ------
                the return can be an iterable or a callable
            """
            return range(1, 10)

        def step_execute(self, workingDirectory=None):
            """
            Return
            ------
            lambda
                the function to execute as a lambda function.
            """
            step_function = lambda x: awesome_function(x, "Tile")
            return step_function

        def step_outputs(self):
            """
            function called once the step is finished. This is the place to do
            some clean-up, raise exceptions...
            """
            pass
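To see how these methods fit together, the sketch below mimics what a scheduler could do with a step: ask for the inputs, ask for the function, then call the function once per input. The run_step function and the stand-in PrintStep class are hypothetical. iota2's actual scheduler is not shown on this page and may differ, for instance by dispatching the calls to workers.

```python
class PrintStep:
    """Stand-in for a step class. It does not inherit from IOTA2Step.Step
    so that the sketch runs without iota2 installed."""

    def step_description(self):
        return "This step will print something"

    def step_inputs(self):
        # may be an iterable, or a callable returning an iterable
        return range(1, 4)

    def step_execute(self):
        # the second argument is fixed to "Tile", the first one varies
        return lambda x: f"{x} Tile"

    def step_outputs(self):
        # nothing to clean up in this sketch
        pass


def run_step(step):
    """Hypothetical sequential runner: one function call per input."""
    inputs = step.step_inputs()
    if callable(inputs):
        inputs = inputs()
    function = step.step_execute()
    results = [function(param) for param in inputs]
    step.step_outputs()
    return results


outputs = run_step(PrintStep())
print(outputs)
```

Because the runner only relies on the method names, any class exposing the same methods can be scheduled in the same way.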
The step constructor must take three arguments, named cfg, cfg_resources_file and workingDirectory in the example above:
absolute path to an iota2 configuration file
absolute path to a configuration file dedicated to resources consumption. It can be set to
absolute path to a working directory which will contain all temporary files during the whole step.
Add it to the step’s sequence
An iota2 step sequence is built using builders. Depending on the aim, there are two possibilities: updating an existing builder or creating a new one.
To enable the step in an existing iota2 builder, append it to the step sequence list.
You can insert the new step at the beginning, at the end or between two existing steps.
    def build_steps(self, cfg, config_ressources=None):
        ...
        from Steps import (...
                           MyStep
                           )
        ...
        step_print = MyStep.MyStep(cfg, config_ressources, self.workingDirectory)
        ...
        # mosaic step
        s_container.append(step_mosaic, "mosaic")
        # append between two steps
        s_container.append(step_print, "mosaic")
        # validation steps
        s_container.append(step_confusions_cmd, "validation")
        ...
The append method needs two arguments: a step and the step group it belongs to.
Available groups are stored in the self.steps_group class attribute.
Groups allow iota2 to restart from a given group and run until another group is reached.
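The restart behaviour can be pictured with a small container that remembers the group of each step. This is a sketch under assumptions: the StepContainer class, its steps_between method and the "init" group are hypothetical, steps are represented by plain strings, and iota2's real container may be implemented differently.

```python
class StepContainer:
    """Hypothetical container: stores steps and the group each one belongs to."""

    def __init__(self, groups):
        # dict keeps insertion order, so groups stay in workflow order
        self.steps_group = {group: [] for group in groups}
        self.steps = []

    def append(self, step, group):
        if group not in self.steps_group:
            raise ValueError(f"unknown group: {group}")
        self.steps.append(step)
        self.steps_group[group].append(step)

    def steps_between(self, first_group, last_group):
        """Return the steps from first_group up to and including last_group."""
        names = list(self.steps_group)
        start = names.index(first_group)
        end = names.index(last_group)
        selected = []
        for name in names[start:end + 1]:
            selected.extend(self.steps_group[name])
        return selected


container = StepContainer(["init", "mosaic", "validation"])
container.append("step_tiles", "init")
container.append("step_mosaic", "mosaic")
container.append("step_print", "mosaic")
container.append("step_confusions_cmd", "validation")

# restart from the "mosaic" group and stop after the "validation" group
to_run = container.steps_between("mosaic", "validation")
print(to_run)
```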
As iota2 is composed of steps, it is convenient to be able to set resource consumption limits (CPU, RAM) for each step. This feature is very useful on HPC systems, where resource consumption is a hard constraint.
Furthermore, many OTB applications have a parameter called ram which
defines the pipeline size. Getting such parameters through
the base class attribute resource is therefore useful.
The functions in a step can be launched in parallel using MPI or dask, following a master / worker scheme. It is therefore important to
think about the names of the temporary files written to disk. If several workers try to write the same output, problems may appear: files could be corrupted or contain outlier values.
Please make temporary file names as specific to a processing as possible.
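One way to avoid such collisions is to build the temporary file name from everything that identifies the processing (step, tile, date, etc.) and to add a unique suffix. A minimal sketch, in which the temporary_name function and its arguments are hypothetical:

```python
import os
import uuid


def temporary_name(directory, step_name, tile, date, extension=".tif"):
    """Build a temporary file path that is specific to one processing."""
    unique = uuid.uuid4().hex  # random part, different on every call
    filename = f"{step_name}_{tile}_{date}_{os.getpid()}_{unique}{extension}"
    return os.path.join(directory, filename)


# two workers processing the same tile and date still get distinct names
name_a = temporary_name("/tmp", "mosaic", "T31TCJ", "20200101")
name_b = temporary_name("/tmp", "mosaic", "T31TCJ", "20200101")
print(name_a != name_b)
```

The readable prefix helps when inspecting a directory after a failed run, while the process identifier and the random suffix keep the names distinct between workers.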
working_directory is used to access an additional temporary directory. The purpose of this directory is to store additional data and to benefit from better I/O performance when using an HPC. The last task of a step should be to copy the data needed by the workflow to the output directories. The
working_directory should not be used to communicate between steps.
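The pattern described above (write into the working directory, then copy only what the workflow needs to the output directory) could look like the following sketch. The process_tile function, the file names and the directory layout are hypothetical.

```python
import os
import shutil
import tempfile


def process_tile(tile, output_directory, working_directory=None):
    """Write temporary data in the working directory, then copy the result."""
    work_dir = working_directory or output_directory
    os.makedirs(work_dir, exist_ok=True)
    os.makedirs(output_directory, exist_ok=True)

    # intermediate data: stays in the working directory
    intermediate = os.path.join(work_dir, f"{tile}_intermediate.txt")
    with open(intermediate, "w") as handle:
        handle.write("temporary content")

    # result needed by the following steps: first written in the working directory
    result_tmp = os.path.join(work_dir, f"{tile}_result.txt")
    with open(result_tmp, "w") as handle:
        handle.write("final content")

    # last task: copy the needed data to the output directory
    final = os.path.join(output_directory, f"{tile}_result.txt")
    if os.path.abspath(result_tmp) != os.path.abspath(final):
        shutil.copy(result_tmp, final)
    return final


with tempfile.TemporaryDirectory() as root:
    out_dir = os.path.join(root, "output")
    work_dir = os.path.join(root, "work")
    final_path = process_tile("T31TCJ", out_dir, work_dir)
    copied = os.path.exists(final_path)
    output_files = sorted(os.listdir(out_dir))

print(copied, output_files)
```

Later steps would read from the output directory only, so nothing they depend on is lost when the working directory is removed.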