Adding a new step to iota2
iota2 is based on a sequence of steps. For example, if the purpose is to achieve a supervised classification, the steps could roughly be: “prepare data”, “learn models”, “classify”, then “validate classifications”. Steps define iota2’s granularity. Each step must have a well-defined goal.
In the previous example, “prepare data” is too abstract to define an entire step. It can be split until each step has a single coherent goal: “prepare data” could become “prepare raster data” and “prepare input vector data”.
Steps definition
The purpose of a step is to achieve something. In most cases in iota2, a given step has to be repeated many times: over a set of tiles, over dates, etc.
In order to provide developers with a generic workflow, steps inherit from Step. The goal is to transform the call of a function into a homogeneous API where some parameters are fixed and others vary over a range (tile list, dates, etc.).
In fact, every step is a class derived from the base class Step.IOTA2Step.Step.
While step_inputs is the method that provides the step’s input parameters, step_execute is the method that contains the lambda definition. Every step class must contain the following:
class MyNewStep(IOTA2Step.Step):
    def __init__(self):
        """
        """
        pass

    def step_inputs(self):
        """
        ...
        """
        return [1, 2, 3]

    def step_execute(self):
        """
        Return
        ------
        lambda
            the function to execute as a lambda function.
        """
        from MyLib import MyFunction
        step_function = lambda a: MyFunction(1, a)
        return step_function
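To make the relationship between step_inputs and step_execute concrete, the sketch below shows one way a scheduler could pair them: the callable returned by step_execute is applied once to each element returned by step_inputs. The Step base class, the run_step helper and my_function are hypothetical stand-ins written for this illustration (my_function replaces MyLib.MyFunction); they are not iota2's actual implementation.

```python
class Step(object):
    """Hypothetical stand-in for IOTA2Step.Step (illustration only)."""

    def step_inputs(self):
        raise NotImplementedError

    def step_execute(self):
        raise NotImplementedError


def my_function(fixed, varying):
    """Hypothetical processing function: one fixed and one varying parameter."""
    return fixed + varying


class MyNewStep(Step):
    def step_inputs(self):
        # the values over which the step is repeated
        return [1, 2, 3]

    def step_execute(self):
        # the first parameter is fixed to 1, the second one varies
        return lambda a: my_function(1, a)


def run_step(step):
    """Apply the step's callable to every input, sequentially."""
    inputs = step.step_inputs()
    # assumption: inputs may also be given as a callable producing the values
    if callable(inputs):
        inputs = inputs()
    function = step.step_execute()
    return [function(param) for param in inputs]


results = run_step(MyNewStep())
print(results)
```

Here the loop is sequential for readability; a real scheduler would presumably distribute the calls among workers instead.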
Existing steps
You can find all iota2’s steps in the Steps directory.
Define a new step and add it to the iota2 workflow
This section will describe how to build a new iota2 step from scratch and how to enable it in iota2.
Define a new step
#!/usr/bin/python
#-*- coding: utf-8 -*-
import IOTA2Step


def awesome_function(arg1, arg2):
    """
    """
    print(arg1)


class MyStep(IOTA2Step.Step):
    def __init__(self, cfg, cfg_resources_file, workingDirectory=None):
        # call the base class constructor
        super(MyStep, self).__init__(cfg, cfg_resources_file)

    def step_description(self):
        """
        function used to print a short description of the step's purpose
        """
        return "This step will print something"

    def step_inputs(self):
        """
        Return
        ------
        the return can be an iterable or a callable
        """
        return range(1, 10)

    def step_execute(self, workingDirectory=None):
        """
        Return
        ------
        lambda
            the function to execute as a lambda function.
        """
        step_function = lambda x: awesome_function(x, "Tile")
        return step_function

    def step_outputs(self):
        """
        function called once the step is finished. This is the place to do some
        clean-up, raise exceptions...
        """
        pass
Note
The constructor must take three arguments:
- cfg: absolute path to an iota2 configuration file.
- cfg_resources_file: absolute path to a configuration file dedicated to resource consumption. It can be set to None.
- workingDirectory: absolute path to a working directory which will contain all temporary files during the whole step.
Add it to the step’s sequence
An iota2 step sequence is built using builders. Depending on the aim, there are two possibilities: update an existing builder or create a new one.
To enable the step in an existing iota2 builder, append it to the step sequence list s_container.
Note
You can insert the new step at the beginning, at the end, or between two existing steps.
def build_steps(self, cfg, config_ressources=None):
    ...
    from Steps import (...
                       MyStep
                       )
    ...
    step_print = MyStep.MyStep(cfg,
                               config_ressources,
                               self.workingDirectory)
    ...
    # mosaic step
    s_container.append(step_mosaic, "mosaic")
    # append between two steps
    s_container.append(step_print, "mosaic")
    # validation steps
    s_container.append(step_confusions_cmd, "validation")
    ...
Note
The append method needs two arguments: a step and the step group it belongs to. Available groups are stored in the self.steps_group class attribute. This allows iota2 to restart from a given group and run until another group is reached.
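The role of groups can be illustrated with a small container that records each step together with its group and then selects the steps lying between two groups. The StepsContainer class, its select method and the "classification" group name are hypothetical and only serve this sketch. Whether the last group is included in a restart is also an assumption (it is included here).

```python
from collections import OrderedDict


class StepsContainer(object):
    """Hypothetical container keeping steps ordered and indexed by group."""

    def __init__(self, groups):
        # one list of steps per declared group, in declaration order
        self.steps_group = OrderedDict((group, []) for group in groups)
        self.steps = []

    def append(self, step, group):
        if group not in self.steps_group:
            raise ValueError("unknown step group: {}".format(group))
        self.steps.append((step, group))
        self.steps_group[group].append(step)

    def select(self, first_group, last_group):
        """Return the steps from first_group up to and including last_group."""
        groups = list(self.steps_group)
        start = groups.index(first_group)
        end = groups.index(last_group)
        wanted = groups[start:end + 1]
        return [step for step, group in self.steps if group in wanted]


# strings stand in for step instances to keep the sketch short
s_container = StepsContainer(["classification", "mosaic", "validation"])
s_container.append("step_classify", "classification")
s_container.append("step_mosaic", "mosaic")
s_container.append("step_print", "mosaic")
s_container.append("step_confusions_cmd", "validation")

selected = s_container.select("mosaic", "validation")
print(selected)
```

Because the new step is registered in the "mosaic" group, it is picked up by any run that covers that group, without the caller having to name the step itself.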
About resources
As iota2 is composed of steps, it is convenient to be able to set resource consumption limits (CPU, RAM) per step. This feature is very useful on HPC systems, where resource consumption is a hard constraint.
Furthermore, many OTB applications have a parameter called ram which defines the pipeline size. Therefore, it is useful to get these parameters through the base class attribute resource.
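As a rough illustration of passing a resource limit down to the processing function, the sketch below reads a RAM budget from the step and fixes it in the callable returned by step_execute. The layout of the resource attribute (a dictionary with "cpu" and "ram" keys, RAM in MB), the Step stand-in, compute_something and the tile names are all assumptions made for this example; the actual iota2 attribute may be structured differently.

```python
class Step(object):
    """Hypothetical stand-in for the iota2 base class (illustration only)."""

    def __init__(self, resource=None):
        # assumed layout: a dict with a CPU count and a RAM amount in MB
        self.resource = resource or {"cpu": 1, "ram": 128}


def compute_something(tile, ram):
    """Hypothetical processing function accepting a RAM budget."""
    return "{} processed with ram={}".format(tile, ram)


class MyStep(Step):
    def step_inputs(self):
        # example tile names, chosen arbitrarily
        return ["T31TCJ", "T31TDJ"]

    def step_execute(self):
        # read the budget once, then fix it in the callable
        ram = self.resource["ram"]
        return lambda tile: compute_something(tile, ram=ram)


step = MyStep(resource={"cpu": 2, "ram": 4096})
function = step.step_execute()
messages = [function(tile) for tile in step.step_inputs()]
print(messages)
```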
Reminder
output’s name
The functions in a step can be launched in parallel using MPI or dask with a master / worker behaviour. It is important to think about the names of the temporary files written to disk. If several workers try to write the same output, issues can appear: files could be corrupted or contain outlier values. Please make temporary file names as specific to a given processing as possible.
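One possible way to keep temporary names specific to a processing is to build them from the parameters that identify the task and to add a random suffix. The temporary_name helper, its arguments and the example values below are hypothetical; this is a sketch of the idea, not a naming convention imposed by iota2.

```python
import os
import uuid


def temporary_name(directory, prefix, tile, date, extension=".tif"):
    """Build a temporary file path specific to one processing task."""
    # the tile and date identify the task, the random suffix guards
    # against two workers processing the same (tile, date) pair
    unique = uuid.uuid4().hex
    name = "{}_{}_{}_{}{}".format(prefix, tile, date, unique, extension)
    return os.path.join(directory, name)


path_a = temporary_name("/tmp", "mask", "T31TCJ", "20200101")
path_b = temporary_name("/tmp", "mask", "T31TCJ", "20200101")
# two calls with identical parameters still give distinct paths
print(path_a != path_b)
```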
workingDirectory usage
The parameter working_directory is used to access an additional temporary directory. The purpose of this directory is to store additional data and to benefit from better I/O performance when using an HPC. The last task of a step should be to copy the data needed by the workflow to the output directories. The working_directory should not be used to communicate between steps.
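The pattern described above (write in the working directory, then copy the result to the output directory as the last task) could look like the following sketch. The write_result function and the file name are hypothetical, and temporary directories created with tempfile stand in for the real output and working directories.

```python
import os
import shutil
import tempfile


def write_result(output_directory, file_name, content, working_directory=None):
    """Write a file in working_directory if given, then copy it to the output."""
    # fall back to the output directory when no working directory is provided
    write_directory = working_directory or output_directory
    temporary_path = os.path.join(write_directory, file_name)
    with open(temporary_path, "w") as handle:
        handle.write(content)

    final_path = os.path.join(output_directory, file_name)
    if working_directory:
        # last task: copy what the next steps need to the output directory
        shutil.copy(temporary_path, final_path)
        os.remove(temporary_path)
    return final_path


output_directory = tempfile.mkdtemp()
working_directory = tempfile.mkdtemp()
result = write_result(output_directory, "report.txt", "done", working_directory)
print(os.path.exists(result))
```

Since the function always returns a path inside the output directory, later steps never need to know whether a working directory was used.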