.. _new-steps:

Adding a new step to iota2
##########################

iota2 is based on a sequence of steps. For example, if the purpose is to achieve a supervised classification,
then the steps could roughly be: "prepare data", "learn models", "classify", then "validate classifications".

Steps define iota2's granularity. Each step must have a well-defined goal. In the previous example, "prepare data"
is too abstract to define an entire step. It should be split until each step has a single coherent goal:
"prepare data" could become "prepare raster data" and "prepare input vector data".

.. _step_def:

Steps definition
****************

.. currentmodule:: iota2.Steps.Iota2Step

The purpose of a step is to achieve something. In most cases in iota2, a given step has to be repeated
many times: on a set of tiles, by dates, etc. In order to provide developers with a generic workflow,
steps inherit from :class:`Step`. The goal is to turn a function call into a homogeneous API
where some parameters are fixed and others vary over a range (tile list, dates, etc.).

Every step is a class derived from the base class ``Step.IOTA2Step.Step``.
``step_inputs`` is the method that provides the step's input parameters, while ``step_execute`` is the method
which contains the lambda definition. Every step class must contain the following:

.. code-block:: python

    class MyNewStep(IOTA2Step.Step):
        def __init__(self):
            """
            """
            pass

        def step_inputs(self):
            """
            ...
            """
            return [1, 2, 3]

        def step_execute(self):
            """
            Return
            ------
            lambda
                the function to execute as a lambda function.
            """
            from MyLib import MyFunction
            # the first argument is fixed, the second one comes from step_inputs
            step_function = lambda a: MyFunction(1, a)
            return step_function

Existing steps
==============

You can find all of iota2's steps in the Steps directory.

Define a new step and add it to the iota2 workflow
**************************************************

This section describes how to build a new iota2 step from scratch and how to enable it in iota2.

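Before writing a real step, it may help to see how ``step_inputs`` and ``step_execute`` fit together.
The sketch below is only an illustration: ``ToyStep`` and ``run_step`` are hypothetical names that do not exist
in iota2, the class does not inherit from the real base class, and the real scheduler may dispatch the calls
in parallel rather than in a simple loop.

```python
class ToyStep:
    """Hypothetical stand-in for a class derived from the iota2 base step."""

    def step_inputs(self):
        # parameters that vary from one call to the next
        return [1, 2, 3]

    def step_execute(self):
        # the base (2) is fixed, the exponent comes from step_inputs
        return lambda exponent: pow(2, exponent)


def run_step(step):
    """Hypothetical sequential runner: one function call per input parameter."""
    function = step.step_execute()
    return [function(param) for param in step.step_inputs()]


results = run_step(ToyStep())
print(results)
```

Under these assumptions, the step author only decides which parameters are fixed and which vary;
how the resulting calls are scheduled is left to the workflow.
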
Define a new step
=================

.. code-block:: python

    #!/usr/bin/env python3
    #-*- coding: utf-8 -*-

    import IOTA2Step

    def awesome_function(arg1, arg2):
        """
        """
        print(arg1)

    class MyStep(IOTA2Step.Step):
        def __init__(self, cfg, cfg_resources_file, workingDirectory=None):
            # call the base class constructor
            super(MyStep, self).__init__(cfg, cfg_resources_file)

        def step_description(self):
            """
            function used to print a short description of the step's purpose
            """
            return "This step will print something"

        def step_inputs(self):
            """
            Return
            ------
                the return can be an iterable or a callable
            """
            return range(1, 10)

        def step_execute(self, workingDirectory=None):
            """
            Return
            ------
            lambda
                the function to execute as a lambda function.
            """
            # "Tile" is fixed, x takes each value returned by step_inputs
            step_function = lambda x: awesome_function(x, "Tile")
            return step_function

        def step_outputs(self):
            """
            function called once the step is finished. This is the place to do
            some clean-up, raise exceptions...
            """
            pass

.. Note:: The base class constructor must contain three arguments :

    - cfg
        absolute path to an iota2 configuration file
    - cfg_resources_file
        absolute path to a configuration file dedicated to resources consumption. It can be set to ``None``
    - workingDirectory
        absolute path to a working directory which will contain all temporary files during the whole step.

Add it to the step's sequence
=============================

An iota2 step sequence is built using :ref:`builders `.
Depending on the aim, there are two possibilities: update an existing builder or create a new one.
To enable the step in an existing iota2 builder, append it to the step sequence list ``s_container``.

.. Note:: You can insert the new step at the beginning, at the end or between two existing steps

.. code-block:: python

    def build_steps(self, cfg, config_ressources=None):
        ...
        from Steps import (...
                           MyStep
                           )
        ...
        step_print = MyStep.MyStep(cfg, config_ressources, self.workingDirectory)
        ...
        # mosaic step
        s_container.append(step_mosaic, "mosaic")
        # append between two steps
        s_container.append(step_print, "mosaic")
        # validation steps
        s_container.append(step_confusions_cmd, "validation")
        ...

.. Note:: The ``append`` method needs two arguments: a step and the step group it belongs to.
    Available groups are stored in the ``self.steps_group`` class attribute. Groups allow iota2 to restart
    from a given group and run until another group is reached.

About resources
===============

Because iota2 is composed of steps, it is convenient to be able to set resource limits (CPU, RAM) for each step.
This feature is very useful on HPC systems, where resource consumption is a hard constraint.
Furthermore, many OTB applications have a parameter called ``ram`` which defines the pipeline size.
These parameters can therefore be retrieved through the base class attribute ``resource``.

Reminder
********

- output's name

  The functions in a step can be launched in parallel using MPI or dask with a master / worker behaviour.
  It is important to **think about the names of temporary files written to disk**.
  If several workers try to write the same output, issues can appear: files could be corrupted or
  contain outlier values. Please make each temporary file name as specific to its processing as possible.

- workingDirectory usage

  The parameter ``working_directory`` is used to access an additional temporary directory.
  The purpose of this directory is to store additional data and to benefit from better I/O performance when using an HPC.
  The last task of a step should be to copy the data needed by the rest of the workflow to the output directories.
  The ``working_directory`` should not be used to communicate between steps.
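
The two reminders above can be sketched together as follows. This is a minimal illustration, not iota2 code:
``unique_tmp_name`` and ``write_result`` are hypothetical helpers, the tile names are arbitrary, and using the
tile name plus the process id to build a unique name is only one possible convention.

```python
import os
import shutil
import tempfile


def unique_tmp_name(working_directory, tile, suffix=".txt"):
    """Hypothetical helper: reserve a temporary path that other workers cannot reuse."""
    # mkstemp creates the file atomically, so two workers never get the same path
    handle, path = tempfile.mkstemp(
        prefix=f"{tile}_{os.getpid()}_", suffix=suffix, dir=working_directory
    )
    os.close(handle)
    return path


def write_result(tile, working_directory, output_directory):
    """Hypothetical task: write in the working directory, then copy to the outputs."""
    tmp_path = unique_tmp_name(working_directory, tile)
    with open(tmp_path, "w") as tmp_file:
        tmp_file.write(f"result for {tile}\n")
    # last task: copy what the workflow needs, then clean the temporary file
    final_path = os.path.join(output_directory, f"{tile}_result.txt")
    shutil.copy(tmp_path, final_path)
    os.remove(tmp_path)
    return final_path


with tempfile.TemporaryDirectory() as work_dir, tempfile.TemporaryDirectory() as out_dir:
    for tile in ("T31TCJ", "T31TDJ"):
        write_result(tile, work_dir, out_dir)
    names = sorted(os.listdir(out_dir))
    leftovers = os.listdir(work_dir)
    print(names)
```

With this pattern, the next step reads only from the output directory, so nothing depends on the content of the
working directory once the step has finished.
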