Deep Learning in iota2

Among the list of possible classification algorithms, iota2 also offers the possibility to use deep neural networks. To date, only networks that work on pixel time series can be used (i.e. no spatial/2D convolution). This documentation summarizes the parameters available to users and their meaning through examples. It also discusses the chain outputs and the development choices that have been made.

Parameters involved

All the parameters below must be inside the deep_learning_parameters section, which is itself inside the arg_train section of the iota2 configuration file.


Once the parameter deep_learning_parameters.dl_name is provided, iota2 will try to use the deepLearning workflow.

# usual iota2 parameters

# place here deep learning algorithm parameters


  • Available neural network architectures (class name), currently: ‘LTAEClassifier’, ‘ANN’, ‘MLPClassifier’ or ‘SimpleSelfAttentionClassifier’

  • True when using neural networks

  • Set of key/value pairs used to create the neural network instance (constructor parameters)

  • Select the model which maximizes one of these metrics computed on the validation set during the training process: “loss”, “fscore”, “oa”, “kappa”

  • Number of epochs for the learning stage

  • Apply weights to samples according to the proportion of each class in the computation of the loss function

  • How many sub-processes to use for data loading. 0 means that the data will be loaded in the main process

  • Key/value pairs of hyperparameters to use to build models. Default value: {“batch_size”: [1000], “learning_rate”: [0.00001]}

  • Path to a user python module containing custom neural networks

  • If a checkpoint exists, restart learning from it

  • During the learning stage, load the full data-set into memory (‘full’) or by batch (‘stream’)

  • Flag to enable early stopping during the learning phase

  • Epoch number after which the monitoring of the metric trend starts

  • Number of epochs without improvement after which training will be stopped

  • Minimum change in the monitored quantity to qualify as an improvement. If the metric is ‘train_loss’ or ‘valid_loss’ then tol must be in dB as \(dB = \log_{10}(\frac{loss_{N-1}}{loss_{N}})\) with N the current epoch

  • Metric to monitor for early stopping

  • Percentage, in ]0;1], of samples to use from the incoming database to compute quantiles

  • Allow the use of an adaptive learning rate across epochs
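The dB tolerance used for early stopping can be made concrete with a small sketch. Only the formula comes from the description above; the function names `improvement_db` and `should_stop`, their arguments, and the exact stopping rule are illustrative assumptions, not iota2 code.

```python
import math


def improvement_db(previous_loss, current_loss):
    """Improvement between two epochs: dB = log10(loss_{N-1} / loss_N)."""
    return math.log10(previous_loss / current_loss)


def should_stop(losses, tol, patience, start_epoch):
    """Hypothetical rule: stop after `patience` consecutive epochs whose
    improvement is below `tol`, ignoring epochs before `start_epoch`."""
    stalled = 0
    for epoch in range(1, len(losses)):
        if epoch < start_epoch:
            continue
        if improvement_db(losses[epoch - 1], losses[epoch]) < tol:
            stalled += 1
            if stalled >= patience:
                return True
        else:
            stalled = 0
    return False


# one loss value per epoch (made-up numbers)
losses = [1.0, 0.5, 0.4, 0.399, 0.3985, 0.3984]
print(improvement_db(losses[1], losses[2]))  # log10(0.5 / 0.4), about 0.097
print(should_stop(losses, tol=0.01, patience=3, start_epoch=1))
```

With these made-up losses, the last three epochs each improve by far less than 0.01 dB, so a patience of 3 triggers the stop while a patience of 4 does not.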






Neural network architectures available in iota2 are defined in the python module


During the learning step, several metrics can be computed on a validation set to evaluate the model. The optimized loss metric quantifies the fit of the model on the training sample, but iota2 also computes metrics such as the OA, Kappa and F1-score on the validation sample.

For each epoch, models maximizing each of these metrics are saved. When the learning phase ends, iota2 will use for the inference the model that maximizes the metric chosen by the user.


Weights can be assigned to samples according to their class when computing the loss function during the learning step. These weights are computed using only the training + validation database and correspond to the inverse of each class’s proportion in that database. For example, if the database contains 2 classes, 1 and 2, with 80% of the samples belonging to class 1 and 20% to class 2, the weights will be 1.25 for samples from class 1 (1 / 0.8) and 5 for samples from class 2 (1 / 0.2).
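The weighting rule can be reproduced in a few lines. This is a minimal sketch of the inverse-proportion computation only; the function name `inverse_frequency_weights` is an assumption and the way iota2 passes the weights to the loss function is not shown.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: 1 / proportion of that class} for a list of sample labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / count for cls, count in counts.items()}


# 80% of the samples in class 1, 20% in class 2
labels = [1] * 80 + [2] * 20
weights = inverse_frequency_weights(labels)
print(weights)  # {1: 1.25, 2: 5.0}
```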


During the learning phase, the model is optimized iteratively using stochastic gradient descent. At each epoch, the model is optimized on successive subsets of the database (i.e., batches). The number of workers is the number of tasks that prepare the batched data in parallel. Each worker provides the data it has collected to the model and then fetches another batch, until the database has been fully read. Therefore, the more workers are available to read the batches, the faster the model is optimized. However, it is the user’s responsibility to set the number of workers according to the amount of available RAM.


Hyperparameters are parameters that influence the learning process but cannot be learned. In iota2, it is possible to test several values of 2 hyperparameters within the same run. This is done via a dictionary with 2 keys (“batch_size” and “learning_rate”), each associated with a list of values to try. One model is learned per combination, so the number of models is the product of the list lengths; the best of them is then used for inference (cf. the model_selection_criterion parameter).

For example, if the configuration file contains :

# usual iota2 parameters

hyperparameters_solver : {"batch_size" : [1000],
                    "learning_rate" : [0.1, 0.00001]}

Then two models will be trained (in parallel if possible): one with a batch size of 1000 and a learning rate of 0.1, and another with the same batch size but a learning rate of 0.00001.
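The expansion of the dictionary into one configuration per model can be sketched with a Cartesian product. The helper `expand_grid` and the order in which combinations are numbered are assumptions made for illustration; iota2 may enumerate them differently.

```python
from itertools import product

hyperparameters_solver = {"batch_size": [1000], "learning_rate": [0.1, 0.00001]}


def expand_grid(solver):
    """Return one {name: value} dictionary per combination of hyperparameters."""
    keys = sorted(solver)
    return [dict(zip(keys, values)) for values in product(*(solver[k] for k in keys))]


combinations = expand_grid(hyperparameters_solver)
for index, combination in enumerate(combinations):
    print(index, combination)
```

Here 1 batch size x 2 learning rates gives 2 combinations, hence 2 models.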


Users can define their own neural network via the dl_module parameter, which should point to a user-provided python module. However, the neural network must be defined as a class derived from Iota2NeuralNetwork.

Currently, iota2 can only perform pixel-wise operations, since the input to the model are the spectro-temporal features for pixels. Convolutional layers can be used in the spectral and temporal dimensions, but not in the spatial dimension.


The learning phase can be quite long. If the learning stops for some reason, everything that has been learned would normally be lost. However, iota2 can restart the learning step from the last learned epoch, because a backup of the model state is made at each epoch.


The adaptive learning rate relies on ReduceLROnPlateau from PyTorch, which reduces the learning rate when a metric has stopped improving. The metric monitored in iota2 is the validation loss.

The adaptive_lr parameter receives a dictionary whose keys are ReduceLROnPlateau parameter names and whose values are the corresponding parameter values.

For example, the configuration file can contain:

# usual iota2 parameters

adaptive_lr : {patience: 10,
                 factor: 0.1,
                 threshold: 1e-4,
                 threshold_mode: 'rel',
                 cooldown: 0,
                 min_lr: 0,
                 eps: 1e-8,
                 verbose: False}  # these are the default values of every key

The mode and optimizer parameters cannot be set by users: mode is forced to min and the optimizer is the one used during the training step. By default, adaptive_lr is {} and therefore no adaptive learning rate is used.
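The rules above (empty dictionary disables the scheduler, mode forced to min, optimizer supplied by the training step) can be summarized in a short sketch. The helper `scheduler_kwargs` is hypothetical, and raising an error on reserved keys is an assumption about how such input could be handled, not a description of iota2's behaviour.

```python
def scheduler_kwargs(adaptive_lr):
    """Return keyword arguments for ReduceLROnPlateau, or None when disabled."""
    if not adaptive_lr:
        return None  # default {}: no adaptive learning rate
    reserved = {"mode", "optimizer"} & set(adaptive_lr)
    if reserved:
        # assumption: reserved keys are rejected rather than silently ignored
        raise ValueError(f"not settable by users: {sorted(reserved)}")
    return {**adaptive_lr, "mode": "min"}


kwargs = scheduler_kwargs({"patience": 10, "factor": 0.1})
print(kwargs)  # {'patience': 10, 'factor': 0.1, 'mode': 'min'}
# With PyTorch available, the scheduler would then be built as
# torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, **kwargs)
```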

Expected output descriptions


One model per hyperparameter pair

Each pair of hyperparameters produces a model file in the model output directory. For example, for the first possible pair of hyperparameters, the file model_1_seed_0_hyp_0.txt will be produced. The name of the file is defined as:

  • model_1 : the model for the region 1

  • seed_0 : for the first random split

  • hyp_0 : for the first hyperparameter pair
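The naming convention can be expressed as a pair of helpers that build and parse such file names. Both functions are illustrative and not part of iota2; the pattern assumes that the region name contains no underscore.

```python
import re

MODEL_NAME = re.compile(r"^model_(?P<region>[^_]+)_seed_(?P<seed>\d+)_hyp_(?P<hyp>\d+)\.txt$")


def model_file_name(region, seed, hyp):
    """Build the file name of the model learned for one hyperparameter pair."""
    return f"model_{region}_seed_{seed}_hyp_{hyp}.txt"


def parse_model_file_name(name):
    """Return (region, seed, hyp) extracted from a model file name."""
    match = MODEL_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected model file name: {name}")
    return match["region"], int(match["seed"]), int(match["hyp"])


print(model_file_name("1", 0, 0))                         # model_1_seed_0_hyp_0.txt
print(parse_model_file_name("model_1_seed_0_hyp_0.txt"))  # ('1', 0, 0)
```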

Selected model

Once the models for all pairs of hyperparameters are learned, the one that provides the best result according to the selection criterion (cf. model_selection_criterion) is selected. The result is stored on disk as a serialized object using pickle: model_1_seed_0.txt. It will be used later in the inference phase.


After each epoch, all the information needed to restart from the current epoch is stored in a serialized object in the model directory in a file. For example, for the model model_1_seed_0_hyp_0.txt the equivalent checkpoint file would be model_1_seed_0_hyp_0_checkpoint.txt.


To visualize the evolution of the model loss function, 2 figures are generated for each model, next to the models. For example for the model model_1_seed_0_hyp_0.txt

  • model_1_seed_0_hyp_0_loss.png : shows the evolution of the learning and validation loss (dB) over the epochs

  • model_1_seed_0_hyp_0_confusion_metrics.png : shows the evolution of the Kappa, OA, Precision and the Recall over the epochs.


The classification maps are stored in conventional tif format in the classif output directory. First, chunks of tiles are classified, then all these pieces are merged to form one tile.

iota2 internal choices


We have decided to use pytorch to implement deep neural networks in iota2.

Classification by chunks

As mentioned above, deep learning classifications are done in chunks, to fit RAM constraints. In this workflow, iota2 works with numpy arrays that need to be stored temporarily in RAM, but few machines have enough RAM to hold a whole Sentinel-2 tile with many acquisition dates at the same time. This is why we work in chunks to make the predictions and then merge these predictions.

The size of the chunks can be set via the number_of_chunks parameter in the python_data_managing parameter block.
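The predict-then-merge principle can be sketched as follows. The row-wise splitting and the helper names `chunk_bounds` and `classify_by_chunks` are assumptions made for illustration; the way iota2 actually cuts a tile is not described here.

```python
def chunk_bounds(nb_rows, number_of_chunks):
    """Split range(nb_rows) into contiguous (start, stop) row intervals."""
    base, extra = divmod(nb_rows, number_of_chunks)
    bounds, start = [], 0
    for index in range(number_of_chunks):
        stop = start + base + (1 if index < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def classify_by_chunks(rows, number_of_chunks, predict):
    """Apply predict() chunk by chunk, then merge the partial results."""
    merged = []
    for start, stop in chunk_bounds(len(rows), number_of_chunks):
        merged.extend(predict(rows[start:stop]))  # only one chunk in memory
    return merged


rows = list(range(10))                                  # stand-in for pixel rows
labels = classify_by_chunks(rows, 3, lambda chunk: [value % 2 for value in chunk])
print(chunk_bounds(10, 3))  # [(0, 4), (4, 7), (7, 10)]
print(labels)
```

The merged result is identical to a single prediction over all rows; only the peak memory use changes.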

The shape of the tensor of data

Every model will be fed with a tensor of data shaped as (batch_size, nb_dates, nb_bands)

Learning vs validation vs test

In a conventional deep learning approach, the initial database is split into 3 distinct data-sets:

  • Learning: which is used to train the model (let’s call it L)

  • Validation : which allows, during the learning process, to observe the behavior of the model (convergence, over-fitting etc.), let’s call this V. These observations may, for example, allow a readjustment of some hyperparameters

  • Test : allows the performance of the model to be validated on a larger database than the validation database, let’s call this T.

How are the samples distributed in these three databases?

Initially the parameter ratio allows us to build a database that will contain L + V on one side and T on the other side. Then at the time of training, 80% of the L + V is used to build the model (L) and 20% to build the validation database (V).

For example, if the configuration file contains :

ratio : 0.7

Then 30% of the database will go into the test database and 70% will be set aside to build the learning and validation databases. Then 80% of the 70% will be used to build the training database and 20% of the 70% for validation. In iota2, these splits are made ‘polygon wise’ i.e. a polygon is placed in one of the databases in its entirety and cannot be found in another database (unless the class is represented by a single polygon).
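The arithmetic of this example can be checked with a short sketch. The function `split_shares` is illustrative: it only computes proportions and does not model the polygon-wise assignment.

```python
def split_shares(ratio, learning_share=0.8):
    """Return the (learning, validation, test) shares of the whole database."""
    test = 1.0 - ratio                  # part set aside for the test database
    learning = ratio * learning_share   # 80% of L + V
    validation = ratio - learning       # remaining 20% of L + V
    return learning, validation, test


learning, validation, test = split_shares(0.7)
print(f"L={learning:.0%} V={validation:.0%} T={test:.0%}")  # L=56% V=14% T=30%
```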

Inheritance of iota2 neural network

As already explained for the parameter dl_module, all classes implementing neural networks in iota2 must derive from the class Iota2NeuralNetwork defined in the module This class allows insertion into the iota2 workflow.

Database format

The format of the input database used in the learning phase is the NETCDF format. This database is stored in the learningSamples directory under the name (for model 1 representing region 1) which contains both the learning and validation database. However, the user does not need to do anything special, since this format is generated by iota2 from the user-provided reference data which is the same as for other classifiers.


GPUs are automatically detected by PyTorch. When a GPU is detected, learning and inference will use it.

How to use GPUs?

As mentioned before, if a GPU is detected, computations and data will be transferred to it. However, with scheduler_type set to PBS, each task is spawned on a dedicated node, which means that iota2 must allocate a GPU before sending tasks to it. This allocation can be done by specifying a dedicated queue and nb_gpu in the step resources block, as follows:

training : {
              nb_gpu:1 # number of GPUs to use
              queue:"qgpgpu" # queue containing GPUs
              }

Dataloader per batch vs full memory

Loading the full learning/validation data-set into RAM may significantly decrease the learning time. However, this is not always possible depending on the amount of RAM available on the processing unit. In this case the stream mode must be used. In this mode the database will be read in batch-sized chunks.

Managing randomness during the learning step

Random mixing of data in batches between epochs is crucial to obtain an optimal stochastic gradient descent. In iota2, randomness is managed at several levels:

  • When splitting samples in the Learning, Validation and Test databases. These distributions are made ‘polygon wise’.

  • When allocating the content of the data batches to feed the model.

  • The order of the batches.

  • Moreover, at each epoch, the content of each batch is again randomly distributed.
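The per-epoch reshuffling of batch content can be sketched with the standard library. This illustrates the principle only; the function `epoch_batches` is an assumption and iota2 relies on its own data loaders.

```python
import random


def epoch_batches(nb_samples, batch_size, rng):
    """Shuffle the sample indices, then cut them into batches for one epoch."""
    indices = list(range(nb_samples))
    rng.shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, nb_samples, batch_size)]


rng = random.Random(0)  # fixed seed so that a run can be reproduced
epochs = [epoch_batches(10, 4, rng) for _ in range(2)]
for epoch, batches in enumerate(epochs):
    print(epoch, batches)
```

Every sample is seen exactly once per epoch, but both the content and the order of the batches are redrawn at each epoch.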

Cost function and gradient optimizer

Currently, users can’t choose them.

Using statistics to alter incoming data

All neural network instances have a _stats attribute which provides statistics for each sensor encountered. For instance, considering the Sentinel-2 data provided by THEIA:

self._stats = {'sentinel2': {'min': tensor([-0.0100, ..., 0.0000]),
                             'max': tensor([-0.0100, ..., 0.0000]),
                             'mean': tensor([-0.01, ..., 0.]),
                             'var': tensor([-0.01, ..., 0.]),
                             'quantile_0.1': tensor([-0.01, ..., 0.]),
                             'quantile_0.5': tensor([-0.01, ..., 0.]),
                             'quantile_0.95': tensor([-0.01, ..., 0.])}}

Statistics are shaped as (nb_components * nb_dates) and chronologically sorted. For instance, if we consider 2 Sentinel-2 acquisitions d1 and d2 and all bands available in iota2 (b2, b3, b4, b5, b6, b7, b8, b8A, b11 and b12), then one stat vector can be

self._stats = {'sentinel2': {'min': tensor([d1_b2, ..., d1_b12, d2_b2, ..., d2_b12]),...}}
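To locate one value in such a vector, the position can be computed from the date and band, assuming the date-major ordering of the example (all bands of d1, then all bands of d2). The helper `stat_index` is illustrative and covers only the 10 spectral bands listed above.

```python
BANDS = ["b2", "b3", "b4", "b5", "b6", "b7", "b8", "b8A", "b11", "b12"]


def stat_index(date_index, band, bands=BANDS):
    """Position of (date, band) in a flat, chronologically sorted stat vector."""
    return date_index * len(bands) + bands.index(band)


print(stat_index(0, "b2"))   # 0, first band of the first date (d1_b2)
print(stat_index(1, "b2"))   # 10, first band of the second date (d2_b2)
print(stat_index(1, "b12"))  # 19, last element of the vector (d2_b12)
```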

Available sensors in self._stats are sentinel1_desvv, sentinel1_desvh, sentinel1_ascvv, sentinel1_ascvh, sentinel2, sentinel2s2c, sentinel2l3a, landsat8, landsat8old and landsat5old. Keys for stats are the ones already presented : min, max, mean, var, quantile_0.1, quantile_0.5 (median) and quantile_0.95. The statistics are automatically computed except for the quantiles which are only computed if the parameter additional_statistics_percentage is set to a value different from None. It is possible to use these statistics to scale data in the forward method. Iota2 provides the method self.standardize(x, mean, std, self.default_mean, self.default_std) where x is the input data and where mean and std are the empirical mean and std values for each feature.

In some cases, data may contain NaN values. These specific values can cause the neural network to crash. Iota2 offers the possibility to impute such values using self.nans_imputation(x, mean, self.default_mean). The method replaces NaNs (in x) with consistent values (i.e. the empirical mean value of the corresponding feature). Please note that the shape of x is (batch_size, nb_dates, nb_components) and the shape of mean is (nb_dates * nb_components).

In some conditions, the empirical statistics can also contain NaN values. That’s why self.nans_imputation() and self.standardize() accept default values to replace NaNs in the statistics vector.

The following code block shows how to perform imputations and standardization

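import torch
import torch.nn.functional as F

def _forward(self, x):
    # replace NaNs in x with the empirical mean of the matching feature
    mean = self._stats["sentinel2"]["mean"]
    x = self.nans_imputation(x, mean, self.default_mean)
    # standardize with the empirical mean and standard deviation
    std = torch.sqrt(self._stats["sentinel2"]["var"])
    x = self.standardize(x, mean, std, self.default_mean, self.default_std)

    # network layers
    x = F.relu(x)
    x = F.relu(self.bnhidden2(self.hidden1(x)))
    x = F.relu(self.bnout(self.hidden2(x)))
    x = self.output(x)
    return x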
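To make the semantics of imputation and standardization concrete, here is a plain-Python stand-in working on nested lists instead of tensors. The functions `impute_nans` and `standardize_sketch` are illustrative re-implementations written under the shape conventions stated above, not iota2's actual methods.

```python
import math


def _fill(stats, default):
    """Replace NaNs in a flat statistics vector with a default value."""
    return [default if math.isnan(value) else value for value in stats]


def impute_nans(x, mean, default_mean=0.0):
    """Replace NaNs in x (batch, dates, components) with the matching mean."""
    mean = _fill(mean, default_mean)
    nb_comp = len(x[0][0])
    return [[[mean[d * nb_comp + c] if math.isnan(v) else v
              for c, v in enumerate(row)]
             for d, row in enumerate(sample)]
            for sample in x]


def standardize_sketch(x, mean, std, default_mean=0.0, default_std=1.0):
    """Compute (x - mean) / std feature by feature."""
    mean, std = _fill(mean, default_mean), _fill(std, default_std)
    nb_comp = len(x[0][0])
    return [[[(v - mean[d * nb_comp + c]) / std[d * nb_comp + c]
              for c, v in enumerate(row)]
             for d, row in enumerate(sample)]
            for sample in x]


nan = float("nan")
x = [[[1.0, nan], [3.0, 4.0]]]   # batch of 1 sample, 2 dates, 2 components
mean = [2.0, 5.0, nan, 4.0]      # flat vector: nb_dates * nb_components
std = [1.0, 2.0, 2.0, nan]
imputed = impute_nans(x, mean)
scaled = standardize_sketch(imputed, mean, std)
print(imputed)  # [[[1.0, 5.0], [3.0, 4.0]]]
print(scaled)   # [[[-1.0, 0.0], [1.5, 0.0]]]
```

The NaN in the data is replaced by the mean of its feature (5.0), while the NaNs in the statistics fall back to the defaults (0 for the mean, 1 for the standard deviation).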

About neural network definition

The neural model and iota2 interact at two points: when the model is instantiated and when iota2 feeds the model with data. These two interactions involve calling the __init__() method and the forward method of the neural network object respectively. In order for these interactions to proceed correctly, a formalism must be defined. This section details the expected signatures for these 2 methods.

Constructor parameters

Iota2 will automatically pass a series of parameters to the networks when they are instantiated, so they must exist in the constructor. At a minimum, all neural networks must have 5 parameters in their initialisation method: nb_class, sensors_information, doy_sensors_dic, default_std and default_mean, as follows:

from typing import Optional

class MyNeuralNetwork(Iota2NeuralNetwork):
    """MyNeuralNetwork class definition."""

    def __init__(
        self,
        nb_class: int,
        sensors_information: dict,
        doy_sensors_dic: Optional[dict] = None,
        default_std: int = 1,
        default_mean: int = 0,
    ):
        ...  # parent constructor call and layer definitions go here


Every neural network must inherit from the Iota2NeuralNetwork base class.

nb_class

Is an integer representing the number of classes.

sensors_information

Is a dictionary where each key is a sensor’s name and the value is a dictionary with two keys, nb_components and nb_dates, which let users know how many features exist per date (total number of sensor features = nb_components x nb_dates).

doy_sensors_dic

Is an OrderedDict where each key is a sensor’s name and the value is a dictionary with two keys, doy and features_per_dates. features_per_dates is currently redundant with the nb_components key of the sensors_information dictionary.

default_std and default_mean

Are used as substitution values in the standardization computation x = (features - mean) / std when mean or std are NaNs.

Forward definition

It is via the forward method that iota2 will provide the network with data. The choice has been made to separate the data by sensor, so the forward method will have to contain a dedicated parameter for each activated sensor. A possible definition of the forward method could be as follows, if all the sensors in iota2 are used.

class MyNeuralNetwork(Iota2NeuralNetwork):
    """MyNeuralNetwork class definition."""

    def forward(

It is also possible to feed the forward method with exogenous data already written to disk (userfeatures) or with external features calculated in a python module. In the case of userfeatures, a new parameter per userfeature must be added to the forward method. External features are more versatile, allowing you to add features to an existing sensor, or to create new temporal or non-temporal features. These different possibilities are illustrated in the next section using examples.

For the examples, we will use Sentinel-2 data with 13 features (10 spectral bands + NDVI + NDWI + Brightness) and 3 dates.


Let’s start with a classic example: using Sentinel-2 data alone, with 13 features and 3 dates.

The forward can simply be written as

class MyNeuralNetwork(Iota2NeuralNetwork):
    """MyNeuralNetwork class definition."""

    def forward(self, sentinel2):
        ...

In this case, the sentinel2 tensor will have the shape (B, 3, 13) where B is the size of the data batch.

Combining Sentinel-2 and two user features

Here, we want to use Sentinel-2 and exogenous data already written to disk, for example a DEM and temperature data. We will therefore use the user_feat_path field and the userFeat section of the configuration file as follows

s2_path : '/path/to/my_tiled/s2_data'
user_feat_path : '/path/to/my_tiled/exogenous_data'

The dem is a raster file written to disk and contains 3 bands, the temperature data is also a raster on disk but with a single band. The forward should then have the following definition:

def forward(self, sentinel2, dem, temperature):
    ...


The names of the dem and temperature parameters of the forward method must correspond to the patterns field in the userFeat section.

In this case, the sentinel2 tensor will still have the shape (B, 3, 13), the dem tensor will have the shape (B, 3) and the temperature tensor the shape (B, 1).


Tensors can have a different number of dimensions: for these user features, the time dimension disappears.

Using external features

External features can be used to create new features. These can be concatenated with an existing sensor (if they are temporal), or form a new temporal or non-temporal input.

If, in addition to Sentinel-2 data (over 3 dates), the user provides 3 python functions to create new features as follows

from iota2.learning.utils import I2Label, I2TemporalLabel

def new_s2_features(self):
    coef = self.get_interpolated_Sentinel2_B3() ** 2
    labels = [I2TemporalLabel(sensor_name="sentinel2", feat_name="pow", date=date)
              for date in self.interpolated_dates["Sentinel2"]]
    return coef, labels

def new_temporal_features(self):
    coef = self.get_interpolated_Sentinel2_B2() + self.get_interpolated_Sentinel2_B3()
    labels = [I2TemporalLabel(sensor_name="myfeatures", feat_name="add", date=date)
              for date in self.interpolated_dates["Sentinel2"]]
    return coef, labels

def new_feature(self):
    coef = self.get_interpolated_Sentinel2_B2()[:, :, 0:2]  # get the first two dates of Sentinel-2 B2
    labels = [I2Label(sensor_name="newfeature", feat_name="b2date1"),
              I2Label(sensor_name="newfeature", feat_name="b2date2")]
    return coef, labels

then the forward method must have the signature

def forward(self, sentinel2, myfeatures, newfeature):
    ...

The sentinel2 tensor will have the shape (B, 3, 14), myfeatures the shape (B, 3, 1) and newfeature the shape (B, 2).


The sentinel2 tensor gets an extra feature because the function new_s2_features adds the feature pow to the sensor sentinel2 for every Sentinel-2 date.