Deep Learning in iota2

Among the list of possible classification algorithms, iota2 also offers the possibility to use deep neural networks. To date, only networks that work on pixel time series can be used (i.e. no spatial/2D convolution). This documentation summarizes the parameters available to users and their meaning through examples. It also discusses the chain outputs and the development choices that have been made.

Parameters involved

All the parameters below must be inside the deep_learning_parameters section, which is itself inside the arg_train section of the iota2 configuration file.

Note

Once the parameter deep_learning_parameters.dl_name is provided, iota2 will try to use the deepLearning workflow.

chain:{
    # usual iota2 parameters
}

arg_train:{
    deep_learning_parameters:{
        ...
        # place here deep learning algorithm parameters
        ...
    }
}

dl_name (str, mandatory when using neural networks)
  Default: none.
  Name of the neural network architecture (class name). Currently available: 'LTAEClassifier', 'ANN', 'MLPClassifier' or 'SimpleSelfAttentionClassifier'.

dl_parameters (dict, optional)
  Default: {}.
  Set of key/value pairs used to create the neural network instance (constructor parameters).

model_selection_criterion (str, optional)
  Default: "loss".
  Select the model which maximizes one of these metrics computed on the validation set during the training process: "loss", "fscore", "oa", "kappa".

epochs (int, optional)
  Default: 100.
  Number of epochs for the learning stage.

weighted_labels (bool, optional)
  Default: False.
  Apply weights to samples according to the proportion of each class in the computation of the loss function.

num_workers (int, optional)
  Default: 1.
  Number of sub-processes to use for data loading. 0 means that the data will be loaded in the main process.

hyperparameters_solver (dict, optional)
  Default: {"batch_size": [1000], "learning_rate": [0.00001]}.
  Key/value pairs of hyperparameters used to build models.

dl_module (str, optional)
  Default: None.
  Path to a user-provided Python module containing custom neural networks.

restart_from_checkpoint (bool, optional)
  Default: True.
  If a checkpoint exists, restart the learning from it.

dataloader_mode (str, optional)
  Default: 'stream'.
  During the learning stage, load the full data-set into memory ('full') or by batch ('stream').

enable_early_stop (bool, optional)
  Default: False.
  Flag to enable early stopping during the learning phase.

epoch_to_trigger (int, optional)
  Default: 5.
  Epoch number after which the monitoring of the metric trend starts.

early_stop_patience (int, optional)
  Default: 10.
  Number of epochs without improvement after which training will be stopped.

early_stop_tol (float, optional)
  Default: 0.01.
  Minimum change in the monitored quantity to qualify as an improvement. If the metric is 'train_loss' or 'valid_loss', the tolerance must be expressed in dB as \(dB = \log_{10}(\frac{loss_{N-1}}{loss_{N}})\) with N the current epoch.

early_stop_metric (str, optional)
  Default: "val_loss".
  Metric to monitor for early stopping.

additional_statistics_percentage (float, optional)
  Default: None.
  Percentage ]0;1] of samples to use from the incoming database to compute quantiles.

adaptive_lr (dict, optional)
  Default: {}.
  Allow the use of an adaptive learning rate across epochs.

Note

dl_name

Neural network architectures available in iota2 are defined in the Python module torch_nn_bank.py

model_selection_criterion

During the learning step, several metrics can be computed on a validation set to evaluate the model. The optimized loss metric quantifies the fit of the model on the training sample, but iota2 also computes metrics such as the OA, Kappa and F1-score on the validation sample.

For each epoch, the models maximizing each of these metrics are saved. When the learning phase ends, iota2 uses, for inference, the model that maximizes the metric chosen by the user.
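The sketch below illustrates this idea only; it is not iota2's internal code and all function and variable names are invented. One snapshot of the model is kept per metric, and the snapshot matching model_selection_criterion is restored for inference.

import copy

def track_best_states(model, epoch_metrics, best):
    """Keep a snapshot of the model for every metric it currently maximizes."""
    for name, value in epoch_metrics.items():
        if name not in best or value > best[name]["value"]:
            best[name] = {"value": value,
                          "state": copy.deepcopy(model.state_dict())}
    return best

# Hypothetical usage inside the training loop:
# best = {}
# for epoch in range(epochs):
#     train_one_epoch(model)
#     metrics = {"loss": -valid_loss,  # negated so that "maximizing" applies to every metric
#                "fscore": fscore, "oa": oa, "kappa": kappa}
#     best = track_best_states(model, metrics, best)
# model.load_state_dict(best[model_selection_criterion]["state"])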

weighted_labels

Weights can be assigned to samples w.r.t. their class membership when computing the loss function during the learning step. These weights are computed using only the training + validation database and correspond to the inverse of each class proportion in the database. For example, if the database contains two classes, 1 and 2, with 80% of the samples belonging to class 1 and 20% to class 2, the weights will be 1.25 for samples from class 1 (1 / 0.8) and 5 for samples from class 2 (1 / 0.2).
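As a minimal sketch of this rule (illustrative only, not iota2's implementation), the weights of the example above can be reproduced with NumPy and handed to a PyTorch loss:

import numpy as np
import torch

labels = np.array([1] * 80 + [2] * 20)      # 80% class 1, 20% class 2
classes, counts = np.unique(labels, return_counts=True)
weights = counts.sum() / counts             # inverse of class proportions -> [1.25, 5.]
print(dict(zip(classes.tolist(), weights.tolist())))  # {1: 1.25, 2: 5.0}

# per-class weights can be passed to the loss function
criterion = torch.nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float32))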

num_workers

During the learning phase, the model is optimized iteratively using stochastic gradient descent. For each epoch, the model is optimized with subsets of the database (i.e., batches). The number of workers corresponds to the number of tasks that prepare the batched data in parallel. Each worker provides the data it has collected to the model and then fetches another batch of data until the database has been fully read. Therefore, the more workers are available to read the batches, the faster the model will be optimized. However, it is the user's responsibility to set the number of workers according to the amount of available RAM.
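The following sketch shows how num_workers maps onto a standard PyTorch DataLoader (the dataset shape and sizes are arbitrary; iota2 builds its own datasets internally):

import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 000 pixel time series: 12 dates x 10 bands, with a label per pixel
dataset = TensorDataset(torch.randn(10_000, 12, 10),
                        torch.randint(0, 5, (10_000,)))
# 4 sub-processes prepare batches in parallel; 0 would load data in the main process
loader = DataLoader(dataset, batch_size=1000, shuffle=True, num_workers=4)

for batch_x, batch_y in loader:
    pass  # one optimization step per batch would happen here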

hyperparameters_solver

Hyperparameters are parameters that influence the learning process but cannot be learned. In iota2, it is possible to test various values for two hyperparameters within the same run. This is done via a dictionary which contains two keys ("batch_size" and "learning_rate") whose values are lists. The Cartesian product of the lists gives the number of models to be learned; the best of them will then be used for inference (cf. the model_selection_criterion parameter).

For example, if the configuration file contains:

chain:{
    # usual iota2 parameters
}

arg_train:{
    deep_learning_parameters:{
        ...
        hyperparameters_solver : {"batch_size" : [1000],
                                  "learning_rate" : [0.1, 0.00001]}
        ...
    }
}

Then two models will be trained (in parallel if possible): one with a batch size of 1000 and a learning rate of 0.1, and another one with the same batch size but a learning rate of 0.00001.
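The number of trained models is simply the size of the Cartesian product of the two lists, as this small illustrative snippet shows:

from itertools import product

hyperparameters_solver = {"batch_size": [1000],
                          "learning_rate": [0.1, 0.00001]}

combinations = list(product(hyperparameters_solver["batch_size"],
                            hyperparameters_solver["learning_rate"]))
print(combinations)       # [(1000, 0.1), (1000, 1e-05)]
print(len(combinations))  # 2 models, the best one being kept for inference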

dl_module

Users can define their own neural network via this parameter, which should point to a user-provided Python module. However, the neural network must be defined as a class derived from Iota2NeuralNetwork, available in the module torch_nn_bank.py.

Currently, iota2 can only perform pixel-wise operations, since the input to the model is the spectro-temporal features of pixels. Convolutional layers can be used in the spectral and temporal dimensions, but not in the spatial dimension.
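The sketch below illustrates what such a user module could look like. Only the inheritance from Iota2NeuralNetwork and the pixel-wise input shape come from this documentation; the import path, constructor arguments and layer sizes are assumptions to be adapted to the actual base-class interface.

import torch
import torch.nn.functional as F

# the import path of torch_nn_bank depends on your installation; this is an assumption
from torch_nn_bank import Iota2NeuralNetwork


class MyPixelClassifier(Iota2NeuralNetwork):
    """Hypothetical custom network working on pixel time series."""

    def __init__(self, nb_dates, nb_bands, nb_class, hidden=64, **kwargs):
        super().__init__(**kwargs)  # assumes the base class accepts keyword arguments
        self.hidden1 = torch.nn.Linear(nb_dates * nb_bands, hidden)
        self.output = torch.nn.Linear(hidden, nb_class)

    def _forward(self, x):
        # x is shaped (batch_size, nb_dates, nb_bands); flatten dates and bands
        x = x.flatten(start_dim=1)
        x = F.relu(self.hidden1(x))
        return self.output(x)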

restart_from_checkpoint

The learning phase can be quite long. If for some reason the learning stops, everything that has been learned is lost. However, iota2 integrates the possibility to restart the learning step from the last completed epoch, a backup of the model state being made at each epoch.
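A generic PyTorch sketch of the mechanism (the actual checkpoint content and file names used by iota2 are described in the output section below):

import os
import torch

def save_checkpoint(path, model, optimizer, epoch):
    """Serialize everything needed to resume training after this epoch."""
    torch.save({"epoch": epoch,
                "model_state": model.state_dict(),
                "optimizer_state": optimizer.state_dict()}, path)

def resume_if_possible(path, model, optimizer):
    """Return the epoch to start from, restoring states if a checkpoint exists."""
    if not os.path.exists(path):
        return 0
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    return checkpoint["epoch"] + 1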

adaptive_lr

Adaptive learning rate allows the use of ReduceLROnPlateau from PyTorch, which reduces the learning rate when a metric has stopped improving. The metric monitored in iota2 is the validation loss.

The adaptive_lr parameter receives a dictionary whose keys are ReduceLROnPlateau parameter names and whose values are the corresponding parameter values.

For example, the configuration file can contain:

chain:{
    # usual iota2 parameters
}

arg_train:{
    deep_learning_parameters:{
        ...
        adaptive_lr : {patience: 10,
                       factor: 0.1,
                       threshold: 1e-4,
                       threshold_mode: 'rel',
                       cooldown: 0,
                       min_lr: 0,
                       eps: 1e-8,
                       verbose: False} # these are the default values for every key
        ...
    }
}

The parameters mode and optimizer cannot be set by users: mode is forced to min and the optimizer is the one used during the training step. By default, adaptive_lr is {} and therefore no adaptive learning rate is used.
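For reference, this is roughly what these keys map to in PyTorch (illustrative only; the model and the Adam optimizer below are placeholders, since iota2 reuses its own training optimizer):

import torch

model = torch.nn.Linear(10, 5)                               # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)    # placeholder optimizer
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=10,
    threshold=1e-4, threshold_mode="rel", cooldown=0, min_lr=0, eps=1e-8)

# inside the training loop, after computing the validation loss:
# scheduler.step(valid_loss)  # the learning rate is reduced when valid_loss plateaus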

Expected output descriptions

Models

One model per hyperparameter pair

Each pair of hyperparameters produces a model file in the model output directory. For example, for the first possible pair of hyperparameters, the file model_1_seed_0_hyp_0.txt will be produced. The name of the file is defined as:

  • model_1 : the model for the region 1

  • seed_0 : for the first random split

  • hyp_0 : for the first hyperparameter pair

Selected model

Once all the models per pair of hyperparameters are learned, the one that provides the best result according to the selection criterion (cf. model_selection_criterion) is selected. It is stored on disk as a pickle-serialized object: model_1_seed_0.txt. It will be used later in the inference phase.

Checkpoints

After each epoch, all the information needed to restart from the current epoch is stored as a serialized object in a file in the model directory. For example, for the model model_1_seed_0_hyp_0.txt, the corresponding checkpoint file would be model_1_seed_0_hyp_0_checkpoint.txt.

Plots

To visualize the evolution of the model loss function, two figures are generated next to each model. For example, for the model model_1_seed_0_hyp_0.txt:

  • model_1_seed_0_hyp_0_loss.png : shows the evolution of the learning and validation loss (dB) over the epochs

  • model_1_seed_0_hyp_0_confusion_metrics.png : shows the evolution of the Kappa, OA, Precision and the Recall over the epochs.

Classifications

The classification maps are stored in conventional TIFF format in the classif output directory. First, chunks of tiles are classified, then all these pieces are merged to form one tile.

iota2 internal choices

Pytorch

We have decided to use pytorch to implement deep neural networks in iota2.

Classification by chunks

As mentioned above, deep learning classifications are done in chunks, to fit RAM constraints. In this workflow, iota2 works with numpy arrays that need to be stored temporarily in RAM, but few machines have enough RAM to hold a whole Sentinel-2 tile with many acquisition dates at the same time. This is why we work in chunks to make the predictions and then merge these predictions.

The size of the chunks can be set via the number_of_chunks parameter in the python_data_managing parameter block.
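The split/merge idea can be pictured with the following sketch (purely illustrative; the helper names and array layout are not iota2 internals):

import numpy as np

def predict_by_chunks(tile, predict_fn, number_of_chunks=10):
    """Predict a tile piece by piece so that only one chunk is in RAM at a time."""
    # tile: (rows, cols, nb_dates * nb_bands) array of pixel features
    chunks = np.array_split(tile, number_of_chunks, axis=0)
    predictions = [predict_fn(chunk) for chunk in chunks]
    return np.concatenate(predictions, axis=0)   # merged back into a full tile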

The shape of the tensor of data

Every model will be fed with a tensor of data shaped as (batch_size, nb_dates, nb_bands)
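For instance, a batch of 1000 pixels with 12 acquisition dates and 10 spectral bands (arbitrary numbers chosen for illustration) would be:

import torch

batch = torch.randn(1000, 12, 10)   # (batch_size, nb_dates, nb_bands)
print(batch.shape)                  # torch.Size([1000, 12, 10])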

Learning vs validation vs test

In a conventional deep learning approach, the initial database is split into 3 distinct data-sets:

  • Learning: which is used to train the model (let’s call it L)

  • Validation: allows, during the learning process, observation of the behavior of the model (convergence, over-fitting, etc.); let's call it V. These observations may, for example, allow a readjustment of some hyperparameters

  • Test: allows the performance of the model to be evaluated on a larger database than the validation database; let's call it T.

How are the samples distributed in these three databases?

Initially, the parameter ratio allows us to build a database that will contain L + V on one side and T on the other side. Then, at training time, 80% of L + V is used to build the model (L) and 20% to build the validation database (V).

For example, if the configuration file contains:

chain:{
    ...
    ratio : 0.7
    ...
}

Then 30% of the database will go into the test database and 70% will be set aside to build the learning and validation databases. Of this 70%, 80% will be used to build the training database and 20% for validation. In iota2, these splits are made 'polygon wise', i.e. a polygon is placed in one of the databases in its entirety and cannot be found in another database (unless the class is represented by a single polygon).
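The resulting shares can be checked with this small worked example (the percentages apply to polygons, not to individual pixels):

ratio = 0.7
test_share = 1 - ratio          # 30% of the polygons -> test (T)
learning_share = 0.8 * ratio    # 56% of the polygons -> learning (L)
validation_share = 0.2 * ratio  # 14% of the polygons -> validation (V)
print(f"{test_share:.2f} {learning_share:.2f} {validation_share:.2f}")  # 0.30 0.56 0.14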

Inheritance of iota2 neural network

As already explained for the parameter dl_module, all classes implementing neural networks in iota2 must derive from the class Iota2NeuralNetwork defined in the module torch_nn_bank.py. This class allows insertion into the iota2 workflow.

Database format

The input database used in the learning phase is in NetCDF format. This database is stored in the learningSamples directory under the name Samples_region_1_seed0_learn.nc (for model 1 representing region 1) and contains both the learning and validation databases. However, the user does not need to do anything special, since this format is generated by iota2 from the user-provided reference data, which is the same as for other classifiers.

GPU vs CPU

GPUs are automatically detected by Pytorch. When a GPU is detected, learning and inference will use it.

How to use GPUs?

As mentioned before, if a GPU is detected, computations and data will be transferred to it. However, with scheduler_type set to PBS, each task is spawned on a specific dedicated node, which means that iota2 must allocate a GPU before sending tasks to it. This allocation can be done by specifying a dedicated queue and nb_gpu in the step resources block, as follows:

training : {
              name:"training"
              nb_cpu:10
              ram:"92gb"
              walltime:"12:00:00"
              nb_gpu:1 # number of GPUs to use
              queue:"qgpgpu" # queue containing GPUs
            }

Dataloader per batch vs full memory

Loading the full learning/validation data-set into RAM may significantly decrease the learning time. However, this is not always possible, depending on the amount of RAM available on the processing unit. In that case, the stream mode must be used: the database is then read in batch-sized chunks.
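In PyTorch terms, the two modes roughly correspond to the sketch below (class and variable names are illustrative, not iota2's implementation):

import torch
from torch.utils.data import DataLoader, IterableDataset, TensorDataset

# 'full': the whole learning/validation set is materialized in RAM
full_dataset = TensorDataset(torch.randn(50_000, 12, 10),
                             torch.randint(0, 5, (50_000,)))
full_loader = DataLoader(full_dataset, batch_size=1000, shuffle=True)

# 'stream': samples are produced batch by batch, keeping memory usage bounded
class StreamDataset(IterableDataset):
    def __iter__(self):
        for _ in range(50):   # 50 batches of 1000 samples, read on the fly
            yield torch.randn(1000, 12, 10), torch.randint(0, 5, (1000,))

stream_loader = DataLoader(StreamDataset(), batch_size=None)  # batches arrive pre-formed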

Managing randomness during the learning step

Random mixing of data in batches between epochs is crucial to obtain an optimal stochastic gradient descent. In iota2, randomness is managed at several levels:

  • When splitting samples into the Learning, Validation and Test databases (these splits are made 'polygon wise').

  • When allocating the content of the data batches to feed the model.

  • The order of the batches.

  • Moreover, at each epoch, the content of each batch is again randomly distributed.

Cost function and gradient optimizer

Currently, users cannot choose the cost function or the gradient optimizer.

Using statistics to alter incoming data

All neural network instances have a _stats attribute which provides statistics for each sensor encountered. For instance, considering the Sentinel-2 data provided by THEIA:

self._stats = {'s2_theia': {'min': tensor([-0.0100, ..., 0.0000]),
                            'max': tensor([-0.0100, ..., 0.0000]),
                            'mean': tensor([-0.01, ..., 0.]),
                            'var': tensor([-0.01, ..., 0.]),
                            'quantile_0.1': tensor([-0.01, ..., 0.]),
                            'quantile_0.5': tensor([-0.01, ..., 0.]),
                            'quantile_0.95': tensor([-0.01, ..., 0.])}}

Statistics are shaped as (nb_components * nb_dates) and chronologically sorted. For instance, if we consider 2 Sentinel-2 acquisitions d1 and d2 and all bands available in iota2 (b2, b3, b4, b5, b6, b7, b8, b8A, b11 and b12), then one stat vector can be

self._stats = {'s2_theia': {'min': tensor([d1_b2, ..., d1_b12, d2_b2, ..., d2_b12]), ...}}

Available sensors in self._stats are s2_theia, s2_s2c, s2_l3a, l5, l8_old, l8, s1_desvv, s1_desvh, s1_ascvv and s1_ascvh. Keys for the statistics are the ones already presented: min, max, mean, var, quantile_0.1, quantile_0.5 (median) and quantile_0.95. The statistics are automatically computed, except for the quantiles which are only computed if the parameter additional_statistics_percentage is set to a value different from None. It is possible to use these statistics to scale data in the forward method. iota2 provides the method self.standardize(x, mean, std, self.default_mean, self.default_std) where x is the input data and where mean and std are the empirical mean and standard deviation values for each feature.

In some cases, data may contain NaN values. These specific values can cause the neural network to crash. iota2 offers the possibility to impute such values using self.nans_imputation(x, mean, self.default_mean). The method replaces NaNs (in x) with consistent values (i.e., using the empirical mean value of the corresponding feature). Please note that the x shape is (batch_size, nb_dates, nb_components) and the mean shape is (nb_dates * nb_components).

In some conditions, the empirical statistics can also contain NaN values. That is why self.nans_imputation() and self.standardize() accept default values to replace NaNs in the statistics vector.

The following code block shows how to perform imputation and standardization:

def _forward(self, x):

    mean = self._stats["s2_theia"]["mean"]
    x = self.nans_imputation(x, mean, self.default_mean)
    std = torch.sqrt(self._stats["s2_theia"]["var"])
    x = self.standardize(x, mean, std, self.default_mean, self.default_std)

    x = F.relu(x)
    x = F.relu(self.bnhidden2(self.hidden1(x)))
    x = F.relu(self.bnout(self.hidden2(x)))
    x = self.output(x)
    return x

Using external features with deep learning models

Warning

Currently only temporal indices are supported, i.e. an index must be computed for each date. For instance, a cumulative NDVI over the full time series is not supported yet.

Before planning the external functions that will be used, the architecture of the model must be carefully considered. Indeed, to avoid strange behaviour, it is better to ensure that the user functions and the model exploit the same data source: raw or interpolated data. The parameter to consider to handle this correctly is features_from_raw_dates.

If the model requires the number of bands as an input parameter, the added indices must be counted in addition to the sensor bands.

Great attention must be paid to the names of the computed and returned indices, as they must have the following structure: Sensor_SpectralIndice_date. Some functions already exist to help the user; see the example below:

def get_soi_s2(self):
    """
    compute the Soil Composition Index
    """
    dates = self.get_interpolated_dates()
    coef = (self.get_interpolated_Sentinel2_B11() -
            self.get_interpolated_Sentinel2_B8()) / (
                self.get_interpolated_Sentinel2_B11() +
                self.get_interpolated_Sentinel2_B8() + 1.E-6)
    labels = [f"Sentinel2_soi_{date}" for date in dates["Sentinel2"]]
    return coef, labels

If the label structure is not respected, the feature will be ignored when computing statistics, which can lead to shape mismatch errors.