
Deep Learning in iota2

Among the list of possible classification algorithms, iota2 also offers the possibility to use deep neural networks. To date, only networks that work on pixel time series can be used (i.e. no spatial/2D convolution). This documentation summarizes the parameters available to users and their meaning through examples. It also discusses the chain outputs and the development choices that have been made.

Parameters involved

All the parameters below must be inside the deep_learning_parameters section, which is itself inside the arg_train section of the iota2 configuration file.

Note

Once the parameter deep_learning_parameters.dl_name is provided, iota2 will try to use the deep learning workflow.

chain:{
# usual iota2 parameters
}

arg_train:{
deep_learning_parameters:{
...
# place here deep learning algorithm parameters
...
}
}

| Name | Default Value | Description | Type | Mandatory |
|---|---|---|---|---|
| dl_name | | Available neural network architectures (class name), currently: 'LTAEClassifier', 'ANN', 'MLPClassifier' or 'SimpleSelfAttentionClassifier' | str | True when using neural networks |
| dl_parameters | {} | Set of key/value pairs used to create the neural network instance (constructor parameters) | dict | False |
| model_selection_criterion | "loss" | Select the model that is best according to one of these metrics, computed on the validation set during the training process: "loss", "fscore", "oa", "kappa" | str | False |
| epochs | 100 | Number of epochs for the learning stage | int | False |
| weighted_labels | False | Apply weights to samples according to the proportion of each class in the computation of the loss function | bool | False |
| num_workers | 1 | Number of sub-processes to use for data loading. 0 means that the data will be loaded in the main process | int | False |
| hyperparameters_solver | {"batch_size": [1000], "learning_rate": [0.00001]} | Key/value pairs of hyperparameters used to build models | dict | False |
| dl_module | None | Path to a user Python module containing custom neural networks | str | False |
| restart_from_checkpoint | True | If a checkpoint exists, restart the learning from it | bool | False |
| dataloader_mode | 'stream' | During the learning stage, load the full data-set into memory ('full') or by batch ('stream') | str | False |
| enable_early_stop | False | Flag to enable early stop during the learning phase | bool | False |
| epoch_to_trigger | 5 | Epoch number after which the monitoring of the metric trend starts | int | False |
| early_stop_patience | 10 | Number of epochs without improvement after which training will be stopped | int | False |
| early_stop_tol | 0.01 | Minimum change in the monitored quantity to qualify as an improvement. If metric is 'train_loss' or 'valid_loss' then tol must be in dB as \(dB = \log_{10}(\frac{loss_{N-1}}{loss_{N}})\) with N the current epoch | float | False |
| early_stop_metric | "val_loss" | Metric to monitor for early stopping | str | False |
| additional_statistics_percentage | None | Percentage, in ]0;1], of samples to use from the incoming database to compute quantiles | float | False |
| adaptive_lr | {} | Allow the use of an adaptive learning rate across epochs | dict | False |
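To make the early_stop_tol formula concrete, here is a minimal sketch of the quantity described above for a loss metric. The function name loss_improvement_db and the ">=" comparison are assumptions made for illustration; only the formula itself comes from the parameter description.

```python
import math


def loss_improvement_db(previous_loss: float, current_loss: float) -> float:
    """Return log10(loss_{N-1} / loss_N), the change compared to early_stop_tol."""
    return math.log10(previous_loss / current_loss)


early_stop_tol = 0.01  # default value of the parameter

# The loss goes from 0.50 to 0.48 between two consecutive epochs.
gain = loss_improvement_db(0.50, 0.48)

# Assumption: a change at least equal to the tolerance counts as an improvement.
improved = gain >= early_stop_tol

# A smaller decrease (0.50 to 0.495) stays below the default tolerance.
small_gain = loss_improvement_db(0.50, 0.495)
```

With these values, the first decrease is above the default tolerance and the second one is not, so under the stated assumption only the first would reset the patience counter.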

Note

dl_name

Neural network architectures available in iota2 are defined in the python module torch_nn_bank.py

model_selection_criterion

During the learning step, several metrics can be computed on a validation set to evaluate the model. The optimized loss metric quantifies the fit of the model on the training sample, but iota2 also computes metrics such as the OA, Kappa and F1-score on the validation sample.

For each epoch, models maximizing each of these metrics are saved. When the learning phase ends, iota2 will use for the inference the model that maximizes the metric chosen by the user.

weighted_labels

Weights can be assigned to samples according to their class membership when computing the loss function during the learning step. These weights are computed using only the training + validation database and correspond to the inverse of the proportion of each class in that database. For example, if the database contains 2 classes, 1 and 2, with 80% of the samples belonging to class 1 and 20% to class 2, then the weights will be 1.25 for samples from class 1 (1 / 0.8) and 5 for samples from class 2 (1 / 0.2).
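The example above can be reproduced with a few lines of Python. This is only a sketch of the inverse-proportion rule; the function inverse_proportion_weights is a hypothetical helper, not part of iota2.

```python
from collections import Counter


def inverse_proportion_weights(labels):
    """Map each class to 1 / (proportion of that class in labels)."""
    counts = Counter(labels)
    total = len(labels)
    # total / count is the same as 1 / (count / total)
    return {cls: total / count for cls, count in counts.items()}


# 80% of the samples belong to class 1 and 20% to class 2.
labels = [1] * 80 + [2] * 20
weights = inverse_proportion_weights(labels)
print(weights)
```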

num_workers

During the learning phase, the model is optimized iteratively using stochastic gradient descent. At each epoch, the model is optimized on successive subsets of the database (i.e., batches). The number of workers is the number of tasks that prepare the batched data in parallel. Each worker provides the data it has collected to the model and then fetches another batch, until the database has been fully read. Therefore, the more workers are available to read the batches, the faster the model is optimized. However, it is the user's responsibility to set the number of workers according to the amount of available RAM.

hyperparameters_solver

Hyperparameters are parameters that influence the learning process but cannot be learned. In iota2, it is possible to test several values of 2 hyperparameters within the same run. This is done via a dictionary with 2 keys (“batch_size” and “learning_rate”), each associated with a list of values to try. The number of models to be learned is the product of the lengths of the two lists; the best of them is then used for inference (cf. the model_selection_criterion parameter).

For example, if the configuration file contains :

chain:{
# usual iota2 parameters
}

arg_train:{
deep_learning_parameters:{
...
hyperparameters_solver : {"batch_size" : [1000],
                    "learning_rate" : [0.1, 0.00001]}
...
}
}

Then two models will be trained (in parallel if possible): one with a batch size of 1000 and a learning rate of 0.1, and another one with the same batch size but a learning rate of 0.00001.
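The set of models to train is the Cartesian product of the two lists. A minimal sketch using the standard library is given below; how iota2 enumerates the pairs internally (and in which order) is not specified here, so the ordering shown is only that of itertools.product.

```python
from itertools import product

hyperparameters_solver = {
    "batch_size": [1000],
    "learning_rate": [0.1, 0.00001],
}

# One (batch_size, learning_rate) pair per model to train.
pairs = list(
    product(
        hyperparameters_solver["batch_size"],
        hyperparameters_solver["learning_rate"],
    )
)

for batch_size, learning_rate in pairs:
    print(f"batch_size={batch_size}, learning_rate={learning_rate}")
```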

dl_module

Users can define their own neural network via this parameter which should point to a user provided python module. However, the neural network must be defined as a class derived from Iota2NeuralNetwork available in the module torch_nn_bank.py.

Currently, iota2 can only perform pixel-wise operations, since the input to the model are the spectro-temporal features for pixels. Convolutional layers can be used in the spectral and temporal dimensions, but not in the spatial dimension.

restart_from_checkpoint

The learning phase can be quite long, and if the learning stops for some reason, everything that has been learned would normally be lost. To avoid this, iota2 can restart the learning step from the last completed epoch, since a backup of the model state is made at each epoch.

adaptive_lr

Adaptive learning rate allows the use of ReduceLROnPlateau from Pytorch which reduces the learning rate when a metric has stopped improving. The metric monitored in iota2 is the validation loss.

The adaptive_lr parameter receives a dictionary whose keys are the names of ReduceLROnPlateau parameters and whose values are the values of those parameters.

For example, the configuration file can contain:

chain:{
# usual iota2 parameters
}

arg_train:{
deep_learning_parameters:{
...
adaptive_lr : {patience: 10,
                 factor: 0.1,
                 threshold: 1e-4,
                 threshold_mode: 'rel',
                 cooldown: 0,
                 min_lr: 0,
                 eps: 1e-8,
                 verbose: False}# which are default values for all parameter's key
...
}
}

The mode and optimizer parameters cannot be set by users: mode is forced to min and the optimizer is the one used during the training step. By default, adaptive_lr is {} and therefore no adaptive learning rate is used.

Expected output descriptions

Models

One model per hyperparameter pair

Each pair of hyperparameters produces a model file in the model output directory. For example, for the first possible pair of hyperparameters, the file model_1_seed_0_hyp_0.txt will be produced. The name of the file is defined as:

  • model_1 : the model for the region 1

  • seed_0 : for the first random split

  • hyp_0 : for the first hyperparameter pair
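The naming scheme described above can be sketched as a small formatting function. The helper model_file_name is hypothetical and only mirrors the file name pattern given in this section; it is not an iota2 function.

```python
def model_file_name(region: str, seed: int, hyp_index: int) -> str:
    """Build a model file name following the pattern described above."""
    return f"model_{region}_seed_{seed}_hyp_{hyp_index}.txt"


# Region 1, first random split, first hyperparameter pair.
name = model_file_name("1", 0, 0)
print(name)
```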

Selected model

Once the models for all pairs of hyperparameters are learned, the one that provides the best result according to the selection criterion (cf. model_selection_criterion) is selected. The result is stored on disk as a serialized object using pickle: model_1_seed_0.txt. It will be used later in the inference phase.

Checkpoints

After each epoch, all the information needed to restart from the current epoch is stored as a serialized object in a file in the model directory. For example, for the model model_1_seed_0_hyp_0.txt, the corresponding checkpoint file would be model_1_seed_0_hyp_0_checkpoint.txt.

Plots

To visualize the evolution of the model loss function, 2 figures are generated for each model, next to the models. For example for the model model_1_seed_0_hyp_0.txt

  • model_1_seed_0_hyp_0_loss.png : shows the evolution of the learning and validation loss (dB) over the epochs

  • model_1_seed_0_hyp_0_confusion_metrics.png : shows the evolution of the Kappa, OA, Precision and the Recall over the epochs.

Classifications

The classification maps are stored in a conventional tif format in the classif output directory. First, chunks of tiles are classified, then all these pieces are merged to form one tile.

iota2 internal choices

Pytorch

We have decided to use pytorch to implement deep neural networks in iota2.

Classification by chunks

As mentioned above, deep learning classifications are done in chunks, to fit RAM constraints. In this workflow, iota2 works with numpy arrays that need to be stored temporarily in RAM, but few machines have enough RAM to hold a whole Sentinel-2 tile with many acquisition dates at the same time. This is why we work in chunks to make the predictions and then merge these predictions.

The size of the chunks can be set via the number_of_chunks parameter in the python_data_managing parameter block.
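As a rough illustration of chunked processing, the sketch below splits the rows of a tile into number_of_chunks contiguous ranges that together cover the whole tile. The helper chunk_row_ranges is hypothetical and the row-wise splitting is an assumption made for the example; iota2's actual chunking strategy may differ.

```python
def chunk_row_ranges(nb_rows: int, number_of_chunks: int):
    """Split range(nb_rows) into contiguous (start, stop) row ranges.

    Hypothetical helper: row-wise splitting is an assumption made for
    illustration, not a description of iota2's internal chunking.
    """
    base, extra = divmod(nb_rows, number_of_chunks)
    ranges = []
    start = 0
    for chunk_index in range(number_of_chunks):
        # The first `extra` chunks receive one additional row.
        stop = start + base + (1 if chunk_index < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


# A Sentinel-2 tile at 10 m resolution has 10980 rows.
ranges = chunk_row_ranges(10980, 50)
print(ranges[0], ranges[-1])
```

Each range can then be predicted independently and the predictions concatenated in order, which is the idea behind the merge step mentioned above.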

The shape of the tensor of data

Every model will be fed with a tensor of data shaped as (batch_size, nb_dates, nb_bands)

Learning vs validation vs test

In a conventional deep learning approach, the initial database is split into 3 distinct data-sets:

  • Learning: which is used to train the model (let’s call it L)

  • Validation : which allows, during the learning process, to observe the behavior of the model (convergence, over-fitting etc.), let’s call this V. These observations may, for example, allow a readjustment of some hyperparameters

  • Test : allows the performance of the model to be validated on a larger database than the validation database, let’s call this T.

How are the samples distributed in these three databases?

Initially the parameter ratio allows us to build a database that will contain L + V on one side and T on the other side. Then at the time of training, 80% of the L + V is used to build the model (L) and 20% to build the validation database (V).

For example, if the configuration file contains :

chain:{
...
ratio : 0.7
...
}

Then 30% of the database will go into the test database and 70% will be set aside to build the learning and validation databases. Then 80% of the 70% will be used to build the training database and 20% of the 70% for validation. In iota2, these splits are made ‘polygon wise’ i.e. a polygon is placed in one of the databases in its entirety and cannot be found in another database (unless the class is represented by a single polygon).
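The arithmetic of this example can be checked with a short sketch. The function split_fractions is a hypothetical helper; since the real splits are made polygon-wise, the actual sample proportions will only approximate these fractions.

```python
def split_fractions(ratio: float, learning_share: float = 0.8):
    """Return the (learning, validation, test) fractions of the database."""
    test = 1.0 - ratio
    learning = ratio * learning_share
    validation = ratio * (1.0 - learning_share)
    return learning, validation, test


learning, validation, test = split_fractions(0.7)
print(f"learning={learning:.2f}, validation={validation:.2f}, test={test:.2f}")
```

With ratio set to 0.7, about 56% of the database is used for learning, 14% for validation and 30% for test.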

Inheritance of iota2 neural network

As already explained for the parameter dl_module, all classes implementing neural networks in iota2 must derive from the class Iota2NeuralNetwork defined in the module torch_nn_bank.py. This class allows insertion into the iota2 workflow.

Database format

The format of the input database used in the learning phase is the NETCDF format. This database is stored in the learningSamples directory under the name Samples_region_1_seed0_learn.nc (for model 1 representing region 1) which contains both the learning and validation database. However, the user does not need to do anything special, since this format is generated by iota2 from the user-provided reference data which is the same as for other classifiers.

GPU vs CPU

GPUs are automatically detected by Pytorch. When a GPU is detected, learning and inference will use it.

How to use GPUs ?

As mentioned above, if a GPU is detected, computations and data are transferred to it. However, when scheduler_type is set to PBS, each task is spawned on a dedicated node, which means that iota2 must allocate a GPU before sending tasks to it. This allocation is done by specifying a dedicated queue and nb_gpu in the step's resources block, as follows:

training : {
              name:"training"
              nb_cpu:10
              ram:"92gb"
              walltime:"12:00:00"
              nb_gpu:1 # number of GPUs to use
              queue:"qgpgpu" # queue containing GPUs
            }

Dataloader per batch vs full memory

Loading the full learning/validation data-set into RAM may significantly decrease the learning time. However, this is not always possible depending on the amount of RAM available on the processing unit. In this case the stream mode must be used. In this mode the database will be read in batch-sized chunks.

Managing randomness during the learning step

Random mixing of data in batches between epochs is crucial to obtain an optimal stochastic gradient descent. In iota2, randomness is managed at several levels:

  • When splitting samples in the Learning, Validation and Test databases. These distributions are made ‘polygon wise’.

  • When allocating the content of the data batches to feed the model.

  • The order of the batches.

  • Moreover, at each epoch, the content of each batch is again randomly distributed.

Cost function and gradient optimizer

Currently, users can’t choose them.

Using statistics to alter incoming data

All neural network instances have a _stats attribute which provides statistics for each sensor encountered, for instance considering the Sentinel2 data provided by THEIA :

self._stats = {'sentinel2': {'min': tensor([-0.0100, ..., 0.0000]),
                             'max': tensor([-0.0100, ..., 0.0000]),
                             'mean': tensor([-0.01, ..., 0.]),
                             'var': tensor([-0.01, ..., 0.]),
                             'quantile_0.1': tensor([-0.01, ..., 0.]),
                             'quantile_0.5': tensor([-0.01, ..., 0.]),
                             'quantile_0.95': tensor([-0.01, ..., 0.])}}

Statistics are shaped as (nb_components * nb_dates) and chronologically sorted. For instance, if we consider 2 Sentinel-2 acquisitions d1 and d2 and all bands available in iota2 (b2, b3, b4, b5, b6, b7, b8, b8A, b11 and b12), then one stat vector can be

self._stats = {'sentinel2': {'min': tensor([d1_b2, ..., d1_b12, d2_b2, ..., d2_b12]),...}}
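The layout of such a vector can be sketched as an index computation: all components of the first date come first, then all components of the second date, and so on. The helper stat_index is hypothetical, and the example assumes that only the 10 spectral bands listed above are present, in that order.

```python
# Band order as listed above (assumed to be the order used in the vector).
BANDS = ["b2", "b3", "b4", "b5", "b6", "b7", "b8", "b8A", "b11", "b12"]


def stat_index(date_index: int, band: str, bands=BANDS) -> int:
    """Position of (date, band) in a flat, chronologically sorted stat vector."""
    return date_index * len(bands) + bands.index(band)


first = stat_index(0, "b2")    # d1_b2
second_date = stat_index(1, "b2")   # d2_b2
last = stat_index(1, "b12")    # d2_b12
print(first, second_date, last)
```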

Available sensors in self._stats are sentinel1_desvv, sentinel1_desvh, sentinel1_ascvv, sentinel1_ascvh, sentinel2, sentinel2s2c, sentinel2l3a, landsat8, landsat8old and landsat5old. Keys for stats are the ones already presented : min, max, mean, var, quantile_0.1, quantile_0.5 (median) and quantile_0.95. The statistics are automatically computed except for the quantiles which are only computed if the parameter additional_statistics_percentage is set to a value different from None. It is possible to use these statistics to scale data in the forward method. Iota2 provides the method self.standardize(x, mean, std, self.default_mean, self.default_std) where x is the input data and where mean and std are the empirical mean and std values for each feature.

In some cases, data may contain NaN values. These specific values can cause the neural network to crash. Iota2 offers the possibility to impute such values using self.nans_imputation(x, mean, self.default_mean). The method replaces NaNs (in x) with consistent values (i.e. the empirical mean value of the corresponding feature). Please note that the shape of x is (batch_size, nb_dates, nb_components) and the shape of mean is (nb_dates * nb_components).

In some conditions, the empirical statistics can also contain NaN values. That is why self.nans_imputation() and self.standardize() accept default values to replace NaNs in the statistics vector.

The following code block shows how to perform imputations and standardization

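import torch
import torch.nn.functional as F

def _forward(self, x):
    # Replace NaNs in x with the empirical mean of the corresponding feature.
    mean = self._stats["sentinel2"]["mean"]
    x = self.nans_imputation(x, mean, self.default_mean)
    # Standardize x; the standard deviation is derived from the variance.
    std = torch.sqrt(self._stats["sentinel2"]["var"])
    x = self.standardize(x, mean, std, self.default_mean, self.default_std)

    # Apply the layers of the network (defined in the constructor).
    x = F.relu(x)
    x = F.relu(self.bnhidden2(self.hidden1(x)))
    x = F.relu(self.bnout(self.hidden2(x)))
    x = self.output(x)
    return x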

About neural network definition

The neural model and iota2 interact at two points: when the model is instantiated and when iota2 feeds the model with data. These two interactions involve calling the __init__() method and the forward method of the neural network object respectively. In order for these interactions to proceed correctly, a formalism must be defined. This section details the expected signatures for these 2 methods.

Constructor parameters

Iota2 will automatically pass a series of parameters to the networks when they are instantiated, so they must exist in the constructor. As a minimum, all neural networks must have 5 parameters in their initialisation method: nb_class, sensors_information, doy_sensors_dic, default_std and default_mean as follows :

from typing import Optional

class MyNeuralNetwork(Iota2NeuralNetwork):
    """MyNeuralNetwork class definition."""

    def __init__(
        self,
        nb_class: int,
        sensors_information: dict,
        doy_sensors_dic: Optional[dict] = None,
        default_std: int = 1,
        default_mean: int = 0,
    ):
        ...

Note

Every neural network must inherit from Iota2NeuralNetwork base class.

nb_class

is an integer representing the number of classes.

sensors_information

is a dictionary where each key is a sensor’s name and the value is a dictionary with two keys, nb_components and nb_dates, which let users know how many features exist per date (total number of features for a sensor = nb_components x nb_dates).

doy_sensors_dic

is an OrderedDict where each key is a sensor’s name and the value is a dictionary with two keys, doy and features_per_dates. features_per_dates is currently redundant with the nb_components key of the sensors_information dictionary.

default_std and default_mean

are used as substitution values in the standardization computation, x = (features - mean) / std, if mean or std are NaNs.
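The role of default_mean and default_std can be illustrated on a single scalar value. This is a simplified sketch under the assumption that a NaN statistic is simply replaced by its default before the formula is applied; the function standardize_value is hypothetical and is not the tensor-based self.standardize method of iota2.

```python
import math


def standardize_value(feature, mean, std, default_mean=0.0, default_std=1.0):
    """Compute (feature - mean) / std, replacing NaN statistics by defaults."""
    if math.isnan(mean):
        mean = default_mean
    if math.isnan(std):
        std = default_std
    return (feature - mean) / std


# Valid statistics: regular standardization.
regular = standardize_value(0.3, mean=0.1, std=0.2)

# NaN statistics: the defaults (0 and 1) leave the feature unchanged.
fallback = standardize_value(0.3, mean=math.nan, std=math.nan)
print(regular, fallback)
```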

Forward definition

It is via the forward method that iota2 will provide the network with data. The choice has been made to separate the data by sensor, so the forward method will have to contain a dedicated parameter for each activated sensor. A possible definition of the forward method could be as follows, if all the sensors in iota2 are used.

class MyNeuralNetwork(Iota2NeuralNetwork):
    """MyNeuralNetwork class definition."""

    def forward(
        self,
        sentinel1_desvv=None,
        sentinel1_desvh=None,
        sentinel1_ascvv=None,
        sentinel1_ascvh=None,
        sentinel2=None,
        sentinel2s2c=None,
        sentinel2l3a=None,
        landsat8=None,
        landsat8old=None,
        landsat5old=None,
    ):
        ...

It is also possible to feed the forward with exogenous data already written to disk userfeatures or with external features calculated in a python module. In the case of userfeatures, a new parameter per userfeature must be added to the forward. External features are more versatile, allowing you to add primitives to an existing sensor, or create new temporal or non-temporal primitives. These different possibilities are illustrated in the next section using examples.

For the examples, we will use Sentinel-2 data, 13 features (10 spectral bands + NDVI + NDWI + Brightness) and 3 dates.

Sentinel-2

Let’s start with a classic example: using Sentinel-2 data alone: 13 primitives and 3 dates.

The forward can simply be written as

class MyNeuralNetwork(Iota2NeuralNetwork):
    """MyNeuralNetwork class definition."""
    ...
    def forward(
        self,
        sentinel2
    ):
        ...

In this case, the sentinel2 tensor will have the shape (B, 3, 13) where B is the size of the data batch.

Combining Sentinel-2 and two user features

Here, we want to use Sentinel-2 and exogenous data already written to disk, for example a DEM and temperature data. We will therefore use the user_feat_path field and the userFeat section of the configuration file as follows

chain:{
...
s2_path : '/path/to/my_tiled/s2_data'
user_feat_path : '/path/to/my_tiled/exogenous_data'
...
}
...
userFeat:{
patterns:"dem,temperature"
}

The dem is a raster file written to disk that contains 3 bands. The temperature data is also a raster on disk, but with a single band. The forward method should then have the following definition:

def forward(
    self,
    sentinel2,
    dem,
    temperature
):

Note

The names of the dem and temperature parameters of the forward method must correspond to the patterns field in the userFeat section.

In this case, the tensor sentinel2 will still have the shape (B, 3, 13), the tensor dem will have the shape (B, 3) and temperature the shape (B, 1).

Warning

Tensors can have different numbers of dimensions: for non-temporal data, the time dimension disappears.

Using external features

External features can be used to create new features. These can be temporal features concatenated with those of an existing sensor, standalone temporal features, or non-temporal features.

If, in addition to Sentinel-2 data (over 3 dates), the user uses 3 python functions to create new primitives as follows

from iota2.learning.utils import I2Label, I2TemporalLabel

def new_s2_features(self):
    coef = self.get_interpolated_Sentinel2_B3() ** 2
    labels = [I2TemporalLabel(sensor_name="sentinel2", feat_name="pow", date=date)
              for date in self.interpolated_dates["Sentinel2"]]
    return coef, labels

def new_temporal_features(self):
    coef = self.get_interpolated_Sentinel2_B2() + self.get_interpolated_Sentinel2_B3()
    labels = [I2TemporalLabel(sensor_name="myfeatures", feat_name="add", date=date)
              for date in self.interpolated_dates["Sentinel2"]]
    return coef, labels

def new_feature(self):
    coef = self.get_interpolated_Sentinel2_B2()[:, :, 0:2]  # get the first two dates of Sentinel-2 B2.
    labels = [I2Label(sensor_name="newfeature", feat_name="b2date1"),
              I2Label(sensor_name="newfeature", feat_name="b2date2")]
    return coef, labels

then the forward method must have the signature

def forward(
    self,
    sentinel2,
    myfeatures,
    newfeature
):

sentinel2 tensor will have the shape (B, 3, 14), myfeatures the shape (B, 3, 1) and newfeature (B, 2).

Note

The sentinel2 tensor gets an extra feature because the function new_s2_features adds the feature pow to the sensor sentinel2 for every sentinel2 date.