Deep Learning in iota2
######################

Among the list of possible classification algorithms, iota2 also offers the possibility to use deep neural networks.
To date, only networks that work on pixel time series can be used (i.e. no spatial/2D convolution).
This documentation summarizes the parameters available to users and their meaning through examples.
It also discusses the chain outputs and the development choices that have been made.

Parameters involved
*******************

All the parameters below must be inside the ``deep_learning_parameters`` section, which is itself inside the ``arg_train`` section of the iota2 configuration file.

.. note:: Once the parameter deep_learning_parameters.dl_name is provided, iota2 will try to use the deepLearning workflow.

.. code-block:: python

    chain:{
        # usual iota2 parameters
    }
    arg_train:{
        deep_learning_parameters:{
            ...
            # place here deep learning algorithm parameters
            ...
        }
    }

.. list-table::
   :widths: auto
   :header-rows: 1

   * - Name
     - Default Value
     - Description
     - Type
     - Mandatory
     - Name

   .. _nn_dl_name:

   * - :ref:`dl_name <desc_dl_name>`
     -
     - Available neural network architectures (class names), currently: 'LTAEClassifier', 'ANN', 'MLPClassifier' or 'SimpleSelfAttentionClassifier'
     - str
     - True when using neural networks
     - dl_name

   .. _nn_dl_parameters:

   * - dl_parameters
     - {}
     - Set of key/value pairs used to create the neural network instance (constructor parameters).
     - dict
     - False
     - dl_parameters

   .. _nn_model_selection_criterion:

   * - :ref:`model_selection_criterion <desc_model_selection_criterion>`
     - "loss"
     - Select the model which performs best according to one of these metrics, computed on the validation set during the training process: "loss", "fscore", "oa", "kappa"
     - str
     - False
     - model_selection_criterion

   .. _nn_epochs:

   * - epochs
     - 100
     - number of epochs for the learning stage
     - int
     - False
     - epochs

   .. _nn_weighted_labels:

   * - :ref:`weighted_labels <desc_weighted_labels>`
     - False
     - apply weights to samples according to the proportion of each class in the computation of the loss function
     - bool
     - False
     - weighted_labels

   .. _nn_num_workers:

   * - :ref:`num_workers <desc_num_workers>`
     - 1
     - how many sub-processes to use for data loading. 0 means that the data will be loaded in the main process.
     - int
     - False
     - num_workers

   .. _nn_hyperparameters:

   * - :ref:`hyperparameters_solver <desc_hyperparameters>`
     - {"batch_size": [1000], "learning_rate": [0.00001]}
     - key/value pairs of hyperparameters used to build the models
     - dict
     - False
     - hyperparameters_solver

   .. _dl_module:

   * - :ref:`dl_module <desc_dl_module>`
     - None
     - path to a user python module containing custom neural networks
     - str
     - False
     - dl_module

   .. _nn_restart_from_checkpoint:

   * - :ref:`restart_from_checkpoint <desc_restart_from_checkpoint>`
     - True
     - if a checkpoint exists, restart learning from it
     - bool
     - False
     - restart_from_checkpoint

   .. _nn_dataloader_mode:

   * - dataloader_mode
     - 'stream'
     - during the learning stage, load the full data-set into memory ('full') or by batch ('stream')
     - str
     - False
     - dataloader_mode

   .. _nn_enable_early_stop:

   * - enable_early_stop
     - False
     - flag to enable early stopping during the learning phase
     - bool
     - False
     - enable_early_stop

   .. _nn_epoch_to_trigger:

   * - epoch_to_trigger
     - 5
     - epoch number after which the monitoring of the metric trend starts
     - int
     - False
     - epoch_to_trigger

   .. _nn_patience:

   * - early_stop_patience
     - 10
     - number of epochs without improvement after which training will be stopped
     - int
     - False
     - early_stop_patience

   .. _nn_early_stop_tol:

   * - early_stop_tol
     - 0.01
     - minimum change in the monitored quantity to qualify as an improvement. If the metric is 'train_loss' or 'valid_loss' then the tolerance must be in dB as :math:`dB = \log_{10}(\frac{loss_{N-1}}{loss_{N}})` with `N` the current epoch.
     - float
     - False
     - early_stop_tol

   .. _nn_early_stop_metric:

   * - early_stop_metric
     - "val_loss"
     - metric to monitor for early stopping
     - str
     - False
     - early_stop_metric

   .. _nn_additional_statistics_percentage:

   * - additional_statistics_percentage
     - None
     - fraction ]0; 1] of samples to use from the incoming database to compute quantiles
     - float
     - False
     - additional_statistics_percentage

   .. _nn_adaptive_lr:

   * - :ref:`adaptive_lr <desc_adaptive_lr>`
     - {}
     - allow the use of an adaptive learning rate across epochs
     - dict
     - False
     - adaptive_lr
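As an illustration, the block below combines several of the parameters listed above into one ``deep_learning_parameters`` section; the values are purely indicative and should be adapted to the data set and the chosen network.

.. code-block:: python

    arg_train:{
        deep_learning_parameters:{
            dl_name : 'MLPClassifier'
            epochs : 50
            model_selection_criterion : 'fscore'
            hyperparameters_solver : {"batch_size" : [1000, 2000], "learning_rate" : [0.0001]}
            enable_early_stop : True
        }
    }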
Note
====

.. _desc_dl_name:

:ref:`dl_name <nn_dl_name>`
---------------------------

Neural network architectures available in iota2 are defined in the python module `torch_nn_bank.py `_.

.. _desc_model_selection_criterion:

:ref:`model_selection_criterion <nn_model_selection_criterion>`
----------------------------------------------------------------

During the learning step, several metrics can be computed on a validation set to evaluate the model.
The optimized loss metric quantifies the fit of the model on the training samples, but iota2 also computes metrics such as the OA, Kappa and F1-score on the validation samples.
For each epoch, the model performing best on each of these metrics is saved.
When the learning phase ends, iota2 uses for inference the model that is best according to the metric chosen by the user.

.. _desc_weighted_labels:

:ref:`weighted_labels <nn_weighted_labels>`
-------------------------------------------

Weights can be assigned to samples w.r.t. their class membership when computing the loss function during the learning step.
These weights are computed using only the training + validation database and correspond to the inverse of each class proportion in that database.
For example, if the database contains two classes, 1 and 2, with 80% of the samples belonging to class 1 and 20% to class 2, the weights will be 1.25 for samples from class 1 (1 / 0.8) and 5 for samples from class 2 (1 / 0.2).
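The following minimal sketch reproduces this weight computation for the two-class example above; the class counts are illustrative and the computation is performed internally by iota2.

.. code-block:: python

    import numpy as np

    # illustrative class counts taken from the training + validation database
    counts = np.array([8000, 2000])       # class 1, class 2
    proportions = counts / counts.sum()   # [0.8, 0.2]
    weights = 1.0 / proportions           # [1.25, 5.0], used to weight the loss
    print(weights)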
.. _desc_num_workers:

:ref:`num_workers <nn_num_workers>`
-----------------------------------

During the learning phase, the model is optimized iteratively using stochastic gradient descent.
For each epoch, the model is optimized with subsets of the database (i.e., batches).
The number of workers corresponds to the number of tasks that prepare the batched data in parallel.
Each worker provides the data it has collected to the model and then fetches another batch of data, until the database has been fully read.
Therefore, the more workers are available to read the batches, the faster the model is optimized.
However, it is the user's responsibility to set the number of workers according to the amount of available RAM.

.. _desc_hyperparameters:

:ref:`hyperparameters <nn_hyperparameters>`
-------------------------------------------

Hyperparameters are parameters that influence the learning process but cannot be learned.
In iota2, it is possible to test several values of two hyperparameters within the same run.
This is done via a dictionary which contains two keys ("batch_size" and "learning_rate") whose values are lists of the values to test.
The Cartesian product of the lists determines the number of models to be learned; the best of them will then be used for inference (cf. the model_selection_criterion parameter).
For example, if the configuration file contains:

.. code-block:: python

    chain:{
        # usual iota2 parameters
    }
    arg_train:{
        deep_learning_parameters:{
            ...
            hyperparameters_solver : {"batch_size" : [1000], "learning_rate" : [0.1, 0.00001]}
            ...
        }
    }

then two models will be trained (in parallel if possible): one with a batch size of 1000 and a learning rate of 0.1, and another one with the same batch size but a learning rate of 0.00001.

.. _desc_dl_module:

:ref:`dl_module <dl_module>`
----------------------------

Users can define their own neural network via this parameter, which should point to a user-provided python module.
However, the neural network must be defined as a class derived from ``Iota2NeuralNetwork``, available in the module `torch_nn_bank.py `_.
Currently, iota2 can only perform pixel-wise operations, since the input to the model is the spectro-temporal feature vector of each pixel.
Convolutional layers can be used in the spectral and temporal dimensions, but not in the spatial dimension.

.. _desc_restart_from_checkpoint:

:ref:`restart_from_checkpoint <nn_restart_from_checkpoint>`
-----------------------------------------------------------

The learning phase can be quite long.
If for some reason the learning stops, everything that has been learned is lost.
However, iota2 integrates the possibility to restart the learning step from the last learned epoch, a backup of the model state being made at each epoch.

.. _desc_adaptive_lr:

:ref:`adaptive_lr <nn_adaptive_lr>`
-----------------------------------

Adaptive learning rate allows the use of `ReduceLROnPlateau `_ from Pytorch, which reduces the learning rate when a metric has stopped improving.
The metric monitored in iota2 is the ``validation loss``.
The ``adaptive_lr`` parameter receives a dictionary where keys are `ReduceLROnPlateau `_ parameter names and values are the corresponding parameter values.
I.e., the configuration file can contain:

.. code-block:: python

    chain:{
        # usual iota2 parameters
    }
    arg_train:{
        deep_learning_parameters:{
            ...
            adaptive_lr : {patience: 10, factor: 0.1, threshold: 1e-4, threshold_mode: 'rel', cooldown: 0, min_lr: 0, eps: 1e-8, verbose: False} # these are the default values for every key
            ...
        }
    }

Parameters ``mode`` and ``optimizer`` cannot be set by users: ``mode`` is forced to ``min`` and the optimizer is the one used during the training step.
By default, ``adaptive_lr`` is ``{}`` and therefore no adaptive learning rate is used.
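For reference, the sketch below shows the underlying PyTorch scheduler with the same default values; the model and optimizer are placeholders used only for illustration and are not part of the iota2 API.

.. code-block:: python

    import torch
    from torch.optim.lr_scheduler import ReduceLROnPlateau

    model = torch.nn.Linear(10, 2)                            # placeholder network
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

    # 'mode' is forced to 'min' by iota2; the other values mirror the defaults above
    scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=10,
                                  threshold=1e-4, threshold_mode="rel",
                                  cooldown=0, min_lr=0, eps=1e-8)

    for epoch in range(3):
        validation_loss = 1.0          # placeholder; iota2 monitors the validation loss
        scheduler.step(validation_loss)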
Expected output descriptions
****************************

Models
======

One model per hyperparameter pair
---------------------------------

Each pair of hyperparameters produces a model file in the ``model`` output directory.
For example, for the first possible pair of hyperparameters, the file ``model_1_seed_0_hyp_0.txt`` will be produced.
The name of the file is defined as:

- model_1 : the model for the region 1
- seed_0 : for the first random split
- hyp_0 : for the first hyperparameter pair

Selected model
--------------

Once all the models per pair of hyperparameters are learned, the one that provides the best result according to the selection criterion (cf. :ref:`model_selection_criterion <desc_model_selection_criterion>`) is selected.
The result is stored on disk as an object serialized with pickle: ``model_1_seed_0.txt``.
It will be used later in the inference phase.

Checkpoints
===========

After each epoch, all the information needed to restart from the current epoch is stored as a serialized object in a file in the ``model`` directory.
For example, for the model ``model_1_seed_0_hyp_0.txt`` the corresponding checkpoint file would be ``model_1_seed_0_hyp_0_checkpoint.txt``.

Plots
=====

To visualize the evolution of the model loss function, 2 figures are generated for each model, next to the models.
For example, for the model ``model_1_seed_0_hyp_0.txt``:

- ``model_1_seed_0_hyp_0_loss.png`` : shows the evolution of the learning and validation loss (dB) over the epochs
- ``model_1_seed_0_hyp_0_confusion_metrics.png`` : shows the evolution of the Kappa, OA, Precision and Recall over the epochs.

Classifications
===============

The classification maps are stored in a conventional ``tif`` format in the ``classif`` output directory.
First, chunks of each tile are classified, then all these pieces are merged to form one tile.

iota2 internal choices
**********************

Pytorch
=======

We have decided to use `pytorch `_ to implement deep neural networks in iota2.

Classification by chunks
========================

As mentioned above, deep learning classifications are done in chunks, to fit RAM constraints.
In this workflow, iota2 works with numpy arrays that need to be stored temporarily in RAM, but few machines have enough RAM to hold a whole Sentinel-2 tile with many acquisition dates at the same time.
This is why we work in chunks to make the predictions and then merge these predictions.
The size of the chunks can be set via the ``number_of_chunks`` parameter in the parameter block :ref:`python_data_managing `.

The shape of the tensor of data
===============================

Every model is fed with a tensor of data shaped as (batch_size, nb_dates, nb_bands).

Learning vs validation vs test
==============================

In a conventional deep learning approach, the initial database is split into 3 distinct data-sets:

- Learning: used to train the model (let's call it ``L``)
- Validation: allows, during the learning process, to observe the behavior of the model (convergence, over-fitting, etc.), let's call it ``V``. These observations may, for example, allow a readjustment of some hyperparameters
- Test: allows the performance of the model to be validated on a larger database than the validation database, let's call it ``T``

How are the samples distributed in these three databases?
Initially the parameter :ref:`ratio ` allows us to build a database that will contain ``L`` + ``V`` on one side and ``T`` on the other side.
Then, at training time, **80%** of ``L`` + ``V`` is used to build the model (``L``) and **20%** to build the validation database (``V``).
For example, if the configuration file contains:

.. code-block:: python

    chain:{
        ...
        ratio : 0.7
        ...
    }

then **30%** of the database will go into the **test** database and **70%** will be set aside to build the **learning and validation** databases.
Then 80% of that 70% will be used to build the training database and 20% of the 70% for validation.
In iota2, these splits are made 'polygon wise', i.e. a polygon is placed in one of the databases in its entirety and cannot be found in another database (unless the class is represented by a single polygon).
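As a purely numerical illustration (the sample count is invented, and since iota2 actually splits polygon wise the real counts depend on polygon sizes), the split for ``ratio : 0.7`` works out as follows:

.. code-block:: python

    total_samples = 10_000                     # illustrative number of reference samples
    ratio = 0.7

    learn_val = int(total_samples * ratio)     # 7000 samples kept for L + V
    test = total_samples - learn_val           # 3000 samples for T

    learn = int(learn_val * 0.8)               # 5600 samples for L (80% of L + V)
    validation = learn_val - learn             # 1400 samples for V (20% of L + V)
    print(learn, validation, test)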
Inheritance of iota2 neural network
===================================

As already explained for the parameter :ref:`dl_module <desc_dl_module>`, all classes implementing neural networks in iota2 must derive from the class ``Iota2NeuralNetwork`` defined in the module `torch_nn_bank.py `_.
This class allows insertion into the iota2 workflow.

Database format
===============

The format of the input database used in the learning phase is the NETCDF format.
This database is stored in the ``learningSamples`` directory under the name ``Samples_region_1_seed0_learn.nc`` (for model 1, representing region 1), which contains both the learning and validation databases.
However, the user does not need to do anything special, since this format is generated by iota2 from the user-provided reference data, which is the same as for other classifiers.

GPU vs CPU
==========

GPUs are automatically detected by Pytorch.
When a GPU is detected, learning and inference will use it.

How to use GPUs ?
-----------------

As mentioned before, if a GPU is detected, computations and data will be transferred to it.
However, when `scheduler_type` is set to `PBS`, tasks are spawned on specific dedicated nodes, which means that iota2 must allocate a GPU before sending tasks to it.
This allocation can be done by specifying a dedicated ``queue`` and ``nb_gpu`` in the step resources block, as follows:

.. code-block:: python

    training : {
        name:"training"
        nb_cpu:10
        ram:"92gb"
        walltime:"12:00:00"
        nb_gpu:1 # number of GPUs to use
        queue:"qgpgpu" # queue containing GPUs
    }

Dataloader per batch vs full memory
===================================

Loading the full learning/validation data-set into RAM may significantly decrease the learning time.
However, this is not always possible depending on the amount of RAM available on the processing unit.
In this case the ``stream`` mode must be used: in this mode the database is read in batch-sized chunks.

Managing randomness during the learning step
============================================

Random mixing of data in batches between epochs is crucial to obtain an optimal stochastic gradient descent.
In iota2, randomness is managed at several levels:

- When splitting samples into the Learning, Validation and Test databases. These distributions are made 'polygon wise'.
- When allocating the content of the data batches to feed the model.
- The order of the batches.
- Moreover, at each epoch, the content of each batch is again randomly distributed.

Cost function and gradient optimizer
====================================

Currently, users cannot choose them.

- The cost function used is `torch.nn.functional.cross_entropy `_ with default values.
- The gradient optimizer is `torch.optim.Adam `_ with betas=(0.9, 0.999), eps=1e-08, weight_decay=0 and amsgrad=False.
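For reference, the equivalent PyTorch calls are sketched below; the network, logits and labels are placeholders, and the learning rate is the one chosen through ``hyperparameters_solver``.

.. code-block:: python

    import torch
    import torch.nn.functional as F

    model = torch.nn.Linear(10, 2)                 # placeholder network
    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=1e-5,          # taken from hyperparameters_solver
                                 betas=(0.9, 0.999), eps=1e-08,
                                 weight_decay=0, amsgrad=False)

    logits = model(torch.randn(4, 10))             # placeholder batch of 4 samples
    labels = torch.tensor([0, 1, 0, 1])
    loss = F.cross_entropy(logits, labels)         # default cross-entropy settings
    loss.backward()
    optimizer.step()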
Using statistics to alter incoming data
=======================================

All neural network instances have a `_stats` attribute which provides statistics for each sensor encountered, for instance considering the Sentinel2 data provided by THEIA:

.. code-block:: python

    self._stats = {'sentinel2': {'min': tensor([-0.0100, ..., 0.0000]),
                                 'max': tensor([-0.0100, ..., 0.0000]),
                                 'mean': tensor([-0.01, ..., 0.]),
                                 'var': tensor([-0.01, ..., 0.]),
                                 'quantile_0.1': tensor([-0.01, ..., 0.]),
                                 'quantile_0.5': tensor([-0.01, ..., 0.]),
                                 'quantile_0.95': tensor([-0.01, ..., 0.])}}

Statistics are shaped as (nb_components * nb_dates) and chronologically sorted.
For instance, if we consider 2 Sentinel-2 acquisitions d1 and d2 and all bands available in iota2 (b2, b3, b4, b5, b6, b7, b8, b8A, b11 and b12), then one stat vector can be

.. code-block:: python

    self._stats = {'sentinel2': {'min': tensor([d1_b1, ..., d1_b12, d2_b1, ..., d2_b12]), ...}}

Available sensors in self._stats are sentinel1_desvv, sentinel1_desvh, sentinel1_ascvv, sentinel1_ascvh, sentinel2, sentinel2s2c, sentinel2l3a, landsat8, landsat8old and landsat5old.
Keys for stats are the ones already presented: min, max, mean, var, quantile_0.1, quantile_0.5 (median) and quantile_0.95.
The statistics are automatically computed, except for the quantiles which are only computed if the parameter `additional_statistics_percentage` is set to a value different from `None`.

It is possible to use these statistics to scale data in the `forward` method.
Iota2 provides the method `self.standardize(x, mean, std, self.default_mean, self.default_std)` where `x` is the input data and where `mean` and `std` are the empirical mean and std values for each feature.

In some cases, data may contain NaN values.
These specific values can cause the neural network to crash.
Iota2 offers the possibility to impute such values using `self.nans_imputation(x, mean, self.default_mean)`.
The method replaces NaNs (in x) with consistent values (i.e. using the empirical mean value of the corresponding feature).
Please note that the `x` shape is (batch_size, nb_dates, nb_components) and the `mean` shape is (nb_dates * nb_components).
In some conditions, the empirical statistic values can also contain NaN values.
That is why `self.nans_imputation()` and `self.standardize()` accept default values to replace NaNs in the statistics vector.
The following code block shows how to perform imputation and standardization:

.. code-block:: python

    def _forward(self, x):
        mean = self._stats["sentinel2"]["mean"]
        x = self.nans_imputation(x, mean, self.default_mean)
        std = torch.sqrt(self._stats["sentinel2"]["var"])
        x = self.standardize(x, mean, std, self.default_mean, self.default_std)
        x = F.relu(x)
        x = F.relu(self.bnhidden2(self.hidden1(x)))
        x = F.relu(self.bnout(self.hidden2(x)))
        x = self.output(x)
        return x

About neural network definition
*******************************

The neural model and iota2 interact at two points: when the model is instantiated and when iota2 feeds the model with data.
These two interactions involve calling the ``__init__()`` method and the ``forward()`` method of the neural network object, respectively.
In order for these interactions to proceed correctly, a formalism must be defined.
This section details the expected signatures for these 2 methods.

Constructor parameters
======================

Iota2 will automatically pass a series of parameters to the networks when they are instantiated, so these parameters must exist in the constructor.
As a minimum, all neural networks must have 5 parameters in their initialisation method: `nb_class`, `sensors_information`, `doy_sensors_dic`, `default_std` and `default_mean`, as follows:

.. code-block:: python

    class MyNeuralNetwork(Iota2NeuralNetwork):
        """MyNeuralNetwork class definition."""

        def __init__(
            self,
            nb_class: int,
            sensors_information: dict,
            doy_sensors_dic: Optional[dict] = None,
            default_std: int = 1,
            default_mean: int = 0,
        ):

.. note:: Every neural network must inherit from the Iota2NeuralNetwork base class.

nb_class
--------

is an integer representing the number of classes.

sensors_information
-------------------

is a dictionary where each key is a sensor's name and the value is a dictionary with two keys, `nb_components` and `nb_dates`, which let users know how many features per date exist (total number of sensor features = `nb_components` x `nb_dates`).

doy_sensors_dic
---------------

is an OrderedDict where each key is a sensor's name and the value is a dictionary with two keys, `doy` and `features_per_dates`.
`features_per_dates` is currently redundant with the `nb_components` key of the `sensors_information` dictionary.
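As an illustration, for a hypothetical run with a single Sentinel-2 time series of 3 dates and 10 components per date, these two arguments could look like the following; the values are invented and depend entirely on the sensors and dates of the actual run.

.. code-block:: python

    from collections import OrderedDict

    sensors_information = {"sentinel2": {"nb_components": 10, "nb_dates": 3}}

    doy_sensors_dic = OrderedDict(
        [("sentinel2", {"doy": [12, 42, 73],           # acquisition dates as day of year
                        "features_per_dates": 10})]    # redundant with nb_components above
    )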
default_std and default_mean
----------------------------

They are used as substitution values in the standardization computation, x = (features - mean) / std, when mean or std contain NaNs.

Forward definition
==================

It is via the forward method that iota2 provides the network with data.
The choice has been made to separate the data by sensor, so the forward method has to contain a dedicated parameter for each activated sensor.
A possible definition of the `forward` method could be as follows, if all the sensors in iota2 are used.

.. code-block:: python

    class MyNeuralNetwork(Iota2NeuralNetwork):
        """MyNeuralNetwork class definition."""

        def forward(
            self,
            sentinel1_desvv=None,
            sentinel1_desvh=None,
            sentinel1_ascvv=None,
            sentinel1_ascvh=None,
            sentinel2=None,
            sentinel2s2c=None,
            sentinel2l3a=None,
            landsat8=None,
            landsat8old=None,
            landsat5old=None
        ):

It is also possible to feed the forward method with exogenous data already written to disk (:ref:`userfeatures `) or with external features calculated in a python module.
In the case of userfeatures, a new parameter per userfeature must be added to the forward method.
External features are more versatile, allowing you to add primitives to an existing sensor, or to create new temporal or non-temporal primitives.
These different possibilities are illustrated in the next sections using examples.
For the examples, we will use Sentinel-2 data, 13 features (10 spectral bands + NDVI + NDWI + Brightness) and 3 dates.

Sentinel-2
----------

Let's start with a classic example: using Sentinel-2 data alone, 13 primitives and 3 dates.
The forward method can simply be written as

.. code-block:: python

    class MyNeuralNetwork(Iota2NeuralNetwork):
        """MyNeuralNetwork class definition."""
        ...
        def forward(
            self,
            sentinel2
        ):

In this case, the `sentinel2` tensor will have the shape (B, 3, 13) where B is the size of the data batch.

Combining Sentinel-2 and two user features
------------------------------------------

Here, we want to use Sentinel-2 and exogenous data already written to disk, for example a DEM and temperature data.
We will therefore use the `user_feat_path` field and the `userFeat` section of the configuration file as follows:

.. code-block:: python

    chain:{
        ...
        s2_path : '/path/to/my_tiled/s2_data'
        user_feat_path : '/path/to/my_tiled/exogenous_data'
        ...
    }
    ...
    userFeat:{
        patterns:"dem,temperature"
    }

The DEM is a raster file written to disk which contains 3 bands; the temperature data is also a raster on disk, but with a single band.
The forward method should then have the following definition:

.. code-block:: python

    def forward(
        self,
        sentinel2,
        dem,
        temperature
    ):

.. note:: the names of the `dem` and `temperature` parameters of the forward method must correspond to the patterns field of the `userFeat` section

In this case, the tensor `sentinel2` will still have the shape (B, 3, 13), the tensor `dem` will have the shape (B, 3) and `temperature` (B, 1).

.. warning:: Tensors can have different numbers of dimensions: the time dimension can disappear.
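Inside such a forward method, tensors with different numbers of dimensions typically have to be brought to a common shape before being combined. The fragment below sketches one possible way to do this; the `hidden` and `output` layers are illustrative and not part of the iota2 API.

.. code-block:: python

    import torch

    class MyNeuralNetwork(Iota2NeuralNetwork):
        ...
        def forward(self, sentinel2, dem, temperature):
            # sentinel2 (B, 3, 13): flatten dates and bands into a single feature axis
            s2_flat = sentinel2.reshape(sentinel2.shape[0], -1)        # (B, 39)
            # dem (B, 3) and temperature (B, 1) are already two-dimensional
            features = torch.cat([s2_flat, dem, temperature], dim=1)   # (B, 43)
            return self.output(self.hidden(features))                  # illustrative layers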
Using external features
-----------------------

External features can be used to create new features, which can be concatenated with an existing sensor (if temporal), or be temporal or non-temporal.
If, in addition to Sentinel-2 data (over 3 dates), the user uses 3 python functions to create new primitives as follows:

.. code-block:: python

    from iota2.learning.utils import I2Label, I2TemporalLabel

    def new_s2_features(self):
        coef = self.get_interpolated_Sentinel2_B3() ** 2
        labels = [I2TemporalLabel(sensor_name="sentinel2", feat_name="pow", date=date)
                  for date in self.interpolated_dates["Sentinel2"]]
        return coef, labels

    def new_temporal_features(self):
        coef = self.get_interpolated_Sentinel2_B2() + self.get_interpolated_Sentinel2_B3()
        labels = [I2TemporalLabel(sensor_name="myfeatures", feat_name="add", date=date)
                  for date in self.interpolated_dates["Sentinel2"]]
        return coef, labels

    def new_feature(self):
        coef = self.get_interpolated_Sentinel2_B2()[:, :, 0:1]  # get the first two dates of Sentinel-2 B2
        labels = [I2Label(sensor_name="newfeature", feat_name="b2date1"),
                  I2Label(sensor_name="newfeature", feat_name="b2date2")]
        return coef, labels

then the forward method must have the signature

.. code-block:: python

    def forward(
        self,
        sentinel2,
        myfeatures,
        newfeature
    ):

The `sentinel2` tensor will have the shape (B, 3, 14), `myfeatures` the shape (B, 3, 1) and `newfeature` (B, 2).

.. note:: the sentinel2 tensor gets an extra feature because the function `new_s2_features` adds the feature `pow` to the sensor `sentinel2` for every `sentinel2` date.
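Putting these conventions together, the following is a minimal, illustrative sketch of a custom network for a Sentinel-2 only run, as could be placed in the module pointed to by :ref:`dl_module <desc_dl_module>`. The import path of ``Iota2NeuralNetwork``, the call to ``super().__init__()`` and the layer sizes are assumptions made for the sake of the example.

.. code-block:: python

    from typing import Optional

    import torch
    import torch.nn.functional as F

    # assumed import path; use the torch_nn_bank module mentioned above
    from iota2.learning.pytorch.torch_nn_bank import Iota2NeuralNetwork


    class MySentinel2Classifier(Iota2NeuralNetwork):
        """Minimal illustrative pixel time-series classifier (Sentinel-2 only)."""

        def __init__(
            self,
            nb_class: int,
            sensors_information: dict,
            doy_sensors_dic: Optional[dict] = None,
            default_std: int = 1,
            default_mean: int = 0,
        ):
            # assumption: the base class accepts the same constructor arguments
            super().__init__(nb_class, sensors_information, doy_sensors_dic,
                             default_std, default_mean)
            s2 = sensors_information["sentinel2"]
            nb_features = s2["nb_components"] * s2["nb_dates"]  # total Sentinel-2 features
            self.hidden = torch.nn.Linear(nb_features, 64)      # arbitrary hidden size
            self.output = torch.nn.Linear(64, nb_class)

        def forward(self, sentinel2=None):
            # sentinel2 is shaped (batch_size, nb_dates, nb_components)
            x = sentinel2.reshape(sentinel2.shape[0], -1)
            x = F.relu(self.hidden(x))
            return self.output(x)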