Deep Learning in iota2
Among the list of possible classification algorithms, iota2 also offers the possibility to use deep neural networks. To date, only networks that work on pixel time series can be used (i.e. no spatial/2D convolution). This documentation summarizes the parameters available to users and their meaning through examples. It also discusses the chain outputs and the development choices that have been made.
Parameters involved
All the parameters below must be inside the deep_learning_parameters section, which is itself inside the arg_train section of the iota2 configuration file.
Note
Once the parameter deep_learning_parameters.dl_name is provided, iota2 will try to use the deepLearning workflow
chain:{
    # usual iota2 parameters
}
arg_train:{
    deep_learning_parameters:{
        ...
        # place here deep learning algorithm parameters
        ...
    }
}
Name | Default Value | Description | Type | Mandatory |
---|---|---|---|---|
dl_name | | Available neural network’s architecture (class name), currently : ‘LTAEClassifier’, ‘ANN’, ‘MLPClassifier’ or ‘SimpleSelfAttentionClassifier’ | str | True when using neural networks |
dl_parameters | {} | Set of key/value to create the neural network instance (constructor parameters). | dict | False |
model_selection_criterion | "loss" | Select the model which maximizes one of these metrics computed on the validation set during the training process: "loss", "fscore", "oa", "kappa" | str | False |
epochs | 100 | number of epochs for the learning stage | int | False |
weighted_labels | False | apply weights to samples according to the proportion of each class in the computation of the loss function | bool | False |
num_workers | 1 | how many sub-processes to use for data loading. 0 means that the data will be loaded in the main process. | int | False |
hyperparameters_solver | {"batch_size": [1000], "learning_rate": [0.00001]} | key/value of hyperparameters to use to build models | dict | False |
dl_module | None | path to a user python module containing custom neural networks | str | False |
restart_from_checkpoint | True | if existing, restart learning point from the checkpoint | bool | False |
dataloader_mode | 'stream' | during the learning stage, load the full data-set into memory ('full') or by batch ('stream') | str | False |
enable_early_stop | False | flag to enable early stop during learning phase | bool | False |
epoch_to_trigger | 5 | epoch number after which the monitoring of the metric trend starts | int | False |
early_stop_patience | 10 | number of epochs without improvement after which training will be stopped | int | False |
early_stop_tol | 0.01 | minimum change in the monitored quantity to qualify as an improvement. If metric is 'train_loss' or 'valid_loss' then tol must be in dB as \(dB = \log_{10}(\frac{loss_{N-1}}{loss_{N}})\) with N the current epoch. | float | False |
early_stop_metric | "val_loss" | metric to monitor for early stopping | str | False |
additional_statistics_percentage | None | percentage ]0;1] of samples to use from the incoming database to compute quantiles | float | False |
adaptive_lr | {} | allow the use of adaptive learning rate across epochs | dict | False |
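The dB-style tolerance described for early_stop_tol can be illustrated with a short sketch. The helper name loss_improvement_db is hypothetical (it is not part of iota2), and the strict comparison against the tolerance is an assumption; the sketch only evaluates the formula given in the table.

```python
import math


def loss_improvement_db(previous_loss: float, current_loss: float) -> float:
    """Return log10(loss_{N-1} / loss_N), the quantity compared to early_stop_tol."""
    return math.log10(previous_loss / current_loss)


# A loss going from 0.50 to 0.45 between two consecutive epochs.
gain = loss_improvement_db(0.50, 0.45)
improved = gain > 0.01  # 0.01 is the default early_stop_tol (comparison rule assumed)
print(f"{gain:.4f}", improved)
```

A positive value means the loss decreased; the larger the value, the larger the relative decrease.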
Note
dl_name
Neural network architectures available in iota2 are defined in the python module torch_nn_bank.py
model_selection_criterion
During the learning step, several metrics can be computed on a validation set to evaluate the model. The optimized loss metric quantifies the fit of the model on the training sample, but iota2 also computes metrics such as the OA, Kappa and F1-score on the validation sample.
For each epoch, models maximizing each of these metrics are saved. When the learning phase ends, iota2 will use for the inference the model that maximizes the metric chosen by the user.
weighted_labels
Weights can be assigned to samples according to their class membership when computing the loss function during the learning step. These weights are computed using only the training + validation database and correspond to the inverse of the proportion of each class in that database. For example, if the database contains 2 classes, 1 and 2, with 80% of the samples belonging to class 1 and 20% to class 2, then the weights will be 1.25 for samples from class 1 (1 / 0.8) and 5 for samples from class 2 (1 / 0.2).
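The weighting rule can be reproduced in a few lines. This is a minimal sketch of the inverse-proportion computation described above; the function name inverse_proportion_weights is hypothetical and is not iota2's actual code.

```python
from collections import Counter


def inverse_proportion_weights(labels):
    """Map each class to 1 / (share of that class in `labels`)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / count for cls, count in counts.items()}


# 80% of the samples in class 1, 20% in class 2.
labels = [1] * 80 + [2] * 20
weights = inverse_proportion_weights(labels)
print(weights)
```

Rare classes therefore contribute more to the loss per sample than frequent ones.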
num_workers
During the learning phase, the model is optimized iteratively using stochastic gradient descent. Within each epoch, the model is optimized on successive subsets of the database (i.e., batches). The number of workers corresponds to the number of tasks that prepare the batched data in parallel. Each worker provides the data it has collected to the model and then fetches another batch of data, until the database has been fully read. Therefore, the more workers there are available to read the batches, the faster the model will be optimized. However, it is the user's responsibility to set the number of workers according to the amount of available RAM.
hyperparameters_solver
Hyperparameters are parameters that influence the learning process but cannot be learned. In iota2, it is possible to test various values of 2 hyperparameters within the same run. This is done via a dictionary which contains 2 keys ("batch_size" and "learning_rate"), each associated with a list of values to try. One model is learned per combination of values (the Cartesian product of the two lists), and the best of them is then used for inference (cf. the model_selection_criterion parameter).
For example, if the configuration file contains :
chain:{
    # usual iota2 parameters
}
arg_train:{
    deep_learning_parameters:{
        ...
        hyperparameters_solver : {"batch_size" : [1000],
                                  "learning_rate" : [0.1, 0.00001]}
        ...
    }
}
Then two models will be trained (in parallel if possible): one with a batch size of 1000 and a learning rate of 0.1, and another one with the same batch size but a learning rate of 0.00001.
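The enumeration of hyperparameter pairs can be sketched with itertools.product. This only illustrates how the combinations are counted; the order in which iota2 numbers them (hyp_0, hyp_1, ...) is an assumption here.

```python
from itertools import product

hyperparameters_solver = {
    "batch_size": [1000],
    "learning_rate": [0.1, 0.00001],
}

# One dictionary per combination of the listed values.
keys = list(hyperparameters_solver)
combinations = [
    dict(zip(keys, values))
    for values in product(*(hyperparameters_solver[key] for key in keys))
]

for index, combination in enumerate(combinations):
    print(f"hyp_{index}: {combination}")
```

With 1 batch size and 2 learning rates, 1 x 2 = 2 models are trained.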
dl_module
Users can define their own neural network via this parameter, which should point to a user-provided python module. However, the neural network must be defined as a class derived from Iota2NeuralNetwork, available in the module torch_nn_bank.py.
Currently, iota2 can only perform pixel-wise operations, since the inputs to the model are the spectro-temporal features of pixels. Convolutional layers can be used in the spectral and temporal dimensions, but not in the spatial dimension.
restart_from_checkpoint
The learning phase can be quite long. If the learning stops for some reason, everything learned so far would normally be lost. However, iota2 can restart the learning step from the last learned epoch, since a backup of the model state is made at each epoch.
adaptive_lr
Adaptive learning rate allows the use of ReduceLROnPlateau from PyTorch, which reduces the learning rate when a metric has stopped improving. The metric monitored in iota2 is the validation loss. The adaptive_lr parameter receives a dictionary where each key is the name of a ReduceLROnPlateau parameter and each value is the value of that parameter.
For example, the configuration file can contain:
chain:{
    # usual iota2 parameters
}
arg_train:{
    deep_learning_parameters:{
        ...
        adaptive_lr : {patience: 10,
                       factor: 0.1,
                       threshold: 1e-4,
                       threshold_mode: 'rel',
                       cooldown: 0,
                       min_lr: 0,
                       eps: 1e-8,
                       verbose: False} # these are the default values of each key
        ...
    }
}
The parameters mode and optimizer cannot be set by users: mode is forced to min and the optimizer is the one used during the training step. By default, adaptive_lr is {} and therefore no adaptive learning rate will be used.
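To give an intuition of what a reduce-on-plateau rule does in min mode with a relative threshold, here is a simplified pure-Python imitation. It is not PyTorch's implementation and not iota2 code: the class name PlateauSketch is hypothetical, and the cooldown and eps parameters are deliberately left out.

```python
class PlateauSketch:
    """Simplified, pure-Python imitation of a reduce-on-plateau rule (mode 'min')."""

    def __init__(self, lr, patience=10, factor=0.1, threshold=1e-4, min_lr=0.0):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.threshold = threshold
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, valid_loss):
        # Relative threshold: the loss must drop below best * (1 - threshold).
        if valid_loss < self.best * (1.0 - self.threshold):
            self.best = valid_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Too many epochs without improvement: shrink the learning rate.
        if self.bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.bad_epochs = 0
        return self.lr


scheduler = PlateauSketch(lr=0.001, patience=2)
history = [scheduler.step(loss) for loss in [1.0, 0.8, 0.8, 0.8, 0.8, 0.8]]
print(history)
```

In this run the validation loss stalls at 0.8, so the learning rate is multiplied by factor once the number of stalled epochs exceeds patience.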
Expected output descriptions
Models
One model per hyperparameter pair
Each pair of hyperparameters produces a model file in the model output directory. For example, for the first possible pair of hyperparameters, the file model_1_seed_0_hyp_0.txt will be produced. The name of the file is defined as:
model_1 : the model for the region 1
seed_0 : for the first random split
hyp_0 : for the first hyperparameter pair
Selected model
Once all the models per pair of hyperparameters are learned, the one that provides the best result according to the selection criterion (cf. model_selection_criterion) is selected. The result is stored on disk as an object serialized with pickle: model_1_seed_0.txt. It will be used later in the inference phase.
Checkpoints
After each epoch, all the information needed to restart from the current epoch is stored as a serialized object in a file in the model directory. For example, for the model model_1_seed_0_hyp_0.txt the equivalent checkpoint file would be model_1_seed_0_hyp_0_checkpoint.txt.
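The naming scheme of the model, checkpoint and selected-model files can be summarized with a small helper. The function model_file_names is hypothetical and only rebuilds the names quoted above; it is not part of iota2.

```python
def model_file_names(region: str, seed: int, hyp: int) -> dict:
    """Build the file names described above for one region, seed and hyperparameter pair."""
    stem = f"model_{region}_seed_{seed}_hyp_{hyp}"
    return {
        "model": f"{stem}.txt",
        "checkpoint": f"{stem}_checkpoint.txt",
        "selected": f"model_{region}_seed_{seed}.txt",
    }


names = model_file_names(region="1", seed=0, hyp=0)
print(names["model"], names["checkpoint"], names["selected"])
```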
Plots
To visualize the evolution of the model loss function, 2 figures are generated for each model, next to the models. For example, for the model model_1_seed_0_hyp_0.txt:
model_1_seed_0_hyp_0_loss.png : shows the evolution of the learning and validation loss (dB) over the epochs
model_1_seed_0_hyp_0_confusion_metrics.png : shows the evolution of the Kappa, OA, Precision and the Recall over the epochs.
Classifications
The classification maps are stored in a conventional tif format in the classif output directory. First, chunks of tiles are classified, then all these pieces are merged to form one tile.
iota2 internal choices
Pytorch
We have decided to use pytorch to implement deep neural networks in iota2.
Classification by chunks
As mentioned above, deep learning classifications are done in chunks, to fit RAM constraints. In this workflow, iota2 works with numpy arrays that need to be stored temporarily in RAM, but few machines have enough RAM to hold a whole Sentinel-2 tile with many acquisition dates at the same time. This is why we work in chunks to make the predictions and then merge these predictions.
The size of the chunks can be set via the number_of_chunks parameter in the python_data_managing parameter block.
The shape of the tensor of data
Every model will be fed with a tensor of data shaped as (batch_size, nb_dates, nb_bands)
Learning vs validation vs test
In a conventional deep learning approach, the initial database is split into 3 distinct data-sets:
Learning : used to train the model (let’s call it L).
Validation : allows, during the learning process, to observe the behavior of the model (convergence, over-fitting etc.), let’s call this V. These observations may, for example, allow a readjustment of some hyperparameters.
Test : allows the performance of the model to be validated on a larger database than the validation database, let’s call this T.
How are the samples distributed in these three databases?
Initially, the parameter ratio allows us to build a database that will contain L + V on one side and T on the other side. Then, at the time of training, 80% of L + V is used to build the model (L) and 20% to build the validation database (V).
For example, if the configuration file contains :
chain:{
    ...
    ratio : 0.7
    ...
}
Then 30% of the database will go into the test database and 70% will be set aside to build the learning and validation databases. Then 80% of the 70% will be used to build the training database and 20% of the 70% for validation. In iota2, these splits are made ‘polygon wise’ i.e. a polygon is placed in one of the databases in its entirety and cannot be found in another database (unless the class is represented by a single polygon).
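The arithmetic of this two-stage split can be checked with a short sketch. The helper split_fractions is hypothetical; it only reproduces the percentages stated above (in practice the split is polygon-wise, so the actual sample shares are approximate).

```python
def split_fractions(ratio: float, learning_share: float = 0.8) -> dict:
    """Return the share of the reference data ending up in L, V and T."""
    learn_and_valid = ratio
    return {
        "learning": learn_and_valid * learning_share,
        "validation": learn_and_valid * (1.0 - learning_share),
        "test": 1.0 - ratio,
    }


fractions = split_fractions(ratio=0.7)
for name, share in fractions.items():
    print(f"{name}: {share:.0%}")
```

With ratio set to 0.7, about 56% of the reference data is used for learning, 14% for validation and 30% for testing.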
Inheritance of iota2 neural network
As already explained for the parameter dl_module, all classes implementing neural networks in iota2 must derive from the class Iota2NeuralNetwork defined in the module torch_nn_bank.py. This class allows insertion into the iota2 workflow.
Database format
The input database used in the learning phase is stored in the NETCDF format. It is located in the learningSamples directory under the name Samples_region_1_seed0_learn.nc (for model 1 representing region 1) and contains both the learning and validation databases. However, the user does not need to do anything special, since this file is generated by iota2 from the user-provided reference data, which is the same as for other classifiers.
GPU vs CPU
GPUs are automatically detected by Pytorch. When a GPU is detected, learning and inference will use it.
How to use GPUs ?
As mentioned above, if a GPU is detected, computations and data are transferred to it. However, when scheduler_type is set to PBS, each task is spawned on a dedicated node, which means that iota2 must allocate a GPU before sending tasks to it. This allocation can be done by specifying a dedicated queue and nb_gpu in the step resources block, as follows:
training : {
    name:"training"
    nb_cpu:10
    ram:"92gb"
    walltime:"12:00:00"
    nb_gpu:1 # number of GPUs to use
    queue:"qgpgpu" # queue containing GPUs
}
Dataloader per batch vs full memory
Loading the full learning/validation data-set into RAM may significantly decrease the learning time. However, this is not always possible, depending on the amount of RAM available on the processing unit. In this case the stream mode must be used. In this mode the database is read in batch-sized chunks.
Managing randomness during the learning step
Random mixing of data in batches between epochs is crucial to obtain an optimal stochastic gradient descent. In iota2, randomness is managed at several levels:
When splitting samples into the Learning, Validation and Test databases. These distributions are made ‘polygon wise’.
When allocating the content of the data batches to feed the model.
When ordering the batches.
Moreover, at each epoch, the content of each batch is again randomly distributed.
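The effect of reshuffling at each epoch can be sketched with the standard library. This is only an illustration of the principle: the function epoch_batches is hypothetical and iota2's data loaders do not necessarily work this way.

```python
import random


def epoch_batches(nb_samples: int, batch_size: int, rng: random.Random):
    """Shuffle the sample indices, then cut them into consecutive batches."""
    indices = list(range(nb_samples))
    rng.shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, nb_samples, batch_size)]


rng = random.Random(0)  # fixed seed so that the run is reproducible
for epoch in range(2):
    batches = epoch_batches(nb_samples=10, batch_size=4, rng=rng)
    print(epoch, batches)
```

Every sample is seen exactly once per epoch, but both the content and the order of the batches change from one epoch to the next.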
Cost function and gradient optimizer
Currently, users can’t choose them.
The cost function used is torch.nn.functional.cross_entropy with default values.
The gradient optimizer is torch.optim.Adam with betas=(0.9, 0.999), eps=1e-08, weight_decay=0 and amsgrad=False.
Using statistics to alter incoming data
All neural network instances have a _stats attribute which provides statistics for each sensor encountered, for instance considering the Sentinel2 data provided by THEIA :
self._stats = {'sentinel2': {'min': tensor([-0.0100, ..., 0.0000]),
                             'max': tensor([-0.0100, ..., 0.0000]),
                             'mean': tensor([-0.01, ..., 0.]),
                             'var': tensor([-0.01, ..., 0.]),
                             'quantile_0.1': tensor([-0.01, ..., 0.]),
                             'quantile_0.5': tensor([-0.01, ..., 0.]),
                             'quantile_0.95': tensor([-0.01, ..., 0.])}}
Statistics are shaped as (nb_components * nb_dates) and chronologically sorted. For instance, if we consider 2 Sentinel-2 acquisitions d1 and d2 and all bands available in iota2 (b2, b3, b4, b5, b6, b7, b8, b8A, b11 and b12), then one stat vector can be
self._stats = {'sentinel2': {'min': tensor([d1_b2, ..., d1_b12, d2_b2, ..., d2_b12]),...}}
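The position of a given (date, band) pair in such a flattened vector follows from the date-major ordering shown in the example. The sketch below assumes that only the 10 spectral bands are present (no spectral indices); BANDS and stat_index are hypothetical names, not iota2 identifiers.

```python
BANDS = ["b2", "b3", "b4", "b5", "b6", "b7", "b8", "b8A", "b11", "b12"]


def stat_index(date_index: int, band: str, bands=BANDS) -> int:
    """Position of (date, band) in a date-major flattened statistics vector."""
    return date_index * len(bands) + bands.index(band)


# Two dates (d1 -> 0, d2 -> 1) and ten bands give a vector of length 20.
print(stat_index(0, "b2"), stat_index(0, "b12"), stat_index(1, "b2"), stat_index(1, "b12"))
```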
Available sensors in self._stats are sentinel1_desvv, sentinel1_desvh, sentinel1_ascvv, sentinel1_ascvh, sentinel2, sentinel2s2c, sentinel2l3a, landsat8, landsat8old and landsat5old. Keys for stats are the ones already presented : min, max, mean, var, quantile_0.1, quantile_0.5 (median) and quantile_0.95. The statistics are automatically computed except for the quantiles which are only computed if the parameter additional_statistics_percentage is set to a value different from None. It is possible to use these statistics to scale data in the forward method. Iota2 provides the method self.standardize(x, mean, std, self.default_mean, self.default_std) where x is the input data and where mean and std are the empirical mean and std values for each feature.
In some cases, data may contain NaN values. These specific values can cause the neural network to crash. Iota2 offers the possibility to impute such values using self.nans_imputation(x, mean, self.default_mean). The method replaces NaNs (in x) with consistent values (ie: using the empirical mean value of the corresponding feature). Please note that the x shape is (batch_size, nb_dates, nb_components) and mean shape is (nb_dates * nb_components).
In some conditions, the empirical statistics can also contain NaN values. That’s why self.nans_imputation() and self.standardize() accept default values to replace NaNs in the statistics vector.
The following code block shows how to perform imputation and standardization:
import torch
import torch.nn.functional as F

def _forward(self, x):
    # replace NaNs in x with the empirical mean of each feature
    mean = self._stats["sentinel2"]["mean"]
    x = self.nans_imputation(x, mean, self.default_mean)
    # standardize with the empirical mean and standard deviation
    std = torch.sqrt(self._stats["sentinel2"]["var"])
    x = self.standardize(x, mean, std, self.default_mean, self.default_std)
    x = F.relu(x)
    x = F.relu(self.bnhidden2(self.hidden1(x)))
    x = F.relu(self.bnout(self.hidden2(x)))
    x = self.output(x)
    return x
About neural network definition
The neural model and iota2 interact at two points: when the model is instantiated and when iota2 feeds the model with data. These two interactions involve calling the __init__() method and the forward method of the neural network object respectively. In order for these interactions to proceed correctly, a formalism must be defined. This section details the expected signatures for these 2 methods.
Constructor parameters
Iota2 will automatically pass a series of parameters to the networks when they are instantiated, so they must exist in the constructor. As a minimum, all neural networks must have 5 parameters in their initialisation method: nb_class, sensors_information, doy_sensors_dic, default_std and default_mean as follows :
from typing import Optional

class MyNeuralNetwork(Iota2NeuralNetwork):
    """MyNeuralNetwork class definition."""
    def __init__(
        self,
        nb_class: int,
        sensors_information: dict,
        doy_sensors_dic: Optional[dict] = None,
        default_std: int = 1,
        default_mean: int = 0,
    ):
        ...
Note
Every neural network must inherit from Iota2NeuralNetwork base class.
nb_class
is an integer representing the number of classes.
sensors_information
is a dictionary where each key is a sensor’s name and each value is a dictionary with two keys, nb_components and nb_dates, which tell users how many features exist per date (total number of features for a sensor = nb_components x nb_dates).
doy_sensors_dic
is an OrderedDict where each key is a sensor’s name and each value is a dictionary with two keys, doy and features_per_dates. features_per_dates is currently redundant with the nb_components key of the sensors_information dictionary.
default_std and default_mean
are used as substitution values in the standardization computation x = (features - mean) / std when mean or std are NaNs.
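The role of the default values can be illustrated on scalars. This is a minimal sketch of the substitution rule as described above, assuming that a NaN mean is replaced by default_mean and a NaN std by default_std; the function standardize_value is hypothetical and is not the iota2 standardize method, which works on tensors.

```python
import math


def standardize_value(feature, mean, std, default_mean=0.0, default_std=1.0):
    """Compute (feature - mean) / std, replacing NaN statistics by the defaults."""
    if math.isnan(mean):
        mean = default_mean
    if math.isnan(std):
        std = default_std
    return (feature - mean) / std


print(standardize_value(0.30, mean=0.10, std=0.05))
print(standardize_value(0.30, mean=float("nan"), std=float("nan")))
```

With the defaults 0 and 1, a feature whose statistics are unavailable is passed through unchanged.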
Forward definition
It is via the forward method that iota2 will provide the network with data. The choice has been made to separate the data by sensor, so the forward method will have to contain a dedicated parameter for each activated sensor. A possible definition of the forward method could be as follows, if all the sensors in iota2 are used.
class MyNeuralNetwork(Iota2NeuralNetwork):
    """MyNeuralNetwork class definition."""
    def forward(
        self,
        sentinel1_desvv=None,
        sentinel1_desvh=None,
        sentinel1_ascvv=None,
        sentinel1_ascvh=None,
        sentinel2=None,
        sentinel2s2c=None,
        sentinel2l3a=None,
        landsat8=None,
        landsat8old=None,
        landsat5old=None
    ):
        ...
It is also possible to feed the forward with exogenous data already written to disk userfeatures or with external features calculated in a python module. In the case of userfeatures, a new parameter per userfeature must be added to the forward. External features are more versatile, allowing you to add primitives to an existing sensor, or create new temporal or non-temporal primitives. These different possibilities are illustrated in the next section using examples.
For the examples, we will use Sentinel-2 data, 13 features (10 spectral bands + NDVI + NDWI + Brightness) and 3 dates.
Sentinel-2
Let’s start with a classic example: using Sentinel-2 data alone: 13 primitives and 3 dates.
The forward can simply be written as
class MyNeuralNetwork(Iota2NeuralNetwork):
    """MyNeuralNetwork class definition."""
    ...
    def forward(
        self,
        sentinel2
    ):
        ...
In this case, the sentinel2 tensor will have the shape (B, 3, 13) where B is the size of the data batch.
Combining Sentinel-2 and two user features
Here, we want to use Sentinel-2 and exogenous data already written to disk, for example a DEM and temperature data. We will therefore use the user_feat_path field and the userFeat section of the configuration file as follows
chain:{
    ...
    s2_path : '/path/to/my_tiled/s2_data'
    user_feat_path : '/path/to/my_tiled/exogenous_data'
    ...
}
...
userFeat:{
    patterns:"dem,temperature"
}
The dem is a raster file written to disk and contains 3 bands, the temperature data is also a raster on disk but with a single band. The forward should then have the following definition:
def forward(
    self,
    sentinel2,
    dem,
    temperature
):
    ...
Note
The names of the dem and temperature parameters of the forward method must match the patterns field in the userFeat section.
In this case, the tensor sentinel2 will still have the shape (B, 3, 13), the tensor dem will have the shape (B, 3) and temperature (B, 1).
Warning
Tensors can have different numbers of dimensions: for non-temporal data, the time dimension disappears.
Using external features
External features can be used to create new features. These can be temporal features concatenated with those of an existing sensor, temporal features forming a new input of their own, or non-temporal features.
If, in addition to Sentinel-2 data (over 3 dates), the user uses 3 python functions to create new primitives as follows
from iota2.learning.utils import I2Label, I2TemporalLabel

def new_s2_features(self):
    # temporal feature appended to the existing sentinel2 sensor
    coef = self.get_interpolated_Sentinel2_B3() ** 2
    labels = [I2TemporalLabel(sensor_name="sentinel2", feat_name="pow", date=date)
              for date in self.interpolated_dates["Sentinel2"]]
    return coef, labels

def new_temporal_features(self):
    # temporal feature exposed as a new input named "myfeatures"
    coef = self.get_interpolated_Sentinel2_B2() + self.get_interpolated_Sentinel2_B3()
    labels = [I2TemporalLabel(sensor_name="myfeatures", feat_name="add", date=date)
              for date in self.interpolated_dates["Sentinel2"]]
    return coef, labels

def new_feature(self):
    # non-temporal features exposed as a new input named "newfeature"
    coef = self.get_interpolated_Sentinel2_B2()[:, :, 0:2]  # get the first two dates of Sentinel-2 B2.
    labels = [I2Label(sensor_name="newfeature", feat_name="b2date1"),
              I2Label(sensor_name="newfeature", feat_name="b2date2")]
    return coef, labels
then the forward method must have the signature
def forward(
    self,
    sentinel2,
    myfeatures,
    newfeature
):
    ...
The sentinel2 tensor will have the shape (B, 3, 14), myfeatures the shape (B, 3, 1) and newfeature the shape (B, 2).
Note
The sentinel2 tensor gets an extra feature because the function new_s2_features adds the feature pow to the sensor sentinel2 for every sentinel2 date.
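If a network flattens and concatenates these three inputs before a fully connected layer (one possible design, not something iota2 requires), the size of that layer's input follows directly from the shapes above. The sketch below only does this bookkeeping; the dictionary shapes is illustrative and leaves out the batch dimension B.

```python
from math import prod

# Per-sample shapes (batch dimension B left out) from the example above.
shapes = {
    "sentinel2": (3, 14),   # 3 dates x (13 features + "pow")
    "myfeatures": (3, 1),   # 3 dates x 1 feature
    "newfeature": (2,),     # 2 non-temporal features
}

flat_sizes = {name: prod(shape) for name, shape in shapes.items()}
input_size = sum(flat_sizes.values())
print(flat_sizes, input_size)
```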