iota2 classification
####################

Assuming ``iota2`` is :doc:`fully installed `, this chapter presents the main usage of ``iota2``: the production of a land cover map from a time series of satellite images.

Introduction to data
********************

``iota2`` handles images from several sensors:

* Landsat 5 and 8 (old and new THEIA formats)
* Sentinel-1, Sentinel-2 L2A (THEIA and Sen2Cor formats), Sentinel-2 L3A (THEIA format)
* various other already-processed images, via the ``userFeat`` sensor

In this chapter, only the use of Sentinel-2 L2A is illustrated. To use another sensor, adapt the input parameters according to the :doc:`parameters descriptions `.

``iota2`` uses machine learning algorithms to produce land cover maps. Among other inputs, it requires images and the related reference data.

Get the data-set
================

.. include:: i2_tutorial_dataset_description.rst

.. Warning:: Each class must be represented in ``colorFile.txt`` and ``nomenclature.txt``

Understand the configuration file
=================================

``iota2`` is configured through several parameters; some of them are specific to ``iota2``, while others belong to other libraries such as ``scikit-learn`` or ``OTB``. These parameters select the operations to be carried out and control how they behave. A documentation of all these parameters is provided :doc:`here `.

The user defines these parameters in a configuration file, a human-readable text file that ``iota2`` reads at startup. The file is structured into sections, each containing several fields. A section groups fields with a similar purpose: for instance, the section `chain` contains general information such as the input data and the output path, while the section `arg_train` contains the classifier's parameters.

The minimal configuration file contains all the fields required to produce a land cover map.

.. include:: examples/config_tutorial_classification.cfg
   :literal:

For an end user, launching ``iota2`` requires correctly filling in the configuration file. In the above example, replace the ``XXXX`` by the path where the archive has been extracted.
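To make the section/field structure explicit, the sketch below shows the general shape of such a file. It is an illustration only: the values are placeholders, and the field names shown (``list_tile``, ``ground_truth``, ``data_field``, ``classifier``, ...) are typical examples rather than an exhaustive list; the included file above and the parameter documentation remain the reference.

.. code-block:: text

    # a configuration file is a list of sections, each holding fields
    chain :
    {
        # the `chain` section gathers general information (illustrative values)
        output_path : "/XXXX/IOTA2_TESTS_DATA/IOTA2_Outputs/Results"
        list_tile : "T31TCJ"
        s2_path : "/XXXX/IOTA2_TESTS_DATA/sensor_data"
        ground_truth : "/XXXX/IOTA2_TESTS_DATA/vector_data/groundTruth.shp"
        data_field : "code"
    }
    arg_train :
    {
        # the `arg_train` section configures the classifier
        classifier : "rf"
    }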
Running the chain
=================

iota2 launch
************

The chain is launched with the following command line:

.. code-block:: console

    Iota2.py -config /XXXX/IOTA2_TESTS_DATA/config_tuto_classif.cfg -scheduler_type localCluster

First, the chain displays the list of all the steps activated by the configuration file.

.. include:: examples/steps_classification.txt
   :literal:

Once the processing starts, a large amount of information is printed, most of it concerning the dask scheduler.

.. _logs:

Did it all go well?
===================

``iota2`` comes with a logging system. Each step has its own log folder, available in the ``output_path/logs`` directory (see `logs` in :ref:`output-tree-structure`). In these directories, two kinds of log files can be found: the errors are compiled in the ``*.err`` files and the standard output in the ``*.out`` files. With the dask scheduler, ``iota2`` goes as far as possible, as long as the data required by the next steps is available.

To simplify error identification, an interactive graph is produced in an HTML page: open the ``index.html`` file in the ``html`` folder. Nodes in the graph can have three colors (red: error, blue: done, orange: not yielded). Clicking on a graph node opens the corresponding log file.
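For a quick triage directly from a shell, the log directory can also be searched by hand. The commands below are a sketch, assuming a POSIX shell and the output path used in this tutorial:

.. code-block:: console

    # list the error logs that actually contain messages
    find /XXXX/IOTA2_TEST_S2/IOTA2_Outputs/Results/logs -name "*.err" -size +0

    # then inspect the end of a suspicious one
    tail -n 50 /XXXX/IOTA2_TEST_S2/IOTA2_Outputs/Results/logs/classification/classification_T31TCJ_model_1_seed_0.err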
If, despite all this information, the errors cannot be identified or solved, the ``iota2`` team can help. The simplest way to ask for help is to create an issue on `framagit `_, attaching the ``logs.zip`` archive.

.. _output-tree-structure:

Output tree structure
=====================

In this section, the ``iota2`` outputs available after a successful run are described.

.. raw:: html
   :file: interactive-tree-root.html

.. container:: interactive-tree-source

   * /XXXX/IOTA2_TEST_S2/IOTA2_Outputs/Results output folder
     | output folder defined in config file `output_path`

     * ! classif per tile classification maps
       | Contains classification maps, for each tile and each region. They will be merged in the ``final`` directory.

       * Classif_T31TCJ_model_1_seed_0.tif
       * ! MASK

         * MASK_region_1_T31TCJ.tif

       * T31TCJ_model_1_confidence_seed_0.tif
       * tmpClassif

     * ! config_model empty

       * (empty)

     * ! dataAppVal desc
       | Shapefiles obtained after splitting the reference data between learning and validation sets according to a ratio.

       * ! bymodels

         * (empty)

       * T31TCJ_seed_0_learn.sqlite
       * T31TCJ_seed_0_val.sqlite

     * ! dataRegion vector data split by region
       | When using eco-climatic regions, contains the vector data split by region.

       * (empty)

     * ! dimRed desc
       | Contains features after dimensionality reduction. Empty if not activated.

       * (empty)

     * ! envelope shapefiles
       | Contains shapefiles, one for each tile.
       | Used to ensure tile priority, with no overlap.

       * T31TCJ.dbf
       * T31TCJ.prj
       * T31TCJ.shp
       * T31TCJ.shx

     * ! features useful information
       | For each tile, contains useful information.

       * T31TCJ

         * ! tmp temporary folder
           | folder created temporarily during the chain execution

           * MaskCommunSL.dbf
           * MaskCommunSL.prj
           * MaskCommunSL.shp common scene
             | the common scene of all sensors for this tile
           * MaskCommunSL.shx
           * MaskCommunSL.tif
           * Sentinel2L3A_T31TCJ_reference.tif reference image
             | the image, generated by ``iota2``, used for reprojecting data
           * Sentinel2L3A_T31TCJ_input_dates.txt list of dates
             | the list of dates detected in ``s2_path`` for the current tile
           * Sentinel2_T31TCJ_interpolation_dates.txt
           * CloudThreshold_0.dbf
           * CloudThreshold_0.prj
           * CloudThreshold_0.shp database used as mask
             | This database is used to mask training polygons according to a number of clear dates. See the :ref:`cloud_threshold` parameter.
           * CloudThreshold_0.shx
           * nbView.tif number of visits
             | number of times a pixel is seen in the whole time series (i.e., excluding clouds, shadows, saturation and no-data)

     * final final products
       | This folder contains the final products of ``iota2``.
       | All final products are generated in the ``final`` directory;
       | see :ref:`final-products` for details.

       * ! simplification

         * mosaic
         * tiles
         * tmp
         * vectors

       * ! TMP

         * ClassificationResults_seed_0.txt
         * Classif_Seed_0.csv
         * T31TCJ_Cloud.tif
         * T31TCJ_GlobalConfidence_seed_0.tif
         * T31TCJ_seed_0_CompRef.tif
         * T31TCJ_seed_0.csv
         * T31TCJ_seed_0.tif

       * Classif_Seed_0_ColorIndexed.tif
       * Classif_Seed_0.tif
       * Confidence_Seed_0.tif
       * Confusion_Matrix_Classif_Seed_0.png
       * diff_seed_0.tif
       * PixelsValidity.tif
       * RESULTS.txt
       * vectors

     * ! formattingVectors learning samples
       | The learning samples contained in each tile.
       | Shapefiles in which pixel values from the time series have been extracted.

       * ! T31TCJ temporary directory
         | This is a temporary working directory; intermediate files are (re)moved after step completion.

         * (empty)

       * T31TCJ.cpg
       * T31TCJ.dbf
       * T31TCJ.prj
       * T31TCJ.shp
       * T31TCJ.shx

     * ! learningSamples learning samples
       | SQLite files containing learning samples by region.
       | Also contains a CSV file with statistics about sample balance for each seed. See :ref:`tracing back samples ` to generate this file manually.

       * class_statistics_seed0_learn.csv
       * Samples_region_1_seed0_learn.sqlite
       * T31TCJ_region_1_seed0_Samples_learn.sqlite

     * ! logs logs
       | Output logs of ``iota2``. See the :ref:`logs` section for details.

       * classification

         * classification_T31TCJ_model_1_seed_0.err
         * classification_T31TCJ_model_1_seed_0.out

       * CommonMasks

         * common_mask_T31TCJ.err
         * common_mask_T31TCJ.out

       * confusionCmd

         * confusion_T31TCJ_seed_0.err
         * confusion_T31TCJ_seed_0.out

       * confusionsMerge

         * merge_confusions.err
         * merge_confusions.out

       * Envelope

         * tiles_envelopes.err
         * tiles_envelopes.out

       * genRegionVector

         * region_generation.err
         * region_generation.out

       * ! html

         * configuration_file.html
         * environment_info.html
         * genindex.html
         * index.html
         * input_files_content.html
         * objects.inv
         * output_path_content.html
         * s2_path_content.html
         * search.html
         * searchindex.js
         * source

           * classification_T31TCJ_model_1_seed_0.out
           * common_mask_T31TCJ.out
           * configuration_file.rst
           * confusion_T31TCJ_seed_0.out
           * environment_info.rst
           * extraction_T31TCJ.out
           * final_report.out
           * index.rst
           * input_files_content.rst
           * learning_model_1_seed_0.out
           * merge_confusions.out
           * merge_model_1_seed_0_usually.out
           * merge_samples_T31TCJ.out
           * mosaic.out
           * output_path_content.rst
           * preprocessing_T31TCJ.out
           * region_generation.out
           * s2_path_content.rst
           * s_sel_model_1_seed_0.out
           * stats_1_S_0_T_T31TCJ.out
           * tasks_status_1.rst
           * tasks_status_2.rst
           * tiles_envelopes.out
           * validity_raster_T31TCJ.out
           * vector_form_T31TCJ.out

         * _sources

           * configuration_file.rst.txt
           * environment_info.rst.txt
           * index.rst.txt
           * input_files_content.rst.txt
           * output_path_content.rst.txt
           * s2_path_content.rst.txt
           * tasks_status_1.rst.txt
           * tasks_status_2.rst.txt

         * _static

           * basic.css
           * css

             * badge_only.css
             * fonts

               * fontawesome-webfont.eot
               * fontawesome-webfont.svg
               * fontawesome-webfont.ttf
               * fontawesome-webfont.woff
               * fontawesome-webfont.woff2
               * lato-bold-italic.woff
               * lato-bold-italic.woff2
               * lato-bold.woff
               * lato-bold.woff2
               * lato-normal-italic.woff
               * lato-normal-italic.woff2
               * lato-normal.woff
               * lato-normal.woff2
               * Roboto-Slab-Bold.woff
               * Roboto-Slab-Bold.woff2
               * Roboto-Slab-Regular.woff
               * Roboto-Slab-Regular.woff2

             * theme.css

           * doctools.js
           * documentation_options.js
           * file.png
           * jquery-3.5.1.js
           * jquery.js
           * js

             * badge_only.js
             * html5shiv.min.js
             * html5shiv-printshiv.min.js
             * theme.js

           * language_data.js
           * minus.png
           * plus.png
           * pygments.css
           * searchtools.js
           * underscore-1.3.1.js
           * underscore.js

         * tasks_status_1.html
         * tasks_status_2.html

       * learnModel

         * learning_model_1_seed_0.err
         * learning_model_1_seed_0.out

       * mosaic

         * mosaic.err
         * mosaic.out

       * PixelValidity

         * validity_raster_T31TCJ.err
         * validity_raster_T31TCJ.out

       * reportGeneration

         * final_report.err
         * final_report.out

       * samplesByModels

         * merge_model_1_seed_0_usually.err
         * merge_model_1_seed_0_usually.out

       * samplesByTiles

         * merge_samples_T31TCJ.err
         * merge_samples_T31TCJ.out

       * samplesExtraction

         * extraction_T31TCJ.err
         * extraction_T31TCJ.out

       * samplesMerge

         * merge_model_1_seed_0.err
         * merge_model_1_seed_0.out

       * samplingLearningPolygons

         * s_sel_model_1_seed_0.err
         * s_sel_model_1_seed_0.out

       * sensorsPreprocess

         * preprocessing_T31TCJ.err
         * preprocessing_T31TCJ.out

       * statsSamplesModel

         * stats_1_S_0_T_T31TCJ.err
         * stats_1_S_0_T_T31TCJ.out

       * tasks_status_1.svg
       * tasks_status_2.svg
       * VectorFormatting

         * vector_form_T31TCJ.err
         * vector_form_T31TCJ.out

     * ! model desc
       | The learned models.

       * model_1_seed_0.txt

     * ! samplesSelection shapefiles
       | Shapefiles containing the points (or pixel coordinates) selected for the training stage.
       | Also contains a CSV summary of the actual number of samples per class.

       * samples_region_1_seed_0.dbf
       * samples_region_1_seed_0_outrates.csv
       * samples_region_1_seed_0.prj
       * samples_region_1_seed_0_selection.sqlite
       * samples_region_1_seed_0.shp
       * samples_region_1_seed_0.shx
       * samples_region_1_seed_0.xml
       * T31TCJ_region_1_seed_0_stats.xml
       * T31TCJ_samples_region_1_seed_0_selection.sqlite
       * T31TCJ_selection_merge.sqlite

     * ! shapeRegion desc
       | Shapefiles indicating the intersection between tiles and regions.

       * MyRegion_region_1_T31TCJ.dbf
       * MyRegion_region_1_T31TCJ.prj
       * MyRegion_region_1_T31TCJ.shp
       * MyRegion_region_1_T31TCJ.shx
       * MyRegion_region_1_T31TCJ.tif

     * ! stats statistics
       | Optional XML statistics used to standardize the data before learning (SVM, ...).

       * (empty)

     * IOTA2_tasks_status.txt internal execution status
       | ``iota2`` keeps track of its execution using this *pickle* file (not text), so that it can restart from the state where it stopped.
     * logs.zip logs archive
     * MyRegion.dbf
     * MyRegion.prj
     * MyRegion.shp fake region
       | When no eco-climatic region is defined for the learning step, ``iota2`` creates this fake file with a single region.
     * MyRegion.shx
     * reference_data.dbf
     * reference_data.prj
     * reference_data.shp reencoded shapefile
       | As OTB expects classes to be encoded as consecutive integers, which is not necessarily the case of user labels, this shapefile contains the user data with reencoded labels.
     * reference_data.shx
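Most of these outputs are standard formats (GeoTIFF rasters, shapefile or SQLite vectors), so they can be examined with the usual GDAL/OGR command-line tools, available alongside OTB. For example, with the tutorial layout above:

.. code-block:: console

    # raster summary: size, projection and per-band statistics
    gdalinfo -stats /XXXX/IOTA2_TEST_S2/IOTA2_Outputs/Results/final/Classif_Seed_0.tif

    # vector summary: layers, feature count and attribute fields of the learning set
    ogrinfo -so -al /XXXX/IOTA2_TEST_S2/IOTA2_Outputs/Results/dataAppVal/T31TCJ_seed_0_learn.sqlite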
.. _final-products:

Final products
==============

All final products are generated in the ``final`` directory.

Land cover map
--------------

Your *Classif_Seed_0_ColorIndexed.tif* should look like this one:

.. figure:: ./Images/classif_Example.jpg
   :scale: 15 %
   :align: center
   :alt: classification map

   Classif_Seed_0.tif Example

This map contains the labels from the shapefile ``groundTruth.shp``. As you can see, the classification quality is rather low; a possible explanation is the low number of dates used to produce it. A raster called ``PixelsValidity.tif`` gives the number of dates for which each pixel is clear (no cloud, cloud shadow or saturation).

.. figure:: ./Images/PixVal_Example.png
   :scale: 50 %
   :align: center
   :alt: validity map

   PixelsValidity.tif Example

As only two dates are used to produce the classification map, pixel values are in the [0; 2] range.

``iota2`` also provides a confidence map, ``Confidence_Seed_0.tif``, which helps to better understand the resulting classification. For each pixel, this map gives a score between 0 and 100, where 0 and 100 mean that the membership probability provided by the classifier is 0 and 1, respectively. This is not a validation, just an estimate of the confidence in the classifier's decision.

.. figure:: ./Images/confidence_example.jpg
   :scale: 63 %
   :align: center
   :alt: confidence map

   Confidence_Seed_0.tif Example

These three maps form ``iota2``'s main outputs: they are the minimum outputs required to analyse and understand the results.

We have produced and analyzed classifications thanks to ``iota2``. The main objective is to obtain the best possible land cover map, and there are many ways to pursue it: researchers publish new methods every day. The simplest improvements consist in using a longer time series, improving the reference data used for training, etc.

Measuring quality
-----------------

.. figure:: ./Images/Confusion_Matrix_Classif_Seed_0.jpeg
   :scale: 30 %
   :align: center
   :alt: confusion matrix

   Confusion_Matrix_Classif_Seed_0.png Example

Confusion matrices allow us to measure the quality of a classification. In the one provided by ``iota2`` (in the ``final`` output directory), the pixels whose labels are known (the reference) are given in rows, and the inferred pixels in columns.
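The derived scores (overall accuracy, Kappa, per-class precision, recall and F-score) are already reported in ``RESULTS.txt``, but they are easy to recompute from the raw counts of the matrix. The following sketch uses NumPy on an illustrative 3-class matrix with the same convention (reference in rows, classifier decisions in columns); the hypothetical counts would be replaced by the ones produced by ``iota2``:

.. code-block:: python

    import numpy as np

    # Illustrative confusion matrix: rows = reference labels, columns = classifier decisions.
    # Replace these hypothetical counts with the ones produced by iota2.
    cm = np.array([[50,  2,  3],
                   [ 4, 40,  6],
                   [ 1,  5, 44]])

    total = cm.sum()
    overall_accuracy = np.trace(cm) / total        # correctly classified / all samples

    # Per-class scores: recall over rows (reference), precision over columns (decisions).
    recall = np.diag(cm) / cm.sum(axis=1)
    precision = np.diag(cm) / cm.sum(axis=0)
    f_score = 2 * precision * recall / (precision + recall)

    # Cohen's Kappa: agreement corrected for chance agreement.
    chance = (cm.sum(axis=1) @ cm.sum(axis=0)) / total ** 2
    kappa = (overall_accuracy - chance) / (1 - chance)

    print(f"OA = {overall_accuracy:.3f}, Kappa = {kappa:.3f}")
    print("per-class F-score:", np.round(f_score, 3))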
To go further
=============

.. toctree::
   :maxdepth: 1

   Advanced features

.. raw:: html
   :file: interactive-tree.html