iota2 classification
####################

Assuming ``iota2`` is :doc:`fully installed `, this chapter presents the main usage of ``iota2``: the production of a land cover map from a time series of satellite images.

Introduction to data
********************

``iota2`` handles images from several sensors:

* Landsat 5 and 8 (old and new THEIA formats)
* Sentinel-1, Sentinel-2 L2A (THEIA and Sen2Cor formats), Sentinel-2 L3A (THEIA format)
* various other already-processed images, via the ``userFeat`` sensor

In this chapter, only the use of Sentinel-2 L2A is illustrated. To use another sensor, adapt the input parameters according to the :doc:`parameters descriptions `.

``iota2`` uses machine learning algorithms to produce land cover maps. Among other inputs, it requires images and the related reference data.

Get the data-set
================

.. include:: i2_tutorial_dataset_description.rst

.. Warning:: Each class must be represented in ``colorFile.txt`` and ``nomenclature.txt``

Understand the configuration file
=================================

``iota2`` is configured through several parameters; some of them are specific to ``iota2``, while others belong to other libraries such as ``scikit-learn`` or ``OTB``. These parameters select the operations to be carried out and control how they behave. A documentation of all these parameters is provided :doc:`here `.

The user defines these parameters in a configuration file, a human-readable text file that ``iota2`` reads at startup. The file is structured into sections, each containing several fields. A section groups fields with a similar purpose: for instance, the section `chain` contains general information such as the input data and the output path, while the section `arg_train` contains the classifier's parameters.

The minimal configuration file contains all the fields required to produce a land cover map.

.. include:: examples/config_tutorial_classification.cfg
   :literal:

For an end user, launching ``iota2`` requires correctly filling in the configuration file. In the above example, replace the ``XXXX`` by the path where the archive has been extracted.
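To make the section/field structure explicit, the sketch below shows the general shape of such a file. It is an illustration only: the values are placeholders, and the field names shown (``list_tile``, ``ground_truth``, ``data_field``, ``classifier``, ...) are typical examples rather than an exhaustive list; the included file above and the parameter documentation remain the reference.

.. code-block:: text

    # a configuration file is a list of sections, each holding fields
    chain :
    {
        # the `chain` section gathers general information (illustrative values)
        output_path : "/XXXX/IOTA2_TESTS_DATA/IOTA2_Outputs/Results"
        list_tile : "T31TCJ"
        s2_path : "/XXXX/IOTA2_TESTS_DATA/sensor_data"
        ground_truth : "/XXXX/IOTA2_TESTS_DATA/vector_data/groundTruth.shp"
        data_field : "code"
    }
    arg_train :
    {
        # the `arg_train` section configures the classifier
        classifier : "rf"
    }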
Running the chain
=================

iota2 launch
************

The chain is launched with the following command line:

.. code-block:: console

    Iota2.py -config /XXXX/IOTA2_TESTS_DATA/config_tuto_classif.cfg -scheduler_type localCluster

First, the chain displays the list of all the steps activated by the configuration file.

.. include:: examples/steps_classification.txt
   :literal:

Once the processing starts, a large amount of information is printed, most of it concerning the dask scheduler.

.. _logs:

Did it all go well?
===================

``iota2`` comes with a logging system. Each step has its own log folder, available in the ``output_path/logs`` directory (see `logs` in :ref:`output-tree-structure`). In these directories, two kinds of log files can be found: the errors are compiled in the ``*.err`` files and the standard output in the ``*.out`` files. With the dask scheduler, ``iota2`` goes as far as possible, as long as the data required by the next steps is available.

To simplify error identification, an interactive graph is produced in an HTML page: open the ``index.html`` file in the ``html`` folder. Nodes in the graph can have three colors (red: error, blue: done, orange: not yielded). Clicking on a graph node opens the corresponding log file.
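For a quick triage directly from a shell, the log directory can also be searched by hand. The commands below are a sketch, assuming a POSIX shell and the output path used in this tutorial:

.. code-block:: console

    # list the error logs that actually contain messages
    find /XXXX/IOTA2_TEST_S2/IOTA2_Outputs/Results/logs -name "*.err" -size +0

    # then inspect the end of a suspicious one
    tail -n 50 /XXXX/IOTA2_TEST_S2/IOTA2_Outputs/Results/logs/classification/classification_T31TCJ_model_1_seed_0.err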
If, despite all this information, the errors cannot be identified or solved, the ``iota2`` team can help. The simplest way to ask for help is to create an issue on `framagit `_, attaching the ``logs.zip`` archive.

.. _output-tree-structure:

Output tree structure
=====================

In this section, the ``iota2`` outputs available after a successful run are described.

.. raw:: html
   :file: interactive-tree-root.html

.. container:: interactive-tree-source

   * /XXXX/IOTA2_TEST_S2/IOTA2_Outputs/Results output folder
     | output folder defined in config file `output_path`

     * ! classif per tile classification maps
       | Contains classification maps, for each tile and each region. They will be merged in the ``final`` directory.

       * Classif_T31TCJ_model_1_seed_0.tif
       * ! MASK

         * MASK_region_1_T31TCJ.tif

       * T31TCJ_model_1_confidence_seed_0.tif
       * tmpClassif

     * ! config_model empty

       * (empty)

     * ! dataAppVal desc
       | Shapefiles obtained after splitting the reference data between learning and validation sets according to a ratio.

       * ! bymodels

         * (empty)

       * T31TCJ_seed_0_learn.sqlite
       * T31TCJ_seed_0_val.sqlite

     * ! dataRegion vector data split by region
       | When using eco-climatic regions, contains the vector data split by region.

       * (empty)

     * ! dimRed desc
       | Contains features after dimensionality reduction. Empty if not activated.

       * (empty)

     * ! envelope shapefiles
       | Contains shapefiles, one for each tile.
       | Used to ensure tile priority, with no overlap.

       * T31TCJ.dbf
       * T31TCJ.prj
       * T31TCJ.shp
       * T31TCJ.shx

     * ! features useful information
       | For each tile, contains useful information.

       * T31TCJ

         * ! tmp temporary folder
           | folder created temporarily during the chain execution

           * MaskCommunSL.dbf
           * MaskCommunSL.prj
           * MaskCommunSL.shp common scene
             | the common scene of all sensors for this tile
           * MaskCommunSL.shx
           * MaskCommunSL.tif
           * Sentinel2L3A_T31TCJ_reference.tif reference image
             | the image, generated by ``iota2``, used for reprojecting data
           * Sentinel2L3A_T31TCJ_input_dates.txt list of dates
             | the list of dates detected in ``s2_path`` for the current tile
           * Sentinel2_T31TCJ_interpolation_dates.txt
           * CloudThreshold_0.dbf
           * CloudThreshold_0.prj
           * CloudThreshold_0.shp database used as mask
             | This database is used to mask training polygons according to a number of clear dates. See the :ref:`cloud_threshold` parameter.
           * CloudThreshold_0.shx
           * nbView.tif number of visits
             | number of times a pixel is seen in the whole time series (i.e., excluding clouds, shadows, saturation and no-data)

     * final final products
       | This folder contains the final products of ``iota2``.
       | All final products are generated in the ``final`` directory;
       | see :ref:`final-products` for details.

       * ! simplification

         * mosaic
         * tiles
         * tmp
         * vectors

       * ! TMP

         * ClassificationResults_seed_0.txt
         * Classif_Seed_0.csv
         * T31TCJ_Cloud.tif
         * T31TCJ_GlobalConfidence_seed_0.tif
         * T31TCJ_seed_0_CompRef.tif
         * T31TCJ_seed_0.csv
         * T31TCJ_seed_0.tif

       * Classif_Seed_0_ColorIndexed.tif
       * Classif_Seed_0.tif
       * Confidence_Seed_0.tif
       * Confusion_Matrix_Classif_Seed_0.png
       * diff_seed_0.tif
       * PixelsValidity.tif
       * RESULTS.txt
       * vectors

     * ! formattingVectors learning samples
       | The learning samples contained in each tile.
       | Shapefiles in which pixel values from the time series have been extracted.

       * ! T31TCJ temporary directory
         | This is a temporary working directory; intermediate files are (re)moved after step completion.

         * (empty)

       * T31TCJ.cpg
       * T31TCJ.dbf
       * T31TCJ.prj
       * T31TCJ.shp
       * T31TCJ.shx

     * ! learningSamples learning samples
       | SQLite files containing learning samples by region.
       | Also contains a CSV file with statistics about sample balance for each seed. See :ref:`tracing back samples ` to generate this file manually.

       * class_statistics_seed0_learn.csv
       * Samples_region_1_seed0_learn.sqlite
       * T31TCJ_region_1_seed0_Samples_learn.sqlite

     * ! logs logs
       | Output logs of ``iota2``. See the :ref:`logs` section for details.

       * classification

         * classification_T31TCJ_model_1_seed_0.err
         * classification_T31TCJ_model_1_seed_0.out

       * CommonMasks

         * common_mask_T31TCJ.err
         * common_mask_T31TCJ.out

       * confusionCmd

         * confusion_T31TCJ_seed_0.err
         * confusion_T31TCJ_seed_0.out

       * confusionsMerge

         * merge_confusions.err
         * merge_confusions.out

       * Envelope

         * tiles_envelopes.err
         * tiles_envelopes.out

       * genRegionVector

         * region_generation.err
         * region_generation.out

       * ! html

         * configuration_file.html
         * environment_info.html
         * genindex.html
         * index.html
         * input_files_content.html
         * objects.inv
         * output_path_content.html
         * s2_path_content.html
         * search.html
         * searchindex.js
         * source

           * classification_T31TCJ_model_1_seed_0.out
           * common_mask_T31TCJ.out
           * configuration_file.rst
           * confusion_T31TCJ_seed_0.out
           * environment_info.rst
           * extraction_T31TCJ.out
           * final_report.out
           * index.rst
           * input_files_content.rst
           * learning_model_1_seed_0.out
           * merge_confusions.out
           * merge_model_1_seed_0_usually.out
           * merge_samples_T31TCJ.out
           * mosaic.out
           * output_path_content.rst
           * preprocessing_T31TCJ.out
           * region_generation.out
           * s2_path_content.rst
           * s_sel_model_1_seed_0.out
           * stats_1_S_0_T_T31TCJ.out
           * tasks_status_1.rst
           * tasks_status_2.rst
           * tiles_envelopes.out
           * validity_raster_T31TCJ.out
           * vector_form_T31TCJ.out

         * _sources

           * configuration_file.rst.txt
           * environment_info.rst.txt
           * index.rst.txt
           * input_files_content.rst.txt
           * output_path_content.rst.txt
           * s2_path_content.rst.txt
           * tasks_status_1.rst.txt
           * tasks_status_2.rst.txt

         * _static

           * basic.css
           * css

             * badge_only.css
             * fonts

               * fontawesome-webfont.eot
               * fontawesome-webfont.svg
               * fontawesome-webfont.ttf
               * fontawesome-webfont.woff
               * fontawesome-webfont.woff2
               * lato-bold-italic.woff
               * lato-bold-italic.woff2
               * lato-bold.woff
               * lato-bold.woff2
               * lato-normal-italic.woff
               * lato-normal-italic.woff2
               * lato-normal.woff
               * lato-normal.woff2
               * Roboto-Slab-Bold.woff
               * Roboto-Slab-Bold.woff2
               * Roboto-Slab-Regular.woff
               * Roboto-Slab-Regular.woff2

             * theme.css

           * doctools.js
           * documentation_options.js
           * file.png
           * jquery-3.5.1.js
           * jquery.js
           * js

             * badge_only.js
             * html5shiv.min.js
             * html5shiv-printshiv.min.js
             * theme.js

           * language_data.js
           * minus.png
           * plus.png
           * pygments.css
           * searchtools.js
           * underscore-1.3.1.js
           * underscore.js

         * tasks_status_1.html
         * tasks_status_2.html

       * learnModel

         * learning_model_1_seed_0.err
         * learning_model_1_seed_0.out

       * mosaic

         * mosaic.err
         * mosaic.out

       * PixelValidity

         * validity_raster_T31TCJ.err
         * validity_raster_T31TCJ.out

       * reportGeneration

         * final_report.err
         * final_report.out

       * samplesByModels

         * merge_model_1_seed_0_usually.err
         * merge_model_1_seed_0_usually.out

       * samplesByTiles

         * merge_samples_T31TCJ.err
         * merge_samples_T31TCJ.out

       * samplesExtraction

         * extraction_T31TCJ.err
         * extraction_T31TCJ.out

       * samplesMerge

         * merge_model_1_seed_0.err
         * merge_model_1_seed_0.out

       * samplingLearningPolygons

         * s_sel_model_1_seed_0.err
         * s_sel_model_1_seed_0.out

       * sensorsPreprocess

         * preprocessing_T31TCJ.err
         * preprocessing_T31TCJ.out

       * statsSamplesModel

         * stats_1_S_0_T_T31TCJ.err
         * stats_1_S_0_T_T31TCJ.out

       * tasks_status_1.svg
       * tasks_status_2.svg
       * VectorFormatting

         * vector_form_T31TCJ.err
         * vector_form_T31TCJ.out

     * ! model desc
       | The learned models.

       * model_1_seed_0.txt

     * ! samplesSelection shapefiles
       | Shapefiles containing the points (or pixel coordinates) selected for the training stage.
       | Also contains a CSV summary of the actual number of samples per class.

       * samples_region_1_seed_0.dbf
       * samples_region_1_seed_0_outrates.csv
       * samples_region_1_seed_0.prj
       * samples_region_1_seed_0_selection.sqlite
       * samples_region_1_seed_0.shp
       * samples_region_1_seed_0.shx
       * samples_region_1_seed_0.xml
       * T31TCJ_region_1_seed_0_stats.xml
       * T31TCJ_samples_region_1_seed_0_selection.sqlite
       * T31TCJ_selection_merge.sqlite

     * ! shapeRegion desc
       | Shapefiles indicating the intersection between tiles and regions.

       * MyRegion_region_1_T31TCJ.dbf
       * MyRegion_region_1_T31TCJ.prj
       * MyRegion_region_1_T31TCJ.shp
       * MyRegion_region_1_T31TCJ.shx
       * MyRegion_region_1_T31TCJ.tif

     * ! stats statistics
       | Optional XML statistics used to standardize the data before learning (SVM, ...).

       * (empty)

     * IOTA2_tasks_status.txt internal execution status
       | ``iota2`` keeps track of its execution using this *pickle* file (not text), so that it can restart from the state where it stopped.
     * logs.zip logs archive
     * MyRegion.dbf
     * MyRegion.prj
     * MyRegion.shp fake region
       | When no eco-climatic region is defined for the learning step, ``iota2`` creates this fake file with a single region.
     * MyRegion.shx
     * reference_data.dbf
     * reference_data.prj
     * reference_data.shp reencoded shapefile
       | As OTB expects classes to be encoded as consecutive integers, which is not necessarily the case of user labels, this shapefile contains the user data with reencoded labels.
     * reference_data.shx
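Most of these outputs are standard formats (GeoTIFF rasters, shapefile or SQLite vectors), so they can be examined with the usual GDAL/OGR command-line tools, available alongside OTB. For example, with the tutorial layout above:

.. code-block:: console

    # raster summary: size, projection and per-band statistics
    gdalinfo -stats /XXXX/IOTA2_TEST_S2/IOTA2_Outputs/Results/final/Classif_Seed_0.tif

    # vector summary: layers, feature count and attribute fields of the learning set
    ogrinfo -so -al /XXXX/IOTA2_TEST_S2/IOTA2_Outputs/Results/dataAppVal/T31TCJ_seed_0_learn.sqlite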
.. _final-products:

Final products
==============

All final products are generated in the ``final`` directory.

Land cover map
--------------

Your *Classif_Seed_0_ColorIndexed.tif* should look like this one:

.. figure:: ./Images/classif_Example.jpg
   :scale: 15 %
   :align: center
   :alt: classification map

   Classif_Seed_0.tif Example

This map contains the labels from the shapefile ``groundTruth.shp``. As you can see, the classification quality is rather low; a possible explanation is the low number of dates used to produce it. A raster called ``PixelsValidity.tif`` gives the number of dates for which each pixel is clear (no cloud, cloud shadow or saturation).

.. figure:: ./Images/PixVal_Example.png
   :scale: 50 %
   :align: center
   :alt: validity map

   PixelsValidity.tif Example

As only two dates are used to produce the classification map, pixel values are in the [0; 2] range.

``iota2`` also provides a confidence map, ``Confidence_Seed_0.tif``, which helps to better understand the resulting classification. For each pixel, this map gives a score between 0 and 100, where 0 and 100 mean that the membership probability provided by the classifier is 0 and 1, respectively. This is not a validation, just an estimate of the confidence in the classifier's decision.

.. figure:: ./Images/confidence_example.jpg
   :scale: 63 %
   :align: center
   :alt: confidence map

   Confidence_Seed_0.tif Example

These three maps form ``iota2``'s main outputs: they are the minimum outputs required to analyse and understand the results.

We have produced and analyzed classifications thanks to ``iota2``. The main objective is to obtain the best possible land cover map, and there are many ways to pursue it: researchers publish new methods every day. The simplest improvements consist in using a longer time series, improving the reference data used for training, etc.

Measuring quality
-----------------

.. figure:: ./Images/Confusion_Matrix_Classif_Seed_0.jpeg
   :scale: 30 %
   :align: center
   :alt: confusion matrix

   Confusion_Matrix_Classif_Seed_0.png Example

Confusion matrices allow us to measure the quality of a classification. In the one provided by ``iota2`` (in the ``final`` output directory), the pixels whose labels are known (the reference) are given in rows, and the inferred pixels in columns.
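The derived scores (overall accuracy, Kappa, per-class precision, recall and F-score) are already reported in ``RESULTS.txt``, but they are easy to recompute from the raw counts of the matrix. The following sketch uses NumPy on an illustrative 3-class matrix with the same convention (reference in rows, classifier decisions in columns); the hypothetical counts would be replaced by the ones produced by ``iota2``:

.. code-block:: python

    import numpy as np

    # Illustrative confusion matrix: rows = reference labels, columns = classifier decisions.
    # Replace these hypothetical counts with the ones produced by iota2.
    cm = np.array([[50,  2,  3],
                   [ 4, 40,  6],
                   [ 1,  5, 44]])

    total = cm.sum()
    overall_accuracy = np.trace(cm) / total        # correctly classified / all samples

    # Per-class scores: recall over rows (reference), precision over columns (decisions).
    recall = np.diag(cm) / cm.sum(axis=1)
    precision = np.diag(cm) / cm.sum(axis=0)
    f_score = 2 * precision * recall / (precision + recall)

    # Cohen's Kappa: agreement corrected for chance agreement.
    chance = (cm.sum(axis=1) @ cm.sum(axis=0)) / total ** 2
    kappa = (overall_accuracy - chance) / (1 - chance)

    print(f"OA = {overall_accuracy:.3f}, Kappa = {kappa:.3f}")
    print("per-class F-score:", np.round(f_score, 3))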
To go further
=============

.. toctree::
   :maxdepth: 1

   Advanced features

.. raw:: html
   :file: interactive-tree.html