iota2 classification
####################

Assuming iota2 is :doc:`fully installed `, this chapter presents the main usage of iota2: the production of a land cover map from a time series of satellite images.

Introduction to data
********************

iota2 handles images from several sensors:

* Landsat 5 and 8 (old and new THEIA format)
* Sentinel 1, Sentinel 2 L2A (THEIA and Sen2Cor), Sentinel 2 L3A (THEIA format)
* Various other already-processed images, through the ``userFeat`` sensor

In this chapter, only the use of Sentinel 2 L2A is illustrated. To use another sensor, the input parameters must be adapted according to the :doc:`parameters descriptions `.

iota2 uses machine learning algorithms to produce land cover maps. Among other inputs, it requires images and the related reference data.

Get the data-set
================

.. include:: i2_tutorial_dataset_description.rst

.. warning:: Each class must be represented in ``colorFile.txt`` and ``nomenclature.txt``.

Understand the configuration file
=================================

iota2 exploits hundreds of parameters; some of them are specific to iota2 and some come from other libraries such as scikit-learn or OTB. These parameters select the treatments to be carried out and control their behaviour. A documentation of all these parameters is provided :doc:`here `.

The user defines these parameters in a configuration file, a human-readable text file read by iota2 at start-up. The file is structured into sections, each section containing several fields. A section groups fields with similar purposes: for instance, the section ``chain`` contains general information such as the input data and the output path, while the section ``arg_train`` contains the classifier's parameters.

The minimal configuration file contains all the fields required to produce a land cover map.

.. include:: examples/config_tutorial_classification.cfg
    :literal:

For an end user, launching iota2 only requires filling in the configuration file correctly. In the above example, replace ``XXXX`` by the path where the archive has been extracted.

Running the chain
=================

iota2 launch
************

The chain is launched with the following command line:

.. code-block:: console

    Iota2.py -config /XXXX/IOTA2_TESTS_DATA/config_tuto_classif.cfg -scheduler_type localCluster

First, the chain displays the list of all the steps activated by the configuration file:

.. include:: examples/steps_classification.txt
    :literal:

Once the processing starts, a large amount of information is printed, most of it concerning the dask scheduler.

.. _logs:

Did it all go well?
===================

iota2 is packed with a logging system. Each step has its own log folder, available in the ``output_path/logs`` directory (see `logs` in :ref:`output-tree-structure`).

In these directories, two kinds of logs can be found: ``*_out.log`` and ``*_err.log``. The errors are compiled in the "err" files and the standard output in the "out" files.

With the dask scheduler, iota2 goes as far as possible, as long as the data required by the next steps is available. To simplify error identification, an interactive graph is produced as an HTML page: open the ``index.html`` file in the ``html`` folder. Nodes in the graph can have three colors (red: error, blue: done, orange: not yielded). Clicking on a graph node opens the corresponding log file.
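The logs can also be screened directly from a terminal with standard tools. Below is a minimal sketch, assuming the output tree described in :ref:`output-tree-structure`; the exact log file names depend on the steps activated by your configuration:

.. code-block:: console

    # list every non-empty error log produced by the run
    find /XXXX/IOTA2_TEST_S2/IOTA2_Outputs/Results/logs -name "*err*" -size +0c

    # locate Python tracebacks in all the log files
    grep -rl "Traceback" /XXXX/IOTA2_TEST_S2/IOTA2_Outputs/Results/logs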
If, despite all this information, the errors cannot be identified or solved, the iota2 team can help all users. The simplest way to ask for help is to create an issue on `framagit `_, adding the archive available in the log directory.

.. _output-tree-structure:

Output tree structure
=====================

This section describes the iota2 outputs available after a proper run.

.. raw:: html
    :file: interactive-tree-root.html

.. container:: interactive-tree-source

  * /XXXX/IOTA2_TEST_S2/IOTA2_Outputs/Results output folder
    | The output folder defined in the config file field ``output_path``.

    * ! classif per tile classification maps
      | Contains classification maps, for each tile and each region. They will be merged in the ``final`` directory.

      * Classif_T31TCJ_model_1_seed_0.tif
      * ! MASK

        * MASK_region_1_T31TCJ.tif

      * T31TCJ_model_1_confidence_seed_0.tif
      * tmpClassif

    * ! config_model empty

      * (empty)

    * ! dataAppVal desc
      | Shapefiles obtained after splitting the reference data between learning and validation sets according to a ratio.

      * ! bymodels

        * (empty)

      * T31TCJ_seed_0_learn.sqlite
      * T31TCJ_seed_0_val.sqlite

    * ! dataRegion vector data split by region
      | When using eco-climatic regions, contains the vector data split by region.

      * (empty)

    * ! dimRed desc
      | Contains features after dimensionality reduction. Empty if not activated.

      * (empty)

    * ! envelope shapefiles
      | Contains shapefiles, one for each tile.
      | Used to ensure tile priority, with no overlap.

      * T31TCJ.dbf
      * T31TCJ.prj
      * T31TCJ.shp
      * T31TCJ.shx

    * ! features useful information
      | For each tile, contains useful information.

      * T31TCJ

        * ! tmp temporary folder
          | Folder created temporarily during the chain execution.

          * MaskCommunSL.dbf
          * MaskCommunSL.prj
          * MaskCommunSL.shp common scene
            | The common scene of all sensors for this tile.
          * MaskCommunSL.shx
          * MaskCommunSL.tif
          * Sentinel2L3A_T31TCJ_reference.tif reference image
            | The image, generated by iota2, used for reprojecting data.
          * Sentinel2L3A_T31TCJ_input_dates.txt list of dates
            | The list of dates detected in ``s2_path`` for the current tile.
          * Sentinel2_T31TCJ_interpolation_dates.txt
          * CloudThreshold_0.dbf
          * CloudThreshold_0.prj
          * CloudThreshold_0.shp database used as mask
            | This database is used to mask training polygons according to a number of clear dates. See the :ref:`cloud_threshold` parameter.
          * CloudThreshold_0.shx
          * nbView.tif number of visits
            | Number of times a pixel is seen in the whole time series (i.e., excluding clouds, shadows, saturation and no-data).

    * final final products
      | This folder contains the final products of iota2.
      | All final products are generated in the ``final`` directory.
      | See :ref:`final-products` for details.

      * ! simplification

        * mosaic
        * tiles
        * tmp
        * vectors

      * ! TMP

        * ClassificationResults_seed_0.txt
        * Classif_Seed_0.csv
        * T31TCJ_Cloud.tif
        * T31TCJ_GlobalConfidence_seed_0.tif
        * T31TCJ_seed_0_CompRef.tif
        * T31TCJ_seed_0.csv
        * T31TCJ_seed_0.tif

      * Classif_Seed_0_ColorIndexed.tif
      * Classif_Seed_0.tif
      * Confidence_Seed_0.tif
      * Confusion_Matrix_Classif_Seed_0.png
      * diff_seed_0.tif
      * PixelsValidity.tif
      * RESULTS.txt
      * vectors

    * ! formattingVectors learning samples
      | The learning samples contained in each tile.
      | Shapefiles in which pixel values from time series have been extracted.

      * ! T31TCJ temporary directory
        | This is a temporary working directory; intermediate files are (re)moved after step completion.

        * (empty)

      * T31TCJ.cpg
      * T31TCJ.dbf
      * T31TCJ.prj
      * T31TCJ.shp
      * T31TCJ.shx

    * ! learningSamples learning samples
      | Sqlite files containing learning samples by region.
      | Also contains a CSV file with statistics about the sample balance for each seed. See :ref:`tracing back samples ` to generate this file manually.

      * class_statistics_seed0_learn.csv
      * Samples_region_1_seed0_learn.sqlite
      * T31TCJ_region_1_seed0_Samples_learn.sqlite

    * ! logs logs
      | Output logs of iota2. See the :ref:`logs` section for details.

      * classification

        * classification_T31TCJ_model_1_seed_0.err
        * classification_T31TCJ_model_1_seed_0.out

      * CommonMasks

        * common_mask_T31TCJ.err
        * common_mask_T31TCJ.out

      * confusionCmd

        * confusion_T31TCJ_seed_0.err
        * confusion_T31TCJ_seed_0.out

      * confusionsMerge

        * merge_confusions.err
        * merge_confusions.out

      * Envelope

        * tiles_envelopes.err
        * tiles_envelopes.out

      * genRegionVector

        * region_generation.err
        * region_generation.out

      * ! html

        * configuration_file.html
        * environment_info.html
        * genindex.html
        * index.html
        * input_files_content.html
        * objects.inv
        * output_path_content.html
        * s2_path_content.html
        * search.html
        * searchindex.js
        * source

          * classification_T31TCJ_model_1_seed_0.out
          * common_mask_T31TCJ.out
          * configuration_file.rst
          * confusion_T31TCJ_seed_0.out
          * environment_info.rst
          * extraction_T31TCJ.out
          * final_report.out
          * index.rst
          * input_files_content.rst
          * learning_model_1_seed_0.out
          * merge_confusions.out
          * merge_model_1_seed_0_usually.out
          * merge_samples_T31TCJ.out
          * mosaic.out
          * output_path_content.rst
          * preprocessing_T31TCJ.out
          * region_generation.out
          * s2_path_content.rst
          * s_sel_model_1_seed_0.out
          * stats_1_S_0_T_T31TCJ.out
          * tasks_status_1.rst
          * tasks_status_2.rst
          * tiles_envelopes.out
          * validity_raster_T31TCJ.out
          * vector_form_T31TCJ.out

        * _sources

          * configuration_file.rst.txt
          * environment_info.rst.txt
          * index.rst.txt
          * input_files_content.rst.txt
          * output_path_content.rst.txt
          * s2_path_content.rst.txt
          * tasks_status_1.rst.txt
          * tasks_status_2.rst.txt

        * _static

          * basic.css
          * css

            * badge_only.css
            * fonts

              * fontawesome-webfont.eot
              * fontawesome-webfont.svg
              * fontawesome-webfont.ttf
              * fontawesome-webfont.woff
              * fontawesome-webfont.woff2
              * lato-bold-italic.woff
              * lato-bold-italic.woff2
              * lato-bold.woff
              * lato-bold.woff2
              * lato-normal-italic.woff
              * lato-normal-italic.woff2
              * lato-normal.woff
              * lato-normal.woff2
              * Roboto-Slab-Bold.woff
              * Roboto-Slab-Bold.woff2
              * Roboto-Slab-Regular.woff
              * Roboto-Slab-Regular.woff2

            * theme.css

          * doctools.js
          * documentation_options.js
          * file.png
          * jquery-3.5.1.js
          * jquery.js
          * js

            * badge_only.js
            * html5shiv.min.js
            * html5shiv-printshiv.min.js
            * theme.js

          * language_data.js
          * minus.png
          * plus.png
          * pygments.css
          * searchtools.js
          * underscore-1.3.1.js
          * underscore.js

        * tasks_status_1.html
        * tasks_status_2.html

      * learnModel

        * learning_model_1_seed_0.err
        * learning_model_1_seed_0.out

      * mosaic

        * mosaic.err
        * mosaic.out

      * PixelValidity

        * validity_raster_T31TCJ.err
        * validity_raster_T31TCJ.out

      * reportGeneration

        * final_report.err
        * final_report.out

      * samplesByModels

        * merge_model_1_seed_0_usually.err
        * merge_model_1_seed_0_usually.out

      * samplesByTiles

        * merge_samples_T31TCJ.err
        * merge_samples_T31TCJ.out

      * samplesExtraction

        * extraction_T31TCJ.err
        * extraction_T31TCJ.out

      * samplesMerge

        * merge_model_1_seed_0.err
        * merge_model_1_seed_0.out

      * samplingLearningPolygons

        * s_sel_model_1_seed_0.err
        * s_sel_model_1_seed_0.out

      * sensorsPreprocess

        * preprocessing_T31TCJ.err
        * preprocessing_T31TCJ.out

      * statsSamplesModel

        * stats_1_S_0_T_T31TCJ.err
        * stats_1_S_0_T_T31TCJ.out

      * tasks_status_1.svg
      * tasks_status_2.svg
      * VectorFormatting

        * vector_form_T31TCJ.err
        * vector_form_T31TCJ.out
    * ! model desc
      | The learned models.

      * model_1_seed_0.txt

    * ! samplesSelection shapefiles
      | Shapefiles containing the points (or pixel coordinates) selected for the training stage.
      | Also contains a CSV summary of the actual number of samples per class.

      * samples_region_1_seed_0.dbf
      * samples_region_1_seed_0_outrates.csv
      * samples_region_1_seed_0.prj
      * samples_region_1_seed_0_selection.sqlite
      * samples_region_1_seed_0.shp
      * samples_region_1_seed_0.shx
      * samples_region_1_seed_0.xml
      * T31TCJ_region_1_seed_0_stats.xml
      * T31TCJ_samples_region_1_seed_0_selection.sqlite
      * T31TCJ_selection_merge.sqlite

    * ! shapeRegion desc
      | Shapefiles indicating the intersection between tiles and regions.

      * MyRegion_region_1_T31TCJ.dbf
      * MyRegion_region_1_T31TCJ.prj
      * MyRegion_region_1_T31TCJ.shp
      * MyRegion_region_1_T31TCJ.shx
      * MyRegion_region_1_T31TCJ.tif

    * ! stats statistics
      | Optional XML statistics used to standardize the data before learning (svm...).

      * (empty)

    * IOTA2_tasks_status.txt internal execution status
      | iota2 keeps track of its execution using this *pickle* file (not text), which allows it to restart from the state where it stopped.
    * logs.zip logs archive
    * MyRegion.dbf
    * MyRegion.prj
    * MyRegion.shp fake region
      | When no eco-climatic region is defined for the learning step, iota2 creates this fake file with a single region.
    * MyRegion.shx
    * reference_data.dbf
    * reference_data.prj
    * reference_data.shp reencoded shapefile
      | As OTB expects classes to be encoded as consecutive integers, which is not necessarily the case of user labels, this shapefile contains the user data with reencoded labels.
    * reference_data.shx

.. _final-products:

Final products
==============

All final products are generated in the ``final`` directory.

Land cover map
--------------

Your *Classif_Seed_0_ColorIndexed.tif* should look like this one:

.. figure:: ./Images/classif_Example.jpg
    :scale: 15 %
    :align: center
    :alt: classification map

    Classif_Seed_0.tif Example

This map contains labels from the shapefile ``groundTruth.shp``. As you can see, the classification's quality is rather low. A possible explanation is the low number of dates used to produce it. A raster called ``PixelsValidity.tif`` gives the number of dates for which each pixel is clear (no cloud, cloud shadow or saturation).

.. figure:: ./Images/PixVal_Example.png
    :scale: 50 %
    :align: center
    :alt: validity map

    PixelsValidity.tif Example

As only two dates are used to produce the classification map, pixel values are in the [0; 2] range.

iota2 also provides a confidence map, ``Confidence_Seed_0.tif``, which allows a better understanding of the resulting classification. For each pixel, this map gives a value between 0 and 100, where 0 and 100 mean that the membership probability provided by the classifier is 0 and 1, respectively. This is not a validation, just an estimate of the confidence in the decision of the classifier.

.. figure:: ./Images/confidence_example.jpg
    :scale: 63 %
    :align: center
    :alt: confidence map

    Confidence_Seed_0.tif Example

These three maps form iota2's main outputs: they are the minimum outputs required to analyse and understand the results.
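These rasters can be sanity-checked quickly from a terminal. Below is a minimal sketch, assuming the GDAL command-line utilities are available in your environment and the output path used in this tutorial:

.. code-block:: console

    # compute and print band statistics (min/max/mean) of the confidence map;
    # values should stay in the [0; 100] range
    gdalinfo -stats /XXXX/IOTA2_TEST_S2/IOTA2_Outputs/Results/final/Confidence_Seed_0.tif

    # the maximum of the validity raster should not exceed the number of dates
    # in the time series (2 in this tutorial)
    gdalinfo -stats /XXXX/IOTA2_TEST_S2/IOTA2_Outputs/Results/final/PixelsValidity.tif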
We have now produced and analysed classifications with iota2. The main objective is to obtain the best possible land cover map, and there are many ways to achieve this: researchers publish new methods every day. The simplest improvements consist in using a longer time series, improving the reference data used for training, etc.

Measuring quality
-----------------

.. figure:: ./Images/Confusion_Matrix_Classif_Seed_0.jpeg
    :scale: 30 %
    :align: center
    :alt: confusion matrix

    Confusion_Matrix_Classif_Seed_0.png Example

Confusion matrices allow us to measure the quality of a classification. In the one provided by iota2 (in the ``final`` output directory), the pixels whose labels are known (the reference) are counted in rows and the inferred pixels in columns.

To go further
=============

.. toctree::
    :maxdepth: 1

    Advanced features

.. raw:: html
    :file: interactive-tree.html