Mix ground truths from different years ====================================== Care must be taken when mixing reference data valid for different years. The model learned must be as close as possible to the ground truth and the satellite data used. Some users have found it useful to use iota2 in a way that is outlined below for their context of use. This context is as follows: create a classification containing a certain number of classes, including agricultural classes. To learn the model, all the desired classes must be represented in the initial database. However, to create the classification for year N, the reference database for agricultural classes is only available for year N-1. To face this problem, the following protocol can be implemented 1. Using iota2, generate a classification for year N. The model will use year N for all the samples but year N-1 for the agricultural classes. The inference will be made using only satellite data from year N. 2. From this first classification we will extract the location of the agricultural classes 3. Finally, we will use the location of the agricultural classes to learn a new model using only satellite data from the target year. In the following sections we will detail how to proceed for each stage. 1. Generate a classification for year N *************************************** First we need to run iota2 for the target year (N) and the year in which the reference data is valid (for example N-1). These are classic iota2 executions, however, we will stop the executions at the stage preceding model learning (features extraction). After the sample extraction stage, the output directory must contain the file ``Samples_region_1_seed0_learn.sqlite`` for both iota2 runs. For example ``run_2018/learningSamples/Samples_region_1_seed0_learn.sqlite`` and ``run_2019/learningSamples/Samples_region_1_seed0_learn.sqlite`` for years N-1 and N respectively. To merge these databases we can use ``ogr2ogr``. .. code-block:: bash # delete agricultural classes from the N database ogr2ogr -f "SQLite" -sql "SELECT * FROM output WHERE code != 1" -nln output temp_no_crop.sqlite run_2019/learningSamples/Samples_region_1_seed0_learn.sqlite # in the N-1 database, select only agricultural samples ogr2ogr -f "SQLite" -sql "SELECT * FROM output WHERE code = 1" -nln output temp_only_crop.sqlite run_2018/learningSamples/Samples_region_1_seed0_learn.sqlite # merge de two temporary database ogr2ogr -f "SQLite" merge.sqlite temp_no_crop.sqlite ogr2ogr -f "SQLite" merge.sqlite temp_only_crop.sqlite -append -update # replace the original database mv merge.sqlite run_2019/learningSamples/Samples_region_1_seed0_learn.sqlite Then you need to restart iota2 from the learning step (which will use the newly created Samples_region_1_seed0_learn.sqlite file) to the last step. 2. Extract the location of agricultural classes *********************************************** We will now use the previously generated classification to extract the location of the classes of interest. To do this, we can use a python script like the one below, where ``extract_class_location`` is the main function. In this example the selection is done randomly and limited by a maximum number of samples, users can adapt this script according to their needs. .. code-block:: python import random import numpy as np import rasterio from osgeo import ogr, osr from rasterio.transform import Affine def load_raster(raster_path: str) -> tuple[np.ndarray, Affine, str]: """ Load raster data from a given file path. Parameters ---------- raster_path : str Path to the raster file. Returns ------- tuple[np.ndarray, Affine, str] A tuple containing the raster data as a NumPy array, the affine transform, and the coordinate reference system (CRS) in WKT format. """ with rasterio.open(raster_path) as src: raster_data = src.read(1) transform = src.transform crs = src.crs.wkt return raster_data, transform, crs def find_pixel_indices( raster_data: np.ndarray, value: int = 2 ) -> list[tuple[int, int]]: """ Find the indices of pixels with a specified value in the raster data. Parameters ---------- raster_data : np.ndarray The raster data array. value : int, optional The pixel value to search for (default is 2). Returns ------- list[tuple[int, int]] A list of tuples containing the row and column indices of the pixels with the specified value. """ rows, cols = np.where(raster_data == value) indices = list(zip(rows, cols)) return indices def select_random_indices( indices: list[tuple[int, int]], max_pixels: int ) -> list[tuple[int, int]]: """ Select a random subset of pixel indices. Parameters ---------- indices : list[tuple[int, int]] The list of pixel indices. max_pixels : int The maximum number of pixels to select. Returns ------- list[tuple[int, int]] A list of randomly selected pixel indices. """ if len(indices) > max_pixels: selected_indices = random.sample(indices, max_pixels) else: selected_indices = indices return selected_indices def create_shapefile( shapefile_path: str, transform: Affine, crs: str, selected_indices: list[tuple[int, int]], ) -> None: """ Create a shapefile containing points at the locations of the selected pixel indices. Parameters ---------- shapefile_path : str The path to the shapefile to be created. transform : Affine The affine transform for the raster coordinates. crs : str The coordinate reference system in WKT format. selected_indices : list[tuple[int, int]] The list of selected pixel indices. """ driver = ogr.GetDriverByName("ESRI Shapefile") data_source = driver.CreateDataSource(shapefile_path) srs = osr.SpatialReference() srs.ImportFromWkt(crs) layer = data_source.CreateLayer("points", srs, ogr.wkbPoint) layer.CreateField(ogr.FieldDefn("ID", ogr.OFTInteger)) for idx, (row, col) in enumerate(selected_indices): x, y = rasterio.transform.xy(transform, row, col) point = ogr.Geometry(ogr.wkbPoint) point.AddPoint(x, y) feature = ogr.Feature(layer.GetLayerDefn()) feature.SetGeometry(point) feature.SetField("ID", int(idx)) layer.CreateFeature(feature) feature = None data_source = None def extract_class_location( raster_path: str, shapefile_path: str, max_pixels: int, value: int ) -> None: """ Extract class location Generate a point database based on class value randomly selected in raster_path Parameters ---------- raster_path : str The path to the raster file. shapefile_path : str The path to the shapefile to be created. max_pixels : int The maximum number of pixels to select. value : int class to extract """ raster_data, transform, crs = load_raster(raster_path) indices = find_pixel_indices(raster_data, value) selected_indices = select_random_indices(indices, max_pixels) create_shapefile(shapefile_path, transform, crs, selected_indices) Once we've extracted the locations of the points, we need to update another database created by iota2. This is the database ``samplesSelection/samples_region_1_seed_0_selection.sqlite`` which contains the positions of the training samples. As before, we can use ``ogr2ogr`` to update the database. .. code-block:: bash # First convert the shapefile results from extract_class_location call into sqlite database ogr2ogr -f "SQLite" crop_location.sqlite input.shp # Then, remove crop samples location from the database ogr2ogr -f "SQLite" -sql "SELECT * FROM output WHERE code != 1" -nln output no_crop.sqlite run_2019/samplesSelection/samples_region_1_seed_0_selection.sqlite # merge de two temporary database ogr2ogr -f "SQLite" merge.sqlite no_crop.sqlite ogr2ogr -f "SQLite" merge.sqlite crop_location.sqlite -append -update 3. Swap samples location database ********************************* Finally we can launch a ``new iota2 run``, stop it after the sample selection step. Replace the ``new_run_2019/samplesSelection/samples_region_1_seed_0_selection.sqlite`` by the newly created one .. code-block:: bash mv merge.sqlite new_run_2019/samplesSelection/samples_region_1_seed_0_selection.sqlite after this step you can restart iota2 until the end