Mix ground truths from different years

Care must be taken when mixing reference data valid for different years. The model learned must be as close as possible to the ground truth and the satellite data used.

Some users have found it useful to use iota2 in a way that is outlined below for their context of use. This context is as follows: create a classification containing a certain number of classes, including agricultural classes. To learn the model, all the desired classes must be represented in the initial database.

However, to create the classification for year N, the reference database for agricultural classes is only available for year N-1. To face this problem, the following protocol can be implemented

  1. Using iota2, generate a classification for year N. The model will use year N for all the samples but year N-1 for the agricultural classes. The inference will be made using only satellite data from year N.

  2. From this first classification we will extract the location of the agricultural classes

  3. Finally, we will use the location of the agricultural classes to learn a new model using only satellite data from the target year.

In the following sections we will detail how to proceed for each stage.

1. Generate a classification for year N

First we need to run iota2 for the target year (N) and the year in which the reference data is valid (for example N-1).

These are classic iota2 executions, however, we will stop the executions at the stage preceding model learning (features extraction). After the sample extraction stage, the output directory must contain the file Samples_region_1_seed0_learn.sqlite for both iota2 runs. For example run_2018/learningSamples/Samples_region_1_seed0_learn.sqlite and run_2019/learningSamples/Samples_region_1_seed0_learn.sqlite for years N-1 and N respectively.

To merge these databases we can use ogr2ogr.

# delete agricultural classes from the N database
ogr2ogr -f "SQLite" -sql "SELECT * FROM output WHERE code != 1" -nln output temp_no_crop.sqlite run_2019/learningSamples/Samples_region_1_seed0_learn.sqlite
# in the N-1 database, select only agricultural samples
ogr2ogr -f "SQLite" -sql "SELECT * FROM output WHERE code = 1" -nln output temp_only_crop.sqlite run_2018/learningSamples/Samples_region_1_seed0_learn.sqlite
# merge de two temporary database
ogr2ogr -f "SQLite" merge.sqlite temp_no_crop.sqlite
ogr2ogr -f "SQLite" merge.sqlite temp_only_crop.sqlite -append -update
# replace the original database
mv merge.sqlite run_2019/learningSamples/Samples_region_1_seed0_learn.sqlite

Then you need to restart iota2 from the learning step (which will use the newly created Samples_region_1_seed0_learn.sqlite file) to the last step.

2. Extract the location of agricultural classes

We will now use the previously generated classification to extract the location of the classes of interest. To do this, we can use a python script like the one below, where extract_class_location is the main function. In this example the selection is done randomly and limited by a maximum number of samples, users can adapt this script according to their needs.

import random

import numpy as np
import rasterio
from osgeo import ogr, osr
from rasterio.transform import Affine


def load_raster(raster_path: str) -> tuple[np.ndarray, Affine, str]:
    """
    Load raster data from a given file path.

    Parameters
    ----------
    raster_path : str
        Path to the raster file.

    Returns
    -------
    tuple[np.ndarray, Affine, str]
        A tuple containing the raster data as a NumPy array, the affine transform, and the coordinate reference system (CRS) in WKT format.
    """
    with rasterio.open(raster_path) as src:
        raster_data = src.read(1)
        transform = src.transform
        crs = src.crs.wkt
    return raster_data, transform, crs


def find_pixel_indices(
    raster_data: np.ndarray, value: int = 2
) -> list[tuple[int, int]]:
    """
    Find the indices of pixels with a specified value in the raster data.

    Parameters
    ----------
    raster_data : np.ndarray
        The raster data array.
    value : int, optional
        The pixel value to search for (default is 2).

    Returns
    -------
    list[tuple[int, int]]
        A list of tuples containing the row and column indices of the pixels with the specified value.
    """
    rows, cols = np.where(raster_data == value)
    indices = list(zip(rows, cols))
    return indices


def select_random_indices(
    indices: list[tuple[int, int]], max_pixels: int
) -> list[tuple[int, int]]:
    """
    Select a random subset of pixel indices.

    Parameters
    ----------
    indices : list[tuple[int, int]]
        The list of pixel indices.
    max_pixels : int
        The maximum number of pixels to select.

    Returns
    -------
    list[tuple[int, int]]
        A list of randomly selected pixel indices.
    """
    if len(indices) > max_pixels:
        selected_indices = random.sample(indices, max_pixels)
    else:
        selected_indices = indices
    return selected_indices


def create_shapefile(
    shapefile_path: str,
    transform: Affine,
    crs: str,
    selected_indices: list[tuple[int, int]],
) -> None:
    """
    Create a shapefile containing points at the locations of the selected pixel indices.

    Parameters
    ----------
    shapefile_path : str
        The path to the shapefile to be created.
    transform : Affine
        The affine transform for the raster coordinates.
    crs : str
        The coordinate reference system in WKT format.
    selected_indices : list[tuple[int, int]]
        The list of selected pixel indices.
    """
    driver = ogr.GetDriverByName("ESRI Shapefile")
    data_source = driver.CreateDataSource(shapefile_path)
    srs = osr.SpatialReference()
    srs.ImportFromWkt(crs)
    layer = data_source.CreateLayer("points", srs, ogr.wkbPoint)
    layer.CreateField(ogr.FieldDefn("ID", ogr.OFTInteger))

    for idx, (row, col) in enumerate(selected_indices):
        x, y = rasterio.transform.xy(transform, row, col)
        point = ogr.Geometry(ogr.wkbPoint)
        point.AddPoint(x, y)

        feature = ogr.Feature(layer.GetLayerDefn())
        feature.SetGeometry(point)
        feature.SetField("ID", int(idx))
        layer.CreateFeature(feature)
        feature = None

    data_source = None


def extract_class_location(
    raster_path: str, shapefile_path: str, max_pixels: int, value: int
) -> None:
    """
    Extract class location

    Generate a point database based on class value randomly selected in raster_path

    Parameters
    ----------
    raster_path : str
        The path to the raster file.
    shapefile_path : str
        The path to the shapefile to be created.
    max_pixels : int
        The maximum number of pixels to select.
    value : int
        class to extract
    """
    raster_data, transform, crs = load_raster(raster_path)
    indices = find_pixel_indices(raster_data, value)
    selected_indices = select_random_indices(indices, max_pixels)
    create_shapefile(shapefile_path, transform, crs, selected_indices)

Once we’ve extracted the locations of the points, we need to update another database created by iota2. This is the database samplesSelection/samples_region_1_seed_0_selection.sqlite which contains the positions of the training samples. As before, we can use ogr2ogr to update the database.

# First convert the shapefile results from extract_class_location call into sqlite database
ogr2ogr -f "SQLite" crop_location.sqlite input.shp
# Then, remove crop samples location from the database
ogr2ogr -f "SQLite" -sql "SELECT * FROM output WHERE code != 1" -nln output no_crop.sqlite run_2019/samplesSelection/samples_region_1_seed_0_selection.sqlite
# merge de two temporary database
ogr2ogr -f "SQLite" merge.sqlite no_crop.sqlite
ogr2ogr -f "SQLite" merge.sqlite crop_location.sqlite -append -update

3. Swap samples location database

Finally we can launch a new iota2 run, stop it after the sample selection step. Replace the new_run_2019/samplesSelection/samples_region_1_seed_0_selection.sqlite by the newly created one

mv merge.sqlite new_run_2019/samplesSelection/samples_region_1_seed_0_selection.sqlite

after this step you can restart iota2 until the end