Mix ground truths from different years
Care must be taken when mixing reference data valid for different years. The model learned must be as close as possible to the ground truth and the satellite data used.
Some users have found it useful to use iota2 in a way that is outlined below for their context of use. This context is as follows: create a classification containing a certain number of classes, including agricultural classes. To learn the model, all the desired classes must be represented in the initial database.
However, to create the classification for year N, the reference database for agricultural classes is only available for year N-1. To face this problem, the following protocol can be implemented
Using iota2, generate a classification for year N. The model will use year N for all the samples but year N-1 for the agricultural classes. The inference will be made using only satellite data from year N.
From this first classification we will extract the location of the agricultural classes
Finally, we will use the location of the agricultural classes to learn a new model using only satellite data from the target year.
In the following sections we will detail how to proceed for each stage.
1. Generate a classification for year N
First we need to run iota2 for the target year (N) and the year in which the reference data is valid (for example N-1).
These are classic iota2 executions, however, we will stop the executions at the stage preceding model learning (features extraction).
After the sample extraction stage, the output directory must contain the file Samples_region_1_seed0_learn.sqlite
for both iota2 runs.
For example run_2018/learningSamples/Samples_region_1_seed0_learn.sqlite
and run_2019/learningSamples/Samples_region_1_seed0_learn.sqlite
for years N-1 and N respectively.
To merge these databases we can use ogr2ogr
.
# delete agricultural classes from the N database
ogr2ogr -f "SQLite" -sql "SELECT * FROM output WHERE code != 1" -nln output temp_no_crop.sqlite run_2019/learningSamples/Samples_region_1_seed0_learn.sqlite
# in the N-1 database, select only agricultural samples
ogr2ogr -f "SQLite" -sql "SELECT * FROM output WHERE code = 1" -nln output temp_only_crop.sqlite run_2018/learningSamples/Samples_region_1_seed0_learn.sqlite
# merge de two temporary database
ogr2ogr -f "SQLite" merge.sqlite temp_no_crop.sqlite
ogr2ogr -f "SQLite" merge.sqlite temp_only_crop.sqlite -append -update
# replace the original database
mv merge.sqlite run_2019/learningSamples/Samples_region_1_seed0_learn.sqlite
Then you need to restart iota2 from the learning step (which will use the newly created Samples_region_1_seed0_learn.sqlite file) to the last step.
2. Extract the location of agricultural classes
We will now use the previously generated classification to extract the location of the classes of interest.
To do this, we can use a python script like the one below, where extract_class_location
is the main function.
In this example the selection is done randomly and limited by a maximum number of samples, users can adapt this script according to their needs.
import random
import numpy as np
import rasterio
from osgeo import ogr, osr
from rasterio.transform import Affine
def load_raster(raster_path: str) -> tuple[np.ndarray, Affine, str]:
"""
Load raster data from a given file path.
Parameters
----------
raster_path : str
Path to the raster file.
Returns
-------
tuple[np.ndarray, Affine, str]
A tuple containing the raster data as a NumPy array, the affine transform, and the coordinate reference system (CRS) in WKT format.
"""
with rasterio.open(raster_path) as src:
raster_data = src.read(1)
transform = src.transform
crs = src.crs.wkt
return raster_data, transform, crs
def find_pixel_indices(
raster_data: np.ndarray, value: int = 2
) -> list[tuple[int, int]]:
"""
Find the indices of pixels with a specified value in the raster data.
Parameters
----------
raster_data : np.ndarray
The raster data array.
value : int, optional
The pixel value to search for (default is 2).
Returns
-------
list[tuple[int, int]]
A list of tuples containing the row and column indices of the pixels with the specified value.
"""
rows, cols = np.where(raster_data == value)
indices = list(zip(rows, cols))
return indices
def select_random_indices(
indices: list[tuple[int, int]], max_pixels: int
) -> list[tuple[int, int]]:
"""
Select a random subset of pixel indices.
Parameters
----------
indices : list[tuple[int, int]]
The list of pixel indices.
max_pixels : int
The maximum number of pixels to select.
Returns
-------
list[tuple[int, int]]
A list of randomly selected pixel indices.
"""
if len(indices) > max_pixels:
selected_indices = random.sample(indices, max_pixels)
else:
selected_indices = indices
return selected_indices
def create_shapefile(
shapefile_path: str,
transform: Affine,
crs: str,
selected_indices: list[tuple[int, int]],
) -> None:
"""
Create a shapefile containing points at the locations of the selected pixel indices.
Parameters
----------
shapefile_path : str
The path to the shapefile to be created.
transform : Affine
The affine transform for the raster coordinates.
crs : str
The coordinate reference system in WKT format.
selected_indices : list[tuple[int, int]]
The list of selected pixel indices.
"""
driver = ogr.GetDriverByName("ESRI Shapefile")
data_source = driver.CreateDataSource(shapefile_path)
srs = osr.SpatialReference()
srs.ImportFromWkt(crs)
layer = data_source.CreateLayer("points", srs, ogr.wkbPoint)
layer.CreateField(ogr.FieldDefn("ID", ogr.OFTInteger))
for idx, (row, col) in enumerate(selected_indices):
x, y = rasterio.transform.xy(transform, row, col)
point = ogr.Geometry(ogr.wkbPoint)
point.AddPoint(x, y)
feature = ogr.Feature(layer.GetLayerDefn())
feature.SetGeometry(point)
feature.SetField("ID", int(idx))
layer.CreateFeature(feature)
feature = None
data_source = None
def extract_class_location(
raster_path: str, shapefile_path: str, max_pixels: int, value: int
) -> None:
"""
Extract class location
Generate a point database based on class value randomly selected in raster_path
Parameters
----------
raster_path : str
The path to the raster file.
shapefile_path : str
The path to the shapefile to be created.
max_pixels : int
The maximum number of pixels to select.
value : int
class to extract
"""
raster_data, transform, crs = load_raster(raster_path)
indices = find_pixel_indices(raster_data, value)
selected_indices = select_random_indices(indices, max_pixels)
create_shapefile(shapefile_path, transform, crs, selected_indices)
Once we’ve extracted the locations of the points, we need to update another database created by iota2. This is the database
samplesSelection/samples_region_1_seed_0_selection.sqlite
which contains the positions of the training samples. As before,
we can use ogr2ogr
to update the database.
# First convert the shapefile results from extract_class_location call into sqlite database
ogr2ogr -f "SQLite" crop_location.sqlite input.shp
# Then, remove crop samples location from the database
ogr2ogr -f "SQLite" -sql "SELECT * FROM output WHERE code != 1" -nln output no_crop.sqlite run_2019/samplesSelection/samples_region_1_seed_0_selection.sqlite
# merge de two temporary database
ogr2ogr -f "SQLite" merge.sqlite no_crop.sqlite
ogr2ogr -f "SQLite" merge.sqlite crop_location.sqlite -append -update
3. Swap samples location database
Finally we can launch a new iota2 run
, stop it after the sample selection step.
Replace the new_run_2019/samplesSelection/samples_region_1_seed_0_selection.sqlite
by the newly created one
mv merge.sqlite new_run_2019/samplesSelection/samples_region_1_seed_0_selection.sqlite
after this step you can restart iota2 until the end