Dimensionality Reduction in iota2

iota2 provides a functionality called external features, which allows time series to be manipulated through Python functions. We will show how to use this functionality for dimensionality reduction.

A dedicated documentation for configuring this workflow is available in External features.

In this example, we will use the PCA implementation from scikit-learn, but any other dimensionality reduction method can be used.

import numpy as np
from sklearn.decomposition import PCA

# DataContainer and I2Label are provided by iota2; see the External features
# documentation for the exact import paths in your version.


def pca_reduction(self: DataContainer) -> tuple[np.ndarray, list]:
    """
    Perform Principal Component Analysis (PCA) on the interpolated data to reduce its dimensionality.
    """
    original_shape = self.interpolated_data.shape
    # Flatten the (rows, cols, bands) cube into a (pixels, bands) matrix
    reshaped_data = self.interpolated_data.reshape(-1, original_shape[2])

    # Apply PCA
    n_components = 3
    pca = PCA(n_components=n_components)
    pca_result = pca.fit_transform(reshaped_data)

    # Restore the spatial dimensions, with one band per component
    reduced_data = pca_result.reshape(original_shape[0], original_shape[1], n_components)
    labels = [
        I2Label(sensor_name="pca", feat_name=i + 1) for i in range(reduced_data.shape[2])
    ]
    return reduced_data, labels
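The function returns the reduced cube of shape (rows, cols, n_components) together with one I2Label per output band, which lets iota2 name the generated features.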

In this example, the pca_reduction function applies dimensionality reduction directly to the interpolated data self.interpolated_data. However, because of the volume of data being manipulated, iota2 usually processes the image in chunks: self.interpolated_data then represents only a portion of the image on a given tile, and in a typical iota2 run the data is additionally distributed over several tiles. Fitting a separate PCA on each chunk would therefore produce components that differ from one chunk (or tile) to the next, making the resulting features incomparable across the area of interest.
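To make the issue concrete, here is a minimal sketch (purely illustrative, with random data standing in for two chunks of the same image) showing that PCA models fitted independently on two chunks end up with different components:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
chunk_a = rng.normal(size=(1000, 10))  # first chunk: 1000 pixels, 10 bands
chunk_b = rng.normal(size=(1000, 10))  # second chunk of the same image

pca_a = PCA(n_components=3).fit(chunk_a)
pca_b = PCA(n_components=3).fit(chunk_b)

# The component matrices differ, so the projected features would differ too.
print(np.allclose(pca_a.components_, pca_b.components_))  # False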

To achieve a dimensionality reduction that is coherent over the whole area of interest, we can rely on the data stored in the databases created for model training. These files are located in the learningSamples folder of the iota2 output directory, for example learningSamples/Samples_region_1_seed0_learn.sqlite.
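If needed, the layout of such a database can be inspected directly, for example with Python's built-in sqlite3 module; the learning samples are stored in a table named output (the table queried by the script below):

import sqlite3

conn = sqlite3.connect("learningSamples/Samples_region_1_seed0_learn.sqlite")
# List the tables present in the database; the samples are in the "output" table
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)
conn.close()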

Here is an example of Python functions that can be used to build a PCA model and save it to disk. In this example, all the columns whose name contains the string 'sentinel' are used to build the PCA model.

import sqlite3

import joblib
import pandas as pd
from sklearn.decomposition import PCA


def get_sentinel_columns(db_path: str, table_name: str) -> list:
    """
    Retrieve columns containing 'sentinel' from the specified SQLite table.

    Parameters
    ----------
    db_path : str
        Path to the SQLite database file.
    table_name : str
        Name of the table to query.

    Returns
    -------
    list
        List of columns containing 'sentinel'.
    """
    conn = sqlite3.connect(db_path)
    query = f"PRAGMA table_info({table_name})"
    table_info = pd.read_sql_query(query, conn)
    conn.close()
    sentinel_columns = table_info[table_info["name"].str.contains("sentinel")][
        "name"
    ].tolist()
    return sentinel_columns


def load_sentinel_data(
    db_path: str, table_name: str, sentinel_columns: list
) -> pd.DataFrame:
    """
    Load data from the specified columns of the SQLite table.

    Parameters
    ----------
    db_path : str
        Path to the SQLite database file.
    table_name : str
        Name of the table to query.
    sentinel_columns : list
        List of columns to load data from.

    Returns
    -------
    pd.DataFrame
        DataFrame with the loaded data.
    """
    if sentinel_columns:
        columns_query = ", ".join(sentinel_columns)
        query = f"SELECT {columns_query} FROM {table_name}"
        conn = sqlite3.connect(db_path)
        df = pd.read_sql_query(query, conn)
        conn.close()
    else:
        df = (
            pd.DataFrame()
        )  # If no columns contain "sentinel", return an empty DataFrame
    return df


def apply_pca(data: pd.DataFrame, n_components: int = 3) -> PCA:
    """
    Apply PCA to the given data.

    Parameters
    ----------
    data : pd.DataFrame
        Input data for PCA.
    n_components : int, optional
        Number of PCA components, by default 3.

    Returns
    -------
    PCA
        PCA object after fitting the data.
    """
    values = data.values
    pca = PCA(n_components=n_components)
    pca.fit(values)
    return pca


def build_pca(db_path: str, output_pca_model: str) -> None:
    """
    Build and save a PCA model based on sentinel columns in the specified SQLite table.

    Parameters
    ----------
    db_path : str
        Path to the SQLite database file.
    output_pca_model : str
        Path to store the PCA model.
    """
    table_name = "output"

    sentinel_columns = get_sentinel_columns(db_path, table_name)
    df = load_sentinel_data(db_path, table_name, sentinel_columns)

    if not df.empty:
        pca = apply_pca(df)
        joblib.dump(pca, output_pca_model)
        print("PCA model saved successfully.")
    else:
        print("No sentinel columns found in the table.")


if __name__ == "__main__":
    build_pca(
        "learningSamples/Samples_region_1_seed0_learn.sqlite",
        "pca_model.joblib"
    )
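
Before plugging the model into iota2, a quick sanity check can confirm what was saved; a minimal sketch, assuming the model was written to pca_model.joblib as above:

import joblib

pca = joblib.load("pca_model.joblib")
print(pca.n_components_)                       # 3 components, as requested
print(pca.explained_variance_ratio_.cumsum())  # cumulative variance explained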

Once the model is saved on disk, we can reuse it in our external feature function and apply it to the Sentinel-2 data.

Our initial function then becomes:

import joblib
import numpy as np


def pca_reduction(self: DataContainer, pca_file: str) -> tuple[np.ndarray, list]:
    """
    Apply a pre-trained PCA model to the interpolated data to reduce its dimensionality.
    """
    pca_loaded = joblib.load(pca_file)
    original_shape = self.interpolated_data.shape
    reshaped_data = self.interpolated_data.reshape(-1, original_shape[2])

    # Apply the pre-trained PCA
    pca_result = pca_loaded.transform(reshaped_data)

    # The number of output bands is given by the loaded model
    n_components = pca_result.shape[1]
    reduced_data = pca_result.reshape(original_shape[0], original_shape[1], n_components)
    labels = [
        I2Label(sensor_name="pca", feat_name=i + 1) for i in range(reduced_data.shape[2])
    ]
    return reduced_data, labels
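Note that the key used in the configuration file below ("pca_file") must match the name of the corresponding function parameter, since the argument dictionary given in the configuration is forwarded to the function.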

The configuration file may look like:

"external_features": {
    "functions": [["PCA_reduction", {"pca_file":"/path/to/pca_model.joblib"}]]
    "module": "/absolute/path/to/pca_redudction.py"
    "concat_mode": False
},
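Note that the function name given in functions must match the name of the Python function defined in the module file (here pca_reduction).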

Warning

The result of the dimensionality reduction must not be concatenated with the rest of the primitives: the concat_mode parameter must therefore be set to False.

Summary of the steps to perform dimensionality reduction:

  • Run iota2 up to the generation of the training samples for each model (step Merge samples dedicated to the same model), without dimensionality reduction.

  • Run the script that will train the dimensionality reduction model as described above.

  • Restart iota2 from the beginning, adding the user-defined dimensionality reduction function.