Dimensionality Reduction in iota2
#################################

Iota2 provides a functionality called external features, which allows manipulating time series through Python functions. We will show how to use this functionality for dimensionality reduction. A dedicated documentation for configuring this workflow is available in :doc:`External features `.

In this example, we will use the `PCA <https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html>`_ class from scikit-learn, but any other dimensionality reduction method can be used.

.. code-block:: python

    import numpy as np
    from sklearn.decomposition import PCA

    # DataContainer and I2Label are provided by iota2's external
    # features framework.

    def pca_reduction(self: DataContainer) -> tuple[np.ndarray, list]:
        """
        Perform Principal Component Analysis (PCA) on the interpolated
        data to reduce its dimensionality.
        """
        original_shape = self.interpolated_data.shape
        # Flatten the spatial dimensions: one row per pixel, one column
        # per feature.
        reshaped_data = self.interpolated_data.reshape(-1, original_shape[2])

        # Apply PCA
        n_components = 3
        pca = PCA(n_components=n_components)
        pca_result = pca.fit_transform(reshaped_data)
        reduced_data = pca_result.reshape(
            original_shape[0], original_shape[1], n_components
        )
        labels = [
            I2Label(sensor_name="pca", feat_name=i + 1)
            for i in range(reduced_data.shape[2])
        ]
        return reduced_data, labels

In this example, the `pca_reduction` function applies dimensionality reduction directly to the interpolated data `self.interpolated_data`. However, due to the size of the data being manipulated, this data is often divided into chunks, so `self.interpolated_data` often represents only a portion of the image on a given tile. Moreover, in a typical iota2 run, the data is distributed over several tiles: a PCA fitted independently on each chunk would produce components that are not comparable across chunks or tiles.

To achieve a dimensionality reduction that is coherent over the whole area of interest, we can instead rely on the data present in the databases created for model training. These files are located in the `learningSamples` folder of the iota2 output directory, for example `learningSamples/Samples_region_1_seed0_learn.sqlite`.
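Before building the model, it can be helpful to check which tables and feature columns such a database actually contains. Here is a minimal sketch, assuming a default iota2 run where the learning samples are stored in an `output` table (the table name used by the script below); adapt `db_path` to your own output directory.

.. code-block:: python

    import sqlite3

    db_path = "learningSamples/Samples_region_1_seed0_learn.sqlite"

    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # List the tables present in the database.
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    print("tables:", [row[0] for row in cursor.fetchall()])

    # Inspect the columns of the 'output' table to check the feature
    # naming convention (columns whose name contains 'sentinel').
    cursor.execute("PRAGMA table_info(output)")
    columns = [row[1] for row in cursor.fetchall()]
    print("sentinel columns:", [c for c in columns if "sentinel" in c])

    conn.close()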
Here is an example of Python functions that can be used to build a PCA model and save it to disk. For this example, all the columns containing the string 'sentinel' are used to build the PCA model.

.. code-block:: python

    import sqlite3

    import joblib
    import pandas as pd
    from sklearn.decomposition import PCA


    def get_sentinel_columns(db_path: str, table_name: str) -> list:
        """
        Retrieve columns containing 'sentinel' from the specified SQLite table.

        Parameters
        ----------
        db_path : str
            Path to the SQLite database file.
        table_name : str
            Name of the table to query.

        Returns
        -------
        list
            List of columns containing 'sentinel'.
        """
        conn = sqlite3.connect(db_path)
        query = f"PRAGMA table_info({table_name})"
        table_info = pd.read_sql_query(query, conn)
        conn.close()
        sentinel_columns = table_info[table_info["name"].str.contains("sentinel")][
            "name"
        ].tolist()
        return sentinel_columns


    def load_sentinel_data(
        db_path: str, table_name: str, sentinel_columns: list
    ) -> pd.DataFrame:
        """
        Load data from the specified columns of the SQLite table.

        Parameters
        ----------
        db_path : str
            Path to the SQLite database file.
        table_name : str
            Name of the table to query.
        sentinel_columns : list
            List of columns to load data from.

        Returns
        -------
        pd.DataFrame
            DataFrame with the loaded data.
        """
        if sentinel_columns:
            columns_query = ", ".join(sentinel_columns)
            query = f"SELECT {columns_query} FROM {table_name}"
            conn = sqlite3.connect(db_path)
            df = pd.read_sql_query(query, conn)
            conn.close()
        else:
            # If no columns contain "sentinel", return an empty DataFrame.
            df = pd.DataFrame()
        return df


    def apply_pca(data: pd.DataFrame, n_components: int = 3) -> PCA:
        """
        Apply PCA to the given data.

        Parameters
        ----------
        data : pd.DataFrame
            Input data for PCA.
        n_components : int, optional
            Number of PCA components, by default 3.

        Returns
        -------
        PCA
            PCA object after fitting the data.
        """
        values = data.values
        pca = PCA(n_components=n_components)
        pca.fit(values)
        return pca


    def build_pca(db_path: str, output_pca_model: str) -> None:
        """
        Build and save a PCA model based on sentinel columns in the
        specified SQLite table.

        Parameters
        ----------
        db_path : str
            Path to the SQLite database file.
        output_pca_model : str
            Path to store the PCA model.
        """
        table_name = "output"
        sentinel_columns = get_sentinel_columns(db_path, table_name)
        df = load_sentinel_data(db_path, table_name, sentinel_columns)
        if not df.empty:
            pca = apply_pca(df)
            joblib.dump(pca, output_pca_model)
            print("PCA model saved successfully.")
        else:
            print("No sentinel columns found in the table.")


    if __name__ == "__main__":
        build_pca(
            "learningSamples/Samples_region_1_seed0_learn.sqlite",
            "pca_model.joblib",
        )

Once the model is saved on disk, we can reuse it in our external feature function and apply it to the Sentinel-2 data. Our initial function then becomes:

.. code-block:: python

    import joblib
    import numpy as np

    # DataContainer and I2Label are provided by iota2's external
    # features framework.

    def pca_reduction(self: DataContainer, pca_file: str) -> tuple[np.ndarray, list]:
        """
        Perform Principal Component Analysis (PCA) on the interpolated
        data to reduce its dimensionality, using a PCA model fitted on
        the training samples.
        """
        pca_loaded = joblib.load(pca_file)
        original_shape = self.interpolated_data.shape
        reshaped_data = self.interpolated_data.reshape(-1, original_shape[2])

        # Apply the pre-fitted PCA: the number of output components is
        # given by the loaded model.
        pca_result = pca_loaded.transform(reshaped_data)
        reduced_data = pca_result.reshape(
            original_shape[0], original_shape[1], pca_loaded.n_components_
        )
        labels = [
            I2Label(sensor_name="pca", feat_name=i + 1)
            for i in range(reduced_data.shape[2])
        ]
        return reduced_data, labels

The configuration file may look like:

.. code-block:: python

    "external_features": {
        "functions": [["pca_reduction", {"pca_file": "/path/to/pca_model.joblib"}]],
        "module": "/absolute/path/to/pca_reduction.py",
        "concat_mode": False,
    },

.. warning:: It is important not to concatenate the result of the dimensionality reduction with the rest of the primitives: the :ref:`concat_mode ` parameter must be set to ``False``.

Summary of the steps to perform dimensionality reduction:

- Run iota2 up to the generation of training samples by models (step `Merge samples dedicated to the same model`) without dimensionality reduction.
- Run the script that will train the dimensionality reduction model, as described above (a quick check of the saved model is sketched below).
- Restart iota2 from the beginning, adding the user-defined dimensionality reduction function.
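Before restarting iota2, the saved model can be checked in isolation by applying it to a randomly generated array shaped like a chunk, mimicking the reshape/transform/reshape sequence of `pca_reduction`. This is only a sketch: the chunk dimensions are arbitrary placeholders and `np.random.rand` stands in for real interpolated data.

.. code-block:: python

    import joblib
    import numpy as np

    pca_loaded = joblib.load("pca_model.joblib")

    # Arbitrary chunk dimensions; the band count must match the number
    # of features the PCA model was fitted on.
    rows, cols = 64, 64
    n_bands = pca_loaded.n_features_in_
    chunk = np.random.rand(rows, cols, n_bands)

    # Same reshape / transform / reshape sequence as in pca_reduction.
    reduced = pca_loaded.transform(chunk.reshape(-1, n_bands))
    reduced = reduced.reshape(rows, cols, pca_loaded.n_components_)

    print(reduced.shape)  # expected: (64, 64, 3) with the default n_components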