Dimensionality Reduction in iota2
iota2 provides a feature called external features, which allows time series to be manipulated through Python functions. We will show how to use this feature for dimensionality reduction.
Dedicated documentation for configuring this workflow is available in External features.
In this example, we will use the PCA function from scikit-learn, but any other dimensionality reduction method can be used.
import numpy as np
from sklearn.decomposition import PCA

# DataContainer and I2Label are provided by iota2's external features API


def pca_reduction(self: DataContainer) -> tuple[np.ndarray, list]:
    """
    Perform Principal Component Analysis (PCA) on the interpolated data
    to reduce its dimensionality.
    """
    # self.interpolated_data has shape (rows, cols, n_features)
    original_shape = self.interpolated_data.shape
    # Flatten the spatial dimensions so that each pixel becomes one sample
    reshaped_data = self.interpolated_data.reshape(-1, original_shape[2])
    # Apply PCA
    n_components = 3
    pca = PCA(n_components=n_components)
    pca_result = pca.fit_transform(reshaped_data)
    # Restore the spatial dimensions, keeping only the principal components
    reduced_data = pca_result.reshape(original_shape[0], original_shape[1], n_components)
    labels = [
        I2Label(sensor_name="pca", feat_name=i + 1) for i in range(reduced_data.shape[2])
    ]
    return reduced_data, labels
In this example, the pca_reduction function applies dimensionality reduction directly to the interpolated data self.interpolated_data. However, because of the size of the data being manipulated, the data is often divided into chunks, so self.interpolated_data frequently represents only a portion of the image on a given tile. Moreover, in a typical iota2 run, the data is distributed over several tiles.
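To illustrate why fitting a PCA independently on each chunk is problematic, here is a toy sketch (purely synthetic data, unrelated to iota2): two chunks of the same image yield two different sets of principal axes, so the reduced features are not expressed in the same basis across the area.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two chunks of the same image: 1000 pixels each, 10 spectral features
chunk_a = rng.normal(size=(1000, 10))
chunk_b = rng.normal(size=(1000, 10))

pca_a = PCA(n_components=3).fit(chunk_a)
pca_b = PCA(n_components=3).fit(chunk_b)

# The principal axes differ between chunks, so the same pixel value would be
# projected onto different components depending on the chunk it belongs to
print(np.allclose(pca_a.components_, pca_b.components_))  # almost surely False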
To achieve a dimensionality reduction that is coherent over the whole area of interest, we can rely on the data contained in the databases created for model training. These files are located in the learningSamples folder of the iota2 output directory, for example learningSamples/Samples_region_1_seed0_learn.sqlite.
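To inspect the layout of such a database beforehand, the tables it contains can be listed (a minimal sketch, using the example file mentioned above; in our case the learning samples are stored in a table named output, which is the table the script below queries):

import sqlite3

conn = sqlite3.connect("learningSamples/Samples_region_1_seed0_learn.sqlite")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
conn.close()
print(tables)  # the learning samples are stored in the 'output' table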
Here is an example of Python functions that can be used to build a PCA model and save it to disk. In this example, all the columns whose name contains the string 'sentinel' are used to build the PCA model.
import sqlite3

import joblib
import pandas as pd
from sklearn.decomposition import PCA


def get_sentinel_columns(db_path: str, table_name: str) -> list:
    """
    Retrieve columns containing 'sentinel' from the specified SQLite table.

    Parameters
    ----------
    db_path : str
        Path to the SQLite database file.
    table_name : str
        Name of the table to query.

    Returns
    -------
    list
        List of columns containing 'sentinel'.
    """
    conn = sqlite3.connect(db_path)
    query = f"PRAGMA table_info({table_name})"
    table_info = pd.read_sql_query(query, conn)
    conn.close()
    sentinel_columns = table_info[table_info["name"].str.contains("sentinel")][
        "name"
    ].tolist()
    return sentinel_columns


def load_sentinel_data(
    db_path: str, table_name: str, sentinel_columns: list
) -> pd.DataFrame:
    """
    Load data from the specified columns of the SQLite table.

    Parameters
    ----------
    db_path : str
        Path to the SQLite database file.
    table_name : str
        Name of the table to query.
    sentinel_columns : list
        List of columns to load data from.

    Returns
    -------
    pd.DataFrame
        DataFrame with the loaded data.
    """
    if sentinel_columns:
        columns_query = ", ".join(sentinel_columns)
        query = f"SELECT {columns_query} FROM {table_name}"
        conn = sqlite3.connect(db_path)
        df = pd.read_sql_query(query, conn)
        conn.close()
    else:
        # If no columns contain "sentinel", return an empty DataFrame
        df = pd.DataFrame()
    return df


def apply_pca(data: pd.DataFrame, n_components: int = 3) -> PCA:
    """
    Apply PCA to the given data.

    Parameters
    ----------
    data : pd.DataFrame
        Input data for PCA.
    n_components : int, optional
        Number of PCA components, by default 3.

    Returns
    -------
    PCA
        PCA object after fitting the data.
    """
    values = data.values
    pca = PCA(n_components=n_components)
    pca.fit(values)
    return pca


def build_pca(db_path: str, output_pca_model: str) -> None:
    """
    Build and save a PCA model based on sentinel columns in the specified SQLite table.

    Parameters
    ----------
    db_path : str
        Path to the SQLite database file.
    output_pca_model : str
        Path to store the PCA model.
    """
    table_name = "output"
    sentinel_columns = get_sentinel_columns(db_path, table_name)
    df = load_sentinel_data(db_path, table_name, sentinel_columns)
    if not df.empty:
        pca = apply_pca(df)
        joblib.dump(pca, output_pca_model)
        print("PCA model saved successfully.")
    else:
        print("No sentinel columns found in the table.")


if __name__ == "__main__":
    build_pca(
        "learningSamples/Samples_region_1_seed0_learn.sqlite",
        "pca_model.joblib",
    )
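Before plugging the saved model into iota2, a quick sanity check can be performed by reloading it (a minimal sketch, assuming the model was saved as pca_model.joblib above):

import joblib

pca = joblib.load("pca_model.joblib")
print(pca.n_components_)              # number of retained components
print(pca.explained_variance_ratio_)  # variance explained by each component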
Once the model is saved to disk, we can reuse it in our external feature function to apply the same transformation to the Sentinel-2 data set.
Our initial function then becomes:
import joblib


def pca_reduction(self: DataContainer, pca_file: str) -> tuple[np.ndarray, list]:
    """
    Apply a pre-trained PCA model to the interpolated data to reduce its dimensionality.
    """
    # Load the PCA model fitted on the learning samples
    pca_loaded = joblib.load(pca_file)
    original_shape = self.interpolated_data.shape
    reshaped_data = self.interpolated_data.reshape(-1, original_shape[2])
    # Apply the already-fitted PCA: transform only, no re-fitting
    pca_result = pca_loaded.transform(reshaped_data)
    n_components = pca_loaded.n_components_
    reduced_data = pca_result.reshape(original_shape[0], original_shape[1], n_components)
    labels = [
        I2Label(sensor_name="pca", feat_name=i + 1) for i in range(reduced_data.shape[2])
    ]
    return reduced_data, labels
The configuration file may look like:
"external_features": {
    "functions": [["pca_reduction", {"pca_file": "/path/to/pca_model.joblib"}]],
    "module": "/absolute/path/to/pca_reduction.py",
    "concat_mode": False
},
Warning
The result of the dimensionality reduction must not be concatenated with the other primitives: the concat_mode parameter must therefore be set to False.
Summary of the steps to perform dimensionality reduction:

1. Run iota2 up to the generation of training samples for each model (the step "Merge samples dedicated to the same model"), without dimensionality reduction.
2. Run the script described above to train the dimensionality reduction model.
3. Restart iota2 from the beginning, adding the user-defined dimensionality reduction function.