smadi package

Submodules

smadi.anomaly_detectors module

Soil Moisture Anomalies Calculation Module

This module provides a suite of methods for calculating soil moisture anomalies based on climatological data. The implemented methods include:

Z-Score: Standardized z-score method.
SMAPI (Soil Moisture Anomaly Percent Index): Measures anomalies as a percentage deviation from the climatological mean or median.
SMDI (Soil Moisture Deficit Index): Quantifies soil moisture deficit based on deviations from climatological median and extremes.
ESSMI (Empirical Standardized Soil Moisture Index): Uses a nonparametric empirical probability density function for standardizing soil moisture values.
SMAD (Standardized Anomaly Absolute Deviation): Calculates anomalies using the median and interquartile range, providing robustness to outliers.
SMDS (Soil Moisture Drought Severity): Assesses drought severity based on percentile rankings of soil moisture values.
SMCI (Soil Moisture Condition Index): Measures soil moisture content relative to climatological minima and maxima.
SMCA (Soil Moisture Content Anomaly): Quantifies anomalies as a deviation from climatological mean or median relative to maxima.
ParaDis (Parametric Distribution): Fits observed data to parametric distributions (e.g., beta, gamma).

class smadi.anomaly_detectors.AnomalyDetector(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: ABC

An abstract class for detecting anomalies in time series data based on the climatology.

apply_transformation(func, **kwargs) → DataFrame[source]

property clim_df: The DataFrame containing the climate normal data.

abstract detect_anomaly(**kwargs) → DataFrame[source]

property groupby_param: The column name to be used for grouping the data for the anomaly detection.

class smadi.anomaly_detectors.AnomalyDetectorFactory[source]

Bases: object

A factory class for creating anomaly detectors based on the provided method.

static create_detector(method: str, **kwargs) → AnomalyDetector[source]

methods = {'essmi': <class 'smadi.anomaly_detectors.ESSMI'>, 'paradis': <class 'smadi.anomaly_detectors.ParaDis'>, 'smad': <class 'smadi.anomaly_detectors.SMAD'>, 'smapi': <class 'smadi.anomaly_detectors.SMAPI'>, 'smca': <class 'smadi.anomaly_detectors.SMCA'>, 'smci': <class 'smadi.anomaly_detectors.SMCI'>, 'smdi': <class 'smadi.anomaly_detectors.SMDI'>, 'smds': <class 'smadi.anomaly_detectors.SMDS'>, 'zscore': <class 'smadi.anomaly_detectors.ZScore'>}
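The lookup-by-name behaviour described above can be illustrated with a minimal registry sketch. The class names below (Detector, ZScoreDetector, SmapiDetector, DetectorFactory) are hypothetical stand-ins, not the smadi classes, and raising ValueError for an unknown method is an assumption rather than documented behaviour.

```python
class Detector:
    """Hypothetical base class that simply stores its keyword options."""

    def __init__(self, **kwargs):
        self.options = kwargs


class ZScoreDetector(Detector):
    """Hypothetical stand-in for a z-score detector."""


class SmapiDetector(Detector):
    """Hypothetical stand-in for a SMAPI detector."""


class DetectorFactory:
    # Registry mapping a method name to the class that implements it.
    methods = {"zscore": ZScoreDetector, "smapi": SmapiDetector}

    @staticmethod
    def create_detector(method, **kwargs):
        try:
            detector_cls = DetectorFactory.methods[method]
        except KeyError:
            # Assumption: unknown names are rejected with a ValueError.
            known = ", ".join(sorted(DetectorFactory.methods))
            raise ValueError(f"Unknown method {method!r}; expected one of: {known}") from None
        return detector_cls(**kwargs)


detector = DetectorFactory.create_detector("zscore", variable="sm")
```

Keeping the mapping in a single dictionary means a new method only needs one new entry, and callers never import the concrete classes directly.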

class smadi.anomaly_detectors.ESSMI(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Empirical Standardized Soil Moisture Index (ESSMI) method.

The index is computed by fitting the nonparametric empirical probability density function (ePDF) using a kernel density estimator (KDE):

f^h(x) = 1/(nh) * Σ K((x - xi) / h)
K(u) = 1/√(2π) * exp(-u^2/2)

where:

f^h: the ePDF
K: the Gaussian kernel function
h: the bandwidth of the kernel function, used as the smoothing parameter (Scott’s rule)
n: the number of observations
x: the average value of the variable in the time series data. It can be any of the following: daily average, weekly average, monthly average, etc.
xi: the ith observation

The ESSMI is then computed by transforming the ePDF to the standard normal distribution with a mean of zero and a standard deviation of one using the inverse of the standard normal distribution function.

ESSMI = Φ^-1(F^h(x))

where:

Φ^-1: the inverse of the standard normal distribution function
F^h: the ePDF
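The two steps above (Gaussian kernel estimate, then inverse-normal transform) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the smadi implementation: the function name essmi_sketch is hypothetical, the bandwidth is taken as the sample standard deviation times n^(-1/5) (Scott's rule in one dimension), and the estimate is evaluated in its cumulative form so that it can be passed to Φ^-1.

```python
from statistics import NormalDist, stdev


def essmi_sketch(values):
    """Return one standardized score per observation (hypothetical helper).

    Assumption: bandwidth h = sample std * n**(-1/5), i.e. Scott's rule in 1-D.
    """
    n = len(values)
    h = stdev(values) * n ** (-1 / 5)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = []
    for x in values:
        # Cumulative form of the Gaussian kernel estimate:
        # the average of the kernel CDFs centred on each observation.
        cumulative = sum(std_normal.cdf((x - xi) / h) for xi in values) / n
        # Map the cumulative probability onto the standard normal scale.
        scores.append(std_normal.inv_cdf(cumulative))
    return scores


monthly_means = [0.18, 0.22, 0.25, 0.27, 0.30, 0.35]
scores = essmi_sketch(monthly_means)
```

Because the cumulative estimate increases with x, drier-than-usual values map to negative scores and wetter-than-usual values to positive ones.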

detect_anomaly(**kwargs) → DataFrame[source]

class smadi.anomaly_detectors.ParaDis(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on fitting the observed data to a parametric distribution (e.g. beta or gamma).

detect_anomaly(**kwargs) → DataFrame[source]

class smadi.anomaly_detectors.SMAD(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Standardized Anomaly Absolute Deviation (SMAD) method.

SMAD = (x - η) / IQR

where:

x: the average value of the variable in the time series data. It can be any of the following: daily average, weekly average, monthly average, etc.
η: the long-term median of the variable (the climate normal).
IQR: the interquartile range of the variable, i.e. the difference between the 75th and 25th percentiles of the variable.
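As a worked illustration of the SMAD formula, the sketch below uses only the standard library. The helper name smad_sketch is hypothetical, and the "inclusive" quantile method is an assumption; other percentile conventions give slightly different IQR values.

```python
from statistics import median, quantiles


def smad_sketch(x, history):
    """(x - median) / IQR for one value against a long-term record."""
    eta = median(history)
    # Assumption: quartiles computed with the inclusive interpolation method.
    q1, _, q3 = quantiles(history, n=4, method="inclusive")
    return (x - eta) / (q3 - q1)


history = [10.0, 20.0, 30.0, 40.0, 50.0]
value = smad_sketch(40.0, history)  # median 30, IQR 40 - 20 = 20
```

Here the value 40 lies half an interquartile range above the median, so the result is 0.5.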

detect_anomaly(**kwargs) → DataFrame[source]

class smadi.anomaly_detectors.SMAPI(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Soil Moisture Anomaly Percent Index (SMAPI) method.

SMAPI = ((x - ref) / ref) * 100

where:

x: the average value of the variable in the time series data. It can be any of the following: daily average, weekly average, monthly average, etc.
ref: the long-term mean (μ) or median (η) of the variable (the climate normal).
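A short worked example of the SMAPI formula follows. It is a sketch with a hypothetical helper name (smapi_sketch) and made-up values, and it does not call the smadi API.

```python
from statistics import mean, median


def smapi_sketch(x, history, metric="mean"):
    """Percentage deviation of x from the long-term mean or median."""
    ref = mean(history) if metric == "mean" else median(history)
    return (x - ref) / ref * 100


history = [20.0, 25.0, 30.0]  # illustrative long-term record
percent = smapi_sketch(30.0, history)  # (30 - 25) / 25 * 100
```

A value of 30 against a reference of 25 is 20 percent above normal. Note that the index is undefined when the reference is zero.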

detect_anomaly(**kwargs) → DataFrame[source]

class smadi.anomaly_detectors.SMCA(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Soil Moisture Content Anomaly (SMCA) method.

SMCA = (x - ref) / (max - ref)

where:

x: the average value of the variable in the time series data. It can be any of the following: daily average, weekly average, monthly average, etc.
ref: the long-term mean (μ) or median (η) of the variable (the climate normal).
max: the long-term maximum of the variable.
min: the long-term minimum of the variable.
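The SMCA formula can be checked by hand with a few lines of Python. The helper name smca_sketch and the sample values are illustrative assumptions.

```python
from statistics import mean, median


def smca_sketch(x, history, metric="mean"):
    """Deviation of x from the reference, scaled by (max - reference)."""
    ref = mean(history) if metric == "mean" else median(history)
    return (x - ref) / (max(history) - ref)


history = [10.0, 20.0, 30.0, 60.0]  # mean 30, maximum 60
halfway = smca_sketch(45.0, history)  # (45 - 30) / (60 - 30)
at_max = smca_sketch(60.0, history)
```

With this scaling a value equal to the reference gives 0 and a value equal to the long-term maximum gives 1.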

detect_anomaly(**kwargs) → DataFrame[source]

class smadi.anomaly_detectors.SMCI(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Soil Moisture Condition Index (SMCI) method.

SMCI = (x - min) / (max - min)

where:

x: the average value of the variable in the time series data. It can be any of the following: daily average, weekly average, monthly average, etc.
min: the long-term minimum of the variable.
max: the long-term maximum of the variable.
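The SMCI is a min-max rescaling, which the following sketch makes explicit. The helper name smci_sketch is hypothetical and the numbers are illustrative.

```python
def smci_sketch(x, history):
    """Position of x between the long-term minimum (0) and maximum (1)."""
    lowest, highest = min(history), max(history)
    return (x - lowest) / (highest - lowest)


history = [10.0, 20.0, 30.0, 60.0]
condition = smci_sketch(35.0, history)  # (35 - 10) / (60 - 10)
```

Values outside the historical range fall below 0 or above 1, and the index is undefined when the record has no spread (max equals min).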

detect_anomaly(**kwargs) → DataFrame[source]

class smadi.anomaly_detectors.SMDI(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Soil Moisture Deficit Index (SMDI) method.

SMDI(t) = 0.5 * SMDI(t-1) + (SD(t) / 50)

where

SD(t) = ((x - η) / (η - min)) * 100 if x <= η
SD(t) = ((x - η) / (max - η)) * 100 if x > η

x: the average value of the variable in the time series data. It can be any of the following: daily average, weekly average, monthly average, etc.
η: the long-term median of the variable (the climate normal).
min: the long-term minimum of the variable.
max: the long-term maximum of the variable.
t: the time step of the time series data.
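The recursion is easier to follow with numbers. The sketch below computes SD(t) piecewise and then accumulates the index. The function names are hypothetical, and starting the recursion from a previous index of 0 is an assumption, since the text above does not state the initial condition.

```python
def soil_moisture_deficit(x, eta, minimum, maximum):
    """Piecewise deficit SD in percent, relative to the long-term median."""
    if x <= eta:
        return (x - eta) / (eta - minimum) * 100
    return (x - eta) / (maximum - eta) * 100


def smdi_series(sd_values, initial=0.0):
    """Accumulate the index step by step.

    Assumption: the index before the first time step is `initial` (0 here).
    """
    series = []
    previous = initial
    for sd in sd_values:
        previous = 0.5 * previous + sd / 50
        series.append(previous)
    return series


observations = (10.0, 30.0, 50.0)  # driest, median and wettest case
sd = [soil_moisture_deficit(x, eta=30.0, minimum=10.0, maximum=50.0) for x in observations]
index = smdi_series(sd)
```

The deficits are -100, 0 and 100, and the index evolves as -2.0, -1.0, 1.5: half of the previous state carries over at every step, so a dry spell fades gradually rather than vanishing at once.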

detect_anomaly(**kwargs) → DataFrame[source]

class smadi.anomaly_detectors.SMDS(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Soil Moisture Drought Severity (SMDS) method.

SMDS = 1 - SMP
SMP = rank(x) / (n + 1)

where:

SMP: the Soil Moisture Percentile. It is the percentile of the average value of the variable in the time series data.
SMDS: the Soil Moisture Drought Severity. It is the severity of the drought based on the percentile of the average value of the variable in the time series data.
rank(x): the rank of the average value of the variable in the time series data.
n: the number of years in the time series data.
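A rank-based sketch of SMP and SMDS follows. The helper name is hypothetical; ranking from 1 for the smallest value and breaking ties by position are assumptions, as the text above does not specify either.

```python
def smds_sketch(values):
    """Return 1 - rank / (n + 1) for every value in the record."""
    n = len(values)
    # Assumption: rank 1 is the smallest value; ties keep their input order.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, position in enumerate(order, start=1):
        ranks[position] = rank
    return [1 - rank / (n + 1) for rank in ranks]


yearly_values = [0.30, 0.10, 0.20]  # same calendar period in three years
severity = smds_sketch(yearly_values)
```

The driest year (0.10) has rank 1 and therefore the highest severity, 0.75. The wettest year has the lowest, 0.25.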

detect_anomaly(**kwargs) → DataFrame[source]

class smadi.anomaly_detectors.ZScore(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Z-Score method.

z_score = (x - μ) / σ

where:

x: the average value of the variable in the time series data. It can be any of the following: daily average, weekly average, monthly average, etc.
μ: the long-term mean of the variable (the climate normal).
σ: the long-term standard deviation of the variable.
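For comparison with the robust indices, here is the z-score worked through on a tiny record. The helper name is hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption.

```python
from statistics import mean, stdev


def zscore_sketch(x, history):
    """Standardized anomaly of x against a long-term record."""
    mu = mean(history)
    sigma = stdev(history)  # Assumption: sample standard deviation.
    return (x - mu) / sigma


history = [10.0, 20.0, 30.0]  # mean 20, sample standard deviation 10
z = zscore_sketch(35.0, history)
```

A value of 35 is one and a half standard deviations above the long-term mean.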

detect_anomaly(**kwargs) → DataFrame[source]

smadi.climatology module

A module for calculating climatology (climate normal) for different time steps (month, dekad, week) based on time series data.

class smadi.climatology.Aggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]

Bases: ABC

An abstract class for aggregating time series data based on different time steps.

Attributes:

df: pd.DataFrame: The input DataFrame containing the time series data to be aggregated.
variable: str: The variable/column in the DataFrame to be aggregated.
fillna: bool: Fill NaN values in the time series data using a moving window average.
fillna_window_size: int: The size of the moving window for filling NaN values. It is recommended to be an odd number.
smoothing: bool: Smooth the time series data using a moving window average.
smooth_window_size: int: The size of the moving window for smoothing (n-days). It is recommended to be an odd number.
timespan: list[str, str] optional: The start and end dates for a timespan to be aggregated. Format: [‘YYYY-MM-DD’, ‘YYYY-MM-DD’]
agg_metric: str: The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, ‘std’, etc.

Methods:

aggregate: Aggregates the time series data based on the provided time step.

abstract aggregate(**kwargs)[source]

property preprocess_df

class smadi.climatology.BimonthlyAggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]

Bases: Aggregator

Aggregates the time series data based on a bimonthly (twice a month) time step.

aggregate(**kwargs)[source]

class smadi.climatology.Climatology(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean')[source]

Bases: object

A class for calculating the climatology (climate normal) for time series data.

Attributes:

df_original: pd.DataFrame: The original input DataFrame before resampling and removing NaN values.
df: pd.DataFrame: The input DataFrame containing the preprocessed data to be aggregated.
variable: str: The variable/column in the DataFrame to be aggregated.
fillna: bool: Fill NaN values in the time series data using a moving window average.
fillna_window_size: int: The size of the moving window for filling NaN values. It is recommended to be an odd number.
smoothing: bool: Smooth the time series data using a moving window average.
smooth_window_size: int: The size of the moving window for smoothing (n-days). It is recommended to be an odd number.
timespan: list[str, str] optional: The start and end dates for a timespan to be aggregated. Format: [‘YYYY-MM-DD’, ‘YYYY-MM-DD’]
time_step: str: The time step for aggregation. Supported values: ‘day’, ‘week’, ‘dekad’, ‘bimonth’, ‘month’.
agg_metric: str: The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, ‘std’, etc.
normal_metrics: List[str]: The metrics to be used in the climatology computation. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.
clima_df: pd.DataFrame: The DataFrame containing climatology information.

Methods:

compute_normals: Calculates climatology based on the aggregated data.
plot_ts: Plot the time series data for the provided dataframe.

property aggregated_df

compute_normals(**kwargs) → DataFrame[source]

Calculates climatology based on the aggregated data.

Parameters:

kwargs: Additional time/date filtering parameters.

Returns:

pd.DataFrame: The DataFrame containing climatology information.

class smadi.climatology.DailyAggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]

Bases: Aggregator

Aggregates the time series data based on a daily time step.

aggregate(**kwargs)[source]

class smadi.climatology.DekadalAggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]

Bases: Aggregator

Aggregates the time series data based on a dekadal time step.

aggregate(**kwargs)[source]

class smadi.climatology.MonthlyAggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]

Bases: Aggregator

Aggregates the time series data based on a monthly time step.

aggregate(**kwargs)[source]

class smadi.climatology.Preprocessor(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None)[source]

Bases: object

A class for preprocessing the time series data before aggregation.

Attributes:

df: pd.DataFrame: The input DataFrame containing the time series data to be aggregated.
variable: str: The variable/column in the DataFrame to be aggregated.
fillna: bool: Fill NaN values in the time series data using a moving window average.
fillna_window_size: int: The size of the moving window for filling NaN values. It is recommended to be an odd number.
smoothing: bool: Smooth the time series data using a moving window average.
smooth_window_size: int: The size of the moving window for smoothing (n-days). It is recommended to be an odd number.
timespan: list[str, str] optional: The start and end dates for a timespan to be aggregated. Format: [‘YYYY-MM-DD’, ‘YYYY-MM-DD’]

Methods:

preprocess: Preprocesses the time series data by resampling, truncating, filling NaN values, and smoothing.

preprocess()[source]

class smadi.climatology.Validator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, time_step: str = None, metrics: List[str] = None, time_span: List[str] = None)[source]

Bases: object

A class for validating the input parameters for the climatology computation.

Methods:

validate: Validates the input parameters for the climatology computation.

validate()[source]

class smadi.climatology.WeeklyAggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]

Bases: Aggregator

Aggregates the time series data based on a weekly time step.

aggregate(**kwargs)[source]

smadi.data_reader module

class smadi.data_reader.AscatData(path, read_bulk=True)[source]

Bases: GriddedNcContiguousRaggedTs

Class reading ASCAT SSM 6.25 km data.

class smadi.data_reader.Era5Land(path, read_bulk=True, celsius=True)[source]

Bases: GriddedNcTs

Read time series data from ERA5 netCDF files.

smadi.data_reader.extract_obs_ts(loc, ascat_path, era5_land_path=None, obs_type='sm', read_bulk=False)[source]

Read time series of given observation type.

Parameters:

loc (int, tuple) – Tuple is interpreted as longitude, latitude coordinate. Integer is interpreted as grid point index.
ascat_path (str) – Path to ASCAT soil moisture data.
era5_land_path (str) – Path to ERA5-Land data.
obs_type (str, optional) – Observation type (default: “sm”).
read_bulk (bool, optional) – If “True” all data will be read in memory, if “False” only a single time series is read (default: False). Use “True” to process multiple GPIs in a loop and “False” to read/analyze a single time series.

smadi.data_reader.read_era5(era5_path, loc, interpo_method='nearest')[source]

Read ERA5 data for given location.

Parameters:

era5_path (str) – Path to ERA5 dataset.
loc (tuple) – Tuple is interpreted as longitude, latitude coordinate.
interpo_method (str, optional) – Interpolation method (default: “nearest”).

Returns:

ts – Time series of ERA5 data containing all variables.

Return type:

pd.DataFrame

smadi.data_reader.read_grid_point(loc, ascat_sm_path, era5_land_path=None, read_bulk=False)[source]

Read grid point for given lon/lat coordinates or grid_point.

Parameters:

loc (int, tuple) – Tuple is interpreted as longitude, latitude coordinate. Integer is interpreted as grid point index.
ascat_sm_path (str) – Path to ASCAT soil moisture data.
era5_land_path (str) – Path to ERA5-Land data.
read_bulk (bool, optional) – If “True” all data will be read in memory, if “False” only a single time series is read (default: False). Use “True” to process multiple GPIs in a loop and “False” to read/analyze a single time series.

smadi.indicators module

smadi.indicators.essmi(obs)[source]

Computes the anomalies in time series data based on the Empirical Standardized Soil Moisture Index (ESSMI) method.

parameters:

obs: sequence-like object: The observed time series data.

returns:

numpy.ndarray: The Empirical Standardized Soil Moisture Index computed based on the given observed value(s).

smadi.indicators.para_dis(obs, dist='beta')[source]

Compute the anomalies in time series data based on fitting the observed data to a parametric distribution.

parameters:

obs: pd.Series or np.ndarray or sequence-like object: The observed time series data.
dist: str, optional: The distribution to fit the observed data to. Supported values: ‘beta’, ‘gamma’, ‘gam’ (Gamma), ‘exp’ (Exponential), ‘pe3’ (Pearson III).

smadi.indicators.smad(obs, median=None, iqr=None)[source]

Computes the anomalies in time series data based on the Standardized Median Absolute Deviation (SMAD) method.

parameters:

obs: pd.Series or np.ndarray or sequence-like object: The observed time series data.
median: float, pd.Series or np.ndarray or sequence-like object, optional: The long-term median of the variable. If None, it will be computed from obs.
iqr: float, pd.Series or np.ndarray or sequence-like object, optional: The long-term interquartile range of the variable. If None, it will be computed from obs.

Returns:

numpy.ndarray: The anomalies computed based on the given observed value(s) and the long-term median.

smadi.indicators.smapi(obs, ref=None, metric='mean')[source]

Computes anomalies in time series data based on the Soil Moisture Anomaly Percent Index (SMAPI) method.

parameters:

obs: pd.Series or np.ndarray or sequence-like object: The observed time series data.
ref: float, pd.Series or np.ndarray or sequence-like object, optional: The long-term mean (μ) or median (η) of the variable (the climate normal).
metric: str, optional: The metric to be used for computing the anomalies. Supported values: ‘mean’, ‘median’

smadi.indicators.smca(obs, metric='mean', ref=None, minimum=None, maximum=None)[source]

Computes the anomalies in time series data based on the Soil Moisture Content Anomaly (SMCA) method.

parameters:

obs: pd.Series or np.ndarray or sequence-like object: The observed time series data.
metric: str, optional: The metric to be used for computing the anomalies. Supported values: ‘mean’, ‘median’
ref: float, pd.Series or np.ndarray or sequence-like object, optional: The long-term mean (μ) or median (η) of the variable (the climate normal).
minimum: float, pd.Series or np.ndarray or sequence-like object, optional: The long-term minimum of the variable. If None, it will be computed from obs.
maximum: float, pd.Series or np.ndarray or sequence-like object, optional: The long-term maximum of the variable. If None, it will be computed from obs.

Returns:

numpy.ndarray: The anomalies computed based on the given observed value(s) and the long-term reference (mean or median).

smadi.indicators.smci(obs, minimum=None, maximum=None)[source]

Computes the anomalies in time series data based on the Soil Moisture Condition Index (SMCI) method.

parameters:

obs: pd.Series or np.ndarray or sequence-like object: The observed time series data.
minimum: float, pd.Series or np.ndarray or sequence-like object: The long-term minimum of the variable. If None, it will be computed from obs.
maximum: float, pd.Series or np.ndarray or sequence-like object: The long-term maximum of the variable. If None, it will be computed from obs.

Returns:

numpy.ndarray: The Soil Moisture Condition Index computed based on the given observed value(s).

smadi.indicators.smd(obs, median=None, minimum=None, maximum=None)[source]

Computes the Soil Moisture Deficit (SD) based on observed value and long-term median, minimum, and maximum values.

parameters:

obs: pd.Series or np.ndarray or sequence-like object: The observed time series data.
median: float, pd.Series or np.ndarray or sequence-like object, optional: The long-term median of the variable. If None, it will be computed from obs.
minimum: float, pd.Series or np.ndarray or sequence-like object, optional: The long-term minimum of the variable. If None, it will be computed from obs.
maximum: float, pd.Series or np.ndarray or sequence-like object, optional: The long-term maximum of the variable. If None, it will be computed from obs.

Returns:

numpy.ndarray: The Soil Moisture Deficit (SD) computed based on the given observed value(s).

smadi.indicators.smdi(sd)[source]: Computes the Soil Moisture Deficit Index (SMDI) incrementally based on the Soil Moisture Deficit (SD) values.

smadi.indicators.smds(obs)[source]

Computes anomalies in time series data based on the Soil Moisture Drought Severity (SMDS) method.

SMDS = 1 - SMP
SMP = rank(x) / (n + 1)

parameters:

obs: pd.Series or np.ndarray or sequence-like object: The observed time series data.

Returns:

numpy.ndarray: The Soil Moisture Drought Severity computed based on the given observed value(s).

smadi.indicators.zscore(obs, mean=None, std=None)[source]

Computes the standardized z-score of the time series data.

Parameters:

obs: pd.Series or np.ndarray or sequence-like object: The observed time series data.
mean: float, pd.Series or np.ndarray or sequence-like object, optional: The mean of the distribution of the time series data. If None, it will be computed from obs.
std: float, pd.Series or np.ndarray or sequence-like object, optional: The standard deviation of the distribution of the time series data. If None, it will be computed from obs.

Returns:

pd.Series or np.ndarray: The z-score of the time series data.

smadi.map module

smadi.map.plot_anomaly_maps(figsize=(25, 20), ax_rows=1, ax_cols=1, df=None, x='lon', y='lat', df_colms=None, map_crs=<Projected CRS: +proj=robin +a=6378137.0 +lon_0=0 +no_defs +type=c ...>, figure_title='', figure_title_kwargs={'fontsize': 15, 'fontweight': 'bold', 'ha': 'center', 'va': 'center', 'x': 0.5, 'y': 0.95}, maps_titles=None, maps_titles_kwargs={'fontsize': 10, 'fontweight': 'bold', 'pad': 11}, add_features=True, frame_line_width=1, add_cb=True, cb_min_max=['sm_clim', 'sm_clim', 'abs', 'anomaly', 'anomaly'], cmap='RdYlBu', vmin=None, vmax=None, add_gridlines=False, cb_kwargs={'labelsize': 0.5, 'pos': 0.4, 'show_values': False, 'tick_lines': 'center'}, cb_label=None, save_to=None)[source]

smadi.map.set_bins(colm)[source]

Set the bins and labels for color classification for the selected column.

parameters:

colm: str: The data column name for which the bins and labels are to be set.

smadi.map.set_extent(df, x='lon', y='lat', buffer=2)[source]

Set the extent for the map based on the provided dataframe and buffer.

parameters:

df: pd.DataFrame: The dataframe containing the data.
x: str: The column name for the x-axis.
y: str: The column name for the y-axis.
buffer: int: The buffer to be added to the min and max values of the x and y axis.
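The idea of padding the data bounds by a buffer can be sketched as follows. The function name is hypothetical, it works on plain lists rather than a DataFrame, and the output order [x_min, x_max, y_min, y_max] is an assumption chosen to match the order cartopy's set_extent expects.

```python
def extent_with_buffer(lons, lats, buffer=2):
    """Bounding box of the points, widened by `buffer` on every side."""
    # Assumption: extent is ordered as [x_min, x_max, y_min, y_max].
    return [
        min(lons) - buffer,
        max(lons) + buffer,
        min(lats) - buffer,
        max(lats) + buffer,
    ]


extent = extent_with_buffer([10, 15], [45, 48])
```

The buffer is expressed in the same units as the coordinates, so with lon/lat columns a buffer of 2 means two degrees.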

smadi.metadata module

smadi.plot module

smadi.plot.clss_counter(df, columns, thresholds)[source]

Count the number of values in the dataframe that fall within the thresholds for each category of the anomaly method.

parameters:

df: pd.DataFrame: The dataframe containing the data to plot.
columns: dict: The dictionary containing the column names and their respective matplotlib plot options.
thresholds: str: The name of the anomaly method to use its thresholds.

smadi.plot.draw_hbars(thresholds, x_axis)[source]

Draw horizontal bars on the plot based on the provided thresholds for each anomaly method.

parameters:

thresholds: dict: The dictionary containing the thresholds for each category of the anomaly method.
x_axis: list: The x-axis values for the plot.

smadi.plot.get_plot_options(**kwargs)[source]

Set the basic plot options based on the provided kwargs for the plot.

parameters:

kwargs: dict: The keyword arguments for the matplotlib plot.

returns:

plot_options: dict: The plot options for the figure.

smadi.plot.plot_anomaly(df, x_axis, colmns, thresholds, plot_hbars=True, plot_categories=True, **kwargs)[source]

Plot the anomaly detection results for the provided dataframe.

parameters:

df: pd.DataFrame: The dataframe containing the data to plot.
x_axis: list: The x-axis values for the plot.
colmns: dict: The dictionary containing the column names and their respective matplotlib plot options.
thresholds: str: The name of the anomaly method to use its thresholds.
plot_hbars: bool: Whether to plot the horizontal bars on the plot according to the thresholds of the anomaly method used.
plot_categories: bool: Whether to plot the number of values in each category of the anomaly method that fall within the thresholds.
kwargs: dict: The keyword arguments for the matplotlib plot for the figure such as title, xlabel, ylabel, legend, figsize, and grid.

smadi.plot.plot_categories_count(x_axis, results, anomaly_method)[source]

Plot the number of values in each category of the anomaly method that fall within the thresholds.

parameters:

x_axis: list: The x-axis values for the plot.
results: list: The list containing the number of values in each category of the anomaly method.
anomaly_method: str: The name of the anomaly method to use its thresholds.

smadi.plot.plot_colmns(df, x_axis, colmns_kwargs)[source]

Plot the data in each column of the dataframe with the provided x_axis.

parameters:

df: pd.DataFrame: The dataframe containing the data to plot.
x_axis: list: The x-axis values for the plot.
colmns_kwargs: dict: The dictionary containing the column names and their respective matplotlib plot options.

smadi.plot.plot_figure(plot_params)[source]

Plot the figure based on the provided plot parameters.

parameters:

plot_params: dict: The plot parameters for the figure.

smadi.plot.plot_fill_bet(df=None, x_axis=None, colmn=None, plot_style='ggplot', **kwargs)[source]

Plot the computed anomalies using the fill_between method.

parameters:

df: pd.DataFrame: The dataframe containing the data to plot. If None, the computed anomalies will be used.
x_axis: list: The x-axis values for the plot. If None, the index of the dataframe will be used.
colmn: str: The column name to plot.
plot_style: str: The plot style to use for the plot.
kwargs: dict: Additional parameters for customizing the plot. Supported keys: ‘title’, ‘xlabel’, ‘ylabel’, ‘legend’, ‘figsize’, ‘grid’.

smadi.plot.plot_ts(df, x_axis, colmns_kwargs, plot_raw=False, raw_df=None, raw_var=None, raw_kwargs=None, **kwargs)[source]

Plot the time series data for the provided dataframe.

parameters:

df: pd.DataFrame: The dataframe containing the data to plot.
x_axis: list: The x-axis values for the plot.
colmns_kwargs: dict: The dictionary containing the column names and their respective matplotlib plot options.
plot_raw: bool: Whether to plot the raw data on the plot as background.
raw_df: pd.DataFrame: The dataframe containing the raw data to plot.
raw_var: str: The name of the raw variable to plot.
raw_kwargs: dict: The dictionary containing the matplotlib plot options for the raw data.
kwargs: dict: The keyword arguments for the matplotlib plot for the figure such as title, xlabel, ylabel, legend, figsize, and grid.

smadi.plot.set_thresholds(method)[source]

Set the thresholds for the specified method based on the method name.

parameters:

method: str: The method name for which the thresholds are to be set. Supported methods are: ‘zscore’, ‘smapi’, ‘smdi’, ‘smca’, ‘smad’, ‘smci’, ‘smds’, ‘essmi’, ‘beta’, ‘gamma’

smadi.preprocess module

smadi.preprocess.bimonthly_agg(df: DataFrame, variable: str, agg_metric='mean') → DataFrame[source]

Aggregates the time series data based on a bimonthly time step.

Parameters:

df: pd.DataFrame: The DataFrame containing the time series data to be aggregated indexed by datetime index.
variable: str: The variable/column in the DataFrame to be aggregated.
agg_metric: str: The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.

Returns:

pd.DataFrame: The DataFrame containing the aggregated data.

smadi.preprocess.clim_groupping(df: DataFrame, time_step: str) → list[source]

Groups the DataFrame based on the provided time step for climatology computation.

parameters:

df: pd.DataFrame: The DataFrame to be grouped.

returns:

list: The list of date parameters to be used for grouping.

smadi.preprocess.compute_clim(df: DataFrame, time_step: str, variable: str, metrics: List[str]) → DataFrame[source]

Computes the climatology of the time series data based on the provided time step.

Parameters:

df: pd.DataFrame: The DataFrame containing the time series data to be aggregated indexed by datetime index.
time_step: str: The time step to be used for computing the climatology. Supported values: ‘month’, ‘week’, ‘dekad’, ‘bimonth’, ‘day’
variable: str: The variable/column in the DataFrame to be aggregated.
metrics: List[str]: The metrics to be computed. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.

Returns:

pd.DataFrame: The DataFrame containing the climatology data.
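As a rough illustration of what a climatology at the 'month' time step involves, the sketch below groups a datetime-indexed column by calendar month and computes the requested metrics with plain pandas. It does not call smadi; the helper name `monthly_climatology`, the synthetic data, and the output layout are assumptions made for this example only.

```python
import numpy as np
import pandas as pd


def monthly_climatology(df, variable, metrics):
    """Hypothetical helper: per-calendar-month statistics of one column."""
    # Pool all years together and group by calendar month (1..12).
    grouped = df[variable].groupby(df.index.month)
    # Compute every requested metric for each month; one column per metric.
    clim = grouped.agg(metrics)
    clim.index.name = "month"
    return clim


# Three years of synthetic daily values standing in for soil moisture.
index = pd.date_range("2020-01-01", "2022-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"sm": rng.uniform(0, 100, len(index))}, index=index)

clim = monthly_climatology(df, "sm", ["mean", "median"])
print(clim.head())
```

The result has one row per calendar month and one column per metric, which is the general shape a climate normal table needs before anomalies can be computed against it.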

smadi.preprocess.dekadal_agg(df: DataFrame, variable: str, agg_metric='mean') → DataFrame[source]

Aggregates the time series data using a dekadal (10-day) time step.

Parameters:

df: pd.DataFrame: The DataFrame containing the time series data to be aggregated indexed by datetime index.
variable: str: The variable/column in the DataFrame to be aggregated.
agg_metric: str: The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.

Returns:

pd.DataFrame: The DataFrame containing the aggregated data.
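A dekad is commonly defined as one of three periods per month: days 1-10, days 11-20, and day 21 to the end of the month. The sketch below shows one way to aggregate on that basis with pandas. It assumes this common definition and does not reproduce the smadi implementation; `dekad_of_month` and `dekadal_aggregate` are hypothetical names.

```python
import pandas as pd


def dekad_of_month(day):
    """Return 1, 2 or 3 for days 1-10, 11-20 and 21-end of month."""
    return min((day - 1) // 10 + 1, 3)


def dekadal_aggregate(df, variable, agg_metric="mean"):
    """Hypothetical helper: aggregate one column per (year, month, dekad)."""
    dekad = df.index.day.map(dekad_of_month)
    keys = [df.index.year, df.index.month, dekad]
    out = df[variable].groupby(keys).agg(agg_metric)
    out.index.names = ["year", "month", "dekad"]
    return out.to_frame()


# One month of daily data whose value equals the day of the month.
index = pd.date_range("2021-01-01", "2021-01-31", freq="D")
df = pd.DataFrame({"sm": [float(d) for d in index.day]}, index=index)

result = dekadal_aggregate(df, "sm")
print(result)
```

Note that the third dekad absorbs the remaining days of the month, so it covers 8 to 11 days depending on the month.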

smadi.preprocess.fillna(df: DataFrame, variable: str, fillna_window_size: int) → DataFrame[source]

Fills NaN values in the time series data using a moving window average.

Parameters:

df: pd.DataFrame: The DataFrame containing the time series data to be filled indexed by datetime index.
variable: str: The variable/column in the DataFrame to be filled.
fillna_window_size: int: The size of the moving window [days] for filling NaN values. It is recommended to be an odd number.

Returns:

pd.DataFrame: The DataFrame containing the filled time series data.
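The idea of filling gaps with a moving window average can be sketched as follows. This is an assumption-laden illustration, not the smadi code: the helper name `fill_gaps_with_rolling_mean` is hypothetical, the window is centred, and the integer window counts observations, which corresponds to days only for a regular daily series.

```python
import numpy as np
import pandas as pd


def fill_gaps_with_rolling_mean(df, variable, window_size):
    """Hypothetical helper: replace NaNs by a centred moving-window mean."""
    # min_periods=1 lets the mean be computed from whatever valid
    # neighbours exist inside the window; NaNs are skipped by mean().
    rolling_mean = (
        df[variable].rolling(window_size, center=True, min_periods=1).mean()
    )
    filled = df.copy()
    # Only the missing values are replaced; valid observations are kept.
    filled[variable] = df[variable].fillna(rolling_mean)
    return filled


index = pd.date_range("2021-06-01", periods=5, freq="D")
df = pd.DataFrame({"sm": [10.0, 12.0, np.nan, 16.0, 18.0]}, index=index)

filled = fill_gaps_with_rolling_mean(df, "sm", window_size=3)
print(filled)
```

With an odd window the gap sits exactly in the middle, so the fill value is drawn from the same number of neighbours on each side.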

Filters the DataFrame based on specified time/date conditions.

Parameters:

df: pd.DataFrame, optional: The DataFrame to be filtered. It should be indexed by a datetime index.
year: int or None, optional: The year to filter the DataFrame.
month: int or None, optional: The month to filter the DataFrame.
bimonth: int or None, optional: The bimonth to filter the DataFrame.
dekad: int or None, optional: The dekad to filter the DataFrame.
week: int or None, optional: The week to filter the DataFrame.
day: int or None, optional: The day to filter the DataFrame.
start_date: str or None, optional: The start date for filtering.
end_date: str or None, optional: The end date for filtering.

Returns:

pd.DataFrame: The filtered DataFrame.

smadi.preprocess.monthly_agg(df: DataFrame, variable: str, agg_metric='mean') → DataFrame[source]

Aggregates the time series data using a monthly time step.

Parameters:

df: pd.DataFrame: The DataFrame containing the time series data to be aggregated indexed by datetime index.
variable: str: The variable/column in the DataFrame to be aggregated.
agg_metric: str: The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.

Returns:

pd.DataFrame: The DataFrame containing the aggregated data.
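For comparison, a monthly aggregation of a datetime-indexed column can be written with pandas resampling. The sketch below is only an assumption about how such a step might look; `monthly_aggregate` is a hypothetical name and the output index (month start dates) may differ from what smadi returns.

```python
import pandas as pd


def monthly_aggregate(df, variable, agg_metric="mean"):
    """Hypothetical helper: aggregate one column per calendar month."""
    # "MS" labels each bin with the first day of the month.
    return df[variable].resample("MS").agg(agg_metric).to_frame()


# January filled with 1.0 and February with 3.0.
index = pd.date_range("2021-01-01", "2021-02-28", freq="D")
df = pd.DataFrame({"sm": [1.0] * 31 + [3.0] * 28}, index=index)

monthly = monthly_aggregate(df, "sm")
print(monthly)
```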

smadi.preprocess.smooth(df: DataFrame, variable: str, window_size: int) → DataFrame[source]

Smooths the time series data using a moving window average.

Parameters:

df: pd.DataFrame: The DataFrame containing the time series data to be smoothed indexed by datetime index.
variable: str: The variable/column in the DataFrame to be smoothed.
window_size: int: The size of the moving window [days] for smoothing. It is recommended to be an odd number.

Returns:

pd.DataFrame: The DataFrame containing the smoothed time series data.
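A moving window average for smoothing can be sketched with a centred rolling mean. The helper name `moving_average` is hypothetical, and the choices of a centred window and `min_periods=1` at the edges are assumptions rather than documented smadi behaviour.

```python
import pandas as pd


def moving_average(df, variable, window_size):
    """Hypothetical helper: centred moving-window mean of one column."""
    smoothed = df.copy()
    smoothed[variable] = (
        df[variable].rolling(window_size, center=True, min_periods=1).mean()
    )
    return smoothed


index = pd.date_range("2021-06-01", periods=5, freq="D")
df = pd.DataFrame({"sm": [1.0, 2.0, 6.0, 2.0, 1.0]}, index=index)

smoothed = moving_average(df, "sm", window_size=3)
print(smoothed)
```

An odd window size keeps the window symmetric around each time stamp, which avoids shifting peaks forward or backward in time.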

smadi.preprocess.validate_anomaly_method(methods, _Detectors)[source]: Validate the names of the anomaly detection methods.

smadi.preprocess.validate_date_params(time_step: str, year: int | List[int] = None, month: int | List[int] = None, dekad: int | List[int] = None, week: int | List[int] = None, bimonth: int | List[int] = None, day: int | List[int] = None) → Dict[str, List[int]][source]: Validate the date parameters for the anomaly detection workflow.

smadi.preprocess.weekly_agg(df: DataFrame, variable: str, agg_metric='mean') → DataFrame[source]

Aggregates the time series data using a weekly time step.

Parameters:

df: pd.DataFrame: The DataFrame containing the time series data to be aggregated indexed by datetime index.
variable: str: The variable/column in the DataFrame to be aggregated.
agg_metric: str: The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.

Returns:

pd.DataFrame: The DataFrame containing the aggregated data.

smadi.utils module

smadi.utils.create_logger(name, level=10)[source]

Create a logger with the given name and level.

parameters:

name: str: name of the logger
level: int: logging level of the logger. Default is 10 (logging.DEBUG)

returns:

logger: logging.Logger: a logger object
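A logger factory of this kind can be sketched with the standard `logging` module. The function name `make_logger`, the handler, and the message format below are assumptions for illustration; only the `name` and `level` parameters and the default of 10 (`logging.DEBUG`) come from the signature above.

```python
import logging


def make_logger(name, level=logging.DEBUG):
    """Hypothetical stand-in for a logger factory."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = make_logger("smadi_example")
logger.debug("logger is ready")
```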

smadi.utils.get_country_code(country_name)[source]

Get the ISO 3166-1 alpha-3 country code for a given country name.

parameters:

country_name: str: name of the country

returns:

country_code: str: ISO 3166-1 alpha-3 country code

smadi.utils.get_gpis_from_bbox(bbox, res=6.25)[source]

Get the grid point indices (GPIs) that fall within the bounding box.

parameters:

bbox: tuple: bounding box in the format (lonmin, lonmax, latmin, latmax)
res: float: resolution of the grid. Default is 6.25 km

returns:

pd.DataFrame: a dataframe containing the GPIS, longitude, and latitude
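The bounding box order (lonmin, lonmax, latmin, latmax) differs from the (minx, miny, maxx, maxy) order used by many GIS tools, so it is worth spelling out. The sketch below filters a small table of points with that ordering. It does not touch any real grid; the helper `points_in_bbox`, the column names, and the coordinates are invented for the example.

```python
import pandas as pd


def points_in_bbox(points, bbox):
    """Hypothetical helper: keep rows whose lon/lat fall inside bbox."""
    lonmin, lonmax, latmin, latmax = bbox
    mask = points["lon"].between(lonmin, lonmax) & points["lat"].between(
        latmin, latmax
    )
    return points.loc[mask].reset_index(drop=True)


# Made-up grid points; identifiers and coordinates are illustrative only.
points = pd.DataFrame(
    {
        "point": [1, 2, 3],
        "lon": [10.0, 16.5, 20.0],
        "lat": [47.0, 48.2, 60.0],
    }
)

inside = points_in_bbox(points, bbox=(9.0, 17.0, 46.0, 49.0))
print(inside)
```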

smadi.utils.load_gpis_by_country(country, res=6.25, format='csv')[source]

Load the GPIs for a country by name from the DGG API. Source: https://dgg.geo.tuwien.ac.at/

parameters:

country: str: name of the country
grid: str: name of the grid to be used. Default is “fibgrid_n6600000”. Supported grids are:
- fibgrid_n6600000 (Fibonacci 6.25 km)
- fibgrid_n1650000 (Fibonacci 12.5 km)
- fibgrid_n430000 (Fibonacci 25 km)
- warp (WARP)
format: str: format of the data to be returned. Default is “csv”. Supported formats are:
- csv
- json

smadi.utils.log_exception(logger)[source]

A decorator to log exceptions raised in a function.

parameters:

logger: logging.Logger: a logger object

returns:

decorator: function: a decorator function
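A decorator factory that takes a logger and logs exceptions can be sketched as below. This is a generic pattern, not the smadi source: the name `log_exception_sketch` is hypothetical, and re-raising the exception after logging is an assumption.

```python
import functools
import logging


def log_exception_sketch(logger):
    """Hypothetical decorator factory: log any exception, then re-raise."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # logger.exception records the message with the traceback.
                logger.exception("Exception in %s", func.__name__)
                raise

        return wrapper

    return decorator


example_logger = logging.getLogger("exception_example")


@log_exception_sketch(example_logger)
def safe_divide(a, b):
    return a / b


print(safe_divide(6, 3))
```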

smadi.utils.log_time(logger)[source]

A decorator to log the time taken by a function.

parameters:

logger: logging.Logger: a logger object

returns:

decorator: function: a decorator function
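A timing decorator built the same way might look like the following sketch. The name `log_time_sketch`, the log level, and the message format are assumptions; the smadi implementation may differ.

```python
import functools
import logging
import time


def log_time_sketch(logger):
    """Hypothetical decorator factory: log how long each call takes."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs whether the call returned or raised.
                elapsed = time.perf_counter() - start
                logger.info("%s took %.3f s", func.__name__, elapsed)

        return wrapper

    return decorator


timing_logger = logging.getLogger("timing_example")


@log_time_sketch(timing_logger)
def slow_sum(n):
    return sum(range(n))


total = slow_sum(1000)
print(total)
```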

smadi.workflow module

run_workflow.py - SMADI Workflow Execution

smadi.workflow.load_ts(gpi, variable='sm')[source]: Load the ASCAT time series for a given grid point index (gpi).

smadi.workflow.main()[source]

smadi.workflow.parse_arguments(parser)[source]

Parse the arguments and return the parsed arguments as a dictionary.

returns:

parsed_args: dict: The parsed arguments as a dictionary
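Returning parsed arguments as a dictionary is typically done by passing the `argparse.Namespace` to `vars()`. The sketch below shows that pattern with two illustrative options. The option names are borrowed from the workflow parameters for readability, but the actual command-line flags defined by smadi are not documented here, so treat them as hypothetical.

```python
import argparse


def parse_to_dict(parser, argv=None):
    """Hypothetical helper: parse arguments and return them as a dict."""
    # vars() turns the Namespace into a plain {name: value} dictionary.
    return vars(parser.parse_args(argv))


# Illustrative parser; the flags are assumptions, not the smadi CLI.
parser = argparse.ArgumentParser(description="example parser")
parser.add_argument("--time_step", default="month")
parser.add_argument("--year", type=int, nargs="*", default=None)

parsed_args = parse_to_dict(
    parser, ["--time_step", "dekad", "--year", "2021", "2022"]
)
print(parsed_args)
```

A dictionary is convenient here because it can be unpacked directly into a function call with `**parsed_args`.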

smadi.workflow.run_smadi(aoi: str | Tuple[float, float, float, float], methods: str | List[str] = ['zscore'], variable: str = 'sm', time_step: str = 'month', fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, year: List[int] = None, month: List[int] = None, dekad: List[int] = None, week: List[int] = None, bimonth: List[int] = None, day: List[int] = None, workers: int = None, addi_retrive: bool = False) → DataFrame[source]: Run the anomaly detection workflow for multiple grid point indices with multiprocessing support.

smadi.workflow.setup_argument_parser() → ArgumentParser[source]: Setup argument parser for SMADI workflow execution.

smadi.workflow.single_po_run(gpi: int, methods: str = ['zscore'], variable: str = 'sm', time_step: str = 'month', fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, year: int | List[int] = None, month: int | List[int] = None, dekad: int | List[int] = None, week: int | List[int] = None, bimonth: int | List[int] = None, day: int | List[int] = None, timespan: List[str] = None, addi_retrive: bool = False, agg_metric: list = 'mean') → Tuple[int, Dict[str, float]][source]: Run the anomaly detection workflow for a single grid point index.

Module contents