smadi package

Submodules

smadi.anomaly_detectors module

Soil Moisture Anomalies Calculation Module

This module provides a suite of methods for calculating soil moisture anomalies based on climatological data. The implemented methods include:

  1. Z-Score: Standardized z-score method.

  2. SMAPI (Soil Moisture Anomaly Percent Index): Measures anomalies as a percentage deviation from the climatological mean or median.

  3. SMDI (Soil Moisture Deficit Index): Quantifies soil moisture deficit based on deviations from the climatological median and extremes.

  4. ESSMI (Empirical Standardized Soil Moisture Index): Uses a nonparametric empirical probability density function for standardizing soil moisture values.

  5. SMAD (Standardized Anomaly Absolute Deviation): Calculates anomalies using the median and interquartile range, providing robustness to outliers.

  6. SMDS (Soil Moisture Drought Severity): Assesses drought severity based on percentile rankings of soil moisture values.

  7. SMCI (Soil Moisture Condition Index): Measures soil moisture content relative to climatological minima and maxima.

  8. SMCA (Soil Moisture Content Anomaly): Quantifies anomalies as the deviation from the climatological mean or median, scaled by the distance between that reference and the climatological maximum.

  9. ParaDis (Parametric Distribution): Fits observed data to parametric distributions (e.g., beta, gamma).

class smadi.anomaly_detectors.AnomalyDetector(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: ABC

An abstract class for detecting anomalies in time series data based on the climatology.

apply_transformation(func, **kwargs) DataFrame[source]
property clim_df

The DataFrame containing the climate normal data.

abstract detect_anomaly(**kwargs) DataFrame[source]
property groupby_param

The column name to be used for grouping the data for the anomaly detection.

class smadi.anomaly_detectors.AnomalyDetectorFactory[source]

Bases: object

A factory class for creating anomaly detectors based on the provided method.

static create_detector(method: str, **kwargs) AnomalyDetector[source]
methods = {'essmi': <class 'smadi.anomaly_detectors.ESSMI'>, 'paradis': <class 'smadi.anomaly_detectors.ParaDis'>, 'smad': <class 'smadi.anomaly_detectors.SMAD'>, 'smapi': <class 'smadi.anomaly_detectors.SMAPI'>, 'smca': <class 'smadi.anomaly_detectors.SMCA'>, 'smci': <class 'smadi.anomaly_detectors.SMCI'>, 'smdi': <class 'smadi.anomaly_detectors.SMDI'>, 'smds': <class 'smadi.anomaly_detectors.SMDS'>, 'zscore': <class 'smadi.anomaly_detectors.ZScore'>}
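The factory resolves a lowercase method name to a detector class through the `methods` dictionary and forwards the keyword arguments to that class. The standalone sketch below illustrates this registry-lookup pattern; `Detector`, `ZScoreDetector`, `SMAPIDetector`, and `DetectorFactory` are hypothetical stand-ins, not the smadi classes, and the error raised for an unknown name is an assumption.

```python
from typing import Dict, Type


class Detector:
    """Hypothetical stand-in for AnomalyDetector; it just stores its kwargs."""

    def __init__(self, **kwargs):
        self.params = kwargs


class ZScoreDetector(Detector):
    pass


class SMAPIDetector(Detector):
    pass


class DetectorFactory:
    # Registry: method name -> detector class (same shape as `methods` above).
    methods: Dict[str, Type[Detector]] = {
        "zscore": ZScoreDetector,
        "smapi": SMAPIDetector,
    }

    @staticmethod
    def create_detector(method: str, **kwargs) -> Detector:
        # Look up the class by name and pass every keyword argument through.
        cls = DetectorFactory.methods.get(method)
        if cls is None:
            raise ValueError(
                f"Unknown method {method!r}; choose from {sorted(DetectorFactory.methods)}"
            )
        return cls(**kwargs)
```

Keeping the mapping in a class-level dictionary means a new method only needs a new entry, with no change to `create_detector`.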
class smadi.anomaly_detectors.ESSMI(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Empirical Standardized Soil Moisture Index (ESSMI) method.

The index is computed by fitting a nonparametric empirical probability density function (ePDF) with a kernel density estimator (KDE):

f^h(x) = 1/(n * h) * Σ_{i=1}^{n} K((x - xi) / h)

K(u) = 1/√(2π) * exp(-u^2 / 2)

where:

  • f^h: the ePDF

  • K: the Gaussian kernel function

  • h: the bandwidth of the kernel function (the smoothing parameter), chosen with Scott’s rule

  • n: the number of observations

  • x: the average value of the variable in the time series data (e.g. a daily, weekly, or monthly average)

  • xi: the ith observation

The ESSMI is then obtained by mapping the cumulative probability of each value under the fitted KDE onto the standard normal distribution (mean zero, standard deviation one) through the inverse of the standard normal distribution function:

ESSMI = Φ^-1(F^h(x))

where:

  • Φ^-1: the inverse of the standard normal distribution function

  • F^h: the empirical cumulative distribution function obtained by integrating the ePDF f^h

detect_anomaly(**kwargs) DataFrame[source]
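The ESSMI steps can be sketched in plain Python with only the standard library. This is an illustrative sketch, not the smadi implementation: it assumes Scott’s rule in its one-dimensional form h = s * n^(-1/5) with the sample standard deviation s, and it clips the CDF away from 0 and 1 so that the inverse normal stays finite (the clipping constant is an assumption).

```python
from statistics import NormalDist, stdev
from typing import List, Sequence


def essmi(obs: Sequence[float]) -> List[float]:
    """Standalone ESSMI sketch: Gaussian KDE -> smoothed CDF -> inverse normal."""
    data = [float(v) for v in obs]
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Scott's rule for a 1-D Gaussian kernel: h = sample std * n^(-1/5).
    h = stdev(data) * n ** (-1 / 5)
    if h == 0:
        raise ValueError("all observations are identical")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    eps = 1e-12
    result = []
    for x in data:
        # Integrating the Gaussian-kernel ePDF gives F^h(x) = mean of Phi((x - xi) / h).
        cdf = sum(std_normal.cdf((x - xi) / h) for xi in data) / n
        # Clip into (0, 1) so the inverse normal never returns +/- infinity.
        cdf = min(max(cdf, eps), 1 - eps)
        result.append(std_normal.inv_cdf(cdf))
    return result
```

Because each Gaussian kernel integrates to a normal CDF, F^h needs no numerical integration. `scipy.stats.gaussian_kde` applies the same Scott factor by default and is the usual choice for larger samples.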
class smadi.anomaly_detectors.ParaDis(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on fitting the observed data to a parametric distribution (e.g. beta or gamma).

detect_anomaly(**kwargs) DataFrame[source]
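As a rough illustration of the parametric approach, the sketch below fits a gamma distribution and converts each value's cumulative probability to a standard normal deviate. Several details are assumptions rather than smadi behaviour: the fit uses the method of moments (smadi may use maximum likelihood through SciPy), only the gamma case is shown, and the final mapping to z-scores mirrors the ESSMI standardization. The gamma CDF uses the power series of the regularized lower incomplete gamma function, which is adequate for moderate arguments; `scipy.stats.gamma` is the robust option in practice.

```python
import math
from statistics import NormalDist, fmean, pvariance
from typing import List, Sequence


def gamma_cdf(x: float, shape: float, scale: float) -> float:
    """Gamma CDF: regularized lower incomplete gamma P(shape, x / scale) by power series."""
    if x <= 0:
        return 0.0
    t = x / scale
    # P(a, t) = t^a * e^-t / Gamma(a) * sum_n t^n / (a (a+1) ... (a+n)).
    term = 1.0 / shape
    total = term
    n = 1
    while term > total * 1e-15 and n < 10_000:
        term *= t / (shape + n)
        total += term
        n += 1
    return min(1.0, total * math.exp(shape * math.log(t) - t - math.lgamma(shape)))


def para_dis_gamma(obs: Sequence[float]) -> List[float]:
    """Fit a gamma distribution by the method of moments and map values to z-scores."""
    data = [float(v) for v in obs]
    mean, var = fmean(data), pvariance(data)
    if mean <= 0 or var == 0:
        raise ValueError("gamma fit needs a positive mean and non-zero variance")
    # Method of moments: mean = shape * scale, variance = shape * scale^2.
    shape, scale = mean * mean / var, var / mean
    std_normal = NormalDist()
    eps = 1e-12
    return [
        std_normal.inv_cdf(min(max(gamma_cdf(x, shape, scale), eps), 1 - eps))
        for x in data
    ]
```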
class smadi.anomaly_detectors.SMAD(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Standardized Anomaly Absolute Deviation (SMAD) method.

SMAD = (x - η) / IQR

where:

  • x: the average value of the variable in the time series data (e.g. a daily, weekly, or monthly average)

  • η: the long-term median of the variable (the climate normal)

  • IQR: the interquartile range of the variable, i.e. the difference between its 75th and 25th percentiles

detect_anomaly(**kwargs) DataFrame[source]
class smadi.anomaly_detectors.SMAPI(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Soil Moisture Anomaly Percent Index (SMAPI) method.

SMAPI = ((x - ref) / ref) * 100

where:

  • x: the average value of the variable in the time series data (e.g. a daily, weekly, or monthly average)

  • ref: the long-term mean (μ) or median (η) of the variable (the climate normal)

detect_anomaly(**kwargs) DataFrame[source]
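The SMAPI formula is a percentage deviation from the reference. The sketch below is a plain-Python illustration, not smadi's implementation. The parameter names mirror `smadi.indicators.smapi`, but the fallback of computing `ref` from the observations when it is omitted is an assumption.

```python
from statistics import fmean, median
from typing import List, Optional, Sequence


def smapi(obs: Sequence[float], ref: Optional[float] = None,
          metric: str = "mean") -> List[float]:
    """SMAPI = (x - ref) / ref * 100, with ref the climatological mean or median."""
    data = [float(v) for v in obs]
    if ref is None:
        # Assumed fallback: derive the reference from the observations.
        if metric == "mean":
            ref = fmean(data)
        elif metric == "median":
            ref = median(data)
        else:
            raise ValueError("metric must be 'mean' or 'median'")
    return [(x - ref) / ref * 100 for x in data]
```

A value of -50 means the observation is half the climate normal.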
class smadi.anomaly_detectors.SMCA(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Soil Moisture Content Anomaly (SMCA) method.

SMCA = (x - ref) / (max - ref)

where:

  • x: the average value of the variable in the time series data (e.g. a daily, weekly, or monthly average)

  • ref: the long-term mean (μ) or median (η) of the variable (the climate normal)

  • max: the long-term maximum of the variable

  • min: the long-term minimum of the variable

detect_anomaly(**kwargs) DataFrame[source]
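A plain-Python sketch of the SMCA formula, assuming that the reference and maximum default to statistics of the observations. It mirrors the `metric`, `ref`, and `maximum` parameters of `smadi.indicators.smca` but omits `minimum`, because the formula above does not use it. This is an illustration, not the library code.

```python
from statistics import fmean, median
from typing import List, Optional, Sequence


def smca(obs: Sequence[float], metric: str = "mean", ref: Optional[float] = None,
         maximum: Optional[float] = None) -> List[float]:
    """SMCA = (x - ref) / (max - ref), with ref the climatological mean or median."""
    data = [float(v) for v in obs]
    if ref is None:
        if metric not in ("mean", "median"):
            raise ValueError("metric must be 'mean' or 'median'")
        ref = fmean(data) if metric == "mean" else median(data)
    top = max(data) if maximum is None else maximum
    if top == ref:
        raise ValueError("maximum equals the reference; SMCA is undefined")
    return [(x - ref) / (top - ref) for x in data]
```

A value of 1 marks the climatological maximum and 0 marks the normal. Values below the normal become negative.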
class smadi.anomaly_detectors.SMCI(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Soil Moisture Condition Index (SMCI) method.

SMCI = (x - min) / (max - min)

where:

  • x: the average value of the variable in the time series data (e.g. a daily, weekly, or monthly average)

  • min: the long-term minimum of the variable

  • max: the long-term maximum of the variable

detect_anomaly(**kwargs) DataFrame[source]
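SMCI is a min-max rescaling of each value into the climatological range. The sketch below is in plain Python and is not the smadi implementation. It assumes the extremes default to the minimum and maximum of the observations, as the `smadi.indicators.smci` parameter descriptions state.

```python
from typing import List, Optional, Sequence


def smci(obs: Sequence[float], minimum: Optional[float] = None,
         maximum: Optional[float] = None) -> List[float]:
    """SMCI = (x - min) / (max - min): 0 at the long-term minimum, 1 at the maximum."""
    data = [float(v) for v in obs]
    lo = min(data) if minimum is None else minimum
    hi = max(data) if maximum is None else maximum
    if hi == lo:
        raise ValueError("maximum equals minimum; SMCI is undefined")
    return [(x - lo) / (hi - lo) for x in data]
```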
class smadi.anomaly_detectors.SMDI(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Soil Moisture Deficit Index (SMDI) method.

SMDI(t) = 0.5 * SMDI(t-1) + SD(t) / 50

where the soil moisture deficit SD(t) is

SD(t) = ((x - η) / (η - min)) * 100   if x <= η

SD(t) = ((x - η) / (max - η)) * 100   if x > η

and:

  • x: the average value of the variable in the time series data (e.g. a daily, weekly, or monthly average)

  • η: the long-term median of the variable (the climate normal)

  • min: the long-term minimum of the variable

  • max: the long-term maximum of the variable

  • t: the time step of the time series data

detect_anomaly(**kwargs) DataFrame[source]
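The two-stage SMDI computation (first SD, then the recursive index) can be sketched as follows. The function names mirror `smadi.indicators.smd` and `smadi.indicators.smdi`, but this is standalone Python, not the library code. Two points are assumptions: the recursion starts from SMDI(0) = 0, and a zero denominator yields SD = 0 (with observations inside [min, max], that case only occurs when x equals the median).

```python
from statistics import median as _median
from typing import List, Optional, Sequence


def smd(obs: Sequence[float], median: Optional[float] = None,
        minimum: Optional[float] = None, maximum: Optional[float] = None) -> List[float]:
    """Soil moisture deficit SD in percent, scaled separately below and above the median."""
    data = [float(v) for v in obs]
    med = _median(data) if median is None else median
    lo = min(data) if minimum is None else minimum
    hi = max(data) if maximum is None else maximum
    sd = []
    for x in data:
        denom = (med - lo) if x <= med else (hi - med)
        # Assumed: a zero denominator gives SD = 0 instead of raising.
        sd.append((x - med) / denom * 100 if denom else 0.0)
    return sd


def smdi(sd: Sequence[float]) -> List[float]:
    """SMDI(t) = 0.5 * SMDI(t-1) + SD(t) / 50, starting from SMDI(0) = 0 (assumed)."""
    out = []
    prev = 0.0
    for value in sd:
        prev = 0.5 * prev + value / 50
        out.append(prev)
    return out
```

The factor 0.5 gives the index memory: a dry spell keeps the index negative for a few steps after SD recovers.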
class smadi.anomaly_detectors.SMDS(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Soil Moisture Drought Severity (SMDS) method.

SMDS = 1 - SMP

SMP = rank(x) / (n + 1)

where:

  • SMP: the Soil Moisture Percentile, i.e. the percentile of the average value of the variable in the time series data

  • SMDS: the Soil Moisture Drought Severity, i.e. the drought severity derived from that percentile

  • rank(x): the rank of the average value of the variable in the time series data

  • n: the number of years in the time series data

detect_anomaly(**kwargs) DataFrame[source]
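The sketch below computes SMDS from ascending ranks in plain Python. It is an illustration, not smadi's code. Tied values receive their average rank, which is an assumption that matches the default of `pandas.Series.rank`. Here n is simply the number of values passed in. In the climatological setting described above, that corresponds to the number of years when one value per year is supplied for a given period.

```python
from typing import List, Sequence


def smds(obs: Sequence[float]) -> List[float]:
    """SMDS = 1 - rank(x) / (n + 1), using average ranks for ties."""
    data = [float(v) for v in obs]
    n = len(data)
    order = sorted(range(n), key=lambda i: data[i])  # indices in ascending value order
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend `end` over a run of tied values.
        while end + 1 < n and data[order[end + 1]] == data[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # mean of the 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return [1 - r / (n + 1) for r in ranks]
```

The driest value receives the highest severity, and the (n + 1) denominator keeps SMDS strictly between 0 and 1.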
class smadi.anomaly_detectors.ZScore(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]

Bases: AnomalyDetector

A class for detecting anomalies in time series data based on the Z-Score method.

z_score = (x - μ) / σ

where:

  • x: the average value of the variable in the time series data (e.g. a daily, weekly, or monthly average)

  • μ: the long-term mean of the variable (the climate normal)

  • σ: the long-term standard deviation of the variable

detect_anomaly(**kwargs) DataFrame[source]
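The z-score formula translates directly into plain Python. This sketch is not the smadi implementation. It assumes the sample standard deviation (n - 1 denominator, the pandas default) when σ is not supplied. Whether smadi uses the sample or the population form is not stated here.

```python
from statistics import fmean, stdev
from typing import List, Optional, Sequence


def zscore(obs: Sequence[float], mean: Optional[float] = None,
           std: Optional[float] = None) -> List[float]:
    """z = (x - mu) / sigma, estimating mu and sigma from obs when they are not given."""
    data = [float(v) for v in obs]
    mu = fmean(data) if mean is None else mean
    # Sample standard deviation (n - 1 denominator), as pandas' Series.std uses by default.
    sigma = stdev(data) if std is None else std
    return [(x - mu) / sigma for x in data]
```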

smadi.climatology module

A module for calculating climatology (climate normal) for different time steps (month, dekad, week) based on time series data.

class smadi.climatology.Aggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]

Bases: ABC

An abstract class for aggregating time series data based on different time steps.

Attributes:

df: pd.DataFrame

The input DataFrame containing the time series data to be aggregated.

variable: str

The variable/column in the DataFrame to be aggregated.

fillna: bool

Fill NaN values in the time series data using a moving window average.

fillna_window_size: int

The size of the moving window for filling NaN values. It is recommended to be an odd number.

smoothing: bool

Smooth the time series data using a moving window average.

smooth_window_size: int

The size of the moving window for smoothing (in days). It is recommended to be an odd number.

timespan: list[str, str], optional

The start and end dates for a timespan to be aggregated. Format: [‘YYYY-MM-DD’, ‘YYYY-MM-DD’]

agg_metric: str

The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, ‘std’, etc.

Methods:

aggregate:

Aggregates the time series data based on the provided time step.

abstract aggregate(**kwargs)[source]
property preprocess_df
class smadi.climatology.BimonthlyAggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]

Bases: Aggregator

Aggregates the time series data on a bimonthly (twice-a-month) time step.

aggregate(**kwargs)[source]
class smadi.climatology.Climatology(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean')[source]

Bases: object

A class for calculating the climatology (climate normal) of time series data.

Attributes:

df_original: pd.DataFrame

The original input DataFrame before resampling and removing NaN values.

df: pd.DataFrame

The input DataFrame containing the preprocessed data to be aggregated.

variable: str

The variable/column in the DataFrame to be aggregated.

fillna: bool

Fill NaN values in the time series data using a moving window average.

fillna_window_size: int

The size of the moving window for filling NaN values. It is recommended to be an odd number.

smoothing: bool

Smooth the time series data using a moving window average.

smooth_window_size: int

The size of the moving window for smoothing (in days). It is recommended to be an odd number.

timespan: list[str, str], optional

The start and end dates for a timespan to be aggregated. Format: [‘YYYY-MM-DD’, ‘YYYY-MM-DD’]

time_step: str

The time step for aggregation. Supported values: ‘day’, ‘week’, ‘dekad’, ‘bimonth’, ‘month’.

agg_metric: str

The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, ‘std’, etc.

normal_metrics: List[str]

The metrics to be used in the climatology computation. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.

clima_df: pd.DataFrame

The DataFrame containing climatology information.

Methods:

compute_normals:

Calculates climatology based on the aggregated data.

plot_ts:

Plot the time series data for the provided dataframe.

property aggregated_df
compute_normals(**kwargs) DataFrame[source]

Calculates climatology based on the aggregated data.

Parameters:

kwargs:

Additional time/date filtering parameters.

Returns:

pd.DataFrame

The DataFrame containing climatology information.
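Computing normals from aggregated data is a two-step process: aggregate values within each period of each year, then summarize each calendar period across years. The plain-Python sketch below shows this for a monthly time step with the mean as both the aggregation and the normal metric. The name `monthly_normals` and its input format (a list of date/value pairs) are hypothetical and not part of smadi.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean
from typing import Dict, Iterable, Tuple


def monthly_normals(records: Iterable[Tuple[date, float]]) -> Dict[int, float]:
    """Two-step climatology: aggregate per (year, month), then average per month."""
    # Step 1: aggregate the raw values to one mean per (year, month).
    per_year_month = defaultdict(list)
    for day, value in records:
        per_year_month[(day.year, day.month)].append(value)
    monthly = {key: fmean(vals) for key, vals in per_year_month.items()}
    # Step 2: the normal of calendar month m is the mean of its yearly aggregates.
    per_month = defaultdict(list)
    for (_, month), value in monthly.items():
        per_month[month].append(value)
    return {month: fmean(vals) for month, vals in sorted(per_month.items())}
```

Aggregating first gives every year equal weight in the normal, even if some years have more valid days than others.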

class smadi.climatology.DailyAggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]

Bases: Aggregator

Aggregates the time series data on a daily time step.

aggregate(**kwargs)[source]
class smadi.climatology.DekadalAggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]

Bases: Aggregator

Aggregates the time series data on a dekadal (10-day) time step.

aggregate(**kwargs)[source]
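A dekad splits each month into three parts: days 1 to 10, days 11 to 20, and day 21 to the end of the month. The sketch below shows one way to label dates with dekads. Both helper names are hypothetical, and whether smadi numbers dekads within the month (1 to 3) or within the year (1 to 36) is an assumption, so both are shown.

```python
from datetime import date


def dekad_of_month(day: date) -> int:
    """Dekad within the month: 1 = days 1-10, 2 = days 11-20, 3 = day 21 to month end."""
    # Day 31 would give 4 without the cap, so clamp it into the third dekad.
    return min((day.day - 1) // 10 + 1, 3)


def dekad_of_year(day: date) -> int:
    """Dekad number within the year, from 1 to 36."""
    return (day.month - 1) * 3 + dekad_of_month(day)
```

The third dekad is 8 to 11 days long depending on the month, so dekadal aggregates are not based on equal sample sizes.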
class smadi.climatology.MonthlyAggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]

Bases: Aggregator

Aggregates the time series data on a monthly time step.

aggregate(**kwargs)[source]
class smadi.climatology.Preprocessor(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None)[source]

Bases: object

A class for preprocessing the time series data before aggregation.

Attributes:

df: pd.DataFrame

The input DataFrame containing the time series data to be aggregated.

variable: str

The variable/column in the DataFrame to be aggregated.

fillna: bool

Fill NaN values in the time series data using a moving window average.

fillna_window_size: int

The size of the moving window for filling NaN values. It is recommended to be an odd number.

smoothing: bool

Smooth the time series data using a moving window average.

smooth_window_size: int

The size of the moving window for smoothing (in days). It is recommended to be an odd number.

timespan: list[str, str], optional

The start and end dates for a timespan to be aggregated. Format: [‘YYYY-MM-DD’, ‘YYYY-MM-DD’]

Methods:

preprocess:

Preprocess the time series data by resampling, truncating, filling NaN values, and smoothing.

preprocess()[source]
class smadi.climatology.Validator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, time_step: str = None, metrics: List[str] = None, time_span: List[str] = None)[source]

Bases: object

A class for validating the input parameters for the climatology computation.

Methods:

validate:

Validates the input parameters for the climatology computation.

validate()[source]
class smadi.climatology.WeeklyAggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]

Bases: Aggregator

Aggregates the time series data on a weekly time step.

aggregate(**kwargs)[source]

smadi.data_reader module

class smadi.data_reader.AscatData(path, read_bulk=True)[source]

Bases: GriddedNcContiguousRaggedTs

Class for reading ASCAT surface soil moisture (SSM) 6.25 km data.

class smadi.data_reader.Era5Land(path, read_bulk=True, celsius=True)[source]

Bases: GriddedNcTs

Read time series data from ERA5 netCDF files.

smadi.data_reader.extract_obs_ts(loc, ascat_path, era5_land_path=None, obs_type='sm', read_bulk=False)[source]

Read time series of given observation type.

Parameters:
  • loc (int, tuple) – Tuple is interpreted as longitude, latitude coordinate. Integer is interpreted as grid point index.

  • ascat_path (str) – Path to ASCAT soil moisture data.

  • era5_land_path (str) – Path to ERA5-Land data.

  • obs_type (str, optional) – Observation type (default: “sm”).

  • read_bulk (bool, optional) – If True, all data are read into memory; if False, only a single time series is read (default: False). Use True to process multiple grid point indices (GPIs) in a loop and False to read or analyze a single time series.

smadi.data_reader.read_era5(era5_path, loc, interpo_method='nearest')[source]

Read ERA5 data for given location.

Parameters:
  • era5_path (str) – Path to ERA5 dataset.

  • loc (tuple) – Tuple is interpreted as longitude, latitude coordinate.

  • interpo_method (str, optional) – Interpolation method (default: “nearest”).

Returns:

ts – Time series of ERA5 data containing all variables.

Return type:

pd.DataFrame

smadi.data_reader.read_grid_point(loc, ascat_sm_path, era5_land_path=None, read_bulk=False)[source]

Read the data of a grid point given either lon/lat coordinates or a grid point index.

Parameters:
  • loc (int, tuple) – Tuple is interpreted as longitude, latitude coordinate. Integer is interpreted as grid point index.

  • ascat_sm_path (str) – Path to ASCAT soil moisture data.

  • era5_land_path (str) – Path to ERA5-Land data.

  • read_bulk (bool, optional) – If True, all data are read into memory; if False, only a single time series is read (default: False). Use True to process multiple grid point indices (GPIs) in a loop and False to read or analyze a single time series.

smadi.indicators module

smadi.indicators.essmi(obs)[source]

Compute the anomalies in time series data based on the Empirical Standardized Soil Moisture Index (ESSMI) method.

parameters:

obs: sequence-like object

The observed time series data.

returns:

numpy.ndarray

The Empirical Standardized Soil Moisture Index computed based on the given observed value(s).

smadi.indicators.para_dis(obs, dist='beta')[source]

Compute the anomalies in time series data based on fitting the observed data to a parametric distribution.

parameters:

obs: pd.Series or np.ndarray or sequence-like object

The observed time series data.

dist: str, optional

The distribution to fit the observed data to. Supported values: ‘beta’, ‘gamma’, ‘gam’ (Gamma), ‘exp’ (Exponential), ‘pe3’ (Pearson III).

smadi.indicators.smad(obs, median=None, iqr=None)[source]

Computes the anomalies in time series data based on the Standardized Median Absolute Deviation (SMAD) method.

parameters:

obs: pd.Series or np.ndarray or sequence-like object

The observed time series data.

median: float, pd.Series or np.ndarray or sequence-like object, optional

The long-term median of the variable. If None, it will be computed from obs.

iqr: float, pd.Series or np.ndarray or sequence-like object, optional

The long-term interquartile range of the variable. If None, it will be computed from obs.

Returns:

numpy.ndarray

The anomalies computed based on the given observed value(s) and the long-term median.

smadi.indicators.smapi(obs, ref=None, metric='mean')[source]

Computes anomalies in time series data based on the Soil Moisture Anomaly Percent Index (SMAPI) method.

parameters:

obs: pd.Series or np.ndarray or sequence-like object

The observed time series data.

ref: float, pd.Series or np.ndarray or sequence-like object, optional

The long-term mean (μ) or median (η) of the variable (the climate normal).

metric: str, optional

The metric to be used for computing the anomalies. Supported values: ‘mean’, ‘median’

smadi.indicators.smca(obs, metric='mean', ref=None, minimum=None, maximum=None)[source]

Computes the anomalies in time series data based on the Soil Moisture Content Anomaly (SMCA) method.

parameters:

obs: pd.Series or np.ndarray or sequence-like object

The observed time series data.

metric: str, optional

The metric to be used for computing the anomalies. Supported values: ‘mean’, ‘median’

ref: float, pd.Series or np.ndarray or sequence-like object, optional

The long-term mean (μ) or median (η) of the variable (the climate normal).

minimum: float, pd.Series or np.ndarray or sequence-like object, optional

The long-term minimum of the variable. If None, it will be computed from obs.

maximum: float, pd.Series or np.ndarray or sequence-like object, optional

The long-term maximum of the variable. If None, it will be computed from obs.

Returns:

numpy.ndarray

The anomalies computed from the given observed value(s), the climate normal (ref), and the long-term maximum.

smadi.indicators.smci(obs, minimum=None, maximum=None)[source]

Computes the anomalies in time series data based on the Soil Moisture Condition Index (SMCI) method.

parameters:

obs: pd.Series or np.ndarray or sequence-like object

The observed time series data.

minimum: float, pd.Series or np.ndarray or sequence-like object, optional

The long-term minimum of the variable. If None, it will be computed from obs.

maximum: float, pd.Series or np.ndarray or sequence-like object, optional

The long-term maximum of the variable. If None, it will be computed from obs.

Returns:

numpy.ndarray

The Soil Moisture Condition Index computed from the given observed value(s).

smadi.indicators.smd(obs, median=None, minimum=None, maximum=None)[source]

Computes the Soil Moisture Deficit (SD) based on observed value and long-term median, minimum, and maximum values.

parameters:

obs: pd.Series or np.ndarray or sequence-like object

The observed time series data.

median: float, pd.Series or np.ndarray or sequence-like object, optional

The long-term median of the variable. If None, it will be computed from obs.

minimum: float, pd.Series or np.ndarray or sequence-like object, optional

The long-term minimum of the variable. If None, it will be computed from obs.

maximum: float, pd.Series or np.ndarray or sequence-like object, optional

The long-term maximum of the variable. If None, it will be computed from obs.

Returns:

numpy.ndarray

The Soil Moisture Deficit (SD) computed from the given observed value(s).

smadi.indicators.smdi(sd)[source]

Computes the Soil Moisture Deficit Index (SMDI) incrementally based on the Soil Moisture Deficit (SD) values.

smadi.indicators.smds(obs)[source]

Computes anomalies in time series data based on the Soil Moisture Drought Severity (SMDS) method.

SMDS = 1 - SMP

SMP = rank(x) / (n + 1)

parameters:

obs: pd.Series or np.ndarray or sequence-like object

The observed time series data.

Returns:

numpy.ndarray

The Soil Moisture Drought Severity computed based on the given observed value(s).

smadi.indicators.zscore(obs, mean=None, std=None)[source]

Computes the standardized z-score of the time series data.

Parameters:

obs: pd.Series or np.ndarray or sequence-like object

The observed time series data.

mean: float, pd.Series or np.ndarray or sequence-like object, optional

The mean of the distribution of the time series data. If None, it will be computed from obs.

std: float, pd.Series or np.ndarray or sequence-like object, optional

The standard deviation of the distribution of the time series data. If None, it will be computed from obs.

Returns:

pd.Series or np.ndarray

The z-score of the time series data.

smadi.map module

smadi.map.plot_anomaly_maps(figsize=(25, 20), ax_rows=1, ax_cols=1, df=None, x='lon', y='lat', df_colms=None, map_crs=<Projected CRS: +proj=robin +a=6378137.0 +lon_0=0 +no_defs +type=c ...>, figure_title='', figure_title_kwargs={'fontsize': 15, 'fontweight': 'bold', 'ha': 'center', 'va': 'center', 'x': 0.5, 'y': 0.95}, maps_titles=None, maps_titles_kwargs={'fontsize': 10, 'fontweight': 'bold', 'pad': 11}, add_features=True, frame_line_width=1, add_cb=True, cb_min_max=['sm_clim', 'sm_clim', 'abs', 'anomaly', 'anomaly'], cmap='RdYlBu', vmin=None, vmax=None, add_gridlines=False, cb_kwargs={'labelsize': 0.5, 'pos': 0.4, 'show_values': False, 'tick_lines': 'center'}, cb_label=None, save_to=None)[source]
smadi.map.set_bins(colm)[source]

Set the bins and labels for color classification for the selected column.

parameters:

colm: str

The data column name for which the bins and labels are to be set.

smadi.map.set_extent(df, x='lon', y='lat', buffer=2)[source]

Set the extent for the map based on the provided dataframe and buffer.

parameters:

df: pd.DataFrame

The dataframe containing the data.

x: str

The column name for the x-axis.

y: str

The column name for the y-axis.

buffer: int

The buffer added to the minimum and maximum values of the x and y axes.

smadi.metadata module

smadi.plot module

smadi.plot.clss_counter(df, columns, thresholds)[source]

Count the number of values in the dataframe that fall within the thresholds for each category of the anomaly method.

parameters:

df: pd.DataFrame

The dataframe containing the data to plot.

columns: dict

The dictionary containing the column names and their respective matplotlib plot options.

thresholds: str

The name of the anomaly method whose thresholds are used.
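Counting values per anomaly category amounts to assigning each value to the threshold interval that contains it. The sketch below shows this with half-open [lower, upper) intervals. The category names and bounds in `EXAMPLE_THRESHOLDS` are hypothetical placeholders, not the thresholds returned by `set_thresholds`, and the handling of NaNs and interval edges is an assumption.

```python
import math
from typing import Dict, Iterable, Tuple

# Hypothetical category bounds (lower inclusive, upper exclusive).
# Illustrative only; these are NOT the thresholds defined by set_thresholds.
EXAMPLE_THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "dry": (-math.inf, -1.0),
    "normal": (-1.0, 1.0),
    "wet": (1.0, math.inf),
}


def count_categories(values: Iterable[float],
                     thresholds: Dict[str, Tuple[float, float]]) -> Dict[str, int]:
    """Count how many values fall into each [lower, upper) category; NaNs are skipped."""
    counts = {name: 0 for name in thresholds}
    for value in values:
        if math.isnan(value):
            continue
        for name, (lower, upper) in thresholds.items():
            if lower <= value < upper:
                counts[name] += 1
                break
    return counts
```

Half-open intervals ensure that a value sitting exactly on a boundary is counted once.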

smadi.plot.draw_hbars(thresholds, x_axis)[source]

Draw horizontal bars on the plot based on the provided thresholds for each anomaly method.

parameters:

thresholds: dict

The dictionary containing the thresholds for each category of the anomaly method.

x_axis: list

The x-axis values for the plot.

smadi.plot.get_plot_options(**kwargs)[source]

Set the basic plot options based on the provided kwargs for the plot.

parameters:

kwargs: dict

The keyword arguments for the matplotlib plot.

returns:

plot_options: dict

The plot options for the figure.

smadi.plot.plot_anomaly(df, x_axis, colmns, thresholds, plot_hbars=True, plot_categories=True, **kwargs)[source]

Plot the anomaly detection results for the provided dataframe.

parameters:

df: pd.DataFrame

The dataframe containing the data to plot.

x_axis: list

The x-axis values for the plot.

colmns: dict

The dictionary containing the column names and their respective matplotlib plot options.

thresholds: str

The name of the anomaly method whose thresholds are used.

plot_hbars: bool

Whether to plot the horizontal bars on the plot according to the thresholds of the anomaly method used.

plot_categories: bool

Whether to plot the number of values in each category of the anomaly method that fall within the thresholds.

kwargs: dict

The keyword arguments for the matplotlib plot for the figure such as title, xlabel, ylabel, legend, figsize, and grid.

smadi.plot.plot_categories_count(x_axis, results, anomaly_method)[source]

Plot the number of values in each category of the anomaly method that fall within the thresholds.

parameters:

x_axis: list

The x-axis values for the plot.

results: list

The list containing the number of values in each category of the anomaly method.

anomaly_method: str

The name of the anomaly method whose thresholds are used.

smadi.plot.plot_colmns(df, x_axis, colmns_kwargs)[source]

Plot the data in each column of the dataframe with the provided x_axis.

parameters:

df: pd.DataFrame

The dataframe containing the data to plot.

x_axis: list

The x-axis values for the plot.

colmns_kwargs: dict

The dictionary containing the column names and their respective matplotlib plot options.

smadi.plot.plot_figure(plot_params)[source]

Plot the figure based on the provided plot parameters.

parameters:

plot_params: dict

The plot parameters for the figure.

smadi.plot.plot_fill_bet(df=None, x_axis=None, colmn=None, plot_style='ggplot', **kwargs)[source]

Plot the computed anomalies using the fill_between method.

parameters:

df: pd.DataFrame

The dataframe containing the data to plot. If None, the computed anomalies will be used.

x_axis: list

The x-axis values for the plot. If None, the index of the dataframe will be used.

colmn: str

The column name to plot.

plot_style: str

The plot style to use for the plot.

kwargs: dict

Additional parameters to be used for customizing the plot. It can be any of the following:

[‘title’, ‘xlabel’, ‘ylabel’, ‘legend’, ‘figsize’, ‘grid’]

smadi.plot.plot_ts(df, x_axis, colmns_kwargs, plot_raw=False, raw_df=None, raw_var=None, raw_kwargs=None, **kwargs)[source]

Plot the time series data for the provided dataframe.

parameters:

df: pd.DataFrame

The dataframe containing the data to plot.

x_axis: list

The x-axis values for the plot.

colmns_kwargs: dict

The dictionary containing the column names and their respective matplotlib plot options.

plot_raw: bool

Whether to plot the raw data in the background.

raw_df: pd.DataFrame

The dataframe containing the raw data to plot.

raw_var: str

The name of the raw variable to plot.

raw_kwargs: dict

The dictionary containing the matplotlib plot options for the raw data.

kwargs: dict

The keyword arguments for the matplotlib plot for the figure such as title, xlabel, ylabel, legend, figsize, and grid.

smadi.plot.set_thresholds(method)[source]

Set the thresholds for the specified method based on the method name.

parameters:

method: str

The method name for which the thresholds are to be set. Supported methods are: ‘zscore’, ‘smapi’, ‘smdi’, ‘smca’, ‘smad’, ‘smci’, ‘smds’, ‘essmi’, ‘beta’, ‘gamma’

smadi.preprocess module

smadi.preprocess.bimonthly_agg(df: DataFrame, variable: str, agg_metric='mean') DataFrame[source]

Aggregates the time series data on a bimonthly time step.

Parameters:

df: pd.DataFrame

The DataFrame containing the time series data to be aggregated, indexed by a datetime index.

variable: str

The variable/column in the DataFrame to be aggregated.

agg_metric: str

The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.

Returns:

pd.DataFrame

The DataFrame containing the aggregated data.

smadi.preprocess.clim_groupping(df: DataFrame, time_step: str) list[source]

Groups the DataFrame based on the provided time step for climatology computation.

parameters:

df: pd.DataFrame

The DataFrame to be grouped.

time_step: str

The time step used for grouping.

returns:

list

The list of date parameters to be used for grouping.

smadi.preprocess.compute_clim(df: DataFrame, time_step: str, variable: str, metrics: List[str]) DataFrame[source]

Computes the climatology of the time series data based on the provided time step.

Parameters:

df: pd.DataFrame

The DataFrame, indexed by a datetime index, containing the time series data to be aggregated.

time_step: str

The time step to be used for computing the climatology. Supported values: ‘month’, ‘week’, ‘dekad’, ‘bimonth’, ‘day’

variable: str

The variable/column in the DataFrame to be aggregated.

metrics: List[str]

The metrics to be computed. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.

Returns:

pd.DataFrame

The DataFrame containing the climatology data.
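The minimal sketch below covers the monthly case only. It assumes the climatology is broadcast back to every timestamp, so that each observation sits next to its climate normal. The compute_clim_sketch name and the '<variable>-<metric>' column naming are hypothetical.

```python
import pandas as pd


def compute_clim_sketch(df: pd.DataFrame, variable: str, metrics: list) -> pd.DataFrame:
    out = df.copy()
    by_month = out.groupby(out.index.month)[variable]
    for metric in metrics:
        # transform() returns one value per row: the statistic of that row's calendar month
        out[f"{variable}-{metric}"] = by_month.transform(metric)
    return out


idx = pd.date_range("2020-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"sm": (idx.year - 2020).astype(float)}, index=idx)  # 0.0 in 2020, 1.0 in 2021
clim = compute_clim_sketch(df, "sm", ["mean", "min", "max"])
```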

smadi.preprocess.dekadal_agg(df: DataFrame, variable: str, agg_metric='mean') DataFrame[source]

Aggregates the time series data using a dekadal (10-day) time step.

Parameters:

df: pd.DataFrame

The DataFrame, indexed by a datetime index, containing the time series data to be aggregated.

variable: str

The variable/column in the DataFrame to be aggregated.

agg_metric: str

The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.

Returns:

pd.DataFrame

The DataFrame containing the aggregated data.
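A dekad is a ten-day period: days 1-10, days 11-20, and day 21 to the end of the month (8 to 11 days). The minimal pandas sketch below assumes the output is indexed by (year, month, dekad). The function name and the output layout are hypothetical.

```python
import numpy as np
import pandas as pd


def dekadal_agg_sketch(df: pd.DataFrame, variable: str, agg_metric: str = "mean") -> pd.DataFrame:
    # Map day-of-month to dekad: 1-10 -> 1, 11-20 -> 2, 21-31 -> 3
    dekad = np.minimum((df.index.day.to_numpy() - 1) // 10 + 1, 3)
    out = df[variable].groupby([df.index.year, df.index.month, dekad]).agg(agg_metric)
    out.index = out.index.set_names(["year", "month", "dekad"])
    return out.to_frame()


idx = pd.date_range("2021-01-01", "2021-01-31", freq="D")
df = pd.DataFrame({"sm": range(31)}, index=idx)
result = dekadal_agg_sketch(df, "sm")
```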

smadi.preprocess.fillna(df: DataFrame, variable: str, fillna_window_size: int) DataFrame[source]

Fills NaN values in the time series data using a moving window average.

Parameters:

df: pd.DataFrame

The DataFrame, indexed by a datetime index, containing the time series data to be filled.

variable: str

The variable/column in the DataFrame to be filled.

fillna_window_size: int

The size of the moving window, in days, used to fill NaN values. An odd number is recommended.

Returns:

pd.DataFrame

The DataFrame containing the filled time series data.
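The sketch below shows one way to implement a moving-window fill. It assumes a regular daily index, so that a window of N rows spans N days. A centered rolling mean is computed over the window, skipping NaNs, and it is used only where the original value is missing. The name is hypothetical, and this is not necessarily how SMADI does it.

```python
import numpy as np
import pandas as pd


def fillna_sketch(df: pd.DataFrame, variable: str, fillna_window_size: int) -> pd.DataFrame:
    out = df.copy()
    # Centered window; NaNs inside the window are ignored when averaging
    window_mean = out[variable].rolling(fillna_window_size, center=True, min_periods=1).mean()
    # Only missing values are replaced; observed values are left untouched
    out[variable] = out[variable].fillna(window_mean)
    return out


idx = pd.date_range("2021-06-01", periods=5, freq="D")
df = pd.DataFrame({"sm": [1.0, 2.0, np.nan, 4.0, 5.0]}, index=idx)
filled = fillna_sketch(df, "sm", fillna_window_size=3)
```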

smadi.preprocess.filter_df(df: DataFrame = None, year: int | None = None, month: int | None = None, dekad: int | None = None, bimonth: int | None = None, day: int | None = None, week: int | None = None, start_date: str | None = None, end_date: str | None = None) DataFrame[source]

Filters the DataFrame based on specified time/date conditions.

Parameters:

df: pd.DataFrame, optional

The DataFrame to be filtered. It should be indexed by a datetime index.

year: int or None, optional

The year to filter the DataFrame.

month: int or None, optional

The month to filter the DataFrame.

bimonth: int or None, optional

The bimonth to filter the DataFrame.

dekad: int or None, optional

The dekad to filter the DataFrame.

week: int or None, optional

The week to filter the DataFrame.

day: int or None, optional

The day to filter the DataFrame.

start_date: str or None, optional

The start date for filtering.

end_date: str or None, optional

The end date for filtering.

Returns:

pd.DataFrame

The filtered DataFrame.
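The sketch below combines the filters with a logical AND. It is reduced to year, month, start_date, and end_date (the other date parts would follow the same pattern) and assumes both date bounds are inclusive. The function name and the inclusivity are assumptions.

```python
import pandas as pd


def filter_df_sketch(df, year=None, month=None, start_date=None, end_date=None):
    # Start with all rows selected and narrow the mask with each given condition
    mask = pd.Series(True, index=df.index)
    if year is not None:
        mask &= df.index.year == year
    if month is not None:
        mask &= df.index.month == month
    if start_date is not None:
        mask &= df.index >= pd.Timestamp(start_date)
    if end_date is not None:
        mask &= df.index <= pd.Timestamp(end_date)
    return df[mask]


idx = pd.date_range("2020-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"sm": range(len(idx))}, index=idx)
feb_2021 = filter_df_sketch(df, year=2021, month=2)
early_march = filter_df_sketch(df, start_date="2020-03-01", end_date="2020-03-10")
```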

smadi.preprocess.monthly_agg(df: DataFrame, variable: str, agg_metric='mean') DataFrame[source]

Aggregates the time series data using a monthly time step.

Parameters:

df: pd.DataFrame

The DataFrame, indexed by a datetime index, containing the time series data to be aggregated.

variable: str

The variable/column in the DataFrame to be aggregated.

agg_metric: str

The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.

Returns:

pd.DataFrame

The DataFrame containing the aggregated data.
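The minimal sketch below uses pandas resampling. It assumes the result keeps a datetime index labelled by the first day of each month. SMADI's actual output layout may differ, and the function name is hypothetical.

```python
import pandas as pd


def monthly_agg_sketch(df: pd.DataFrame, variable: str, agg_metric: str = "mean") -> pd.DataFrame:
    # "MS" = month-start frequency: each bin covers one calendar month
    return df[[variable]].resample("MS").agg(agg_metric)


idx = pd.date_range("2021-01-01", "2021-02-28", freq="D")
df = pd.DataFrame({"sm": range(59)}, index=idx)
monthly_max = monthly_agg_sketch(df, "sm", agg_metric="max")
```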

smadi.preprocess.smooth(df: DataFrame, variable: str, window_size: int) DataFrame[source]

Smooths the time series data using a moving window average.

Parameters:

df: pd.DataFrame

The DataFrame, indexed by a datetime index, containing the time series data to be smoothed.

variable: str

The variable/column in the DataFrame to be smoothed.

window_size: int

The size of the moving window, in days, used for smoothing. An odd number is recommended.

Returns:

pd.DataFrame

The DataFrame containing the smoothed time series data.
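The minimal sketch below applies centered moving-average smoothing. It assumes a regular daily index (so window_size rows equal window_size days). At the edges, it averages over the partial window rather than returning NaN. Both choices and the function name are assumptions. Unlike filling, smoothing replaces every value.

```python
import pandas as pd


def smooth_sketch(df: pd.DataFrame, variable: str, window_size: int) -> pd.DataFrame:
    out = df.copy()
    # With an odd window_size the window is symmetric around each day
    out[variable] = out[variable].rolling(window_size, center=True, min_periods=1).mean()
    return out


idx = pd.date_range("2021-06-01", periods=5, freq="D")
df = pd.DataFrame({"sm": [0.0, 3.0, 6.0, 9.0, 12.0]}, index=idx)
smoothed = smooth_sketch(df, "sm", window_size=3)
```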

smadi.preprocess.validate_anomaly_method(methods, _Detectors)[source]

Validate the names of the anomaly detection methods.
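The minimal sketch below shows such a check. It assumes _Detectors is a mapping from method name to detector class, and that either a single name or a list of names is accepted. The registry contents are placeholders, and the function name is hypothetical.

```python
def validate_anomaly_method_sketch(methods, detectors):
    # Accept a single name or a list of names
    names = [methods] if isinstance(methods, str) else list(methods)
    unknown = [name for name in names if name not in detectors]
    if unknown:
        raise ValueError(
            f"Unsupported anomaly method(s): {unknown}. Supported: {sorted(detectors)}"
        )
    return names


# Placeholder registry; the values stand in for detector classes
detectors = {"zscore": object, "smapi": object, "smdi": object}
validated = validate_anomaly_method_sketch(["zscore", "smdi"], detectors)
```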

smadi.preprocess.validate_date_params(time_step: str, year: int | List[int] = None, month: int | List[int] = None, dekad: int | List[int] = None, week: int | List[int] = None, bimonth: int | List[int] = None, day: int | List[int] = None) Dict[str, List[int]][source]

Validate the date parameters for the anomaly detection workflow.
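The signature accepts either a single integer or a list for each date part and returns a dict of lists. The minimal sketch below shows that normalization for year, month, and dekad only, with range checks taken from the calendar. The rule that dekad selection requires time_step='dekad' is an assumption, as is the function name.

```python
from typing import Dict, List, Optional, Union

IntOrList = Optional[Union[int, List[int]]]
# Valid ranges per date part (year is left unbounded)
RANGES = {"month": (1, 12), "dekad": (1, 3)}


def validate_date_params_sketch(time_step: str, year: IntOrList = None,
                                month: IntOrList = None,
                                dekad: IntOrList = None) -> Dict[str, List[int]]:
    given = {"year": year, "month": month, "dekad": dekad}
    # Normalize scalars to one-element lists and drop unset parameters
    params = {k: [v] if isinstance(v, int) else list(v)
              for k, v in given.items() if v is not None}
    for key, values in params.items():
        lo, hi = RANGES.get(key, (None, None))
        if lo is not None and any(not lo <= v <= hi for v in values):
            raise ValueError(f"{key} values must be between {lo} and {hi}: {values}")
    # Assumed rule: dekad selection only makes sense for a dekadal time step
    if "dekad" in params and time_step != "dekad":
        raise ValueError("'dekad' requires time_step='dekad'")
    return params


checked = validate_date_params_sketch("month", year=2022, month=[6, 7])
```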

smadi.preprocess.weekly_agg(df: DataFrame, variable: str, agg_metric='mean') DataFrame[source]

Aggregates the time series data using a weekly time step.

Parameters:

df: pd.DataFrame

The DataFrame, indexed by a datetime index, containing the time series data to be aggregated.

variable: str

The variable/column in the DataFrame to be aggregated.

agg_metric: str

The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.

Returns:

pd.DataFrame

The DataFrame containing the aggregated data.
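The minimal sketch below groups by ISO-8601 week. It assumes the weeks follow the ISO calendar: weeks start on Monday, and week 1 contains the year's first Thursday. The ISO year is used instead of the calendar year, so days that belong to a week spanning the new year are not split off. The ISO assumption, the output layout, and the name are hypothetical.

```python
import pandas as pd


def weekly_agg_sketch(df: pd.DataFrame, variable: str, agg_metric: str = "mean") -> pd.DataFrame:
    # isocalendar() yields ISO year, week and weekday columns aligned with the index
    iso = df.index.isocalendar().astype(int)
    out = df[variable].groupby([iso["year"], iso["week"]]).agg(agg_metric)
    return out.to_frame()


# 2021-01-04 is a Monday, the first day of ISO week 1 of 2021
idx = pd.date_range("2021-01-04", periods=14, freq="D")
df = pd.DataFrame({"sm": range(14)}, index=idx)
weekly = weekly_agg_sketch(df, "sm")
```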

smadi.utils module

smadi.utils.create_logger(name, level=10)[source]

Create a logger with the given name and level.

parameters:

name: str

name of the logger

level: int

logging level of the logger, e.g. logging.DEBUG (10, the default)

returns:

logger: logging.Logger

a logger object
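The minimal sketch below implements such a factory with the standard library. The format string and the guard against attaching duplicate handlers are assumptions, not necessarily what SMADI does.

```python
import logging


def create_logger_sketch(name: str, level: int = logging.DEBUG) -> logging.Logger:
    logger = logging.getLogger(name)  # the same name always returns the same logger object
    logger.setLevel(level)
    # Attach a handler only once, so repeated calls do not duplicate output
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = create_logger_sketch("smadi-demo")
log.debug("logger ready")
```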

smadi.utils.get_country_code(country_name)[source]

Get the ISO 3166-1 alpha-3 country code for a given country name.

parameters:

country_name: str

name of the country

returns:

country_code: str

ISO 3166-1 alpha-3 country code

smadi.utils.get_gpis_from_bbox(bbox, res=6.25)[source]

Get the grid point indices (GPIs) that fall within the bounding box.

parameters:

bbox: tuple

bounding box in the format (lonmin, lonmax, latmin, latmax)

res: float

resolution of the grid. Default is 6.25 km

returns:

pd.DataFrame

a dataframe containing the GPIs, longitudes, and latitudes
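The selection step itself is a range test on longitude and latitude. The sketch below assumes the grid is already available as a DataFrame with hypothetical gpi, lon, and lat columns. How SMADI obtains the grid for a given res is not covered here.

```python
import pandas as pd


def gpis_in_bbox(grid: pd.DataFrame, bbox: tuple) -> pd.DataFrame:
    lonmin, lonmax, latmin, latmax = bbox  # same order as documented above
    inside = grid["lon"].between(lonmin, lonmax) & grid["lat"].between(latmin, latmax)
    return grid.loc[inside, ["gpi", "lon", "lat"]].reset_index(drop=True)


grid = pd.DataFrame({
    "gpi": [101, 102, 103, 104],
    "lon": [10.0, 15.0, 20.0, 25.0],
    "lat": [45.0, 48.0, 52.0, 60.0],
})
subset = gpis_in_bbox(grid, (12.0, 22.0, 46.0, 55.0))
```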

smadi.utils.load_gpis_by_country(country, res=6.25, format='csv')[source]

Load the GPIs for the given country from the DGG API. Source: https://dgg.geo.tuwien.ac.at/

parameters:

country: str

name of the country

grid: str

name of the grid to be used. Default is “fibgrid_n6600000”. Supported grids are:

  - fibgrid_n6600000 (Fibonacci 6.25 km)
  - fibgrid_n1650000 (Fibonacci 12.5 km)
  - fibgrid_n430000 (Fibonacci 25 km)
  - warp (WARP)

format: str

format of the data to be returned. Default is “csv”. Supported formats are:

  - csv
  - json

smadi.utils.log_exception(logger)[source]

A decorator to log exceptions raised in a function.

parameters:

logger: logging.Logger

a logger object

returns:

decorator: function

a decorator function
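The minimal sketch below implements such a decorator with the standard library. The page does not state whether SMADI re-raises after logging. This sketch assumes it does, so callers still see the error.

```python
import functools
import logging


def log_exception_sketch(logger: logging.Logger):
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # logger.exception logs at ERROR level and attaches the traceback
                logger.exception("Exception in %s", func.__name__)
                raise
        return wrapper
    return decorator


logger = logging.getLogger("smadi-exc-demo")


@log_exception_sketch(logger)
def divide(a, b):
    return a / b
```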

smadi.utils.log_time(logger)[source]

A decorator to log the time taken by a function.

parameters:

logger: logging.Logger

a logger object

returns:

decorator: function

a decorator function
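The minimal sketch below assumes elapsed wall-clock time is logged at INFO level once the call finishes, including when it raises. The log level and the message format are assumptions.

```python
import functools
import logging
import time


def log_time_sketch(logger: logging.Logger):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()  # monotonic, high-resolution clock
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on success and on failure, so slow failures are logged too
                elapsed = time.perf_counter() - start
                logger.info("%s took %.3f s", func.__name__, elapsed)
        return wrapper
    return decorator


logger = logging.getLogger("smadi-time-demo")
logger.setLevel(logging.INFO)


@log_time_sketch(logger)
def slow_sum(n):
    return sum(range(n))
```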

smadi.workflow module

run_workflow.py - SMADI Workflow Execution

smadi.workflow.load_ts(gpi, variable='sm')[source]

Load the ASCAT time series for a given grid point index (GPI).

smadi.workflow.main()[source]
smadi.workflow.parse_arguments(parser)[source]

Parse the arguments and return the parsed arguments as a dictionary.

returns:

parsed_args: dict

The parsed arguments as a dictionary
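Converting an argparse result to a dict takes one call to vars(). The sketch below uses a hypothetical parser whose flags only mirror a few run_smadi parameters. They are not SMADI's actual command-line options.

```python
import argparse


def build_parser_sketch() -> argparse.ArgumentParser:
    # Hypothetical flags for illustration only
    parser = argparse.ArgumentParser(description="SMADI workflow (sketch)")
    parser.add_argument("--methods", nargs="+", default=["zscore"])
    parser.add_argument("--time_step", default="month")
    parser.add_argument("--year", nargs="+", type=int)
    return parser


def parse_arguments_sketch(parser: argparse.ArgumentParser, argv=None) -> dict:
    # vars() turns the argparse.Namespace into a plain dict
    return vars(parser.parse_args(argv))


args = parse_arguments_sketch(build_parser_sketch(),
                              ["--methods", "zscore", "smapi", "--year", "2022"])
```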

smadi.workflow.run_smadi(aoi: str | Tuple[float, float, float, float], methods: str | List[str] = ['zscore'], variable: str = 'sm', time_step: str = 'month', fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, year: List[int] = None, month: List[int] = None, dekad: List[int] = None, week: List[int] = None, bimonth: List[int] = None, day: List[int] = None, workers: int = None, addi_retrive: bool = False) DataFrame[source]

Run the anomaly detection workflow for multiple grid point indices with multiprocessing support.

smadi.workflow.setup_argument_parser() ArgumentParser[source]

Set up the argument parser for SMADI workflow execution.

smadi.workflow.single_po_run(gpi: int, methods: str = ['zscore'], variable: str = 'sm', time_step: str = 'month', fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, year: int | List[int] = None, month: int | List[int] = None, dekad: int | List[int] = None, week: int | List[int] = None, bimonth: int | List[int] = None, day: int | List[int] = None, timespan: List[str] = None, addi_retrive: bool = False, agg_metric: list = 'mean') Tuple[int, Dict[str, float]][source]

Run the anomaly detection workflow for a single grid point index.

Module contents