smadi package
Submodules
smadi.anomaly_detectors module
Soil Moisture Anomalies Calculation Module
This module provides a suite of methods for calculating soil moisture anomalies based on climatological data. The implemented methods include:
Z-Score: Standardized z-score method.
SMAPI (Soil Moisture Anomaly Percent Index): Measures anomalies as a percentage deviation from the climatological mean or median.
SMDI (Soil Moisture Deficit Index): Quantifies soil moisture deficit based on deviations from climatological median and extremes.
ESSMI (Empirical Standardized Soil Moisture Index): Uses a nonparametric empirical probability density function for standardizing soil moisture values.
SMAD (Standardized Anomaly Absolute Deviation): Calculates anomalies using the median and interquartile range, providing robustness to outliers.
SMDS (Soil Moisture Drought Severity): Assesses drought severity based on percentile rankings of soil moisture values.
SMCI (Soil Moisture Condition Index): Measures soil moisture content relative to climatological minima and maxima.
SMCA (Soil Moisture Content Anomaly): Quantifies anomalies as a deviation from climatological mean or median relative to maxima.
ParaDis (Parametric Distribution): Fits observed data to parametric distributions (e.g., beta, gamma).
- class smadi.anomaly_detectors.AnomalyDetector(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]
Bases: ABC
An abstract class for detecting anomalies in time series data based on the climatology.
- property clim_df
The DataFrame containing the climate normal data.
- property groupby_param
The column name to be used for grouping the data for the anomaly detection.
- class smadi.anomaly_detectors.AnomalyDetectorFactory[source]
Bases: object
A factory class for creating anomaly detectors based on the provided method.
- static create_detector(method: str, **kwargs) AnomalyDetector[source]
- methods = {'essmi': <class 'smadi.anomaly_detectors.ESSMI'>, 'paradis': <class 'smadi.anomaly_detectors.ParaDis'>, 'smad': <class 'smadi.anomaly_detectors.SMAD'>, 'smapi': <class 'smadi.anomaly_detectors.SMAPI'>, 'smca': <class 'smadi.anomaly_detectors.SMCA'>, 'smci': <class 'smadi.anomaly_detectors.SMCI'>, 'smdi': <class 'smadi.anomaly_detectors.SMDI'>, 'smds': <class 'smadi.anomaly_detectors.SMDS'>, 'zscore': <class 'smadi.anomaly_detectors.ZScore'>}
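The `methods` registry maps a method name to a detector class, and `create_detector` instantiates the class registered under that name. The sketch below illustrates this lookup pattern only. `DetectorSketch`, `MeanDeviation`, `MedianDeviation`, `create_detector_sketch`, and the error handling are hypothetical stand-ins, not smadi's classes or behaviour.

```python
from abc import ABC, abstractmethod
from statistics import mean, median


class DetectorSketch(ABC):
    """Hypothetical stand-in for an abstract anomaly detector."""

    def __init__(self, values):
        self.values = list(values)

    @abstractmethod
    def detect(self):
        """Return one anomaly value per observation."""


class MeanDeviation(DetectorSketch):
    def detect(self):
        ref = mean(self.values)
        return [v - ref for v in self.values]


class MedianDeviation(DetectorSketch):
    def detect(self):
        ref = median(self.values)
        return [v - ref for v in self.values]


# Registry keyed by method name, mirroring the shape of a name-to-class dict.
METHODS = {"mean_dev": MeanDeviation, "median_dev": MedianDeviation}


def create_detector_sketch(method, **kwargs):
    """Look up the class registered under `method` and instantiate it."""
    try:
        cls = METHODS[method]
    except KeyError:
        # Assumed behaviour: reject unknown names with a helpful message.
        raise ValueError(
            f"unknown method {method!r}; expected one of {sorted(METHODS)}"
        ) from None
    return cls(**kwargs)


detector = create_detector_sketch("median_dev", values=[10.0, 20.0, 60.0])
print(detector.detect())
```

Keeping the mapping in one dictionary means a new method only needs a new class and one registry entry.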
- class smadi.anomaly_detectors.ESSMI(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]
Bases: AnomalyDetector
A class for detecting anomalies in time series data based on the Empirical Standardized Soil Moisture Index (ESSMI) method.
The index is computed by fitting a nonparametric empirical probability density function (ePDF) using a kernel density estimator (KDE):
f^h(x) = 1/(nh) * Σ K((x - xi) / h)
K(u) = 1/√(2π) * exp(-u^2/2)
where:
f^h: the ePDF
K: the Gaussian kernel function
h: the bandwidth of the kernel function, used as the smoothing parameter (Scott’s rule)
n: the number of observations
x: the average value of the variable in the time series data. It can be any of the following: Daily average, weekly average, monthly average, etc.
xi: the ith observation
The ESSMI is then computed by mapping the cumulative probability of x onto the standard normal distribution (mean of zero, standard deviation of one) using the inverse of the standard normal distribution function:
ESSMI = Φ^-1(F^h(x))
where:
Φ^-1: the inverse of the standard normal distribution function
F^h: the cumulative distribution function obtained by integrating the ePDF f^h
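The two ESSMI steps (kernel density estimate, then inverse-normal transform) can be sketched with the standard library alone. This is an illustration under stated assumptions, not smadi's implementation: `essmi_sketch` is a hypothetical name, the bandwidth uses the one-dimensional form of Scott's rule (sample standard deviation times n^(-1/5)), and the cumulative probability is taken as the mean of the Gaussian kernel CDFs, which is the integral of the Gaussian KDE.

```python
from statistics import NormalDist, stdev


def essmi_sketch(obs):
    """Return Phi^-1(F(x)) for each x, with F from a Gaussian KDE of obs."""
    n = len(obs)
    # Scott's rule for one-dimensional data (assumed form).
    h = stdev(obs) * n ** (-1 / 5)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = []
    for x in obs:
        # Integrating a Gaussian KDE gives the average of the kernel CDFs.
        cdf = sum(std_normal.cdf((x - xi) / h) for xi in obs) / n
        scores.append(std_normal.inv_cdf(cdf))
    return scores


obs = [0.12, 0.18, 0.22, 0.25, 0.31, 0.40]
scores = essmi_sketch(obs)
for x, s in zip(obs, scores):
    print(f"{x:.2f} -> {s:+.2f}")
```

Values below the bulk of the sample map to negative scores and values above it to positive scores, so the output reads like a z-score without assuming a particular distribution.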
- class smadi.anomaly_detectors.ParaDis(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]
Bases: AnomalyDetector
A class for detecting anomalies in time series data based on fitting the observed data to a parametric distribution (e.g. beta and gamma).
- class smadi.anomaly_detectors.SMAD(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]
Bases: AnomalyDetector
A class for detecting anomalies in time series data based on the Standardized Anomaly Absolute Deviation (SMAD) method.
SMAD = (x - η) / IQR
where:
x: the average value of the variable in the time series data. It can be any of the following: Daily average, weekly average, monthly average, etc.
η: the long-term median of the variable (the climate normal).
IQR: the interquartile range of the variable. It is the difference between the 75th and 25th percentiles of the variable.
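A small worked example of the SMAD formula shows why the median and IQR make it robust to outliers. `smad_sketch` is a hypothetical helper. The quartiles use `statistics.quantiles` with `method="inclusive"`, which is an assumption; smadi may compute percentiles differently.

```python
from statistics import median, quantiles


def smad_sketch(obs):
    """(x - median) / IQR for every observation."""
    eta = median(obs)
    # Quartiles by linear interpolation over the sample (assumed method).
    q1, _, q3 = quantiles(obs, n=4, method="inclusive")
    iqr = q3 - q1
    return [(x - eta) / iqr for x in obs]


obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
smad_values = smad_sketch(obs)
print(smad_values)
```

The single extreme value (100.0) leaves the median and IQR unchanged, so the other scores stay within [-1, 0.75] while the outlier itself stands out clearly.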
- class smadi.anomaly_detectors.SMAPI(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]
Bases: AnomalyDetector
A class for detecting anomalies in time series data based on the Soil Moisture Anomaly Percent Index (SMAPI) method.
SMAPI = ((x - ref) / ref) * 100
where:
x: the average value of the variable in the time series data. It can be any of the following: Daily average, weekly average, monthly average, etc.
ref: the long-term mean (μ) or median (η) of the variable (the climate normal).
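The SMAPI formula can be illustrated with a short helper. `smapi_sketch` is a hypothetical name, and taking the reference from the sample itself is an assumption made to keep the example self-contained.

```python
from statistics import mean, median


def smapi_sketch(obs, metric="mean"):
    """Percentage deviation of each observation from the reference value."""
    if metric == "mean":
        ref = mean(obs)
    elif metric == "median":
        ref = median(obs)
    else:
        raise ValueError("metric must be 'mean' or 'median'")
    return [(x - ref) / ref * 100 for x in obs]


smapi_mean = smapi_sketch([10.0, 20.0, 30.0])
smapi_median = smapi_sketch([10.0, 20.0, 60.0], metric="median")
print(smapi_mean)
print(smapi_median)
```

A value of -50 means the observation is half the reference; the choice between mean and median matters most when the sample is skewed, as in the second call.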
- class smadi.anomaly_detectors.SMCA(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]
Bases: AnomalyDetector
A class for detecting anomalies in time series data based on the Soil Moisture Content Anomaly (SMCA) method.
SMCA = (x - ref) / (max - ref)
where:
x: the average value of the variable in the time series data. It can be any of the following: Daily average, weekly average, monthly average, etc.
ref: the long-term mean (μ) or median (η) of the variable (the climate normal).
max: the long-term maximum of the variable.
min: the long-term minimum of the variable.
- class smadi.anomaly_detectors.SMCI(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]
Bases: AnomalyDetector
A class for detecting anomalies in time series data based on the Soil Moisture Condition Index (SMCI) method.
SMCI = ((x - min) / (max - min))
where:
x: the average value of the variable in the time series data. It can be any of the following: Daily average, weekly average, monthly average, etc.
min: the long-term minimum of the variable.
max: the long-term maximum of the variable.
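SMCI rescales each value to its position between the long-term minimum and maximum. The sketch below assumes the extremes are taken from the sample when not supplied. `smci_sketch` and the zero-range check are hypothetical additions for illustration.

```python
def smci_sketch(obs, minimum=None, maximum=None):
    """Scale observations to the range spanned by the long-term extremes."""
    lo = min(obs) if minimum is None else minimum
    hi = max(obs) if maximum is None else maximum
    if hi == lo:
        # Assumed guard: the index is undefined when the range is zero.
        raise ValueError("maximum and minimum are equal; SMCI is undefined")
    return [(x - lo) / (hi - lo) for x in obs]


smci_values = smci_sketch([10.0, 20.0, 30.0, 50.0])
print(smci_values)
```

The result is 0 at the driest value on record and 1 at the wettest.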
- class smadi.anomaly_detectors.SMDI(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]
Bases: AnomalyDetector
A class for detecting anomalies in time series data based on the Soil Moisture Deficit Index (SMDI) method.
SMDI(t) = 0.5 * SMDI(t-1) + (SD(t) / 50)
where:
SD(t) = ((x - η) / (η - min)) * 100 if x <= η
SD(t) = ((x - η) / (max - η)) * 100 if x > η
x: the average value of the variable in the time series data. It can be any of the following: Daily average, weekly average, monthly average, etc.
η: the long-term median of the variable (the climate normal).
min: the long-term minimum of the variable.
max: the long-term maximum of the variable.
t: the time step of the time series data.
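The SMDI recursion is easiest to follow with concrete numbers. In this sketch `sd_sketch` and `smdi_sketch` are hypothetical helpers, and starting the recursion from zero (so the first value is SD(1) / 50) is an assumption about initialisation.

```python
def sd_sketch(x, eta, minimum, maximum):
    """Soil moisture deficit in percent, relative to the long-term median."""
    if x <= eta:
        return (x - eta) / (eta - minimum) * 100
    return (x - eta) / (maximum - eta) * 100


def smdi_sketch(sd_values, initial=0.0):
    """Apply SMDI(t) = 0.5 * SMDI(t-1) + SD(t) / 50 step by step."""
    out = []
    prev = initial  # assumed starting value
    for sd in sd_values:
        prev = 0.5 * prev + sd / 50
        out.append(prev)
    return out


# Long-term statistics for one time step (illustrative numbers).
eta, minimum, maximum = 20.0, 10.0, 40.0
sd_values = [sd_sketch(x, eta, minimum, maximum) for x in (10.0, 20.0, 40.0)]
smdi_values = smdi_sketch(sd_values)
print(sd_values)
print(smdi_values)
```

Because half of the previous index carries over, a dry step keeps depressing the index after conditions return to the median: the second value is still negative although its SD is zero.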
- class smadi.anomaly_detectors.SMDS(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]
Bases: AnomalyDetector
A class for detecting anomalies in time series data based on the Soil Moisture Drought Severity (SMDS) method.
SMDS = 1 - SMP
SMP = (rank(x) / (n+1))
where:
SMP: the Soil Moisture Percentile. It is the percentile of the average value of the variable in the time series data.
SMDS: the Soil Moisture Drought Severity. It is the severity of the drought based on the percentile of the average value of the variable in the time series data.
rank(x): the rank of the average value of the variable in the time series data.
n: the number of years in the time series data.
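A short ranking example makes the SMDS definition concrete. `smds_sketch` is a hypothetical helper. Ranks are ascending and start at 1, and ties are broken by position; both are assumptions, since the tie rule is not stated here.

```python
def smds_sketch(obs):
    """Return 1 - rank / (n + 1) for each observation (rank 1 = driest)."""
    n = len(obs)
    order = sorted(range(n), key=lambda i: obs[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return [1 - r / (n + 1) for r in ranks]


# One value per year for the same time step (illustrative numbers).
smds_values = smds_sketch([0.30, 0.10, 0.20, 0.40])
print([round(v, 2) for v in smds_values])
```

The driest year receives the highest severity, and dividing by n + 1 keeps every value strictly between 0 and 1.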
- class smadi.anomaly_detectors.ZScore(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean', dist: List[str] = None)[source]
Bases: AnomalyDetector
A class for detecting anomalies in time series data based on the Z-Score method.
z_score = (x - μ) / σ
where:
x: the average value of the variable in the time series data. It can be any of the following: Daily average, weekly average, monthly average, etc.
μ: the long-term mean of the variable (the climate normal).
σ: the long-term standard deviation of the variable.
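As a minimal illustration of the z-score formula, the helper below accepts optional long-term statistics and otherwise derives them from the sample. `zscore_sketch` is a hypothetical name, and using the population standard deviation is an assumption; a sample standard deviation would give slightly smaller scores.

```python
from statistics import mean, pstdev


def zscore_sketch(obs, mu=None, sigma=None):
    """Standardise observations against a mean and standard deviation."""
    mu = mean(obs) if mu is None else mu
    sigma = pstdev(obs) if sigma is None else sigma  # population form (assumed)
    return [(x - mu) / sigma for x in obs]


z_values = zscore_sketch([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z_values)
```

Passing `mu` and `sigma` explicitly corresponds to scoring new observations against a fixed climate normal rather than against the sample itself.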
smadi.climatology module
A module for calculating climatology (climate normal) for different time steps (month, dekad, week) based on time series data.
- class smadi.climatology.Aggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]
Bases: ABC
An abstract class for aggregating time series data based on different time steps.
Attributes:
- df: pd.DataFrame
The input DataFrame containing the time series data to be aggregated.
- variable: str
The variable/column in the DataFrame to be aggregated.
- fillna: bool
Fill NaN values in the time series data using a moving window average.
- fillna_window_size: int
The size of the moving window for filling NaN values. It is recommended to be an odd number.
- smoothing: bool
Smooth the time series data using a moving window average.
- smooth_window_size: int
The size of the moving window for smoothing(n-days). It is recommended to be an odd number.
- timespan: list[str, str], optional
The start and end dates for a timespan to be aggregated. Format: [‘YYYY-MM-DD’, ‘YYYY-MM-DD’]
- agg_metric: str
The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, ‘std’, etc.
Methods:
- aggregate:
Aggregates the time series data based on the provided time step.
- property preprocess_df
- class smadi.climatology.BimonthlyAggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]
Bases: Aggregator
Aggregates the time series data based on bimonthly (twice a month) time step.
- class smadi.climatology.Climatology(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing=False, smooth_window_size=None, timespan: List[str] = None, time_step: str = 'month', normal_metrics: List[str] = ['mean'], agg_metric: str = 'mean')[source]
Bases: object
A class for calculating climatology (climate normal) for time series data.
Attributes:
- df_original: pd.DataFrame
The original input DataFrame before resampling and removing NaN values.
- df: pd.DataFrame
The input DataFrame containing the preprocessed data to be aggregated.
- variable: str
The variable/column in the DataFrame to be aggregated.
- fillna: bool
Fill NaN values in the time series data using a moving window average.
- fillna_window_size: int
The size of the moving window for filling NaN values. It is recommended to be an odd number.
- smoothing: bool
Smooth the time series data using a moving window average.
- smooth_window_size: int
The size of the moving window for smoothing(n-days). It is recommended to be an odd number.
- timespan: list[str, str], optional
The start and end dates for a timespan to be aggregated. Format: [‘YYYY-MM-DD’, ‘YYYY-MM-DD’]
- time_step: str
The time step for aggregation. Supported values: ‘day’, ‘week’, ‘dekad’, ‘bimonth’, ‘month’.
- agg_metric: str
The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, ‘std’, etc.
- normal_metrics: List[str]
The metrics to be used in the climatology computation. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.
- clima_df: pd.DataFrame
The DataFrame containing climatology information.
Methods:
- compute_normals:
Calculates climatology based on the aggregated data.
- plot_ts:
Plot the time series data for the provided dataframe.
- property aggregated_df
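Conceptually, a climate normal is a statistic computed over all years for the same time step. The plain-Python sketch below groups monthly values by calendar month and computes two normal metrics. smadi itself works on pandas DataFrames, so `monthly_normals_sketch` and the dict-based input are hypothetical simplifications.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median


def monthly_normals_sketch(series):
    """Map each calendar month to the mean and median over all years."""
    groups = defaultdict(list)
    for day, value in series.items():
        groups[day.month].append(value)
    return {
        month: {"mean": mean(values), "median": median(values)}
        for month, values in sorted(groups.items())
    }


# One aggregated value per month and year (illustrative numbers).
series = {
    date(2020, 1, 1): 20.0, date(2021, 1, 1): 30.0, date(2022, 1, 1): 40.0,
    date(2020, 7, 1): 10.0, date(2021, 7, 1): 12.0, date(2022, 7, 1): 20.0,
}
normals = monthly_normals_sketch(series)
print(normals)
```

An anomaly method then compares each observation with the normal of its own month, which removes the seasonal cycle from the comparison.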
- class smadi.climatology.DailyAggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]
Bases: Aggregator
Aggregates the time series data based on daily time step.
- class smadi.climatology.DekadalAggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]
Bases: Aggregator
Aggregates the data based on dekad-based time step.
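A dekad is commonly defined as one of three parts of a month: days 1 to 10, days 11 to 20, and day 21 to the end of the month. The sketch below assumes that convention, which may differ from smadi's own grouping. `dekad_of_month` and `dekadal_mean_sketch` are hypothetical helpers.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def dekad_of_month(day):
    """Days 1-10 -> 1, days 11-20 -> 2, day 21 to month end -> 3."""
    return min((day.day - 1) // 10 + 1, 3)


def dekadal_mean_sketch(series):
    """Average the values that fall into the same year, month, and dekad."""
    groups = defaultdict(list)
    for day, value in series.items():
        groups[(day.year, day.month, dekad_of_month(day))].append(value)
    return {key: mean(values) for key, values in sorted(groups.items())}


series = {
    date(2021, 1, 5): 10.0,
    date(2021, 1, 10): 20.0,
    date(2021, 1, 11): 30.0,
    date(2021, 1, 31): 50.0,
}
dekadal = dekadal_mean_sketch(series)
print(dekadal)
```

The `min(..., 3)` clamp places day 31 in the third dekad instead of creating a fourth group.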
- class smadi.climatology.MonthlyAggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]
Bases: Aggregator
Aggregates the time series data based on month-based time step.
- class smadi.climatology.Preprocessor(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None)[source]
Bases: object
A class for preprocessing the time series data before aggregation.
Attributes:
- df: pd.DataFrame
The input DataFrame containing the time series data to be aggregated.
- variable: str
The variable/column in the DataFrame to be aggregated.
- fillna: bool
Fill NaN values in the time series data using a moving window average.
- fillna_window_size: int
The size of the moving window for filling NaN values. It is recommended to be an odd number.
- smoothing: bool
Smooth the time series data using a moving window average.
- smooth_window_size: int
The size of the moving window for smoothing(n-days). It is recommended to be an odd number.
- timespan: list[str, str], optional
The start and end dates for a timespan to be aggregated. Format: [‘YYYY-MM-DD’, ‘YYYY-MM-DD’]
Methods:
- preprocess:
Preprocess the time series data by resampling, truncating, filling NaN values, and smoothing.
- class smadi.climatology.Validator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, time_step: str = None, metrics: List[str] = None, time_span: List[str] = None)[source]
Bases: object
A class for validating the input parameters for the climatology computation.
Methods:
- validate:
Validates the input parameters for the climatology computation.
- class smadi.climatology.WeeklyAggregator(df: DataFrame, variable: str, fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, agg_metric: str = 'mean')[source]
Bases: Aggregator
Aggregates the time series data based on week-based time step.
smadi.data_reader module
- class smadi.data_reader.AscatData(path, read_bulk=True)[source]
Bases: GriddedNcContiguousRaggedTs
Class reading ASCAT SSM 6.25 km data.
- class smadi.data_reader.Era5Land(path, read_bulk=True, celsius=True)[source]
Bases: GriddedNcTs
Read time series data from ERA5 netCDF files.
- smadi.data_reader.extract_obs_ts(loc, ascat_path, era5_land_path=None, obs_type='sm', read_bulk=False)[source]
Read time series of given observation type.
- Parameters:
loc (int, tuple) – Tuple is interpreted as longitude, latitude coordinate. Integer is interpreted as grid point index.
ascat_path (str) – Path to ASCAT soil moisture data.
era5_land_path (str) – Path to ERA5-Land data.
obs_type (str, optional) – Observation type (default: “sm”).
read_bulk (bool, optional) – If “True” all data will be read in memory, if “False” only a single time series is read (default: False). Use “True” to process multiple GPIs in a loop and “False” to read/analyze a single time series.
- smadi.data_reader.read_era5(era5_path, loc, interpo_method='nearest')[source]
Read ERA5 data for given location.
- smadi.data_reader.read_grid_point(loc, ascat_sm_path, era5_land_path=None, read_bulk=False)[source]
Read grid point for given lon/lat coordinates or grid_point.
- Parameters:
loc (int, tuple) – Tuple is interpreted as longitude, latitude coordinate. Integer is interpreted as grid point index.
ascat_sm_path (str) – Path to ASCAT soil moisture data.
era5_land_path (str) – Path to ERA5-Land data.
read_bulk (bool, optional) – If “True” all data will be read in memory, if “False” only a single time series is read (default: False). Use “True” to process multiple GPIs in a loop and “False” to read/analyze a single time series.
smadi.indicators module
- smadi.indicators.essmi(obs)[source]
Compute the anomalies in time series data based on the Empirical Standardized Soil Moisture Index(ESSMI) method.
parameters:
- obs: sequence-like object
The observed time series data.
returns:
- numpy.ndarray
The Empirical Standardized Soil Moisture Index computed based on the given observed value(s).
- smadi.indicators.para_dis(obs, dist='beta')[source]
Compute the anomalies in time series data based on fitting the observed data to a parametric distribution.
parameters:
- obs: pd.Series or np.ndarray or sequence-like object
The observed time series data.
- dist: str, optional
The distribution to fit the observed data to. Supported values: ‘beta’, ‘gamma’, ‘gam’ (Gamma), ‘exp’ (Exponential), ‘pe3’ (Pearson III).
- smadi.indicators.smad(obs, median=None, iqr=None)[source]
Computes the anomalies in time series data based on the Standardized Median Absolute Deviation(SMAD) method.
parameters:
- obs: pd.Series or np.ndarray or sequence-like object
The observed time series data.
- median: float, pd.Series or np.ndarray or sequence-like object, optional
The long-term median of the variable. if None, it will be computed from obs.
- iqr: float, pd.Series or np.ndarray or sequence-like object, optional
The long-term interquartile range of the variable. if None, it will be computed from obs.
Returns:
- numpy.ndarray
The anomalies computed based on the given observed value(s) and the long-term median.
- smadi.indicators.smapi(obs, ref=None, metric='mean')[source]
Computes anomalies in time series data based on the Soil Moisture Anomaly Percent Index(SMAPI) method.
parameters:
- obs: pd.Series or np.ndarray or sequence-like object
The observed time series data.
- ref: float, pd.Series or np.ndarray or sequence-like object, optional
The long-term mean (μ) or median (η) of the variable(the climate normal)
- metric: str, optional
The metric to be used for computing the anomalies. Supported values: ‘mean’, ‘median’
- smadi.indicators.smca(obs, metric='mean', ref=None, minimum=None, maximum=None)[source]
Computes the anomalies in time series data based on the Soil Moisture Content Anomaly(SMCA) method.
parameters:
- obs: pd.Series or np.ndarray or sequence-like object
The observed time series data.
- metric: str, optional
The metric to be used for computing the anomalies. Supported values: ‘mean’, ‘median’
- ref: float, pd.Series or np.ndarray or sequence-like object, optional
The long-term mean (μ) or median (η) of the variable(the climate normal)
- minimum: float, pd.Series or np.ndarray or sequence-like object, optional
The long-term minimum of the variable. if None, it will be computed from obs.
- maximum: float, pd.Series or np.ndarray or sequence-like object, optional
The long-term maximum of the variable. if None, it will be computed from obs.
Returns:
- numpy.ndarray
The anomalies computed based on the given observed value(s) and the long-term reference (mean or median).
- smadi.indicators.smci(obs, minimum=None, maximum=None)[source]
Computes the anomalies in time series data based on the Soil Moisture Condition Index(SMCI) method.
parameters:
- obs: pd.Series or np.ndarray or sequence-like object
The observed time series data.
- minimum: float, pd.Series or np.ndarray or sequence-like object
The long-term minimum of the variable. if None, it will be computed from obs.
- maximum: float, pd.Series or np.ndarray or sequence-like object
The long-term maximum of the variable. if None, it will be computed from obs.
Returns:
- numpy.ndarray
The Soil Moisture Condition Index computed based on the given observed value(s).
- smadi.indicators.smd(obs, median=None, minimum=None, maximum=None)[source]
Computes the Soil Moisture Deficit (SD) based on observed value and long-term median, minimum, and maximum values.
parameters:
- obs: pd.Series or np.ndarray or sequence-like object
The observed time series data.
- median: float, pd.Series or np.ndarray or sequence-like object, optional
The long-term median of the variable. if None, it will be computed from obs.
- minimum: float, pd.Series or np.ndarray or sequence-like object, optional
The long-term minimum of the variable. if None, it will be computed from obs.
- maximum: float, pd.Series or np.ndarray or sequence-like object, optional
The long-term maximum of the variable. if None, it will be computed from obs.
Returns:
- numpy.ndarray
The Soil Moisture Deficit (SD) computed based on the given observed value(s).
- smadi.indicators.smdi(sd)[source]
Computes the Soil Moisture Deficit Index (SMDI) incrementally based on the Soil Moisture Deficit (SD) values.
- smadi.indicators.smds(obs)[source]
Computes anomalies in time series data based on the Soil Moisture Drought Severity (SMDS) method.
SMDS = 1 - SMP
SMP = (rank(x) / (n+1))
parameters:
- obs: pd.Series or np.ndarray or sequence-like object
The observed time series data.
Returns:
- numpy.ndarray
The Soil Moisture Drought Severity computed based on the given observed value(s).
- smadi.indicators.zscore(obs, mean=None, std=None)[source]
Computes the standardized z-score of the time series data.
Parameters:
- obs: pd.Series or np.ndarray or sequence-like object
The observed time series data.
- mean: float, pd.Series or np.ndarray or sequence-like object, optional
The mean of the distribution of the time series data. If None, it will be computed from obs.
- std: float, pd.Series or np.ndarray or sequence-like object, optional
The standard deviation of the distribution of the time series data. If None, it will be computed from obs.
Returns:
- pd.Series or np.ndarray
The z-score of the time series data.
smadi.map module
- smadi.map.plot_anomaly_maps(figsize=(25, 20), ax_rows=1, ax_cols=1, df=None, x='lon', y='lat', df_colms=None, map_crs=<Projected CRS: +proj=robin +a=6378137.0 +lon_0=0 +no_defs +type=c ...>, figure_title='', figure_title_kwargs={'fontsize': 15, 'fontweight': 'bold', 'ha': 'center', 'va': 'center', 'x': 0.5, 'y': 0.95}, maps_titles=None, maps_titles_kwargs={'fontsize': 10, 'fontweight': 'bold', 'pad': 11}, add_features=True, frame_line_width=1, add_cb=True, cb_min_max=['sm_clim', 'sm_clim', 'abs', 'anomaly', 'anomaly'], cmap='RdYlBu', vmin=None, vmax=None, add_gridlines=False, cb_kwargs={'labelsize': 0.5, 'pos': 0.4, 'show_values': False, 'tick_lines': 'center'}, cb_label=None, save_to=None)[source]
- smadi.map.set_bins(colm)[source]
Set the bins and labels for color classification for the selected column.
parameters:
- colm: str
The data column name for which the bins and labels are to be set.
- smadi.map.set_extent(df, x='lon', y='lat', buffer=2)[source]
Set the extent for the map based on the provided dataframe and buffer.
parameters:
- df: pd.DataFrame
The dataframe containing the data.
- x: str
The column name for the x-axis.
- y: str
The column name for the y-axis.
- buffer: int
The buffer to be added to the min and max values of the x and y axis.
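The extent computation amounts to padding the bounding box of the data. The sketch below works on plain coordinate lists instead of a DataFrame. `extent_sketch` is a hypothetical helper, and the `[x_min, x_max, y_min, y_max]` return order is an assumption, not a statement about what `set_extent` returns.

```python
def extent_sketch(lons, lats, buffer=2):
    """Bounding box of the points, widened by `buffer` on every side."""
    return [
        min(lons) - buffer,
        max(lons) + buffer,
        min(lats) - buffer,
        max(lats) + buffer,
    ]


extent = extent_sketch([10.0, 15.0], [45.0, 50.0])
print(extent)
```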
smadi.metadata module
smadi.plot module
- smadi.plot.clss_counter(df, columns, thresholds)[source]
Count the number of values in the dataframe that fall within the thresholds for each category of the anomaly method.
parameters:
- df: pd.DataFrame
The dataframe containing the data to plot.
- columns: dict
The dictionary containing the column names and their respective matplotlib plot options.
- thresholds: str
The name of the anomaly method to use its thresholds.
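Counting values per category can be pictured as testing each value against a set of half-open intervals. The thresholds below are invented for illustration and are not smadi's category definitions. `count_categories_sketch` is a hypothetical helper that takes the intervals directly, not a method name.

```python
import math


def count_categories_sketch(values, thresholds):
    """Count values v with low <= v < high for each named category."""
    return {
        name: sum(low <= v < high for v in values)
        for name, (low, high) in thresholds.items()
    }


# Hypothetical z-score style categories.
thresholds = {
    "dry": (-math.inf, -1.0),
    "normal": (-1.0, 1.0),
    "wet": (1.0, math.inf),
}
counts = count_categories_sketch([-1.8, -0.4, 0.2, 1.3, 2.1], thresholds)
print(counts)
```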
- smadi.plot.draw_hbars(thresholds, x_axis)[source]
Draw horizontal bars on the plot based on the provided thresholds for each anomaly method.
parameters:
- thresholds: dict
The dictionary containing the thresholds for each category of the anomaly method.
- x_axis: list
The x-axis values for the plot.
- smadi.plot.get_plot_options(**kwargs)[source]
Set the basic plot options based on the provided kwargs for the plot.
parameters:
- kwargs: dict
The keyword arguments for the matplotlib plot.
returns:
- plot_options: dict
The plot options for the figure.
- smadi.plot.plot_anomaly(df, x_axis, colmns, thresholds, plot_hbars=True, plot_categories=True, **kwargs)[source]
Plot the anomaly detection results for the provided dataframe.
parameters:
- df: pd.DataFrame
The dataframe containing the data to plot.
- x_axis: list
The x-axis values for the plot.
- colmns: dict
The dictionary containing the column names and their respective matplotlib plot options.
- thresholds: str
The name of the anomaly method to use its thresholds.
- plot_hbars: bool
Whether to plot the horizontal bars on the plot according to the thresholds of the anomaly method used.
- plot_categories: bool
Whether to plot the number of values in each category of the anomaly method that fall within the thresholds.
- kwargs: dict
The keyword arguments for the matplotlib plot for the figure such as title, xlabel, ylabel, legend, figsize, and grid.
- smadi.plot.plot_categories_count(x_axis, results, anomaly_method)[source]
Plot the number of values in each category of the anomaly method that fall within the thresholds.
parameters:
- x_axis: list
The x-axis values for the plot.
- results: list
The list containing the number of values in each category of the anomaly method.
- anomaly_method: str
The name of the anomaly method to use its thresholds.
- smadi.plot.plot_colmns(df, x_axis, colmns_kwargs)[source]
Plot the data in each column of the dataframe with the provided x_axis.
parameters:
- df: pd.DataFrame
The dataframe containing the data to plot.
- x_axis: list
The x-axis values for the plot.
- colmns_kwargs: dict
The dictionary containing the column names and their respective matplotlib plot options.
- smadi.plot.plot_figure(plot_params)[source]
Plot the figure based on the provided plot parameters.
parameters:
- plot_params: dict
The plot parameters for the figure.
- smadi.plot.plot_fill_bet(df=None, x_axis=None, colmn=None, plot_style='ggplot', **kwargs)[source]
Plot the computed anomalies using the fill_between method.
parameters:
- df: pd.DataFrame
The dataframe containing the data to plot. if None, the computed anomalies will be used.
- x_axis: list
The x-axis values for the plot. if None, the index of the dataframe will be used.
- colmn: str
The column name to plot.
- plot_style: str
The plot style to use for the plot.
- kwargs: dict
Additional parameters to be used for customizing the plot. It can be any of the following:
[‘title’, ‘xlabel’, ‘ylabel’, ‘legend’, ‘figsize’, ‘grid’]
- smadi.plot.plot_ts(df, x_axis, colmns_kwargs, plot_raw=False, raw_df=None, raw_var=None, raw_kwargs=None, **kwargs)[source]
Plot the time series data for the provided dataframe.
parameters:
- df: pd.DataFrame
The dataframe containing the data to plot.
- x_axis: list
The x-axis values for the plot.
- colmns_kwargs: dict
The dictionary containing the column names and their respective matplotlib plot options.
- plot_raw: bool
Whether to plot the raw data on the plot as background.
- raw_df: pd.DataFrame
The dataframe containing the raw data to plot.
- raw_var: str
The name of the raw variable to plot.
- raw_kwargs: dict
The dictionary containing the matplotlib plot options for the raw data.
- kwargs: dict
The keyword arguments for the matplotlib plot for the figure such as title, xlabel, ylabel, legend, figsize, and grid.
smadi.preprocess module
- smadi.preprocess.bimonthly_agg(df: DataFrame, variable: str, agg_metric='mean') DataFrame[source]
Aggregates the time series data based on bimonth-based time step.
Parameters:
- df: pd.DataFrame
The DataFrame containing the time series data to be aggregated indexed by datetime index.
- variable: str
The variable/column in the DataFrame to be aggregated.
- agg_metric: str
The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.
Returns:
- pd.DataFrame
The DataFrame containing the aggregated data.
- smadi.preprocess.clim_groupping(df: DataFrame, time_step: str) list[source]
Groups the DataFrame based on the provided time step for climatology computation.
parameters:
- df: pd.DataFrame
The DataFrame to be grouped.
returns:
- list
The list of date parameters to be used for grouping.
- smadi.preprocess.compute_clim(df: DataFrame, time_step: str, variable: str, metrics: List[str]) DataFrame[source]
Computes the climatology of the time series data based on the provided time step.
Parameters:
- df: pd.DataFrame
The DataFrame containing the time series data to be aggregated indexed by datetime index.
- time_step: str
The time step to be used for computing the climatology. Supported values: ‘month’, ‘week’, ‘dekad’, ‘bimonth’, ‘day’
- variable: str
The variable/column in the DataFrame to be aggregated.
- metrics: List[str]
The metrics to be computed. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.
Returns:
- pd.DataFrame
The DataFrame containing the climatology data.
- smadi.preprocess.dekadal_agg(df: DataFrame, variable: str, agg_metric='mean') DataFrame[source]
Aggregates the time series data based on dekad-based time step.
Parameters:
- df: pd.DataFrame
The DataFrame containing the time series data to be aggregated indexed by datetime index.
- variable: str
The variable/column in the DataFrame to be aggregated.
- agg_metric: str
The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.
Returns:
- pd.DataFrame
The DataFrame containing the aggregated data.
- smadi.preprocess.fillna(df: DataFrame, variable: str, fillna_window_size: int) DataFrame[source]
Fills NaN values in the time series data using a moving window average.
Parameters:
- df: pd.DataFrame
The DataFrame containing the time series data to be filled indexed by datetime index.
- variable: str
The variable/column in the DataFrame to be filled.
- fillna_window_size: int
The size of the moving window [days] for filling NaN values. It is recommended to be an odd number.
Returns:
- pd.DataFrame
The DataFrame containing the filled time series data.
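One way to picture gap filling with a moving window average: each missing value is replaced by the mean of the valid values inside a window centred on it. The sketch below works on a plain list and is not the library's implementation. `fillna_sketch` is a hypothetical name, and leaving a gap unfilled when its window holds no valid value is an assumption.

```python
import math


def fillna_sketch(values, window_size):
    """Fill NaNs with the mean of valid neighbours in a centred window."""
    half = window_size // 2
    filled = list(values)
    for i, v in enumerate(values):
        if math.isnan(v):
            window = values[max(0, i - half): i + half + 1]
            valid = [w for w in window if not math.isnan(w)]
            if valid:  # otherwise the gap stays NaN (assumed behaviour)
                filled[i] = sum(valid) / len(valid)
    return filled


nan = float("nan")
filled = fillna_sketch([1.0, nan, 3.0, nan, nan, nan, 7.0], window_size=3)
print(filled)
```

An odd window size gives the same number of neighbours on each side of the gap, which is why odd sizes are recommended. Only original values are used for filling, so the middle of the three-step gap stays empty with a window of 3.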
- smadi.preprocess.filter_df(df: DataFrame = None, year: int | None = None, month: int | None = None, dekad: int | None = None, bimonth: int | None = None, day: int | None = None, week: int | None = None, start_date: str | None = None, end_date: str | None = None) DataFrame[source]
Filters the DataFrame based on specified time/date conditions.
Parameters:
- df: pd.DataFrame, optional
The DataFrame to be filtered. It should be indexed by a datetime index.
- year: int or None, optional
The year to filter the DataFrame.
- month: int or None, optional
The month to filter the DataFrame.
- bimonth: int or None, optional
The bimonth to filter the DataFrame.
- dekad: int or None, optional
The dekad to filter the DataFrame.
- week: int or None, optional
The week to filter the DataFrame.
- day: int or None, optional
The day to filter the DataFrame.
- start_date: str or None, optional
The start date for filtering.
- end_date: str or None, optional
The end date for filtering.
Returns:
- pd.DataFrame
The filtered DataFrame.
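The kind of filtering described above can be sketched on a datetime-indexed DataFrame with pandas. The helper filter_by_date is hypothetical and covers only a subset of the parameters (year, month, start_date, end_date); it is not the filter_df implementation.

```python
import pandas as pd


def filter_by_date(df, year=None, month=None, start_date=None, end_date=None):
    """Hypothetical helper: keep rows that match all given conditions."""
    out = df
    if start_date is not None or end_date is not None:
        # Label-based slicing on a sorted DatetimeIndex includes both endpoints;
        # a bound left as None keeps that side open.
        out = out.loc[start_date:end_date]
    if year is not None:
        out = out[out.index.year == year]
    if month is not None:
        out = out[out.index.month == month]
    return out


idx = pd.date_range("2020-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"sm": range(len(idx))}, index=idx)

feb_2021 = filter_by_date(df, year=2021, month=2)
early_march = filter_by_date(df, start_date="2020-03-01", end_date="2020-03-10")
print(len(feb_2021), len(early_march))
```

Conditions are combined with a logical AND, so passing both a year and a month selects that single month of that year.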
- smadi.preprocess.monthly_agg(df: DataFrame, variable: str, agg_metric='mean') DataFrame[source]
Aggregates the time series data to a monthly time step.
Parameters:
- df: pd.DataFrame
The DataFrame containing the time series data to be aggregated, indexed by a datetime index.
- variable: str
The variable/column in the DataFrame to be aggregated.
- agg_metric: str
The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.
Returns:
- pd.DataFrame
The DataFrame containing the aggregated data.
- smadi.preprocess.smooth(df: DataFrame, variable: str, window_size: int) DataFrame[source]
Smooths the time series data using a moving window average.
Parameters:
- df: pd.DataFrame
The DataFrame containing the time series data to be smoothed, indexed by a datetime index.
- variable: str
The variable/column in the DataFrame to be smoothed.
- window_size: int
The size of the moving window [days] for smoothing. It is recommended to be an odd number.
Returns:
- pd.DataFrame
The DataFrame containing the smoothed time series data.
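The recommendation to use an odd window size is easiest to see with a centered moving average: an odd window has the same number of neighbors on each side of the value being smoothed. The sketch below assumes a centered window and a row-based window size; the helper name moving_average is hypothetical and this is not the smadi implementation.

```python
import numpy as np
import pandas as pd


def moving_average(df: pd.DataFrame, variable: str, window_size: int) -> pd.DataFrame:
    """Hypothetical helper: centered moving average of one column."""
    out = df.copy()
    # center=True aligns each mean with the middle of its window;
    # min_periods=1 keeps values at the edges, where the window is truncated.
    out[variable] = out[variable].rolling(window_size, center=True, min_periods=1).mean()
    return out


idx = pd.date_range("2021-06-01", periods=5, freq="D")
df = pd.DataFrame({"sm": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

smoothed = moving_average(df, "sm", window_size=3)
print(smoothed)
```

For this linear series the interior values are unchanged, and only the two edge values move because their windows hold two observations instead of three.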
- smadi.preprocess.validate_anomaly_method(methods, _Detectors)[source]
Validate the names of the anomaly detection methods.
- smadi.preprocess.validate_date_params(time_step: str, year: int | List[int] = None, month: int | List[int] = None, dekad: int | List[int] = None, week: int | List[int] = None, bimonth: int | List[int] = None, day: int | List[int] = None) Dict[str, List[int]][source]
Validate the date parameters for the anomaly detection workflow.
- smadi.preprocess.weekly_agg(df: DataFrame, variable: str, agg_metric='mean') DataFrame[source]
Aggregates the time series data to a weekly time step.
Parameters:
- df: pd.DataFrame
The DataFrame containing the time series data to be aggregated, indexed by a datetime index.
- variable: str
The variable/column in the DataFrame to be aggregated.
- agg_metric: str
The aggregation metric to be used. Supported values: ‘mean’, ‘median’, ‘min’, ‘max’, etc.
Returns:
- pd.DataFrame
The DataFrame containing the aggregated data.
smadi.utils module
- smadi.utils.create_logger(name, level=10)[source]
Create a logger with the given name and level.
Parameters:
- name: str
name of the logger
- level: int
level of the logger, e.g. logging.DEBUG (10, the default)
Returns:
- logger: logging.Logger
a logger object
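A logger factory of this shape can be sketched with the standard logging module. The handler and message format below are assumptions, since the source does not describe them, and the name make_logger is hypothetical.

```python
import logging


def make_logger(name: str, level: int = logging.DEBUG) -> logging.Logger:
    """Hypothetical helper: named logger with a single stream handler."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # getLogger returns the same object for the same name, so only attach a
    # handler the first time to avoid duplicated log lines.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = make_logger("smadi_demo")
log.debug("logger ready")
```

The default level of 10 in the signature above is the numeric value of logging.DEBUG.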
- smadi.utils.get_country_code(country_name)[source]
Get the ISO 3166-1 alpha-3 country code for a given country name.
Parameters:
- country_name: str
name of the country
Returns:
- country_code: str
ISO 3166-1 alpha-3 country code
- smadi.utils.get_gpis_from_bbox(bbox, res=6.25)[source]
Get the grid point indices (GPIs) within the bounding box.
Parameters:
- bbox: tuple
bounding box in the format (lonmin, lonmax, latmin, latmax)
- res: float
resolution of the grid. Default is 6.25 km
Returns:
- pd.DataFrame
a dataframe containing the GPIs, longitude, and latitude
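Selecting grid points inside a bounding box given as (lonmin, lonmax, latmin, latmax) can be sketched as below. The grid table, its column names (gpi, lon, lat), and the helper points_in_bbox are all assumptions for illustration; the source does not show how get_gpis_from_bbox obtains or stores the grid.

```python
import pandas as pd


def points_in_bbox(grid: pd.DataFrame, bbox: tuple) -> pd.DataFrame:
    """Hypothetical helper: rows of `grid` that fall inside `bbox`."""
    # Note the order: longitudes first, then latitudes.
    lonmin, lonmax, latmin, latmax = bbox
    # Series.between is inclusive on both ends by default.
    mask = grid["lon"].between(lonmin, lonmax) & grid["lat"].between(latmin, latmax)
    return grid.loc[mask, ["gpi", "lon", "lat"]].reset_index(drop=True)


# A tiny made-up grid; real grids hold millions of points.
grid = pd.DataFrame(
    {
        "gpi": [101, 102, 103, 104],
        "lon": [9.5, 10.2, 11.0, 16.4],
        "lat": [47.1, 47.8, 48.3, 52.0],
    }
)

subset = points_in_bbox(grid, (10.0, 12.0, 47.0, 49.0))
print(subset)
```

The tuple order differs from the (lonmin, latmin, lonmax, latmax) convention used by some GIS tools, so unpacking it by name guards against swapped bounds.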
- smadi.utils.load_gpis_by_country(country, res=6.25, format='csv')[source]
Load the GPIs for a country from the DGG API. Source: https://dgg.geo.tuwien.ac.at/
Parameters:
- country: str
name of the country
- grid: str
name of the grid to be used. Default is “fibgrid_n6600000”. Supported grids: fibgrid_n6600000 (Fibonacci 6.25 km), fibgrid_n1650000 (Fibonacci 12.5 km), fibgrid_n430000 (Fibonacci 25 km), warp (WARP).
- format: str
format of the data to be returned. Default is “csv”. Supported formats: csv, json.
smadi.workflow module
run_workflow.py - SMADI Workflow Execution
- smadi.workflow.parse_arguments(parser)[source]
Parse the arguments and return the parsed arguments as a dictionary.
Returns:
- parsed_args: dict
The parsed arguments as a dictionary
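Returning parsed arguments as a dictionary is commonly done by passing the argparse namespace to vars(). The sketch below illustrates that pattern; the flag names --time_step and --year and both helper names are hypothetical, since the source does not list the command-line options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser with two illustrative options."""
    parser = argparse.ArgumentParser(description="Demo parser (assumed flags)")
    parser.add_argument("--time_step", default="month")
    parser.add_argument("--year", type=int, nargs="*", default=None)
    return parser


def parse_to_dict(parser: argparse.ArgumentParser, argv=None) -> dict:
    """Parse `argv` (or sys.argv when None) and return a plain dict."""
    # vars() exposes the Namespace attributes as a dictionary.
    return vars(parser.parse_args(argv))


args = parse_to_dict(build_parser(), ["--time_step", "dekad", "--year", "2021", "2022"])
print(args)
```

A dictionary of this form can be unpacked directly into a function call with matching keyword arguments.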
- smadi.workflow.run_smadi(aoi: str | Tuple[float, float, float, float], methods: str | List[str] = ['zscore'], variable: str = 'sm', time_step: str = 'month', fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, timespan: List[str] = None, year: List[int] = None, month: List[int] = None, dekad: List[int] = None, week: List[int] = None, bimonth: List[int] = None, day: List[int] = None, workers: int = None, addi_retrive: bool = False) DataFrame[source]
Run the anomaly detection workflow for multiple grid point indices with multiprocessing support.
- smadi.workflow.setup_argument_parser() ArgumentParser[source]
Setup argument parser for SMADI workflow execution.
- smadi.workflow.single_po_run(gpi: int, methods: str = ['zscore'], variable: str = 'sm', time_step: str = 'month', fillna: bool = False, fillna_window_size: int = None, smoothing: bool = False, smooth_window_size: int = None, year: int | List[int] = None, month: int | List[int] = None, dekad: int | List[int] = None, week: int | List[int] = None, bimonth: int | List[int] = None, day: int | List[int] = None, timespan: List[str] = None, addi_retrive: bool = False, agg_metric: list = 'mean') Tuple[int, Dict[str, float]][source]
Run the anomaly detection workflow for a single grid point index.