3. Computing the climatology

This section shows how to compute climate normals (climatology) for a variable of interest using the Climatology class from the SMADI package. The class supports the following:

Compute normals at various time steps: monthly, bimonthly, dekadal, weekly, and daily.
Compute normals with different metrics, such as the mean, median, minimum, and maximum.
Fill gaps in the time series using a user-defined window size, which makes the analysis more robust to missing data.
Smooth the data by applying a moving average with a user-defined window size across the entire dataset, which suppresses short-term fluctuations so that the underlying signal is easier to identify.
Compute the climatology for a subset of the data by specifying start and end dates, which allows analysis of a specific period of interest.

3.1 Compute normals at various time steps

Load the data

[1]:

import pandas as pd
from smadi.data_reader import read_grid_point

# Set display options
pd.set_option("display.max_columns", 8)  # Limit the number of columns displayed
pd.set_option("display.precision", 2)  # Set precision to 2 decimal places

# Define the path to the ASCAT data
data_path = "/home/m294/ascat_dataset"

# Example: A grid point in Morocco
lon = -7.382
lat = 33.348

# Define the location of the observation point
loc = (lon, lat)

# Extract the ASCAT soil moisture time series for the given location.
# To mask snow and frozen soil conditions, pass the path to the ERA5-Land data
# as era5_land_path. For more information about the dataset, see the ERA5-Land
# documentation; to download it, use the CDS API or
# https://ecmwf-models.readthedocs.io/en/latest/
data = read_grid_point(
    loc=loc, ascat_sm_path=data_path, read_bulk=False, era5_land_path=None
)

# Get the ASCAT soil moisture time series
ascat_ts = data.get("ascat_ts")


# Display the first few rows of the time series data
ascat_ts.head()

Reading ASCAT soil moisture: /home/m294/ascat_dataset
ASCAT GPI: 3611180 - distance:   23.713 m
Warning: ERA5-Land not found: None
Warning: ERA5 Land not found - ASCAT soil moisture not masked!

[1]:

	sm	sm_noise	as_des_pass	...	sigma40	sigma40_noise	num_sigma	sm_valid
2007-01-01 21:02:04.161	34.86	3.24	0	...	-12.27	0.19	3	True
2007-01-02 11:03:22.807	23.16	3.27	1	...	-13.05	0.19	3	True
2007-01-03 10:42:47.739	33.05	3.23	1	...	-12.39	0.19	3	True
2007-01-03 22:00:39.007	25.60	3.24	0	...	-12.88	0.19	3	True
2007-01-05 10:01:27.519	28.73	3.24	1	...	-12.67	0.19	3	True

5 rows × 16 columns

Monthly Climatology

[2]:

from smadi.climatology import Climatology

# Create a climatology object.
# agg_metric is the aggregation metric applied before computing the climatology.
# It can be "mean", "sum", "max", "min", etc.
cl = Climatology(df=ascat_ts, variable="sm", agg_metric="mean")

# Set the time step for computing the climatology.
# Supported time steps are "month", "bimonth", "dekad", "week", and "day".
cl.time_step = "month"


monthly_cl_df = cl.compute_normals()
monthly_cl_df.head(12)

[2]:

	sm-mean	norm-mean
2007-01-31	33.69	56.54
2007-02-28	38.15	46.62
2007-03-31	25.63	36.69
2007-04-30	24.85	33.54
2007-05-31	24.21	28.73
2007-06-30	20.65	20.18
2007-07-31	18.21	17.30
2007-08-31	16.26	17.69
2007-09-30	19.23	21.68
2007-10-31	23.37	30.02
2007-11-30	48.47	50.27
2007-12-31	60.59	60.46

Note: You can filter the result to a specific date range by passing date parameters (year, month, day, etc.) to the compute_normals method.

The bimonth and dekad parameters can only be used when time_step is set to "bimonth" or "dekad", respectively, where:

Dekad: Values range from 1 to 3 for each month, corresponding to the first, second, and third dekads of the month.
Bimonth: Values are 1 or 2 for each month, corresponding to the first and second half of the month.
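The dekad and bimonth numbering described above can be illustrated with a small helper. This is a minimal sketch in plain Python, not SMADI's implementation: the function names are hypothetical, and the split of the month after day 15 for bimonths is an assumption based on the output shown in this section.

```python
from datetime import date


def dekad_of_month(d: date) -> int:
    """Return 1, 2, or 3 for days 1-10, 11-20, and 21 to the end of the month."""
    return min((d.day - 1) // 10 + 1, 3)


def bimonth_of_month(d: date) -> int:
    """Return 1 for days 1-15 and 2 for the rest (assumed split point)."""
    return 1 if d.day <= 15 else 2


for d in (date(2007, 5, 10), date(2007, 5, 16), date(2007, 7, 31)):
    print(d.isoformat(), "dekad:", dekad_of_month(d), "bimonth:", bimonth_of_month(d))
```

Day 31 is clamped into the third dekad, so every month has exactly three dekads regardless of its length.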

[3]:

monthly_cl_df = cl.compute_normals(month=2)  # February
monthly_cl_df.head(16)

[3]:

	sm-mean	norm-mean
2007-02-28	38.15	46.62
2008-02-29	44.75	46.62
2009-02-28	64.58	46.62
2010-02-28	78.51	46.62
2011-02-28	53.21	46.62
2012-02-29	25.46	46.62
2013-02-28	43.76	46.62
2014-02-28	46.80	46.62
2015-02-28	46.03	46.62
2016-02-29	42.02	46.62
2017-02-28	47.73	46.62
2018-02-28	52.88	46.62
2019-02-28	38.60	46.62
2020-02-29	29.17	46.62
2021-02-28	60.41	46.62
2022-02-28	33.83	46.62

Bimonthly Climatology

[4]:

cl.time_step = "bimonth"
bimonthly_cl_df = cl.compute_normals(month=5, bimonth=2)  # The second half of May
bimonthly_cl_df.head(24)

[4]:

	sm-mean	bimonth	norm-mean
2007-05-16	24.51	2	27.19
2008-05-17	15.44	2	27.19
2009-05-17	38.96	2	27.19
2010-05-17	40.61	2	27.19
2011-05-16	53.84	2	27.19
2012-05-16	15.11	2	27.19
2013-05-16	35.83	2	27.19
2014-05-16	15.53	2	27.19
2015-05-16	33.87	2	27.19
2016-05-16	18.85	2	27.19
2017-05-16	9.78	2	27.19
2018-05-16	49.32	2	27.19
2019-05-16	17.28	2	27.19
2020-05-16	25.08	2	27.19
2021-05-16	18.88	2	27.19
2022-05-16	22.09	2	27.19

In the monthly output above:

monthly_cl_df: the resulting data frame, containing the SM monthly averages and monthly normals

sm-mean: the average for each month, computed from the observations in that month

norm-mean: the normal for each calendar month, computed from the sm-mean values over the 16 years of observations (2007-2022)
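The relationship between the sm-mean and norm-mean columns can be sketched as a two-step aggregation. The example below uses only the Python standard library and invented toy values; it illustrates the idea and is not how SMADI computes the result internally.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Toy observations (date, soil moisture); the values are invented for illustration.
observations = [
    (date(2007, 1, 3), 30.0),
    (date(2007, 1, 20), 40.0),
    (date(2008, 1, 5), 50.0),
    (date(2008, 1, 25), 60.0),
    (date(2007, 2, 10), 20.0),
    (date(2008, 2, 12), 40.0),
]

# Step 1: aggregate the observations within each (year, month) period.
by_period = defaultdict(list)
for day, sm in observations:
    by_period[(day.year, day.month)].append(sm)
period_mean = {period: mean(values) for period, values in by_period.items()}

# Step 2: average the period means across years for each calendar month.
by_month = defaultdict(list)
for (year, month), value in period_mean.items():
    by_month[month].append(value)
normal_mean = {month: mean(values) for month, values in by_month.items()}

for (year, month), value in sorted(period_mean.items()):
    print(f"{year}-{month:02d}  sm-mean={value:.2f}  norm-mean={normal_mean[month]:.2f}")
```

Each row keeps its own period average, while the normal repeats for every year of the same calendar month, which matches the shape of the tables in this section.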

3.2 Computing the normals using different metrics (median, max, etc.)

To compute the normals using different metrics such as mean, median, minimum, and maximum, assign a list of the desired metrics to the normal_metrics attribute.

For example, to compute the normals using mean and median metrics, you can define the list of metrics as follows:

[5]:

# Supported metrics are "mean", "median", "std", "min", "max"
cl.normal_metrics = ["mean", "median"]

# Compute weekly-based climatology
cl.time_step = "week"
weekly_cl_df = cl.compute_normals(week=12)  # The 12th week of the year
weekly_cl_df.head(10)

[5]:

	sm-mean	norm-mean	norm-median
2007-03-19	17.24	34.38	33.82
2008-03-17	35.11	34.38	33.82
2009-03-17	45.71	34.38	33.82
2010-03-22	41.76	34.38	33.82
2011-03-22	32.26	34.38	33.82
2012-03-19	10.64	34.38	33.82
2013-03-18	54.76	34.38	33.82
2014-03-17	21.66	34.38	33.82
2015-03-16	33.09	34.38	33.82
2016-03-22	50.84	34.38	33.82

[6]:

# Compute normals with multiple metrics

# Set the metric for computing the climatology
cl.normal_metrics = ["mean", "median", "min", "max"]

cl.time_step = "dekad"
cl_df = cl.compute_normals(month=7, dekad=3)  # The third dekad of July

cl_df.head(12)

[6]:

	sm-mean	dekad	norm-mean	norm-median	norm-min	norm-max
2007-07-22	16.20	3	17.24	17.43	11.74	23.73
2008-07-21	14.87	3	17.24	17.43	11.74	23.73
2009-07-21	18.01	3	17.24	17.43	11.74	23.73
2010-07-21	23.51	3	17.24	17.43	11.74	23.73
2011-07-21	17.28	3	17.24	17.43	11.74	23.73
2012-07-22	14.57	3	17.24	17.43	11.74	23.73
2013-07-22	18.76	3	17.24	17.43	11.74	23.73
2014-07-22	13.21	3	17.24	17.43	11.74	23.73
2015-07-22	17.77	3	17.24	17.43	11.74	23.73
2016-07-21	11.90	3	17.24	17.43	11.74	23.73
2017-07-21	11.97	3	17.24	17.43	11.74	23.73
2018-07-21	22.91	3	17.24	17.43	11.74	23.73
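As a rough illustration of how a list of metric names can map to several normal columns, the sketch below applies each requested metric to the per-year values of one period. The METRICS mapping and the normals_for_period function are hypothetical names, the values are invented, and SMADI's internals may differ.

```python
from statistics import mean, median

# Hypothetical mapping from metric name to aggregation function.
METRICS = {"mean": mean, "median": median, "min": min, "max": max}


def normals_for_period(yearly_values, metrics):
    """Apply each requested metric to the per-year values of one period."""
    return {f"norm-{name}": METRICS[name](yearly_values) for name in metrics}


# Toy per-year averages for one period (invented values).
yearly = [16.0, 14.0, 18.0, 24.0]
normals = normals_for_period(yearly, ["mean", "median", "min", "max"])
print(normals)
```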

3.3 Filling the gaps and smoothing the time series data (optional)

[7]:

# Fill the gaps and smooth the time series

cl.fillna = True
cl.fillna_window_size = 3  # window size in days; gaps are filled with the window mean

cl.smoothing = True
cl.smooth_window_size = 31  # moving-average window size

cl.time_step = "dekad"
cl.normal_metrics = ["mean", "median"]
cl_df = cl.compute_normals()

cl_df

[7]:

	sm-mean	dekad	norm-mean	norm-median
2007-01-01	28.47	1	56.21	51.27
2007-01-11	33.48	2	56.80	56.69
2007-01-21	36.28	3	55.41	55.75
2007-02-01	39.48	1	51.89	52.96
2007-02-11	37.32	2	46.31	43.99
...	...	...	...	...
2022-11-11	36.23	2	50.81	47.20
2022-11-21	52.73	3	56.10	55.84
2022-12-01	68.02	1	61.36	61.70
2022-12-11	73.85	2	60.79	59.25
2022-12-21	68.94	3	59.07	62.26

576 rows × 4 columns
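To make the two preprocessing options more concrete, the following sketch fills missing values with the mean of a centered window and then applies a centered moving average. It is a plain-Python illustration on invented values. The helper names are hypothetical, and the choice of centered windows is an assumption, so SMADI's actual behavior may differ.

```python
from statistics import mean


def fill_gaps(values, window):
    """Replace each None with the mean of the valid values in a centered window."""
    half = window // 2
    filled = list(values)
    for i, value in enumerate(values):
        if value is None:
            neighbours = [
                v for v in values[max(0, i - half): i + half + 1] if v is not None
            ]
            if neighbours:  # leave the gap if the window holds no valid value
                filled[i] = mean(neighbours)
    return filled


def moving_average(values, window):
    """Centered moving average that skips missing values."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        valid = [v for v in values[max(0, i - half): i + half + 1] if v is not None]
        smoothed.append(mean(valid) if valid else None)
    return smoothed


# Toy daily series with gaps (invented values).
daily = [20.0, None, 24.0, 26.0, None, None, None, 30.0]
filled = fill_gaps(daily, window=3)
smoothed = moving_average(filled, window=3)
print(filled)
print(smoothed)
```

With a window of 3, a single missing day is filled from its neighbours, whereas the middle of a three-day gap stays empty because its window contains no valid observation. A larger window fills longer gaps at the cost of borrowing values from further away.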

3.4 Computing the normals for a subset of the data

To work on a subset of the data instead of the entire historical record, set the timespan attribute of the Climatology class to a pair of start and end dates. The computation is then restricted to that period.

[8]:

# Set the start and end dates for the climatology via the 'timespan' attribute
cl.timespan = ("2010-01-01", "2020-12-31")  # ('start_date', 'end_date')
cl.normal_metrics = ["mean", "median", "min", "max"]
cl.time_step = "month"

cl_df = cl.compute_normals(month=1)
cl_df

[8]:

	sm-mean	norm-mean	norm-median	norm-min	norm-max
2010-01-31	76.10	55.66	55.56	34.4	76.1
2011-01-31	61.52	55.66	55.56	34.4	76.1
2012-01-31	41.32	55.66	55.56	34.4	76.1
2013-01-31	55.56	55.66	55.56	34.4	76.1
2014-01-31	55.80	55.66	55.56	34.4	76.1
2015-01-31	64.91	55.66	55.56	34.4	76.1
2016-01-31	34.40	55.66	55.56	34.4	76.1
2017-01-31	50.80	55.66	55.56	34.4	76.1
2018-01-31	69.84	55.66	55.56	34.4	76.1
2019-01-31	51.33	55.66	55.56	34.4	76.1
2020-01-31	50.70	55.66	55.56	34.4	76.1
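Conceptually, restricting the climatology to a timespan amounts to discarding observations outside the period before any aggregation takes place. The sketch below shows that filtering step on invented records. The within_timespan helper is hypothetical, and treating both bounds as inclusive is an assumption.

```python
from datetime import date


def within_timespan(records, timespan):
    """Keep records whose date lies between the start and end dates (inclusive)."""
    start, end = (date.fromisoformat(s) for s in timespan)
    return [(day, value) for day, value in records if start <= day <= end]


# Toy records (date, soil moisture); the values are invented for illustration.
records = [
    (date(2009, 12, 31), 70.0),
    (date(2010, 1, 1), 76.1),
    (date(2020, 12, 31), 50.7),
    (date(2021, 1, 1), 60.0),
]
subset = within_timespan(records, ("2010-01-01", "2020-12-31"))
print(subset)
```

Because the normals are computed only from the retained records, the same calendar month can have a different normal once a timespan is set.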