3. Computing the climatology
In this section, we’ll explore how to compute climate normals (climatology) for the variable of interest using the Climatology class from the SMADI package. The class offers a range of functionalities:
Compute normals at various time steps, including monthly, bimonthly, dekadal, weekly, and daily intervals.
Provide flexibility in computing different metrics such as mean, median, minimum, and maximum values.
Fill gaps in the time-series with a user-defined window size, allowing for more robust analysis in the presence of missing data.
Smooth the data by applying a rolling moving-average window of user-defined size across the entire dataset, helping to suppress short-term noise and reveal the underlying signal.
Compute the climatology for a subset of the data by specifying start and end dates, enabling analysis on specific time periods of interest.
3.1 Compute normals at various time steps
Load the data
[1]:
import pandas as pd
from smadi.data_reader import read_grid_point
# Set display options
pd.set_option("display.max_columns", 8) # Limit the number of columns displayed
pd.set_option("display.precision", 2) # Set precision to 2 decimal places
# Define the path to the ASCAT data
data_path = "/home/m294/ascat_dataset"
# Example: A grid point in Morocco
lon = -7.382
lat = 33.348
# Define the location of the observation point
loc = (lon, lat)
# Extract ASCAT soil moisture time series for the given location
# To mask snow and frozen-soil conditions, pass the path to the ERA5-Land data
# as era5_land_path. For more information about the dataset, see the ERA5-Land
# documentation; to download it, use the CDS API or
# https://ecmwf-models.readthedocs.io/en/latest/
data = read_grid_point(
    loc=loc, ascat_sm_path=data_path, read_bulk=False, era5_land_path=None
)
# Get the ASCAT soil moisture time series
ascat_ts = data.get("ascat_ts")
# Display the first few rows of the time series data
ascat_ts.head()
Reading ASCAT soil moisture: /home/m294/ascat_dataset
ASCAT GPI: 3611180 - distance: 23.713 m
Warning: ERA5-Land not found: None
Warning: ERA5 Land not found - ASCAT soil moisture not masked!
[1]:
| | sm | sm_noise | as_des_pass | ssf | ... | sigma40 | sigma40_noise | num_sigma | sm_valid |
|---|---|---|---|---|---|---|---|---|---|
| 2007-01-01 21:02:04.161 | 34.86 | 3.24 | 0 | 0 | ... | -12.27 | 0.19 | 3 | True |
| 2007-01-02 11:03:22.807 | 23.16 | 3.27 | 1 | 0 | ... | -13.05 | 0.19 | 3 | True |
| 2007-01-03 10:42:47.739 | 33.05 | 3.23 | 1 | 0 | ... | -12.39 | 0.19 | 3 | True |
| 2007-01-03 22:00:39.007 | 25.60 | 3.24 | 0 | 0 | ... | -12.88 | 0.19 | 3 | True |
| 2007-01-05 10:01:27.519 | 28.73 | 3.24 | 1 | 0 | ... | -12.67 | 0.19 | 3 | True |
5 rows × 16 columns
Monthly Climatology
[2]:
from smadi.climatology import Climatology
# Create a climatology object.
# agg_metric is the metric used to aggregate the observations within each
# time step before the climatology is computed.
# It can be "mean", "sum", "max", "min", etc.
cl = Climatology(df=ascat_ts, variable="sm", agg_metric="mean")
# Set the time step for computing the climatology.
# Supported time steps are "month", "bimonth", "dekad", "week", "day"
cl.time_step = "month"
monthly_cl_df = cl.compute_normals()
monthly_cl_df.head(12)
[2]:
| | sm-mean | norm-mean |
|---|---|---|
| 2007-01-31 | 33.69 | 56.54 |
| 2007-02-28 | 38.15 | 46.62 |
| 2007-03-31 | 25.63 | 36.69 |
| 2007-04-30 | 24.85 | 33.54 |
| 2007-05-31 | 24.21 | 28.73 |
| 2007-06-30 | 20.65 | 20.18 |
| 2007-07-31 | 18.21 | 17.30 |
| 2007-08-31 | 16.26 | 17.69 |
| 2007-09-30 | 19.23 | 21.68 |
| 2007-10-31 | 23.37 | 30.02 |
| 2007-11-30 | 48.47 | 50.27 |
| 2007-12-31 | 60.59 | 60.46 |
**Note:** You can restrict the result to a specific date range by passing date parameters (year, month, day, etc.) to the ``compute_normals`` method.
The bimonth and dekad parameters can only be used when the time_step is set to ‘bimonth’ or ‘dekad’, respectively, where:
Dekad: Values range from 1 to 3 for each month, corresponding to the first, second, and third dekads of the month.
Bimonth: Values are 1 or 2 for each month, corresponding to the first and second half of the month.
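The dekad and bimonth numbering described in the note can be sketched in plain Python. This is only an illustration and not SMADI's implementation: the helper names `dekad_of_month` and `bimonth_of_month` are hypothetical, and the day boundaries (days 1-10, 11-20, and 21 to month end for dekads; days 1-15 and 16 to month end for bimonths) are assumed from the usual convention.

```python
from datetime import date


def dekad_of_month(d: date) -> int:
    """Return 1, 2 or 3 (assumed boundaries: days 1-10, 11-20, 21-end)."""
    # Day 31 would otherwise land in a fourth group, so cap the result at 3.
    return min((d.day - 1) // 10 + 1, 3)


def bimonth_of_month(d: date) -> int:
    """Return 1 or 2 (assumed boundaries: days 1-15, 16-end)."""
    return 1 if d.day <= 15 else 2


for d in (date(2007, 5, 10), date(2007, 5, 16), date(2007, 7, 31)):
    print(d, "dekad:", dekad_of_month(d), "bimonth:", bimonth_of_month(d))
```

Under these assumptions, `month=5, bimonth=2` selects 16 to 31 May and `month=7, dekad=3` selects 21 to 31 July.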
[3]:
monthly_cl_df = cl.compute_normals(month=2) # February
monthly_cl_df.head(16)
[3]:
| | sm-mean | norm-mean |
|---|---|---|
| 2007-02-28 | 38.15 | 46.62 |
| 2008-02-29 | 44.75 | 46.62 |
| 2009-02-28 | 64.58 | 46.62 |
| 2010-02-28 | 78.51 | 46.62 |
| 2011-02-28 | 53.21 | 46.62 |
| 2012-02-29 | 25.46 | 46.62 |
| 2013-02-28 | 43.76 | 46.62 |
| 2014-02-28 | 46.80 | 46.62 |
| 2015-02-28 | 46.03 | 46.62 |
| 2016-02-29 | 42.02 | 46.62 |
| 2017-02-28 | 47.73 | 46.62 |
| 2018-02-28 | 52.88 | 46.62 |
| 2019-02-28 | 38.60 | 46.62 |
| 2020-02-29 | 29.17 | 46.62 |
| 2021-02-28 | 60.41 | 46.62 |
| 2022-02-28 | 33.83 | 46.62 |
Bimonthly Climatology
[4]:
cl.time_step = "bimonth"
bimonthly_cl_df = cl.compute_normals(month=5, bimonth=2) # The second half of May
bimonthly_cl_df.head(24)
[4]:
| | sm-mean | bimonth | norm-mean |
|---|---|---|---|
| 2007-05-16 | 24.51 | 2 | 27.19 |
| 2008-05-17 | 15.44 | 2 | 27.19 |
| 2009-05-17 | 38.96 | 2 | 27.19 |
| 2010-05-17 | 40.61 | 2 | 27.19 |
| 2011-05-16 | 53.84 | 2 | 27.19 |
| 2012-05-16 | 15.11 | 2 | 27.19 |
| 2013-05-16 | 35.83 | 2 | 27.19 |
| 2014-05-16 | 15.53 | 2 | 27.19 |
| 2015-05-16 | 33.87 | 2 | 27.19 |
| 2016-05-16 | 18.85 | 2 | 27.19 |
| 2017-05-16 | 9.78 | 2 | 27.19 |
| 2018-05-16 | 49.32 | 2 | 27.19 |
| 2019-05-16 | 17.28 | 2 | 27.19 |
| 2020-05-16 | 25.08 | 2 | 27.19 |
| 2021-05-16 | 18.88 | 2 | 27.19 |
| 2022-05-16 | 22.09 | 2 | 27.19 |
In the above outputs:
monthly_cl_df, bimonthly_cl_df: the resulting data frames containing the soil moisture (SM) averages and normals for each time step
sm-mean: the average for each time step (for example, each month), computed from the observations that fall within it
norm-mean: the normal for each time step (for example, each calendar month), computed from the sm-mean values over the 16 years of observations (2007-2022)
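The relationship between the sm-mean and norm-mean columns can be sketched as a two-stage aggregation in plain Python. This is a simplified illustration, not SMADI's code: the function names `period_means` and `monthly_normals` and the small `obs` list are made up for the example. The last lines cross-check the idea against the February sm-mean values printed in the output of cell [3].

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def period_means(observations):
    """Stage 1: average all observations that fall in the same (year, month)."""
    groups = defaultdict(list)
    for day, value in observations:
        groups[(day.year, day.month)].append(value)
    return {key: mean(vals) for key, vals in groups.items()}


def monthly_normals(means_by_period):
    """Stage 2: average the per-year means of each calendar month."""
    groups = defaultdict(list)
    for (_, month), value in means_by_period.items():
        groups[month].append(value)
    return {month: mean(vals) for month, vals in groups.items()}


# Made-up observations, for illustration only
obs = [
    (date(2007, 2, 3), 30.0), (date(2007, 2, 20), 40.0),
    (date(2008, 2, 5), 50.0), (date(2008, 2, 25), 60.0),
]
normals = monthly_normals(period_means(obs))
print(normals)  # {2: 45.0}

# February sm-mean values for 2007-2022, copied from the output of cell [3]
feb_sm_mean = [
    38.15, 44.75, 64.58, 78.51, 53.21, 25.46, 43.76, 46.80,
    46.03, 42.02, 47.73, 52.88, 38.60, 29.17, 60.41, 33.83,
]
feb_norm = round(mean(feb_sm_mean), 2)
print(feb_norm)  # 46.62, the norm-mean reported for February
```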
3.2 Computing the normals using different metrics (median, max, etc.)
To compute the normals with several metrics, such as the mean, median, minimum, and maximum, assign a list of the metric names to the normal_metrics attribute.
For example, to compute the normals using the mean and median metrics, define the list as follows:
[5]:
# Supported metrics are "mean", "median", "std", "min", "max"
cl.normal_metrics = ["mean", "median"]
# Compute weekly-based climatology
cl.time_step = "week"
weekly_cl_df = cl.compute_normals(week=12) # The 12th week of the year
weekly_cl_df.head(10)
[5]:
| | sm-mean | norm-mean | norm-median |
|---|---|---|---|
| 2007-03-19 | 17.24 | 34.38 | 33.82 |
| 2008-03-17 | 35.11 | 34.38 | 33.82 |
| 2009-03-17 | 45.71 | 34.38 | 33.82 |
| 2010-03-22 | 41.76 | 34.38 | 33.82 |
| 2011-03-22 | 32.26 | 34.38 | 33.82 |
| 2012-03-19 | 10.64 | 34.38 | 33.82 |
| 2013-03-18 | 54.76 | 34.38 | 33.82 |
| 2014-03-17 | 21.66 | 34.38 | 33.82 |
| 2015-03-16 | 33.09 | 34.38 | 33.82 |
| 2016-03-22 | 50.84 | 34.38 | 33.82 |
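The index dates in the week-12 output are consistent with ISO 8601 week numbering, which the standard library exposes through `date.isocalendar()`. The short check below uses only the dates printed in the table; whether SMADI derives its week numbers in exactly this way is an assumption.

```python
from datetime import date

# Index dates copied from the week=12 output
dates = [
    date(2007, 3, 19), date(2008, 3, 17), date(2009, 3, 17),
    date(2010, 3, 22), date(2011, 3, 22), date(2012, 3, 19),
    date(2013, 3, 18), date(2014, 3, 17), date(2015, 3, 16),
    date(2016, 3, 22),
]

# isocalendar() returns (ISO year, ISO week number, ISO weekday)
weeks = [d.isocalendar()[1] for d in dates]
print(weeks)  # every entry is 12
```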
[6]:
# Compute normals with multiple metrics
# Set the metric for computing the climatology
cl.normal_metrics = ["mean", "median", "min", "max"]
cl.time_step = "dekad"
cl_df = cl.compute_normals(month=7, dekad=3) # The third dekad of July
cl_df.head(12)
[6]:
| | sm-mean | dekad | norm-mean | norm-median | norm-min | norm-max |
|---|---|---|---|---|---|---|
| 2007-07-22 | 16.20 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2008-07-21 | 14.87 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2009-07-21 | 18.01 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2010-07-21 | 23.51 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2011-07-21 | 17.28 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2012-07-22 | 14.57 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2013-07-22 | 18.76 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2014-07-22 | 13.21 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2015-07-22 | 17.77 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2016-07-21 | 11.90 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2017-07-21 | 11.97 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2018-07-21 | 22.91 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
3.3 Filling the gaps and smoothing the time series data (optional)
[7]:
# Apply filling the gaps and smoothing the time series
cl.fillna = True
cl.fillna_window_size = 3 # Window size (in days) whose mean value is used to fill the gaps
cl.smoothing = True
cl.smooth_window_size = 31 # The moving average window size
cl.time_step = "dekad"
cl.normal_metrics = ["mean", "median"]
cl_df = cl.compute_normals()
cl_df
[7]:
| | sm-mean | dekad | norm-mean | norm-median |
|---|---|---|---|---|
| 2007-01-01 | 28.47 | 1 | 56.21 | 51.27 |
| 2007-01-11 | 33.48 | 2 | 56.80 | 56.69 |
| 2007-01-21 | 36.28 | 3 | 55.41 | 55.75 |
| 2007-02-01 | 39.48 | 1 | 51.89 | 52.96 |
| 2007-02-11 | 37.32 | 2 | 46.31 | 43.99 |
| ... | ... | ... | ... | ... |
| 2022-11-11 | 36.23 | 2 | 50.81 | 47.20 |
| 2022-11-21 | 52.73 | 3 | 56.10 | 55.84 |
| 2022-12-01 | 68.02 | 1 | 61.36 | 61.70 |
| 2022-12-11 | 73.85 | 2 | 60.79 | 59.25 |
| 2022-12-21 | 68.94 | 3 | 59.07 | 62.26 |
576 rows × 4 columns
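To make the two preprocessing options concrete, here is a minimal pure-Python sketch of window-based gap filling followed by a moving average. It is not SMADI's implementation: the function names, the centered window, and the choice to fill each gap from the original (unfilled) neighbours are all assumptions made for illustration.

```python
from statistics import mean


def centered_window(values, i, size):
    """Return the slice of `values` centered on index i (shorter at the edges)."""
    half = size // 2
    return values[max(0, i - half): i + half + 1]


def fill_gaps(values, window_size):
    """Replace each None with the mean of the valid values in its window."""
    filled = []
    for i, v in enumerate(values):
        if v is None:
            valid = [x for x in centered_window(values, i, window_size) if x is not None]
            v = mean(valid) if valid else None  # stays None if the window is empty
        filled.append(v)
    return filled


def moving_average(values, window_size):
    """Centered moving average that skips any remaining None values."""
    smoothed = []
    for i in range(len(values)):
        valid = [x for x in centered_window(values, i, window_size) if x is not None]
        smoothed.append(mean(valid) if valid else None)
    return smoothed


# Made-up daily series with gaps, for illustration only
daily = [30.0, None, 34.0, 36.0, None, None, 20.0]
filled = fill_gaps(daily, window_size=3)
smoothed = moving_average(filled, window_size=3)
print(filled)  # [30.0, 32.0, 34.0, 36.0, 36.0, 20.0, 20.0]
print([round(x, 2) for x in smoothed])
```

A wider smoothing window, such as the 31 days used above, averages over more neighbours and therefore damps short-lived fluctuations more strongly.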
3.4 Computing the normals for a subset of the data
To compute the normals over a subset of the data rather than the entire historical record, set the timespan attribute of the Climatology object to a pair of start and end dates. The computation is then restricted to that period.
[8]:
# Set the start and end dates for the climatology via the 'timespan' attribute
cl.timespan = ("2010-01-01", "2020-12-31") # ('start_date', 'end_date')
cl.normal_metrics = ["mean", "median", "min", "max"]
cl.time_step = "month"
cl_df = cl.compute_normals(month=1)
cl_df
[8]:
| | sm-mean | norm-mean | norm-median | norm-min | norm-max |
|---|---|---|---|---|---|
| 2010-01-31 | 76.10 | 55.66 | 55.56 | 34.4 | 76.1 |
| 2011-01-31 | 61.52 | 55.66 | 55.56 | 34.4 | 76.1 |
| 2012-01-31 | 41.32 | 55.66 | 55.56 | 34.4 | 76.1 |
| 2013-01-31 | 55.56 | 55.66 | 55.56 | 34.4 | 76.1 |
| 2014-01-31 | 55.80 | 55.66 | 55.56 | 34.4 | 76.1 |
| 2015-01-31 | 64.91 | 55.66 | 55.56 | 34.4 | 76.1 |
| 2016-01-31 | 34.40 | 55.66 | 55.56 | 34.4 | 76.1 |
| 2017-01-31 | 50.80 | 55.66 | 55.56 | 34.4 | 76.1 |
| 2018-01-31 | 69.84 | 55.66 | 55.56 | 34.4 | 76.1 |
| 2019-01-31 | 51.33 | 55.66 | 55.56 | 34.4 | 76.1 |
| 2020-01-31 | 50.70 | 55.66 | 55.56 | 34.4 | 76.1 |
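As a cross-check, the four normals in this output can be reproduced from the sm-mean column with the standard library alone. The sketch below copies the January values from the table and applies a date filter first; the helper `in_timespan` is hypothetical, and treating both ends of the timespan as inclusive is an assumption.

```python
from datetime import date
from statistics import mean, median

# (date, sm-mean) pairs copied from the January output
january = [
    ("2010-01-31", 76.10), ("2011-01-31", 61.52), ("2012-01-31", 41.32),
    ("2013-01-31", 55.56), ("2014-01-31", 55.80), ("2015-01-31", 64.91),
    ("2016-01-31", 34.40), ("2017-01-31", 50.80), ("2018-01-31", 69.84),
    ("2019-01-31", 51.33), ("2020-01-31", 50.70),
]


def in_timespan(day, timespan):
    """True if the ISO date string lies within the (start, end) pair, ends included."""
    start, end = (date.fromisoformat(s) for s in timespan)
    return start <= date.fromisoformat(day) <= end


timespan = ("2010-01-01", "2020-12-31")
values = [v for day, v in january if in_timespan(day, timespan)]

normals = {
    "norm-mean": round(mean(values), 2),
    "norm-median": median(values),
    "norm-min": min(values),
    "norm-max": max(values),
}
print(normals)
# {'norm-mean': 55.66, 'norm-median': 55.56, 'norm-min': 34.4, 'norm-max': 76.1}
```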