How to Resample Time Series Data in Python
Published on Aug. 22, 2023, 12:16 p.m.
Here’s an example of how to resample time series data in Python using the pandas library:
1. Import required libraries and load data
import pandas as pd
data = pd.read_csv('data.csv', index_col='date', parse_dates=True)
2. Resample the data
data_resampled = data.resample('D').sum()
In this example, `data` is a pandas DataFrame with a datetime index. `index_col='date'` specifies that the `date` column of the CSV file should be used as the index of the DataFrame, and `parse_dates=True` tells pandas to parse the dates in the index. `data_resampled` is a new DataFrame with the data resampled at daily frequency using the `sum()` function to aggregate the data by day.
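To try this step without a CSV file, here is a minimal sketch that builds a small synthetic DataFrame in place of `data.csv` and downsamples it to daily totals. The column name `value` and the date range are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Hypothetical stand-in for data.csv: 48 hourly readings over two days.
# The lowercase "h" alias is used here because it is the current spelling.
index = pd.date_range("2023-01-01", periods=48, freq="h")
data = pd.DataFrame({"value": list(range(48))}, index=index)
data.index.name = "date"

# Downsample to daily frequency, summing the 24 hourly values in each day.
data_resampled = data.resample("D").sum()
print(data_resampled)
```

Each output row is labeled with the start of its day and holds the sum of that day's hourly values (276 for the first day and 852 for the second in this synthetic data).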
Here's another example that demonstrates how to resample and interpolate time series data at a higher frequency:
data_resampled = data.resample('H').interpolate(method='linear')
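This resamples the data at hourly frequency. Upsampling creates new rows for the hours that have no observation, and `interpolate(method='linear')` fills those missing values by linear interpolation between the surrounding known values.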
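As a rough illustration of the upsampling case, the sketch below starts from two daily observations and fills in the hours between them. The values and the column name `value` are made up for the example.

```python
import pandas as pd

# Hypothetical daily data: two observations one day apart.
daily = pd.DataFrame(
    {"value": [0.0, 24.0]},
    index=pd.to_datetime(["2023-01-01", "2023-01-02"]),
)

# Upsample to hourly frequency; the new hourly rows start out empty
# and are filled by linear interpolation between the two known points.
hourly = daily.resample("h").interpolate(method="linear")
print(hourly.head())
```

Because the two known values are 24 hours and 24 units apart, the interpolated series rises by one unit per hour, giving 25 rows in total.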
Various resampling frequency codes are available:
- `'D'`: daily frequency
- `'W'`: weekly frequency
- `'M'`: monthly frequency
- `'A'`: annual frequency
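You can also use other frequency codes, such as `H` for hourly or `T` for minute-level frequency, depending on the granularity of your data. In pandas 2.2 and later, several of these aliases are deprecated in favor of newer spellings: `h` instead of `H`, `min` instead of `T`, `ME` instead of `M`, and `YE` instead of `A`.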
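To see how the choice of frequency code changes the result, the following sketch resamples the same synthetic hourly series with two different codes. The series itself is an assumption made for illustration: 14 days of hourly readings, each equal to 1, starting on a Monday.

```python
import pandas as pd

# Hypothetical series: one reading per hour for 14 days, starting Monday 2023-01-02.
index = pd.date_range("2023-01-02", periods=14 * 24, freq="h")
series = pd.Series(1, index=index)

# Same data, two different frequency codes.
daily_totals = series.resample("D").sum()   # one bin per calendar day
weekly_totals = series.resample("W").sum()  # one bin per week, ending on Sunday

print(len(daily_totals), "daily bins")
print(len(weekly_totals), "weekly bins")
```

The total across all bins is the same in both cases; only the number and width of the bins change with the frequency code.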
Note that resampling can either upsample data (convert lower-frequency data to a higher frequency) or downsample it (convert higher-frequency data to a lower frequency). Downsampling needs an aggregation method such as `sum()`, `mean()`, or `std()`, while upsampling needs a fill or interpolation method such as `interpolate()`; choose the method according to the needs of the analysis.
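If more than one summary statistic is needed for each period, one option is to pass a list of function names to `agg()` on the resampled object. The sketch below assumes a synthetic hourly series; the data is illustrative only.

```python
import pandas as pd

# Hypothetical series: 48 hourly readings with values 0 through 47.
index = pd.date_range("2023-01-01", periods=48, freq="h")
series = pd.Series(list(range(48)), index=index)

# Downsample to daily frequency and compute several aggregates at once.
# The result is a DataFrame with one column per aggregation.
stats = series.resample("D").agg(["sum", "mean", "max"])
print(stats)
```

Here the daily means are 11.5 and 35.5, and the daily maxima are 23 and 47, since each day covers 24 consecutive integers.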