How to extract the year from a date column in a Pandas DataFrame?
Published on Aug. 22, 2023, 12:18 p.m.
To extract the year from a date column in a Pandas DataFrame, convert the column with pd.to_datetime() and then read the dt.year attribute of the result. Here’s an example:
import pandas as pd

# Create a DataFrame with a 'date' column of date strings
df = pd.DataFrame({'date': ['2022-02-15', '2023-01-01', '2023-12-31']})

# Parse the strings as dates, extract the year, and store it in a new 'year' column
df['year'] = pd.to_datetime(df['date']).dt.year
print(df)
Output:
         date  year
0  2022-02-15  2022
1  2023-01-01  2023
2  2023-12-31  2023
In the example above, pd.to_datetime(df['date']) converts the strings in the ‘date’ column to a Series of datetime values. Then, dt.year extracts the year from each value and returns a new Series of integer ‘year’ values. We then assign this Series to a new ‘year’ column in the original DataFrame. The ‘date’ column itself is left unchanged and still holds strings.
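The two steps can also be run separately to see what each one returns. This is a minimal sketch; the intermediate names dates and years are chosen here for illustration and are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2022-02-15', '2023-01-01', '2023-12-31']})

# Step 1: parse the strings into a datetime Series
dates = pd.to_datetime(df['date'])
print(pd.api.types.is_datetime64_any_dtype(dates))  # True

# Step 2: the .dt accessor exposes date components for every row at once
years = dates.dt.year
print(years.tolist())  # [2022, 2023, 2023]
```

Note that the dt accessor is only available on a Series that already holds datetime values, which is why the conversion comes first.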
If the date column is not in a standard format, you may need to use the format parameter of pd.to_datetime() to specify how the input dates are written. For example, if the dates are in the format ‘MM/DD/YYYY’, you can use the following code:
df['year'] = pd.to_datetime(df['date'], format='%m/%d/%Y').dt.year
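As a fuller sketch of the same idea, the snippet below builds its own sample data in MM/DD/YYYY form. The sample values, including the deliberately invalid entry, are made up for illustration. It also assumes you would rather turn unparseable entries into missing values than raise an error, which is what the errors='coerce' option does.

```python
import pandas as pd

# Hypothetical sample data: two valid MM/DD/YYYY dates and one bad entry
df = pd.DataFrame({'date': ['02/15/2022', '01/01/2023', 'not a date']})

# errors='coerce' replaces values that do not match the format with NaT
parsed = pd.to_datetime(df['date'], format='%m/%d/%Y', errors='coerce')

# With missing values present, dt.year returns floats; the nullable
# 'Int64' dtype keeps the years as integers and marks gaps as <NA>
df['year'] = parsed.dt.year.astype('Int64')
print(df)
```

Without errors='coerce', the default behavior is to raise an error on the first value that does not match the format, which may be preferable when bad input should not go unnoticed.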
In summary, to extract the year from a date column in a Pandas DataFrame, use pd.to_datetime() to convert the date column to datetime values, then use dt.year to extract the year and store the results in a new column in the original DataFrame.