How to extract the year from a date column in a Pandas DataFrame?

Published on Aug. 22, 2023, 12:18 p.m.

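To extract the year from a date column in a Pandas DataFrame, use the dt.year attribute. It is only available on datetime columns, so a column of date strings must first be converted with pd.to_datetime(). Here’s an example: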

import pandas as pd

# create a DataFrame with a 'date' column of date strings
df = pd.DataFrame({'date': ['2022-02-15', '2023-01-01', '2023-12-31']})

# convert the strings to datetimes, extract the year, and store it in a new 'year' column
df['year'] = pd.to_datetime(df['date']).dt.year
print(df)

Output:

         date  year
0  2022-02-15  2022
1  2023-01-01  2023
2  2023-12-31  2023

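In the example above, pd.to_datetime(df['date']) converts the strings in the ‘date’ column to a Series of pandas datetime values. The dt.year attribute then extracts the year from each value and returns a new Series of integers, which we assign to a new ‘year’ column in the original DataFrame. Note that the ‘date’ column itself is left unchanged: it still holds the original strings, because the converted Series was never assigned back to it.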
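If you plan to pull more than one component out of the dates, it may be worth converting the column once and keeping the result. The sketch below is one way to do that, assuming the same sample data as above; the 'month' column is added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2022-02-15', '2023-01-01', '2023-12-31']})

# Convert once and overwrite the string column with real datetimes,
# so later date operations do not need to parse the strings again
df['date'] = pd.to_datetime(df['date'])

# The .dt accessor now works directly on the column
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month  # illustrative extra column

print(df.dtypes)
print(df)
```

Overwriting the column changes its type from strings to datetimes, so only do this if nothing else in your code relies on the original text values.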

If the dates are not in a format that pd.to_datetime() can parse unambiguously (the ISO style YYYY-MM-DD used above is the safest), use the format parameter of pd.to_datetime() to state the format of the input explicitly. For example, if the dates are in the format ‘MM/DD/YYYY’, you can use the following code:

df['year'] = pd.to_datetime(df['date'], format='%m/%d/%Y').dt.year
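As a self-contained sketch of the same idea, the snippet below builds a small DataFrame with ‘MM/DD/YYYY’ strings and parses it with an explicit format. The sample data, the us_dates name, and the use of errors='coerce' are assumptions added for illustration: with errors='coerce', values that do not match the format become NaT instead of raising an error, which can be handy when the column contains stray bad entries.

```python
import pandas as pd

# Hypothetical sample data: two valid MM/DD/YYYY dates and one bad entry
us_dates = pd.DataFrame({'date': ['02/15/2022', '01/01/2023', 'not a date']})

# Parse with an explicit format; unparseable values become NaT
parsed = pd.to_datetime(us_dates['date'], format='%m/%d/%Y', errors='coerce')

# .dt.year yields NaN for NaT rows, which makes the column float;
# the nullable 'Int64' dtype keeps the years as integers with <NA> for gaps
us_dates['year'] = parsed.dt.year.astype('Int64')

print(us_dates)
```

Without errors='coerce', the bad entry would raise an error at parse time, which may be preferable if you want to be told about malformed data rather than silently get missing years.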

In summary, to extract the year from a date column in a Pandas DataFrame, use pd.to_datetime() to convert the date column to a pandas datetime object, then use dt.year to extract the year and store the results in a new column in the original DataFrame.
