How to split a string column into multiple columns in a Pandas DataFrame?

Published on Aug. 22, 2023, 12:19 p.m.

To split a string column into multiple columns in a Pandas DataFrame, you can use the str.split() method with expand=True, which splits each string and returns the pieces directly as a new DataFrame with one column per piece. (Alternatively, you can split without expand=True and pass the resulting lists to the DataFrame() constructor.) Here is an example that splits a column called ‘my_column’ in a DataFrame df into three separate columns:

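import pandas as pd
df = pd.DataFrame({'my_column': ['a b c', 'd e f']})  # sample data for illustration
# Split on whitespace; expand=True returns a DataFrame with one column per piece
new_cols = df['my_column'].str.split(expand=True)
new_cols.columns = ['col1', 'col2', 'col3']
df = pd.concat([df, new_cols], axis=1)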

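In this example, str.split() splits each value on whitespace (the default when no separator is given), and expand=True places each piece in its own column of the returned DataFrame. Assigning a list to the columns attribute names the new columns; this assumes the split produces exactly three columns (that is, the longest value has three pieces), otherwise pandas raises a ValueError about a length mismatch. Finally, pd.concat() with axis=1 joins the new columns to the original DataFrame side by side.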
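For comparison, here is a minimal sketch of the route that skips expand=True: without it, str.split() returns a Series of Python lists, which can be passed to the pd.DataFrame constructor. The sample names and the comma separator are assumptions for illustration. This sketch also assumes the column has no missing values, since a NaN entry is not a list; handle those first (for example with dropna() or fillna()) before using this route.

```python
import pandas as pd

# Hypothetical sample data: names stored as "first,middle,last"
df = pd.DataFrame({'my_column': ['Ada,King,Lovelace', 'Alan,Mathison,Turing']})

# Without expand=True, str.split() returns a Series of Python lists
parts = df['my_column'].str.split(',')

# Build the new columns from those lists, reusing the original index
# so that concat() lines the rows up correctly
new_cols = pd.DataFrame(parts.tolist(), index=df.index,
                        columns=['col1', 'col2', 'col3'])

df = pd.concat([df, new_cols], axis=1)
print(df)
```

Passing index=df.index matters when df does not have a default 0..n-1 index; without it, concat() would align on mismatched labels.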

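Note that str.split() with expand=True does not turn null values into empty strings: a missing value stays missing (NaN) in every new column, and rows with fewer pieces than the longest row are padded with missing values. str.split() has no na parameter; if you want empty strings instead of missing values, call fillna('') on the result.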
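A small sketch of that behaviour, using a made-up Series with one short row and one missing value: the split result keeps the gaps as missing values, and fillna('') replaces them with empty strings afterwards.

```python
import pandas as pd

# Hypothetical sample data: a full row, a short row and a missing value
s = pd.Series(['a b c', 'd e', None])

# The short row is padded with a missing value; the None row stays missing
parts = s.str.split(expand=True)
print(parts)

# str.split() has no na= parameter, so fill the gaps afterwards
# if empty strings are preferred to missing values
filled = parts.fillna('')
print(filled)
```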

Also, if a string column contains dates, you can use the pd.to_datetime() function to convert it to a datetime64 column.
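As a sketch combining both steps, assume a hypothetical column that packs an ISO date and a label into one string. Splitting with n=1 cuts only at the first space, the two pieces are assigned straight to new named columns, and pd.to_datetime() converts the date piece. The sample values and the '%Y-%m-%d' format are assumptions about the data.

```python
import pandas as pd

# Hypothetical sample data: a date and a label packed into one string
df = pd.DataFrame({'my_column': ['2023-08-22 release', '2023-09-01 review']})

# n=1 splits only at the first space, so the label may itself contain spaces;
# the two resulting columns are assigned by position to 'date' and 'label'
df[['date', 'label']] = df['my_column'].str.split(' ', n=1, expand=True)

# Convert the text dates to datetime64 values (format assumed to be ISO)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
print(df.dtypes)
```

Passing an explicit format makes parsing stricter; if the strings do not match it, pd.to_datetime() raises an error rather than guessing.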
