How to extract a numeric value from a string column in a Pandas DataFrame?

Published on Aug. 22, 2023, 12:19 p.m.

To extract a numeric value from a string column in a Pandas DataFrame, you can use regular expressions with the str.extract() method. Here is an example:

import pandas as pd

# Create a sample DataFrame with a column of strings containing numeric values
data = {'value': ['10 units', '5.6 kg', '22.5%', '3']}
df = pd.DataFrame(data)

# Extract the numeric values from the 'value' column
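# A raw string (r'...') keeps Python from treating \d and \. as string escapes;
# expand=False returns a Series rather than a one-column DataFrame
df['value_numeric'] = df['value'].str.extract(r'(\d+\.?\d*)', expand=False).astype(float)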

# Print the updated DataFrame
print(df)

This code will output the following DataFrame with a new ‘value_numeric’ column containing the extracted numeric values:

      value  value_numeric
0  10 units           10.0
1    5.6 kg            5.6
2     22.5%           22.5
3         3            3.0

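In this example, the pattern \d+\.?\d* matches one or more digits, optionally followed by a decimal point and further digits. The surrounding parentheses form the capture group that str.extract() returns. Only the first match in each string is extracted, and rows with no match become NaN. The astype(float) method then converts the extracted strings to floating-point numbers.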
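
To make the pattern's behavior on less tidy input concrete, here is a minimal sketch. The sample strings and the variable names (s, extracted, result) are made up for illustration and are not part of the example above.

```python
import pandas as pd

# Hypothetical inputs chosen to show edge cases of the pattern
s = pd.Series(['10 units', 'no number', '2 of 5.5', '.75 l'])

# Same pattern as in the example above
extracted = s.str.extract(r'(\d+\.?\d*)', expand=False)
result = extracted.astype(float)

print(result)
# '10 units'  -> 10.0
# 'no number' -> NaN  (no match)
# '2 of 5.5'  -> 2.0  (only the first match is kept)
# '.75 l'     -> 75.0 (the pattern needs a digit before the decimal point)
```

The last row is the one to watch: because \d+ requires a leading digit, '.75' is read as 75 rather than 0.75.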

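Note that depending on the format of the strings in your DataFrame, you may need to adjust the regular expression pattern to correctly capture the numeric values. For example, the pattern above does not match a leading minus sign, thousands separators such as 1,250, or numbers written without a leading digit such as .75.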
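
As one possible adjustment, the sketch below extends the pattern to accept an optional sign, commas as thousands separators, and a bare leading decimal point. The pattern, the sample data, and the variable names are assumptions for illustration, and would need tuning for real data (a hyphen directly before a digit, as in 'item-5', would be read as a minus sign).

```python
import pandas as pd

# Hypothetical sample data with signs, thousands separators, and a missing value
raw = pd.Series(['-4.5 C', '1,250 units', '.75 l', '+3', 'n/a'])

# Optional sign, then either digits (with optional commas and fraction)
# or a fraction that starts with a decimal point
pattern = r'([-+]?(?:\d[\d,]*\.?\d*|\.\d+))'

extracted = raw.str.extract(pattern, expand=False)

# Remove thousands separators before converting
cleaned = extracted.str.replace(',', '', regex=False)

# errors='coerce' turns anything that still cannot be parsed into NaN
numbers = pd.to_numeric(cleaned, errors='coerce')

print(numbers)
```

Here pd.to_numeric with errors='coerce' is used instead of astype(float) so that an unexpected leftover string yields NaN rather than raising an error.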
