How to Remove Non-ASCII Characters in Python

Published on Aug. 22, 2023, 12:16 p.m.

To remove non-ASCII characters from a string in Python, you can use a regular expression or the string.printable constant. Here are two examples:

Using regular expressions:

import re

my_string = "Héllo wörld!"
my_string = re.sub(r'[^\x00-\x7F]+', '', my_string)

print(my_string)

In this example, re.sub() replaces every run of characters outside the range \x00-\x7F (the ASCII range) with an empty string. The accented letters are removed entirely, not converted to their unaccented forms, so the output will be “Hllo wrld!”.
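If the goal is to keep the base letters rather than drop accented characters outright, one option is to decompose the string with unicodedata.normalize() before applying the same regular expression. This is a different technique from the one above, shown only as a minimal sketch; the helper name strip_accents_to_ascii is hypothetical, and characters with no ASCII decomposition (for example "ß" or "ø") are still removed.

```python
import re
import unicodedata


def strip_accents_to_ascii(text):
    """Hypothetical helper: keep base letters, drop everything non-ASCII."""
    # NFKD splits a character such as "é" into "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # The combining accents are non-ASCII, so the same pattern removes them.
    return re.sub(r'[^\x00-\x7F]+', '', decomposed)


print(strip_accents_to_ascii("Héllo wörld!"))         # Hello world!
print(re.sub(r'[^\x00-\x7F]+', '', "Héllo wörld!"))   # Hllo wrld!
```

The second print shows the plain regular expression for comparison.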

Using string.printable:

import string

my_string = "Héllo wörld!"
my_string = ''.join(filter(lambda x: x in string.printable, my_string))

print(my_string)

In this example, string.printable contains all the ASCII characters that are considered printable: digits, letters, punctuation, and whitespace. filter() keeps only the characters of my_string that appear in string.printable. The output will also be “Hllo wrld!”.

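Note that the second method is stricter than the first. string.printable contains only ASCII characters, so it never keeps non-ASCII text, but it also leaves out ASCII control characters such as \x00 and \x1b. The regular expression keeps those characters, so the two methods can give different results on the same input.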
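One way to see how the two methods differ is to run both on a string that contains ASCII control characters. The sample string below is made up for illustration; it mixes a NUL byte (\x00), an escape character (\x1b), a tab, and one non-ASCII letter.

```python
import re
import string

# Illustrative input: control characters, a tab, a space, and "é".
sample = "A\x00B\x1bC\tD é"

# Method 1: keep everything in the ASCII range, control characters included.
ascii_only = re.sub(r'[^\x00-\x7F]+', '', sample)

# Method 2: keep only characters listed in string.printable.
printable_only = ''.join(filter(lambda x: x in string.printable, sample))

print(repr(ascii_only))      # 'A\x00B\x1bC\tD '
print(repr(printable_only))  # 'ABC\tD '
```

Both methods drop "é", but only the string.printable filter also drops \x00 and \x1b. The tab survives in both because it counts as printable whitespace.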
