How to Remove Non-ASCII Characters in Python
Published on Aug. 22, 2023, 12:16 p.m.
To remove non-ASCII characters from a string in Python, you can use a regular expression or the string.printable
constant. Here are two examples:
Using regular expressions:
import re
my_string = "Héllo wörld!"
my_string = re.sub(r'[^\x00-\x7F]+', '', my_string)
print(my_string)
In this example, re.sub() replaces every run of characters outside the range \x00-\x7F (the ASCII range) with an empty string. The accented letters are deleted, not converted to their unaccented forms, so the output will be “Hllo wrld!”.
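Because the regular expression deletes accented letters outright (“Héllo” becomes “Hllo”), you may prefer to keep the base letters and drop only the accents. The following is a minimal sketch of one way to do that, using the standard library's unicodedata module. The helper name strip_accents_to_ascii is hypothetical and only for illustration. Characters with no ASCII decomposition (for example “ß” or CJK text) are still removed.

```python
import unicodedata

def strip_accents_to_ascii(text: str) -> str:
    # Hypothetical helper, not part of the original examples.
    # NFKD splits an accented character into its base letter
    # followed by a separate combining mark (é -> e + U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" drops everything that is not ASCII,
    # which now includes the combining marks but not the base letters.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(strip_accents_to_ascii("Héllo wörld!"))  # Hello world!
```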
Using string.printable:
import string
my_string = "Héllo wörld!"
my_string = ''.join(filter(lambda x: x in string.printable, my_string))
print(my_string)
In this example, string.printable contains the ASCII characters that are considered printable: digits, letters, punctuation, and whitespace. filter() keeps only the characters in my_string that appear in string.printable. The output will also be “Hllo wrld!”.
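Note that the second method is stricter than the first. string.printable contains only ASCII characters, so printable non-ASCII characters such as é are always removed, and it also leaves out ASCII control characters such as \x00 or \x07, so those are removed too.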
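To see how the two methods differ in practice, the short comparison below runs both on the same input. The sample string is made up for illustration. It contains a tab (which is in string.printable), a bell character \x07 (ASCII, but not in string.printable), and an accented letter.

```python
import re
import string

# Made-up sample: a tab, an ASCII control character (\x07), and a non-ASCII letter.
sample = "Tab\there, bell\x07, café"

# Method 1: keep every ASCII character, including control characters.
ascii_only = re.sub(r'[^\x00-\x7F]+', '', sample)

# Method 2: keep only characters listed in string.printable.
printable_only = ''.join(ch for ch in sample if ch in string.printable)

print(repr(ascii_only))      # 'Tab\there, bell\x07, caf'
print(repr(printable_only))  # 'Tab\there, bell, caf'
```

Both methods drop the “é”, but only the string.printable filter also removes the bell character.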