Fixing the Python error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: invalid continuation byte

Published on Aug. 22, 2023, 12:11 p.m.

Read the file in binary mode

You can work around the problem by opening the file in binary mode:

for line in open(your_file_path, 'rb'):
    print(line)  # each line is a bytes object, not a str

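'rb' opens the file in binary mode. Python then returns each line as raw bytes and does not try to decode it as UTF-8, so the error cannot occur while reading.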
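Binary mode only postpones the decoding step: if you need text, the bytes still have to be decoded at some point. A minimal sketch of one way to do that, assuming it is acceptable to replace undecodable bytes with the U+FFFD replacement character. The helper name read_lines_lenient and the temporary demo file are hypothetical, introduced here only for illustration.

```python
import os
import tempfile


def read_lines_lenient(path, encoding="utf-8"):
    """Read a file in binary mode and decode each line, replacing bad bytes."""
    lines = []
    with open(path, "rb") as f:
        for raw in f:
            # errors="replace" swaps each undecodable byte for U+FFFD
            # instead of raising UnicodeDecodeError
            lines.append(raw.decode(encoding, errors="replace"))
    return lines


# Demo file (hypothetical content): the first line holds a Latin-1 "é" (0xe9),
# which is not valid UTF-8 in this position.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"caf\xe9\nplain\n")
    demo_path = tmp.name

demo_lines = read_lines_lenient(demo_path)
os.remove(demo_path)
print(demo_lines)
```

Replacing bytes loses information, so this is best suited to quick inspection of a file rather than to data you intend to keep.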

Specify the encoding

I was using a dataset downloaded from Kaggle, and reading it with pandas threw this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 183: invalid continuation byte
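The message is typical of a file saved in a single-byte encoding such as Latin-1 being read as UTF-8. In Latin-1 the byte 0xf1 is the letter "ñ", while in UTF-8 it announces a multi-byte sequence, so the ordinary letter that follows it is rejected as an invalid continuation byte. The sketch below reproduces this with a made-up sample word; it is an illustration, not the content of the Kaggle file.

```python
# "mañana" written with an escape so the example does not depend on
# the encoding of this source file
raw = "ma\u00f1ana".encode("latin-1")  # the ñ becomes the single byte 0xf1

error_reason = None
error_byte = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error_reason = exc.reason      # text after the last colon in the message
    error_byte = raw[exc.start]    # the offending byte as an integer

# Decoding with the encoding the bytes were written in succeeds
decoded = raw.decode("latin-1")

print(error_reason, hex(error_byte), decoded)
```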

This is how I fixed it, by passing an explicit encoding to read_csv:

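import pandas as pd

# 'ISO-8859-1' and 'latin-1' are two names for the same single-byte encoding
df = pd.read_csv('top50.csv', encoding='ISO-8859-1')
# m_cols is a list of column names that must be defined beforehand
items = pd.read_csv('ml-100k/u.item', sep='|', names=m_cols, encoding='latin-1')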
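If you do not know in advance whether a file is UTF-8, one option is to try UTF-8 first and fall back to Latin-1 only when decoding fails. The following is a rough sketch using only the standard library; the function name decode_with_fallback and the sample bytes are assumptions made for this example. Note that Latin-1 maps every possible byte to a character, so the fallback never raises, but it will produce wrong characters if the file actually uses a different encoding.

```python
def decode_with_fallback(data, encodings=("utf-8", "latin-1")):
    """Return (text, encoding_used) for the first encoding that decodes data."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    raise ValueError("none of the candidate encodings could decode the data")


# Valid UTF-8 input is decoded as UTF-8
utf8_result = decode_with_fallback("caf\u00e9".encode("utf-8"))
# Latin-1 input fails as UTF-8 and falls back to Latin-1
latin1_result = decode_with_fallback(b"caf\xe9")

print(utf8_result, latin1_result)
```

The encoding name returned by the function can then be passed on, for example as the encoding argument of pd.read_csv.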

References:

https://stackoverflow.com/questions/19699367/for-line-in-results-in-unicodedecodeerror-utf-8-codec-cant-decode-byte
https://grabthiscode.com/whatever/unicodedecodeerror-utf-8-codec-cant-decode-byte-0xe9-in-position-2892-invalid-continuation-byte
