Background


Take a small demo that uses Python's requests library as an example. It requests the Baidu home page and prints the page text:

import requests

url = "https://baidu.com"
response = requests.get(url)
print(response.text)

Running it produces the following error:

C:\Users\liu.ziyi\AppData\Local\Programs\Python\Python38\python.exe E:/Python/httpRunnerDemo/kom/kom_session2.py
Traceback (most recent call last):
File "E:/Python/httpRunnerDemo/kom/kom_session2.py", line 8, in <module>
print(r.text)
UnicodeEncodeError: 'gbk' codec can't encode character '\xa9' in position 25283: illegal multibyte sequence

Cause


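When the script runs on Windows, print sends the Unicode string to the cmd console. On a Chinese-language Windows system the console's default encoding is GBK, so the Python interpreter must first encode the string as GBK before cmd can display it.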

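However, the string contains characters that GBK cannot represent, such as the special symbol © that appears on some websites. Encoding therefore fails with the message 'gbk' codec can't encode character.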
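The failure can be reproduced without any network access. The following is a minimal sketch that uses only the standard library; the sample string and the helper try_encode_gbk are made up for illustration and are not part of the original demo. It shows that strict GBK encoding fails on ©, while the ignore error handler simply drops the character.

```python
import sys

# Hypothetical sample text; "©" (U+00A9) has no GBK representation.
sample = "© Baidu"


def try_encode_gbk(text):
    """Return the GBK bytes, or the error message if strict encoding fails."""
    try:
        return text.encode("gbk")
    except UnicodeEncodeError as exc:
        return str(exc)


# The encoding print() uses for output; on a Chinese-language Windows
# console this is typically a GBK code page.
print(sys.stdout.encoding)

# Strict encoding fails, so the helper returns the error message.
print(try_encode_gbk(sample))

# With the "ignore" handler the unencodable character is silently dropped.
print(sample.encode("gbk", "ignore"))
```

Checking sys.stdout.encoding first is a quick way to confirm whether the console encoding is what triggers the error on a given machine.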


Solution


When encoding the Unicode string, pass the ignore argument so that characters that cannot be encoded are skipped:

import requests

url = "https://baidu.com"
response = requests.get(url)
print(response.text.encode("GBK", "ignore"))

Run it again and it succeeds:

C:\Users\liu.ziyi\AppData\Local\Programs\Python\Python38\python.exe E:/Python/httpRunnerDemo/requests/request2.py
b'<!DOCTYPE html>\r\n<!--STATUS OK--><html> <head>
...
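
Because encode returns a bytes object, the output above is shown in its b'...' form with escape sequences. If readable text is preferred, one possible variation (not part of the original fix) is to decode the bytes back to a string after dropping the unencodable characters. The sketch below assumes a hypothetical helper named to_printable and a made-up HTML snippet standing in for response.text.

```python
import sys


def to_printable(text, encoding=None):
    """Drop characters the target encoding cannot represent and return a str."""
    # Assumption: fall back to the console encoding, then to GBK.
    encoding = encoding or sys.stdout.encoding or "gbk"
    return text.encode(encoding, "ignore").decode(encoding)


# Hypothetical stand-in for response.text.
html = "<p>© 百度</p>"

# Uses the console's own encoding, so the result is always safe to print.
print(to_printable(html))
```

Calling to_printable(html, "gbk") removes only the © character and keeps the Chinese text, since those characters are representable in GBK.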