I am building a WhatsApp scraper for my personal use. I am trying to download the image from the HTML below:

<div class="_2n28r" style="background-image: url("data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEABsbGxscGx4hIR4qLSgtKj04MzM4PV1CR0JHQl2NWGdYWGdYjX2Xe3N7l33gsJycsOD/2c7Z//////////////8BGxsbGxwbHiEhHiotKC0qPTgzMzg9XUJHQkdCXY1YZ1hYZ1iNfZd7c3uXfeCwnJyw4P/Zztn////////////////CABEIADAAQQMBIgACEQEDEQH/xAAwAAEAAgMBAAAAAAAAAAAAAAAEAAUBAgMGAQADAQEAAAAAAAAAAAAAAAABAgMABP/aAAwDAQACEAMQAAAAGZZo1nXnkFXJNleNWo3V1XBRCYqCxtvtyzi6787a9fOe0p7h1RFyD+LL3NG2c6TZXQOnVFlx56ww9hKKRb//xAAqEAACAQMDAQcFAQAAAAAAAAAAAQIDBBESITEzEBQiIzJBYgUTQlJhcf/aAAgBAQABPwC86pjYTlHhirVV+RCtUal/EK9qrlC+oJeqBSvaUzvFL9i86ou2y0Kfi4KlvS40Lgr0KS04XsRs4aMn2S86onsZEyg0t/kiclhl36qSIx8ETu0S86pnYyJlPOz+SHUxnLK8lKcN+GQeyFwi8fm9qZGeIxX9JV1uTrrUv9Kd3GelCl4Vv7F2/NMmTJJTklpTFGpN4SFbVs8FG2nHDlM758j/xAAdEQABBAIDAAAAAAAAAAAAAAABAAIDERATISJh/9oACAECAQE/AMRsBHK1Ba/cxDqE5XmORrWUjIDatf/EABsRAAICAwEAAAAAAAAAAAAAAAABAhEDEBIy/9oACAEDAQE/AFqbpnZ0LWT0IoWpxuRyUf/Z");">

However, when I parse the HTML, extract the Base64 string, and convert it to an image, I get a corrupted image every time. Yet when I print the Base64 string and paste it into an online converter, the site converts it perfectly. This is my code snippet:

xx = driver.find_elements_by_class_name("_1iHeu")
d = 0
for m in xx:
    getList = m.find_element_by_class_name("_2kLly").find_element_by_class_name("_2n28r").get_attribute("style").split('url("')[1]
    d+=1
    if len(getList)<10:
        continue
    var = getList[0:len(getList)-3]
    result = base64.b64decode(str(var))
    content = result
    f1 = open("d"+'_'+str("d")+str(d)+'.png', 'wb')
    f1.write( content )
    f1.close()
1 Answer
红糖糍粑
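As the comment says, you forgot to remove the "data:image/jpeg;base64," prefix in your code. That prefix is part of the data URL, not part of the Base64 payload, so passing it to base64.b64decode produces corrupted bytes.
If your Python version is 3.4 or later, you can let urlopen decode the data URL for you: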
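from urllib.request import urlopen

# m is the element from the loop in the question; splitting the style value on '"' leaves the data URL at index 1
getList = m.find_element_by_class_name("_2kLly").find_element_by_class_name("_2n28r").get_attribute("style").split('"')[1]
# getList is now "data:image/jpeg;base64,/9j/4AAQ..."
# urlopen understands the data: scheme and returns the decoded bytes
with urlopen(getList) as response, open('image.jpg', 'wb') as f:  # the payload is JPEG, so use a .jpg name
    f.write(response.read())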
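If you would rather keep using base64.b64decode as in the question, the prefix has to be stripped before decoding. The following is only a minimal sketch of that idea, not part of the original answer: the names DATA_URL_RE, decode_style_image, and sample_style are hypothetical, and the sample payload is just the Base64 encoding of b"hello" standing in for real image data. It assumes the style value has the same url("data:<mime>;base64,<payload>") shape shown in the question.

```python
import base64
import re

# Hypothetical helper pattern: matches url("data:<mime>;base64,<payload>")
# inside a CSS style value, with optional quotes around the URL.
DATA_URL_RE = re.compile(
    r'url\(["\']?data:(?P<mime>[\w/+.-]+);base64,(?P<payload>[A-Za-z0-9+/=]+)["\']?\)'
)


def decode_style_image(style):
    """Return (mime_type, raw_bytes) for a Base64 data URL in a style string, or None."""
    match = DATA_URL_RE.search(style)
    if match is None:
        return None
    # Only the payload after the comma is Base64; the prefix is left out.
    return match.group("mime"), base64.b64decode(match.group("payload"))


# Stand-in style value; "aGVsbG8=" is Base64 for b"hello", not a real image.
sample_style = 'background-image: url("data:image/jpeg;base64,aGVsbG8=");'
mime, data = decode_style_image(sample_style)
```

In the scraper, the string returned by get_attribute("style") would be passed in place of sample_style, and the returned MIME type could be used to pick a matching file extension before writing the bytes in 'wb' mode.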