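I'd like to suggest a different approach. The attribute value looks like JSON, so why not use the json module? That way you get a ready-made data structure that you can modify further.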
import json
from bs4 import BeautifulSoup
html_list = [
"""<div class="impressions" data-impressions=\'{"id":"01920","name":"Sleepy","price":12.95,"brand":"Lush","category":"Bubble Bar","variant":"7 oz.","quantity":1,"list":"/bath/bubble-bars/sleepy/9999901920.html","dimension11":"","dimension12":"Naked,Self Preserving,Vegan","dimension13":1,"dimension14":1,"dimension15":true}\'></div>""",
]
data_structures = []
for html_item in html_list:
    soup = BeautifulSoup(html_item, "html.parser").find("div", {"class": "impressions"})
    data_structures.append(json.loads(soup["data-impressions"]))
print(data_structures)
This prints a list of dictionaries:
[{'id': '01920', 'name': 'Sleepy', 'price': 12.95, 'brand': 'Lush', 'category': 'Bubble Bar', 'variant': '7 oz.', 'quantity': 1, 'list': '/bath/bubble-bars/sleepy/9999901920.html', 'dimension11': '', 'dimension12': 'Naked,Self Preserving,Vegan', 'dimension13': 1, 'dimension14': 1, 'dimension15': True}]
To access the key you need, just do the following:
for data_item in data_structures:
    print(data_item["dimension12"])
This prints: Naked,Self Preserving,Vegan
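Because json.loads returns an ordinary dict, any further changes are plain Python. As a minimal sketch, and assuming you want the comma-separated dimension12 value as a list, something like the following could work. The "tags" key is a name made up for this illustration, and the JSON string is a shortened copy of the attribute value above:

```python
import json

# Shortened copy of the data-impressions value from the answer above.
raw = '{"id":"01920","name":"Sleepy","price":12.95,"dimension12":"Naked,Self Preserving,Vegan"}'

item = json.loads(raw)

# Hypothetical post-processing: split the comma-separated field into a list,
# dropping surrounding whitespace and empty entries.
item["tags"] = [tag.strip() for tag in item["dimension12"].split(",") if tag.strip()]

print(item["tags"])  # ['Naked', 'Self Preserving', 'Vegan']
```

The same loop over data_structures shown earlier can apply this to every parsed item.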