2 Answers
Contributor with 1874 experience points, 12+ upvotes
You can iterate over each row div and collect the text of the divs nested inside it:
from bs4 import BeautifulSoup as soup
import re
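# `content` holds the HTML source of the page
d = soup(content, 'html.parser')
# For each div with class "row", collect the text of every nested div,
# removing runs of two or more whitespace characters and any newlines
results = [[re.sub(r'\s{2,}|\n+', '', i.text) for i in b.find_all('div')] for b in d.find_all('div', {'class':'row'})]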
Output:
[['Type of property:', 'Apartment '], ['Building style:', '50 year '], ['Sale price:', '12 000 CUC '], ['Rooms:', '1 '], ['Bathrooms:', '1 '], ['Kitchens:', '1 '], ['Surface:', '38 mts2 '], ['Year of construction:', '1945 '], ['Building style:', '50 year '], ['Construction type:', 'Masonry and plate '], ['Home conditions:', 'Good '], ['Other peculiarities:'], []]
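The nested list above still contains a row with only a label and an empty row, and the values keep a trailing space. As a rough sketch, assuming you want a label-to-value mapping, the rows could be post-processed like this. The helper name `rows_to_dict` is hypothetical, and the sample `results` is a shortened copy of the output above:

```python
# Shortened copy of the output shown above (assumption: same shape as the real data)
results = [
    ['Type of property:', 'Apartment '],
    ['Building style:', '50 year '],
    ['Building style:', '50 year '],
    ['Other peculiarities:'],
    [],
]

def rows_to_dict(rows):
    """Hypothetical helper: turn [label, value] rows into a dict."""
    out = {}
    for row in rows:
        # Skip rows that are not a clean label/value pair
        if len(row) != 2:
            continue
        label, value = row
        # Drop surrounding whitespace and the trailing colon of the label;
        # setdefault keeps the first value when a label repeats
        out.setdefault(label.strip().rstrip(':'), value.strip())
    return out

details = rows_to_dict(results)
print(details)
```

Rows without exactly two cells are skipped here, which may or may not be what you want for fields such as "Other peculiarities:".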
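Contributor with 1817 experience points, 6+ upvotes
For example, if you know you specifically want to find the string "Building style:", you can grab its .next_sibling. Or simply move to the next matching element with find_next: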
>>> from bs4 import BeautifulSoup
>>> html = "<c><div>hello</div> <div>hi</div></c>"
>>> soup = BeautifulSoup(html, 'html.parser')
>>> print(soup.find(string="hello").find_next('div').contents[0])
hi
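Applied to the property listing, the same idea might look like the sketch below. The HTML snippet is an assumed simplification of the real page (one label div followed by one value div per row), and a regular expression is used for the label because the real text may carry surrounding whitespace:

```python
import re
from bs4 import BeautifulSoup

# Assumed, simplified markup; the real page may differ
html = """
<div class="row">
  <div>Type of property:</div> <div>Apartment</div>
</div>
<div class="row">
  <div>Building style:</div> <div>50 year</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the text node containing the label
label = soup.find(string=re.compile(r"Building style:"))

# Option 1: the next div in document order after the label text
value = label.find_next("div").get_text(strip=True)

# Option 2: the next sibling div of the label's parent div
sibling_value = label.parent.find_next_sibling("div").get_text(strip=True)

print(value, sibling_value)
```

Both options return the same cell here. The sibling variant is a bit stricter, since it only looks at divs on the same level as the label div.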
If you want all of them, you can use .find_all to get every div tag with the class "row", and then get the children of each one.
data = []
soup = BeautifulSoup(html, 'html.parser')
# Each div with class "row" is one label/value row
for row in soup.find_all('div', class_="row"):
    # Collect the stripped text of every div nested in the row
    rowdata = [c.text.strip() for c in row.find_all('div')]
    data.append(rowdata)
print(data)
# Outputs the nested list:
# [['Type of property:', 'Apartment'], ['Building style:', '50 year'], etc ]
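One thing to watch for: `find_all('div')` searches all descendants, so if a cell itself wraps its text in another div, that text shows up twice. A minimal sketch of the difference, assuming a made-up row in which the value is wrapped in an inner div (the `unit` class is hypothetical):

```python
from bs4 import BeautifulSoup

# Assumed markup with a nested div inside the value cell
html = """
<div class="row">
  <div>Surface:</div>
  <div><div class="unit">38 mts2</div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("div", class_="row")

# Default: every descendant div, so the nested text is collected twice
all_divs = [c.get_text(strip=True) for c in row.find_all("div")]

# recursive=False: only the row's direct child divs
direct = [c.get_text(strip=True) for c in row.find_all("div", recursive=False)]

print(all_divs)
print(direct)
```

If the real rows only ever contain flat label and value divs, both calls give the same result and the default is fine.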