Once you have located the element, call ele.string.replace_with("") to replace its text with an empty string.
Based on your example HTML:
html='''<html>
<head>
<title>HTML Tables</title>
</head>
<body>
<table border = "1">
<tr>
<td><p><span>Row 1, Column 1, This should be kept because it has more than two tokens</span></p></td>
<td><p><span>not kept</span></p></td>
</tr>
<tr>
<td><p><span>Row 2, Column 1, should be kept</span></p></td>
<td><p><span>Row 2, Column 2, should be kept</span></p></td>
</tr>
</table>
</body>
</html>'''
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
tables = soup.find_all('table')
for table in tables:
    rows = table.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        for ele in cols:
            # Blank the cell when its text has fewer than three space-separated tokens.
            if len(ele.text.split(' ')) < 3:
                ele.string.replace_with("")
print(soup)
Output:
<html>
<head>
<title>HTML Tables</title>
</head>
<body>
<table border="1">
<tr>
<td><p><span>Row 1, Column 1, This should be kept because it has more than two tokens</span></p></td>
<td><p><span></span></p></td>
</tr>
<tr>
<td><p><span>Row 2, Column 1, should be kept</span></p></td>
<td><p><span>Row 2, Column 2, should be kept</span></p></td>
</tr>
</table>
</body>
</html>
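One caveat: ele.string is None whenever a cell holds more than one child node (for example two sibling tags), so ele.string.replace_with("") would raise an AttributeError on such cells. The following is a minimal sketch of a more tolerant variant, not part of the original answer. The function name blank_short_cells, the min_tokens parameter, and the sample markup are hypothetical; it also splits on any whitespace rather than on single spaces.

```python
from bs4 import BeautifulSoup


def blank_short_cells(html, min_tokens=3):
    """Empty every <td> whose text has fewer than min_tokens whitespace-separated tokens.

    Hypothetical helper for illustration; the name and the default are assumptions.
    """
    soup = BeautifulSoup(html, "html.parser")
    for cell in soup.find_all("td"):
        if len(cell.get_text().split()) < min_tokens:
            # Replace each text node on its own so nested tags such as
            # <p>, <span> and <b> stay in place, only their text is removed.
            for text_node in cell.find_all(string=True):
                text_node.replace_with("")
    return soup


# Assumed sample: the first cell has two sibling tags, so cell.string would be None.
sample = (
    "<table><tr>"
    "<td><p><span>not</span> <b>kept</b></p></td>"
    "<td><p>one two three</p></td>"
    "</tr></table>"
)
result = blank_short_cells(sample)
print(result)
```

Here the first cell keeps its p, span and b tags but loses all of its text, while the second cell is left untouched because it has three tokens.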