1 Answer
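Well, I used re as a shortcut for selecting all a tags whose href value starts with topten. You can also do it a different way, for example with a CSS selector:
for item in soup.select("a[href^=topten]"):
Then I took all the text inside each tag, stripped it with strip=True, and passed a single-space separator so that the pieces of text don't run together.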
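To see what strip=True and separator do in isolation, here is a minimal sketch that runs without any network access. The HTML snippet is made up for illustration and is not taken from the CNN page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration only
html = '<a href="topten/one.html"><b>Russia</b><i> elects Yeltsin </i></a>'
link = BeautifulSoup(html, "html.parser").a

# Without a separator, the stripped pieces are glued together
joined = link.get_text(strip=True)

# With a single-space separator, the pieces stay readable
spaced = link.get_text(strip=True, separator=" ")

print(joined)  # Russiaelects Yeltsin
print(spaced)  # Russia elects Yeltsin
```

Each nested string is stripped first and then joined with the separator, which is why the space has to be supplied explicitly.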
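import re

import requests
from bs4 import BeautifulSoup


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # Match every <a> whose href starts with "topten"
    for item in soup.find_all("a", href=re.compile("^topten")):
        # Strip each text fragment and join the fragments with a space
        text = item.get_text(strip=True, separator=" ")
        if text:  # skip links that contain no text (e.g. image-only links)
            print(text)


main("http://edition.cnn.com/EVENTS/1996/year.in.review/main.html")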
Output:
Israel elects Netanyahu
Crash of TWA Flight 800
Russia elects Yeltsin
U.S . elects Clinton
Hutu-Tutsi conflict in central Africa
Peace, elections in Bosnia
U.S . base bombed in Saudi Arabia
Centennial Olympic Games
Advances against AIDS
Unabomb suspect Ted Kaczynski arrested
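The CSS-selector alternative mentioned at the top can be sketched the same way. This is only an illustration under assumptions: the helper name topten_titles and the inline HTML are hypothetical stand-ins so the example runs offline, and the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
html = """
<a href="topten/israel.html">Israel elects Netanyahu</a>
<a href="other/page.html">Not a top-ten link</a>
<a href="topten/twa.html">Crash of <b>TWA</b> Flight 800</a>
<a href="topten/empty.html"></a>
"""


def topten_titles(markup):
    """Return the text of every <a> whose href starts with "topten"."""
    soup = BeautifulSoup(markup, "html.parser")
    titles = []
    # [href^="topten"] is the CSS "attribute starts with" selector
    for item in soup.select('a[href^="topten"]'):
        text = item.get_text(strip=True, separator=" ")
        if text:  # skip links without any text
            titles.append(text)
    return titles


print(topten_titles(html))
```

Returning a list instead of printing inside the loop makes the result easier to reuse or test; the selection itself should match the same links as the re.compile("^topten") version.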