1 Answer
import re

import requests
from bs4 import BeautifulSoup

# Fetch the U.S. section page and parse its HTML.
r = requests.get("https://www.nytimes.com/section/us")
soup = BeautifulSoup(r.content, "html.parser")

urls = []
# Select every <a> tag whose href contains "2020".
for link in soup.find_all("a", href=re.compile("2020")):
    url = link.get("href")
    # Turn site-relative links into absolute URLs.
    if not url.startswith("http"):
        url = f"https://www.nytimes.com{url}"
    # Record and print each URL only once.
    if url not in urls:
        urls.append(url)
        print(url)
Output:
https://www.nytimes.com/2020/03/18/us/coronavirus-immigrants.html
https://www.nytimes.com/2020/03/18/us/coronavirus-nebraska-biocontainment.html
https://www.nytimes.com/2020/03/18/us/coronavirus-janitors-cleaners.html
https://www.nytimes.com/2020/03/18/us/small-business-coronavirus-charlotte.html
https://www.nytimes.com/2020/03/19/us/politics/coronavirus-heaven-frilot-mark-frilot.html
https://www.nytimes.com/2020/03/19/us/politics/coronavirus-state-department-travel.html
https://www.nytimes.com/2020/03/19/us/politics/coronavirus-congress-voting.html
https://www.nytimes.com/2020/03/19/us/coronavirus-foster-pets.html
https://www.nytimes.com/2020/03/19/books/molly-brodak-dies.html
https://www.nytimes.com/2020/03/19/us/politics/joe-biden-vice-president.html
https://www.nytimes.com/2020/03/19/us/coronavirus-location-tracking.html
https://www.nytimes.com/2020/03/19/climate/us-flood-season-forescast.html
https://www.nytimes.com/2020/03/19/health/coronavirus-masks-shortage.html
https://www.nytimes.com/2020/03/19/health/coronavirus-travel-ban.html
https://www.nytimes.com/2020/03/19/arts/mal-sharpe-dead.html
https://www.nytimes.com/2020/03/19/business/coronavirus-unemployment-states.html
https://www.nytimes.com/2020/03/19/us/work-from-home-mothers-coronavirus-covid19.html
https://www.nytimes.com/2020/03/19/us/politics/1000-checks-coronavirus-stimulus.html
https://www.nytimes.com/news-event/2020-election
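
The last URL in the output, /news-event/2020-election, is a topic page rather than a dated article. It is included because the pattern "2020" matches anywhere in the href. If only dated articles are wanted, one possible refinement is to match a /YYYY/MM/DD/ path prefix instead. The sketch below is not part of the original answer: the function name dated_article_urls and the assumption that article paths start with a date are illustrative only. It works on a plain list of href strings, so it can be tried without a network request.

```python
import re
from urllib.parse import urljoin, urlparse

BASE_URL = "https://www.nytimes.com"
# Assumption: article paths begin with /YYYY/MM/DD/.
DATED_PATH = re.compile(r"^/\d{4}/\d{2}/\d{2}/")


def dated_article_urls(hrefs, base_url=BASE_URL):
    """Return unique absolute URLs whose path starts with a date, in first-seen order."""
    seen = set()
    result = []
    for href in hrefs:
        # urljoin leaves absolute URLs alone and resolves relative ones against base_url.
        absolute = urljoin(base_url, href)
        if DATED_PATH.match(urlparse(absolute).path) and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    sample = [
        "/2020/03/18/us/coronavirus-immigrants.html",
        "https://www.nytimes.com/2020/03/19/us/coronavirus-foster-pets.html",
        "/news-event/2020-election",
        "/2020/03/18/us/coronavirus-immigrants.html",
    ]
    for url in dated_article_urls(sample):
        print(url)
```

With the scraper above, the hrefs could be collected with [a.get("href") for a in soup.find_all("a", href=True)] and passed to this function. Using a set for the membership check also avoids rescanning the list for every link.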