How to retrieve website links in Python using BeautifulSoup

Python

慕的地6264312 2022-06-22 18:00:09

I want to collect links from a site (https://www.vanglaini.org/), for example /hmarchhak/102217, and print them as full URLs such as https://www.vanglaini.org/hmarchhak/102217. Please help. This is my code:

import requests
import pandas as pd
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')
for article in soup.find_all('article'):
    headline = article.a.text
    summary = article.p.text
    link = article.a.href
    print(headline)
    print(summary)
    print(link)
print()

1 Answer

慕无忌1623718

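Unless I am missing something, the headline and the summary appear to be the same text. With bs4 4.7.1+ you can use the :has pseudo-class to make sure each article contains a descendant with an href attribute. This seems to filter out the article elements that are not part of the main body, which I suspect is what you are actually after.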

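import re

import requests
from bs4 import BeautifulSoup as bs

base = 'https://www.vanglaini.org'
r = requests.get(base)
soup = bs(r.content, 'lxml')

# Keep only <article> elements that contain a descendant with an href attribute
for article in soup.select('article:has([href])'):
    headline = article.h5.text.strip()
    # Replace runs of newlines or carriage returns with a space
    summary = re.sub(r'\n+|\r+', ' ', article.p.text.strip())
    # Prepend the site root to the relative href
    link = f"{base}{article.a['href']}"
    print(headline)
    print(summary)
    print(link)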
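
For anyone wondering why the code in the question prints None for the link, the following is a minimal offline sketch. It assumes a made-up HTML snippet (SAMPLE_HTML is hypothetical and is not the real markup of the site) and uses the built-in 'html.parser' so that it runs without lxml or a network connection. It also uses urljoin from the standard library as one possible alternative to concatenating the base URL by hand.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE = "https://www.vanglaini.org/"

# Made-up markup for illustration only; the real page structure may differ.
SAMPLE_HTML = """
<article>
  <a href="/hmarchhak/102217"><h5>Sample headline</h5></a>
  <p>Sample summary</p>
</article>
<article>
  <p>No link in this one</p>
</article>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

first_link = soup.find("article").a
# Dot access searches for a child tag named <href>, so this is None.
dot_access = first_link.href
# Subscript access reads the value of the href attribute.
attr_access = first_link["href"]

# Keep only <article> elements that contain a descendant with an href
# attribute, then resolve each relative href against the site root.
links = [
    urljoin(BASE, article.a["href"])
    for article in soup.select("article:has([href])")
]

print(dot_access)
print(attr_access)
print(links)
```

The second article is skipped because it has no element with an href, which is the same filtering that the :has selector performs in the answer. urljoin is used here only because it copes with a base URL that ends in a slash; plain string concatenation as in the answer works as long as the base has no trailing slash.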

Answered 2022-06-22
