Looping through links with Selenium and getting the page source

I'm trying to scrape two web pages using the following links:

https://www.boligportal.dk/lejebolig/dp/2-vaerelses-lejlighed-holstebro/id-5792074
https://www.boligportal.dk/lejebolig/dp/2-vaerelses-lejlighed-odense-m/id-5769482

I want to extract information about each house from these links. I'm using Selenium rather than BeautifulSoup alone because the pages are dynamic, and BeautifulSoup by itself doesn't retrieve all of the HTML. I tried to do this with the code below.

    import re
    import time

    from bs4 import BeautifulSoup
    from selenium import webdriver

    page_links = ['https://www.boligportal.dk/lejebolig/dp/2-vaerelses-lejlighed-holstebro/id-5792074',
                  'https://www.boligportal.dk/lejebolig/dp/2-vaerelses-lejlighed-odense-m/id-5769482']

    def render_page(url):
        driver = webdriver.Firefox()
        driver.get(url)
        time.sleep(3)
        r = driver.page_source
        driver.quit()
        return(r)

    def remove_html_tags(text):
        clean = re.compile('<.*?>')
        return(re.sub(clean, '', text))

    houses_html_code = []
    housing_data = []
    address = []

    # Loop through main pages, render them and extract code
    for i in page_links:
        html = render_page(str(i))
        soup = BeautifulSoup(html, "html.parser")
        houses_html_code.append(soup)

    for i in houses_html_code:
        for span_1 in soup.findAll('span', {"class": "AdFeatures__item-value"}):
            housing_data.append(remove_html_tags(str(span_1)))

To summarize: I render each page, get its page source, append that page source to a list, and then search the page sources of both rendered pages for the span class. However, my code returns the page source of the first link TWICE and effectively ignores the second link, even though it does render each page (Firefox pops up for each one). See the output below.

Why doesn't this work? Sorry if the answer is obvious. I'm relatively new to Python, and this is my first time using Selenium.
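A likely cause, judging only from the code in the question: the second loop iterates over `houses_html_code` with `i`, but the inner `findAll` call searches `soup`, not `i`. After the first loop finishes, `soup` still refers to the last page that was parsed, so every iteration collects the spans of that same page, and the result contains one page's data twice. The sketch below is one possible restructuring, not the asker's code: it parses and searches each page inside the same loop, reuses a single Firefox session for all URLs, and uses BeautifulSoup's `get_text()` in place of the regex-based `remove_html_tags`. The function names `render_pages` and `extract_features` are hypothetical, and the fixed `time.sleep` is kept from the original rather than replaced with an explicit wait.

```python
import time

from bs4 import BeautifulSoup


def render_pages(urls, wait_seconds=3):
    """Render each URL in one Firefox session and return the page sources in order."""
    # Imported here so extract_features() can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        sources = []
        for url in urls:
            driver.get(url)
            time.sleep(wait_seconds)  # crude fixed wait, as in the original code
            sources.append(driver.page_source)
        return sources
    finally:
        driver.quit()  # close the browser even if a page fails to load


def extract_features(html_pages):
    """Return one list of AdFeatures__item-value texts per page, in page order."""
    results = []
    for html in html_pages:
        # Parse and search *this* page. The original loop searched `soup`,
        # which still pointed at the last page parsed in the first loop.
        soup = BeautifulSoup(html, "html.parser")
        values = [
            span.get_text(strip=True)
            for span in soup.find_all("span", class_="AdFeatures__item-value")
        ]
        results.append(values)
    return results
```

Usage, assuming the `page_links` list from the question: `features = extract_features(render_pages(page_links))`. Returning one list per page (instead of one flat `housing_data` list) makes it easy to check that each link actually produced different values.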
