Unable to fetch only the relevant links and discard the others

ABOUTYOU 2021-11-09 19:36:21
I have written a Python script that combines selenium and BeautifulSoup to fetch the links leading to property details from a web page. Because the content is highly dynamic, I use selenium to obtain the page source. When I run the script I get a lot of links, including the ones I need. How can I get only the relevant links from each of the three containers?

My attempt:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def fetch_info(link):
    driver.get(link)
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#community-search-homes .propertyWrapper > a")))
    soup = BeautifulSoup(driver.page_source, "lxml")
    linklist = [item.get("href") for item in soup.select("#community-search-homes .propertyWrapper > a")]
    return linklist

if __name__ == '__main__':
    url = "https://www.khov.com/find-new-homes/arizona/buckeye"
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 10)
    for newlink in fetch_info(url):
        print(newlink)
    driver.quit()

The result I get:

/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills
/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/affinity-at-verrado
/find-new-homes/arizona/buckeye/85396/four-seasons/k.-hovnanian's-four-seasons-at-victory-at-verrado
/find-new-homes/arizona/scottsdale/85255/k-hovnanian-homes/summit-at-silverstone
/find-new-homes/arizona/scottsdale/85257/k-hovnanian-homes/skye
/find-new-homes/arizona/phoenix/85020/k-hovnanian-homes/pointe-16
/find-new-homes/arizona/peoria/85383/k-hovnanian-homes/fusion-ii-at-the-meadows
/find-new-homes/arizona/scottsdale/85257/k-hovnanian-homes/aire
/find-new-homes/arizona/scottsdale/85255/k-hovnanian-homes/pinnacle-at-silverstone
/find-new-homes/arizona/peoria/85383/k-hovnanian-homes/montage-at-the-meadows
/find-new-homes/arizona/sun-city/85373/four-seasons/k.-hovnanian-s-four-seasons-at-ventana-lakes
3 Answers

隔江千里

Contributed 1906 experience points, received 10+ likes

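You need to include both the featured-results ID and the results ID. You can combine the two with the CSS "or" syntax (a comma-separated selector list). Recent versions of bs4 support :not.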

#propertyResultsContainer .propertyWrapper :not([onclick])[href*=find], #propertyFeaturedResultsContainer  .propertyWrapper :not([onclick])[href*=find]

This can also be shortened to

#propertyResultsContainer .propertyWrapper :not([onclick])[href*=find], #propertyFeaturedResultsContainer

but the shortened form may be less robust.
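
As a rough sketch of how the combined selector behaves, the snippet below runs it against a small hand-written HTML fragment. The fragment is made up for illustration: the real khov.com markup may differ, and `nearbyResultsContainer` and the `track()` handler are hypothetical names. Only the two container IDs and the selector itself come from this answer. It uses the built-in `html.parser`, so lxml is not needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may be structured differently.
SAMPLE_HTML = """
<div id="community-search-homes">
  <div id="propertyFeaturedResultsContainer">
    <div class="propertyWrapper">
      <a href="/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills">Aspire</a>
      <a href="/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills" onclick="track()">Photo</a>
    </div>
  </div>
  <div id="propertyResultsContainer">
    <div class="propertyWrapper">
      <a href="/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/affinity-at-verrado">Affinity</a>
    </div>
  </div>
  <div id="nearbyResultsContainer">
    <div class="propertyWrapper">
      <a href="/find-new-homes/arizona/scottsdale/85257/k-hovnanian-homes/skye">Skye</a>
    </div>
  </div>
</div>
"""

# The comma acts as "or": an element matching either branch is selected.
SELECTOR = (
    "#propertyResultsContainer .propertyWrapper :not([onclick])[href*=find], "
    "#propertyFeaturedResultsContainer .propertyWrapper :not([onclick])[href*=find]"
)


def select_links(html):
    """Return the href of every element matched by SELECTOR, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag["href"] for tag in soup.select(SELECTOR)]


links = select_links(SAMPLE_HTML)
for href in links:
    print(href)
```

In this fragment the link carrying an `onclick` attribute and the link inside the third container are both skipped, so only the two plain links from the named containers are printed.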


Answered 2021-11-09
杨魅力

Contributed 1811 experience points, received 6+ likes

Would list slicing work?


def fetch_info(link):
    driver.get(link)
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#community-search-homes .propertyWrapper > a")))
    soup = BeautifulSoup(driver.page_source, "lxml")
    linklist = [item.get("href") for item in soup.select("#community-search-homes .propertyWrapper > a")][:3]
    return linklist
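
To see what the slice does in isolation, here is a small sketch that applies `[:3]` to some of the hrefs from the question's output, with no browser involved. It assumes the three wanted links always come first in document order, which held for the output shown but is not something the page guarantees; `all_links` and `first_three` are illustrative names.

```python
# hrefs copied from the question's output, in the order the script printed them
all_links = [
    "/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills",
    "/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/affinity-at-verrado",
    "/find-new-homes/arizona/buckeye/85396/four-seasons/k.-hovnanian's-four-seasons-at-victory-at-verrado",
    "/find-new-homes/arizona/scottsdale/85255/k-hovnanian-homes/summit-at-silverstone",
    "/find-new-homes/arizona/scottsdale/85257/k-hovnanian-homes/skye",
]

# [:3] keeps at most the first three items and does not raise on shorter lists.
first_three = all_links[:3]
for href in first_three:
    print(href)
```

If the site ever lists a different number of local communities, the hard-coded 3 would silently return the wrong set, so this is best treated as a quick fix.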


Answered 2021-11-09
慕婉清6462132

Contributed 1804 experience points, received 2+ likes

You can simply check each link for the required keyword and print the ones that match, ignoring the rest:


if __name__ == '__main__':
    url = "https://www.khov.com/find-new-homes/arizona/buckeye"
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 10)
    for newlink in fetch_info(url):
        # Keep a link only if it contains the last URL segment ("buckeye")
        if url.split('/')[-1] in newlink:
            print(newlink)
    driver.quit()

Output:


/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills
/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/affinity-at-verrado
/find-new-homes/arizona/buckeye/85396/four-seasons/k.-hovnanian's-four-seasons-at-victory-at-verrado
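
The substring check would also accept a link that merely contains the word `buckeye` somewhere else in its path. A slightly stricter variant, sketched below, compares each href against the path of the starting URL instead. It swaps in `urllib.parse.urlsplit` and `str.startswith`, which are not part of the original answer, and `filter_by_path_prefix` is a hypothetical helper name. The hrefs are a subset of the question's output.

```python
from urllib.parse import urlsplit


def filter_by_path_prefix(page_url, hrefs):
    """Keep only the hrefs located underneath the path of page_url."""
    # "/find-new-homes/arizona/buckeye" -> "/find-new-homes/arizona/buckeye/"
    prefix = urlsplit(page_url).path.rstrip("/") + "/"
    return [href for href in hrefs if href.startswith(prefix)]


url = "https://www.khov.com/find-new-homes/arizona/buckeye"
hrefs = [
    "/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills",
    "/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/affinity-at-verrado",
    "/find-new-homes/arizona/buckeye/85396/four-seasons/k.-hovnanian's-four-seasons-at-victory-at-verrado",
    "/find-new-homes/arizona/scottsdale/85257/k-hovnanian-homes/skye",
    "/find-new-homes/arizona/phoenix/85020/k-hovnanian-homes/pointe-16",
    "/find-new-homes/arizona/sun-city/85373/four-seasons/k.-hovnanian-s-four-seasons-at-ventana-lakes",
]

relevant = filter_by_path_prefix(url, hrefs)
for href in relevant:
    print(href)
```

Because the filter works on plain strings, it can be applied to whatever `fetch_info` returns without changing the selenium part of the script.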



Answered 2021-11-09