
How to loop over a set of classes with an XPath or CSS selector

猛跑小猪 2023-01-04 10:23:14
I want to iterate over the elements on this website: https://www.dccomics.com/comics

At the bottom of the page there is a section for browsing comics, and I want to scrape the name of each comic.

This is my current code:

# imports
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# website urls
base_url = "https://www.dccomics.com/"
comics_url = "https://www.dccomics.com/comics"

# Chrome session
driver = webdriver.Chrome("C:\\laragon\\www\\Proftaak\\chromedriver.exe")
driver.get(comics_url)
driver.implicitly_wait(500)

cookies = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[4]/div[2]/div/button')
driver.execute_script("arguments[0].click();", cookies)
driver.implicitly_wait(100)

clear_filter = driver.find_element_by_class_name('clear-all-action')
driver.execute_script("arguments[0].click();", clear_filter)

array = []
for titles in driver.find_elements_by_class_name('result-title'):
    title = titles.find_element_by_xpath('/html/body/div[2]/section/section/div[2]/div/div/div/div/div[3]/div[7]/div[2]/div/div/div/div/div[3]/div[3]/div[2]/div[1]/a/p[1]').text
    array.append({'title': title,})
    print(array)

driver.quit()

I am using the XPath below:

/html/body/div[2]/section/section/div[2]/div/div/div/div/div[3]/div[7]/div[2]/div/div/div/div/div[3]/div[3]/div[2]/div[1]/a/p[1]

This works, but it only grabs the first element with the result-title CSS class, which in this case is 818. How would I iterate over every result-title element using a CSS selector or XPath?

1 Answer

喵喵时光机


Using Selenium and Python, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies:

Using CSS_SELECTOR:


driver.get('https://www.dccomics.com/comics')

print([my_elem.text for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.browse-result>a p:not(.result-date)")))])

Using XPATH:


driver.get('https://www.dccomics.com/comics')

print([my_elem.text for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class, 'browse-result')]/a//p[not(contains(@class, 'result-date'))]")))])

Console output:


['PRIMER', 'DOOMSDAY CLOCK PART 2', 'CATWOMAN #22', 'YOU BROUGHT ME THE OCEAN', 'ACTION COMICS #1022', 'BATMAN/SUPERMAN #9', 'BATMAN: GOTHAM NIGHTS #7', 'BATMAN: THE ADVENTURES CONTINUE #5', 'BIRDS OF PREY #1', 'CATWOMAN 80TH ANNIVERSARY 100-PAGE SUPER SPECTACULAR #1', 'DC GOES TO WAR', "DCEASED: HOPE AT WORLD'S END #2", 'DETECTIVE COMICS #1022', 'FAR SECTOR #6', "HARLEY QUINN: MAKE 'EM LAUGH #1", 'HOUSE OF WHISPERS #21', 'JOHN CONSTANTINE: HELLBLAZER #6', 'JUSTICE LEAGUE DARK #22', 'MARTIAN MANHUNTER: IDENTITY', 'SCOOBY-DOO, WHERE ARE YOU? #104', 'SHAZAM! #12', 'TEEN TITANS GO! TO CAMP #15', 'THE JOKER: 80 YEARS OF THE CLOWN PRINCE OF CRIME THE DELUXE EDITION', 'THE LAST GOD: TALES FROM THE BOOK OF AGES #1', 'THE TERRIFICS VOL. 3: THE GOD GAME']
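
The question was building a list of dictionaries rather than a flat list of strings. As a minimal sketch, a list of title strings like the one above could be converted as follows. The helper name to_records is hypothetical, and trimming whitespace and skipping blank or repeated titles are assumptions, not something the question asks for.

```python
def to_records(titles):
    """Convert title strings into [{'title': ...}, ...].

    Assumption: surrounding whitespace is trimmed, and blank or repeated
    titles are skipped while the original order is preserved.
    """
    seen = set()
    records = []
    for raw in titles:
        title = raw.strip()
        if not title or title in seen:
            continue
        seen.add(title)
        records.append({"title": title})
    return records


# A few titles from the console output, plus a blank and a repeat
# added here only to exercise the filtering.
sample = ["PRIMER", "DOOMSDAY CLOCK PART 2", " CATWOMAN #22 ", "", "PRIMER"]
records = to_records(sample)
print(records)
```

In the Selenium script, the argument would be the list produced by the comprehension over my_elem.text.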

Note: you have to add the following imports:


from selenium.webdriver.support.ui import WebDriverWait

from selenium.webdriver.common.by import By

from selenium.webdriver.support import expected_conditions as EC
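
As for why the loop in the question kept returning the same title: an XPath that starts with / is evaluated from the document root, even when find_element is called on an element, so every pass resolves to the same node. In Selenium the usual remedy is a path relative to the current element, starting with a dot (for example .//p). The sketch below illustrates the same idea offline with the standard library's ElementTree instead of a browser. The markup is a hypothetical, simplified stand-in for the page; only the class names come from the selectors in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page structure is not reproduced.
SAMPLE = """
<html><body>
  <div class="browse-result"><a><p class="result-title">PRIMER</p><p class="result-date">date</p></a></div>
  <div class="browse-result"><a><p class="result-title">CATWOMAN #22</p><p class="result-date">date</p></a></div>
  <div class="browse-result"><a><p class="result-title">FAR SECTOR #6</p><p class="result-date">date</p></a></div>
</body></html>
"""

root = ET.fromstring(SAMPLE)
results = root.findall(".//div[@class='browse-result']")

# Mistake: the inner lookup starts from the document root on every pass,
# so it keeps returning the first matching element.
from_root = [root.find(".//p[@class='result-title']").text for _ in results]

# Fix: the inner lookup is relative to the element being iterated.
relative = [div.find("./a/p[@class='result-title']").text for div in results]

print(from_root)  # ['PRIMER', 'PRIMER', 'PRIMER']
print(relative)   # ['PRIMER', 'CATWOMAN #22', 'FAR SECTOR #6']
```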


Answered 2023-01-04