2 Answers
Answer 1
The images are loaded dynamically by JavaScript, so you have to use Selenium to scrape them. Here is the complete code:
```python
import time

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.pokemon.com/us/pokedex/')
time.sleep(4)  # wait for the JavaScript to render the Pokémon list

# Each Pokémon card is an element with class "animating"; the last 3 matches are skipped.
li_tags = driver.find_elements(By.CLASS_NAME, 'animating')[:-3]
for li in li_tags:
    img_link = li.find_element(By.XPATH, './/img').get_attribute('src')
    # The name is in the <div><h5> inside the same <li>, so a relative XPath is enough.
    name = li.find_element(By.XPATH, './div/h5').text
    r = requests.get(img_link, timeout=30)
    with open(f"D:\\{name}.png", "wb") as f:
        f.write(r.content)

driver.quit()
```
Output: 12 Pokémon images, one PNG file per Pokémon.
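One thing to watch for: the script uses each Pokémon's display name as the filename, and some names (for example "Type: Null") contain characters that Windows does not allow in filenames. A minimal sketch of a hypothetical `safe_filename` helper (not part of the original answer) that replaces the characters Windows reserves, `<>:"/\|?*` plus control characters:

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(name: str, replacement: str = "_") -> str:
    """Return `name` with characters that are illegal in Windows filenames replaced."""
    # Windows also silently drops trailing dots and spaces, so strip them.
    cleaned = _INVALID.sub(replacement, name).strip().rstrip(".")
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or "unnamed"
```

In the scraping loop you would then write `open(f"D:\\{safe_filename(name)}.png", "wb")` instead of using `name` directly.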
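Also, I noticed there is a "Load more" button at the bottom of the page. Clicking it loads more images, and after that you have to keep scrolling down for further images to load. If I remember correctly, there are 893 images on the site in total. To scrape all 893 images, you can use the following code: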
```python
import time

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.pokemon.com/us/pokedex/')
time.sleep(3)

# Click the "Load more" button through JavaScript.
load_more = driver.find_element(By.ID, 'loadMore')
driver.execute_script("arguments[0].click();", load_more)

# Keep scrolling to the bottom until the page height stops growing,
# i.e. no more Pokémon are being loaded.
SCROLL_JS = ("window.scrollTo(0, document.body.scrollHeight);"
             "return document.body.scrollHeight;")
len_of_page = driver.execute_script(SCROLL_JS)
while True:
    last_count = len_of_page
    time.sleep(1.5)  # give the page time to load the next batch
    len_of_page = driver.execute_script(SCROLL_JS)
    if last_count == len_of_page:
        break

li_tags = driver.find_elements(By.CLASS_NAME, 'animating')[:-3]
for li in li_tags:
    img_link = li.find_element(By.XPATH, './/img').get_attribute('src')
    name = li.find_element(By.XPATH, './div/h5').text
    r = requests.get(img_link, timeout=30)
    with open(f"D:\\{name}.png", "wb") as f:
        f.write(r.content)

driver.quit()
```
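The scrolling loop above is a common "scroll until the page stops growing" pattern: scroll to the bottom, wait, and stop once `document.body.scrollHeight` no longer changes. A minimal sketch that factors it into a function taking a `scroll_and_measure` callable (a hypothetical name) so it can be tested without a browser; `max_rounds` is an added safety cap that the original loop does not have:

```python
import time
from typing import Callable


def scroll_until_stable(
    scroll_and_measure: Callable[[], int],
    pause: float = 1.5,
    max_rounds: int = 200,
) -> int:
    """Repeatedly scroll to the bottom until the page height stops changing.

    `scroll_and_measure` must scroll to the bottom and return the new page
    height. Returns the final height. `max_rounds` guards against pages that
    never stop growing.
    """
    height = scroll_and_measure()
    for _ in range(max_rounds):
        time.sleep(pause)  # give the page time to load the next batch
        new_height = scroll_and_measure()
        if new_height == height:
            break  # nothing new was loaded, so we reached the end
        height = new_height
    return height
```

With Selenium, the callable can be `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;")`.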
Answer 2
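This is much easier if you first check the Network tab in your browser's developer tools: the page gets its data from a JSON API, so you can skip Selenium and request that endpoint directly: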
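```python
import time

import requests

endpoint = "https://www.pokemon.com/us/api/pokedex/kalos"
# The JSON response contains the metadata for every Pokémon.
data = requests.get(endpoint, timeout=30).json()

# Keep only the fields needed to save each picture.
items = [{"name": item["name"], "link": item["ThumbnailImage"]} for item in data]

# Remove duplicate entries (dicts are unhashable, so convert each one to a tuple first).
unique = [dict(t) for t in {tuple(item.items()) for item in items}]
assert len(unique) == 893  # total at the time of writing

for pokemon in unique:
    response = requests.get(pokemon["link"], timeout=30)
    time.sleep(1)  # pause between downloads to avoid hammering the server
    with open(f"{pokemon['name']}.png", "wb") as f:
        f.write(response.content)
```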
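The set-of-tuples trick above removes exact duplicates, but a set has no defined order, so the files are downloaded in arbitrary order. An order-preserving alternative, sketched under the assumption that each API entry carries the `name` and `ThumbnailImage` keys used above (`unique_thumbnails` is a hypothetical helper name):

```python
from typing import Iterable


def unique_thumbnails(entries: Iterable[dict]) -> list[dict]:
    """Return one {"name", "link"} dict per distinct (name, thumbnail) pair, in input order."""
    seen: set[tuple[str, str]] = set()
    result: list[dict] = []
    for entry in entries:
        key = (entry["name"], entry["ThumbnailImage"])
        if key not in seen:  # first time we see this pair: keep it
            seen.add(key)
            result.append({"name": key[0], "link": key[1]})
    return result
```

Calling `unique_thumbnails(data)` replaces both the `items` list comprehension and the deduplication step.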