How to scroll down to the end on Instagram

Python

MYYA 2022-08-16 17:44:22

I'm trying to scrape post URLs from Instagram based on the hashtag "foody". Using Selenium and BeautifulSoup, I can scrape the URLs of about 2,160 posts, but I can't get past that point (there are more than 4,000,000 posts). Is there any other way to scrape all the posts with the "foody" hashtag, or at least the URLs of posts published between 2018 and 2019? Here is my scraping code. Thanks!

```python
import time

from bs4 import BeautifulSoup
from selenium import webdriver

instagram_url = "https://www.instagram.com"
tag_url = "https://www.instagram.com/explore/tags"
ads = "foody"  # hashtag

# pause time between scrolls, in seconds
pause_time = 2

# collected post links (this list was not initialised in the posted snippet)
reallink = []

# driver
driver = webdriver.Chrome("chromedriver.exe")

# go to hashtag page
driver.get(f"{tag_url}/{ads}")
time.sleep(pause_time)

# scroll down once and remember the page height
lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
match = False
i = 0
while not match:
    # collect the post URLs currently present in the page
    html = driver.page_source
    bs_html = BeautifulSoup(html, "lxml")
    for roots in bs_html.find_all(name="div", attrs={"class": "Nnq7C weEfm"}):
        for link in roots.select("a"):
            real = link.attrs["href"]
            if real not in reallink:
                reallink.append(real)
    print("appended data: ", len(reallink))

    # scroll down again; stop when the page height no longer changes
    lastCount = lenOfPage
    print(f"scrolling down {i}")
    i += 1
    time.sleep(pause_time)
    lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
    if lastCount == lenOfPage:
        match = True
```

2 Answers

交互式爱情

Contributed 1712 experience points, received more than 3 likes

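Try the "Social Scroll" extension for Instagram (I know it's really basic, but it worked for me). As Alvaro Bataller said, if you write a script to scroll down, then after scrolling past a number of posts Instagram will automatically block you for a while, on the assumption that you may be a bot.

This extension, however, has a built-in cooldown system that pauses the scrolling so that Instagram doesn't mistake you for a bot. It can easily reach the end of the feed without being temporarily blocked by Instagram.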
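The cooldown idea described above can also be applied to a scripted scroll. The following is only a minimal sketch, not how the extension actually works: the function name `scroll_with_cooldown`, the `scroll_once` callback, and all default values (`burst_size`, `short_pause`, `cooldown`, `max_stalls`) are assumptions made for illustration, and nothing here guarantees that Instagram will not still block the session.

```python
import random
import time
from typing import Callable


def scroll_with_cooldown(
    scroll_once: Callable[[], int],
    burst_size: int = 10,
    short_pause: float = 2.0,
    cooldown: float = 60.0,
    max_stalls: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scroll until the page height stops growing; return the number of scrolls.

    scroll_once is a hypothetical caller-supplied function that scrolls once
    and returns the new page height. All default values are illustrative
    guesses, not limits published by Instagram.
    """
    scrolls = 0
    stalls = 0
    last_height = -1
    while True:
        height = scroll_once()
        scrolls += 1
        # Count consecutive scrolls that did not change the page height.
        stalls = stalls + 1 if height == last_height else 0
        last_height = height
        if stalls >= max_stalls:
            return scrolls
        if scrolls % burst_size == 0:
            # Long cooldown after every burst of scrolls.
            sleep(cooldown)
        else:
            # Short, slightly randomised pause between ordinary scrolls.
            sleep(short_pause + random.uniform(0, 1))
```

With the Selenium driver from the question, `scroll_once` could be `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;")`. Requiring several unchanged heights in a row before stopping, rather than one as in the question's loop, makes the script less likely to quit just because a batch of posts loaded slowly.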

2022-08-16

红颜莎娜

Contributed 1842 experience points, received more than 12 likes

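Using JavaScript, I was able to scroll down through 3176 images, going back 2 years and 4 months.

In total I found 3166 images. After that, the page showed "Couldn't load".

I tried repeating the experiment, and now it seems it won't let me scroll down as far.

My guess is that Instagram places some kind of limit on how much you can scrape, so that people don't abuse their servers.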
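If the limit is only temporary, one thing a script can try is waiting longer and longer before each retry once new posts stop loading. The sketch below illustrates that exponential backoff idea under stated assumptions: `scroll_with_backoff`, the `scroll_once` callback, and the delay values are all hypothetical, and whether Instagram ever lifts the limit within a session is not something this answer establishes.

```python
import time
from typing import Callable


def scroll_with_backoff(
    scroll_once: Callable[[], int],
    base_delay: float = 5.0,
    max_delay: float = 300.0,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Keep scrolling; when the page stops growing, wait longer on each retry.

    scroll_once is a hypothetical caller-supplied function that scrolls once
    and returns the new page height. Returns the last observed page height
    after max_retries consecutive attempts have loaded nothing new.
    """
    last_height = scroll_once()
    retries = 0
    while retries < max_retries:
        # Double the wait after each consecutive failure, capped at max_delay.
        sleep(min(base_delay * (2 ** retries), max_delay))
        height = scroll_once()
        if height > last_height:
            # New content loaded: record it and reset the backoff.
            last_height = height
            retries = 0
        else:
            retries += 1
    return last_height
```

Resetting the retry counter whenever the page grows keeps the delay short while content is still loading and only stretches it when the feed appears stuck.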

2022-08-16
