Python scraper always finds a difference between the previous and current version of a website, even though there is none

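I'm writing a bot that checks whether anything on a given website has changed since it was last scraped. To do that, it scrapes the site, stores its HTML in a local file, and then scrapes it over and over; if the new version differs from the stored one, it overwrites the local file and prints "Triggered". The problem is that my script always finds a difference and overwrites the file, even when nothing has changed. Reproducible example:

    import requests
    import time
    import os

    def compare(file, url):
        if os.path.isfile("./" + file):
            # A stored copy exists: fetch the page and compare it with the file
            scrape = requests.get(url).text
            with open(file) as f:
                txt = f.read()
            if not txt == scrape:
                with open(file, "w") as f:
                    f.write(scrape)
                print("Triggered")
        else:
            # First run: store the page without triggering
            scrape = requests.get(url).text
            with open(file, "w") as f:
                f.write(scrape)

    ceu = "https://hro.ceu.edu/find-job"
    ceu_file = "ceu.html"

    while True:
        compare(ceu_file, ceu)  # arguments in (file, url) order
        time.sleep(10)

So the script is triggered on every scrape, even though the site does not change every 10 seconds. Why is txt == scrape always false inside the function, triggering the script?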
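One possible cause, offered as an assumption since this page does not include the answer: the file is written and read in text mode, and Python's default newline handling converts "\r\n" to "\n" on read. If the served HTML contains "\r\n" line endings, the text read back from disk will never equal the freshly fetched text. The sketch below first reproduces that effect without any network access, then shows one way around it by storing and comparing raw bytes. The helper names text_roundtrip_differs and update_snapshot are hypothetical and not part of the original script.

```python
import hashlib
from pathlib import Path


def text_roundtrip_differs(path: Path, text: str) -> bool:
    """Write text in text mode, read it back, and report whether it changed."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        # Default newline handling turns "\r\n" into "\n" when reading
        return f.read() != text


def update_snapshot(path: Path, content: bytes) -> bool:
    """Store content as raw bytes.

    Returns True only when a stored copy exists and differs from content.
    The first run just saves the page and returns False, as in the original script.
    """
    if path.is_file():
        old_digest = hashlib.sha256(path.read_bytes()).hexdigest()
        new_digest = hashlib.sha256(content).hexdigest()
        if old_digest == new_digest:
            return False
        path.write_bytes(content)
        return True
    path.write_bytes(content)  # nothing to compare against yet
    return False
```

In the polling loop, the bytes would come from requests.get(url).content instead of .text, with "Triggered" printed whenever update_snapshot returns True. If the script still triggers on every pass after this change, the page itself probably embeds something that varies per request (for example a session token or timestamp), and the comparison would have to be limited to the relevant part of the HTML.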
