The script below scrapes the first page of results successfully, but it fails before it can move on to the second page. Note that I do not want to use Selenium for this.

    import requests
    from bs4 import BeautifulSoup

    url = 'https://google.com/search?q=In+order+to&hl=en'
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

    page = 1
    while True:
        print()
        print('Page {}...'.format(page))
        print('-' * 80)

        soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
        for h in soup.select('h3'):
            print(h.get_text(strip=True))

        next_link = soup.select_one('a:contains("Next")')
        if not next_link:
            break

        url = 'https://google.com' + next_link['href']
        page += 1

Result:

    Page 1...
    --------------------------------------------------------------------------------
    In order to Synonyms, In order to Antonyms | Thesaurus.com
    In order to - English Grammar Today - Cambridge Dictionary
    in order to - Wiktionary
    What is another word for "in order to"? - WordHippo
    In Order For (someone or something) To | Definition of In ...
    In Order For | Definition of In Order For by Merriam-Webster
    In order to definition and meaning | Collins English Dictionary
    Using "in order to" in English - English Study Page
    IN ORDER (FOR SOMEONE / SOMETHING ) TO DO ...
    262 In Order To synonyms - Other Words for In Order To
    Searches related to In order to
    Only the following pseudo-classes are implemented: nth-of-type.

The error is raised on this line:

    next_link = soup.select_one('a:contains("Next")')
1 Answer
MMMHUHU
You can use lxml as the parser instead of html.parser.

Install it with:

    pip install lxml

    import requests
    from bs4 import BeautifulSoup

    url = 'https://google.com/search?q=In+order+to&hl=en'
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

    page = 1
    while True:
        print()
        print('Page {}...'.format(page))
        print('-' * 80)

        # Parse the response with lxml instead of html.parser
        soup = BeautifulSoup(requests.get(url, headers=headers).content, 'lxml')

        # Print the title of every result on this page
        for h in soup.select('h3'):
            print(h.get_text(strip=True))

        # Stop when the page has no "Next" link
        next_link = soup.select_one('a:contains("Next")')
        if not next_link:
            break

        # The href is relative, so prepend the site root
        url = 'https://google.com' + next_link['href']
        page += 1
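
If the `:contains` selector is still rejected after switching parsers, one possible workaround is to skip selector pseudo-classes altogether and match the link text in plain Python. The message in the question appears to come from older Beautiful Soup releases whose built-in selector engine supported only `nth-of-type`, and newer soupsieve releases prefer the spelling `:-soup-contains`. The sketch below is only an illustration under those assumptions: the helper name `find_next_href`, its parameters, and the sample HTML are hypothetical, and Google's real markup may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_next_href(soup, base_url='https://google.com', label='Next'):
    """Return the absolute URL of the first <a> whose text contains `label`, or None.

    Hypothetical helper: it avoids CSS pseudo-classes, so it does not depend
    on which selector engine the installed Beautiful Soup version uses.
    """
    for a in soup.find_all('a', href=True):
        # get_text() also collects text from nested tags such as <span>
        if label in a.get_text():
            return urljoin(base_url, a['href'])
    return None


# Made-up markup standing in for a results page (an assumption, not Google's HTML)
sample_html = '''
<a href="/search?start=0">Previous</a>
<a href="/search?start=10"><span></span><span>Next</span></a>
'''

soup = BeautifulSoup(sample_html, 'html.parser')
next_url = find_next_href(soup)
print(next_url)
```

In the loop above, `next_link = soup.select_one(...)` and the line that builds `url` could then be replaced by `url = find_next_href(soup)`, breaking out of the loop when it returns `None`.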