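2 Answers
Contributed 1816 experience points, received over 4 likes
You need to change the parser to lxml and send a browser-like header such as headers = {'user-agent': 'Mozilla/5.0'}.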
import bs4
import requests

def getAmazonPrice(productUrl):
    # Send a browser-like user agent so the server is less likely to treat the request as a bot
    headers = {'user-agent': 'Mozilla/5.0'}
    res = requests.get(productUrl, headers=headers)
    res.raise_for_status()  # raise an exception on 4xx/5xx responses
    soup = bs4.BeautifulSoup(res.text, 'lxml')
    # select_one returns the first matching element
    elems = soup.select_one('#mediaNoAccordion > div.a-row > div.a-column.a-span4.a-text-right.a-span-last')
    return elems.text.strip()

price = getAmazonPrice('https://www.amazon.com/Automate-Boring-Stuff-Python-2nd-ebook/dp/B07VSXS4NK/ref=sr_1_1?crid=30NW5VCV06ZMP&dchild=1&keywords=automate+the+boring+stuff+with+python&qid=1586810720&sprefix=automate+the+bo%2Caps%2C288&sr=8-1')
print('The price is ' + price)
If you want to use select instead, take the first element of the list it returns:
import bs4
import requests

def getAmazonPrice(productUrl):
    # Send a browser-like user agent so the server is less likely to treat the request as a bot
    headers = {'user-agent': 'Mozilla/5.0'}
    res = requests.get(productUrl, headers=headers)
    res.raise_for_status()  # raise an exception on 4xx/5xx responses
    soup = bs4.BeautifulSoup(res.text, 'lxml')
    # select returns a list of all matching elements
    elems = soup.select('#mediaNoAccordion > div.a-row > div.a-column.a-span4.a-text-right.a-span-last')
    return elems[0].text.strip()

price = getAmazonPrice('https://www.amazon.com/Automate-Boring-Stuff-Python-2nd-ebook/dp/B07VSXS4NK/ref=sr_1_1?crid=30NW5VCV06ZMP&dchild=1&keywords=automate+the+boring+stuff+with+python&qid=1586810720&sprefix=automate+the+bo%2Caps%2C288&sr=8-1')
print('The price is ' + price)
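Both variants assume the selector matches something. If Amazon serves a different layout or a bot-check page, select_one returns None and select returns an empty list, so .text or [0] raises an exception. A minimal sketch of a guard follows. The helper name extract_price and the sample HTML are made up for illustration, and the parser is passed as a parameter only so the sketch can run offline with Python's built-in html.parser.

```python
import bs4

# Selector copied from the answer; Amazon's markup may change at any time.
PRICE_SELECTOR = ('#mediaNoAccordion > div.a-row > '
                  'div.a-column.a-span4.a-text-right.a-span-last')


def extract_price(html, parser='lxml'):
    """Return the price text, or None if the selector matches nothing.

    Hypothetical helper for illustration; not part of the original answer.
    """
    soup = bs4.BeautifulSoup(html, parser)
    elem = soup.select_one(PRICE_SELECTOR)  # None when there is no match
    if elem is None:
        return None
    return elem.text.strip()


# Made-up HTML that mimics the structure the selector expects.
sample = '''
<div id="mediaNoAccordion">
  <div class="a-row">
    <div class="a-column a-span4 a-text-right a-span-last"> $17.49 </div>
  </div>
</div>
'''
print(extract_price(sample, parser='html.parser'))
print(extract_price('<html><body>Robot check</body></html>', parser='html.parser'))
```

Returning None lets the caller decide whether to retry, log the page, or give up, instead of failing with an AttributeError or IndexError inside the function.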
Try this version, which sends a full browser user-agent string:
import bs4
import requests

def getAmazonPrice(productUrl):
    # Send a full Firefox user agent so the server is less likely to treat the request as a bot
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0'}
    res = requests.get(productUrl, headers=headers)
    res.raise_for_status()  # raise an exception on 4xx/5xx responses
    soup = bs4.BeautifulSoup(res.text, 'lxml')
    elems = soup.select('#mediaNoAccordion > div.a-row > div.a-column.a-span4.a-text-right.a-span-last')
    return elems[0].text.strip()

price = getAmazonPrice('https://www.amazon.com/Automate-Boring-Stuff-Python-2nd-ebook/dp/B07VSXS4NK/ref=sr_1_1?crid=30NW5VCV06ZMP&dchild=1&keywords=automate+the+boring+stuff+with+python&qid=1586810720&sprefix=automate+the+bo%2Caps%2C288&sr=8-1')
print('The price is ' + price)
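Contributed 1807 experience points, received over 9 likes
Your request triggers a 503 error from Amazon, probably because of Amazon's anti-scraping measures. You may want to consider a different approach.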
import requests

# Send a full Firefox user agent so the server is less likely to treat the request as a bot
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0'}
productUrl = 'https://www.amazon.com/Automate-Boring-Stuff-Python-2nd-ebook/dp/B07VSXS4NK/ref=sr_1_1?crid=30NW5VCV06ZMP&dchild=1&keywords=automate+the+boring+stuff+with+python&qid=1586810720&sprefix=automate+the+bo%2Caps%2C288&sr=8-1'
res = requests.get(productUrl, headers=headers)
print(res)  # prints the response object, including its status code
Output:
<Response [503]>
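Since the server may answer 503 on some requests and not others, one option is to check the status code and retry after a growing delay. The sketch below is only an illustration under assumptions: fetch_with_retry and FakeResponse are hypothetical names, the fetch argument stands in for something like lambda: requests.get(productUrl, headers=headers), and retrying does not guarantee that Amazon will eventually serve the page.

```python
import time


def fetch_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-503 response or attempts run out.

    fetch is any zero-argument callable returning an object with a
    status_code attribute. Hypothetical helper for illustration.
    """
    response = None
    for attempt in range(attempts):
        response = fetch()
        if response.status_code != 503:
            return response
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return response  # still 503 after all attempts


class FakeResponse:
    """Stand-in for requests.Response so the sketch runs offline."""

    def __init__(self, status_code):
        self.status_code = status_code


# Simulate a server that answers 503 twice and then 200.
codes = iter([503, 503, 200])
result = fetch_with_retry(lambda: FakeResponse(next(codes)),
                          sleep=lambda seconds: None)  # skip real waiting
print(result.status_code)
```

Passing the fetch step in as a callable keeps the retry logic independent of requests, so it can be exercised without network access.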