1 Answer
If you use the requests package and add a User-Agent to the headers, all 4 links appear to return a 200 response. So try adding a User-Agent header:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}
import requests
from bs4 import BeautifulSoup as soup
# create urls
url1 = 'https://en.titolo.ch/sale'
url2 = 'https://en.titolo.ch/sale?limit=108'
url3 = 'https://en.titolo.ch/sale?category_styles=29838_21212'
url4 = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}
url_list = [url1, url2, url3, url4]
for url in url_list:
    # open a connection to each URL and fetch the page
    response = requests.get(url, headers=headers)
    print(response.status_code)
Output:
200
200
200
200
So, to fetch the HTML of a single page:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}
url = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'
r = requests.get(url, headers=headers)
html = r.text
print(html)
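If you want the request to fail loudly instead of silently printing an error page, here is a minimal sketch of the same idea wrapped in a function. The name fetch_html, the HEADERS constant, the optional session parameter, and the 10-second timeout are my own assumptions for illustration, not something from the original question; raise_for_status() and the timeout argument are standard parts of requests.

```python
import requests

# Same browser-like User-Agent as in the answer above.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/72.0.3626.121 Safari/537.36'
}


def fetch_html(url, session=None, timeout=10):
    """Return the page HTML, raising an exception on a 4xx/5xx response.

    `session` is optional (hypothetical parameter): pass a requests.Session
    to reuse one connection across several URLs.
    """
    client = session if session is not None else requests
    response = client.get(url, headers=HEADERS, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx status codes.
    response.raise_for_status()
    return response.text


if __name__ == '__main__':
    html = fetch_html('https://en.titolo.ch/sale?category_styles=31066&limit=108')
    print(html[:500])  # print only the first 500 characters
```

Whether the site keeps answering 200 with this header may change over time, so treat the call in the main block as something to try rather than a guaranteed result.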