
Scraping requests.

largeQ 2021-12-09 15:57:56
What is wrong with my code? I am trying to fetch the same content as https://koleo.pl/rozklad-pkp/krakow-glowny/radom/19-03-2019_10:00/all/EIP-IC--EIC-EIP-IC-KM-REG, but the result is not what I want.

import requests
from bs4 import BeautifulSoup

s = requests.Session()
s.headers.update({"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36'})
response = s.get('https://koleo.pl/rozklad-pkp/krakow-glowny/radom/19-03-2019_10:00/all/EIP-IC--EIC-EIP-IC-KM-REG')
soup = BeautifulSoup(response.text, 'lxml')
print(soup.prettify())

2 Answers

宝慕林4294392


You can use requests and pass in params to get the JSON for both the train information and the prices. I haven't parsed out all of the information, since this is just to show you that it is possible. I do parse out the train IDs so that I can make follow-up requests for the price information, which is linked to the train information by ID.


import requests
from bs4 import BeautifulSoup as bs

url = 'https://koleo.pl/pl/connections/?'

# Cookie and X-CSRF-Token are session-specific values copied from the browser's dev tools
headers = {
    'Accept' : 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding' : 'gzip, deflate, br',
    'Accept-Language' : 'en-US,en;q=0.9',
    'Connection' : 'keep-alive',
    'Cookie' : '_ga=GA1.2.2048035736.1553000429; _gid=GA1.2.600745193.1553000429; _gat=1; _koleo_session=bkN4dWRrZGx0UnkyZ3hjMWpFNGhiS1I3TzhQMGNyWitvZlZ0QVRUVVVtWUFPMUwxL0hJYWJyYnlGTUdHYXNuL1N6QlhHMHlRZFM3eFZFcjRuK3ZubllmMjdSaU5CMWRBSTFOc1JRc2lDUGV0Y2NtTjRzbzZEd0laZWI1bjJoK1UrYnc5NWNzZzNJdXVtUlpnVE15QnRnPT0tLTc1YzV1Q2xoRHF4VFpWWTdWZDJXUnc9PQ%3D%3D--3b5fe9bb7b0ce5960bc5bd6a00bf405df87f8bd4',
    'Host' : 'koleo.pl',
    'Referer' : 'https://koleo.pl/rozklad-pkp/krakow-glowny/radom/19-03-2019_10:00/all/EIP-IC--EIC-EIP-IC-KM-REG',
    'User-Agent' : 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36',
    'X-CSRF-Token' : 'heag3Y5/fh0hyOfgdmSGJBmdJR3Perle2vJI0VjB81KClATLsJxFAO4SO9bY6Ag8h6IkpFieW1mtZbD4mga7ZQ==',
    'X-Requested-With' : 'XMLHttpRequest'
}

params = {
    'v' : 'a0dec240d8d016fbfca9b552898aba9c38fc19d5',
    'query[date]' : '19-03-2019 10:00:00',
    'query[start_station]' : 'krakow-glowny',
    'query[end_station]': 'radom',
    # pass both brand ids as a list; a dict cannot hold the same key twice
    'query[brand_ids][]' : ['29', '28'],
    'query[only_direct]' : 'false',
    'query[only_purchasable]': 'false'
}


with requests.Session() as s:
    data = s.get(url, params = params, headers = headers).json()
    print(data)
    priceUrl = 'https://koleo.pl/pl/prices/{}?v=a0dec240d8d016fbfca9b552898aba9c38fc19d5'
    for item in data['connections']:
        # the price endpoint is keyed by the connection (train) id
        r = s.get(priceUrl.format(item['id'])).json()
        print(r)
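
The hard-coded Cookie and X-CSRF-Token above expire with the browser session. Below is a minimal sketch of one way to refresh them inside the same requests session, assuming koleo.pl follows the common Rails convention of embedding the token in a <meta name="csrf-token"> tag on the timetable page (that tag name is an assumption; verify it in the page source):

import requests
from bs4 import BeautifulSoup

page_url = 'https://koleo.pl/rozklad-pkp/krakow-glowny/radom/19-03-2019_10:00/all/EIP-IC--EIC-EIP-IC-KM-REG'

with requests.Session() as s:
    # this first request stores the _koleo_session cookie on the session automatically
    html = s.get(page_url, headers={'User-Agent': 'Mozilla/5.0'}).text
    soup = BeautifulSoup(html, 'lxml')
    # assumption: the CSRF token sits in a standard <meta name="csrf-token" content="..."> tag
    meta = soup.find('meta', {'name': 'csrf-token'})
    if meta is not None:
        # reuse the freshly scraped token for the JSON endpoint instead of hard-coding it
        api_headers = {'X-CSRF-Token': meta['content'], 'X-Requested-With': 'XMLHttpRequest'}
        print(api_headers)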


喵喔喔


You have to use selenium to get the dynamically generated content; then you can parse the HTML with BeautifulSoup. For example, I parsed out the dates:


from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://koleo.pl/rozklad-pkp/krakow-glowny/radom/19-03-2019_10:00/all/EIP-IC--EIC-EIP-IC-KM-REG')
soup = BeautifulSoup(driver.page_source, 'lxml')
for div in soup.findAll("div", {"class": 'date custom-panel'}):
    date = div.findAll("div", {"class": 'row'})[0].string.strip()
    print(date)

Output:


wtorek, 19 marca

środa, 20 marca
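
One caveat with reading driver.page_source immediately after get(): if the connection list is injected by JavaScript after the initial load, the parse can run too early. Here is a minimal sketch of waiting for the date panels first, using Selenium's explicit waits (the 10-second timeout is an arbitrary choice):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get('https://koleo.pl/rozklad-pkp/krakow-glowny/radom/19-03-2019_10:00/all/EIP-IC--EIC-EIP-IC-KM-REG')
# block until at least one date panel is present, or raise TimeoutException after 10 s
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'div.date.custom-panel'))
)
soup = BeautifulSoup(driver.page_source, 'lxml')
for div in soup.findAll("div", {"class": 'date custom-panel'}):
    print(div.findAll("div", {"class": 'row'})[0].string.strip())
driver.quit()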

