
Using Selenium and BeautifulSoup, how do I extract a JavaScript variable?


临摹微笑 2022-06-07 18:54:41
URL: 'https://gma-threads4thought.com/'

I have successfully scraped the product titles and prices with BeautifulSoup, but the quantity variable sits behind JavaScript. So my question is: how can I extract product.variants[0].inventory_quantity? The goal of this project is to organize the data as XML, and this JavaScript variable is the only thing blocking me. Any help is greatly appreciated!

import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import pandas as pd

url = 'https://gma-threads4thought.com/'
driver = webdriver.Chrome()
driver.get(url)
driver.execute_script("window.scrollTo(0,document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
time.sleep(5)
#driver.quit()
results = driver.find_elements_by_xpath("//*[@id='product-actions-4102174081087']")
print('Number of results', len(results))
data = []
product = driver.execute_script('$(function() { return product.variants[0].inventory_quantity;});')

2 Answers

大话西游666

1817 contribution points · over 14 upvotes

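Here is another idea, and unlike the other two answers I can explain why your JavaScript doesn't work.

First, your script never returns a value to Selenium: the return sits inside the $(function() { ... }) callback, so execute_script itself returns nothing and you always get None. Second, the product variable is local to a scope you cannot reach from outside. So we have to run the page's script again ourselves.

First, get the script: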

script = driver.execute_script('return document.querySelector("#product-actions-4102174081087 script").innerText')

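Then strip the wrapper that puts it out of scope (this needs import re; the \s* tolerates a space before the opening brace):

modified = re.sub(r'\$\(function\(\)\s*\{|\}\);', '', script).strip()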

Then run it, with a return appended at the end:

variants = driver.execute_script(modified + ';return product.variants')
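
The three steps above can be combined into one helper. This is only a minimal sketch: it assumes the inline script is wrapped in $(function() { ... }); as in the question, and that driver is a Selenium WebDriver with the page already loaded. The names strip_ready_wrapper and fetch_variants are hypothetical, introduced here for illustration.

```python
import re

# Opening of a jQuery document-ready wrapper, tolerating optional whitespace.
_OPENER = re.compile(r'^\s*\$\(\s*function\s*\(\s*\)\s*\{')
# The matching "});" at the very end of the script text.
_CLOSER = re.compile(r'\}\s*\)\s*;?\s*$')


def strip_ready_wrapper(script):
    """Return the body of `$(function() { ... });`, or the text unchanged
    if it is not wrapped that way."""
    text = script.strip()
    opener = _OPENER.match(text)
    closer = _CLOSER.search(text)
    if not (opener and closer):
        return text
    return text[opener.end():closer.start()].strip()


def fetch_variants(driver, container_id):
    """Re-run the inline script at top level and return product.variants.

    `container_id` is the id of the element holding the script, e.g. the
    'product-actions-...' id used in the question (an assumption here).
    """
    script = driver.execute_script(
        'return document.querySelector(arguments[0]).innerText',
        f'#{container_id} script',
    )
    body = strip_ready_wrapper(script)
    # The trailing top-level `return` is what hands the value back to Python.
    return driver.execute_script(body + ';return product.variants')
```

Anchoring the opener and closer to the start and end of the text avoids stripping an unrelated }); from the middle of the script, which a global substitution could do.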


Answered 2022-06-07
繁花如伊

2012 contribution points · over 12 upvotes

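The HTML used in the other answer does not need JavaScript to be loaded: the product data is already in the page source, so it can be fetched with a simple request using the Requests library (pip install requests). All I had to do was parse out the quantities (similar to the other answer). I also noticed that each size has its own quantity, so I included the sizes in the final output.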


Code

import json
import re

import requests
from bs4 import BeautifulSoup

# The product data is embedded in the static HTML, so a plain GET is enough.
r = requests.get('https://gma-threads4thought.com/')
soup = BeautifulSoup(r.text, 'html.parser')

all_products = {}
for product in soup.find_all('div', class_='grid__item'):
    script_tag = product.find('script')
    if script_tag is None:
        continue  # grid item without an inline product script
    item_name, sizes, all_quantities = None, [], []
    for section in script_tag.text.split('\n'):
        section_rw = section.strip()
        if section_rw.startswith('var product = '):
            # The product object is a JSON literal terminated by "};".
            json_data = json.loads(re.search(r'(\{.*?\});', section_rw).group(1))
            item_name = json_data['title']
            sizes = [size['title'] for size in json_data['variants']]
        if section_rw.startswith('product.variants'):
            # Quantity line: take the number just before the semicolon.
            quant = re.search(r'(\d+);', section_rw).group(1)
            all_quantities.append(quant)
    if item_name is not None:
        # Pair each size with its quantity, in variant order.
        all_products[item_name] = dict(zip(sizes, all_quantities))

print(all_products)

Output

{'Malana Sports Bra - Raw Denim': {'XS': '40', 'S': '149', 'M': '195', 'L': '144', 'XL': '248', '1X': '197', '2X': '139', '3X': '89'}, 'Moto Skinny Legging - Raw Denim': {'XS': '59', 'S': '177', 'M': '190', 'L': '91', 'XL': '246', '1X': '246', '2X': '184', '3X': '73'}, 'Betty High Waist Legging - Jet Black': {'XS': '29', 'S': '65', 'M': '67', 'L': '53', 'XL': '16', 'XXL': '5'}, 'Betty High Waist Legging - Heather Royal Burgundy': {'XS': '16', 'S': '40', 'M': '49', 'L': '32', 'XL': '17', 'XXL': '6'}, 'Betty High Waist Legging - Heather Fig': {'XS': '15', 'S': '41', 'M': '49', 'L': '32', 'XL': '16', 'XXL': '7'}, 'Betty High Waist Legging - Heather Chambray': {'XS': '15', 'S': '37', 'M': '45', 'L': '28', 'XL': '10', 'XXL': '6'}, 'Betty Mid Rise Legging - Jet Black': {'XS': '16', 'S': '39', 'M': '43', 'L': '28', 'XL': '14', 'XXL': '6'}, 'Betty Mid Rise Legging - Heather Royal Burgundy': {'XS': '17', 'S': '43', 'M': '49', 'L': '30', 'XL': '17', 'XXL': '8'}, 'Betty Mid Rise Legging - Heather Fig': {'XS': '17', 'S': '39', 'M': '52', 'L': '33', 'XL': '17', 'XXL': '9'}, 'Betty Mid Rise Legging - Heather Chambray': {'XS': '17', 'S': '40', 'M': '50', 'L': '31', 'XL': '17', 'XXL': '7'}, 'Claire High Waist 7/8 Legging - Jet Black': {'XS': '16', 'S': '34', 'M': '48', 'L': '24', 'XL': '9', 'XXL': '1'}, 'Claire High Waist 7/8 Legging - Heather Chambray': {'XS': '17', 'S': '42', 'M': '52', 'L': '32', 'XL': '15', 'XXL': '7'}, 'Claire High Waist 7/8 Legging - Heather Royal Burgundy': {'XS': '16', 'S': '43', 'M': '52', 'L': '33', 'XL': '16', 'XXL': '9'}, 'Claire High Waist 7/8 Legging - Heather Fig': {'XS': '17', 'S': '42', 'M': '49', 'L': '33', 'XL': '17', 'XXL': '8'}, 'Claire Mid Rise 7/8 Legging - Jet Black': {'XS': '14', 'S': '41', 'M': '48', 'L': '31', 'XL': '15', 'XXL': '7'}, 'Claire Mid Rise 7/8 Legging - Heather Royal Burgundy': {'XS': '17', 'S': '41', 'M': '50', 'L': '34', 'XL': '17', 'XXL': '9'}, 'Claire Mid Rise 7/8 Legging - Heather Fig': {'XS': '17', 'S': '43', 'M': '50', 
'L': '34', 'XL': '17', 'XXL': '9'}, 'Claire Mid Rise 7/8 Legging - Heather Chambray': {'XS': '16', 'S': '42', 'M': '48', 'L': '33', 'XL': '17', 'XXL': '8'}, 'High Rise Monica Legging - Jet Black': {'XS': '104', 'S': '307', 'M': '334', 'L': '368', 'XL': '269', 'XXL': '102'}, 'Leigh Long Sleeve Scoop Neck - Raw Denim': {'XS': '12', 'S': '14', 'M': '4', 'L': '0', 'XL': '0'}, 'Leigh Long Sleeve Scoop Neck - Black': {'XS': '17', 'S': '22', 'M': '25', 'L': '0', 'XL': '3'}, 'Leigh Long Sleeve Scoop Neck - Ultra Maroon': {'XS': '14', 'S': '25', 'M': '16', 'L': '4', 'XL': '0'}, 'Leigh Long Sleeve Scoop Neck - White': {'XS': '15', 'S': '22', 'M': '11', 'L': '0', 'XL': '5'}, 'Liza Long Sleeve V-Neck - Raw Denim': {'XS': '15', 'S': '27', 'M': '26', 'L': '0', 'XL': '0'}, 'Liza Long Sleeve V-Neck - Black': {'XS': '15', 'S': '28', 'M': '20', 'L': '1', 'XL': '0'}, 'Liza Long Sleeve V-Neck - Ultra Maroon': {'XS': '20', 'S': '33', 'M': '26', 'L': '7', 'XL': '4'}, 'Liza Long Sleeve V-Neck - White': {'XS': '17', 'S': '38', 'M': '21', 'L': '10', 'XL': '2'}, 'Lunette Sports Bra - Black': {'XS': '11', 'S': '35', 'M': '36', 'L': '29', 'XL': '8', 'XXL': '2'}, 'Lunette Sports Bra - Heather Charcoal': {'XS': '12', 'S': '37', 'M': '35', 'L': '28', 'XL': '12', 'XXL': '0'}, 'Malana Sports Bra - Heather Chambray': {'XS': '13', 'S': '39', 'M': '42', 'L': '34', 'XL': '16', 'XXL': '3'}, 'Malana Sports Bra - Heather Charcoal': {'XS': '26', 'S': '63', 'M': '59', 'L': '39', 'XL': '20'}, 'Malana Sports Bra - Jet Black': {'XS': '11', 'S': '37', 'M': '38', 'L': '29', 'XL': '15', 'XXL': '4'}, 'Malana Sports Bra - Heather Royal Burgundy': {'XS': '12', 'S': '38', 'M': '41', 'L': '34', 'XL': '17', 'XXL': '8'}, 'Malana Sports Bra - Heather Fig': {'XS': '12', 'S': '38', 'M': '43', 'L': '33', 'XL': '17', 'XXL': '9'}, 'Monica Legging - Jet Black': {'XS': '174', 'S': '513', 'M': '556', 'L': '624', 'XL': '449', 'XXL': '175'}, 'Sileni Thermal Top - Heather Grey': {'XS': '18', 'S': '51', 'M': '59', 'L': '38', 'XL': 
'13', 'XXL': '1'}, 'Sileni Thermal Top - Heather Charcoal': {'XS': '16', 'S': '51', 'M': '47', 'L': '34', 'XL': '16', 'XXL': '1'}, 'Thermal Jogger - Heather Grey': {'XS': '4', 'S': '27', 'M': '32', 'L': '23', 'XL': '8', 'XXL': '2'}, 'Thermal Jogger - Heather Charcoal': {'XS': '10', 'S': '35', 'M': '17', 'L': '13', 'XL': '7', 'XXL': '0'}}
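
Since the stated goal is to organize the data as XML, a dictionary like the one above could be serialized with the standard library's xml.etree.ElementTree. This is a minimal sketch, assuming the {title: {size: quantity}} shape shown in the output. The function name and the element and attribute names (products, product, variant, size, quantity) are illustrative choices, not part of the original answer.

```python
import xml.etree.ElementTree as ET


def products_to_xml(all_products):
    """Serialize {title: {size: quantity}} into an XML string."""
    root = ET.Element('products')
    for title, quantities in all_products.items():
        product_el = ET.SubElement(root, 'product', {'title': title})
        for size, quantity in quantities.items():
            # Attribute values must be strings; special characters such as
            # '&' are escaped by ElementTree on output.
            ET.SubElement(
                product_el,
                'variant',
                {'size': size, 'quantity': str(quantity)},
            )
    return ET.tostring(root, encoding='unicode')


# Small sample taken from the output above.
sample = {'Malana Sports Bra - Raw Denim': {'XS': '40', 'S': '149'}}
print(products_to_xml(sample))
```

Attributes keep the result compact. Child elements would work equally well if the consumer of the XML expects them.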



Answered 2022-06-07