2 Answers
Contributor with 1802 experience points, 5+ upvotes
This can be done using Promise.all together with async/await.
If I understand correctly, you are trying to do the following:
1. Fetch the original HTML.
2. Extract some data from it (it looks like you want more URLs).
3. For each extracted URL, call cloudscraper again.
4. Put the result of each call back into the original data object.
const getData = async (pageUrl) => {
  const html = await cloudscraper.get(pageUrl);
  const data = extractHtml(html);
  const promises = data.array.map((d) => cloudscraper.get(d));
  const results = await Promise.all(promises);
  // Map the results back into the original data object.
  // (Reassigning the forEach callback parameter would not mutate the array,
  // so replace the array's contents directly.)
  data.array = results;
  return data;
};
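A minimal runnable sketch of the same pattern. The `cloudscraper` object and `extractHtml` function here are stubs standing in for the real library and your own scraping step, and the URLs are placeholders, so this runs standalone:

```javascript
// Stub standing in for cloudscraper.get (hypothetical, for illustration only):
// resolves with a fake HTML string for any URL.
const cloudscraper = {
  get: async (url) => `<html>${url}</html>`,
};

// Stand-in for the extraction step: pretend we scraped two URLs out of the page.
const extractHtml = (html) => ({
  array: ['https://a.example', 'https://b.example'],
});

const getData = async (pageUrl) => {
  const html = await cloudscraper.get(pageUrl);
  const data = extractHtml(html);
  // Fire all follow-up requests in parallel and wait for every one to finish.
  const results = await Promise.all(data.array.map((d) => cloudscraper.get(d)));
  // Write the results back into the original data object.
  data.array = results;
  return data;
};

getData('https://main.example').then((data) => console.log(data.array));
```

Because the requests are started before any of them is awaited, they run concurrently rather than one after another.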
Contributor with 2003 experience points, 2+ upvotes
The best way to avoid this kind of problem is to use async/await, as suggested in the comments. Here is an example based on your code:
const getData = async function (pageUrl) {
  let data;
  try {
    // first cloudscraper call: retrieve the main HTML
    const html = await cloudscraper.get(pageUrl);
    // scrape data from it
    data = extract(html);
    for (let i = 0; i < data.array.length; ++i) {
      // for each scraped URL, call cloudscraper again
      // to retrieve the other data
      const newHtml = await cloudscraper.get(data.array[i]);
      // extract the other data with cheerio
      // and store it in the same array
      data.array[i] = getNewData(newHtml); // if getNewData is also async, add await here
    }
  } catch (error) {
    // handle error
  }
  return data;
};
// You can call getData with .then().catch() outside of async functions
// and with await inside async functions
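To illustrate those two call styles, here is a small self-contained sketch. The `getData` here is a trivial stand-in (not the real scraper above) so the example runs on its own, and the URL is a placeholder:

```javascript
// Hypothetical stand-in for getData, so both call styles can be demonstrated
// without any network access.
const getData = async (pageUrl) => ({ url: pageUrl, array: [] });

// Outside of async functions: consume the promise with .then()/.catch().
getData('https://example.com')
  .then((data) => console.log('got', data.url))
  .catch((err) => console.error(err));

// Inside an async function: await it, and handle failures with try/catch.
(async () => {
  try {
    const data = await getData('https://example.com');
    console.log('got', data.url);
  } catch (err) {
    console.error(err);
  }
})();
```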