Fetching the page works fine, but there seems to be a character-encoding problem with the output:
<?php
//header( "Content-type:text/html;Charset=utf-8" );
$urls = [
'http://jobs.51job.com/'
];
$array = [
// 'user-agent:Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36;'
// 'accept-language:zh-CN,zh;q=0.8,zh-TW;q=0.6;
'Content-Type:text/html; charset=utf-8'
];
var_dump($urls);
foreach ($urls as $url) {
$ch = curl_init();
curl_setopt_array($ch, [
CURLOPT_URL => $url,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 30,
CURLOPT_BINARYTRANSFER=>true,
CURLOPT_ENCODING => 'gzip,deflate',
CURLOPT_HTTPHEADER => $array
]);
$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
var_dump($info);
mb_convert_encoding($output, 'utf-8', 'GBK,UTF-8,ASCII');
echo $output;
// file_put_contents('str.txt' , $output,FILE_APPEND);
}
On a related note, when I fetch content from Lagou, the response always just shows "页面加载中..." (page loading):
<br><html><head><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><meta name="renderer" content="webkit"><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><script type="text/javascript" src="https://www.lagou.com/utrack/trackMid.js?version=1.0.0.3&t=1503291026"></script><body><input type="hidden" id="KEY" value="rsagIwk3yl2hnrkI98FuQACf9eerWodYa0dPJ"/><script type="text/javascript">kfGNYOsx();</script>页面加载中...<script type="text/javascript" src="https://www.lagou.com/upload/oss.js"></script></body></html>
2 Answers
慕哥9229398
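51job is encoded in GB2312, so just convert the fetched content to UTF-8. Note that mb_convert_encoding returns the converted string rather than modifying its argument, so the result has to be assigned:
$contents = mb_convert_encoding($contents, 'utf-8', 'gb2312');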
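The conversion step can be sketched as a small helper. This is only an illustration under some assumptions: `to_utf8` is a hypothetical function name (not from the original post), it assumes PHP 8 with the mbstring extension, and it falls back to GBK when no charset is declared. It decodes GB2312 as GBK, since GBK is a superset of GB2312.

```php
<?php
// Hypothetical helper (not part of the original post): work out the source
// charset, then return the body converted to UTF-8.
function to_utf8(string $body, ?string $contentType = null, string $fallback = 'GBK'): string
{
    $charset = null;
    if ($contentType !== null && preg_match('/charset=([\w-]+)/i', $contentType, $m)) {
        // 1. Prefer the charset from the HTTP Content-Type response header.
        $charset = $m[1];
    } elseif (preg_match('/<meta[^>]+charset=["\']?([\w-]+)/i', $body, $m)) {
        // 2. Otherwise use the charset declared in an HTML <meta> tag.
        $charset = $m[1];
    }

    // 3. Assumption: default to GBK when nothing is declared.
    $charset = strtoupper($charset ?? $fallback);

    // GBK is a superset of GB2312, so decoding as GBK also handles
    // characters that fall outside plain GB2312.
    if ($charset === 'GB2312') {
        $charset = 'GBK';
    }
    if ($charset === 'UTF-8') {
        return $body; // already UTF-8, nothing to do
    }

    // mb_convert_encoding returns the converted string; it does not modify
    // $body in place, so the return value must be used.
    return mb_convert_encoding($body, 'UTF-8', $charset);
}
```

In the loop from the question, this might be used as `$output = to_utf8($output, $info['content_type']);` before the `echo`. Whether 51job actually declares its charset in the response header is not confirmed here, which is why the helper also checks the `<meta>` tag and has a fallback.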