PHP cURL: Chinese text is garbled when fetching a web page

PHP
慕工程0101907 2019-03-18 18:02:24
Fetching the page works fine, but there seems to be a character-encoding problem: the Chinese text comes out garbled.

    <?php
    //header( "Content-type:text/html;Charset=utf-8" );
    $urls = [
        'http://jobs.51job.com/'
    ];
    $array = [
        // 'user-agent:Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36;'
        // 'accept-language:zh-CN,zh;q=0.8,zh-TW;q=0.6;
        'Content-Type:text/html; charset=utf-8'
    ];
    var_dump($urls);
    foreach ($urls as $url) {
        $ch = curl_init();
        curl_setopt_array($ch, [
            CURLOPT_URL => $url,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => 10,
            CURLOPT_TIMEOUT => 30,
            CURLOPT_BINARYTRANSFER => true,
            CURLOPT_ENCODING => 'gzip,deflate',
            CURLOPT_HTTPHEADER => $array
        ]);
        $output = curl_exec($ch);
        $info = curl_getinfo($ch);
        curl_close($ch);
        var_dump($info);
        mb_convert_encoding($output, 'utf-8', 'GBK,UTF-8,ASCII');
        echo $output;
        // file_put_contents('str.txt' , $output,FILE_APPEND);
    }

On a side note, when I fetch Lagou, the response only ever shows "页面加载中..." (page loading):

    <br><html><head><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><meta name="renderer" content="webkit"><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><script type="text/javascript" src="https://www.lagou.com/utrack/trackMid.js?version=1.0.0.3&t=1503291026"></script><body><input type="hidden" id="KEY" value="rsagIwk3yl2hnrkI98FuQACf9eerWodYa0dPJ"/><script type="text/javascript">kfGNYOsx();</script>页面加载中...<script type="text/javascript" src="https://www.lagou.com/upload/oss.js"></script></body></html>

2 answers

慕哥9229398

Contributed 1877 experience points, received over 6 likes

51job serves its pages in GB2312, so just convert the fetched content to UTF-8:

https://img1.sycdn.imooc.com//5c8f6caf0001d66805790059.jpg

$contents = mb_convert_encoding($contents, 'utf-8', 'gb2312');
2019-03-18
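
One detail worth spelling out: `mb_convert_encoding` returns the converted string and leaves its argument untouched, so the result has to be assigned before it is echoed (the call in the question discards it). Below is a minimal sketch of that idea, assuming the page declares its charset in a `<meta>` tag. The helper names `detectCharset` and `toUtf8`, the `GBK` fallback, and the short list of accepted charsets are illustrative assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: read the charset declared in a <meta> tag.
// Falls back to $default when no declaration is found.
function detectCharset(string $html, string $default = 'GBK'): string
{
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?\s*([\w-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return $default;
}

// Hypothetical helper: return $html as UTF-8, converting only when needed.
function toUtf8(string $html, string $fallback = 'GBK'): string
{
    $charset = detectCharset($html, $fallback);

    // Only accept names we know mbstring understands; otherwise use the fallback.
    $supported = ['UTF-8', 'GBK', 'GB2312', 'GB18030'];
    if (!in_array($charset, $supported, true)) {
        $charset = $fallback;
    }
    // GBK is a superset of GB2312, so decoding as GBK covers both.
    if ($charset === 'GB2312') {
        $charset = 'GBK';
    }
    if ($charset === 'UTF-8') {
        return $html;
    }
    // Keep the return value: the input string is not modified in place.
    return mb_convert_encoding($html, 'UTF-8', $charset);
}
```

With the script from the question, this would be used as `$output = toUtf8($output);` right after `curl_exec`, before the `echo`.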
慕婉清6462132

Contributed 1804 experience points, received over 2 likes

$content = iconv('gbk', 'utf-8//IGNORE', $content);

2019-03-18
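
As with `mb_convert_encoding`, `iconv` returns the converted string (or `false` on failure) rather than changing `$content` in place, and its argument order is source charset, target charset, then the string. The sketch below wraps the call under those assumptions. The function name `gbkToUtf8` and the mbstring fallback are illustrative choices, and how `//IGNORE` treats invalid bytes can differ between iconv implementations, so treat it as a starting point.

```php
<?php
// Hypothetical wrapper around the iconv call from the answer above.
function gbkToUtf8(string $content): string
{
    // //IGNORE asks iconv to drop characters it cannot convert;
    // @ suppresses the notice some builds emit on invalid input.
    $converted = @iconv('GBK', 'UTF-8//IGNORE', $content);

    if ($converted === false) {
        // Assumed fallback: let mbstring do the conversion instead.
        return mb_convert_encoding($content, 'UTF-8', 'GBK');
    }
    return $converted;
}
```

In the script from the question, the call would look like `$output = gbkToUtf8($output);` before the `echo`.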