I can get text/plain output from a .DOC/.DOCX file. I just want to count the words in that output with PHP (excluding numbers and punctuation) and show the count in an HTML page. So I have this:

<button type="button" id="load" class="btn btn-md btn-info">LOAD FILES</button>
<br>
<div id="result"></div>
<script src="../vendors/jquery/dist/jquery.min.js"></script>
<script src="https://static.filestackapi.com/v3/filestack.js"></script>
<script>
    function numWordsR(urlk){
        $.post("result_filestack.php", {
            molk: urlk //urlk, example: https://process.filestackapi.com/output=format:txt/AXXXXAXeeeeW33A";
        }).done(function(resp){
            $("#result").html(resp);
        });
    }
</script>

My file result_filestack.php:

$url = $_POST['molk'];
$content = file_get_contents($url); //get txt/plain output content
$onlywords = preg_replace('/[[:punct:]\d]+/', '', $content); //no numbers nor punctuation symbols
function get_num_of_words($string) {
    $string = preg_replace('/\s+/', ' ', trim($string));
    $words = explode(" ", $string);
    return count($words);
}
$numwords = get_num_of_words($onlywords);
echo "<b>TEXT:</b>: ".$onlywords."<br><br>Number of words: ".$numwords;

I get this result: in this case, for example, the output says the text has 585 words, but if I copy and paste that text into MS Word, it reports 612 words.

I changed the PHP code to dump the text as an array:

function get_text($string) {
    $string = preg_replace('/\s+/', ' ', trim($string));
    $words = explode(" ", $string);
    return $words;
}
$texto002 = get_text($onlywords);
echo print_r($texto002);

I noticed that the count is off because in some places two or three words are merged into a single array element. How can I fix this?
1 Answer
BIG阳
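This is probably because some of the spaces are not regular spaces but special characters. I ran into the same thing a while ago: before exploding on the regular space, I replaced the space entities with regular spaces.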
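function get_num_of_words($string) {
    // Replace non-breaking-space entities with regular spaces first.
    // Assumption: the entity names were lost when the page was rendered;
    // "&nbsp;" and "&#160;" are the likely originals.
    $string = str_replace("&nbsp;", " ", $string);
    $string = str_replace("&#160;", " ", $string);
    // Then collapse every run of whitespace into a single space.
    $string = trim(preg_replace('/\s+/', ' ', $string));
    $words = explode(" ", $string);
    return count($words);
}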
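If the text contains raw non-breaking spaces (U+00A0) or other Unicode separators rather than entities, replacing `&nbsp;` alone may not be enough. Below is a minimal sketch of a Unicode-aware variant, assuming the Filestack output is valid UTF-8. The function name `count_words_unicode` is hypothetical and not part of the original answer. It decodes entities, strips punctuation and digits as in the question, and splits on any Unicode separator.

```php
<?php
// Hypothetical helper (not from the original answer): counts words in UTF-8
// text, treating every Unicode space separator as a word boundary.
function count_words_unicode(string $text): int
{
    // Decode HTML entities such as &nbsp; or &#160; into real characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Drop punctuation (\p{P}), symbols (\p{S}) and numbers (\p{N}).
    // preg_replace returns null on invalid UTF-8, so fall back to ''.
    $text = preg_replace('/[\p{P}\p{S}\p{N}]+/u', '', $text) ?? '';

    // Split on runs of whitespace or Unicode separators (\p{Z} covers
    // U+00A0, U+2003, U+3000, ...); PREG_SPLIT_NO_EMPTY skips empty pieces.
    $words = preg_split('/[\s\p{Z}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    return $words === false ? 0 : count($words);
}
```

In `result_filestack.php` this could be called as `$numwords = count_words_unicode($content);`. The result may still differ from MS Word, which counts numbers as words, whereas this sketch drops them as the question requested.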