Getting irregularities from Java PDFBox

Java

莫回无 2021-11-03 14:26:26

I am using Java PDFBox 2.0.12 to try to read a PDF generated by LaTeX. Everything seems fine, but for some reason certain symbols (<, <=, >, >=) are changed to question marks (?), and I get various warnings such as "WARNING: No Unicode mapping for a105 (105) in font F18". Any help would be appreciated.

Java code:

    try {
        PDDocument document = PDDocument.load(file);
        PDFTextStripper pdfStripper = new PDFTextStripper();
        // Retrieving text from PDF document
        String text = pdfStripper.getText(document);
        System.out.println(text);
        // Closing the document
        document.close();
    } catch (InvalidPasswordException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

LaTeX code:

    \documentclass[12pt]{article}
    \usepackage[a5paper]{geometry}
    \usepackage[T1]{fontenc} % font encoding
    \usepackage[utf8]{inputenc}
    \title{algorithmicx (algpseudocode) example}
    \usepackage{algpseudocode}
    \begin{document}
    \begin{algorithmic}[1]
    \If{$quality\ge 9$}:
    \State $a\gets perfect$
    \ElsIf{$quality\ge 7$}:
    \State $a\gets good$
    \ElsIf{$quality\ge 5$}:
    \State $a\gets medium$
    \ElsIf{$quality\ge 3$}:
    \State $a\gets bad$
    \Else
    \State $a\gets unusable$
    \EndIf
    \end{algorithmic}
    \end{document}

Document generated/used: https://drive.google.com/file/d/1P16FMHc1Pkd897G448Zd_6pgmnoWQLGt/view?usp=sharing


1 Answer

慕桂英546537


As discussed in the comments: the extracted text should be written to a file as UTF-8 and opened with a good editor such as Notepad++.

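    // Write the extracted text to a UTF-8 encoded file instead of printing it to the console.
    // try-with-resources closes both the writer and the document.
    try (OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(txtFile), Charsets.UTF_8);
         PDDocument document = PDDocument.load(pdfFile))
    {
        PDFTextStripper stripper = new PDFTextStripper();
        stripper.writeText(document, out);
    }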
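To illustrate why the output encoding can matter here, the following is a minimal sketch that uses only the Java standard library and no PDFBox. It assumes that the question marks come from an output charset that cannot represent characters such as ≥ (U+2265); whether that is the actual cause on the asker's machine is an assumption. The class name `EncodingDemo`, the helper `roundTrip`, and the sample string are hypothetical and not part of the original thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Encodes the text with the given charset and decodes it again,
    // imitating text that is written to a console or file in that charset
    // and then read back with the same charset.
    static String roundTrip(String text, Charset charset) {
        // String.getBytes(Charset) replaces unmappable characters with the
        // charset's replacement byte, which is '?' for ISO-8859-1.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for text extracted from the PDF.
        String extracted = "quality \u2265 9";

        String latin1 = roundTrip(extracted, StandardCharsets.ISO_8859_1);
        String utf8 = roundTrip(extracted, StandardCharsets.UTF_8);

        // Print booleans only, so the demo does not depend on the console encoding.
        System.out.println(latin1.equals("quality ? 9")); // true: the symbol was lost
        System.out.println(utf8.equals(extracted));       // true: the symbol survived
    }
}
```

If this assumption holds, writing the extracted text through a writer that is explicitly set to UTF-8, as in the answer above, avoids the lossy conversion. The "No Unicode mapping" warnings are a separate matter and are not addressed by this sketch.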

2021-11-03
