Search and replace text in a PDF using Java

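I need to replace the text in a PDF with text in a different language. As a first step, I tried to search and replace text in a PDF file using the iText (itextpdf) and Apache PDFBox APIs. The snippet below uses the iText API to search for the text "Hello" in the source PDF and replace it with "Hi". The new PDF is created, but no text is replaced.

```java
import com.itextpdf.kernel.pdf.PdfDictionary;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfObject;
import com.itextpdf.kernel.pdf.PdfPage;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfStream;
import com.itextpdf.kernel.pdf.PdfWriter;
import java.nio.charset.StandardCharsets;

public void manipulatePdf(String src, String dest) throws Exception {
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
    int noOfPages = pdfDoc.getNumberOfPages();
    // iText page numbers are 1-based, so the loop must include the last page
    for (int i = 1; i <= noOfPages; i++) {
        PdfPage page = pdfDoc.getPage(i);
        PdfDictionary dict = page.getPdfObject();
        PdfObject object = dict.get(PdfName.Contents);
        // /Contents may also be a PdfArray of streams; only a single stream is handled here
        if (object instanceof PdfStream) {
            PdfStream stream = (PdfStream) object;
            byte[] data = stream.getBytes(); // decoded (decompressed) content-stream bytes
            // ISO-8859-1 maps each byte to one char and back, so non-text bytes are preserved
            String content = new String(data, StandardCharsets.ISO_8859_1);
            stream.setData(content.replace("Hello", "Hi").getBytes(StandardCharsets.ISO_8859_1));
        }
    }
    pdfDoc.close();
}
```

I also tried Apache PDFBox for the same purpose, but without luck. Here is the snippet for reference:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;

public static PDDocument replaceText(PDDocument document, String searchString, String replacement)
        throws IOException {
    for (PDPage page : document.getPages()) {
        PDFStreamParser parser = new PDFStreamParser(page);
        parser.parse();
        List<Object> tokens = parser.getTokens();
        for (int j = 0; j < tokens.size(); j++) {
            Object next = tokens.get(j);
            if (next instanceof Operator) {
                Operator op = (Operator) next;
                // Tj and TJ are the two operators that display strings in a PDF
                if (op.getName().equals("Tj")) {
                    // Tj takes one operand, the string to display, so update that operand
                    COSString previous = (COSString) tokens.get(j - 1);
                    String string = previous.getString();
                    string = string.replaceFirst(searchString, replacement);
                    previous.setValue(string.getBytes());
                }
            }
        }
        // The modified tokens must be written back to the page, otherwise nothing changes
        PDStream updatedStream = new PDStream(document);
        try (OutputStream out = updatedStream.createOutputStream(COSName.FLATE_DECODE)) {
            new ContentStreamWriter(out).writeTokens(tokens);
        }
        page.setContents(updatedStream);
    }
    return document;
}
```

Any solutions or suggestions are highly appreciated.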
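One likely reason a plain string replace on the content stream finds nothing: a PDF content stream does not have to store a word as one contiguous literal. The TJ operator takes an array that can split "Hello" into kerned fragments, strings can be written in hex form, and the string bytes are glyph codes interpreted through the font's encoding, so they may not be ASCII at all. The sketch below uses hand-written, hypothetical content-stream fragments (no PDF library involved) to show that `String.replace` only matches the contiguous literal case.

```java
public class ContentStreamReplaceDemo {
    // Hypothetical content-stream fragments written by hand for illustration.
    static final String TJ_LITERAL = "BT /F1 12 Tf 72 700 Td (Hello World) Tj ET";
    // Same text split into two strings with a kerning adjustment inside a TJ array.
    static final String TJ_KERNED = "BT /F1 12 Tf 72 700 Td [(He) -20 (llo World)] TJ ET";
    // "Hello" written as a hex string: 48 65 6C 6C 6F.
    static final String TJ_HEX = "BT /F1 12 Tf 72 700 Td <48656C6C6F> Tj ET";

    // Naive approach: treat the content stream as plain text.
    static String naiveReplace(String content, String search, String replacement) {
        return content.replace(search, replacement);
    }

    public static void main(String[] args) {
        for (String s : new String[] {TJ_LITERAL, TJ_KERNED, TJ_HEX}) {
            String out = naiveReplace(s, "Hello", "Hi");
            System.out.println((out.equals(s) ? "unchanged: " : "replaced:  ") + out);
        }
    }
}
```

Only the first fragment is changed; the kerned and hex forms render the same visible word but are left untouched, which matches the symptom of a new PDF being written with no visible replacement.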

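A second pitfall when editing raw content-stream bytes as a Java `String`: decoding them with the platform default charset and re-encoding them as UTF-8 is not guaranteed to reproduce the original bytes. The minimal sketch below, using a hypothetical byte sample and a hypothetical helper `survivesRoundTrip`, checks whether a decode/encode round trip leaves the bytes intact for UTF-8 and for ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {
    // Hypothetical helper: true if decoding and re-encoding with cs yields identical bytes.
    static boolean survivesRoundTrip(byte[] data, Charset cs) {
        byte[] back = new String(data, cs).getBytes(cs);
        return Arrays.equals(data, back);
    }

    public static void main(String[] args) {
        // Hypothetical fragment "(H?) Tj" containing one raw non-ASCII byte, 0xE9.
        byte[] data = {'(', 'H', (byte) 0xE9, ')', ' ', 'T', 'j'};
        // 0xE9 followed by ')' is malformed UTF-8, so it decodes to U+FFFD and re-encodes as EF BF BD.
        System.out.println("UTF-8 round trip intact: " + survivesRoundTrip(data, StandardCharsets.UTF_8));
        // ISO-8859-1 maps every byte value 0x00 to 0xFF to exactly one char, so it round-trips.
        System.out.println("ISO-8859-1 round trip intact: " + survivesRoundTrip(data, StandardCharsets.ISO_8859_1));
    }
}
```

Running it prints `false` for UTF-8 and `true` for ISO-8859-1. This only protects the bytes you are not trying to change; it does not solve the font-encoding problem for the text you do want to replace.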