Adding HTML markup with Java Apache PDFBox

I have been using PDFBox together with EasyTable, which extends PDFBox to draw data tables. I have run into a problem: I have a Java object holding a string of HTML data, and I need to add it to a PDF using PDFBox. Digging through the documentation has not turned up anything. The code below is a hello-world snippet; I would like the text in the generated PDF to be rendered with H1 formatting.

// Create a document and add a page to it
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage(page);

// Create a new font object selecting one of the PDF base fonts
PDFont font = PDType1Font.HELVETICA_BOLD;

// Start a new content stream which will "hold" the to be created content
PDPageContentStream contentStream = new PDPageContentStream(document, page);

// Define a text content stream using the selected font, moving the cursor and drawing the text "Hello World"
contentStream.beginText();
contentStream.setFont(font, 12);
contentStream.moveTextPositionByAmount(100, 700);
contentStream.drawString("<h1>HelloWorld</h1>");
contentStream.endText();

// Make sure that the content stream is closed:
contentStream.close();

// Save the results and ensure that the document is properly closed:
document.save("Hello World.pdf");
document.close();


2 Answers

慕娘9325324

1783 contribution points, 5+ upvotes

Use Jericho to render the HTML as plain text, with the output of each tag mapped appropriately.

Example:

// Renders the HTML as plain text using Jericho's Renderer.
public String extractAllText(String htmlText) {
    return new net.htmlparser.jericho.Source(htmlText)
            .getRenderer()
            .setMaxLineLength(Integer.MAX_VALUE) // effectively disables line wrapping
            .setNewLine(null)
            .toString();
}

Add the dependency to your Gradle or Maven build (Gradle notation shown):

compile group: 'net.htmlparser.jericho', name: 'jericho-html', version: '3.4'

Replied 2023-09-20

繁花不似锦

1851 contribution points, 4+ upvotes

PDFBox does not support HTML, at least not for creating content.

So with plain PDFBox you have to parse the HTML yourself and derive the special text-drawing characteristics from the tags that enclose the text.

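For example, when you encounter "<h1>HelloWorld</h1>", you have to extract the text "HelloWorld" and use the information from the h1 tag to select an appropriate main-heading font and font size for drawing that text.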
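The extraction step described above might be sketched as follows, using only the Java standard library. This is an illustration under stated assumptions, not a general HTML parser: the class name HeadingStyleSketch, the StyledText holder, and the font sizes chosen for each tag are all hypothetical, and the pattern only handles a single simple element without attributes or nesting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeadingStyleSketch {

    /** Text plus the drawing attributes derived from its enclosing tag (hypothetical holder type). */
    public static final class StyledText {
        public final String text;
        public final boolean bold;
        public final float fontSize;

        StyledText(String text, boolean bold, float fontSize) {
            this.text = text;
            this.bold = bold;
            this.fontSize = fontSize;
        }
    }

    // Matches exactly one simple element such as <h1>HelloWorld</h1>:
    // lowercase tag name, no attributes, no nested elements.
    private static final Pattern SIMPLE_ELEMENT =
            Pattern.compile("<(h[1-6]|p)>(.*?)</\\1>", Pattern.DOTALL);

    /** Extracts the inner text and picks assumed drawing attributes for the tag. */
    public static StyledText parse(String html) {
        Matcher m = SIMPLE_ELEMENT.matcher(html.trim());
        if (!m.matches()) {
            // Anything unrecognised is treated as plain body text.
            return new StyledText(html, false, 12f);
        }
        String tag = m.group(1);
        String text = m.group(2);
        // The sizes below are assumptions chosen for illustration only.
        switch (tag) {
            case "h1": return new StyledText(text, true, 24f);
            case "h2": return new StyledText(text, true, 18f);
            case "h3": return new StyledText(text, true, 14f);
            case "p":  return new StyledText(text, false, 12f);
            default:   return new StyledText(text, true, 12f); // h4 to h6
        }
    }

    public static void main(String[] args) {
        StyledText st = parse("<h1>HelloWorld</h1>");
        // prints "HelloWorld bold=true size=24.0"
        System.out.println(st.text + " bold=" + st.bold + " size=" + st.fontSize);
    }
}
```

In the question's snippet, the result would then replace the hard-coded values: st.fontSize would go to contentStream.setFont(font, ...) and st.text to contentStream.drawString(...), with a bold or regular base font selected according to st.bold.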

Alternatively, you can look for a library that does the HTML parsing and the conversion to PDF text-drawing instructions for PDFBox, such as Open HTML to PDF.

Replied 2023-09-20
