1 Answer
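As proposed in the comments, you can use the PdfCanvasEditor from this answer to filter operations out of the content stream as needed. In fact, I extended that class slightly so that it properly supports the ' and " text-drawing operators. You can find the class here.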
Just as in your approach, the lines to clear are determined in a first pass: I use a RegexBasedLocationExtractionStrategy instance for that.
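The idea of that first pass can be sketched without iText. The following is only a simplified model, not the library's API: Chunk, findBands and inAnyBand are hypothetical names, and the sketch records the vertical extent of the whole matching chunk, whereas the real strategy reports rectangles for the matches it locates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TriggerLines {
    /** Hypothetical stand-in for a located piece of text and its vertical extent. */
    static final class Chunk {
        final String text;
        final float bottom;
        final float top;

        Chunk(String text, float bottom, float top) {
            this.text = text;
            this.bottom = bottom;
            this.top = top;
        }
    }

    /** First pass: collect a {bottom, top} band for every chunk whose text matches the regex. */
    static List<float[]> findBands(List<Chunk> chunks, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<float[]> bands = new ArrayList<>();
        for (Chunk chunk : chunks) {
            if (pattern.matcher(chunk.text).find()) {
                bands.add(new float[] { chunk.bottom, chunk.top });
            }
        }
        return bands;
    }

    /** Second pass helper: does a text origin at height y fall into one of the bands? */
    static boolean inAnyBand(List<float[]> bands, float y) {
        for (float[] band : bands) {
            if (band[0] <= y && y <= band[1]) {
                return true;
            }
        }
        return false;
    }
}
```

Any text later drawn with an origin inside one of these bands is treated as belonging to a line that should be cleared.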
After that, in the PdfCanvasEditor step, the instructions that draw text on those lines are changed to draw only empty strings.
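To illustrate which operand gets replaced, here is a minimal sketch that models an instruction as a plain list of strings with the operator as its last element. TextBlanker and blank are hypothetical names used only for illustration; the real code works on PdfObject operands.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TextBlanker {
    static final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");

    /** Returns a copy of the instruction in which the shown text, if any, is emptied. */
    static List<String> blank(List<String> instruction) {
        List<String> result = new ArrayList<>(instruction);
        String operator = result.get(result.size() - 1);
        if (!TEXT_SHOWING_OPERATORS.contains(operator)) {
            return result; // not a text-showing instruction, keep as is
        }
        // The text operand always sits directly before the operator:
        //   (text) Tj     (text) '     aw ac (text) "     [ ... ] TJ
        String empty = "TJ".equals(operator) ? "[]" : "()";
        result.set(result.size() - 2, empty);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(blank(Arrays.asList("(Created by: someone)", "Tj")));
        System.out.println(blank(Arrays.asList("0", "0", "(Calendar: x)", "\"")));
    }
}
```

Keeping the operator and only emptying its text operand leaves side effects such as the line move performed by ' and " untouched, so the rest of the stream keeps its layout.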
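The exact mechanism does not come from IEventFilter, though, because what is inspected here is not the events that result in text being drawn but the more basic operator and operand structures. Still, the mechanism is similar to your approach.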
// Fragment of a test method. It needs the iText 7 kernel classes (com.itextpdf.kernel.*),
// java.lang.reflect.Field, java.util.* and java.util.stream.Collectors.
// PdfCanvasEditor is the helper class mentioned above, not part of iText itself.
try (PdfDocument pdfDocument = new PdfDocument(SOURCE_PDF_READER, TARGET_PDF_WRITER)) {
    // Rectangles of the lines to clear; refilled for every form XObject.
    List<Rectangle> triggerRectangles = new ArrayList<>();

    PdfCanvasEditor editor = new PdfCanvasEditor()
    {
        final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
        final Field textMatrixField;
        Matrix recentTextMatrix;

        {
            // Read the processor's internal textMatrix field via reflection.
            // getDeclaredField throws the checked NoSuchFieldException, so the
            // enclosing method has to declare or handle it.
            Field field = PdfCanvasProcessor.class.getDeclaredField("textMatrix");
            field.setAccessible(true);
            textMatrixField = field;
        }

        @Override
        protected void nextOperation(PdfLiteral operator, List<PdfObject> operands) {
            // Remember the text matrix as it is when this hook is called, for use in write().
            try {
                recentTextMatrix = (Matrix)textMatrixField.get(this);
            } catch (IllegalArgumentException | IllegalAccessException e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();
            if (TEXT_SHOWING_OPERATORS.contains(operatorString))
            {
                Matrix matrix = null;
                try {
                    matrix = recentTextMatrix.multiply(getGraphicsState().getCtm());
                } catch (IllegalArgumentException e) {
                    throw new RuntimeException(e);
                }
                // y coordinate of the text origin after applying the current transformation matrix
                float y = matrix.get(Matrix.I32);
                if (triggerRectangles.stream().anyMatch(rect -> rect.getBottom() <= y && y <= rect.getTop())) {
                    // The operands list ends with the operator itself, so the text
                    // operand is the second-to-last entry (index 0 for TJ).
                    if ("TJ".equals(operatorString))
                        operands.set(0, new PdfArray());
                    else
                        operands.set(operands.size() - 2, new PdfString(""));
                }
            }
            super.write(processor, operator, operands);
        }
    };

    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        PdfPage page = pdfDocument.getPage(i);
        Set<PdfName> xobjectNames = page.getResources().getResourceNames(PdfName.XObject);
        for (PdfName xobjectName : xobjectNames) {
            PdfFormXObject xobject = page.getResources().getForm(xobjectName);
            if (xobject == null)
                continue; // defensive: skip entries that are not form XObjects
            byte[] content = xobject.getPdfObject().getBytes();
            PdfResources resources = xobject.getResources();

            // First pass: locate the lines that contain the trigger texts.
            RegexBasedLocationExtractionStrategy regexLocator = new RegexBasedLocationExtractionStrategy("Created by:|Calendar:");
            new PdfCanvasProcessor(regexLocator).processContent(content, resources);
            triggerRectangles.clear();
            triggerRectangles.addAll(regexLocator.getResultantLocations().stream().map(loc -> loc.getRectangle()).collect(Collectors.toSet()));

            // Second pass: rewrite the content, blanking text drawn on those lines.
            PdfCanvas pdfCanvas = new PdfCanvas(new PdfStream(), resources, pdfDocument);
            editor.editContent(content, resources, pdfCanvas);
            xobject.getPdfObject().setData(pdfCanvas.getContentStream().getBytes());
        }
    }
}
(EditPageContent test testRemoveSpecificLinesCalendar)
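Beware, this is a proof of concept that is specifically tailored to the OP's use case: the PdfCanvasEditor is used here only to inspect and edit the first-level form XObjects of each page, because PDFs created from Google Calendar in Agenda format contain all their page content in a form XObject, which in turn is drawn in the page content stream. Furthermore, text is expected to be parallel to the top of the page.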
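The last restriction follows from how the line of a text-showing instruction is recognized: only the vertical translation of the product of text matrix and current transformation matrix is compared with the trigger rectangles. A minimal sketch of that computation, assuming the usual PDF representation of an affine matrix as six numbers {a, b, c, d, e, f} (BaselineY and its methods are hypothetical names, not iText API):

```java
class BaselineY {
    /** Multiplies two affine matrices given as {a, b, c, d, e, f}: result = m x n. */
    static float[] multiply(float[] m, float[] n) {
        return new float[] {
            m[0] * n[0] + m[1] * n[2],        // a
            m[0] * n[1] + m[1] * n[3],        // b
            m[2] * n[0] + m[3] * n[2],        // c
            m[2] * n[1] + m[3] * n[3],        // d
            m[4] * n[0] + m[5] * n[2] + n[4], // e, horizontal translation
            m[4] * n[1] + m[5] * n[3] + n[5]  // f, vertical translation
        };
    }

    /** y coordinate of the text origin: the f component of textMatrix x ctm. */
    static float baselineY(float[] textMatrix, float[] ctm) {
        return multiply(textMatrix, ctm)[5];
    }

    public static void main(String[] args) {
        float[] textMatrix = { 1, 0, 0, 1, 72, 700 };  // text origin at (72, 700)
        float[] ctm = { 0.5f, 0, 0, 0.5f, 10, 20 };    // scale by 0.5, then translate
        System.out.println(baselineY(textMatrix, ctm));
    }
}
```

For rotated or skewed text the glyphs of one instruction no longer share a single height, so comparing just this one value against a rectangle's bottom and top would not be reliable.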