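I am trying to get the current page number using the PDFBox reader. Here is the code I have written.

    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.util.ArrayList;

    import org.apache.pdfbox.pdmodel.PDDocument;

    public class PDFTextExtractor {

        ArrayList<CharInfo> extractText(String fileName) throws Exception {
            PDDocument document = null;
            try {
                document = PDDocument.load(new File(fileName));
                PDFTextAnalyzer stripper = new PDFTextAnalyzer();
                stripper.setSortByPosition(true);
                // PDFTextStripper page numbers are 1-based
                stripper.setStartPage(1);
                stripper.setEndPage(document.getNumberOfPages());
                // The extracted text itself is discarded; only the collected characters are used
                Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
                stripper.writeText(document, dummy);
                return stripper.getCharactersList();
            } finally {
                if (document != null) {
                    document.close();
                }
            }
        }
    }

When I try to get the details, I am using the following code.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.pdfbox.text.PDFTextStripper;
    import org.apache.pdfbox.text.TextPosition;

    public class PDFTextAnalyzer extends PDFTextStripper {

        private ArrayList<CharInfo> charactersList = new ArrayList<CharInfo>();

        public PDFTextAnalyzer() throws IOException {
            super();
        }

        public ArrayList<CharInfo> getCharactersList() {
            return charactersList;
        }

        public void setCharactersList(ArrayList<CharInfo> charactersList) {
            this.charactersList = charactersList;
        }

        @Override
        protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
            System.out.println("----->" + document.getPages().getCount());
            /* for (int i = 0; i < document.getPages().getCount(); i++) { */
            // getPage() takes a 0-based index, so this reads the height of the second page
            float docHeight = document.getPage(1).getMediaBox().getHeight();
            for (TextPosition text : textPositions) {
                /*
                 * System.out.println((int) text.getUnicode().charAt(0) + " " + text.getUnicode()
                 *         + " [(X=" + text.getXDirAdj() + " " + text.getX() + ",Y="
                 *         + text.getYDirAdj() + ") height=" + text.getHeightDir()
                 *         + " width=" + text.getWidthDirAdj() + "]");
                 */
                // page number of the current text
            }
        }
    }

But I am unable to get the page number. Please see the line comment "page number of the current text". Is there any way to get the page number?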