I'm a college student working on a semester-long project, and I've hit a roadblock. Before I go on, please know that I've looked at similar threads on Stack Overflow, and none of them seem to match my situation.

I have a string input generated from a PDF that contains a lot of data from a table. The problem is that, because of the formatting, some table entries in the department column wrap from one line onto two, and I can't work around it. For example, PS 253 is handled fine by my algorithm, but MA followed by a line break and then 243HON breaks everything. I need to end up with both parts on the same line, deleting the "\n" after MA, so I can pass the text to the rest of the program.

I tried checking for \n one or two index positions after the department code (MA) and changing the index from which 243HON is read, but that didn't work. I also tried string = string.replaceAll("MA \n", "MA "), as shown in the code. Removing the space between MA and \n made no difference.

Here is the relevant part of my code. Thanks!

    public static String[] departments = {
        "\nAS","\nSF","\nAE","\nAF","\nAT","\nLAR","\nAMS","\nBIO","\nBA","\nCHM","\nLCH","\nCIV","\nCSO",
        "\nCOM","\nCEC","\nCS","\nCYB","\nEC","\nEE","\nEGR","\nEP","\nES","\nFA","\nGCS","\nHS","\nHON","\nHF","\nHU","\nMA","\nME","\nWX",
        "\nMSL","\nNSC","\nPE","\nPS","\nPSY","\nSIM","\nSS","\nSE","\nSP","\nSYS","\nUNIV","\nUA"};

    public static String[] departmentsFix = {
        "\nAS \n","\nSF \n","\nAE \n","\nAF \n","\nAT \n","\nLAR \n","\nAMS \n","\nBIO \n","\nBA \n","\nCHM \n","\nLCH \n","\nCIV \n","\nCSO \n",
        "\nCOM \n","\nCEC \n","\nCS \n","\nCYB \n","\nEC \n","\nEE \n","\nEGR \n","\nEP \n","\nES \n","\nFA \n","\nGCS \n","\nHS \n","\nHON \n","\nHF \n","\nHU \n","\nMA \n","\nME \n","\nWX \n",
        "\nMSL \n","\nNSC \n","\nPE \n","\nPS \n","\nPSY \n","\nSIM \n","\nSS \n","\nSE \n","\nSP \n","\nSYS \n","\nUNIV \n","\nUA \n"};

    public static void main(String[] args) {
        // TODO Auto-generated method stub
        Loader loader = new Loader();
        try {
            File file = new File("C:\\Users\\User\\Desktop\\EclipseWorkspace\\SE 300\\ER_SCHED_PRT.pdf");
            PDDocument document = PDDocument.load(file);
            PDFTextStripper s = new PDFTextStripper();
            loader.content = s.getText(document);
            String[] splitString = loader.content.split("Instructor", 2);
            loader.content = splitString[1];
            int index = 0;
            for (String y : departmentsFix) {
                // find any departments with a \n after them and replace it with a space
                loader.content = loader.content.replaceAll(y, departments[index] + " ");
                index++;
            }
1 Answer
白衣染霜花
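I just fixed it. Using the find function, I discovered that the format is not \nMA\n but \nMA\r\n. Changing the pattern to match that solved most of the problem; what remains is a small, inconsequential bug that can be compensated for with an extra space. Thanks for the help all the same.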
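To illustrate the \r\n issue described above, here is a minimal sketch of one way to join a wrapped department code with the line after it without hard-coding the line terminator. It is not the asker's actual fix. It assumes Java 8 or later, where the regex linebreak matcher \R matches \n, \r\n, and \r alike. The class name LineBreakFix and the method joinSplitDepartments are hypothetical, and the department codes are passed without the leading "\n" used in the question's arrays.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineBreakFix {

    // Hypothetical helper (not from the original post): for each department
    // code that sits alone at the start of a line, optionally followed by one
    // space, replace the trailing line break with a single space.
    public static String joinSplitDepartments(String content, String[] codes) {
        for (String code : codes) {
            // (?m) lets ^ match at the start of every line.
            // \R (Java 8+) matches any line break, including the two-character \r\n.
            String regex = "(?m)^" + Pattern.quote(code) + " ?\\R";
            content = content.replaceAll(regex, Matcher.quoteReplacement(code + " "));
        }
        return content;
    }

    public static void main(String[] args) {
        // Sample text using Windows-style line endings, as found in the answer.
        String sample = "PS 253\r\nMA \r\n243HON\r\n";
        String fixed = joinSplitDepartments(sample, new String[] {"MA", "PS"});
        // Make the remaining line terminators visible when printing.
        System.out.println(fixed.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Because the pattern requires a line break directly after the code (and an optional space), an entry such as PS 253 that is already on one line is left untouched. Printing the text with \r and \n made visible, as main does, is also a quick way to see which terminator the extracted PDF text really contains.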