How do I extract text from a multi-page PDF resume using PyPDF2?

I am extracting text from a multi-page resume in PDF format using PyPDF2, but I get the error below when I try to write the extracted content to a file. Here is my code:

import PyPDF2

newFile = open('details.txt', 'w')
file = open("cv3.pdf", 'rb')
pdfreader = PyPDF2.PdfFileReader(file)
numPages = pdfreader.getNumPages()
print(numPages)
page_content = ""
for page_number in range(numPages):
    page = pdfreader.getPage(page_number)
    page_content += page.extractText()
newFile.write(page_content)
print(page_content)
file.close()
newFile.close()

Error message:

Traceback (most recent call last):
  File "C:/Users/HP/PycharmProjects/CVParser/pdf.py", line 16, in <module>
    newFile.write(page_content)
  File "C:\Program Files\Python37\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0141' in position 827: character maps to <undefined>

Process finished with exit code 1

The same code works on multi-page PDF files that were converted from docx. If anyone knows a solution, please help.


1 Answer

慕工程0101907

Contributed 1887 experience points, received more than 5 likes

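Opening the output file with an explicit UTF-8 encoding should solve your problem in Python 3:

with open("Output.txt", "w", encoding="utf-8") as text_file:
    # cp1252, the default encoding in the traceback, cannot encode '\u0141'
    print("{}".format(page_content), file=text_file)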

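If the above does not work for you, try encoding the text yourself and writing the bytes:

with open("Output1.txt", "wb") as text_file:
    # Binary mode bypasses the platform's default text encoding
    text_file.write(page_content.encode("UTF-8"))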
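
The traceback in the question comes from the cp1252 codec, which has no mapping for '\u0141' (the letter Ł). As a rough illustration, the sketch below reproduces that failure without any PDF and then writes the same text as UTF-8. The sample string, the temporary output path, and the save_text helper are all made up for this example and stand in for the page_content extracted from the PDF.

```python
import os
import tempfile

# Sample text standing in for the extracted PDF content (an assumption);
# '\u0141' is the character named in the traceback.
page_content = "Micha\u0141 Kowalski - Curriculum Vitae"

# cp1252 has no mapping for U+0141, so encoding with it raises
# UnicodeEncodeError, just like newFile.write() did in the question.
try:
    page_content.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False


def save_text(path, text):
    """Hypothetical helper: write text as UTF-8, ignoring the platform default."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(text)


# Write to a temporary directory so the sketch does not touch real files.
out_path = os.path.join(tempfile.mkdtemp(), "details.txt")
save_text(out_path, page_content)

# Read the file back with the same encoding to confirm nothing was lost.
with open(out_path, encoding="utf-8") as f:
    round_trip = f.read()

print("cp1252 could encode the text:", cp1252_ok)
print("UTF-8 round trip matches:", round_trip == page_content)
```

If the output file has to stay in cp1252 for some reason, passing errors="replace" to open() substitutes a question mark for each unencodable character instead of raising, at the cost of losing those characters.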

Answered 2021-06-22
