Unzipping a .docx file with the zipfile library

I am trying to write an application that takes information from the tables in a Word docx file so that I can analyze it after converting it to a pandas DataFrame. The first step is reading the docx file correctly, and for that I am following Virantha Ekanayake's guide, "Reading and writing Microsoft Word docx files with Python". I am at the first step, where the guide says to use the ZipFile class of the zipfile library to unpack the docx file into XML. I adapted the function definition from the guide into my code (included below), but when I run it I get an error saying that the docx file "is not a zip file".

The guide says, "Essentially, the docx file is just a zip archive (try running unzip on it!)…" I tried renaming the docx file to a .zip file, and WinZip unzipped it successfully. In my program, however, I want to unzip the docx file without manually renaming it to .zip. Can I somehow unzip the docx file without renaming it? Or, if I do have to rename it to use ZipFile, how do I do that in my Python code?

    import zipfile
    from lxml import etree
    import pandas as pd

    FILE_PATH = 'C:/Users/user/Documents/Python Project'

    class Application():
        def __init__(self):
            #debug
            print('Initialized!')
            xml_content = self.get_word_xml(f'{FILE_PATH}/DocxFile.docx')
            xml_tree = self.get_xml_tree(xml_content)

        def get_word_xml(self, docx_filename):
            with open(docx_filename) as f:
                zip = zipfile.ZipFile(f)
                xml_content = zip.read('word/document.xml')
            return xml_content

        def get_xml_tree(self, xml_string):
            return (etree.fromstring(xml_string))

    a = Application()
    a.mainloop()

Error:

    Traceback (most recent call last):
      File "C:\Users\user\Documents\New_Tool.py", line 39, in <module>
        a = Application()
      File "C:\Users\user\Documents\New_Tool.py", line 27, in __init__
        xml_content = self.get_word_xml(f'{FILE_PATH}/DocxFile.docx')
      File "C:\Users\user\Documents\New_Tool.py", line 32, in get_word_xml
        zip = zipfile.ZipFile(f)
      File "C:\Progra~1\Anaconda3\lib\zipfile.py", line 1222, in __init__
        self._RealGetContents()
      File "C:\Progra~1\Anaconda3\lib\zipfile.py", line 1289, in _RealGetContents
        raise BadZipFile("File is not a zip file")
    zipfile.BadZipFile: File is not a zip file
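A likely explanation, offered as a sketch rather than a confirmed diagnosis: `open(docx_filename)` opens the file in text mode, while `zipfile.ZipFile` needs either a path or a file object opened in binary mode. Renaming should not be necessary, because `zipfile` does not check the file extension. The example below uses only the standard library; `get_word_xml` mirrors the name from the question, and `get_word_xml_from_handle` is a hypothetical name introduced here only to show the second variant.

```python
import zipfile


def get_word_xml(docx_filename):
    """Return the raw bytes of word/document.xml from a .docx file."""
    # Passing the path lets zipfile open the file in binary mode itself,
    # so the .docx extension does not need to be changed.
    with zipfile.ZipFile(docx_filename) as archive:
        return archive.read('word/document.xml')


def get_word_xml_from_handle(docx_filename):
    """Same result, keeping an explicit open() call (hypothetical variant)."""
    # 'rb' is the key change compared with the code in the question.
    with open(docx_filename, 'rb') as f:
        with zipfile.ZipFile(f) as archive:
            return archive.read('word/document.xml')
```

Either function returns bytes, which can be passed to `etree.fromstring` as in the question's `get_xml_tree`. Separately, the `Application` class in the question does not define `mainloop`, so `a.mainloop()` would presumably raise an `AttributeError` once the zip error is resolved.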
