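Some general guidance:
You are creating a pool. The pool size should depend on the machine, not on the size of the job. For example, you want 4 processes in the pool rather than 10000, even if you have 10000 files to process.
The job that runs on each process should be simple but parameterized. In your case, create a function that takes a file name as input and performs the conversion, then map the input files onto it. Filtering should be done before calling map.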
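The two points above can be sketched roughly as follows. The helper names `pick_pool_size` and `select_dbf` and the sample file names are hypothetical, made up for illustration; the sketch assumes the number of CPU cores is a reasonable upper bound for the pool size.

```python
import os


def pick_pool_size(n_jobs, max_workers=None):
    """Return a worker count bounded by the machine, not by the job count."""
    # os.cpu_count() can return None, so fall back to 1.
    cores = os.cpu_count() or 1
    limit = cores if max_workers is None else min(cores, max_workers)
    # Never start more workers than there are jobs, and always at least one.
    return max(1, min(limit, n_jobs))


def select_dbf(names, prefix="D", suffix=".DBF"):
    """Filter file names up front so the workers only receive real inputs."""
    return [n for n in names if n.startswith(prefix) and n.endswith(suffix)]


# Made-up file names, for illustration only.
names = ["D001.DBF", "D002.DBF", "X003.DBF", "notes.txt"]
inputs = select_dbf(names)
workers = pick_pool_size(len(inputs))
print(inputs, workers)
```

The worker count would then be passed as `processes` to `multiprocessing.Pool`, and the filtered list as the iterable to `pool.map`.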
So I would turn your code into something like this:
import os
import multiprocessing

import pandas as pd
from dbfread import DBF

directory = 'C:\\Path_to_DBF_Files'  # directory containing the DBF files
output_directory = 'C:\\Path_to_output_directory'  # directory for the CSV output


def convert(file):
    """Convert one DBF file from `directory` into a CSV file."""
    file_path = os.path.join(directory, file)  # join with the directory, not the file list
    print(f'\nReading in {file}...')
    dbf = DBF(file_path)  # create DBF object
    dbf.encoding = 'utf-8'  # set encoding attribute to utf-8 instead of ascii
    dbf.char_decode_errors = 'ignore'  # ignore decoding errors and read the DBF file as is
    print('\nConverting to DataFrame...')
    df = pd.DataFrame(iter(dbf))  # convert to pandas DataFrame
    df.columns = df.columns.astype(str)  # astype returns a new Index, so assign it back
    print(df)
    print('\nWriting to CSV...')
    # splitext drops only the extension; str.strip('.DBF') would strip characters, not a suffix
    stem = os.path.splitext(file)[0]
    dest_path = os.path.join(output_directory, f'{stem}.csv')
    df.to_csv(dest_path, index=False)
    print(f'\nConverted {file} to CSV. Moving to next file...')


# The guard is required on Windows, where each worker process re-imports this module.
if __name__ == '__main__':
    files_in = os.listdir(directory)  # list the files in the directory
    # Filter before calling map so that workers only receive the files to convert.
    dbf_files = [file for file in files_in if file.startswith('D') and file.endswith('.DBF')]
    with multiprocessing.Pool(processes=4) as pool:
        pool.map(convert, dbf_files)
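One detail to watch when deriving the output name from the input name: `str.strip('.DBF')` removes any of the characters `.`, `D`, `B`, `F` from both ends of the string, not the `.DBF` suffix. A minimal sketch of the difference, using a hypothetical helper `csv_name` and a made-up file name:

```python
import os


def csv_name(dbf_name):
    """Return the CSV file name for a DBF file name (hypothetical helper)."""
    stem, _ext = os.path.splitext(dbf_name)  # drops only the extension
    return stem + ".csv"


example = "DB12.DBF"  # made-up file name, for illustration only

# strip() eats the leading "DB" as well as the trailing ".DBF".
print("D" + example.strip(".DBF") + ".csv")
# splitext() keeps the full stem.
print(csv_name(example))
```

For names such as `D001.DBF` both approaches happen to give `D001.csv`, which is why the problem is easy to miss.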