Python: trouble saving several files processed by a Pool

Python

米脂 2021-08-17 10:27:41

I need to process some files in parallel. I am using a Pool, but I cannot save the files that the pool processes. Here is the code:

... All imports ...

def extract(text_lines):
    line_tr01 = []
    line_tr02 = []
    line_tr03 = []
    line_tr03 = []
    for line in text_lines:
        treatment01 = treatment_a(line, args)
        line_tr01.append(treatment01)
        treatment02 = treatment_b(line, args)
        line_tr02.append(treatment02)
        treatment03 = treatment_c(line, args)
        line_tr03.append(treatment03)
        treatment04 = treatment_d(line, args)
        line_tr04.append(treatment04)

for file in folder:
    text_lines = read_file_into_list(file_path)
    chunk_size=len(text_lines)/6
    divided=[]
    divided.append(text_lines[0:chunk_size])
    divided.append(text_lines[chunk_size:2*chunk_size])
    divided.append(text_lines[2*chunk_size:3*chunk_size])
    divided.append(text_lines[3*chunk_size:4*chunk_size])
    divided.append(text_lines[4*chunk_size:5*chunk_size])
    divided.append(text_lines[5*chunk_size:6*chunk_size])
    lines=[]
    p = Pool(6)
    lines.extend(p.map(extract(text_lines),divided))
    p.close()
    p.join()
    p.terminate()
    line_tr01=lines[0]
    with open(pkl_filename, 'wb') as f:
        pickle.dump(line_tr01, f)
    line_tr02=lines[1]
    with open(pkl_filename, 'wb') as f:
        pickle.dump(line_tr02, f)
    line_tr03=lines[2]
    with open(pkl_filename, 'wb') as f:
        pickle.dump(line_tr03, f)
    line_tr04=lines[3]
    with open(pkl_filename, 'wb') as f:
        pickle.dump(line_tr04, f)

Any information on how to stop the files from being overwritten would be welcome. Thanks in advance.


1 Answer

莫回无


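The problem is that once you farm the work out to a pool, the workers no longer share the common global namespace you are currently (ab)using: lists filled inside a worker process never reach the parent. So let's rewrite it so that data is passed in and returned explicitly.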

from multiprocessing import Pool

def extract(text_lines):
    # Runs in a worker process: collect results locally and return them.
    treatments = dict(tr01=[], tr02=[], tr03=[], tr04=[])
    for line in text_lines:
        treatments['tr01'].append(treatment_a(line, args))
        treatments['tr02'].append(treatment_b(line, args))
        treatments['tr03'].append(treatment_c(line, args))
        treatments['tr04'].append(treatment_d(line, args))
    return treatments

def line_gen(lines, chunk_size=1):
    # Yield consecutive slices of at most chunk_size lines.
    for i in range(0, len(lines), chunk_size):
        yield lines[i:i + chunk_size]

for file in folder:
    text_lines = read_file_into_list(file_path)
    treatments = dict(tr01=[], tr02=[], tr03=[], tr04=[])
    # max(1, ...) keeps the chunk size valid for files with fewer than 6 lines.
    chunk_size = max(1, len(text_lines) // 6)
    with Pool(6) as p:
        # imap yields the per-chunk dicts in chunk order.
        for treat_data in p.imap(extract, line_gen(text_lines, chunk_size=chunk_size)):
            for tr, data in treat_data.items():
                treatments[tr].extend(data)
    # Do something with all your data in the treatments dict
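One caveat on the namespace point: the rewrite still reads `args` as a global inside `extract`. That tends to work when workers are forked from the parent, but with the spawn start method (the default on Windows and macOS) a worker only sees globals that exist when the module is imported, not ones set later in the main block. A minimal sketch of passing `args` explicitly with `functools.partial` follows. The two treatment functions are hypothetical stand-ins (the real ones are not shown in the question), `args` is assumed to be a plain dict, and `run_in_pool` is a made-up helper name.

```python
from functools import partial
from multiprocessing import Pool


def treatment_a(line, args):
    # Hypothetical stand-in for the question's treatment_a.
    return args["prefix"] + line.upper()


def treatment_b(line, args):
    # Hypothetical stand-in for the question's treatment_b.
    return len(line) * args["scale"]


def extract(text_lines, args):
    """Process one chunk and return the results instead of filling globals."""
    results = {"tr01": [], "tr02": []}
    for line in text_lines:
        results["tr01"].append(treatment_a(line, args))
        results["tr02"].append(treatment_b(line, args))
    return results


def run_in_pool(chunks, args, workers=2):
    """Send each chunk to a worker; args is pickled along with every task."""
    with Pool(workers) as pool:
        # partial binds args, so each worker calls extract(chunk, args=args).
        return pool.map(partial(extract, args=args), chunks)
```

`run_in_pool(chunks, args)` would be called from under an `if __name__ == "__main__":` guard, which the spawn start method requires. The trade-off is that `args` is serialized for every task, so this suits small configuration objects better than large data.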

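The loop over p.imap should accumulate all the data into a single dict named treatments, because extract returns its data from the child processes in which it runs. You can then write the data out in whatever way you like.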
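As for the overwriting itself: in the question every `pickle.dump` opens the same `pkl_filename`, so each write replaces the previous one. A minimal sketch of one way to write the collected dict out, assuming one pickle file per input file and treatment is what is wanted. The helper name `save_treatments`, its parameters `out_dir` and `stem`, and the `<stem>_<treatment>.pkl` naming scheme are all hypothetical.

```python
import pickle
from pathlib import Path


def save_treatments(treatments, out_dir, stem):
    """Write each treatment list to its own pickle file and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, data in treatments.items():
        # A distinct name per input file and treatment, e.g. "input01_tr01.pkl".
        path = out_dir / f"{stem}_{name}.pkl"
        with open(path, "wb") as f:
            pickle.dump(data, f)
        paths[name] = path
    return paths
```

Inside the `for file in folder:` loop this could be called as `save_treatments(treatments, "output", Path(file_path).stem)`, so that results from different input files do not collide either.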

2021-08-17
