
Optimizing parsing of a massive Python dictionary, multithreading

Python

慕桂英4014372 2022-08-25 13:59:00

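Let's take a small example Python dictionary whose values are lists of integers.

example_dict1 = {'key1': [367, 30, 847, 482, 887, 654, 347, 504, 413, 821],
                 'key2': [754, 915, 622, 149, 279, 192, 312, 203, 742, 846],
                 'key3': [586, 521, 470, 476, 693, 426, 746, 733, 528, 565]}

Suppose I need to process the values of each list, which I have implemented as the following function:

def manipulate_values(input_list):
    return_values = []
    for i in input_list:
        new_value = i ** 2 - 13
        return_values.append(new_value)
    return return_values

Now I can easily process the values of this dictionary as follows:

for key, value in example_dict1.items():
    example_dict1[key] = manipulate_values(value)

which results in:

example_dict1 = {'key1': [134676, 887, 717396, 232311, 786756, 427703, 120396, 254003, 170556, 674028],
                 'key2': [568503, 837212, 386871, 22188, 77828, 36851, 97331, 41196, 550551, 715703],
                 'key3': [343383, 271428, 220887, 226563, 480236, 181463, 556503, 537276, 278771, 319212]}

This works well for small dictionaries. My problem is that I have a huge dictionary with millions of keys and long lists. If I apply the approach above, the algorithm is very slow. How can I optimize it?

(1) Multithreading: besides the traditional threading module, are there more efficient options for multithreading the for statement over the dictionary?

(2) Would a better data structure be appropriate?

I ask because I am quite stuck on how best to proceed here. I don't see a better data structure than a dictionary, but the for loops across the dictionary (and then across the value lists) are very slow. There may be something designed to be faster.

EDIT: As you can imagine, this is something of a toy example; the function in question is somewhat more complicated than x**2-13. I am more interested in how to work with a dictionary that has millions of keys and long lists of values.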


2 Answers

喵喵时光机

Contributed 1846 experience points, received 7+ upvotes

If you can store everything in numpy arrays, processing will be much faster. I made each list 500,000 times longer to test scalability, and these are my results:

from timeit import timeit
import numpy as np

n = 500000

example_dict1 = {'key1': [367, 30, 847, 482, 887, 654, 347, 504, 413, 821] * n,
                 'key2': [754, 915, 622, 149, 279, 192, 312, 203, 742, 846] * n,
                 'key3': [586, 521, 470, 476, 693, 426, 746, 733, 528, 565] * n}

def manipulate_values(input_list):
    return_values = []
    for i in input_list:
        new_value = i ** 2 - 13
        return_values.append(new_value)
    return return_values

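Using your approach:

# Note: this overwrites example_dict1 in place, so each of the 5
# repetitions starts from the output of the previous one.
for_with_dictionary = timeit("""
for key, value in example_dict1.items():
    example_dict1[key] = manipulate_values(value)
""", "from __main__ import example_dict1, manipulate_values", number=5)

print(for_with_dictionary)

>>> 33.2095841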

Using numpy:

numpy_broadcasting = timeit("""
array = np.array(list(example_dict1.values()))
array = array ** 2 - 13
""", "from __main__ import example_dict1, np", number=5)

print(numpy_broadcasting)

>>> 5.039885

That is a significant speedup, at least 6x (33.2 s versus 5.04 s for 5 repetitions).
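Note that `np.array(list(example_dict1.values()))` only yields a numeric 2-D array when every list has the same length. Below is a minimal sketch for lists of different lengths, assuming the dictionary is non-empty and all values fit in int64: concatenate everything into one flat array, transform it in a single vectorized pass, and split it back per key. The name `manipulate_flat` is hypothetical and not part of the answer, and this variant has not been benchmarked here.

```python
import numpy as np

def manipulate_flat(data):
    """Apply x**2 - 13 to every value of a dict of (possibly ragged) int lists.

    Assumes `data` is non-empty and every value fits in int64.
    """
    keys = list(data)
    lengths = [len(data[k]) for k in keys]
    # One flat int64 array holding all lists back to back.
    flat = np.concatenate([np.asarray(data[k], dtype=np.int64) for k in keys])
    flat = flat ** 2 - 13  # single vectorized pass over all values
    # Split points are the cumulative lengths, excluding the final total.
    pieces = np.split(flat, np.cumsum(lengths)[:-1])
    return dict(zip(keys, pieces))

ragged = {'key1': [367, 30], 'key2': [754], 'key3': [586, 521, 470]}
result = manipulate_flat(ragged)
```

Each value in `result` is a view into the same flat array, so the data is stored only once. If the values stay in this form between processing steps, the list-to-array conversion does not have to be repeated each time.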

2022-08-25

桃花长相依

Contributed 1860 experience points, received 8+ upvotes

If you have enough memory:

import numpy as np

example_dict2 = dict(zip(example_dict1.keys(), np.array(list(example_dict1.values())) ** 2 - 13))

>>> example_dict2
{'key1': array([134676, 887, 717396, 232311, 786756, 427703, 120396, 254003, 170556, 674028]),
 'key2': array([568503, 837212, 386871, 22188, 77828, 36851, 97331, 41196, 550551, 715703]),
 'key3': array([343383, 271428, 220887, 226563, 480236, 181463, 556503, 537276, 278771, 319212])}
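The question mentions that the real function is more complicated than x**2-13, so it may not vectorize cleanly. In standard CPython, threads generally do not speed up CPU-bound pure-Python loops because of the global interpreter lock, so a process pool is the usual alternative. The following is only a sketch under stated assumptions: `process_chunk`, `chunked`, `parallel_manipulate`, and the `chunk_size` default are hypothetical names and values, and `manipulate_values` stands in for the real function. Sending data to worker processes has a pickling cost, so this tends to pay off only when the per-list work is expensive.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import islice

def manipulate_values(input_list):
    # Stand-in for the real, more expensive per-list function.
    return [i ** 2 - 13 for i in input_list]

def process_chunk(chunk):
    """Worker: transform one batch of (key, list) pairs."""
    return [(key, manipulate_values(values)) for key, values in chunk]

def chunked(pairs, size):
    """Yield successive lists of at most `size` items from an iterable."""
    it = iter(pairs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def parallel_manipulate(data, chunk_size=10_000, executor_cls=ProcessPoolExecutor):
    """Return a new dict with manipulate_values applied to every list.

    Keys are sent to the workers in batches so that the per-task overhead
    is paid once per batch rather than once per key. Passing
    ThreadPoolExecutor as `executor_cls` runs the same logic in threads,
    which is handy for checking it without spawning processes.
    """
    result = {}
    with executor_cls() as pool:
        for processed in pool.map(process_chunk, chunked(data.items(), chunk_size)):
            result.update(processed)
    return result
```

When using the default process pool, call `parallel_manipulate(example_dict1)` from under an `if __name__ == "__main__":` guard, since the spawn start method re-imports the main module in each worker.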

2022-08-25
