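I'm trying to count the occurrences of value pairs. When I run the code below, the numpy version (pairs_frequency2) is more than 50% slower than the version that relies on collections.Counter, and the gap gets worse as the number of points grows. Can someone explain why? Is there a numpy rewrite that would perform better? Thanks in advance.

    import numpy as np
    from collections import Counter

    def pairs_frequency(x, y):
        # Count each (a, b) pair with a Python-level hash map
        counts = Counter(zip(x, y))
        res = np.array([[f, a, b] for ((a, b), f) in counts.items()])
        return res[:, 0], res[:, 1], res[:, 2]

    def pairs_frequency2(x, y):
        # Count unique rows of the stacked (x, y) array
        unique, counts = np.unique(np.column_stack((x, y)), axis=0, return_counts=True)
        return counts, unique[:, 0], unique[:, 1]

    x = np.random.randint(low=1, high=11, size=50000)
    y = x + np.random.randint(1, 5, size=x.size)

    # %timeit is an IPython/Jupyter magic
    %timeit pairs_frequency(x, y)
    %timeit pairs_frequency2(x, y)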
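One possible direction, offered as a sketch rather than a measured answer: np.unique with axis=0 has to sort whole rows, which is likely more expensive than sorting a flat integer array. If x and y are non-empty arrays of non-negative integers (an assumption that holds for the sample data above), each pair can be packed into a single integer key and counted with a 1-D np.unique. The name pairs_frequency3 and the helper variable width are made up for illustration; I have not timed this against the two versions in the question.

```python
import numpy as np

def pairs_frequency3(x, y):
    # Hypothetical variant: assumes x and y are non-empty, equal-length
    # arrays of non-negative integers small enough that x * width + y
    # fits in int64.
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    width = int(y.max()) + 1          # every y value lies in [0, width)
    keys = x * width + y              # one integer key per (x, y) pair
    unique_keys, counts = np.unique(keys, return_counts=True)
    # Decode each key back into its (x, y) pair
    return counts, unique_keys // width, unique_keys % width
```

The pairs come back sorted by x and then by y, the same order pairs_frequency2 produces, whereas the Counter version returns them in first-seen order.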