2 Answers
Contributor with 1799 experience points · over 9 likes
I suggest using a dictionary, because looking up a key takes O(1) time on average.
>>> import numpy as np
>>> pair_list = np.array([['samp1', 'samp4'],
...                       ['samp2', 'samp7'],
...                       ['samp2', 'samp4']])
>>> samples = {'samp0': 0, 'samp1': 1, 'samp2': 2, 'samp3': 3, 'samp4': 4, 'samp5': 5,
...            'samp6': 6, 'samp7': 7, 'samp8': 8, 'samp9': 9}
>>> vfunc = np.vectorize(lambda x: samples[x])
>>> pair_indices = vfunc(pair_list)
>>> print(pair_indices)
[[1 4]
 [2 7]
 [2 4]]
Contributor with 1789 experience points · over 8 likes
import numpy as np

pair_list = np.array([['samp1', 'samp4'],
                      ['samp2', 'samp7'],
                      ['samp2', 'samp4']])
samples = np.array(['samp0', 'samp1', 'samp2', 'samp3', 'samp4', 'samp5',
                    'samp6', 'samp7', 'samp8', 'samp9'])

def f1(pair_list, samples):
    # Scans all of samples for every element; [0][0] takes the first match as a scalar.
    vfunc = np.vectorize(lambda s: np.where(samples == s)[0][0])
    return vfunc(pair_list)

def f2(pair_list, samples):
    # Build a name -> index dict once, then look up every element.
    d = dict()
    for idx, el in enumerate(samples):
        d[el] = idx
    return np.array([d[el] for row in pair_list for el in row]).reshape(pair_list.shape[0], 2)
f2 looks clumsy, but...
timeit f1(pair_list,samples)
25.7 µs ± 78 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
timeit f2(pair_list,samples)
9.09 µs ± 68.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Try it on your machine and see how it performs! Of course, it is even better if you can reuse samples, because then you only need to convert samples to a dict once.
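To illustrate the reuse point, here is a minimal sketch that builds the dict once and applies it to several pair lists. The helper names `build_index` and `lookup_pairs`, and the second batch `batch_b`, are made up for this example and are not part of the original answer.

```python
import numpy as np

def build_index(samples):
    """Map each sample name to its position in `samples` (hypothetical helper)."""
    return {name: idx for idx, name in enumerate(samples)}

def lookup_pairs(pair_list, index):
    """Replace every name in `pair_list` with its integer index (hypothetical helper)."""
    flat = [index[name] for name in pair_list.ravel()]
    return np.array(flat).reshape(pair_list.shape)

samples = np.array(['samp0', 'samp1', 'samp2', 'samp3', 'samp4', 'samp5',
                    'samp6', 'samp7', 'samp8', 'samp9'])

# The dict is built a single time...
index = build_index(samples)

# ...and then reused for every batch of pairs.
batch_a = np.array([['samp1', 'samp4'],
                    ['samp2', 'samp7'],
                    ['samp2', 'samp4']])
batch_b = np.array([['samp0', 'samp9']])  # illustrative extra batch

result_a = lookup_pairs(batch_a, index)
result_b = lookup_pairs(batch_b, index)
print(result_a)
print(result_b)
```

With this split, the cost of building the dict is paid once, and each later call only does the lookups.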
Edit: as Mohsen_Fatemi suggested, vectorizing the dict access is much better, even when samples cannot be reused.
def f3(pair_list, samples):
    d = dict()
    for idx, el in enumerate(samples):
        d[el] = idx
    vfunc = np.vectorize(lambda x: d[x])
    return vfunc(pair_list)
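timeit f3
16.1 ns ± 0.0138 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
Note that this command names f3 without calling it, so the 16.1 ns is the cost of evaluating the name, not of running the function. It is not comparable with the f1 and f2 timings; time f3(pair_list, samples) for a like-for-like figure.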
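To time the call itself rather than the bare name, a minimal sketch with the standard-library `timeit` module might look like the following. The repeat count `n_calls` is an arbitrary choice for illustration, and the numbers will vary by machine, so no expected timing is shown.

```python
import timeit
import numpy as np

pair_list = np.array([['samp1', 'samp4'],
                      ['samp2', 'samp7'],
                      ['samp2', 'samp4']])
samples = np.array(['samp0', 'samp1', 'samp2', 'samp3', 'samp4', 'samp5',
                    'samp6', 'samp7', 'samp8', 'samp9'])

def f3(pair_list, samples):
    # Same approach as f3 in the answer: build the dict, then vectorize the lookup.
    d = {el: idx for idx, el in enumerate(samples)}
    vfunc = np.vectorize(lambda x: d[x])
    return vfunc(pair_list)

# The lambda actually invokes f3, so each repetition runs the full function.
n_calls = 1000  # arbitrary repeat count, chosen for illustration
total_seconds = timeit.timeit(lambda: f3(pair_list, samples), number=n_calls)
per_call_us = total_seconds / n_calls * 1e6
print(f"f3: {per_call_us:.2f} µs per call")
```

In IPython, `%timeit f3(pair_list, samples)` gives the same kind of measurement in the format used for f1 and f2.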