
What are the downsides of always using numpy arrays instead of python lists?

Python

慕标5832272 2022-05-24 18:11:40

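I'm writing a program that needs to flatten an array, so I used the following code:

list_of_lists = [["a","b","c"], ["d","e","f"], ["g","h","i"]]
flattened_list = [i for j in list_of_lists for i in j]

This produces ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'], the desired output. Then I found that with a numpy array I can simply use np.array(((1,2),(3,4),(5,6))).flatten(). Is there any downside to always using numpy arrays in place of regular Python lists? In other words, what can Python lists do that numpy arrays cannot?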


3 Answers

呼唤远方

Contributed 1856 experience points, received more than 11 likes

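For your small example, the list comprehension is faster than the array approach, even when the array creation is taken out of the timing loop: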

In [204]: list_of_lists = [["a","b","c"], ["d","e","f"], ["g","h","i"]]

...: flattened_list = [i for j in list_of_lists for i in j]

In [205]: timeit [i for j in list_of_lists for i in j]

757 ns ± 17.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [206]: np.ravel(list_of_lists)

Out[206]: array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'], dtype='<U1')

In [207]: timeit np.ravel(list_of_lists)

8.05 µs ± 12.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [208]: %%timeit x = np.array(list_of_lists)

...: np.ravel(x)

2.33 µs ± 22.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

With a larger example, I would expect [208] to compare more favorably.
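To check how the comparison scales, here is a minimal sketch using the standard `timeit` module on a larger, purely numeric input. The 500 x 500 size and all function and variable names are assumptions chosen for illustration, not part of the session above. Timings vary by machine, so none are quoted.

```python
import timeit

import numpy as np

# Hypothetical larger input: 500 rows of 500 integers each.
rows, cols = 500, 500
list_of_lists = [list(range(r * cols, (r + 1) * cols)) for r in range(rows)]
arr = np.array(list_of_lists)  # converted once, outside the timed code


def flatten_list():
    # Builds a brand-new Python list element by element.
    return [i for j in list_of_lists for i in j]


def flatten_array():
    # For a contiguous array, ravel returns a view instead of copying.
    return arr.ravel()


# Both approaches yield the same elements in the same order.
assert flatten_list() == flatten_array().tolist()

t_list = timeit.timeit(flatten_list, number=10)
t_arr = timeit.timeit(flatten_array, number=10)
print(f"list comprehension: {t_list:.4f} s, ndarray.ravel: {t_arr:.6f} s")
```

Note that the array version benefits from not copying any data, and that the one-time cost of `np.array(list_of_lists)` is deliberately excluded, as in [208].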

If the sublists differ in length, the result is a 1d object array of lists rather than a 2d array, so flatten does nothing:

In [209]: list_of_lists = [["a","b","c",23], ["d",None,"f"], ["g","h","i"]]

...: flattened_list = [i for j in list_of_lists for i in j]

In [210]: flattened_list

Out[210]: ['a', 'b', 'c', 23, 'd', None, 'f', 'g', 'h', 'i']

In [211]: np.array(list_of_lists)

Out[211]:

array([list(['a', 'b', 'c', 23]), list(['d', None, 'f']),

list(['g', 'h', 'i'])], dtype=object)
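A minimal sketch of the ragged case. One caveat that is not in the session above: NumPy 1.24 and later raise a ValueError for ragged input unless `dtype=object` is passed explicitly, so the sketch passes it. `itertools.chain.from_iterable` is shown as an alternative to the list comprehension; it is not something the answer used.

```python
from itertools import chain

import numpy as np

ragged = [["a", "b", "c", 23], ["d", None, "f"], ["g", "h", "i"]]

# dtype=object is required on newer NumPy; the result holds 3 list objects.
arr = np.array(ragged, dtype=object)
print(arr.shape)          # (3,)

# flatten only copies the 1d array; the inner lists are not unpacked.
flat_arr = arr.flatten()
print(type(flat_arr[0]))  # <class 'list'>

# Plain Python flattens ragged input without any trouble.
flat = list(chain.from_iterable(ragged))
print(flat)
```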

Growing a list is more efficient than growing an array:

In [217]: alist = []

In [218]: for row in list_of_lists:

...:     alist.append(row)

...:

In [219]: alist

Out[219]: [['a', 'b', 23], ['d', None, 'f'], ['g', 'h', 'i']]

In [220]: np.array(alist)

Out[220]:

array([['a', 'b', 23],

['d', None, 'f'],

['g', 'h', 'i']], dtype=object)

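Iterative concatenation of arrays is strongly discouraged. Collect the sublists or arrays in a list first, then build the array once.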
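The advice can be sketched as follows, assuming rows arrive one at a time. The names `chunks`, `grown`, and `collected` are hypothetical. `np.concatenate` allocates a new array on every call, so using it in a loop copies all previously accumulated data each time, whereas `list.append` is amortized constant time.

```python
import numpy as np

# Hypothetical incoming rows: [0 1 2], [3 4 5], [6 7 8], [9 10 11].
chunks = [np.arange(3) + 3 * k for k in range(4)]

# Discouraged: every np.concatenate call copies everything gathered so far.
grown = np.empty((0, 3), dtype=chunks[0].dtype)
for chunk in chunks:
    grown = np.concatenate([grown, chunk[np.newaxis, :]])

# Preferred: append to a list, then build the array once at the end.
collected = []
for chunk in chunks:
    collected.append(chunk)
result = np.array(collected)

# Both routes produce the same 4 x 3 array.
assert np.array_equal(grown, result)
```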

2022-05-24

UYOU

Contributed 1878 experience points, received more than 4 likes

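Yes, there are. The rule of thumb is that numpy.array works best for data of a single type (all integers, all double-precision floats, all booleans, strings of the same length, and so on) rather than a mixed bag of data. In the latter case you might as well use a generic list. Consider this: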

In [93]: a = [b'5', 5, '55', 'ab', 'cde', 'ef', 4, 6]

In [94]: b = np.array(a)

In [95]: %timeit 5 in a

65.6 ns ± 0.79 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [96]: %timeit 6 in a # worst case

219 ns ± 5.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [97]: %timeit 5 in b

10.9 µs ± 217 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

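Look at the performance difference: the membership test on the numpy.array is roughly two orders of magnitude slower (10.9 µs versus 65.6 ns)! Of course, this depends on the size of the list and, in this particular case, on the position of 5 or 6 in it (6 being the worst case of the O(n) search), but you get the idea.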
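One reason a mixed bag fits poorly in an array: unless `dtype=object` is requested, NumPy coerces all elements to a single common type. A minimal sketch with a hypothetical three-element list (not the list from the session above):

```python
import numpy as np

mixed = [1, "a", 2.5]
arr = np.array(mixed)

# The common type is a Unicode string, so the numbers were converted to text.
print(arr.dtype.kind)  # U
print(arr[0] == "1")   # True: the integer 1 is stored as the string '1'

# The list keeps each element's original type.
print([type(x).__name__ for x in mixed])  # ['int', 'str', 'float']
```

With the list, `1 in mixed` compares against the original integer; with the array, the stored value is the string `'1'`, so numeric lookups no longer mean the same thing.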

2022-05-24

守候你守候我

Contributed 1802 experience points, received more than 10 likes

NumPy arrays and functions are the better choice in most cases. If you want to learn more, here is an article: https://webcourses.ucf.edu/courses/1249560/pages/python-lists-vs-numpy-arrays-what-is-the-difference

2022-05-24
