Randomly select N distinct items from a sequence of unknown length in a single pass

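I am trying to write an algorithm that randomly selects N distinct items from a sequence, without knowing the size of the sequence in advance, and where iterating over the sequence more than once is expensive. For example, the elements of the sequence might be the lines of a huge file.

For N = 1 (that is, "pick one element at random from a huge sequence"), I found a solution:

import random

items = range(1, 10)  # Imagine this is a huge sequence of unknown length
count = 1
selected = None
for item in items:
    # Keep the current item with probability 1/count
    if random.random() * count < 1:
        selected = item
    count += 1

But how can I achieve the same thing for other values of N (for example, N = 3)?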


3 Answers

月关宝盒

Contributed 1772 experience points, received 5+ likes

Use reservoir sampling. It is a very simple algorithm, and it works for any N.

Here is a Python implementation, and here is another.
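Since the linked implementations are not reproduced in this thread, here is a minimal sketch of the classic form of reservoir sampling (often called Algorithm R). The function name `reservoir_sample` and its parameters are my own choices for illustration, not taken from either link. It keeps the first N items, then replaces a random slot with item number i with probability N / i.

```python
import random


def reservoir_sample(iterable, n):
    """Return up to n items chosen uniformly at random in a single pass.

    Hypothetical helper for illustration; only O(n) memory is used.
    """
    reservoir = []
    for index, item in enumerate(iterable):
        if index < n:
            # Fill the reservoir with the first n items.
            reservoir.append(item)
        else:
            # randint is inclusive on both ends, so j is uniform over
            # 0..index; j < n happens with probability n / (index + 1).
            j = random.randint(0, index)
            if j < n:
                reservoir[j] = item
    return reservoir


# Works on any iterable, including generators and open files.
sample = reservoir_sample(iter(range(1, 1000)), 3)
print(sample)
```

Note that the items are distinct by position: if the sequence itself contains repeated values, the sample can contain repeated values too. If the sequence has fewer than N items, everything is returned.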

Answered 2019-10-19

慕无忌1623718

Contributed 1744 experience points, received 4+ likes

If your sequence is short enough to be read into memory and shuffled, a simple approach is to use random.shuffle:

import random

arr = [1, 2, 3, 4]

# In-place shuffle
random.shuffle(arr)

# Take the first 2 elements of the now-randomized list
print(arr[0:2])
# Example output (varies between runs): [1, 3]

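Depending on the type of your sequence, you may first need to convert it to a list by calling list(your_sequence), but this approach works regardless of the type of the objects in the sequence.

Naturally, if you cannot fit the sequence in memory, or if the memory or CPU cost of this approach is too high, you will need a different solution.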
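For the case where the sequence does not fit in memory, one possible alternative (my own suggestion, not something the shuffle approach above covers) is to tag every item with a random key and keep only the N items with the largest keys. A sketch using the standard library's heapq.nlargest, which consumes the iterable once and holds only N entries at a time; the name `sample_one_pass` is hypothetical:

```python
import heapq
import random


def sample_one_pass(iterable, n):
    """Pick up to n items by keeping those with the n largest random keys.

    Hypothetical helper for illustration. Every item gets an independent
    random key, so each set of n positions is equally likely to win.
    """
    return heapq.nlargest(n, iterable, key=lambda _item: random.random())


# The generator stands in for a huge sequence of unknown length.
picked = sample_one_pass((line for line in range(10000)), 3)
print(picked)
```

This costs roughly O(len * log N) time instead of the O(len * log len) needed to shuffle or sort everything, and memory stays proportional to N.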

Answered 2019-10-19

茅侃侃

Contributed 1842 experience points, received 21+ likes

import random

my_list = [1, 2, 3, 4, 5]
num_selections = 2

# Pick num_selections distinct items (needs a sequence of known length)
new_list = random.sample(my_list, num_selections)

# To preserve the order of the list, you could do:
randIndex = random.sample(range(len(my_list)), num_selections)
randIndex.sort()
new_list = [my_list[i] for i in randIndex]

Answered 2019-10-19
