首页猿问在将点添加到节点之前是否预先制作了...

在将点添加到节点之前是否预先制作了二元分区树？

Python

拉莫斯之舞 2021-09-23 09:08:41

我正在尝试在 Python 中实现这个算法，但是由于我缺乏对树结构的理解，我对分区树的创建过程感到困惑。简要说明：链接算法，用于将高维特征空间划分为内部节点和叶节点，以便可以快速执行查询。它使用特定的随机测试划分大空间，超平面将一个大单元格分成两个。这个答案更准确地解释了一切。代码片段：def random_test(self, main_point): # Main point is np.ndarray instance dimension = main_point.ravel().size random_coefficients = self.random_coefficients(dimension) scale_values = np.array(sorted([np.inner(random_coefficients, point.ravel()) for point in self.points])) percentile = random.choice([np.percentile(scale_values, 100 * self.ratio), # Just as described on Section 3.1 np.percentile(scale_values, 100 * (1 - self.ratio))]) main_term = np.inner(main_point.ravel(), random_coefficients) if self.is_leaf(): return 0 # Next node is the center leaf child else: if (main_term - percentile) >= 0: # Hyper-plane equation defined in the document return -1 # Next node is the left child else: return 1 # Next node is the right childself.ratio正如上面链接的算法中提到的，正在确定树的平衡和浅层，1/2它应该生成最平衡和浅层的树。然后我们进入迭代部分，树不断地划分空间，直到它到达叶节点（注意关键字reach），问题是，它永远不会真正到达叶节点。因为，上面链接的文档中叶节点的定义是这样的：def is_leaf(self): return (self.capacity * self.ratio) <= self.cell_count() <= self.capacity其中self.cell_count()是单元格中self.capacity的点数，是单元格可以拥有的最大点数，self.ratio是分割比。我的完整代码应该通过在初始迭代时创建新节点来划分特征空间，直到创建叶节点（但从未创建叶节点）。请参阅包含除法过程的片段。在我们向它们添加任何点之前是否准备好二叉树（填充空节点）？如果是这样，我们不需要定义树的级别（深度）吗？如果不是，是否在向其中添加点时创建了二元分区树？如果是这样，那么第一个点（来自第一次迭代）是如何添加到树中的？

查看完整描述

2 回答

偶然的你

TA贡献1841条经验获得超3个赞

它们是在您进行时构建的。第一个节点在第 1 行的右侧或左侧。然后下一个级别询问第 2 行的右侧或左侧……您提供的论文中的插图显示了这一点，其中的行编号与用于查找节点的选择相关联。

当然，向右或向左是不准确的。一些线被水平切割。但是创建的空间是二进制的。

反对回复 2021-09-23

阿晨1998

TA贡献2037条经验获得超6个赞

我已经能够测试评论中提到的新方法，并且效果很好。

上面链接的算法隐含地指出，该点应单独下降到分区树中，通过所有随机测试并在下降时创建新节点。

但是这种方法有一个明显的问题，因为为了获得平衡的高效浅层树，左右节点必须均匀分布。

因此，为了分裂节点，在树的每一层，节点的每个点都必须传递到左节点或右节点（通过随机测试），直到树到达该层所有节点的深度叶子。

在数学术语中，根节点包含一个向量空间，该向量空间被分为左右两个节点，其中包含以分离超平面支撑超平面为边界的凸多面体：

//img1.sycdn.imooc.com//614bd3e70001749d02770073.jpg

等式的负项（我相信我们可以称之为偏差），是分裂比开始发挥作用的地方，它应该是 100*r 到 100*(1-r) 之间所有节点的百分位数，这样树就被分离了更均匀，更浅。基本上它决定了超平面分离应该如何均匀，这就是为什么我们需要包含所有点的节点。

我已经能够实现这样的系统：

def index_space(self):

shuffled_space = self.shuffle_space()

current_tree = PartitionTree()

level = 0

root_node = RootNode(shuffled_space, self.capacity, self.split_ratio, self.indices)

current_tree.root_node = root_node

current_tree.node_array.append(root_node)

current_position = root_node.node_position

node_array = {0: [root_node]}

while True:

current_nodes = node_array[level]

if all([node.is_leaf() for node in current_nodes]):

break

else:

level += 1

node_array[level] = []

for current_node in current_nodes:

if not current_node.is_leaf():

left_child = InternalNode(self.capacity, self.split_ratio, self.indices,

self._append(current_position, [-1]), current_node)

right_child = InternalNode(self.capacity, self.split_ratio, self.indices,

self._append(current_position, [1]), current_node)

for point in current_node.list_points():

if current_node.random_test(point) == 1:

right_child.add_point(point)

else:

left_child.add_point(point)

node_array[level].extend([left_child, right_child])

其中node_array包含树的所有节点（根、内部和叶）。

不幸的是，node.random_test(x)方法：

def random_test(self, main_point):

random_coefficients = self.random_coefficients()

scale_values = [np.inner(self.random_coefficients(), point[:self.indices].ravel())

for point in self.points]

percentile = np.percentile(scale_values, self.ratio * 100)

main_term = np.inner(main_point[:self.indices].ravel(), random_coefficients)

if self.is_leaf():

return 0 # Next node is the center leaf child

else:

if (main_term - percentile) >= 0: # Hyper-plane equation defined in the document

return -1 # Next node is the left child

else:

return 1 # Next node is the right child

效率低下，因为计算百分位数需要太多时间。因此，我必须找到另一种计算百分位数的方法（也许通过执行短路二分搜索来优化百分位数）。

结论：

这只是克林顿·雷·穆里根 (Clinton Ray Mulligan) 答案的一个大扩展——它简要解释了创建此类树的解决方案，因此仍将作为公认的答案。

我刚刚添加了更多细节，以防有人对实现随机二叉树感兴趣。

反对回复 2021-09-23

2 回答
0 关注
150 浏览

关注

添加回答

0/150

提交

取消

热搜

最近搜索清空

在将点添加到节点之前是否预先制作了二元分区树？

在将点添加到节点之前是否预先制作了二元分区树？

2 回答

添加回答