1 Answer

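You can sort the rule/conf pairs from the two columns by confidence value and then by rule length. The lowest conf scores then come first, and among rules with the same conf score the shortest list comes first. We then iterate over the sorted rule/conf pairs with a "two-finger" approach. The first finger is the current rule/conf pair. The second finger advances until it reaches a pair that either has a different conf score (e.g. 0.5 when the first finger is on 0.1) or whose rule does not contain the first finger's rule as a subset (e.g. ['Hamster'] when the first finger is on ['Dog']). When we find such a pair, we append the first finger's rule/conf pair to the result and move the first finger to the pair we just found. We keep iterating, skipping pairs that meet the removal criteria, and appending and advancing whenever a pair does not meet them. Hope that makes sense.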
rules = [['Dog'], ['Dog', 'Cat'], ['Dog', 'Cat', 'Hamster', 'Goldfish'], ['Dog', 'Cat', 'Hamster']]
confs = [0.1, 0.5, 0.1, 0.5]

# sort by conf values and size of rules to put the shortest sub-rule in the front
ruleConfPairs = sorted(zip(rules, confs), key=lambda x: (x[1], len(x[0])))

# initialize iteration
new_rules = []
new_confs = []
current_rule = ruleConfPairs[0][0]
current_conf = ruleConfPairs[0][1]

for rule, conf in ruleConfPairs[1:]:
    if current_conf == conf and set(current_rule).issubset(rule):
        # skip (i.e. remove) pair if it has the same confidence value AND
        # the current rule is a subset of it
        continue
    # append current rule/conf pair if either confidence score is not equal
    # OR the current rule is not a subset
    new_rules.append(current_rule)
    new_confs.append(current_conf)
    # advance our pair
    current_rule = rule
    current_conf = conf

# make sure to append the last pair
new_rules.append(current_rule)
new_confs.append(current_conf)

print(new_rules)
print(new_confs)
Output:
[['Dog'], ['Dog', 'Cat']]
[0.1, 0.5]
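One caveat with the loop above: each pair is only compared against the current first-finger pair, so a superset can slip through when an unrelated rule with the same conf sits between it and its subset in the sorted order (for example ['A'], ['B'], ['A', 'C'], all with the same conf). If that matters for your data, a possible variation is to compare each pair against every pair kept so far. The sketch below assumes this stricter behaviour is what you want; the function name `drop_redundant_rules` is made up for illustration, and the check is quadratic in the number of rules, so it is only meant for modest list sizes.

```python
def drop_redundant_rules(rules, confs):
    """Keep a rule only if no already-kept rule with the same conf is a subset of it.

    Hypothetical helper, not part of the original answer.
    """
    # Same ordering as before: lowest conf first, shortest rule first within a conf.
    pairs = sorted(zip(rules, confs), key=lambda pair: (pair[1], len(pair[0])))

    kept_rules = []
    kept_confs = []
    for rule, conf in pairs:
        rule_set = set(rule)
        # A pair is redundant if any kept pair has the same conf and a rule
        # that is a subset of this one.
        redundant = any(
            kept_conf == conf and set(kept_rule).issubset(rule_set)
            for kept_rule, kept_conf in zip(kept_rules, kept_confs)
        )
        if not redundant:
            kept_rules.append(rule)
            kept_confs.append(conf)
    return kept_rules, kept_confs


# The case that the two-finger loop would not fully reduce:
rules = [['A'], ['B'], ['A', 'C']]
confs = [0.3, 0.3, 0.3]
print(drop_redundant_rules(rules, confs))
# prints ([['A'], ['B']], [0.3, 0.3])
```

On the example data from the answer this gives the same result as the two-finger loop, and an empty input simply returns two empty lists instead of raising an IndexError.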