
Removing substrings from superstrings

Python

慕田峪4524236 2021-11-16 10:50:39

I have a dictionary whose keys are tuples of strings and whose values are their frequencies, for example {('this','is'):2,('some','word'):3....}. I need to eliminate some of the keys that are contained in other keys. For example:

d = {('large','blue'):4, ('cute','blue'):3, ('large','blue','dog'):2, ('cute','blue','dog'):2, ('cute','blue','elephant'):1}

I need to eliminate ('large','blue') because it only occurs in 'large blue dog', but I cannot remove 'cute blue' because it occurs in both 'cute blue dog' and 'cute blue elephant'.

d = {('large','blue'):4, ('cute','blue'):3, ('large','blue','dog'):2, ('cute','blue','dog'):2, ('cute','blue','elephant'):1}
final_list = []
for k, v in d.items():
    final_list.append(' '.join(f for f in k))
final_list = sorted(final_list, key=len, reverse=True)
completed = set()
for f in final_list:
    if not completed:
        completed.add(f)
    else:
        if sum(f in s for s in completed) == 1:
            continue
print(final_list)
print(completed)

But this only gives me ['cute blue elephant']. I need:

[large blue dog]: 2
[cute blue dog]: 2
[cute blue elephant]: 1
[cute blue]: 3


3 answers

慕容3067478

Contributed 1773 experience points, received 3+ likes

Update: if you also want the counts, I would rather rewrite most of the code as:

d = {('large','blue'):4, ('cute','blue'):3, ('large','blue','dog'):2,
     ('cute','blue','dog'):2, ('cute','blue','elephant'):1}

completed = {}
for k, v in d.items():
    # Keep k unless its words are a subset of exactly one other key.
    if len([k1 for k1, v1 in d.items() if k != k1 and set(k).issubset(set(k1))]) != 1:
        completed[k] = v

print(completed)

Result:

{('cute', 'blue'): 3, ('large', 'blue', 'dog'): 2, ('cute', 'blue', 'dog'): 2, ('cute', 'blue', 'elephant'): 1}

I haven't checked the performance. I'll leave that to you.

How about replacing

for f in final_list:
    if not completed:
        completed.add(f)
    else:
        if sum(f in s for s in completed) == 1:
            continue

with

for f in final_list:
    # Keep f unless it is a substring of exactly one other entry.
    if len([x for x in final_list if f != x and f in x]) != 1:
        completed.add(f)

Is this what you want?
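Note that both checks above are looser than "appears inside another key": `set(k).issubset(set(k1))` ignores word order and adjacency, and `f in x` on joined strings can match part of a word. A minimal sketch of a stricter check on the tuples themselves follows; the helper names `contains_run` and `drop_keys_inside_one_other` are made up for illustration and do not come from the answer.

```python
def contains_run(big, small):
    """True if `small` occurs in `big` as a contiguous run of words (hypothetical helper)."""
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))


def drop_keys_inside_one_other(freq):
    """Return a new dict without the keys contained in exactly one other key."""
    kept = {}
    for key, count in freq.items():
        # Count how many *other* keys contain this key as a contiguous run.
        hits = sum(1 for other in freq if other != key and contains_run(other, key))
        if hits != 1:
            kept[key] = count
    return kept


d = {('large', 'blue'): 4, ('cute', 'blue'): 3, ('large', 'blue', 'dog'): 2,
     ('cute', 'blue', 'dog'): 2, ('cute', 'blue', 'elephant'): 1}
print(drop_keys_inside_one_other(d))
```

This is still quadratic in the number of keys, like the loops above; it only changes what counts as a match.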

2021-11-16

暮色呼如

Contributed 1853 experience points, received 9+ likes

This should work:

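# Assumes d is the dictionary from the question. Only prefix matches are detected.
previous = " "
previousCount = 0
# Sorting puts each phrase right before the phrases that extend it;
# the trailing " " sentinel flushes the last group.
for words in sorted([" ".join(key) for key in d]) + [" "]:
    if words.startswith(previous):
        previousCount += 1
    else:
        print(previous, previousCount)
        # Delete only phrases extended by exactly one other phrase
        # (a "< 2" test would also delete phrases with no extension at all).
        if previousCount == 1 and previous != " ":
            del d[tuple(previous.split(" "))]
        previous = words
        previousCount = 0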
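The same sorted-scan idea can also be sketched on the tuples directly, which avoids matching partial words (for example 'cute blue' against 'cute blueberry') and leaves `d` untouched. This is only an illustration under assumptions: the function name `drop_single_prefix_keys` is hypothetical, and like the loop above it only considers keys that are a prefix of other keys, not keys that appear in the middle or at the end.

```python
def drop_single_prefix_keys(freq):
    """Return a copy of freq without keys that are a prefix of exactly one other key."""
    # Tuples sort lexicographically, so every key that extends a given prefix
    # sits in one contiguous block right after that prefix.
    keys = sorted(freq)
    to_drop = set()
    for i, key in enumerate(keys):
        extensions = 0
        for other in keys[i + 1:]:
            if other[:len(key)] != key:
                break  # left the block of keys starting with `key`
            extensions += 1
        if extensions == 1:
            to_drop.add(key)
    return {k: v for k, v in freq.items() if k not in to_drop}


d = {('large', 'blue'): 4, ('cute', 'blue'): 3, ('large', 'blue', 'dog'): 2,
     ('cute', 'blue', 'dog'): 2, ('cute', 'blue', 'elephant'): 1}
print(drop_single_prefix_keys(d))
```

Unlike the single-pass loop, this version re-scans the extensions of every key, so nested prefixes such as ('a',), ('a', 'b'), ('a', 'b', 'c') are each evaluated on their own.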

2021-11-16

江户川乱折腾

Contributed 1851 experience points, received 5+ likes

There must be a more efficient (non-O(n^2)) way to do this, but this seems to be what you want:

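from pprint import pprint

# Named `data` rather than `input` to avoid shadowing the built-in.
data = {
    ('large','blue'): 4,
    ('cute','blue'): 3,
    ('large','blue','dog'): 2,
    ('cute','blue','dog'): 2,
    ('cute','blue','elephant'): 1,
}

# Join each key into a phrase so that `in` performs a substring test.
keys = set(' '.join(k) for k in data)

# Keys that occur inside exactly one other key are the ones to remove.
filtered = {
    tuple(f.split())
    for f in keys
    if sum(f != k and f in k for k in keys) == 1
}

result = {k: v for k, v in data.items() if k not in filtered}

pprint(sorted(result.items()))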

Result:

[(('cute', 'blue'), 3),
 (('cute', 'blue', 'dog'), 2),
 (('cute', 'blue', 'elephant'), 1),
 (('large', 'blue', 'dog'), 2)]

Following your requirements, the idea is to identify the keys that appear as part of exactly one other key.
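One possible way to avoid comparing every key with every other key is to turn the loop around: enumerate the contiguous sub-runs of each key and count how many keys contain each run. The sketch below assumes keys are short tuples, so the work is driven by key length rather than by the number of key pairs; the function name `drop_keys_inside_one_other_fast` is hypothetical, and this is an alternative technique, not the code from the answer above.

```python
from collections import Counter


def drop_keys_inside_one_other_fast(freq):
    """Return a new dict without the keys contained in exactly one other key."""
    containing = Counter()  # key -> number of longer keys that contain it
    for key in freq:
        n = len(key)
        # All distinct contiguous runs of `key` that are shorter than `key` itself.
        runs = {key[i:j] for i in range(n) for j in range(i + 1, n + 1) if j - i < n}
        for run in runs:
            if run in freq:
                containing[run] += 1
    # A Counter returns 0 for keys it has never seen.
    return {k: v for k, v in freq.items() if containing[k] != 1}


data = {('large', 'blue'): 4, ('cute', 'blue'): 3, ('large', 'blue', 'dog'): 2,
        ('cute', 'blue', 'dog'): 2, ('cute', 'blue', 'elephant'): 1}
print(drop_keys_inside_one_other_fast(data))
```

Because the runs are collected in a set, a key that contains the same run twice is still counted once. Matching is on whole words, so 'blue' never matches inside 'blueberry', which differs slightly from the string-based `in` test.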

2021-11-16
