首页猿问搜索字符串的非蛮力建议解决方案

搜索字符串的非蛮力建议解决方案

Python

沧海一幻觉 2021-09-11 13:29:40

我想知道您是否可以就用于迭代字符串、将其分成两半并检查两半是否存在于字典/哈希中的非蛮力解决方案的算法提出建议？例如，字符串“peanutbutter”被拆分为“peanut”和“butter”（是的，那里还有其他词，但出于示例目的，我们可以使用这两个词）这是我提出的蛮力解决方案以供参考：def break_into_spaces(S): i = 1 while i < len(S): left = S[i:] right = S[:i] if left in DICTIONARY and right in DICTIONARY: print("Found them!") print("{} {}".format(right, left)) i += 1break_into_spaces("peanutbutter")

查看完整描述

2 回答

Smart猫小萌

TA贡献1911条经验获得超7个赞

我的选择：

wordlist = ['air', 'pack', 'port', 'hard', 'back', 'bag', 'disk', 'ground', 'play']

word = 'playground'

lenw, minlen = len(word), min([len(w) for w in wordlist])

pairs = [(word[:n], word[n:]) for n in range(1,lenw) if (n >= minlen and n < lenw-minlen+1) ]

found = False

for w1, w2 in pairs:

if w1 in wordlist and w2 in wordlist:

print('Found ' + word + ' as: ' + w1 + ' + ' + w2)

found = True

break

if not found: print('No words found')

#=> Found playground as: play + ground

pairs是将词一分为二的映射，其中两个子词不小于词表中最小的词。这减少了查找次数。

打印出来看看：

print(pairs)

#=> [('pla', 'yground'), ('play', 'ground'), ('playg', 'round'), ('playgr', 'ound'), ('playgro', 'und')]

如果是大量单词，我建议按起始字母（作为词汇表）分组，然后仅查找单词字母和起始单词集之间的交集内的单词。这里不完整的代码：

letters = set(word)

print(letters) #=> {'r', 'a', 'u', 'g', 'l', 'n', 'd', 'o', 'y', 'p'}

alphabet = {}

for word in wordlist:

alphabet.setdefault(word[0], set()).add(word)

print(alphabet)

#=> {'a': {'air'}, 'p': {'port', 'play', 'pack'}, 'h': {'hard'}, 'b': {'back', 'bag'}, 'd': {'disk'}, 'g': {'ground'}}

所以交集是：{'g', 'p', 'd', 'a'} 然后建立查找列表：

lookuplist = []

for i in intersection:

for word in alphabet[i]:

lookuplist.append(word)

lookuplist #=> ['air', 'disk', 'ground', 'port', 'pack', 'play']

所以用lookuplist代替wordlist

用一些方法在抽屉里订购

def vocabulary(wordlist):

res = {}

for word in wordlist:

res.setdefault(word[0], set()).add(word)

return res

def lookuplist(vocabulary, word):

vocabulary_alphabet = set(vocabulary.keys())

word_letters = set(word)

intersection = vocabulary_alphabet.intersection(word_letters)

lookuplist = []

for i in intersection:

for word in vocabulary[i]:

lookuplist.append(word)

return lookuplist

def find_word(word, lookuplist):

lenw, minlen = len(word), min([len(w) for w in lookuplist])

pairs = [(word[:n], word[n:]) for n in range(1,lenw) if (n >= minlen and n < lenw-minlen+1) ]

for w1, w2 in pairs:

if w1 in lookuplist and w2 in lookuplist: return (word, w1, w2)

return []

您可以按如下方式使用：

wordlist = ['air', 'pack', 'port', 'hard', 'back', 'bag', 'disk', 'ground', 'play']

word = 'playground'

vocabulary = vocabulary(wordlist) # run once then store the result

lookuplist = lookuplist(vocabulary, word)

found_word = find_word(word, lookuplist)

print(found_word)

#=> ('playground', 'play', 'ground')

反对回复 2021-09-11

开心每一天1111

TA贡献1836条经验获得超13个赞

不是一个完整的解决方案，但一个好主意可能是将单词存储在字典中，例如，其中键是单词的长度，值是一组单词。然后创建一个长度列表来迭代它，而不是迭代输入词 ( s)，例如：

words = ['toothpaste',

'hard-to-find',

'economic',

'point',

'food',

'seal',

'outrageous',

'motionless',

'ice',

'tow',

'boot',

'cruel',

'peanut',

'butter']

index = {}

for word in words:

index.setdefault(len(word), set()).add(word)

lengths = sorted(index)

def break_into_spaces(s):

s_length = len(s)

for length in lengths:

if length < s_length:

left = s[length:]

right = s[:length]

if left in index[length] and s_length - length in index and right in index[s_length - length]:

print("{} {}".format(right, left))

print("Found them!")

else:

break

break_into_spaces('peanutbutter')

输出

peanut butter

Found them!

这样做是否可以节省时间：

它避免了遍历整个输入词，想象一下输入词比字典中的所有词都短的情况，这将立即打破循环并且不打印任何内容。

通过将单词存储在相同长度的集合中，您只需要检查是否存在相同长度的匹配单词，而不是检查所有单词。请注意，这可能毫无意义，因为字典是哈希表，因此理论上检查包含情况是O(1).

反对回复 2021-09-11

2 回答
0 关注
174 浏览

关注

添加回答

0/150

提交

取消

热搜

最近搜索清空

搜索字符串的非蛮力建议解决方案

搜索字符串的非蛮力建议解决方案

2 回答

添加回答