2 回答
TA贡献1911条经验 获得超7个赞
我的选择:
wordlist = ['air', 'pack', 'port', 'hard', 'back', 'bag', 'disk', 'ground', 'play']
word = 'playground'
lenw, minlen = len(word), min([len(w) for w in wordlist])
pairs = [(word[:n], word[n:]) for n in range(1,lenw) if (n >= minlen and n < lenw-minlen+1) ]
found = False
for w1, w2 in pairs:
if w1 in wordlist and w2 in wordlist:
print('Found ' + word + ' as: ' + w1 + ' + ' + w2)
found = True
break
if not found: print('No words found')
#=> Found playground as: play + ground
pairs是将词一分为二的映射,其中两个子词不小于词表中最小的词。这减少了查找次数。
打印出来看看:
print(pairs)
#=> [('pla', 'yground'), ('play', 'ground'), ('playg', 'round'), ('playgr', 'ound'), ('playgro', 'und')]
如果是大量单词,我建议按起始字母(作为词汇表)分组,然后仅查找单词字母和起始单词集之间的交集内的单词。这里不完整的代码:
letters = set(word)
print(letters) #=> {'r', 'a', 'u', 'g', 'l', 'n', 'd', 'o', 'y', 'p'}
alphabet = {}
for word in wordlist:
alphabet.setdefault(word[0], set()).add(word)
print(alphabet)
#=> {'a': {'air'}, 'p': {'port', 'play', 'pack'}, 'h': {'hard'}, 'b': {'back', 'bag'}, 'd': {'disk'}, 'g': {'ground'}}
所以交集是:{'g', 'p', 'd', 'a'} 然后建立查找列表:
lookuplist = []
for i in intersection:
for word in alphabet[i]:
lookuplist.append(word)
lookuplist #=> ['air', 'disk', 'ground', 'port', 'pack', 'play']
所以用lookuplist代替wordlist
用一些方法在抽屉里订购
def vocabulary(wordlist):
res = {}
for word in wordlist:
res.setdefault(word[0], set()).add(word)
return res
def lookuplist(vocabulary, word):
vocabulary_alphabet = set(vocabulary.keys())
word_letters = set(word)
intersection = vocabulary_alphabet.intersection(word_letters)
lookuplist = []
for i in intersection:
for word in vocabulary[i]:
lookuplist.append(word)
return lookuplist
def find_word(word, lookuplist):
lenw, minlen = len(word), min([len(w) for w in lookuplist])
pairs = [(word[:n], word[n:]) for n in range(1,lenw) if (n >= minlen and n < lenw-minlen+1) ]
for w1, w2 in pairs:
if w1 in lookuplist and w2 in lookuplist: return (word, w1, w2)
return []
您可以按如下方式使用:
wordlist = ['air', 'pack', 'port', 'hard', 'back', 'bag', 'disk', 'ground', 'play']
word = 'playground'
vocabulary = vocabulary(wordlist) # run once then store the result
lookuplist = lookuplist(vocabulary, word)
found_word = find_word(word, lookuplist)
print(found_word)
#=> ('playground', 'play', 'ground')
TA贡献1836条经验 获得超13个赞
不是一个完整的解决方案,但一个好主意可能是将单词存储在字典中,例如,其中键是单词的长度,值是一组单词。然后创建一个长度列表来迭代它,而不是迭代输入词 ( s),例如:
words = ['toothpaste',
'hard-to-find',
'economic',
'point',
'food',
'seal',
'outrageous',
'motionless',
'ice',
'tow',
'boot',
'cruel',
'peanut',
'butter']
index = {}
for word in words:
index.setdefault(len(word), set()).add(word)
lengths = sorted(index)
def break_into_spaces(s):
s_length = len(s)
for length in lengths:
if length < s_length:
left = s[length:]
right = s[:length]
if left in index[length] and s_length - length in index and right in index[s_length - length]:
print("{} {}".format(right, left))
print("Found them!")
else:
break
break_into_spaces('peanutbutter')
输出
peanut butter
Found them!
这样做是否可以节省时间:
它避免了遍历整个输入词,想象一下输入词比字典中的所有词都短的情况,这将立即打破循环并且不打印任何内容。
通过将单词存储在相同长度的集合中,您只需要检查是否存在相同长度的匹配单词,而不是检查所有单词。请注意,这可能毫无意义,因为字典是哈希表,因此理论上检查包含情况是O(1).
添加回答
举报