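Given the index of a word in a text, I need to get the character index. For example, in the following text:

"The cat called other cats."

the index of the word "cat" is 1. I need the index of the first character of cat, i.e. c, which would be 4. I don't know whether this is relevant, but I'm using python-nltk to get the words. Right now, the only way I can think of is:

- Get the first character, find the number of words in this piece of text
- Get the first two characters, find the number of words in this piece of text
- Get the first three characters, find the number of words in this piece of text

Repeat until we get to the required word. But this would be very inefficient. Any ideas would be appreciated.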
1 Answer

MMTTMM
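You can use a dict here. re.finditer yields one match object per word, and m.start(0) gives the character offset where that word begins: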
>>> import re
>>> r = re.compile(r'\w+')
>>> text = "The cat called other cats."
>>> dic = { i :(m.start(0), m.group(0)) for i, m in enumerate(r.finditer(text))}
>>> dic
{0: (0, 'The'), 1: (4, 'cat'), 2: (8, 'called'), 3: (15, 'other'), 4: (21, 'cats')}
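>>> def char_index(char, word_ind):
...     start, word = dic[word_ind]
...     ind = word.find(char)
...     if ind != -1:
...         return start + ind
...     # Implicitly returns None when char does not occur in the word.
...
>>> char_index('c',1)
4
>>> char_index('c',2)
8
>>> char_index('c',3)    # 'other' contains no 'c', so None is returned and nothing is printed
>>> char_index('c',4)
21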
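Since the question mentions NLTK, here is a sketch (not part of the original answer) for the case where you already have the token list. The hypothetical helper `token_offsets` walks the text from left to right with `str.find`, resuming each search where the previous token ended, so it never re-tokenizes growing prefixes. It assumes every token appears verbatim and in order in the original text; some NLTK tokenizers rewrite tokens (quote characters, for example), which would break that assumption. The token list below is written out by hand rather than produced by NLTK.

```python
def token_offsets(text, tokens):
    """Return (start, end) character offsets for each token in text.

    Assumes every token occurs verbatim in text and in the same order.
    """
    offsets = []
    pos = 0  # resume searching here, so repeated words map to their own occurrence
    for tok in tokens:
        start = text.find(tok, pos)
        if start == -1:
            raise ValueError(f"token {tok!r} not found at or after offset {pos}")
        end = start + len(tok)
        offsets.append((start, end))
        pos = end
    return offsets


text = "The cat called other cats."
# Hand-written token list, similar to what a word tokenizer might return.
tokens = ["The", "cat", "called", "other", "cats", "."]
offsets = token_offsets(text, tokens)
print(offsets[1][0])  # 4: the word at index 1 ("cat") starts at character 4
```

With the offsets in hand, the character index of word i is simply `offsets[i][0]`, and `text[start:end]` recovers the word itself.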