3 Answers
User with 1830 experience points contributed, more than 3 likes received
OK, I have updated the answer to reflect your edit:
from functools import reduce

sent = "terras ipsius Azar vocatas Ta Ta Zagra Ta Zagra Xellule et Ginen Chagem in contrata Deyr Issafisaf cum iuribus suis omnibus"
places = ['Ras il Huichile', 'Ras il Hued', 'Ta Richardu', 'Roma', 'Russilion', 'La Rukiha', 'Irrukiha ta il Bayada',
          'Casalis Milleri', 'Ta Sabat', 'Casalis Zebug', 'Ta Zagra', 'Sagra in Ras il Hued', 'Ta Isalme', 'Ta Xellule', 'Ginen Chagem',
          'Deyr Issafisaf']
# Map each place name to its per-word ('PLACE', word) tags.
places_map = {p: [('PLACE', w) for w in p.split()] for p in places}

def find_places(sent, places):
    # Base case: no place names left to look for, so every word is tagged 'O'.
    if not places:
        return [('O', w) for w in sent.split()]
    place = places[0]
    remaining_places = places[1:]
    # Split the text around every occurrence of the current place name ...
    sent_splits = sent.split(place)
    # ... tag each piece recursively and rejoin the pieces with the place's tags in between.
    return reduce(lambda a, b: a + places_map[place] + b,
                  [find_places(s, remaining_places) for s in sent_splits])

print(find_places(sent, places))
The output is:
[('O', 'terras'), ('O', 'ipsius'), ('O', 'Azar'), ('O', 'vocatas'), ('O', 'Ta'), ('PLACE', 'Ta'), ('PLACE', 'Zagra'), ('PLACE', 'Ta'), ('PLACE', 'Zagra'), ('O', 'Xellule'), ('O', 'et'), ('PLACE', 'Ginen'), ('PLACE', 'Chagem'), ('O', 'in'), ('O', 'contrata'), ('PLACE', 'Deyr'), ('PLACE', 'Issafisaf'), ('O', 'cum'), ('O', 'iuribus'), ('O', 'suis'), ('O', 'omnibus')]
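So I used a recursive approach: take the first place name, split the sentence around every occurrence of it, and emit that name's words in the required ('PLACE', word) format. Each remaining piece of the sentence is then processed recursively against the remaining place names, and the results are merged back together in order. Once no place names are left, every remaining word is tagged 'O'.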
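One thing to keep in mind with the recursive version above is that `sent.split(place)` matches raw substrings, so a name like 'Roma' would also match inside a longer word such as 'Romanus'. A minimal sketch of a single-pass variant that only matches whole words is shown here. It is not the method used in the answer: the function name `tag_places` is made up for illustration, and it assumes every place name starts and ends with a letter so that the `\b` word-boundary anchors behave as expected.

```python
import re

def tag_places(sent, places):
    """Tag each word of `sent` as 'PLACE' or 'O' using whole-word phrase matches."""
    if not places:
        return [("O", w) for w in sent.split()]
    # Try longer names first so a long name is preferred over a shorter one it contains.
    ordered = sorted(places, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b")
    tagged, pos = [], 0
    for m in pattern.finditer(sent):
        # Words between the previous match and this one are not places.
        tagged += [("O", w) for w in sent[pos:m.start()].split()]
        tagged += [("PLACE", w) for w in m.group().split()]
        pos = m.end()
    # Whatever follows the last match is tagged 'O' as well.
    tagged += [("O", w) for w in sent[pos:].split()]
    return tagged

print(tag_places("in Romanus et Ta Zagra", ["Roma", "Ta Zagra"]))
```

Here 'Romanus' stays tagged 'O' because 'Roma' is not followed by a word boundary, while 'Ta Zagra' is tagged word by word as 'PLACE'.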
User with 1828 experience points contributed, more than 3 likes received
Try something like this:
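# Assumes `sent` is a list of words and `places` is a list of place names.
res = []
for x in sent:
    # tag the word 'PLACE' if it is one of the words of any place name
    if any(x in place.split() for place in places):
        res.append(('PLACE', x))
    else:
        # tag it 'O' if we find nothing
        res.append(('O', x))
print(res)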
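User with 1811 experience points contributed, more than 4 likes received
Here is a suggestion based purely on list comprehensions, for comprehension fans: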
sent = ['terras', 'ipsius', 'Azar', 'vocatas', 'Ta', 'Xellule', 'et', 'Ginen', 'Chagem', 'in', 'contrata', 'Deyr', 'Issafisaf']
places = ['Ta Xellule', 'Ginen Chagem', 'Deyr Issafisaf']
p = [i for place in places for i in place.split()]
result = [('PLACE',word) if word in p else ('O',word) for word in sent]
print(result)
# [('O', 'terras'), ('O', 'ipsius'), ('O', 'Azar'), ('O', 'vocatas'), ('PLACE', 'Ta'),
# ('PLACE', 'Xellule'), ('O', 'et'), ('PLACE', 'Ginen'), ('PLACE', 'Chagem'),
# ('O', 'in'), ('O', 'contrata'), ('PLACE', 'Deyr'), ('PLACE', 'Issafisaf')]
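Note that the comprehension above tags every word that occurs in any place name, so a stray 'Ta' would be tagged 'PLACE' even when it is not followed by 'Xellule'. If whole names should match instead, a rough sketch for the same list-of-words input could look like the following. The helper name `tag_tokens` is hypothetical, and the sketch assumes that a place name must appear as consecutive, exactly matching words.

```python
def tag_tokens(tokens, places):
    """Tag a list of words, marking only complete place names as 'PLACE'."""
    # Split each place name into its words; names with more words are tried first.
    names = sorted((p.split() for p in places), key=len, reverse=True)
    tagged, i = [], 0
    while i < len(tokens):
        # First name whose words equal the tokens starting at position i, if any.
        match = next((n for n in names if n and tokens[i:i + len(n)] == n), None)
        if match:
            tagged += [("PLACE", w) for w in match]
            i += len(match)
        else:
            tagged.append(("O", tokens[i]))
            i += 1
    return tagged

words = ['vocatas', 'Ta', 'Ta', 'Xellule', 'et']
print(tag_tokens(words, ['Ta Xellule', 'Ginen Chagem']))
```

In this example only the second 'Ta', the one directly followed by 'Xellule', is tagged 'PLACE'.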