
Expanding phrases that contain coordinating conjunctions such as ( / , and, or, & )


慕侠2389804 2021-11-02 09:57:52
I have a list of phrases that contain coordinating conjunctions such as (and, or, /, &). I want to expand each of them into all possible individual phrases. What is the best way to expand phrases that contain conjunctions, using either an NLP library or a Python function?

For example, "alphabet a/b/c can have color red/blue/green" can be expanded into nine phrases: ["alphabet a can have color red", "alphabet a can have color blue", ... "alphabet b can have color blue", ... "alphabet c can have color green"].

Other examples:

    ['bag of apples/oranges', 'case of citrus (lemon or limes)', 'chocolates/candy box', 'bag of shoes & socks', 'pear red/brown/green', 'match box and/or lighter', 'milkshake (soy and almond) added ']

should be expanded to:

    ['bag of apples', 'bag of oranges', 'case of citrus lemon', 'case of citrus limes', 'chocolates box', 'candy box', 'bag of socks', 'bag of shoes', 'pear red', 'pear brown', 'pear green', 'match box ', 'lighter', 'milkshake almond added', 'milkshake soy added']

1 Answer

慕的地6264312


There is always a brute-force way to solve this, shown below. I am still looking for something smarter.


def expand_by_conjuction(item):
    """Expand every "/"-joined token in a phrase into separate phrases."""

    def get_slash_index(tokens):
        # Index of the first token that contains a slash, or None if there is none.
        for num, ele in enumerate(tokens):
            if "/" in ele:
                return num
        return None

    items = [item]
    # Keep expanding until no phrase contains a slash.
    while any("/" in phrase for phrase in items):
        # Build a new list rather than appending to and removing from
        # `items` while iterating over it.
        expanded = []
        for phrase in items:
            tokens = phrase.split()
            sls_index = get_slash_index(tokens)
            if sls_index is None:
                expanded.append(phrase)
                continue
            # One new phrase per alternative of the first slashed token.
            for part in tokens[sls_index].split("/"):
                n_item = tokens[:sls_index] + [part] + tokens[sls_index + 1:]
                expanded.append(" ".join(n_item))
        items = expanded
    return items


def slashize_conjuctions(item):
    """Replace word-level conjunctions with "/" so only one separator remains."""
    slashize = [' or ', ' and ', ' and/or ', ' or/and ', ' & ']
    for conj in slashize:
        item = item.replace(conj, "/")
    return item



items = ['bag of apples/oranges', 'case of citrus (lemon or limes)',
         'chocolates/candy box', 'bag of shoes & socks',
         'pear red/brown/green', 'match box and/or lighter',
         'milkshake (soy and almond) added ']

new_items = []
for string in items:
    item = slashize_conjuctions(string)
    lst = expand_by_conjuction(item)
    # Drop the parentheses left over from grouped alternatives.
    lst = [ele.replace("(", "").replace(")", "") for ele in lst]
    new_items.extend(lst)

# Note: 'match box and/or lighter' becomes 'match box' and 'match lighter',
# because only the tokens next to the conjunction are treated as alternatives.
print(new_items)
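
One possible shortening, offered as a sketch rather than a definitive answer: once every conjunction has been normalized to "/", each token is a list of alternatives, and the expansion is the Cartesian product of those lists. The name expand_phrase and the regular expression are my own assumptions, not part of the code above, and the limitation is the same: only the tokens directly next to a conjunction are treated as alternatives.

```python
import re
from itertools import product

# Hypothetical helper, standard library only.
# Longer alternatives come first so "and/or" is consumed as one separator.
_CONJUNCTION = re.compile(r"\s+(?:and/or|or/and|and|or|&)\s+")


def expand_phrase(phrase):
    """Return every phrase obtained by choosing one alternative per slashed token."""
    # Drop parentheses, then turn word-level conjunctions into "/".
    normalized = _CONJUNCTION.sub("/", phrase.replace("(", "").replace(")", ""))
    # Each token becomes a list of alternatives (a one-item list if it has no "/").
    choices = [token.split("/") for token in normalized.split()]
    # The Cartesian product picks one alternative from every token.
    return [" ".join(combo) for combo in product(*choices)]


for phrase in ["alphabet a/b/c can have color red/blue/green",
               "milkshake (soy and almond) added "]:
    print(expand_phrase(phrase))
```

The first phrase has two slashed tokens with three alternatives each, so it yields the nine phrases from the question. Multi-word alternatives such as "match box" versus "lighter" would still need real parsing, for example with an NLP library.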


Answered 2021-11-02