
How to replace patterns sequentially in Python when the pattern list contains duplicates


茅侃侃 2021-05-12 18:13:29
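I have a list of patterns and a list of replacements. The pattern list contains duplicate elements, but each occurrence corresponds to a different replacement.

txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']

The desired output is:

132whyasmnopewokdsllatersdwkaha239genes

What is the most efficient way to perform these replacements in order?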

3 Answers

蝴蝶刀刀


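txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']

# Replace only the first remaining occurrence of each pattern, in list order.
for i, x in enumerate(pattern):
    txt = txt.replace(x, REPLACEMENT[i], 1)

print(txt)  # 132whyasmnopewokdsllatersdwkaha239genes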
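One caveat with this loop: `str.replace` always searches from the start of the current string, so if a replacement text itself contains a later pattern, that later pattern can match inside the text that was just inserted. A minimal sketch wrapping the loop in a helper (the name `replace_sequential` is hypothetical), with a made-up second input that is not from the question:

```python
def replace_sequential(txt, pattern, replacement):
    """Replace the first remaining occurrence of each pattern, in list order."""
    for old, new in zip(pattern, replacement):
        txt = txt.replace(old, new, 1)  # count=1: only the first occurrence
    return txt


# The question's data: works as intended.
txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']
print(replace_sequential(txt, pattern, REPLACEMENT))
# 132whyasmnopewokdsllatersdwkaha239genes

# Hypothetical data: the first replacement 'GOAL' contains the next pattern 'GO',
# so the second replace hits the inserted text instead of the second 'GO'.
print(replace_sequential('GO GO', ['GO', 'GO'], ['GOAL', 'x']))
# xAL GO
```

With the question's inputs this cannot happen, because every replacement is lowercase and every pattern is uppercase.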

Out of interest, here is a timing test, since the question asks for the most efficient approach.


import re
import time

pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']

# 1. zip
t = time.time()
for z in range(1000000):
    txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    for a, b in zip(pattern, REPLACEMENT):
        txt = txt.replace(a, b, 1)
print(time.time() - t)

# 2. enumerate
t = time.time()
for z in range(1000000):
    txt2 = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    for i, x in enumerate(pattern):
        txt2 = txt2.replace(x, REPLACEMENT[i], 1)
print(time.time() - t)

# 3. dict built from the reversed lists
t = time.time()
for z in range(1000000):
    txt3 = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    x = dict(zip(reversed(pattern), reversed(REPLACEMENT)))
    for k in x:
        txt3 = txt3.replace(k, x[k], 1)
print(time.time() - t)

# 4. re.sub, taking the next replacement from an iterator for each match.
# Plain alternation: \b anchors would not fit, as the patterns sit inside runs of word characters.
t = time.time()
for z in range(1000000):
    txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    new_d = iter(REPLACEMENT)
    new_result = re.sub('|'.join(pattern), lambda _: next(new_d), txt)
print(time.time() - t)

Results (seconds, from a Python 2 run):

zip:        2.57099986076
enumerate:  2.48500013351
dict:       3.50499987602
re.sub:     4.23699998856

As you can see, enumerate is slightly faster than zip (about 3% here), while the other two are clearly slower.
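If you use the `re.sub` approach, a few details matter regardless of speed. Leave out `\b` anchors: in a non-raw string literal `'\b'` is a backspace character, and even a real `r'\b'` word boundary would not help, because the occurrences are embedded in runs of letters and digits (all word characters). Escaping each pattern with `re.escape` guards against regex metacharacters. The semantics also differ from the `str.replace` loops: the i-th match in the text gets the i-th replacement whichever pattern matched, so the two agree only when the patterns occur in the text in list order, as they do in the question. A sketch with a hypothetical helper name:

```python
import re


def replace_by_match_order(txt, pattern, replacement):
    """Give the i-th regex match in txt the i-th replacement, whichever pattern matched.

    Matches beyond the end of `replacement` are left unchanged.
    """
    # Try longer patterns first so a shorter pattern cannot shadow a longer one it prefixes.
    alternatives = sorted(set(pattern), key=len, reverse=True)
    regex = re.compile('|'.join(map(re.escape, alternatives)))
    replacements = iter(replacement)
    # next(it, default) returns the matched text itself once the replacements run out.
    return regex.sub(lambda m: next(replacements, m.group(0)), txt)


txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']
print(replace_by_match_order(txt, pattern, REPLACEMENT))
# 132whyasmnopewokdsllatersdwkaha239genes
```

This scans the text once, but building and running the regex has fixed overhead, which is consistent with it being the slowest option on short inputs above.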


Answered 2021-05-25
眼眸繁星


You can iterate over both lists together and replace only the first occurrence of each pattern each time:


for a, b in zip(pattern, REPLACEMENT):
    # count=1: replace only the first remaining occurrence of this pattern
    txt = txt.replace(a, b, 1)
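If the patterns are known to appear in the text in the same order as the list (true for the question's data, but an assumption in general), a single left-to-right pass with `str.find` and a moving start index avoids rescanning the already-processed prefix, builds the result once instead of once per replacement, and never matches inside inserted text. This is a sketch with a hypothetical helper name, not one of the posted answers:

```python
def replace_in_order(txt, pattern, replacement):
    """Replace patterns left to right, each search starting where the last match ended.

    Assumes the patterns occur in txt in list order; raises ValueError otherwise.
    """
    pieces = []
    pos = 0
    for old, new in zip(pattern, replacement):
        idx = txt.find(old, pos)
        if idx == -1:
            raise ValueError(f'{old!r} not found at or after index {pos}')
        pieces.append(txt[pos:idx])  # untouched text before the match
        pieces.append(new)           # the replacement for this occurrence
        pos = idx + len(old)         # continue searching after the matched pattern
    pieces.append(txt[pos:])         # remainder after the last match
    return ''.join(pieces)


txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']
print(replace_in_order(txt, pattern, REPLACEMENT))
# 132whyasmnopewokdsllatersdwkaha239genes

# A replacement containing a later pattern is not rescanned.
print(replace_in_order('GO GO', ['GO', 'GO'], ['GOAL', 'x']))
# GOAL x
```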


Answered 2021-05-25
慕盖茨4494581


Use a dict to reduce the number of items you need to iterate over, which can pay off for some long inputs.


txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']

# One entry per distinct pattern; reversing the lists makes the first pair for each pattern win.
x = dict(zip(reversed(pattern), reversed(REPLACEMENT)))
for k in x:
    txt = txt.replace(k, x[k], 1)

print(txt)
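Before comparing speed, it is worth checking what that dict actually contains. A dict keeps one value per key, so the five (pattern, replacement) pairs collapse to three, and the second `GO` and `NOW` are never replaced. A quick check (the printed dict order assumes Python 3.7+, where dicts preserve insertion order):

```python
txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']

# A repeated key overwrites the earlier value but keeps its first insertion position.
x = dict(zip(reversed(pattern), reversed(REPLACEMENT)))
print(x)  # {'NOW': 'later', 'GO': 'why', 'HOME': 'nope'}

result = txt
for k in x:
    result = result.replace(k, x[k], 1)
print(result)  # 132whyasmnopewokdsllatersdwkGO239NOW
print(result == '132whyasmnopewokdsllatersdwkaha239genes')  # False
```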

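Edit: for fun, I added a benchmark to back up the claim that reducing the number of items to iterate over can pay off for some long inputs. With a trivial test data set, the most efficient approach is not always obvious.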


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import re
import timeit


def alpha(pattern, REPLACEMENT, txt):
    for a, b in zip(pattern, REPLACEMENT):
        txt = txt.replace(a, b, 1)


def beta(pattern, REPLACEMENT, txt):
    for i, x in enumerate(pattern):
        txt = txt.replace(x, REPLACEMENT[i], 1)


def gamma(pattern, REPLACEMENT, txt):
    # One entry per distinct pattern; reversing the lists makes the first pair for each pattern win.
    x = dict(zip(reversed(pattern), reversed(REPLACEMENT)))
    for k in x:
        txt = txt.replace(k, x[k], 1)


def delta(pattern, REPLACEMENT, txt):
    new_d = iter(REPLACEMENT)
    # Plain alternation: \b anchors would not fit, as the patterns sit inside runs of word characters.
    new_result = re.sub('|'.join(pattern), lambda _: next(new_d), txt)


if __name__ == '__main__':
    # Each timeit.timeit call runs its statement 1,000,000 times (the default).
    txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
    REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']

    print("Trivial inputs:  len(pattern): {}, len(REPLACEMENT): {}, len(txt): {}".format(len(pattern), len(REPLACEMENT), len(txt)))
    print("alpha: ", timeit.timeit("alpha(pattern, REPLACEMENT, txt)", setup="from __main__ import alpha, txt, pattern, REPLACEMENT"))
    print("beta:  ", timeit.timeit("beta(pattern, REPLACEMENT, txt)", setup="from __main__ import beta, txt, pattern, REPLACEMENT"))
    print("gamma: ", timeit.timeit("gamma(pattern, REPLACEMENT, txt)", setup="from __main__ import gamma, txt, pattern, REPLACEMENT"))
    print("delta: ", timeit.timeit("delta(pattern, REPLACEMENT, txt)", setup="from __main__ import delta, txt, pattern, REPLACEMENT"))
    print("")

    # Small inputs: the original data plus 3 more copies.
    txtcopy = txt
    patterncopy = pattern.copy()
    REPLACEMENTcopy = REPLACEMENT.copy()
    for _ in range(3):
        txt = txt + txtcopy
        pattern.extend(patterncopy)
        REPLACEMENT.extend(REPLACEMENTcopy)

    print("Small inputs: len(pattern): {}, len(REPLACEMENT): {}, len(txt): {}".format(len(pattern), len(REPLACEMENT), len(txt)))
    print("alpha: ", timeit.timeit("alpha(pattern, REPLACEMENT, txt)", setup="from __main__ import alpha, txt, pattern, REPLACEMENT"))
    print("beta:  ", timeit.timeit("beta(pattern, REPLACEMENT, txt)", setup="from __main__ import beta, txt, pattern, REPLACEMENT"))
    print("gamma: ", timeit.timeit("gamma(pattern, REPLACEMENT, txt)", setup="from __main__ import gamma, txt, pattern, REPLACEMENT"))
    print("delta: ", timeit.timeit("delta(pattern, REPLACEMENT, txt)", setup="from __main__ import delta, txt, pattern, REPLACEMENT"))
    print("")

    # Larger inputs: the original data plus 300 more copies.
    txt = txtcopy
    pattern = patterncopy.copy()
    REPLACEMENT = REPLACEMENTcopy.copy()
    for _ in range(300):
        txt = txt + txtcopy
        pattern.extend(patterncopy)
        REPLACEMENT.extend(REPLACEMENTcopy)

    print("Larger inputs: len(pattern): {}, len(REPLACEMENT): {}, len(txt): {}".format(len(pattern), len(REPLACEMENT), len(txt)))
    print("alpha: ", timeit.timeit("alpha(pattern, REPLACEMENT, txt)", setup="from __main__ import alpha, txt, pattern, REPLACEMENT"))
    print("beta:  ", timeit.timeit("beta(pattern, REPLACEMENT, txt)", setup="from __main__ import beta, txt, pattern, REPLACEMENT"))
    print("gamma: ", timeit.timeit("gamma(pattern, REPLACEMENT, txt)", setup="from __main__ import gamma, txt, pattern, REPLACEMENT"))
    print("delta: ", timeit.timeit("delta(pattern, REPLACEMENT, txt)", setup="from __main__ import delta, txt, pattern, REPLACEMENT"))

Results (seconds per 1,000,000 calls):

Trivial inputs:  len(pattern): 5, len(REPLACEMENT): 5, len(txt): 33
alpha:  4.60048107800003
beta:   4.169088881999869
gamma:  5.7612637450001785
delta:  11.371387353000046

Small inputs: len(pattern): 20, len(REPLACEMENT): 20, len(txt): 132
alpha:  17.281149661999734
beta:   15.131949634000193
gamma:  7.339897444000144
delta:  26.50896787900001

Larger inputs: len(pattern): 1505, len(REPLACEMENT): 1505, len(txt): 9933
alpha:  18766.660852467998
beta:   17640.960064803
gamma:  64.01868645999639
delta:  901.3577002189995

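So for trivial inputs, the enumerate solution is a little faster than zip and much faster than the iter/re.sub solution. As soon as the inputs grow slightly, the cost of not removing duplicates starts to show, and my solution runs in less than half the time. On long inputs with many duplicates, @eatmeimadanish's solution takes about 27555% of the time that the duplicate-removing version needs (roughly 275 times as long). Ouch.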
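Timings only compare like with like if the functions return the same string. A sketch that rebuilds the "Larger inputs" data (the question's data repeated 301 times, as in the benchmark) and checks `gamma` and `delta` against the plain `zip` loop. These versions of the benchmark functions return their result so it can be compared, and `delta` is written without `\b` anchors:

```python
import re

base_txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
base_pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
base_repl = ['why', 'nope', 'later', 'aha', 'genes']

# Same sizes as the benchmark: the original data plus 300 more copies.
txt = base_txt * 301
pattern = base_pattern * 301
REPLACEMENT = base_repl * 301


def alpha(pattern, REPLACEMENT, txt):
    for a, b in zip(pattern, REPLACEMENT):
        txt = txt.replace(a, b, 1)
    return txt


def gamma(pattern, REPLACEMENT, txt):
    x = dict(zip(reversed(pattern), reversed(REPLACEMENT)))
    for k in x:  # only 3 distinct keys, so only 3 replacements happen
        txt = txt.replace(k, x[k], 1)
    return txt


def delta(pattern, REPLACEMENT, txt):
    new_d = iter(REPLACEMENT)
    return re.sub('|'.join(map(re.escape, pattern)), lambda _: next(new_d), txt)


expected = alpha(pattern, REPLACEMENT, txt)
for func in (gamma, delta):
    print(func.__name__, func(pattern, REPLACEMENT, txt) == expected)
# gamma False
# delta True
```

In other words, `gamma` is faster on this input largely because it performs 3 replacements instead of 1505.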


Answered 2021-05-25